Kafka: The Definitive Guide, 2nd Edition PDF Download
Main content:
Messages and Batches
The unit of data within Kafka is called a message. If you are
approaching Kafka from a database background, you can think
of this as similar to a row or a record. A message is simply an
array of bytes as far as Kafka is concerned, so the data
contained within it does not have a specific format or meaning
to Kafka. A message can have an optional bit of metadata,
which is referred to as a key. The key is also a byte array and, as
with the message, has no specific meaning to Kafka. Keys are
used when messages are to be written to partitions in a more
controlled manner. The simplest such scheme is to generate a
consistent hash of the key, and then select the partition number
for that message by taking the result of the hash modulo the
total number of partitions in the topic. This assures that
messages with the same key are always written to the same
partition.
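The hash-modulo scheme described above can be sketched in a few lines. This is only an illustration: Kafka's default partitioner actually uses the murmur2 hash, and the 6-partition topic here is an assumption.

```python
# Minimal sketch of key-based partition selection (hash modulo).
# Kafka's default partitioner uses murmur2; hashlib.md5 is used
# here purely for illustration of the idea.
import hashlib

NUM_PARTITIONS = 6  # assumed partition count for the example topic

def select_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a message key to a partition number."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest, "big") % num_partitions

# Messages with the same key always go to the same partition.
assert select_partition(b"user-42") == select_partition(b"user-42")
```

Because the mapping depends only on the key and the partition count, all messages for a given key preserve their relative order within one partition.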
For efficiency, messages are written into Kafka in batches. A
batch is just a collection of messages, all of which are being
produced to the same topic and partition. An individual
roundtrip across the network for each message would result in
excessive overhead, and collecting messages together into a
batch reduces this. Of course, this is a tradeoff between latency
and throughput: the larger the batches, the more messages that
can be handled per unit of time, but the longer it takes an
individual message to propagate. Batches are also typically
compressed, providing more efficient data transfer and storage
at the cost of some processing power. Both keys and batches
are discussed in more detail in Chapter 4.
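The latency/throughput tradeoff is controlled on the producer side by a handful of standard configuration properties. A sketch of such a configuration, with illustrative values (not recommendations), assuming a client that accepts Kafka producer configs as a dictionary:

```python
# Producer settings governing the batching tradeoff described above.
# Keys are standard Kafka producer configs; values are illustrative.
producer_config = {
    "batch.size": 65536,        # max bytes per batch, per partition
    "linger.ms": 20,            # wait up to 20 ms to fill a batch
    "compression.type": "lz4",  # compress whole batches on the wire
}
```

Raising `batch.size` and `linger.ms` increases throughput at the cost of per-message latency; `compression.type` trades CPU for smaller transfers, and compressing whole batches is more effective than compressing messages individually.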
Schemas
While messages are opaque byte arrays to Kafka itself, it is
recommended that additional structure, or schema, be imposed
on the message content so that it can be easily understood.
There are many options available for message schema,
depending on your application’s individual needs. Simplistic
systems, such as JavaScript Object Notation (JSON) and
Extensible Markup Language (XML), are easy to use and
human-readable. However, they lack features such as robust
type handling and compatibility between schema versions.
Many Kafka developers favor the use of Apache Avro, which is
a serialization framework originally developed for Hadoop.
Avro provides a compact serialization format; schemas that are
separate from the message payloads and that do not require
code to be generated when they change; and strong data typing
and schema evolution, with both backward and forward
compatibility.
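The schema-evolution properties mentioned above can be illustrated with a hypothetical Avro record schema (Avro schemas are themselves JSON). The `User` record and its fields are invented for this example; v2 adds a field with a default, which is what keeps old and new readers compatible.

```python
# Hypothetical Avro schema evolution: v2 adds an "email" field with
# a default, so records written with v1 can still be read under v2
# (backward compatibility), and v1 readers can ignore the new field
# (forward compatibility).
import json

user_v1 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

user_v2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        # New fields need a default to remain backward compatible.
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

print(json.dumps(user_v2, indent=2))
```

Because the schema travels separately from the payload (typically via a schema registry), no code regeneration is needed when a compatible change like this one is made.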