Big Data - Apache Kafka


Lesson Description

Lesson #451 - Kafka Cluster Architecture

Kafka stores key-value messages that come from arbitrarily many processes called producers. The data can be partitioned into different "partitions" within different "topics". Within a partition, messages are strictly ordered by their offsets (the position of a message within a partition), and indexed and stored together with a timestamp. Other processes called "consumers" can read messages from partitions. For stream processing, Kafka offers the Streams API, which allows writing Java applications that consume data from Kafka and write results back to Kafka. Apache Kafka also works with external stream processing systems such as Apache Apex, Apache Flink, Apache Spark, Apache Storm, and Apache NiFi.
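
The per-partition ordering guarantee can be illustrated with a small in-memory sketch. This is a toy model written for this lesson, not the real Kafka client API; the `Topic` class and its methods are invented for illustration:

```python
from collections import defaultdict


class Topic:
    """Toy in-memory model of a Kafka topic: each partition is an
    append-only log, and a message's offset is simply its index in
    that log, so reads within one partition are strictly ordered."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)  # partition id -> list of records

    def append(self, partition, key, value):
        log = self.partitions[partition]
        offset = len(log)  # offsets are dense and strictly increasing
        log.append((offset, key, value))
        return offset

    def read(self, partition, from_offset=0):
        # Consumers read a partition sequentially, in offset order.
        return self.partitions[partition][from_offset:]


topic = Topic(num_partitions=2)
topic.append(0, "user-1", "login")   # offset 0
topic.append(0, "user-1", "click")   # offset 1
```

Note that ordering is only guaranteed within a single partition; across partitions of the same topic, Kafka makes no ordering promise.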

Kafka runs on a cluster of one or more servers (called brokers), and the partitions of all topics are distributed across the cluster nodes. Additionally, partitions are replicated to multiple brokers. This architecture allows Kafka to deliver massive streams of messages in a fault-tolerant fashion and has allowed it to replace some of the conventional messaging systems like Java Message Service (JMS), Advanced Message Queuing Protocol (AMQP), and so on. Since the 0.11.0.0 release, Kafka offers transactional writes, which provide exactly-once stream processing using the Streams API.

Kafka supports two types of topics: regular and compacted. Regular topics can be configured with a retention time or a space bound. If there are records older than the specified retention time, or if the space bound is exceeded for a partition, Kafka is allowed to delete old data to free storage space. By default, topics are configured with a retention time of 7 days, but it is also possible to store data indefinitely. For compacted topics, records don't expire based on time or space bounds. Instead, Kafka treats later messages as updates to earlier messages with the same key and guarantees never to delete the latest message per key. Users can delete a key entirely by writing a so-called tombstone message with a null value for that key.
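
The compacted-topic rule above, including tombstones, fits in a few lines of Python. This is an illustrative model of the retention semantics, not Kafka's actual compaction implementation (which runs incrementally on log segments in the broker):

```python
def compact(log):
    """Toy log compaction: keep only the latest record per key, and
    drop a key entirely when its latest record is a tombstone (a
    message with a None value). `log` is a list of (key, value)
    pairs in append order."""
    latest = {}
    for key, value in log:
        latest[key] = value  # a later message supersedes earlier ones
    # Tombstones (None values) delete the key altogether.
    return {k: v for k, v in latest.items() if v is not None}


log = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example.com"),
    ("user-1", "alice@new.example"),  # update: replaces the first record
    ("user-2", None),                 # tombstone: delete user-2
]
```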

There are five major APIs in Kafka:

Producer API - Permits an application to publish streams of records.

Consumer API - Permits an application to subscribe to topics and process streams of records.

Connector API - Provides reusable producer and consumer connectors that link Kafka topics to existing applications and data systems.

Streams API - Transforms input streams into output streams and produces the result.

Admin API - Used to manage Kafka topics, brokers, and other Kafka objects.
Kafka's topics are divided into several partitions. While the topic is a logical concept in Kafka, a partition is the smallest storage unit that holds a subset of records owned by a topic.
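
A producer decides which partition a record goes to; by default, records with the same key always land in the same partition, which is what preserves per-key ordering. The helper below is a hypothetical stand-in for that behavior: the real Java client's default partitioner uses a murmur2 hash of the key bytes, while this sketch uses CRC32 purely because it is deterministic and in the standard library:

```python
import zlib


def partition_for(key, num_partitions):
    """Toy key-based partitioner: hash the key bytes and take the
    result modulo the partition count, so the same key always maps
    to the same partition. (Real Kafka uses murmur2, not CRC32.)"""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Records for "user-1" all share one partition, preserving their order.
p = partition_for("user-1", num_partitions=4)
```

A consequence worth noting: changing the number of partitions changes the key-to-partition mapping, which is why growing a topic's partition count breaks per-key ordering for existing keys.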

The consumer and producer APIs build on top of the Kafka messaging protocol and offer a reference implementation for Kafka consumer and producer clients in Java. The underlying messaging protocol is a binary protocol that developers can use to write their own consumer or producer clients in any programming language. This decouples Kafka from the Java Virtual Machine (JVM) ecosystem.

Architecture design

What is log compaction in Kafka?
To eliminate such complexity and keep Kafka servers loaded with relevant data, Kafka brokers run a log compaction process that selectively removes old messages whenever the topic partition is updated with a newer record under the same key.

A message in Kafka is a key-value pair with a small amount of associated metadata. A message set is just a sequence of messages with offset and size information.
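
That structure can be sketched as plain data types. The fields below are a simplified, illustrative subset of what Kafka's on-disk record format actually carries (real records also include headers, CRCs, compression attributes, and varint-encoded sizes):

```python
import time
from dataclasses import dataclass, field


@dataclass
class Record:
    """Toy model of a Kafka message: a key-value pair plus a small
    amount of metadata (here, just a timestamp; the offset is
    assigned by the log when the record is appended)."""
    key: bytes
    value: bytes
    timestamp: float = field(default_factory=time.time)


def message_set(records, base_offset=0):
    """Toy message set: each message paired with its offset and a
    size, here computed naively as key length plus value length."""
    out = []
    for i, rec in enumerate(records):
        size = len(rec.key) + len(rec.value)
        out.append((base_offset + i, size, rec))
    return out


batch = message_set([Record(b"user-1", b"login"), Record(b"user-1", b"click")],
                    base_offset=100)
```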
