Apache Kafka

Kafka is a distributed streaming platform. Put simply, it moves data from one system to another in near real time using a publish-subscribe (pub-sub) mechanism: a publisher (producer) publishes messages to the Kafka cluster, and a subscriber (consumer) receives those messages from the cluster.

Figure 1. Streaming in Kafka

Components

Topics
Kafka uses topics to deliver messages from the producer to the consumer. A producer sends the messages to the topic and a consumer subscribes to the topic to receive those messages.
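
To make the publish and subscribe flow concrete, here is a minimal in-memory sketch. It is not the Kafka client API; ToyBroker, publish, and poll are hypothetical names used only to illustrate that each subscriber reads the topic independently.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a Kafka cluster (illustration only)."""

    def __init__(self):
        self.topics = defaultdict(list)     # topic name -> messages in arrival order
        self.positions = defaultdict(int)   # (subscriber, topic) -> next index to read

    def publish(self, topic, message):
        # A producer appends a message to the end of the topic.
        self.topics[topic].append(message)

    def poll(self, subscriber, topic):
        # Return only the messages this subscriber has not seen yet.
        start = self.positions[(subscriber, topic)]
        messages = self.topics[topic][start:]
        self.positions[(subscriber, topic)] = len(self.topics[topic])
        return messages


broker = ToyBroker()
broker.publish("orders", "order-1")
broker.publish("orders", "order-2")

first = broker.poll("billing", "orders")    # both messages
second = broker.poll("billing", "orders")   # nothing new
other = broker.poll("shipping", "orders")   # a different subscriber sees both
```

Messages stay in the topic after they are read, which is why the second subscriber still receives both of them.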

Partitions
A topic is divided into partitions. Partitions let a topic be spread across multiple brokers in the Kafka cluster, which allows consumers to read messages in parallel. Within a consumer group, each partition is assigned to only one consumer at a time, although a single consumer can be assigned several partitions.
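
As a rough sketch of how a keyed message is mapped to a partition, the idea is to hash the key and take the result modulo the partition count. Kafka's Java client uses a murmur2 hash for this; the example below substitutes CRC32 from the Python standard library as a stand-in, and choose_partition is a hypothetical helper name.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index (CRC32 is a stand-in hash)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


keys = [b"user-1", b"user-2", b"user-1"]
partitions = [choose_partition(k, 3) for k in keys]
```

Because the mapping depends only on the key and the partition count, all messages with the same key land in the same partition and keep their relative order there.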

Partition replication
Partitions can be replicated across different broker nodes in the Kafka cluster. This protects against message loss when a broker fails. For each partition, one replica acts as the leader and handles the reads and writes, while the other replicas (followers) keep backup copies. If the leader's broker dies, one of the up-to-date followers takes over as the leader.
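
The failover behaviour can be pictured with a small toy model. This is only a sketch: in a real cluster the new leader is chosen by the controller from the in-sync replicas, while here the first remaining follower is simply promoted. PartitionReplicas and fail_broker are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PartitionReplicas:
    leader: int        # broker id currently serving reads and writes
    followers: list    # broker ids holding backup copies

    def fail_broker(self, broker_id: int) -> None:
        """Simulate a broker failure and promote a follower if needed."""
        if broker_id == self.leader:
            if not self.followers:
                raise RuntimeError("no replica left to take over")
            # Toy rule: promote the first remaining follower.
            self.leader = self.followers.pop(0)
        elif broker_id in self.followers:
            self.followers.remove(broker_id)


partition = PartitionReplicas(leader=1, followers=[2, 3])
partition.fail_broker(1)   # the leader's broker dies
```

After the call, broker 2 leads the partition and broker 3 is the only remaining backup.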

Records
A record is the unit of data in Kafka: the message itself (the value) together with metadata such as a timestamp and a message key.
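
One way to picture a record is as a small immutable structure. The field names below are assumptions chosen for illustration (Kafka records carry a key, a value, a timestamp, and optional headers), not a class from any Kafka client library.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Record:
    value: bytes                          # the message payload
    key: Optional[bytes] = None           # optional key, often used to pick a partition
    timestamp_ms: int = field(
        default_factory=lambda: int(time.time() * 1000)
    )                                     # creation time in milliseconds
    headers: tuple = ()                   # optional (name, value) pairs


record = Record(value=b'{"amount": 42}', key=b"user-1")
```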

Offset
An offset is the position of a record within a partition. Each consumer group commits the offset it has reached on each partition, and from this committed position the consumer knows which message to read next. If the consumer needs to re-read earlier messages, the offset can be reset to an earlier position.
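
The sketch below models a single partition as an append-only list and shows how a read position, a committed offset, and a rewind relate to each other. PartitionLog and ToyConsumer are hypothetical names; the real consumer API differs, so treat this as an illustration of the idea only.

```python
class PartitionLog:
    """Append-only list standing in for one partition."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1       # offset assigned to the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class ToyConsumer:
    def __init__(self, log):
        self.log = log
        self.position = 0     # next offset to read
        self.committed = 0    # last position saved as "done"

    def poll(self, max_records=10):
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch

    def commit(self):
        self.committed = self.position

    def seek(self, offset):
        self.position = offset              # rewind (or skip ahead)


log = PartitionLog()
offsets = [log.append(v) for v in ("a", "b", "c")]

consumer = ToyConsumer(log)
batch = consumer.poll(max_records=2)        # reads offsets 0 and 1
consumer.commit()                           # committed position is now 2
consumer.seek(0)                            # go back to the beginning
replayed = consumer.poll()                  # reads all three records again
```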

Producers
A producer is the client application that generates messages and sends them to a topic (that is, to a partition in the topic) in the Kafka cluster.

Brokers
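Brokers are the servers that form a Kafka cluster. The messages are stored on the brokers. One broker in the cluster acts as the controller, which tracks the health of the other brokers and handles administrative work such as electing a new partition leader when a broker fails. The controller does not add or delete messages itself: producers append messages to the partition leaders, and each broker removes old messages according to the topic's retention settings.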

Consumers
The consumer is the end system that consumes messages by subscribing to a topic in the Kafka cluster. After reading messages, the consumer commits its offset back to the Kafka cluster, which serves as the acknowledgment that the messages up to that position have been read.
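
A small simulation can show why the timing of this acknowledgment matters. In the sketch below the consumer processes a message first and acknowledges it afterwards, so a crash between the two steps leads to the message being processed again after a restart. The function and parameter names are hypothetical, and the partition is modeled as a plain list.

```python
def run_consumer(log, committed, processed, crash_after_processing=None):
    """Process records from the committed offset onward.

    Returns the committed offset at the moment the consumer stops.
    """
    for offset in range(committed, len(log)):
        processed.append(log[offset])        # handle the message
        if offset == crash_after_processing:
            return committed                 # simulated crash before acknowledging
        committed = offset + 1               # acknowledge: everything before this is done
    return committed


log = ["a", "b", "c"]
processed = []

# First run crashes right after processing offset 1, before acknowledging it.
committed = run_consumer(log, 0, processed, crash_after_processing=1)

# The restarted consumer resumes from the last acknowledged position.
committed_after_restart = run_consumer(log, committed, processed)
```

Here "b" is processed twice, which is the usual trade-off of acknowledging after processing: no message is skipped, but duplicates are possible.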

Consumer groups
Consumers that share the same group id form a consumer group. The offset is stored per consumer group (for each partition), so it applies to whichever consumer in the group reads that partition. If the group has only one consumer, all the partitions in the topic are assigned to that consumer by default. As you add more consumers to the group, the partitions are shared among them.
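
The way partitions are shared out can be sketched with a simple round-robin rule. Kafka ships several assignment strategies and this toy function is not any of them exactly; assign_partitions is a hypothetical helper that only illustrates the outcome described above.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    if not consumers:
        return {}
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

single = assign_partitions(partitions, ["c1"])
shared = assign_partitions(partitions, ["c1", "c2"])
crowded = assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5"])
```

With one consumer, that consumer owns every partition. With two, each gets half. With five consumers and four partitions, one consumer stays idle, so adding consumers beyond the partition count does not increase parallelism.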

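ZooKeeper
ZooKeeper maintains the cluster metadata: brokers, topics, partitions, and replicas. It is responsible for coordinating operations in a Kafka cluster, such as tracking which brokers are alive and electing the controller. Producers and consumers do not talk to ZooKeeper directly in current clients; they learn the cluster status and the partition leaders from the brokers. Very old Kafka versions stored consumer offsets in ZooKeeper, but current versions store them in an internal Kafka topic (__consumer_offsets). Newer Kafka releases can also run without ZooKeeper by using the built-in KRaft mode.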

Kafka source connector
A source connector pulls data from an external system (for example, a database or a file) and publishes it as messages to topics in the Kafka cluster. Source connectors run on Kafka Connect, a framework that ships with Apache Kafka.

Kafka sink connector
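A sink connector reads messages from topics in the Kafka cluster and delivers them to an external system (for example, a database or a search index). Sink connectors also run on Kafka Connect, a framework that ships with Apache Kafka.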

A basic understanding of this flow and these components helps you decide on the broker count, partition count, replica count, consumer group count, and consumer count. These numbers are chosen based on the rate at which producers publish messages and the rate at which consumers can process them. With the right configuration, you can achieve real-time streaming using Apache Kafka.
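
As a starting point for the partition count, a commonly used rule of thumb is to divide the target throughput by what a single partition can sustain on the producer side and on the consumer side, and take the larger result. The sketch below encodes that arithmetic; the throughput figures are made-up example values, and estimate_partitions is a hypothetical helper. Real numbers should come from measuring your own workload.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition,
                        consumer_mb_s_per_partition):
    """Rule-of-thumb partition count for a target throughput (MB/s)."""
    if min(target_mb_s, producer_mb_s_per_partition,
           consumer_mb_s_per_partition) <= 0:
        raise ValueError("all throughput values must be positive")
    needed_for_producers = math.ceil(target_mb_s / producer_mb_s_per_partition)
    needed_for_consumers = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    return max(needed_for_producers, needed_for_consumers)


# Example values only: 100 MB/s target, 10 MB/s per partition when producing,
# 20 MB/s per partition when consuming.
partition_count = estimate_partitions(100, 10, 20)
```

With these example values the estimate is 10 partitions, which also means up to 10 consumers in one group can read in parallel.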

Experience the streaming 🙂

Published by Ritesh Kumar Reddy

I (Ritesh) work as a Sr. Cloud Engineer for a living. Learning new technologies has always been my hobby. Why not share it? Here is the brainchild: blogging to share the knowledge. This blog is for those who wish to start in the Cloud field or are already in it. Each article briefly talks about a tool/technology that is used in the Cloud model. Once you read the article, I hope you get a kick start with that specific tool/technology.
