Kafka Basics
In very basic terms, Apache Kafka is a distributed messaging system.
From a bird's-eye view, Apache Kafka is a platform to which your applications can push messages and from which your applications can later retrieve them.
Basic Terminology:
1. Kafka Cluster:- Apache Kafka can run on multiple servers, and we can add or remove servers as our requirements change. A cluster is the set of all these servers working together.
2. Kafka Brokers:- Each individual Apache Kafka server instance in the cluster is called a broker.
3. Topic:- Every message published to Apache Kafka goes into a designated topic. Topics categorize the data. For example, if you want to push a message whenever a user logs in to your website, that message can go into a topic named, say, logins; if you want to push a message whenever a user logs out, you push it to a separate topic, say logouts. Topics separate different sets of data within a single Kafka cluster. You can loosely compare a topic in Kafka to a table in an RDBMS. Multiple applications can retrieve the data from a single topic. I explain this consumer mechanism under Kafka Consumers below.
4. Partitions:- The data in each topic is divided into one or more partitions. Partitions help us scale our Kafka application. Every message in a partition has an offset, which is its sequential position within that partition. For each partition, Kafka tracks the committed offset of every consumer group reading it (consumers are the applications that retrieve data from Kafka). That offset marks how far the group has consumed in that partition. Kafka guarantees message order within a partition, i.e., within a single partition messages are retrieved in the order in which they were written. Across partitions there is no ordering guarantee.
5. Kafka Producers:- Kafka producers push (publish) messages to Kafka topics. Each message is sent to exactly one topic; to deliver the same data to several topics, the producer sends it once per topic. For every message the producer can set a key (a routing key), which decides the partition of the topic the message is written to. By default, messages with the same key go to the same partition.
6. Kafka Consumers:- Kafka consumers poll data from topics. A single consumer can subscribe to one or more topics. To scale consumption within a single topic, Kafka provides a consumer group mechanism. Consumers that subscribe to a topic do so as members of a consumer group. If you run multiple consumer instances belonging to the same consumer group, they consume messages from different partitions, which provides parallelism within a topic. Different consumer groups each receive all the messages of the topic, which is how multiple applications can read the same topic.
Note: Within a consumer group, each partition is consumed by only one consumer at a time. Hence the number of consumers in a group should be <= the number of partitions of the topic; any extra consumers sit idle.
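To make the producer and consumer-group ideas above concrete, here is a minimal sketch in plain Python. It is a toy model, not the Kafka client API: the names pick_partition, publish and assign_partitions, the partition count of 3, and the use of CRC32 as the hash are all assumptions made for illustration (Kafka uses its own hash function and its own assignment strategies). It shows that messages with the same key land in the same partition and keep their order there, and that a consumer group with more consumers than partitions leaves some consumers idle.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the "logins" topic


def pick_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition (CRC32 is a stand-in for Kafka's own hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(topic_log: dict, key: str, value: str) -> tuple:
    """Append a message to the partition chosen by its key; return (partition, offset)."""
    partition = pick_partition(key)
    topic_log[partition].append((key, value))
    return partition, len(topic_log[partition]) - 1


def assign_partitions(consumers: list, num_partitions: int = NUM_PARTITIONS) -> dict:
    """Spread partitions round-robin over the consumers of one consumer group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment


logins = defaultdict(list)  # partition number -> ordered list of (key, value)
for user, event in [("alice", "login-1"), ("bob", "login-1"), ("alice", "login-2")]:
    publish(logins, user, event)

# All of alice's events share one partition, so their relative order is kept.
alice_events = [v for k, v in logins[pick_partition("alice")] if k == "alice"]
print(alice_events)  # ['login-1', 'login-2']

# Two consumers share three partitions; with four consumers, one gets nothing.
print(assign_partitions(["c1", "c2"]))              # {'c1': [0, 2], 'c2': [1]}
print(assign_partitions(["c1", "c2", "c3", "c4"]))  # c4 is assigned []
```

In a real cluster the brokers and client library do this work for you; the sketch only mirrors the rules described in the list above.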
Bird's Eye View Of Kafka Applications
I hope this article has given you a basic idea of what Apache Kafka is. Thank you, and feel free to ask if you have any questions.