Persistence of Messages in Apache Kafka

Before using Apache Kafka as a message streaming platform for mission-critical applications that cannot afford to lose a single message, a developer must know what Apache Kafka provides in terms of message persistence. If you are new to Apache Kafka, you can go through the basics in the article Kafka Basics.




In this article we will go through the following key concepts:

1. How Kafka manages messages.

2. How to manage cleanup of old messages in Kafka.

3. How to store a message persistently in Kafka.




1. How Kafka manages messages

We know that every message is produced to a designated topic. Within a topic, each message is appended to a partition according to the routing key of that message. Every message gets a unique, sequential id, generated at the time the message is written, which is called the offset.

This offset is very important, because it is how Kafka tracks the sequence of messages in a partition and knows which message to deliver next to a consumer.
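The idea above can be illustrated with a minimal sketch. This is not Kafka's actual implementation (Kafka uses a murmur2 hash of the key, and brokers persist logs on disk); the names `produce`, `logs`, and `NUM_PARTITIONS` are invented for the example:

```python
# Illustrative sketch: a message key routes to a partition, and each
# partition is an append-only log whose positions are the offsets.

NUM_PARTITIONS = 3

# One append-only log per partition; a message's offset is simply
# its position in that partition's log.
logs = {p: [] for p in range(NUM_PARTITIONS)}

def produce(key: str, value: str) -> tuple[int, int]:
    """Append a message and return its (partition, offset)."""
    partition = hash(key) % NUM_PARTITIONS  # key-based routing (simplified)
    offset = len(logs[partition])           # next sequential offset
    logs[partition].append((key, value))
    return partition, offset

p1, o1 = produce("user-42", "login")
p2, o2 = produce("user-42", "click")
# Messages with the same key land in the same partition,
# and offsets within a partition increase by one.
assert p1 == p2 and o2 == o1 + 1
```

Because offsets are per partition, Kafka only guarantees ordering within a single partition, not across the whole topic.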

2. How to manage cleanup of old messages in Kafka

We can configure the cleanup policy of Apache Kafka at the topic level. This is done through a property named cleanup.policy, which defines what happens to messages when cleanup is invoked. There are two cleanup policies:

1. DELETE: if you set cleanup.policy=delete, old logs are completely deleted from the topic when cleanup is invoked.

2. COMPACT: if you set cleanup.policy=compact, only the latest message per unique key is kept when old logs are cleaned up. Read this article for details of Log Compaction.

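To make the COMPACT behavior concrete, here is a minimal sketch. It is not Kafka's actual log cleaner (which works segment by segment on disk); the `compact` function is invented for the example:

```python
# Illustrative sketch of log compaction: after compaction, only the
# most recent value per key survives.

def compact(log):
    """Keep only the latest (key, value) pair per key.

    Keys keep the order in which they first appeared in the log.
    """
    latest = {}
    for key, value in log:
        latest[key] = value  # a later write overwrites the earlier value
    return list(latest.items())

log = [("user-1", "v1"), ("user-2", "v1"), ("user-1", "v2")]
print(compact(log))  # [('user-1', 'v2'), ('user-2', 'v1')]
```

Note that "user-1" still appears exactly once, with its latest value "v2"; this is why a compacted topic can serve as a snapshot of the latest state per key.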

So what should you use as a cleanup policy?

There is no perfect choice. DELETE frees up disk space on your machines, and having DELETE as the cleanup policy works well for most applications. COMPACT, on the other hand, gives you the flexibility to restore applications to a stable state after a crash, because the most recent value per key is retained.




3. How to store a message persistently in Kafka

Apache Kafka provides two configurable properties to control how long messages are retained. Both are set at the topic level.

1. retention.bytes: this property lets the developer put an upper bound on the size to which each partition of a topic can grow. Once a partition crosses this threshold, Apache Kafka starts cleaning up the oldest data according to that topic's cleanup policy. By default there is no upper limit on size (the value is -1). So if you want to keep messages permanently, or cap how much old data is stored, this is one way of configuring Apache Kafka to do that.


2. retention.ms: this property lets the developer put an upper bound on how long a message stays in a topic before being cleaned up according to the cleanup policy. This limit must be communicated to the applications consuming the data, so that they consume each message within this interval and avoid data loss. By default this time is set to 7 days. If you want to keep messages permanently, set the value to -1. So this is another way of configuring Apache Kafka to keep messages permanently or to expire old messages.
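The interplay of the two limits can be sketched as follows. This is a simplification (Kafka applies retention to whole log segments, not individual messages); the function `apply_retention` and the tiny byte limit are invented for the example:

```python
import time

# Illustrative sketch: drop messages that are older than retention.ms,
# then drop the oldest messages until the log fits in retention.bytes.
# A value of -1 disables the corresponding limit, as in Kafka.

RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # default retention.ms: 7 days
RETENTION_BYTES = 100                    # tiny retention.bytes for the demo

def apply_retention(log, now_ms,
                    retention_ms=RETENTION_MS,
                    retention_bytes=RETENTION_BYTES):
    """log is a list of (timestamp_ms, payload_bytes) tuples."""
    # Time-based retention: remove messages older than retention.ms.
    if retention_ms != -1:
        log = [(ts, p) for ts, p in log if now_ms - ts <= retention_ms]
    # Size-based retention: remove oldest messages until under the cap.
    if retention_bytes != -1:
        while sum(len(p) for _, p in log) > retention_bytes:
            log.pop(0)
    return log

now = int(time.time() * 1000)
old = now - 8 * 24 * 60 * 60 * 1000  # 8 days old: beyond retention.ms
log = [(old, b"x" * 10), (now, b"y" * 60), (now, b"z" * 60)]
kept = apply_retention(log, now)
# The 8-day-old message is gone, and the older of the two fresh
# messages was dropped to get under the 100-byte cap.
assert kept == [(now, b"z" * 60)]
```

Setting both limits to -1, as the text above describes, disables cleanup entirely, which is how you keep messages permanently.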



Feel free to ask in case of any doubts. Thank you.







