Log Compaction in Apache Kafka
In my previous article, Persistence of messages in Apache Kafka, I explained the clean-up policies for messages. In this article, we will look in detail at the "compact" clean-up policy of Apache Kafka.
What is Log Compaction:-
We know that Kafka topics keep messages with key-value semantics, i.e., every message has a key and a value. During log compaction, for every unique key only the most recent message (the one with the highest offset) is retained.
Let's take an example to make this clearer. Say a compacted topic partition contains the following records (keys, values and offsets shown for illustration):
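Offset | Key | Value
0      | K1  | V1
1      | K2  | V2
2      | K1  | V3
3      | K3  | V4
4      | K2  | V5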
After compaction, it will look like the following:
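Offset | Key | Value
2      | K1  | V3
3      | K3  | V4
4      | K2  | V5

Only the latest value for each key survives; the older records for K1 (offset 0) and K2 (offset 1) are removed.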
So, in effect, compaction ensures that for every key Apache Kafka retains the most recent value. This is very helpful in scenarios such as restoring state after a system crash or when your Kafka broker goes down and comes back up.
There are some important conclusions that can be drawn from this example:-
1. The order of messages is always retained; compaction never reorders records.
2. The offset of a message is never changed; the retained messages keep their original offsets, and the offsets of removed messages are simply skipped over by consumers.
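For reference, here is a minimal sketch of creating a compacted topic with Kafka's Java AdminClient. The topic name, partition count, replication factor and broker address (localhost:9092) are placeholders for illustration only.

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact: only the latest value per key is kept after cleaning.
            NewTopic topic = new NewTopic("user-profiles", 3, (short) 1)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}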
Deletion of messages under the compact policy:-
A message with a null payload in the value field is treated as a delete marker (a tombstone), so all previous messages with that key will also be deleted from the topic during compaction. The delete markers are themselves cleared out after a configurable time (delete.retention.ms) to free up space.
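As a sketch, producing a delete marker simply means sending a record with a null value for the key you want to remove; the topic and key names below are only examples.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DeleteMarkerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A null value acts as a tombstone: after compaction, earlier records
            // with the key "user-42" are removed from the topic.
            producer.send(new ProducerRecord<>("user-profiles", "user-42", null));
            producer.flush();
        }
    }
}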
We can set the min.compaction.lag.ms property to guarantee a minimum time that a message stays in the log before it becomes eligible for compaction.
Note:- retention.ms is the upper limit on how long a message can stay in a topic (it applies when the delete clean-up policy is also in effect), whereas min.compaction.lag.ms is the lower limit: the minimum time a message stays in the log before it can be compacted.
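Both of these are ordinary topic-level configs, so they can also be applied to an existing topic with the AdminClient. This is a minimal sketch under the same assumptions as above (the topic name and values are illustrative, and retention.ms only takes effect if the delete policy is enabled alongside compact):

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

public class TuneCompactionLag {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "user-profiles");
            admin.incrementalAlterConfigs(Map.of(topic, List.of(
                    // Lower limit: a record stays uncompacted for at least 10 minutes.
                    new AlterConfigOp(new ConfigEntry(TopicConfig.MIN_COMPACTION_LAG_MS_CONFIG, "600000"),
                                      AlterConfigOp.OpType.SET),
                    // Upper limit: with the delete policy also enabled, records older than 7 days are removed.
                    new AlterConfigOp(new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG, "604800000"),
                                      AlterConfigOp.OpType.SET)
            ))).all().get();
        }
    }
}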
I hope this article gives good clarity on the concept of log compaction in Apache Kafka.
Feel free to comment in case of any doubts.