Which Compression Type Should You Choose?
As discussed above, Kafka supports four main compression types: gzip, snappy, lz4, and zstd. Snappy or lz4 is usually recommended because both offer a good balance of speed and compression ratio. Gzip, on the other hand, gives a higher compression ratio than either, but it is not very fast. So choose one and test it; it's super simple, because you change just one producer setting (compression.type) and everything else keeps working. No single algorithm works best for everyone, so try them against the kind of data and workload you have and see which one works best for you. Finally, it is highly recommended to always use compression in production, especially if you have high throughput.
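To make the "change one setting" point concrete, here is a minimal sketch that builds one producer configuration per candidate codec. The keys bootstrap.servers and compression.type are real Kafka configuration names; the helper producer_config and the broker address localhost:9092 are assumptions made up for this illustration.

```python
# Valid values for the producer's compression.type setting in Kafka.
COMPRESSION_TYPES = ("none", "gzip", "snappy", "lz4", "zstd")


def producer_config(compression_type, bootstrap_servers="localhost:9092"):
    """Build a producer config dict that differs only in compression.type.

    producer_config is a hypothetical helper, and the default broker
    address is an assumption for a local test setup.
    """
    if compression_type not in COMPRESSION_TYPES:
        raise ValueError(f"unknown compression type: {compression_type!r}")
    return {
        "bootstrap.servers": bootstrap_servers,
        "compression.type": compression_type,
    }


# One config per candidate codec, so each can be benchmarked in turn.
candidates = {
    name: producer_config(name) for name in ("snappy", "lz4", "gzip", "zstd")
}
for name, config in candidates.items():
    print(name, config)
```

Each dictionary could then be handed to a Kafka client (for example confluent_kafka.Producer(config), assuming that library is installed) and the same workload replayed against it to compare throughput, CPU usage, and bytes written.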
Advantages of snappy over the other compression types:
- snappy is very useful if your messages are text-based, for example JSON documents or logs.
- snappy offers a good balance between compression ratio and CPU usage.
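The sketch below illustrates why text-based messages benefit so much from compression. Snappy is not part of the Python standard library, so gzip is used here purely as a stand-in; the absolute numbers will differ for snappy, but the contrast between repetitive JSON text and incompressible random bytes should hold. The log records are invented for the example.

```python
import gzip
import json
import os

# Hypothetical log-style records, invented for illustration only.
records = [
    {"level": "INFO", "service": "checkout", "message": "order created", "order_id": i}
    for i in range(200)
]


def ratio(payload: bytes) -> float:
    """Return original size divided by gzip-compressed size."""
    return len(payload) / len(gzip.compress(payload))


# Text payload: newline-delimited JSON, full of repeated keys and values.
text_payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
# Random payload of the same size: has no structure for a codec to exploit.
random_payload = os.urandom(len(text_payload))

print(f"JSON text:    {len(text_payload)} bytes, ratio {ratio(text_payload):.2f}")
print(f"random bytes: {len(random_payload)} bytes, ratio {ratio(random_payload):.2f}")
```

A ratio well above 1 for the JSON payload and about 1 for the random bytes is the pattern to look for when testing real messages with each codec.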
Apache Kafka – Message Compression
Kafka producers write data to topics, and topics are made of partitions. The producers automatically know which broker and partition to write to based on your message, and if a Kafka broker in your cluster fails, the producers automatically recover from it. This is what makes Kafka resilient and is a big reason why it is so widely used today. In a diagram of this setup, the producer sits on the left-hand side and sends data into each of the partitions of our topics.
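One common way a producer picks a partition "based on your message" is to hash the message key and take the result modulo the number of partitions. The following is a simplified sketch of that idea, not Kafka's actual code: Kafka's Java client uses a murmur2 hash for keyed messages, while this example substitutes zlib.crc32 from the standard library. The function name choose_partition, the keys, and the partition count are all hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition: the same key always gets the same partition."""
    # crc32 is a stand-in hash for this sketch, not the hash Kafka itself uses.
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3  # assumed partition count for a hypothetical topic

for key in (b"user-1", b"user-2", b"user-3", b"user-1"):
    print(key.decode(), "-> partition", choose_partition(key, NUM_PARTITIONS))
```

Because the mapping is deterministic, all messages with the same key land in the same partition, which is what preserves per-key ordering.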
Another important producer setting is message compression. Before looking at it, let's first understand the anatomy of a Kafka message.