Sure. I have heard this now so many times here. To me, however, Kafka the software ...

jpgvm · on Oct 12, 2020

I think Kafka inherits too much of this "heinously complex" reputation from Zookeeper (which itself is also not heinously complex; some people just have an allergy to the JVM, ZAB, or both).

Internally it's one of the simpler pieces of software I rely on. Its base API is small and effective and works "as you would expect". Replication uses the same fetch API consumers use, because why wouldn't it? Controller election uses tried and true ZK patterns, its log storage is pretty simple, and even the compaction logic is understandable.
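To make that concrete, here is a toy sketch of the idea, not Kafka's actual code: an append-only log with a single read path that both a consumer and a follower replica use, plus a "keep the latest value per key" compaction pass. `ToyLog`, `replicate` and `compact` are made-up names for illustration. Real Kafka compaction also preserves offsets and leaves the active segment alone, which this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Append-only list of (key, value) records; the list index is the offset."""
    records: list = field(default_factory=list)

    def append(self, key, value):
        self.records.append((key, value))
        return len(self.records) - 1  # offset of the record just written

    def fetch(self, offset, max_records=100):
        # The one read path: return up to max_records starting at `offset`.
        return self.records[offset:offset + max_records]


def replicate(leader, follower):
    # A follower is just another fetcher: it asks for whatever lies past its own end.
    for key, value in leader.fetch(len(follower.records)):
        follower.append(key, value)


def compact(records):
    # Keep only the latest record per key, preserving the order of the survivors.
    last_index = {key: i for i, (key, _) in enumerate(records)}
    return [rec for i, rec in enumerate(records) if last_index[rec[0]] == i]


leader, follower = ToyLog(), ToyLog()
for key, value in [("a", 1), ("b", 2), ("a", 3)]:
    leader.append(key, value)

replicate(leader, follower)          # follower catches up via fetch()
consumer_view = leader.fetch(0)      # a consumer reads via the same fetch()
compacted = compact(leader.records)  # the older ("a", 1) is dropped
```

The point of the sketch is only that `replicate` needs no special API of its own; it is a loop around the same `fetch` a consumer would call.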

To be fair, I have spent a lot of time with it and patched it, and it's pretty much the core of my toolbox, but I don't think this is a controversial opinion among data infrastructure engineers. Compared to other software we work with, Kafka is some of the simpler, dumber stuff, which is refreshing.

On the other hand, I really want to learn Pulsar, but it definitely is more complex. My hope is that the complexity pays off with big architectural advantages, but we will see. :)

On the topic of what to use it for: chances are, if you don't know what it's for, you probably don't have the problems it's meant to solve. I don't mean that in a derogatory way, just that it's designed for large distributed architectures where many interested parties want to consume the same data, or where a small number of very high-throughput applications need a buffered transport that can take the load and spill to disk, etc. I.e., it is inherently niche; most companies don't have these problems.
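A rough illustration of the "many interested parties, same data" case, again a toy and not the Kafka client API: one shared log, and each consumer keeps only its own read offset, so readers never interfere with each other. The consumer names and the `poll` helper are hypothetical.

```python
log = ["order-1", "order-2", "order-3"]  # one shared, append-only log

# Hypothetical consumers; each one tracks only its own read position.
offsets = {"billing": 0, "analytics": 0}


def poll(consumer, max_records=2):
    # Read the next batch for this consumer and advance only its offset.
    start = offsets[consumer]
    batch = log[start:start + max_records]
    offsets[consumer] = start + len(batch)
    return batch


billing_first = poll("billing")      # first two records
billing_second = poll("billing")     # the remaining record
analytics_first = poll("analytics")  # starts from the beginning, unaffected by billing
```

Nothing is removed from the log when it is read, which is what lets a slow or newly added consumer go through the same data later.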

prerok · on Oct 12, 2020

We came for the gems but all we got is shards :)

nullsense · on Oct 12, 2020

It's named Kafka because LinkedIn's data access story was Kafkaesque and they wanted a way to wrangle it, so they wrote a tool to do that and named it Kafka.

It's good if you have a use for it.