Tiered storage won't save Kafka but we built the exact same thing while thinking...

zbentley · 2024-05-06T16:05:35 1715011535

My understanding of WarpStream's approach is that it's fundamentally different from tiering: at a basic level, it's a system that accumulates batches of messages and persists them to S3 before acknowledging the corresponding produce requests to clients. This means that there's no "tier" of storage before S3.
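
To make the "persist first, acknowledge after" ordering concrete, here is a minimal sketch in Python. It is not WarpStream's code or API: `InMemoryObjectStore`, `BatchingAgent`, `produce`, `flush`, and the `max_batch_bytes` threshold are all hypothetical names invented for illustration, and the in-memory dict only stands in for S3. The size-only flush trigger is also an assumption; a real system would presumably flush on a timer as well, to bound latency.

```python
import threading
from concurrent.futures import Future


class InMemoryObjectStore:
    """Hypothetical stand-in for S3: a dict of key -> bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data


class BatchingAgent:
    """Hypothetical agent that acks producers only after the batch is stored."""

    def __init__(self, store, max_batch_bytes=1024):
        self.store = store
        self.max_batch_bytes = max_batch_bytes  # assumed flush threshold
        self._lock = threading.Lock()
        self._pending = []  # list of (payload, future) awaiting a flush
        self._pending_bytes = 0
        self._next_object_id = 0

    def produce(self, payload):
        """Buffer a message; the returned future resolves once it is durable."""
        ack = Future()
        with self._lock:
            self._pending.append((payload, ack))
            self._pending_bytes += len(payload)
            full = self._pending_bytes >= self.max_batch_bytes
        if full:
            self.flush()
        return ack

    def flush(self):
        """Write the pending batch as one object, then acknowledge producers."""
        with self._lock:
            if not self._pending:
                return
            batch, self._pending = self._pending, []
            self._pending_bytes = 0
            key = f"batch-{self._next_object_id:08d}"
            self._next_object_id += 1
        # The write to the object store happens first...
        self.store.put(key, b"".join(payload for payload, _ in batch))
        # ...and only then are the waiting producers acknowledged.
        for _, ack in batch:
            ack.set_result(key)
```

In this sketch the producer's future is the acknowledgment: it stays unresolved while the message sits in the agent's memory, and resolves only after `put` returns. There is no local disk log in between, which is the sense in which nothing sits in front of the object store as a separate tier.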

On top of that, the WarpStream folks have layered extensive mitigations for the worst-case latency costs of the producer-blocking batch-and-ship approach, as well as a fairly sophisticated system to make consumers consistently quick via online continuous storage rewriting, prefetching, and data movement between broker nodes. If you squint at it, that consumer-side system looks a lot like the familiar index+buffer cache design.

Unaffiliated, just an expert in the space who likes reading WarpStream's blog.