I had an issue with RabbitMQ where I didn't yet know how my consumer was going to use the data I was writing to a queue (from a producer listening on a SocketIO or WebSockets stream), and I figured I'd work that out in an hour or so.
Eventually the queue ran out of memory, I couldn't write anything else to it, and it started dropping lots of messages. I was bummed. Is there a way to avoid this in Kafka?
RabbitMQ is the most widely used open source implementation of the AMQP protocol. It is slower, but it can support complex routing scenarios internally and handle situations where at-least-once delivery guarantees are important. RabbitMQ supports on-disk persistent queues, which you can tune if you like. Compared to Kafka, though, RabbitMQ handles much lower volume per queue.
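As one example of that tuning, a policy can keep messages of matching queues on disk rather than in RAM. This is a hedged config fragment, not a ready-made fix: the policy name and queue pattern are placeholders, and `queue-mode: lazy` applies to classic queues on RabbitMQ versions before 3.12 (newer versions behave this way by default).

```shell
# Config sketch (requires a running broker; names are placeholders):
# keep messages of queues matching "^my-queue$" on disk instead of in memory.
rabbitmqctl set_policy lazy-queues "^my-queue$" \
  '{"queue-mode":"lazy"}' --apply-to queues
```

Queue and message durability themselves are set at declare/publish time (a durable queue plus persistent messages), not in a policy, so both knobs matter if you want data to survive a broker restart.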
Kafka is fast because it is horizontally scalable and you have parallel producers and consumers per topic. You can tune the speed and move the needle where you need it between consistency and availability. However, if you want guarantees like at-least-once delivery, you'll have to use the building blocks Kafka gives you, and ultimately handle it on the application side.
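Concretely, at-least-once delivery implies duplicates: if a consumer crashes after processing but before committing its offset, the same message is redelivered. The application-side building block is idempotent processing. Here's a minimal stand-alone sketch of that deduplication step, with an in-memory set standing in for what would be a durable store in production (all names are hypothetical):

```python
# At-least-once delivery means duplicates can arrive; the application
# must deduplicate. In production, processed_ids would live in a durable
# store (e.g. a database keyed by message ID or topic/partition/offset).
processed_ids = set()
results = []

def handle(message):
    """Process a message idempotently; safe to call again on redelivery."""
    if message["id"] in processed_ids:
        return  # duplicate from a redelivery; skip it
    results.append(message["payload"])
    processed_ids.add(message["id"])

# Simulate at-least-once behavior: message 1 is redelivered after a
# consumer crash that happened before the offset was committed.
for msg in [{"id": 1, "payload": "a"},
            {"id": 2, "payload": "b"},
            {"id": 1, "payload": "a"}]:
    handle(msg)
```

The same pattern applies whether you commit offsets manually or rely on auto-commit; dedup is what turns at-least-once into effectively-once for your own data.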
Regarding storage, by default Kafka stores data for 7 days. IIRC the NY Times stores articles from 1970 onwards in Kafka clusters. The storage is horizontally scalable and durable, and long retention is a common use case. As many have pointed out, the cluster setup depends highly on your needs. We store data for 7 days in Kafka as well, and it's on the order of 500 GB or more per node.
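That 7-day default comes from broker configuration, and retention is tunable per broker or per topic. A sketch of the relevant `server.properties` keys (the values shown are the stock defaults, not recommendations):

```properties
# server.properties — broker-wide retention defaults
log.retention.hours=168        # keep data for 7 days (the default)
log.retention.bytes=-1         # no size-based limit
log.segment.bytes=1073741824   # roll log segments at 1 GiB

# A topic-level override of retention.ms=-1 keeps data indefinitely,
# which is the NYT-style "store everything forever" use case.
```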
Looks like you have a configuration issue. You can configure RabbitMQ to persist queues to disk, and with a quick calculation you can make sure you have enough space for 10 or 150 hours of data. I don't see any reason to switch to Kafka, a different tool with different characteristics, just because you need more storage.
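That "quick calculation" is just rate × size × time. The message rate and size below are made-up figures for illustration, not numbers from the question:

```python
# Back-of-the-envelope disk sizing for a persistent queue.
# Assumed figures (not from the question): 2,000 msg/s at ~1 KiB each.
msg_rate = 2_000            # messages per second (assumption)
msg_size = 1024             # bytes per message (assumption)
hours = 10                  # retention window to provision for

bytes_needed = msg_rate * msg_size * hours * 3600
gib = bytes_needed / 2**30
print(f"~{gib:.0f} GiB of disk for {hours} hours of data")
```

Plug in your own numbers (and add headroom for indexes and growth), and you'll know whether your disks can hold the window you want before reaching for a different system.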