Hacker News

We push 1.5 million messages per minute through our NSQ framework, and we're starting to run into architectural limitations with respect to the number of workers in each producer/consumer pool, so we're now testing and benchmarking Kafka in our staging environment.
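For scale, that load averages out to 25,000 messages per second. This happens to be the same threshold another commenter uses further down. The quick conversion below assumes traffic is spread evenly across the minute, which real traffic usually isn't, so peaks will be higher.

```python
MESSAGES_PER_MINUTE = 1_500_000  # load quoted in the comment above


def per_second(per_minute: float) -> float:
    """Average messages/second; real traffic is burstier, so peaks will be higher."""
    return per_minute / 60


print(per_second(MESSAGES_PER_MINUTE))  # 25000.0
```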



I think the biggest question is: do you think it's feasible to start directly with Kafka instead of NSQ, or does Kafka just require a much stronger/larger team to operate than NSQ?


We debated the same question and went with NSQ for now. We might need some of the guarantees that Kafka makes in the long term or for some specific use cases, but for a no-frills distributed messaging platform that is incredibly simple to operate at scale, NSQ is pretty fantastic.

Building client libraries is also a joy. I blogged about it a bit here: https://product.reverb.com/how-to-write-an-nsq-consumer-in-g...


Nice one! Just FYI, I wrapped the producer/consumer in a package: https://github.com/rafaeljesus/nsq-event-bus. It also exposes request-reply (RPC-like) messaging.


Kafka and NSQ make very different promises around things like durability, ordering, etc.

In most use cases you can get NSQ-like behavior out of Kafka, but the inverse isn't true. Kafka's performance and added guarantees come at the expense of being harder to operate.


Can someone summarize the promises? Specifically, would NSQ work well as an easier-to-operate, Kafka alternative, or are there low-throughput use cases it's just not suitable for?


People tend to talk about Kafka in the same conversations as messaging systems because it can support messaging use cases, but that leads to a mental model of what Kafka is (and conversely what messaging systems tend to be) that is incorrect.

It's better to think of Kafka as a distributed log service than as a messaging broker. When it's described this way, don't think of "log" as in "the things humans look at to debug applications, coming out of stdout", but as in "the things machines use as a storage data structure". Think of the write-ahead log in a database, not printf statements.

What this means is that under the covers Kafka is a bunch of ordered files being written to by producers. Consumers can specify where in the log they want to start consuming from and then "tail" the log once they are caught up. The architecture is also such that consumers are very lightweight (a TCP/IP connection and an offset in the log). This also makes things like the "late joiner" problem in messaging systems, and durability, trivial. Kafka then layers on high-availability and consistency configurations that mean you can be sure your published log entries are 1) stored on multiple machines and 2) ordered the same for everyone (within a partition). The combination of those two things is very powerful and makes reasoning about application data in distributed systems much simpler. There are also certain classes of problems that need those promises to be solved, namely operations that are not idempotent.
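A toy model may help make the "log, not broker" framing concrete. This is a single-process sketch of one partition. It assumes records are held in memory and never expire. `Log` and `Consumer` are hypothetical names, not Kafka's API. The sketch only shows the offset-and-tail idea described above: consumers are just positions, and a late joiner can start from offset 0.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only, ordered log: a toy stand-in for a single Kafka partition."""
    entries: list = field(default_factory=list)

    def append(self, record) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.entries.append(record)
        return len(self.entries) - 1

    def read(self, offset: int, max_records: int = 100) -> list:
        """Return up to max_records records starting at offset; reading deletes nothing."""
        return self.entries[offset:offset + max_records]


class Consumer:
    """A consumer is little more than a position (offset) in the log."""

    def __init__(self, log: Log, start_offset: int = 0):
        self.log = log
        self.offset = start_offset

    def poll(self) -> list:
        batch = self.log.read(self.offset)
        self.offset += len(batch)  # advance past what we've just read
        return batch


log = Log()
for event in ["created", "updated", "deleted"]:
    log.append(event)

early = Consumer(log)
print(early.poll())   # ['created', 'updated', 'deleted']
log.append("restored")
print(early.poll())   # ['restored']  (tailing the log once caught up)

late = Consumer(log)  # a late joiner can still replay everything from offset 0
print(late.poll())    # ['created', 'updated', 'deleted', 'restored']
```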

NSQ is a much more traditional buffered messaging system. It has file durability, but only a) as an optimization to prevent message loss once memory runs out and b) as a consumer archive. A hard loss of a node means that messages that have not been delivered can be lost, as there is no promise they were also published somewhere else. Further, there is no promise that the order in which messages are published to a topic and channel is the order in which the consumer receives them. Dealing with late joiners is an application-level concern, as are archiving and replication.
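Here is a rough sketch of the "memory first, disk when memory runs out" behavior and how it can break ordering. It makes several assumptions. `BufferedChannel` is a hypothetical class. `mem_limit` loosely stands in for nsqd's `--mem-queue-size` setting. Messages are assumed to be newline-free strings. The spill-file format is invented for the example. Anything still in memory is simply gone if the node dies.

```python
import os
import tempfile
from collections import deque


class BufferedChannel:
    """Toy NSQ-style channel: keep messages in memory, spill to a file when memory is full."""

    def __init__(self, mem_limit: int, spill_path: str):
        self.mem_limit = mem_limit
        self.memory = deque()      # lost outright if the process/node dies
        self.spill_path = spill_path
        self.written = 0           # messages spilled to disk so far
        self.read_pos = 0          # spilled messages already read back

    def put(self, msg: str) -> None:
        if len(self.memory) < self.mem_limit:
            self.memory.append(msg)
        else:
            with open(self.spill_path, "a", encoding="utf-8") as f:
                f.write(msg + "\n")
            self.written += 1

    def get(self):
        """Return the next message, or None if empty.

        Always preferring memory is a simplification; nsqd reads from both its
        memory and disk queues. Either way, delivery order need not match publish order.
        """
        if self.memory:
            return self.memory.popleft()
        if self.read_pos < self.written:
            with open(self.spill_path, encoding="utf-8") as f:
                line = f.readlines()[self.read_pos]
            self.read_pos += 1
            return line.rstrip("\n")
        return None


spill = os.path.join(tempfile.mkdtemp(), "channel.spill")
ch = BufferedChannel(mem_limit=2, spill_path=spill)
for m in ["a", "b", "c"]:
    ch.put(m)                        # "c" overflows to disk
first = ch.get()                     # "a", from memory
ch.put("d")                          # memory has room again, so "d" stays in memory
rest = [ch.get() for _ in range(3)]
print(first, rest)                   # a ['b', 'd', 'c']: "d" overtook "c"
```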

That said, Kafka is complicated. I think it's complicated because it's solving a complicated problem, not because it's poorly factored (though I'd love it if they built the consensus stuff in directly and removed the ZooKeeper dependency).

NSQ isn't complicated and is easy to operate. If your problem set falls into a more traditional messaging domain that fits the NSQ model, you are almost certainly better off with it, though you could likely use Kafka as well. If your problem set falls into the write-ahead log model, you can't use NSQ (without massive application-level logic), but you can use Kafka.


That's a great comparison, thank you. The "log structure vs. in-memory queue" distinction at the start really crystallizes the differences.


NSQ has been (IMO) far, far easier to operate than Kafka. With Kafka you need a ZooKeeper cluster in addition to your Kafka brokers. Not to mention that developing against NSQ is pretty simple, whereas with Kafka you need to think about partitions and offsets.

If you're worried about data loss, Kafka may be what you're looking for (but it takes a lot of learning to operate it correctly).
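Part of "thinking about partitions" is that ordering only holds within a partition. Messages with the same key are routed to the same partition. The sketch below shows that routing. `partition_for` is a hypothetical helper. Kafka's default partitioner hashes keyed messages with murmur2. This sketch swaps in `zlib.crc32` only because it ships with Python, so the actual partition numbers won't match Kafka's.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from the message key with a stable hash.

    crc32 stands in for Kafka's murmur2 here. Same key -> same partition,
    as long as the partition count doesn't change.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
partitions = {p: [] for p in range(NUM_PARTITIONS)}
events = [("user-1", "signup"), ("user-2", "signup"), ("user-1", "upgrade"), ("user-1", "cancel")]
for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

# Ordering holds within a partition, so user-1's events stay in publish order;
# consumers then track their own offset per partition.
p = partition_for("user-1", NUM_PARTITIONS)
print([v for k, v in partitions[p] if k == "user-1"])  # ['signup', 'upgrade', 'cancel']
```

A consequence worth planning for: adding partitions later changes the key-to-partition mapping, so per-key ordering across that change is not guaranteed.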


I see, thank you. Honestly, NSQ sounds like a really good solution for 99% of use cases.


I think it's less about throughput and more about durability guarantees. Kafka producers can specify the number of "acks" (brokers that have written the message to their copy of the log) when they produce a message, and the request will only return successfully if that many acknowledgements are received.
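Here is a simplified simulation of the acks idea. Brokers are in-process objects, and `Replica` and `produce` are hypothetical names. `required_acks` is a plain integer, as described in the comment above. Current Kafka producers actually take `acks=0`, `1`, or `all`. With `all`, the topic's `min.insync.replicas` setting sets the minimum number of in-sync replicas that must have the write.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical broker holding one copy of the log."""
    name: str
    alive: bool = True
    log: list = field(default_factory=list)


def produce(message: str, replicas: list, required_acks: int) -> bool:
    """Write to every live replica; report success only if enough of them acked.

    Simplified: real brokers may reject the write up front, and the client
    retries failed requests instead of just returning False.
    """
    acked = 0
    for replica in replicas:
        if replica.alive:
            replica.log.append(message)
            acked += 1
    return acked >= required_acks


brokers = [Replica("b1"), Replica("b2"), Replica("b3", alive=False)]
print(produce("order-42", brokers, required_acks=2))  # True: two live brokers wrote it
print(produce("order-43", brokers, required_acks=3))  # False: only two could confirm
```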


I see, thank you. Can't NSQ do that, or at least get probabilistically close to it with multiple nodes (i.e. doesn't having more nodes reduce the probability of data loss)?


Absolutely, but you can't rely on it to be persistent like Kafka. We use it extensively and are incredibly happy with it, but we follow best practices around not putting state in messages, making changes idempotent, and ensuring that we can always replay a message if needed.

We've yet to lose any messages in production, but it could happen, and we're okay with that tradeoff given the operational complexity of Kafka.
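One common way to make a handler idempotent under at-least-once delivery is to key each message on an application-level ID and skip IDs already applied. That way, a redelivered message changes nothing. This is a minimal sketch under several assumptions. `IdempotentHandler` is hypothetical. The "seen" set lives in memory, whereas in production it would need to be durable and updated atomically with the change itself. The balance update is just an example side effect.

```python
class IdempotentHandler:
    """Apply each logical change at most once, even if the message arrives twice."""

    def __init__(self):
        self.seen = set()   # in production: durable storage, updated atomically with the change
        self.balance = 0    # example state changed by messages

    def handle(self, msg_id: str, amount: int) -> bool:
        if msg_id in self.seen:
            return False    # duplicate delivery: acknowledge it, but change nothing
        self.balance += amount
        self.seen.add(msg_id)
        return True


h = IdempotentHandler()
for msg_id, amount in [("tx-1", 100), ("tx-2", -30), ("tx-1", 100)]:  # tx-1 is redelivered
    h.handle(msg_id, amount)
print(h.balance)  # 70
```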


Yeah, at-least-once is definitely the way to go, sounds like you've got a good architecture around that.

Another nice thing I've heard with Kafka is that it can store all messages since the beginning of time (if you want) and you can replay them to retrieve all your MQ-related state. Does NSQ do that, do you know?


NSQ does nothing like this out of the box. You could create a channel for each topic that accumulates messages, but those messages would not be distributed across the cluster and after consuming them once they would be effectively gone. If that feature is a hard requirement NSQ wouldn't fit your use case.


I see, thank you. I'm a bit confused by "after consuming them once they would be effectively gone". Surely NSQ supports more than one subscriber and doesn't remove messages after they've been delivered to a single client?


It supports more than one subscriber as long as you have a separate channel for each. But if a channel is created after some messages have been published, those messages will not be delivered to the new channel; only the messages published after that will be.
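The topic/channel semantics described here can be modeled in a few lines. Each channel gets its own copy of every message published after the channel exists. Within a channel, each message goes to only one consumer. `Topic`, `Channel`, and `deliver` are hypothetical names. Round-robin delivery is a simplification, since nsqd hands messages to whichever consumers have signalled they are ready (RDY). The sketch also deliberately creates a channel before publishing and does not model what nsqd does with messages published to a topic that has no channels yet.

```python
from collections import deque
from itertools import cycle


class Channel:
    def __init__(self, name: str):
        self.name = name
        self.queue = deque()


class Topic:
    def __init__(self):
        self.channels = {}

    def channel(self, name: str) -> Channel:
        # A newly created channel starts empty: it only sees messages published from now on.
        if name not in self.channels:
            self.channels[name] = Channel(name)
        return self.channels[name]

    def publish(self, msg) -> None:
        for ch in self.channels.values():
            ch.queue.append(msg)  # fan-out: one copy per existing channel


def deliver(ch: Channel, consumers: list) -> dict:
    """Hand each queued message to exactly one consumer on the channel (round-robin here)."""
    received = {c: [] for c in consumers}
    turn = cycle(consumers)
    while ch.queue:
        received[next(turn)].append(ch.queue.popleft())
    return received


topic = Topic()
archive = topic.channel("archive")
topic.publish("m1")
topic.publish("m2")
late = topic.channel("late-joiner")  # created after m1 and m2 were published
topic.publish("m3")

print(list(archive.queue))  # ['m1', 'm2', 'm3']
print(list(late.queue))     # ['m3']: earlier messages never reach a new channel
print(deliver(archive, ["worker-a", "worker-b"]))  # {'worker-a': ['m1', 'm3'], 'worker-b': ['m2']}
print(list(archive.queue))  # []: once consumed, the messages are gone from this channel
```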

One way to do replay with NSQ is to attach their file consumer (nsq_to_file) to its own channel as the first operation on a topic. Then, whenever you need a replay, you'd read from that file in application code (and you'd need to know when/where to read from and to).


That makes sense, thank you.


NSQ ships with a built-in utility, nsq_to_file, which is just one more consumer you can use to archive each topic's messages to disk. It provides dead-simple archiving of messages, but no native replay ability.
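The archive-then-replay approach from this subthread is sketched below. An archiving consumer appends each message to a file, and the application later re-reads a range of it. `archive` and `replay` are hypothetical helpers. The one-message-per-line format assumes newline-free messages and is not claimed to match nsq_to_file's exact output. Choosing `start` and `stop` is left to the application, because NSQ itself has no notion of offsets.

```python
import os
import tempfile
from typing import Optional


def archive(path: str, message: str) -> None:
    """Append one message per line, as a dedicated archiving consumer on its own channel might."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(message + "\n")


def replay(path: str, start: int = 0, stop: Optional[int] = None) -> list:
    """Re-read archived messages in [start, stop); deciding the range is the application's job."""
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]
    return lines[start:stop]


path = os.path.join(tempfile.mkdtemp(), "orders.log")
for m in ["order-1", "order-2", "order-3"]:
    archive(path, m)
print(replay(path, start=1))  # ['order-2', 'order-3']
```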


If you are just starting out, use whatever you and your team are comfortable with, unless you think you will need capacity for over 25,000 messages per second.


25k messages total, or per second?


Thanks. Fixed.


Kafka is much harder to operate in a production environment, I would only start there if you have a specific reason to.



