NSQ: Realtime distributed message processing at scale (in Go)

jallmann · on Oct 9, 2012

Was something like MQTT investigated prior to re-inventing the wheel with NSQ? From a cursory examination, I don't see any major advantages other than nice UI tools.

The wire protocol seems more complicated -- why distinct protocols for producers and consumers? Linebreak-delimited headers are error-prone and ought to be banished. Why transmit the hostname with sub requests -- is another system requesting jobs on behalf of workers?

Then the topic/channel distinction seems artificial when wildcards would suffice (and provide much more flexibility), eg topic/*/channels or topic/channels/# in the MQTT parlance. MQTT also has more fine-grained delivery guarantees via its QoS levels. All this with a header structure that's as small as two bytes.

edit: While MQTT is pubsub, as long as the system is under your control, you're free to change the semantics at the broker side from "broadcast messages to all consumers" to "rotate messages among consumers."

matticakes · on Oct 9, 2012

one of the developers here...

The protocols that exist in NSQ now are designed to be the simplest implementation that worked.

You've correctly pointed out some of the issues. At this stage, the distinction of producer vs consumer was mostly so that you could publish at all without having to use the HTTP interface. For our use cases, in particular taking advantage of the /mput endpoint, we aren't even using the TCP based publishing protocol.

Re: your point on sending metadata with SUB commands. I agree its a bit ugly. We actually intend on improving that aspect by instead sending the data in the form of an IDENTIFY type command upon initial connection. That information is used in the various administrative UIs and endpoints.

I'm going to do a bit more reading on MQTT, thanks for the link.

old_sound · on Oct 9, 2012

I'm really curious to know why would you build this tool instead of using, say, RabbitMQ?

DISCLAIMER: I wrote the RabbitMQ in Action book.

matticakes · on Oct 9, 2012

We looked into AMQP based solutions. Our understanding is that slaving, master-slave failover, or other strategies are used to mitigate the fact that there is a broker responsible for a given stream of messages.

We wanted a brokerless, decentralized, and distributed system. NSQ has no broker or SPOF so it is able to route around failures.

That said, I think RabbitMQ is a good tool depending on your requirements. I can imagine a broker proving useful in situations where you may want strict ordering or no-duplication. Those were tradeoffs we were willing to make for fault tolerance and availability.

Also, given the fact that we were already operating infrastructure built on simplequeue (which is also distributed and brokerless) we found it more appealing to evolve rather than replace.

jallmann · on Oct 9, 2012

A broker is an implementation detail; there's nothing stopping you from going peer-to-peer with $protocol and having publishers manage their own subscribers. Some sort of registry is still needed though; nslookupd in your case.

There is a bit of an impedance mismatch since now you have to go out-of-band to fetch information. Not saying it's a bad thing; that's essentially also the role of DNS and LDAP, among others, so it is a fairly common pattern.

rabbitmq · on Oct 9, 2012

Master-slave etc are HA techniques for replicating state eg queues (or "channels" as nsq calls them?). "Decentralized and distributed" has nothing to do with state, master/slave or any of that; it's a topological property of your system. You can certainly build a decentralized and distributed system using a set of brokers, regardless of whether they support HA.

matticakes · on Oct 9, 2012

That's fair, you certainly can use rabbitmq in a distributed and decentralized fashion.

You bring up an important point though... What is important is the topology we're promoting and the platform we've built to support it. The actual queue is less important and it made sense for us to own that piece to achieve our goals.

rabbitmq · on Oct 10, 2012

So, why not just deliver that topology using off the shelf components? Rabbit, Kafka, ZeroMQ, ... the toolbox is deep. This is not meant as a criticism, I am simply trying to understand what I am missing when I look at your design :-)

retrovirus · on Oct 9, 2012

Yes, it would be nice if someone from the team behind NSQ elaborates a bit on why they didn't go with an existing protocol like MQTT.

rabbitmq · on Oct 10, 2012

shameless rabbitmq mqtt plug here :-) http://www.rabbitmq.com/blog/2012/09/12/mqtt-adapter/

matan_a · on Oct 9, 2012

We're currently using Apache Kafka which is working out great, but does include a broker as a middleman. This is good for us since it's designed to accumulate unconsumed data (or even roll it back) and still provide good performance - rather than having that happen on the producers or consumers.

It would be cool to see some stats on throughput and latency relative to # of producers / consumers and amount of data currently accumulated in the producers (since there is no middleman).

bgentry · on Oct 9, 2012

This ensures that the only edge case that would result in message loss is an unclean shutdown of an nsqd process. In that case, any messages that were in memory (or any buffered writes not flushed to disk) would be lost.

I don't understand how you can call it a message delivery "guarantee" when you're susceptible to losing messages when a node dies.

One solution is to stand up redundant nsqd pairs (on separate hosts) that receive copies of the same portion of messages.

OK, delivery is only guaranteed if I run multiple independent sets of NSQd and write messages to both.

Regardless, it looks like an interesting project.

jlgreco · on Oct 9, 2012

How would you design a system that didn't have this problem at the individual node level?

philjr · on Oct 9, 2012

A similar design to TCP/IP - sequence numbers, acknowledgements & resending capability on sender side

SnorkelTan · on Oct 9, 2012

Sequence numbers on messages sent by producers coupled with some sort of persistence by the sender. This way when a gap is detected some sort of resend protocol can be implemented to achieve the delivery guarantee.

jlgreco · on Oct 9, 2012

So the sender just keeps trying until it gets an ACK. Fair enough, though I imagine you'd want to run multiple nodes even with this.

SnorkelTan · on Oct 9, 2012

Correct. Reliability and Availability are two separate but related goals.

onetwothreefour · on Oct 9, 2012

This is a solved problem in most messaging protocols.

jordanlewis · on Oct 9, 2012

It's kind of like Storm. I'd love to see a comparison written by the authors.

matticakes · on Oct 9, 2012

Agreed, a comparison would be helpful for highlighting the tradeoffs behind the choices we made.

We do talk about the evolution of our infrastructure and the genesis of NSQ in our blog post, http://word.bitly.com/post/33232969144/nsq.

wheaties · on Oct 9, 2012

I love seeing how people tackle these sorts of problems. Here, the trade-off is duplicate messages reaching a client. I know others have tried a no-"ack" approach. I'm still working in the land of RabbitMQ solving all my business needs and I like it.

I'd love to see some numbers, reasons for decisions made, and suggested best practices for this solution other than "manual de-dupe."

jehiah · on Oct 9, 2012

Some of the background behind our decisions are on our blog post http://word.bitly.com/post/33232969144/nsq

In terms of manual de-duping, we strive internally for idempotent message processing so it's fairly irrelevant, but to handle cases where it matters, all of our messages have unique id's added to them outside of NSQ.

The actual cases where messages are handled multiple times is limited to when a client disappeared during message processing (a hard restart) or it passed the allowable time window to respond and was given to another client.

If there are any specific numbers you are curious about, please ask.

old_sound · on Oct 10, 2012

When people say "billions of messages per day" do they realize that 1 billion messages =~ 11500 msgs/sec ? RabbitMQ as well as any self respecting message broker is able to handle that on a single machine

jasonmoo · on Oct 9, 2012

Hey it's awesome to see you guys tackling a larger project in Go. I've been writing some smaller services/tools with it in my day job, and I like it a lot.

Thanks for sharing your work.

dikbrouwer · on Oct 9, 2012

How does this compare to ZeroMQ?

jeremyjh · on Oct 9, 2012

ZeroMQ is a library, it is embedded in the application code and can support a peer-to-peer messaging topology. It also has support for Broker patterns but you have to develop the Broker application yourself; you could build a solution such as this one using it, but the zmq library itself really handles lower-level concerns with a high degree of flexibility but no out-of-the-box functionality. zmq can be thought of as something as simple as an improved socket library but with support for async, storing & high-water marks when there are no attached consumers, various distribution strategies, queues and lots more.

dikbrouwer · on Oct 9, 2012

Right, I guess my question more specifically is: ZeroMQ allows you to built applications like this in (seemingly) just a few lines of code. Did you guys consider building it on top of ZeroMQ, and if so, what made your decide to not use it? Nothing against what you've built btw, it looks great, but am just curious

matticakes · on Oct 10, 2012

The truth is we wanted to use ZeroMQ initially. It would have been really awesome to have all that flexibility on the client side.

The ZeroMQ documentation is fantastic as well. It inspired and helped shape some of the design choices we made.

The further along we got from design to implementation it became obvious that it would be important to "own" the socket. Generally speaking, this is exactly what ZeroMQ prevents you from doing (and rightly so, it aims to abstract all of that away).

The choice to use Go had an impact here as well. Language features like channels and the breadth of the standard library made it really easy to translate our NSQ design into working code, offsetting the benefit of ZeroMQ's abstractions.

dikbrouwer · on Oct 10, 2012

Makes a lot of sense, thanks for sharing!

noselasd · on Oct 10, 2012

So - this has nothing to do with NSQ - just my $0.02. Maybe zeromq has changed a lot since I looked at it quite a while back. The trouble we constantly ran into was zeromq is really just a fancy socket library - this means that you have to program all the logic yourself if you want certain things to happen if messages cannot be delivered (e.g. log the fact at a minimum), or get queued abnormally long, or get acks from the other side etc. Which is fine - you have to do all this yourself with plain sockets unless you just need a fire & forget system(albeit with the smarts of retrying and failover and some convenience of pub-sub). However we ran into problems where we could not get the info out of zmq that we wanted, such as the fact a peer had failed, the current rtt, and other things - leaving the gain we'd get from zmq at practically nothing.

overbroad · on Oct 9, 2012

I'm too lazy to install Go just to try this (I would need to, right?) but reading through the docs this looks far more appealing to me than tent.io. Nice work, at least with respect to the general design. Flexible.

jehiah · on Oct 10, 2012

Thanks for the reminder. Getting binary builds up is on our to-do list, so check back soon for those.

In the mean time, installing Go is pretty easy (you can use brew, or the official OSX package (assuming you are on OSX) http://golang.org/doc/install) and we've tried to leave clear steps for building NSQ here https://github.com/bitly/nsq/blob/master/INSTALLING.md (there really are very few dependencies other than go itself)

overbroad · on Oct 10, 2012

Cheers. I have installed Go before and messed around with it. I agree it's pretty good as far as being self-contained and straightforward. I just don't have the energy to keep chasing a moving target. For my purposes, I'm more interested in stuff that is stable and rarely changes.

If you can produce binaries for UNIX (Linux, BSD, etc), OSX and Windows, I'll be impressed. That's something I was interested in doing with Go (is it possible to cross-compile?) but never managed to learn.

4ad · on Oct 10, 2012

Since the Go1 release, Go is not a moving target anymore. Stable releases are rare and there are binary packages available if you are so inclined.

Go compilers, being derived from Plan 9, are always cross compiling. To cross compile you just set and export GOARCH and GOOS to your target, for example:

  GOOS=windows GOARCH=386 CGO_ENABLED=0 go build foo.bar/baz

Would build foo.bar/baz for 32 bit Windows from any system that has Go and ran make.bash --no-clean with those variables set. More interesting is building for ARM:

  GOOS=linux GOARCH=arm CGO_ENABLED=0 go build foo.bar/baz

arjn · on Oct 9, 2012

We use ActiveMQ at work and are investigating other MQs too. I dont see any info or performance graphs w.r.t message size. That would be a useful addition.

jehiah · on Oct 9, 2012

Good idea, we will try to put together some numbers with regards to message size.

interg12 · on Oct 9, 2012

Anyone at Disqus care to weigh in on this?

andreypopp · on Oct 9, 2012

Interesting if bitly guys had a look at beanstalkd before implementing nsqd.

knodi · on Oct 9, 2012

Beanstalkd is not stable it has a lot issues even in it current version. If you doing to be sending couple million messages per day beanstalkd is not for you.

matticakes · on Oct 9, 2012

beanstalkd is a great project and it certainly inspired some of our thinking. I think of beanstalkd as a more fully functional alternative to simplequeue (what we built/used prior to NSQ - more details in our blog post http://word.bitly.com/post/33232969144/nsq).

Some of the goals of NSQ transcended just replacing our specific daemon that buffered and delivered messages (most importantly the interactions with the lookup service). Because of that, we felt that owning that piece would make it easier to achieve those goals.

Additionally, one of the most important properties of nsqd (the queue component of NSQ) is that data is pushed to the client rather than polled (like in beanstalkd).

stevvooe · on Oct 10, 2012

Where is the guy with the comment about generics? Disappointing.

knodi · on Oct 9, 2012

Does this have gob interface?

matticakes · on Oct 9, 2012

it is data format agnostic so you can certainly serialize using the gob package.

knodi · on Oct 9, 2012

Amazing, with this my whole infrastructure will be 100% Go.