I looked at ZeroMQ, but it seemed to have very poor debuggability and visibility into its internal operations. Their FAQ [0] says things like:
> How do I determine how many messages are in queue?
> This isn't possible. [...] rather than provide incorrect information the library avoids providing any view into this data.
> How can I retrieve a list of all connected peers?
> This is not supported.
Those kinds of decisions are very scary for a production environment. We've had to debug network connectivity problems before, and it was only possible thanks to all the interfaces the Linux kernel provides. I cannot imagine doing it with a library that hides stuff from you on purpose.
> At any given time a message may be in the ZeroMQ sender queue, the sender's kernel buffer, on the wire, in the receiver's kernel buffer or in the receiver's ZeroMQ receiver queue. Furthermore, a ZeroMQ socket can bind and/or connect to many peers. Each peer may have different performance characteristics and therefore a different queue depth. Any "queue depth" number is almost certainly wrong.
The library doesn't hide stuff from you; it just refuses to show you information that might as well be a random number.
If you want to know how many peers are connected, you use heartbeats instead of blindly trusting that 1 TCP socket = 1 available peer.
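A minimal sketch of the heartbeat approach, assuming each peer periodically sends something (a heartbeat or a real message) and the application records it. `PeerTracker`, `on_message`, and the timeout policy are hypothetical illustrations, not part of ZeroMQ's API; the clock is injectable so the logic can be exercised without real waiting.

```python
import time
from typing import Callable, Dict, List


class PeerTracker:
    """Hypothetical heartbeat-based liveness tracker.

    A peer counts as alive only if we have heard from it within `timeout`
    seconds, regardless of whether a TCP connection to it is still open.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def on_message(self, peer_id: str) -> None:
        # Any traffic from the peer, heartbeat or payload, refreshes it.
        self.last_seen[peer_id] = self.clock()

    def alive_peers(self) -> List[str]:
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items() if now - t <= self.timeout)

    def expire(self) -> List[str]:
        # Forget peers that have been silent too long and report which ones.
        now = self.clock()
        dead = sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)
        for p in dead:
            del self.last_seen[p]
        return dead
```

With a 3-second timeout, a peer last heard from at t=2 is still alive at t=4, while one last heard from at t=0 is expired, even if its socket never reported an error.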
The reasoning assumes I don't know what's going on. This approach is precisely why I never tried ZMQ.
How do you know my circumstances? Maybe this system has tuned rmem_max/wmem_max so I don't care about kernel buffers? Maybe my programs have problems with multi-gigabyte queues? Why are you assuming stuff for me?
The right answer, of course, is that you don't expose a single "queue size" value, but you don't hide it either. Instead, expose something like "list of sockets: for each, here is an fd and the size of its userspace queue". I'll then query the kernel myself for the sender's and receiver's kernel buffer sizes and so on.
Life is hard enough without libraries hiding stuff from you!
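The "query the kernel myself" half of that proposal can be sketched for a plain TCP socket on Linux, assuming you can get at the underlying socket (the per-socket fd in the proposed API is hypothetical; this is not something the sketch gets from ZeroMQ). It uses the `FIONREAD`/`SIOCOUTQ` ioctls plus `SO_RCVBUF`/`SO_SNDBUF`; `kernel_queue_report` and `demo` are illustrative names.

```python
import fcntl
import select
import socket
import struct
import termios

# Linux-specific assumption: FIONREAD (same as SIOCINQ for TCP) reports bytes
# waiting in the receive buffer; SIOCOUTQ (same number as TIOCOUTQ) reports
# bytes still sitting in the send queue. 0x5411 is the usual Linux value.
SIOCINQ = termios.FIONREAD
SIOCOUTQ = getattr(termios, "TIOCOUTQ", 0x5411)


def _ioctl_int(fd: int, request: int) -> int:
    # The ioctl writes a C int back into the buffer we pass in.
    buf = fcntl.ioctl(fd, request, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]


def kernel_queue_report(sock: socket.socket) -> dict:
    """What the kernel can tell us about one connected TCP socket."""
    fd = sock.fileno()
    return {
        "fd": fd,
        "recv_queued": _ioctl_int(fd, SIOCINQ),
        "send_queued": _ioctl_int(fd, SIOCOUTQ),
        # Configured buffer capacities (Linux reports double the value set).
        "rcvbuf_size": sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        "sndbuf_size": sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
    }


def demo() -> dict:
    # Loopback TCP pair: send 100 bytes that the receiver never reads.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()
    try:
        client.sendall(b"x" * 100)
        select.select([conn], [], [], 1.0)  # wait until the data has arrived
        return kernel_queue_report(conn)
    finally:
        for s in (client, conn, server):
            s.close()


if __name__ == "__main__":
    print(demo())
```

Combined with a library-reported userspace queue size per fd, this would give the layered picture the comment asks for instead of a single opaque number.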
But how do I implement backpressure in ZeroMQ? I've used it in production and it was great when it worked, but it was scary because its message queue could grow arbitrarily large. I think I even saw an OOM when a service went down and something just queued indefinitely.
Edit: I just checked the FAQ, and apparently it will now block or drop messages. I sort of wish it would explicitly return an error instead of blocking, so the caller isn't forced to queue in a separate thread or something.
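In ZeroMQ itself the queue cap is the send high-water mark (`ZMQ_SNDHWM`), and passing `ZMQ_DONTWAIT` to `zmq_send` makes it fail with `EAGAIN` instead of blocking for socket types that block at the high-water mark (e.g. PUSH, DEALER), which is close to the explicit error asked for here. The library-agnostic sketch below shows that policy in isolation; `BoundedOutbox`, `OutboxFull`, and `drain` are hypothetical names, and `drain` merely stands in for whatever I/O loop actually ships messages.

```python
from collections import deque


class OutboxFull(Exception):
    """Raised instead of blocking or silently dropping when the outbox is full."""


class BoundedOutbox:
    """Hypothetical send queue with a hard cap and an inspectable depth."""

    def __init__(self, high_water_mark: int):
        if high_water_mark <= 0:
            raise ValueError("high_water_mark must be positive")
        self.hwm = high_water_mark
        self._queue = deque()

    def send(self, msg: bytes) -> None:
        # Fail loudly so the caller decides: retry later, shed load, or alert.
        if len(self._queue) >= self.hwm:
            raise OutboxFull(f"outbox at high-water mark ({self.hwm})")
        self._queue.append(msg)

    def drain(self, n: int) -> list:
        # Stand-in for the I/O loop handing up to n messages to the network.
        out = []
        while self._queue and len(out) < n:
            out.append(self._queue.popleft())
        return out

    def depth(self) -> int:
        # Current number of queued messages, exposed rather than hidden.
        return len(self._queue)
```

Because memory use is bounded by the high-water mark, a dead downstream service shows up as `OutboxFull` errors at the sender rather than as unbounded growth ending in an OOM.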
To be fair, when you use regular sockets you also don't get much visibility into what your OS is doing by default. If ZeroMQ provided that information, it would certainly be used by app developers who are misguided about how they are supposed to use it.
One thing I didn't like while using it was not being able to get logs out of it, so you couldn't tell whether an issue came from ZeroMQ or from the app itself. Most of the time it was the app itself, but I also found a few bugs in ZeroMQ that took quite some time to track down.
[0] http://wiki.zeromq.org/area:faq