An introduction to RabbitMQ

kabacha · on May 22, 2020

I'm really surprised to see so much positivity about rabbitmq here when it's probably the most sweared-at software in the space.

Let me share my anecdote. In my last work place I got onboarded on rabbitmq and it was such a painful software to work with and almost impossible to set up locally that I silently sneaked in simple redis list as queue alternative for my dev environment. The whole rabbitmq and it's pika library was replaced by 3 lines of python and redis server.

One day rabbitmq died and it tooks sys admins few weeks to get it back running. In that time I deployed my simple redis list and never looked back. To this day redis solution works without any friction whatsoever with fraction of resources.

The rabbits AMQP exchange model is severely flawed and convoluted. It's the worst example of corporate software where everything works and doesn't work at the same time.

I wouldn't recommend rabbitmq to my worst enemy yet there's still something attractive about it. Maybe there's a sane alternative? Maybe zeromq?

Lazare · on May 22, 2020

If you were capable of replacing your use of rabbitmq with 3 lines of python and a redis instance, you shouldn't have been using rabbitmq to start with.

That doesn't mean redis is a drop in replacement for any of the valid uses for rabbitmq though...

lowwave · on May 25, 2020

can you share some of those feature where rabbitmq that is useful? Tks.

smoe · on May 22, 2020

Sounds to me like the problem is, that the company taking out the cannon to shoot at sparrows and that nobody seem to have bothered to provide working dev environments?

It feels common that people think they their every problem requires a planetary scale solution or one that handles every conceivable case that could occur before the heat death of the universe. Be that because your a startup and think you are going to need to support a hundred million concurrent users on launch day, or because your an enterprise and think because your are oh so important you need to use the same tools other important companies use.

My own anecdote: When we refactored our early stage app we had a huge mess with the Redis based queue system. It was one of the biggest sources of errors and a massive pain to troubleshoot or even to monitor what is going on. So we investigated in a bunch of different solutions including all the usual contenders, and we ended up with: Let's just ditch the messages / queues altogether for now and just do boring old cron like jobs invoking internal api endpoints in regular intervals. This made 9 out of 10 cases a lot easier to maintain, and for the remainder, while temporary more difficult, we introduced a queue again at a much later stage.

I'm not saying you shouldn't use and you can't benefit of RabbitMQ at small scales or any technology for that matter. But I think too often in tech decision making ones own or the companies perceived importance, what is cool or what would look good on a CV takes precedence over what really fits the problem in context.

oauea · on May 22, 2020

Odd, I've been running it for years and it's been flawless. Integrating new clients is a breeze using their, or any amqp, library.

Mandatum · on May 22, 2020

brew install rabbitmq?

I agree rabbitmq is often adopted in places it shouldn't, or ill-configured.. But to completely get rid of it because you didn't take the time to RTFM when setting it up seems a little extreme.

And redis/rabbitmq have completely different use-cases 80% of the time. Sounds like you were trying to get drunk on kombucha.

dragonsh · on May 22, 2020

You are right in our case redis failed spectacularly with celery queue when tasks in queue exceeded the memory of the machine. It took weeks to diagnose as error message were inconsistent and setting up a Message Queue cluster with redis is a nightmare.

Went back again to tried and tested rabbitMQ and it works so well. Also adding new nodes and removing nodes so easy, just used ansible to setup Erlang cookie and connect the node (thanks to OTP and BEAM). The best part is for important task queue for which we cannot accept failure we built a mechanism for fault tolerance. When you work with high availability and fault tolerant queue rabbitMQ is so good, it can recover from hardware or even VM failures, can’t say the same for redis which was a nightmare even with redis cluster.

st1ck · on May 22, 2020

Does your solution handle the situation when consumer crashes and queue has to be accumulated (while RAM allows) until consumer is up again (maybe 1 day later)? AFAIK ZMQ can't guarantee this.

twic · on May 22, 2020

It must be emphasised that, despite the name, ZeroMQ is not a message queue. It is a networking library. The current blurb says:

> ZeroMQ (also known as ØMQ, 0MQ, or zmq) looks like an embeddable networking library but acts like a concurrency framework.

And the old meme was "ZeroMQ is a replacement for Berkeley sockets".

It's a pretty cool networking library. But it makes no sense to think of it in the same slot as RabbitMQ, or even Redis.

That the GP mentioned it does make me wonder if they don't really understand what they're doing.

EDIT I love that the official guide actually has this diagram in! Pieter Hintjens's death was a sad loss. http://zguide.zeromq.org/page:all#How-It-Began

jen20 · on May 22, 2020

As an aside, the ZeroMQ Guide is the finest piece of technical writing I have ever seen - closely followed by Pieter’s (sadly unfinished) “Scalable C” book. His non-technical writing is also excellent.

francislavoie · on May 22, 2020

I had a fun time implementing the Paranoid Pirate pattern with my coworker a few years back. He wrote the server part in Python, I wrote the client in PHP. We essentially built it as a wrapper to run some C code our boss wrote that we didn't want to write a PHP extension for - we used Python as a broker to allow for some concurrency. Worked super well.

ayush--s · on May 22, 2020

a Redis list (what he's using) would handle this easily (I think Redis' PubSub also handles this)

codeduck · on May 22, 2020

Amusingly, RabbitMQ is the one message broker that HASN'T given me grief in my career.

mister_hn · on May 22, 2020

With docker it takes 5 seconds to set a rabbitmq instance locally.

elisharobinson · on May 22, 2020

Your fault is using pika. I thought the same untill I tried some other python libraries . Try Oslo messaging . It's used in openstack (enough said).

mcsoft · on May 22, 2020

We used both zeromq and its sort-of-descendant nanomsg but eventually moved to NATS because of it's approach to high availability (every node knows the topology of the whole cluster) and zero management overhead.

Some criticize NATS for the absence of message durability in the core technology, but we figured out we can drop this requirement in 95% of cases. Your microservices should be highly available anyway, so there's always a live consumer - and it's better to handle the message to it directly rather than introducing the overhead of storing a message somewhere and dealing with more-than-once delivery.

There's a bit more to that like you should let your microservices exit gracefully and finish processing consumed messages. And of course in 5% of cases, where you can't allow to lose a single message, you have to use NATS Streaming or the likes, but so far we've been greatly impressed by NATS for high load.

kazinator · on May 22, 2020

ZeroMQ compiles to well over 600 kilobytes of machine code (just on 32 bit x86).

  $ size /usr/lib/i386-linux-gnu/libzmq.so
     text    data     bss     dec     hex filename
   662134   12952      24  675110   a4d26 /usr/lib/i386-linux-gnu/libzmq.so

Yikes, is there an operating system kernel with virtual memory, USB support, a few filesystems and a TCP/IP stack hiding in there?

It's like 35% of glibc:

  $ size /lib/i386-linux-gnu/libc.so.6 
     text    data     bss     dec     hex filename
  1917549   11624   11112 1940285  1d9b3d /lib/i386-linux-gnu/libc.so.6

Which is itself heavily bloated, trying to provide as much POSIX as it can.

nathan-io · on May 23, 2020

I'm equally surprised by your anecdote.

I found it quite simple to install RabbitMQ server and its admin panel in my WSL local dev environment.

And the cloud/prod instance took a few clicks (just spun up a DO Marketplace server image) followed by < five minutes of RabbitMQ user and firewall configuration.

It was also dead simple to start using RabbitMQ within my application. I found a well maintained package, installed it, edited a couple lines of my application's config, and everything just worked.

I specifically avoided Redis based on my understanding that it can't guarantee message persistence, so if it crashes, your unprocessed messages are lost.

formerlurker2 · on May 22, 2020

I'd be interested to know what the 3 lines of python were and also more details about the redis server you deployed to replace RabbitMQ during the outage.

kabacha · on May 22, 2020

I'm using a bit of a hyperbole but using redis list as a task queue is a simple as:

    while True:
        msg = r.lpop(key)
        try:
            result = do_something(msg)
        except Exception as e:
            log.error(f'failure for msg "{msg}" got {e} back to queue {key}')
            r.rpush(key, msg)
            continue

Pop a member from a list, if failed plop it back to the end. It's simple, explicit and just works™

lbruder · on May 22, 2020

It just works, until PUSHing the member back fails. At least use RPOPLPUSH and a separate worker queue to make sure you don't accidentally drop packets.

Don't get me wrong, I'm using Redis queues myself and trying to get rid of the last remnants of RabbitMQ in our code, but there are use cases where RabbitMQ means less thinking about stuff, because it just works(tm)...

jlouis · on May 22, 2020

The other classic variant here is when msg is poisonous. Then pushing it back onto the queue means it'll go again at some point, poisoning your system again.

In practice, you need some dead-lettering with a message timeout. Something RabbitMQ does provide out of the box, for all its complexity.

ayush--s · on May 22, 2020

couple of problems with this: (1) RPUSH can fail (2) LPOP's result might never reach your client after redis has executed it.

ayush--s · on May 22, 2020

rabbitmq used to be one of the hardest things to self-host (even at relatively small scale). But it has been pretty stable since moving to Cloudamqp as a managed solution.

MaxBarraclough · on May 22, 2020

I had the same experience. I had particular trouble configuring encryption for connections. It didn't help that they'd switched configuration file-formats, but the documentation and tutorials still used the old format.

Never had any stability issues though. Once it was up and running, it was solid.

webscalist · on May 21, 2020

RabbitMQ has huge learning curve if you're trying to build a worker queue.

First, you'll learn about ack/noack and get the worker ack on success.

Then, you'll learn about dead letter queue ... etc for delayed retries.

Now, you'll have a topic exchange and a bit hairy routing in place using wildcards.

And you mistakenly set dead letter routing key so that expired messages end up in multiple queues (retry queues and actual worker queue ... ).

Then you rewrite your service in python and use Celery or something.

It's nearly impossible to get RabbitMQ working correctly within few months.

And I forgot about HA. Paying for hosted RabbitMQ might be better. But CloudAMQP in particular could be tricky as well. It can run out of AWS IOPS and your production gets hosed.

Also setting up monitoring on queue health, shoveling error queues ... etc take time to learn and apply. Be careful about routing keys when you shovel error queue to a topic.

vasco · on May 21, 2020

Celery can be backed by RabbitMQ, not sure if that's what you meant, but all of what you described can be abstracted away. I didn't have the same experiences with months taken to get up to speed. Moreover, at work RabbitMQ is probably our most stable underlying tool, perhaps toe to toe with Redis. And that's saying a lot, since I consider Redis to almost be a piece of art in how great of a tool it is.

Back to RabbitMQ though, we run a HA 2 node deployment (just one active writer) and have been for over 3 years, requiring minimal changes or any kind of maintenance whatsoever, has scaled to hundred plus queues, going from some with super high numbers of messages per second, some with only tens of messages per day. Some queues stay low and process fast, others are heavy jobs that get enqueued all at once and generate hundreds of thousands of jobs.

Sure, if you have a service that interacts with disks you should have automated a monitor that cover your IOPS consumption, but I don't see how that's specific to RabbitMQ, you should be doing this for all your instances.

All in all, these are two identic instances, one active, one failover, and in a world of Kafkas and Pulsars and understanding the ins and outs of SQS pricing and capacity allocation, RabbitMQ is a tool that I consider simple to administer and allows me to sleep at night.

Interesting how the same tool can evoke such different reactions, but whatever works - works.

tankerdude · on May 21, 2020

You would think, until you get to a split brain issue. The master and failover lose connectivity, and they each then think they're the master.

There's ways to repair it (and it has happened to me one total time in 4 years), but it does happen. I personally try to make my message processing idempotent for the worker to help alleviate these situations.

411111111111111 · on May 22, 2020

haven't encountered it personally, so honest question here: how does a split brain situation become an issue in a message queue?

there are some possible situation from my naive viewpoint:

1. the 'active' queue keeps jumping between, consumers & producers keep reconnecting

=> everything is still consumed, but takes longer as producers write into alternating queues, which are consumed ... albeit slowly whenever the switch happens

2. they're database backed, so they'll try to write into the same table

=> usually software that does this (but cant handle several writers) also creates a `lock` which has to be manually reset before the failover can come up. if its reset, the other node would fail. only one is up, so no issue?

3. producers/consumers dont notice that the 'active' mq changed, and keep running on initial

=> issue manifests as soon as any system is restarted. but only slowly so you got time to handle it with minor service degradation

none of them really sound that bad to me -- but as i said before, i haven't encountered it before, so i might just overlooking something really obvious?

CloudButWhy · on May 22, 2020

There is a reason why you're supposed to run an odd number of nodes so that you will hopefully have a majority in case of a failure.

yawaramin · on May 22, 2020

Once every four years sounds like a no-brainer, to be honest.

mythrwy · on May 21, 2020

I have simple single node deployment and I was floored how easy it was to set up with Celery. Really surprised. I was kicking myself for not using it sooner.

Granted I don't know all the intricacies of RabbitMQ and this was just one step beyond os.popen, but it was painless, like half an hour painless to set up and it has worked really well.

*edit: reading some of the other posts now I'm waiting for the other shoe to drop. but so far it's worked wonderfully.

robbiep · on May 21, 2020

I also got my first queue set up and running within a reasonable period of time with celery. I have no idea of the internals of RabbitMQ and took longer with celery really (back on python 2.7) but that system has been in prod for 6 years now without really needing any maintenance

m3nu · on May 22, 2020

Same experience. Single node with a few clients and Celery. Works well.

My main issue in the beginning were network timeouts now and then. Those went away after tuning some TCP settings.

LeonM · on May 21, 2020

Thank you for this post.

When I first started using RabbitMQ I experienced just about everything you described.

I felt incredibly stupid when a customer would have issues with a queue being stuck or messages that were being dropped, and having no clue on why this was happening.

> It's nearly impossible to get RabbitMQ working correctly within few months.

This is so true. You can get it running in 10 minutes, but it takes weeks of banging your head against the wall and angry customers before you have it running right.

AmericanChopper · on May 21, 2020

I understand where you're coming from, but what you're describing is learning how to use a queue to maintain consistency guarantees across a distributed system. You can get something simple like AWS SQS working with a few clicks, but then you don't have any of those consistency guarantees.

tracker1 · on May 22, 2020

If you don't need crazy throughput, I find that Azure Storage Queues are crazy easy, built in retry and just simple as can be. Though when I've used it in the past, I've created a slightly simpler to use abstraction.

https://docs.microsoft.com/en-us/azure/storage/queues/storag...

Thinking of doing something that works like an async generator so I can just use it like...

    const work = queue.subscribe('somequeue');
    for await (const {item, done} in work) {
      // do something with JSON.parsed item from message
      await done(); // wrapper for the delete/finish
    }

AmericanChopper · on May 23, 2020

Azure Storage Queues are about on par with SQS. That is, easy to use, but lacking strict concurrency control. If you need that level of concurrency control (and stricter serialization), then you’d be better off their (more complicated) service bus product [0].

RabbitMQ isn’t more complicated because it’s been improperly designed. It’s more complicated because it’s doing a much more complicated task.

A fair amount of that complexity lies in the hosting, so a managed service can take some of that off your hands (for an increased price obviously), but part of it is necessarily going to lie with the message consumer (your application logic). If your use case doesn’t need that level of control, then it doesn’t need that level of complexity either, so something like Rabbit would just be the wrong choice.

[0]: https://docs.microsoft.com/en-us/azure/service-bus-messaging...

gtaylor · on May 22, 2020

Depending on your usage patterns, SQS can be significantly more expensive, too.

rlander · on May 21, 2020

I have my own share of objections, mainly concerning the over-engineered nature of RabbitMQ, but most of the “huge learning curve” items that you’ve described can be learned in an afternoon by a motivated software engineer. Besides, she will have to learn those concepts anyway because they apply to most brokers.

emilsedgh · on May 21, 2020

You're right. It's difficult to get right. However, it is totally worth it. Once you get it working it just works.

I wish a standard set of higher abstractions existed on it though. Celery, from what I hear fills that gap very well in the python world but nothing like this exists in Nodejs land which leaves the room open for a bunch of redis-backed solutions which are pretty fragile in comparison.

csours · on May 21, 2020

I'm curious if you are comparing this to a non-queue solution or to a different queue system?

keithnz · on May 21, 2020

weird, I haven't done much digging in to the details of RabbitMQ, but I integrated it in a matter of hours, and have it deployed in production systems (for quite sometime now) and it works really solidly. I haven't tried to get too clever though.

anon176 · on May 21, 2020

You must of followed a good guide on getting it setup. Took me 2 days to get it solid. Then we decided to just use redis.

keithnz · on May 22, 2020

I just used the official docs and guides they had on the website, they seemed pretty good to me. I might have googled a few extra things, but can't really remember, I just remember it being pretty straightforward. I remember they pointed out a number of things you had to take care of.

symplee · on May 21, 2020

Can anyone recommend an easier alternative?

neurostimulant · on May 21, 2020

I switched to redis since several years ago for simple task queue solution. For my usage (low to medium traffic at most in corporate environment) redis is easier to use and has very little cpu and ram footprint compared to rabbitmq (note that I only use redis for message queue, thus low memory consumption). Never got any message dropped so far. RabbitMQ uses too much memory right from starting up, not ideal for use in a resource constrained server.

https://redislabs.com/ebook/part-2-core-concepts/chapter-6-a...

zmmmmm · on May 22, 2020

Surprised to see not much mention of ActiveMQ in these comments, but it's an obvious alternative choice. The general (simplistic) comparison being:

- ActiveMQ more featureful, robust default settings, better integrated with Java/JMS but slower

- RabbitMQ faster, simpler, more "just works"

The defaults of ActiveMQ lean more towards robustness (hence often naive benchmarks will tell you it's slow). However in practice it is pretty damn easy to run, you literally can just download the default cross-platform distribution and type `./bin/activemq` and it will start running.

We use ActiveMQ + Apache Camel which makes a pretty nice combo to achieve lots of generalised messaging and routing functionality.

mcsoft · on May 21, 2020

http://nats.io

ies7 · on May 21, 2020

I heard a lot of praise about Nats, but isn't it more like a kafka alternative? Someone new need to spend sometime grasping the stream concept.

mcsoft · on May 22, 2020

One practical reason we chose Nats over Kafka was that Nats doesn't need zookeeper for HA.

Nats doesn't provide message durability too, luckily it's not required for 95% of our use cases. Also, having NATS already implemented it's a natural move to use NATS Streaming for durability rather than introducing a completely new technology to your stack.

Less pieces - fewer chances something breaks.

jonathanoliver · on May 22, 2020

NATS by itself is designed to be more of an always-on style queuing system (the term they use is "dial tone") but doesn't handle node failures by itself. If you're looking for a Kafka-flavored NATS, there's a new release I saw recently called LiftBridge that adds some durability to the NATS protocol.

shaklee3 · on May 22, 2020

Someone mentioned this already below, but nats streaming also adds durability.

rhodin · on May 22, 2020

Disclaimer: I work for CloudAMQP

Yeah we hear you regarding AWS IOPS: for some type of loads and smaller plans we need to offer an alarm + an easy way to scale IOPS. It is something we're working on.

jv22222 · on May 21, 2020

The biggest annoyance I found with RabbitMQ was that it could take up to 10-15 mins to restart if it had a lot of jobs.

This was back in 2015 - might be better now.

kchr · on May 22, 2020

Sounds like resource (design) issues to me.

ayush--s · on May 22, 2020

isn't Rabbitmq prone to "split brain" problem on HA setups?

akyu · on May 21, 2020

RabbitMQ is great. One of the few pieces of software I've used that "just works".

The only downside is once you get message-queue-pilled, you start seeing opportunities to refactor/redesign with message queues everywhere and it can be hard to resist the urge. It really is remarkable how, when used appropriately, message queues can dramatically simplify a system.

MR4D · on May 21, 2020

> The only downside is once you get message-queue-pilled, ...

I think this is why email will never die. It's basically turned into a huge message queue. Even voice mails come into my inbox.

====== EDIT - I meant to say "huge universal message queue" and left out the word "universal" accidentally

pjc50 · on May 21, 2020

It always was a message queue in a very literal sense.

There's a lot of work in mailer-daemons to ensure that email has as reliable as possible delivery in a store-and-forward system..

MR4D · on May 21, 2020

You're correct - I left out the word "universal" accidentally, which would have made my intent much more clear.

Thanks for catching that.

guptaneil · on May 21, 2020

I think this is what a lot of people who complain about Slack don't get. It's just a better message queue for your business. The fact that you can funnel all your business events, regardless of whether they originate from humans or bots, into one place and then each worker (again, either human or bot) can subscribe/filter/react to relevant events is super powerful. However, if you try to use it as a corporate SMS platform or email replacement, you will very quickly feel overwhelmed because both of those message queues are designed for much lower throughput.

emmelaich · on May 21, 2020

And you can literally use the maildir format for a queue!

https://pypi.org/project/dirq/

Perl had the original implementation, and there are implementations in other languages.

MR4D · on May 22, 2020

I forgot about that - used to be a cool hack!

dkersten · on May 21, 2020

How is it for production deployment? I was considering it for something recently, but got overwhelmed by the documentation on setting up a fault-tolerant production deployment, so have been avoiding it. Was this an overreaction? What is your experience with that?

Also, do you happen to know how well it works in a fault-tolerant way for communicating between services that are in different data centers?

My main use-case is to receive status/change notifications from a service running elsewhere from the API server servicing the UI, in order to avoid polling for new data.

pbhowmic · on May 21, 2020

RabbitMQ, even a single-node RabbitMQ, has a hard time going down. You are more likely to have your server/container go down long before RabbitMQ node goes down. That being said, if you want to have a clustered solution with nodes being in different DCs, configure shoveling (https://www.rabbitmq.com/shovel.html) or for a simpler solution, use a private VPN to interconnect the RabbitMQ nodes. I would go for the latter.

zonk_ · on May 21, 2020

We use it in a fairly big scale for our slack bot system. It was just set up once, as akyu said, and since then it just works. Whenever we had troubles, it was always anything other than RabbitMQ.

I've also looked into other solutions (ActiveMQ, Google PubSub, ...) and RabbitMQ is by far the most straight-forward and quick to set up. There are some edge cases that it doesn't cover as well, for example automatic retries, but there are some "RabbitMQ patterns" to make it work. For a simple message broker/queue system, it's great and the docs are also great.

neeleshs · on May 21, 2020

We use Google Pub-Sub and got the whole thing up and running very quickly with Spring integration. Message durability, automatic consumer load balancing, automatic retries, some easy broadcast patterns - all out of the box and literally a click of a button on the infra side.

Has worked out quite well so far.

dkersten · on May 21, 2020

Was it easy to setup in terms of reliability and failover?

Given what you and larrik are saying, I think I need to give it a trial run, but its a project with a tiny team, so I want to be sure it won't be the cause of sleepless nights when things go wrong. It sounds like RabbitMQ is quite solid and shouldn't be the cause for concern, which is promising!

Is there anything I should keep in mind for running it in production? Any best practices or gotchas, based on your experience (eg don't run in docker, or make sure there's lots of RAM or things like that)? I guess its all in the production checklist. I need to read through it all again!

ekimekim · on May 21, 2020

This came up in several other threads here: Don't use RabbitMQ's clustering. It's surprisingly brittle and hard to recover from.

The accepted wisdom that I've seen is to run a single broker with a completely independent hot spare. But of course switching over to your hot spare will violate most of the guarentees that Rabbit gives you around durability, ordering etc, so you have to be very careful how you use it.

I desperately want to like Rabbit (and have used it heavily in the past) but right now I wouldn't use it if I can get away with anything else, it just has no real HA story.

jackvanlightly · on May 21, 2020

Rabbit dev here. We released quorum queues a few months ago. It's a Raft based replicated queue that addresses all the old problems. https://www.rabbitmq.com/blog/2020/04/20/rabbitmq-gets-an-ha...

sciurus · on May 21, 2020

Thanks for all your work on RabbitMQ, and for your great blog posts about it and other messaging systems.

For anyone who wants to understand the potential complexities of HA RabbitMQ, spend some time reading https://jack-vanlightly.com/blog/2018/8/31/rabbitmq-vs-kafka...

theptip · on May 21, 2020

In my experience RMQ is solid enough that for many use-cases it's reasonable to run it without a standby (especially if you're on Kubernetes where you'll get a replacement instance created automatically if your active instance fails).

A common use-case is for async tasks (Celery) that can tolerate a few minutes of downtime. If you're running a fully evented architecture then this might not apply - though if you're not targeting 4-5 nines of reliability or an RTO of < 5mins, then you might not need a standby even if RMQ is a core part of your architecture. "Avoid single points of failure" is a good heuristic, but "consider the SLAs of your dependencies" is the more granular way of thinking about this, and a single RMQ instance has a very high uptime.

For context I had an RMQ docker container running for almost two years without any interruptions. If you're in a small team then HA might well be overkill.

Fun gotcha - if you're running RMQ in Kubernetes/Docker, make sure you give it an explicit memory limit, else it will try to allocate disk space equal to 40% of your host's memory. (See "memory limits" in https://hub.docker.com/_/rabbitmq). That's a good best-practice for any containerized environment regardless what workload you're running, but this one will cause errors if you're trying to use a small disk volume on a host with lots of memory.

hrpnk · on May 22, 2020

Which Kubernetes operator for RabbitMQ are you using?

theptip · on May 22, 2020

I’m just using the Helm manifests, but when I set this up operators were not a thing. I’d probably look into the operator approach if I was starting from scratch now.

gog · on May 21, 2020

I've been running RabbitMQ for >8 years in production, once even in a fleet of 180 buses where every bus had an instance of rabbitmq running locally.

Never had a single issue in all those years.

But, I must admit that running a HA cluster is something that I've never tried, it sounds complicated and scary once you start digging through the docs.

All my deployments have been to bare metal Ubuntu and Debian machines with durable Qs and messages.

If you need to use transactions, they are really slow, couple of orders of magnitude slower compared to regular AMQP usage.

zonk_ · on May 21, 2020

> Was it easy to setup in terms of reliability and failover?

To be honest, the current setup was set up by colleagues who have even less experience than me and it's still running flawlessly. Iirc, it's just two instances that are behind a load balancer and the consumer just consumes from both, but I'm not super certain on that.

I've tested the cluster functionality to see how to set it up and it worked fine for me, but I have no experience with that in production, but other people in this thread don't seem to be too happy with it, so ymmv.

> Is there anything I should keep in mind for running it in production?

Nothing special that you wouldn't do otherwise, when getting to know a new component/microservice. Just check out the get started section[1] on their page in the appropriate language and play around with a small setup. Get familiar with the libraries to connect and send/queue/fetch stuff and the topics. Make sure to use your brainpower before you set it up to handle all eventualities and is set up exactly how you want it to act (ACK/NACK, what happens if a sender/consumer dies, etc.), because then you set it up once and probably never touch it again.

One thing I'm not really sure on and what I haven't really answered myself yet: there might be some "logic" to your RabbitMQ instance, depending on which metadata you add to each message (e.g. retries). If you have such logic, it might be better to have a service around the RabbitMQ instance, otherwise this logic ends up in your code base of you actual solution and that might not be wanted and maybe harder to maintain. But I'm not so sure myself on that one.

Oh yeah, and check out some patterns for your needs. There are for example multiple ways to implement retries, for example with a queue for queues, etc. But if you API is REST based, everything should be straightforward.

[1]: https://www.rabbitmq.com/getstarted.html

dkersten · on May 21, 2020

Thanks for the detailed response (and to everyone else who responded too!), I appreciate it! I will prototype something and play around and see how it handles different situations when I get time. I've also just bought the book mentioned elsewhere, so hopefully I can get up to speed quickly. It does sound that my original impression about it being complex to run/maintain was perhaps overblown. That's good, because from a features point of view, RabbitMQ seemed like a good fit for the things I want an MQ solution for.

heipei · on May 21, 2020

My go-to solution for fault-tolerant message queues is nsq (https://nsq.io/). nsq works differently from most other message queues in that it's supposed to be run in a distributed fashion, i.e. one nsqd running wherever messages are produced. That way you have a lightweight and fast local message queue that you can push messages to and not worry about network connectivity. You can use nsqlookupd to find the distributed nsqd that hold the topic you want to subscribe to, or you can run an additional nsq-to-nsq process to push messages from one broker to the next. It's a really great and very mature and stable piece of software. I'd say the only downside to using nsq is that you have to invest a little more in monitoring and you have to make sure that network connectivity between your consumer and each nsqd that carries a certain topic is possible.

dkersten · on May 21, 2020

Thanks for the recommendation! That looks pretty nice and “ops friendly” is definitely a plus. I will investigate this further.

ecoqba11 · on May 21, 2020

Been using RabbitMQ for a lot of projects in production. It can handle quite a lot data and this thing never fails. Sometimes it can be running for an entire year and we force restart just because.

rawoke083600 · on May 21, 2020

Yup been my production experience as well ! Super solid system !

ketralnis · on May 21, 2020

We use it for more or less everything at reddit. Almost every user action corresponds to a rabbit queue

akoncius · on May 21, 2020

sounds cool! how big queues are on your setup? how big mq instances (servers) are? do you use HA, replications/failovers?

larrik · on May 21, 2020

We switched from Amazon's SQS to RabbitMQ, because SQS was killing our performance, and wasn't nearly as powerful overall.

RabbitMQ gave us such a performance increase that we killed our database. We ended up having to rate limit RabbitMQ!

jaquers · on May 21, 2020

> I was considering it for something recently, but got overwhelmed by the documentation on setting up a fault-tolerant production deployment, so have been avoiding it. Was this an overreaction?

In general the defaults are pretty good I think. There is a one page production deployment guide: https://www.rabbitmq.com/production-checklist.html that I followed to replace our handbuilt cluster w/ a new automated deployment, plus a few other niceties like docker logs & rmq metrics to cloudwatch and then auto clustering via autoscaling groups lookup.

I thoroughness of the docs can perhaps seem daunting, but I see it as a badge of quality and especially if you are growing it's usage organically it should "just work".

tankerdude · on May 21, 2020

If it's super simple like that and the throughput isn't massive, use something else you don't need to support, like AWS's SQS.

If you're bad at hosting and need the throughput, there's cloudamqp.

So many options for pub/sub systems so use what works for you.

nailer · on May 21, 2020

+1. Discovered RabbitMQ/AMQP around 2010, since then tech went through a 2015-era wave of HTTP microservices that has come, and, largely gone or moved to MQ.

carterklein13 · on May 21, 2020

When you say "gone or moved to MQ" - if not moved to messaging services like RabbitMQ/NATS/etc, where else could things have gone? At least from my experience, HTTP microservices are still very common, especially when using things like AWS Lambdas.

I feel like most continually-running backends will make use of RabbitMQ/NATS/ZeroMQ/etc, or more and more I see lightweight systems going completely serverless and just using lambdas - which are HTTP microservices.

nailer · on May 21, 2020

> When you say "gone or moved to MQ" - if not moved to messaging services like RabbitMQ/NATS/etc, where else could things have gone?

They could have stayed trying to do continually running microservices on HTTP.

> I feel like most continually-running backends will make use of RabbitMQ/NATS/ZeroMQ/etc

I do too.

> more and more I see lightweight systems going completely serverless and just using lambdas - which are HTTP microservices.

Likewise.

But long running HTTP microservices are lame, and everybody realises that now, despite it being a cool idea back in 2015.

carterklein13 · on May 21, 2020

To be fair, I started working post-2015, so I've actually never come face-to-face with a long running HTTP microservice backend... what would something like that even look like? I'm thinking of systems I've worked on that use a messaging queue, but that only rely on HTTP requests - is that what it would be? So like, I'd make a request to a microservice behind an endpoint, which in turn would make requests to 3 more microservices behind other endpoints? If so, I'm certainly glad that idea isn't cool anymore because that seems greatly inefficient :)

fluxsauce · on May 21, 2020

  moved to MQ

Are you referring to IBM MQ?

jonesetc · on May 21, 2020

probably just meant message queues in general.

adamkf · on May 22, 2020

I've had truly terrible experiences with RabbitMQ. I believe that it should not be used in any application where message loss is not acceptable. Its two big problems are that it cannot tolerate network partitions (reason enough to never use it in production systems, see https://twitter.com/antifuchs/status/735628465924243456), and it provides no backpressure to producers when it starts running out of memory.

In my last job, we used Rabbit to move about 15k messages per sec across about 2000 queues with 200 producers (which produced to all queues) and 2000 consumers (which each read from their own queues). Any time any of the consumers would slow down of fail, rabbit would run out of memory and crash, causing sitewide failure.

Additionally, Rabbit would invent network partitions out of thin air, which would cause it to lose messages, as when partitions are healed, all messages on an arbitrarily chosen side of the partition are discarded. (See https://aphyr.com/posts/315-jepsen-rabbitmq for more details about Rabbit's issues and some recommendations for running Rabbit, which sound worse than just using something else to me.)

We experimented with "high availability" mode, which caused the cluster to crash more frequently and lose more messages, "durability", which caused the cluster to crash more frequently and lose more messages, and trying to colocate all of our Rabbit nodes on the same rack (which did not fix the constant partitions, and caused us to totally fail when this rack lost power, as you'd expect.)

These are not theoretical problems. At one point, I spent an entire night fighting with this stupid thing alongside 4 other competent infrastructure engineers. The only long term solution that we found was to completely deprecate our use of Rabbit and use Kafka instead.

To anyone considering Rabbit, please reconsider! If you're OK with losing messages, then simply making an asynchronous fire-and-forget RPC directly to the relevant consumers may be a better solution for you, since at least there isn't more infrastructure to maintain.

tmarice · on May 22, 2020

Rabbitmq blocks producers when it hits memory high watermark (default 40% of available RAM) - https://www.rabbitmq.com/memory.html

dzr0001 · on May 22, 2020

We used to have a pub rate of about 200k msgs/s, from about 400 producers all to a single exchange and had similar issues. However, we were able to mitigate this by using lazy queues.

This worked fine until things got behind and then we couldn't keep up. We were able to work around that by using a hashed exchange that spread messages across 4 queues. It hashed based on timestamp inserted by a timestamp plugin. Since all operations for a queue happen in the same event loop, any sort of backup led to pub and sub operations fighting for CPU time. By spreading this across 4 queues we wound up with 4x the CPU capacity for this particular exchange. With 2000 queues you probably didn't run into that issue very often.

Teknoman117 · on May 22, 2020

We had a similar experience where I work. We just ended up rolling our own queue system because we really just needed point to maybe a few other points.

lukebakken · on May 22, 2020

Kyle's analysis of RabbitMQ is almost 6 years old. Rest assured, things have changed since then.

zbentley · on May 22, 2020

How did the switch to Kafka solve your issue with providing backpressure to producers?

zbentley · on May 22, 2020

I'm glad Kafka is working for you you! Rabbit's HA story has definitely been rough until recently. But I think a few of the issues you describe can be mitigated with a bit better understanding of what's going on.

> Any time any of the consumers would slow down of fail, rabbit would run out of memory and crash, causing sitewide failure.

Not to be glib, but in any brokered system, you to have enough (memory and disk) buffer space to soak up capacity when consumers slow down, within reason. Older (2.x) RabbitMQs did a very poor job rapidly paging queue contents to disk when under memory pressure. Newer versions do better, but you can still run the broker out of memory with a high enough ingress/low enough egress, which brings me to...

It sounds like you did not set your high watermarks correctly (another commenter already pointed this out); RabbitMQ can be configured to reject incoming traffic when over a memory watermark, rather than crash.

However, a couple of things can complicate this: rejection of incoming publishes on already-established connections may not make it back to your clients, if they are poorly behaved (and a lot of AMQP client libraries are poorly behaved) or are not using publisher confirms. Additionally, if your clients do notice that this is happening and continually reattempt to reconnect to RabbitMQ to handle the (actually backpressure due to memory) rejection notification, this connection churn can put massive amounts of strain on the broker, causing it to slow down or hang. In RabbitMQ's defense, connect/disconnect storms will damage many/most other databases as well.

> We experimented with ... "durability", which caused the cluster to crash more frequently and lose more messages

A few things to be aware of regarding durability:

Before RabbitMQ 3-point-something (I want to say 3.2), some poorly chosen Erlang IO-threadpool tunings caused durability to have higher latency than expected with large workloads. Anecdotally, the upgrade from 3.6 to 3.7 also improved performance of disk-persisted workloads.

If you have durability enabled, you should really be using publisher confirms (https://www.rabbitmq.com/confirms.html) as well. This isn't just for assurance that your messages made it; without confirms on, I've seen situations where publishers seem to get "ahead" of Rabbit's ability to persist and enqueue messages internally, causing server hiccups, hangs, and message loss. That's all anecdotal, of course, but I've seen this occur on a lot of different clusters. Pub confirms are a species of backpressure, basically--not from consumers to producers, but from RabbitMQ itself to producers.

When moving a high volume of non-tiny messages (where tiny is <500b), you really need a fast disk. That means the equivalent of NVMe/a write-cache-backed RAID (if on real hardware; ask me about battery relearns killing RabbitMQ sometime ... that was a bad night like the one you described), or paying attention to max-throughput/IOPS if deploying in the cloud (for example, a small EBS gp2 volume may not bring enough throughput, and sometimes you may need to RAID-0 up a pair of sufficiently-sized gp2's to get what you need). And no burst IOPS, ever.

> We experimented with "high availability" mode

You're 100% right about this. RabbitMQ's story in this area was pretty bad until recently. Quorum queues and lots of internal improvements have made the last ~4 years worth of the Rabbit versions behave better in HA configurations. But things can still get really dicey. Always "pause minority" (trade away your uptime for message loss), as the Jepsen article you linked mentioned.

For failure recovery (though it's not that "HA") if you can get single-node durability working well and are using networked disks (e.g. NFS, EBS) or a snapshotted-and-backed-up filesystem, one of the nice things about RabbitMQ's persistence format is that at the instant of crash, all but the very most recent messages are recoverable in the storage layer. That doesn't solve availability, but it does mean you don't have catastrophic data loss when you lose a node (restore a snapshot or reattach the data volume to a replacement server).

sethammons · on May 22, 2020

Wow, that error message! Unless you are Google, network partitions are a thing. With CAP, you don’t get to choose CA.

wegs · on May 21, 2020

My general problem is that it's really hard to figure out which architecture is right for which system.

There's a different architecture for:

* one queue with billions of messages

* a millions of queues with small numbers of messages per queue

* many queues with many messages per queue

There are also different topologies:

* Anyone can send a message to anyone (O(n^2) queues)

* One publisher with millions of subscribers

* One subscribed with millions of publishers

* Complex processing networks, where messages get routed in complex ways between processing nodes.

There are differences in timing:

* More-or-less instant push notifications

* Jobs which run within e.g. 5 minutes with polling

* Jobs which run in hours/days, with a cron-style architecture

And in reliability:

* Messages get delivered 100% of the time, and archived once delivered

* Messages get delivered 99.999% of the time, but might be dropped on a system outage

* ... all the way down to ephemeral pub-subs

... and so on.

I'd give my VP's right eye to get a nice chart of what supports what. For the most part, I've found build to be cheaper than buy due to lack of benchmarks and documentation for my use cases. Otherwise, you build. You benchmark. You optimize. And things melt down.

My use case right now requires a large number of queues (eventually millions). I'd like to have an archival record of messages. Peak volume is moderate (several messages per second per queue), but usage patterns are sporadic (most queues are idle most of the time). Routing is slightly complex but not supper-complex (typically, about 30 sources per sink, at most 200; most sources only go to one sink, but might go to 2-3). Messages are relatively small (typically, around 1k), but isolated messages might be much bigger (still <1MB, but not small).

My experience has been that when I throw something like that into pick-your-queue/pub-sub, things melt down at some point, and building representative benchmarks is a ton of work.

whydid · on May 21, 2020

All software breaks at some point. If you're dealing with this scale of load, it's mandatory to perform synthetic load testing to validate, otherwise you're just guessing what the breakage threshold will be.

CuriousSkeptic · on May 21, 2020

Checkout https://github.com/yevhen/Streamstone Had similar needs, it was a good place to start. It’s just the persistence part of it though, did the messaging part using actors.

sethammons · on May 22, 2020

Fantastic points. We needed millions of queues with millions of items with fair queueing and scheduled release of some items and immediate release of others. 10s of thousands of messages per second. We had to build our own.

vorpalhex · on May 22, 2020

RabbitMQ is highly configurable in this regard but you will hit snags in how you distribute queues across exchanges.

Likewise this configurability makes case specific benchmarks very awkward.

tombert · on May 21, 2020

I feel like RabbitMQ is sort of the "swiss army knife" of message queues, and I mean that in the nicest way possible.

People will compare it to Kafka, claiming that its pubsub is faster than Rabbit's, but that's sort of missing the point: Rabbit thrives because it's easy to set up, will work well for 99% of cases, and handles nearly every kind of distributed problem you're likely to come across.

I recently did a project with Rabbit on my home server, and while the project had some issues, the issues were never Rabbit.

wenc · on May 21, 2020

Rabbit doesn't have Kafka's ability to massively distribute and scale (it does have a distributed story but from what I hear few explore it). But Rabbit also supports more complex use cases than Kafka because its messaging protocol (AMQP) is more intelligent. Unless you're a "web-scale"/s company, Rabbit's scale even on one node is likely enough.

I've been using Rabbit in production for RPC and pub/sub for the past 5 years (single instance running on a non-dedicated VM, medium traffic) and its been pretty easy to setup and has been pretty reliable in practice.

I've always been concerned about losing messages, and I did have to learn to turn on persistence and durability for messages to survive server interruptions, but it was easy enough. Message acknowledgements are also a nice feature, and Rabbit is able to achieve at-least-once messaging semantics.

tombert · on May 21, 2020

Yeah, I don't dispute that for certain usecases, Kafka is definitely the better choice, use the right tool for the right job.

That said, for most small to medium-large tasks, Rabbit will handle things without much trouble, making it a good fit for most common usecases.

fernandotakai · on May 21, 2020

it's also super duper stable even on default configs. it's one of my favorite softwares ever.

hnrodey · on May 21, 2020

I'm very well versed on RabbitMQ. We use it internally in a .NET codebase.

Anyone considering RabbitMQ needs to read up on "network partitions", how to build your cluster to avoid them (odd number of nodes and pause_minority), your recovery strategy for when a network partition occurs (it will occur), your personal/organizational tolerance for message loss and a plan for how you will upgrade your cluster at some later date (ensure you architect your application to handle whatever type of upgrade strategy you will pursue).

There are definitely ways to operate to minimize these failures but you SHOULD KNOW ABOUT THEM before your add this service to your environments.

fmorel · on May 22, 2020

If you're using RabbitMQ on .Net, I highly recommend using NserviceBus. It's made working with queues so easy. It handles maintaining a connection and retrying/acknowledging messages for you.

hnrodey · on May 22, 2020

Hindsight is the best site. That's definitely what I would do if I was starting a new project using RabbitMQ. Although I'll defend myself on this front; I inherited our RabbitMQ project from the developer who left the company 7/8 of the way through the implementation. I had the "make it work" directive and not the decision making luxury he had from the beginning.

jonathanoliver · on May 22, 2020

I worked on NServiceBus years ago (not just used, but actually was an active developer on the project). It's an excellent piece of software and Udi Dahan really knows what he's talking about.

danjc · on May 21, 2020

This. We used Rabbit in our platform for a few years and it was an absolute disaster. Network partitions mainly. In retrospect I'm sure we were doing it wrong but that really wasn't obvious at the time.

polygotdomain · on May 21, 2020

Can you talk a little bit about how you've managed your RabbitMQ infrastructure? Also if you've done any comparison to Azure Message Queues and what were the pros and cons against Rabbit?

I'm looking to pitch adding a message queue to our infrastructure (at a .Net shop on Azure), and I'm sure there will be some questions about the comparisons between the two. Unfortunately, that's been tough to really track down.

hnrodey · on May 21, 2020

>Can you talk a little bit about how you've managed your RabbitMQ infrastructure?

From a ten thousand foot view, two or three node clusters running in non-prod environments on virtual machines running Windows. In Prod, three node clusters on Windows virtual machines.

All work to install and configure RabbitMQ is done manually. Sadly enough.

I'm on the application/architecture side of this equation but I know enough about our infrastructure to perhaps answer follow-ups or more specific questions.

Our application is single tenant (so each customer is deployed in their own isolated area) so we use virtual hosts to isolate each customer within the cluster.

>Also if you've done any comparison to Azure Message Queues and what were the pros and cons against Rabbit?

Definitely looked in to the Azure native queueing options but it's been awhile. Azure Message Queues is an AMQP compliant messaging system that seems fairly robust. To be transparent, I have no production experience with this product. If your company/department is in to managing virtual machines then they might want/prefer to go with RabbitMQ. However, if they're in to PaaS systems then I'd probably roll with Azure Message Queues and never look back.

polygotdomain · on May 21, 2020

Thanks for the response. From the other responses in this thread, it seems like the admin of the nodes/cluster is not overly onerous. Would you agree with that statement? Also, being a .Net shop, the Windows VMs make sense, but is there any tradeoffs to running Rabbit on Windows, as opposed to Linux?

I think part of the sell is how we would manage the admin component of a Message Queue, which tilts things towards Azure Message Queues as it's PaaS. We're mostly IaaS at the moment, and starting to see some of the admin overhead that comes with managing that infrastructure ourselves. We're not ready to jump onto a PaaS solution for the things we've grown accustomed to managing, but for something brand new, I think my company would be open to it.

Architecturally, we'd lean on it initially for background job processing, which is currently at a scale where our homegrown, db-backed solution is starting to show it's weaknesses. Once it's in place though, I think it could leveraged as a key component to decouple subsections our application and give us more flexibility with scaling and deployment.

hnrodey · on May 21, 2020

>From the other responses in this thread, it seems like the admin of the nodes/cluster is not overly onerous. Would you agree with that statement?

Yes.

>is there any tradeoffs to running Rabbit on Windows, as opposed to Linux?

Should be fine to run on Linux assuming you (or you have) people are who are comfortable admin'ing Linux servers. I think that a Windows admin would get frustrated to setup/configure RabbitMQ on a Linux server. There's also a container advantage as RabbitMQ is published to Docker only with officially maintained Linux images.

>We're not ready to jump onto a PaaS solution for the things we've grown accustomed to managing, but for something brand new, I think my company would be open to it.

I'd push you to figure out why Azure Messages Queues would not work for you. If there's no compelling "no" argument then you'll thank yourself later.

>Architecturally, we'd lean on it initially for background job processing, which is currently at a scale where our homegrown, db-backed solution is starting to show it's weaknesses.

We pursued RabbitMQ for very similiar reasons (queueing mechanisms via SQL Server tables and stored procedures). Keep in mind that you still need something to submit the job (initiate the background task). RabbitMQ is not going to automagically schedule anything for you. We have a couple applications that use the tool Hangfire for job scheduling and in one case, the Hangfire job simply sends a message to RabbitMQ.

olikas · on May 21, 2020

RabbitMQ runs perfectly fine on Windows too. As others mentioned in the comments, RabbitMQ supports a great variety of use cases. If you want to reach out for help, you can find my contact in the article.

tracker1 · on May 22, 2020

Download and set aside a copy of the erlang and rabbitmq installers if you're running on windows... I've had issues on many occassions with the erlang installer being unavailable or very slow to download.

hnrodey · on May 21, 2020

Also, if Redis 5 is already part of your stack then you should look at their Streams feature before adding anything like RabbitMQ or Azure MQ.

Unfortunately streams were released after we introduced RabbitMQ to our application and I really wish we could just focus on Redis.

tracker1 · on May 22, 2020

I'll add one comment, if you aren't doing a really large number of queued items (under 50k messages every few minutes), Azure Storage Queues are pretty nice and easiest to use imo.

crad · on May 21, 2020

Using the opportunity to pimp my book, RabbitMQ in Depth: https://www.manning.com/books/rabbitmq-in-depth

:)

LeonM · on May 21, 2020

So weird seeing you post that, as I literally have this book on my desk right now.

Thanks Gavin, I learned a lot from reading it!

crad · on May 21, 2020

That's awesome! I'm glad it was useful!

olikas · on May 21, 2020

It is a great book, indeed. I always recommend it.

crad · on May 21, 2020

Thank you!

rhodin · on May 21, 2020

It is a nice book, highly recommend it!

crad · on May 21, 2020

Thank you!

zerr · on May 21, 2020

This is HN - we need a sales chart! :)

davidcorbin · on May 21, 2020

RabbitMQ has been awesome in my experience. One of the few tools that just works and has a super useful management web interface and Prometheus support among other plugins.

For those noting HA and scalability, it not meant for those use cases where (virtually infinite) horizontal scalability are the biggest concern. If you need horizontal scalability at a massive scale, use Kafka. But for the majority of cases, you can get away with limited scalability and the prod setup, development experience, and reliability of RabbitMQ are unmatched from my experience.

halfmatthalfcat · on May 21, 2020

I've been trying to rationalize using either RabbitMQ or Kafka for something I'm building. High messages per second but with more complex routing topologies.

Rabbit seems to be the right path but I'm worried about scaling out as many sources seem to point as Kafka being more scalable (at least horizontally). I've been looking into Rabbit's Federation but it's still not clear if that will solve the problem down the road.

Can anyone shine some light?

vel0city · on May 21, 2020

I've been running RabbitMQ on pretty small VMs for a long time. RabbitMQ doesn't need a lot of resources per message, even with very small VMs (512MB RAM, single CPU) I've seen it handle peaks of many thousands of messages a second without running into problems. Give it a bit of beefy hardware and it'll probably handle whatever load you were thinking, unless you're saturating 10gig links with messages or something.

RabbitMQ and Kafka are very different struggles when thinking of scaling and performance. Kafka is almost a database itself of messages which have routed through the system. In many configurations clients can come back and demand to replay the message stream from almost any point in time. This means you need to handle _a lot_ of disk and memory access. With RabbitMQ, messages are traditionally very ephemeral. Once a message has been ack'd, its gone. Poof. Not in memory. Not on disk. Nobody is going to come back asking for that message. This leads to a lot more efficiency in handling things per message, but at the cost of not being able to remember the messages that went through the system a few milliseconds ago.

aeyes · on May 21, 2020

CPU usage highly depends on the number of connected clients, not that much on message throughput. You can experiment with the excellent rabbitmq-perf-test tool to get some ballpark numbers.

I have a system that only pushes 5k messages per second but it needs 32 cores.

nacs · on May 21, 2020

What’s the amount of connected clients around for that 32 core setup?

aeyes · on May 21, 2020

around 3000 connections, 2000 queues, 1K message size

airfreak · on May 21, 2020

Yeah that sounds about right. Of course if you had 200 connections and 50 queues you'd more likely be seeing 100000 msg/s. The number of connections and queues has a big effect on total throughput.

addisonj · on May 21, 2020

As someone who has ran a number of messaging systems in production, this is what my current take is in general:

If you are moving to a more "event-sourced" architecture, usually two main concerns (beyond basic operational stuff of uptime, scale, etc) are routing and long-term retention.

RabbitMQ has the routing but not the retention. Kafka can have the retention and the routing, but it can be complex/expensive. Apache Pulsar really shines here as the API is pub/sub but it is underpinned by a log structure that gives you long-term retention (that doesn't need to be manually re-balanced) but it's flexibility does come with some operation complexity when compared to RabbitMQ.

If your needs is pretty much just moving large amounts of data, Kafka is definitely the most mature and has a big ecosystem, but long term-retention is difficult and there are some sharp edges around consumer groups.

If you really really don't need long-term retention and need complex topologies, RabbitMQ is your best bet and is fairly reasonable to operate even up to fairly high message rates (~10k msgs/sec shouldn't be too hard to achieve)

There are a TON more options these days though, older more java solutions like activeMQ and rocketMQ or more "minimal" implementations like NATs, not to mention the hosted services on cloud providers.

Personally, I am a big fan of Apache Pulsar for it's flexibility and some nice design choices, but I don't think there is any silver bullet in this space.

dvlsg · on May 21, 2020

Would you mind expanding on some of the operational complexity you ran into with pulsar?

I think pulsar is wonderful, but I haven't had the chance to use it for anything serious / in production yet, so I'm curious what pain points you had.

jonathanoliver · on May 22, 2020

I'm guessing that the pain points surrounded having to set up a Zookeeper cluster in conjunction with Pulsar. I think Pulsar has the best model of the various queuing systems at the moment for the routing flexibility of RabbitMQ, the high-throughput of Kafka (topic/partitions), as well as the ability to seamlessly integrate with cold storage (S3/GCS) and to recall messages from cold storage without extra code (unlike Kafka), I just wish that ZK wasn't an additional dependency.

Anyone know of any Pulsar hosting providers?

atombender · on May 21, 2020

Adding to what the sibling comment say, be careful about buying into RabbitMQ's clustering; having run it for years, I found it to be extremely brittle.

We often lost entire queues because a small network blip caused RabbitMQ to think there was a network partition, and when the other nodes became visible, RabbitMQ has no reliable way to restore its state to what it was. It has a bunch of hacks to mitigate this, but they don't solve the core problem; the only way to run mirrored queues ("classic mirrored queues", as they're not called) reliably is to disable automatic recovery, and then you have to manually repair RabbitMQ every time this happens. If you care about integrity, you can use the new quorum queues instead, which use a Raft-based consensus system, but they lack a lot of the features of the "classic" queues. No message priorities, for example.

I've never used federation or Shovel, which are different features with other pros/cons.

If you're willing to lose the occasional message under very high load, NATS [3] is absolutely fantastic, and extremely fast and easy to cluster. Alternatively, NATS Streaming [4] and Liftbridge [5] are two message brokers built on top of NATS that implement reliable delivery. I've not used them, but heard good things.

[1] https://www.rabbitmq.com/partitions.html

[2] https://www.rabbitmq.com/quorum-queues.html

[3] https://nats.io/

[4] https://docs.nats.io/nats-streaming-concepts/intro

[5] https://github.com/liftbridge-io/liftbridge

shoo · on May 21, 2020

> lost entire queues because a small network blip caused RabbitMQ to think there was a network partition, and when the other nodes became visible, RabbitMQ has no reliable way to restore its state to what it was

I can offer a similar anecdote: we started seeing rabbitmq reporting alleged cluster partitions in production after enabling TLS between rabbitmq nodes, where manual recovery was needed each time.

After a bit of investigation we noticed that cluster partition seemed to correlate with sending an unusually large message (think something dumb like 30 megs) through rabbitmq when TLS between rabbitmq nodes was enabled. What I believe was happening was Rabbitmq was so busy encrypting/decrypting large message that it delayed sending or receiving heartbeat & then the cluster falsely assumed there has been a network partition.

Mitigated that issue by rewriting system to not send 30 meg messages- there was only one message producer that sent messages anywhere near that large, and after a bit of thought realised it was not necessary to send any message at all in that case (sending large message was to hack around some other old system performance problem that had gotten fixed properly a year back, but the hack that generated a huge message was still in place)

ramchip · on May 21, 2020

Erlang/OTP-22 (released last year) introduced TLS distribution optimizations and message fragmentation which sound very related to the problem you saw:

http://blog.erlang.org/OTP-22-Highlights/

The fragmentation in particular addresses the problem where a large message would block all other messages, including heartbeats, and cause nodes to look “down” when they’re not.

shoo · on May 21, 2020

fantastic. thank you for sharing that -- my anecdote about this problem is slightly dated -- it would have been late 2017 early 2018 we were seeing the issue, which indeed predates OTP 22 release.

airfreak · on May 21, 2020

The old network partition problems people remember about RabbitMQ are solved by quorum queues.

atombender · on May 21, 2020

Yes, but quorum queues don't have many of the features of classic mirrored queues.

fernandotakai · on May 21, 2020

it used to be really bad, that's super true.

nowadays? it's actually quite simple to setup and works pretty well (source: i know two different companies that setup clustering recently and both had good experiences with no downtime).

snapetom · on May 21, 2020

I've used both. I was introduced to Rabbit at one job and at another, was "fed" Kafka during a selection process. At that time, I was definitely not opposed to Kafka because, hey new resume item. I ended up yearning for Rabbit for three reasons.

1) Much easier to implement and maintain for small to medium architectures. However, war stories I've heard is that it starts to become a hassle for large clustering architectures.

2) Because it's a traditional message broker, the input and output ends, which I was responsible for, were much simpler to write because I didn't have to worry about replays when it came back online. Rabbit knows which client it has already routed to and where messages went. Kafka is not that sophisticated in that regard. Kafka has been described as "dumb broker/smart clients" while Rabbit is "smart broker, dumb clients."

3) The scaling. Rabbit is very scalable. Once you get to the Uber/Paypal level (like, a couple of million writes per second), then Kafka becomes the obvious choice. Rabbit handles thousands or writes per second just fine. However, at that second company and like many others, they thought they'd have to suck up all the data, so of course, Kafka was the more scalable tool long term. Spoiler: We were never, ever close to PayPal-level transactions. If the size of the sun represents paypal/Uber transactions, we were basically Manhattan.

lllr_finger · on May 22, 2020

Kafka is one of those things where if you're new to it, especially if you're coming from Rabbit or similar, you might tend to assume the happy path - exactly once delivery. This is a bad mistake (whether that's possible and to what definition is not a debate I'd like to dive into now). What you should expect from Kafka is at least once delivery.

There will be times when you lose offsets or when you actually want to replay every message, so take an hour and figure out what that means to your app. It's usually only a few lines of code in your consumer that compares source timestamps, but it's by far the most beneficial thing you can do when working with Kafka in my experience.

It's also relatively easy to hit "tens of thousands" messages/second, especially in replay or bootstrapping scenarios, and that's when Kafka becomes useful to the non-FAANG companies.

olikas · on May 21, 2020

Author here.

I've seen quite a lot messages going through RabbitMQ. I wouldn't worry too much about scaling, because the possibilities depend very much on the architecture. With some tuning RabbitMQ can take you a long way. I would give clustering a go and see where the limits are before exploring more complicated architectures like federation.

aeyes · on May 21, 2020

Could you explain how RabbitMQ clustering is going to improve performance? For how it works I would expect it to lower performance.

olikas · on May 21, 2020

With clustering, you can have more nodes and you can shard (distribute) your queues over the cluster. You don't need to mirror every queue on every node. But you are right, mirroring alone will add more load.

froderick · on May 21, 2020

Rabbit's federation is a good way to bridge point-to-point connections between geographically distributed systems. I'm not sure that's a great scaling pattern for throughput though.

The clustering might look tempting but it hasn't been resilient for me in the face of janky networks. Split brains and data loss can result.

In the past I've scaled my rabbits for throughput by implementing my own routing/sharding layer.

If you're tempted to use the message persistence and you care about retaining messages, kafka is a bigger but much more capable hammer.

senderista · on May 21, 2020

If you’re trying to “rationalize” a decision, that’s already a red flag. Also, Kafka and RabbitMQ are intended for different use cases. One is (the log component of) a streaming data processing system, the other is a message queue. Figure out which kind of system you need before deciding on a particular system. BTW, if you need to really scale, Apache Pulsar is designed to handle both scenarios.

EdwardDiego · on May 21, 2020

Look into Pulsar, it can function as a message queue or pub/sub like Kafka.

By default it only retains non-acked messages, multiple subscription modes, can use non-persistent messaging, dead letter queue, scheduled delivery, can use Pulsar Functions to implement custom routing etc.

Scales like Kafka (probably better) and has cluster replication built in.

evdev · on May 21, 2020

Rabbit MQ is a traditional message broker; you use it when you have lots of messages you don't particularly want/need to be stored persistently, and where you want/need to take advantage of the routing feature--that you put keyed messages into some topic/exchange and then subscribe to only part of the messages any given application is interested in.

Kafka creates the abstraction of a persistently stored, offset-indexed log of events. You read all events in a topic. Kafka can be used to distribute messages in the way AMQP is used, but is more likely to be the centerpiece of an architecture for your entire system where system state is pushed forward/transformed by deterministically processing the event logs.

spamizbad · on May 21, 2020

If your main concern is scalability: Each queue in rabbit gets its own thread. So if you can spread your workload across multiple different queues you can scale without too many problems.

nojito · on May 21, 2020

Both RabbitMQ and Kafka are extremely simple to stress test with simulated data which will let you make a decision that you will be comfortable with.

pc86 · on May 21, 2020

Are you replacing an existing system that's already at scale?

halfmatthalfcat · on May 21, 2020

No, greenfield

pc86 · on May 21, 2020

Then the odds of you hitting the scale where RabbitMQ v. Kafka is relevant are a million to one. There is a lot of overhead with Kafka compared to RabbitMQ.

Unless you already have Kafka infrastructure, setting up Kafka for a brand new project is crazy unless your only goal is learning how to set up Kafka.

maxdo · on May 21, 2020

you either look at pulsar with rabbit

mc_ · on May 21, 2020

We've used RabbitMQ since 2010 in KAZOO. I would argue, save one or two instances in the intervening 10 years, that RabbitMQ is the most stable piece of the infrastructure. I think it might be the only open-source project we build on that we haven't committed upstream to because we haven't encountered any issues in our usage.

harel · on May 21, 2020

RabbitMQ is one of those pieces of software I usually forget are there. I can't remember having to deal with any rabbit issue in last few years.

rhodin · on May 21, 2020

This might be the Achilles heel of RabbitMQ. It works so well that people forget it for years, and then they have forgotten how to upgrade it, etc. :)

rawoke083600 · on May 21, 2020

Lol this ! WRITE down that rabbitmq-web-admin passwd. After the setup and first few weeks of checking the speed of your queues you will forget about it and try to login in 1 year later :)

eqmvii · on May 21, 2020

We started using RabbitMQ for several projects last year, and it's been a joy.

Some of that joy is surely just moving from older, creakier solutions. But it hasn't let us down, and everyone is eager to use it for new features or refactoring legacy code.

bvm · on May 21, 2020

Using this opportunity to shout out to Rascal (https://github.com/guidesmiths/rascal) which makes using RabbitMQ on Node an absolute joy.

pc86 · on May 21, 2020

Same with MassTransit[0] and .NET. We have several distributed .NET Core services running in our data center, services running on employee PCs, etc all communicating via RMQ with MassTransit and it's great. The primary maintainer is very active (streams every Thursday evening) and the documentation has gone from "pretty bad" to really good in the last few months.

[0] https://masstransit-project.com/

jonathanoliver · on May 22, 2020

MassTransit is awesome! I love what Chris Patterson (the author) did. It essentially allows you to swap out RabbitMQ for SQS or Azure Service Bus or a few others. Pretty cool stuff if you're in .NET land.

rexarex · on May 22, 2020

I have never had a good experience with RabbitMQ wherever I have worked. Often it was buggy and unreliable. It’s almost always been some thing shoehorned into a service, but failed to gain widespread adoption with future services. Furthermore, it’s usually some hot potato no one even wants to deal with. We have written some code around it to make it more reliable. You quickly figure out why there seems to be so many half baked implementations of it wherever you go work.

It’s basically caught between being too bloated and complex for use with smaller systems (as some commenters have poked at people for not being the ‘right’ kind of person to be running it)

While at the same time, it’s not robust and reliable enough to use in prime time.

What’s left is this enticing and sexy sounding message broker called RabbitMQ that actually just sort of sucks.

In my experience someone gets stoked on trying this out but once everything is all implemented it disappoints and the system or service it is apart of is a one off after future services use something more mature the next time around.

For scale I have used NSQ to handle millions of message a second and then for smaller scale AWS services like SQS can handle things much more reliably.

jugg1es · on May 21, 2020

I love RabbitMQ but deploying/managing a cluster can be tricky. We had problems with network partitioning and since we didn't really need a cluster for performance reasons - only availability - we switched to a single node.

airfreak · on May 21, 2020

Try the new quorum queues, they don't have those issues.

nicodjimenez · on May 22, 2020

It's working well for us but we occasionally get blips where for very short periods of time messages get "stuck" in between application code events on different servers and we cannot figure out why. It's very rare. Maybe a burst of 5 messages every 10 million messages.

Any ideas on how to even debug this type of thing? Help! We think it might be a tcp connection failure but we have no idea.

sethammons · on May 22, 2020

Tcpdump and wireshark?

pfarrell · on May 21, 2020

One big thing I’ve appreciated about RabbitMQ is how well it separates publishing, message routing, and subscription concerns. Plus it’s never been the issue in any infrastructure I’ve encountered it.

niffydroid · on May 21, 2020

We use AWS SQS and Rabbit. At our scale, SQS is easy peasy and we can wrapper it to make http calls instead of using SQS, as we're using AWS Beanstalk workers. SQS is generally quicker to get up and running with and we can have metrics out the box. With rabbit we use it for some other stuff and it works just fine, it's when things go into a black hole we struggle, but that's our lack of knowledge.

Depending on your scale, we find SQS is cheaper than a managed rabbit service. Although I'd be interested in using kafka!

spicyramen · on May 22, 2020

Use RabbitMQ for a call center handling thousands of calls per second. It worked fine integrated with Flower, Celery and Python...but once we went production, became a black box which every setting was hard to find documentation or support, we ended up having to build huge Machines with tons of memory and CPU and still saw messages lost no explanation. Ended up moving to PubSub and rebuild the whole app

vorpalhex · on May 22, 2020

Rabbit is not a "turn it on and hope it works" kind of solution and if it's a blackbox to you then you shouldn't use it. AMQP is a relatively fancy protocol and Rabbit is endlessly configurable which is both a pro and a con. You will need to develop expertise in Rabbit to use it well at scale.

fake-name · on May 22, 2020

I've been using rabbitmq heavily for a fairly large hobby project (20-100 messages/sec) for a few years now. I'm generally happy with it, but there are a number of caveats I've learnt.

1. If you have large messages and use keepalives (and you'll need keepalives), you need to write your own message fragmentation.

2. There are no python libs that just work. I'm currently using a vendored version of amqpstorm with a bunch of hacks to handle wedged connections. I have some AMQP connections that are intercontinental, and I've been able to wedge literally every other AMQP library.

3. If you have a single open connection, it will get stuck from time-to-time. With a bunch of both in-band and out-of-band keepalives, I've got it to the point where I don't have things permanently block, but you should expect things getting stuff for ~2x your heartbeat time periodically. This doesn't seem to result in message loss. I've dealt with this by just running LOTS of concurrent connections, and aggregating them client side. This has worked fine.

4. In general, exactly-once delivery isn't a thing. You should design either for at-most-once, or at-least-once delivery modes exclusively. Idempotency is your friend.

5. The tooling /around/ the rabbitmq server is a dumpsterfire.

Basically, I feel like the core server is super durable (note: I'm not running a cluster, so this doesn't generalize to multi-instance cases), but the management stuff is god-awful. The main management CLI tool actually calls the HTTP interface, which is kind of ridiculous. I've occationally run into a situation where I wound up with leaking temporary exchanges, and just flushing bogus exchanges is super annoying.

I don't think there's any other options that can do what rabbitmq does for my use-case, but it's had quite the learning curve.

zbentley · on May 22, 2020

> If you have large messages and use keepalives (and you'll need keepalives), you need to write your own message fragmentation.

I'm confused by what you mean by that. Do you mean "large" as in "take a long time to process in the consumer"? If so, and if your consumer is not issuing heartbeats concurrently with message processing, then that is true.

> There are no python libs that just work.

Completely agree. Having hacked on and patched the code inside Celery, it's really quite a bummer. I think this is because the Python libs try to abstract over things that ... just straight up can't be abstracted away given the semantics of AMQP: specifically connection-drop-detection, "resumption" of a consume (not really possible; this isn't Kafka), and the specific error code classes (connection-closed vs channel-closed vs information).

> If you have a single open connection, it will get stuck from time-to-time.

Are you talking about publishing connections? Consuming connections? One used for both? What does "stuck" mean? I'd be interested in hearing more about this.

> exactly-once delivery isn't a thing

Kinda pedantic, but exactly once delivery is possible in some very restricted situations (see Kafka's implementation of this guarantee: https://www.confluent.io/blog/exactly-once-semantics-are-pos...). Exactly once processing is what's tough-née-impossible. So yeah, idempotence is great.

fake-name · on May 23, 2020

> I'm confused by what you mean by that.

By large, I mean 10+ MByte.

> Completely agree. Having hacked on and patched the code inside Celery, it's really quite a bummer.

I don't understand what the point of celery is. Literally everything I do requires /some/ persistent state in the workers, and there's no way to do that with celery.

> Are you talking about publishing connections? Consuming connections? One used for both? What does "stuck" mean? I'd be interested in hearing more about this.

TCP connections. As in, a connection to the server from a consumer. High latency connections seem to exacerbate the issue.

I think the issue is the state machines server-side and client-side get out of sync, and things just stop until the keep-alives/heartbeat cause the connection to reset, but that's a bunch of time to wait with no messages.

I also ran into the issue that basically every python library had at least one or two locations where `read()` was called without a timeout, but that was at least easier to fix.

> Kinda pedantic, but exactly once delivery is possible in some very restricted situations (see Kafka's implementation of this guarantee: https://www.confluent.io/blog/exactly-once-semantics-are-pos...). Exactly once processing is what's tough-née-impossible. So yeah, idempotence is great.

Well, it isn't really a thing, so you at least shouldn't depend on it being a thing for your architecture if possible.

zbentley · on May 30, 2020

> By large, I mean 10+ MByte.

OK. Did Rabbit or your client libraries bug out when sending single giant messages? What does message fragmentation (by which I assume you mean splitting one logical message up over multiple AMQP messages? Or something else?) have to do with keepalives (and what do you mean by keepalives? Connection heartbeats? TCP keepalives?)?

> Literally everything I do requires /some/ persistent state in the workers, and there's no way to do that with celery.

Sure there is. In-memory caches persist between requests. And there's always sqlite and friends. Celery's more intended for the "RPC/fire-and-forget" case than stateful workloads, but it's not too painful to use those with it. And you get the benefits of its (reasonably) hardened connection/heartbeat management, which may help with some of your other issues.

Basically every time I've seen code that rolled its own bespoke consumer loop for RabbitMQ, it was wrong in some fundamental ways; the state machine on the consumer side did indeed get out of whack, and badly. Best to outsource the "keep the connection alive, establish subscription, detect failures" work to a higher-level library (like Celery) that provides a long-lived consumer so your code can just be occupied with data processing.

carterklein13 · on May 21, 2020

Would anyone be able to explain the benefits of RabbitMQ over NATS? As far as I've seen, it's really just that RabbitMQ is more feature-rich, which I personally feel like isn't that crucial, as frankly many systems are not going to take advantage of those more complex functionalities anyway.

jonathanoliver · on May 22, 2020

Durability. If you need to push messages that don't get lost, RabbitMQ is a pretty solid choice. In years past the clustering situation wasn't great and there was some potential for message lost and that seems to be resolved now with quorum queues, but the biggest different between NATS and RMQ is the durability guarantees and the at-least-once delivery guarantees that RMQ has. NATS is more like ZeroMQ in that it expects the subscribers to be online. There has been some work by others using that NATS protocol to create a Kafka-like system (written in Go, I believe) called LiftBridge. So if you like NATS and it's working for you and you want durability, take a look at LiftBridge.

shaklee3 · on May 22, 2020

This isn't true anymore. Nats streaming has persistence, so the OP's question still remains

jonathanoliver · on May 22, 2020

My understanding is that NATS (a protocol) and NATS streaming were related but separate:

https://github.com/nats-io/nats-site/issues/217

(The issue is from 2017 but illustrates a distinction)

shaklee3 · on May 22, 2020

That's right, but I think at least since both are listed on their website as different ways to run it that it should at least be considered a native feature at this point.

carterklein13 · on May 22, 2020

All very interesting - this is great!

DubiousPusher · on May 21, 2020

Rabbit saved my life. I had a project that involved getting the AMQP Proton library working on the Xbox. Rabbit was so easy to setup and use, it gave me a reliable way to test my work. Getting into AMQP at the time was confusing and poorly documented. Rabbit did imdeed "just work".

posharma · on May 21, 2020

How does RabbitMQ compare with Kafka?

Raidion · on May 21, 2020

I'm more familiar with SQS than RabbitMQ, but have used both, and have chosen between queue and stream based solutions.

Kafka is a stream, and can be replayed (if you have it set up to store stuff). Rabbit is simply a queue, and when the messages are gone, they're gone.

This means that queues are a lot smaller, but can only serve one set of consumers at at time. If you want to have multiple things listening to messages, you have to use fan-out patterns that place messages on multiple queues. Queues can also suffer from less than atomic delivery, especially if the system is distributed. This means you have to jump through some hoops and add an atomic layer somewhere if you want to ensure you're not double processing anything.

Kafka can have infinite retention (if you got the storage/$), and you don't need to have multiple streams to service multiple consumers. Each consumer stores where they are in the stream, and can traverse as needed. You'll need to be careful to make sure that a single consumer is handling a single partition to promise that you'll only process a message once.

Managing streams can be a headache, but less so now if you have money to have Amazon or Confluent manage it for you. They offer pretty much unlimited scalability, and are the production grade solution for a ton of problems.

Queues are really simple to understand and build and still scale pretty dang well. Just make sure your message processing is idempotent and make sure you can handle if something is processed multiple times.

halfmatthalfcat · on May 21, 2020

I've been interested in this question as well. There's a lot of sources online comparing the two but none really definitive.

eternalban · on May 21, 2020

RabbitMQ is not suitable for event sourcing. Kafka is. In general, RabbitMQ is a “river” and Kafka is a “lake”.

RabbitMQ has excellent support for complex message flow topologies. Kafka out of the box does not provide these features.

lukebakken · on May 22, 2020

I highly recommend all of Jack's blog posts about RabbitMQ - https://jack-vanlightly.com/blog/tag/RabbitMQ

Jack works with me on the RabbitMQ core engineering team. We've been hard at work to address a lot of the issues brought up in comments here. It's worth it to try out our latest releases. The engineering team is very active with the community and takes all constructive, helpful (i.e. reproducible) feedback seriously. Feedback is encouraged via the rabbitmq-users mailing list. Thanks.

major505 · on May 22, 2020

I really like RAbbitMQ. But I really dislike that database that it rellies into, Mnesia. I had a client that because of licence issues could only do one operation per time in the ERP software. So I used RabbitMQ to line the requests, and do one at a time. Worked great ,was fast and low in resources. But the place power supply was a problem, and more than once the place had a blackout and when returning menesia messed up and lost the queues. So I ended up just making my own simple queue using sqlite in the server.

shishy · on May 21, 2020

Debated using RabbitMQ but decided the infrastructure overhead was too high.

Ended up looking into `rq` and `arq` which were both excellent!

https://python-rq.org/

https://arq-docs.helpmanual.io/

Would recommend if you're looking for a (faster) worker queue without all the overhead (in my case, didn't need all the other features that came w/ RabbitMQ so this got the job done).

dirtydroog · on May 21, 2020

We use ZeroMQ a bit. It's been pretty much flawless as far as I can see but I get the impression that it's becoming obsolete. Is RabbitMQ a viable replacement?

atombender · on May 21, 2020

The "MQ" in "ZeroMQ" is misleading, so this is an apples-to-oranges comparison. ZeroMQ is a socket abstraction that allows you to build apps that send messages to each other. RabbitMQ is a reliable message queue broker; a central server that stores messages and that clients connect to in order to push/pop them.