RQ – Simple Job Queues for Python

sorentwo · on Jan 2, 2020

Redis is brilliant for simple job queues but it doesn’t have the structures for more advanced features. Things like scheduled jobs can be done through sorted sets and persistent jobs are possible by shifting jobs into backup queues, but it is all a bit fragile. Streams, available in 5+ can handle a lot more use cases fluently, but you still can’t get scheduled jobs in the same queue.

After replicating most of Sidekiq’s pro and enterprise behavior using older data structures I attempted to migrate to streams. What I discovered is that all the features I really wanted were available in SQL (specifically PostgreSQL). I’m not the first person to discover this, but it was such a refreshing change.

That led me to develop a Postgres based job professor in Elixir: https://github.com/sorentwo/oban

All the goodies only possible by gluing Redis structures together through lua scripts were much more straightforward in an RDBMS. Who knows, maybe the recent port of disque to a plug-in will change things.

thejosh · on Jan 3, 2020

Oban looks fantastic! PG is fantastic for a small-medium job queue IMHO. pg_notify with elixirs postgrex is fantastic.

We're currently using ecto_job which works really well for us, so we have no reason to switch. Plus we like it's in different tables.

sorentwo · on Jan 3, 2020

It seems like PG has been pigeonholed as only suitable for “small-medium” size queues, but without numbers to define what “small-medium” means. A few million jobs an hour is entirely reasonable for PG based on anectdata (and I’ve load tested up to 54 million jobs an hour).

PG is an amazing tool that can handle more than most people think (or at least more than I thought).

heavenlyblue · on Jan 3, 2020

What about VACUUM? Did they add something to PostgreSQL to lower space consumption?

sorentwo · on Jan 3, 2020

There has been some progress on that front. For index size there is a new rebuild command in PG12 which works concurrently. There is also pluggable storage now, which enables much more efficient row deletion. I can’t find a link to the new storage format, but here is the announcement for 12 https://www.postgresql.org/about/news/1943/

lytedev · on Jan 3, 2020

Just wanted to say that Oban is really fantastic. Been using it in production for a couple weeks to handle recurring scheduled jobs and it was a real joy to work with. Thank you!

wojcikstefan · on Jan 3, 2020

> Things like scheduled jobs can be done through sorted sets and persistent jobs are possible by shifting jobs into backup queues, but it is all a bit fragile.

What exactly is fragile about this approach? We've supported scheduled tasks in TaskTiger (https://github.com/closeio/tasktiger) via Redis's sorted sets and haven't had any issues.

sorentwo · on Jan 3, 2020

> What exactly is fragile about this approach? We've supported scheduled tasks in TaskTiger (https://github.com/closeio/tasktiger) via Redis's sorted sets and haven't had any issues.

Using sorted sets for scheduled tasks isn't the fragile part, gluing it all together to prevent losing jobs by shifting into backup lists (or hashes) is the fragile part in my experience.

notyourday · on Jan 3, 2020

eval + multi-exec solves that as the issue

sorentwo · on Jan 3, 2020

Sure, you can glue things together with lua. Here is a script that handles descheduling from a sorted set: https://github.com/sorentwo/kiq/blob/master/priv/scripts/des...

That definitely does the job. My point is that it is much more complex than a select/update clause in SQL.

wojcikstefan · on Jan 3, 2020

Yep, TaskTiger also uses Lua quite extensively ([0]) to ensure atomicity when moving tasks between various stages of processing (queued, active, scheduled, errored).

[0] https://github.com/closeio/tasktiger/blob/master/tasktiger/r...

devy · on Jan 3, 2020

Oban seems to be Elixir-specifc, is there a Python port?

sorentwo · on Jan 3, 2020

No, unfortunately not. It does use jsonb for job arguments to make interop easy. That’s entirely on the enqueuing side though and not processing.

devy · on Jan 3, 2020

Someone mentioned about dramatiq-pg (https://gitlab.com/dalibo/dramatiq-pg) which also uses PG as the task queue backend and jsonb for message payload & result. How similar is that comparing to Oban?

nikisweeting · on Jan 3, 2020

I don't know about the technical differences, but having used dramatiq in production for 3 years now I can say it's 100% awesome, it's been rock solid for us handling many thousands of tasks daily.

wakatime · on Jan 4, 2020

I love PG but would never use it for message queueing. Postgres needs periodic table maintenance, autovacuum tuning for tables with high number of writes, and IMO can never be as good as RabbitMQ for distributed task queues. RabbitMQ is purpose built around messages and it's extremely reliable and performant.

sandGorgon · on Jan 3, 2020

this is exactly my complaint with RQ as well. They are not leveraging the built-in Redis features, even though they are entirely based on Redis only.

akashakya · on Jan 3, 2020

> Streams, available in 5+ can handle a lot more use cases fluently

Stream does solve the backup queue issue, but not being able to filter messages from the consumer's pending list makes implementing retry mechanism and exponential backoff hard (without reaching to sorted set).

Though, I can understand the Redis team's decision to keep it simple.

jokull · on Jan 3, 2020

Could you run this as an instance with a HTTP API to perform actions? Then I could use this with Python.

andrewstuart · on Jan 3, 2020

Reposting from a while back in case it solved a problem for someone.

I use Postgres SKIP LOCKED as a queue. Postgres gives me everything I want. I can also do priority queueing and sorting.

All the other queueing mechanisms I investigated were dramatically more complex and heavyweight than Postgres SKIP LOCKED.

Here is a complete implementation - nothing needed but Postgres, Python and psycopg2 driver:

    import psycopg2
    import psycopg2.extras
    import random

    db_params = {
        'database': 'jobs',
        'user': 'jobsuser',
        'password': 'superSecret',
        'host': '127.0.0.1',
        'port': '5432',
    }
    
    conn = psycopg2.connect(**db_params)
    cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    
    def do_some_work(job_data):
        if random.choice([True, False]):
            print('do_some_work FAILED')
            raise Exception
        else:
            print('do_some_work SUCCESS')
    
    def process_job():
    
        sql = """DELETE FROM message_queue 
    WHERE id = (
      SELECT id
      FROM message_queue
      WHERE status = 'new'
      ORDER BY created ASC 
      FOR UPDATE SKIP LOCKED
      LIMIT 1
    )
    RETURNING *;
    """
        cur.execute(sql)
        queue_item = cur.fetchone()
        print('message_queue says to process job id: ', queue_item['target_id'])
        sql = """SELECT * FROM jobs WHERE id =%s AND status='new_waiting' AND attempts <= 3 FOR UPDATE;"""
        cur.execute(sql, (queue_item['target_id'],))
        job_data = cur.fetchone()
        if job_data:
            try:
                do_some_work(job_data)
                sql = """UPDATE jobs SET status = 'complete' WHERE id =%s;"""
                cur.execute(sql, (queue_item['target_id'],))
            except Exception as e:
                sql = """UPDATE jobs SET status = 'failed', attempts = attempts + 1 WHERE id =%s;"""
                # if we want the job to run again, insert a new item to the message queue with this job id
                cur.execute(sql, (queue_item['target_id'],))
        else:
            print('no job found, did not get job id: ', queue_item['target_id'])
        conn.commit()
    
    process_job()
    cur.close()
    conn.close()

wojcikstefan · on Jan 3, 2020

Very relevant (and highly recommended) read: https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock...

swuecho · on Jan 3, 2020

Thanks. Could you also post the sql ddl?

pacala · on Jan 3, 2020

What happens if the process dies while processing a job? Is said job going to remain locked and unserviced until the end of times?

xyzzy_plugh · on Jan 3, 2020

Once the client session holding the lock disconnects, closing the session, the lock is released. You can also set client options like idle_in_transaction_session_timeout to force transactions to close after a certain amount of idle time, which also releases the lock.

nurettin · on Jan 3, 2020

so the process that fails releases the lock marking the job undone which causes another process to lock it?

xyzzy_plugh · on Jan 3, 2020

It means another process could grab it, yes. In the parent post's model, the "scheduler" operates in a "at least once" model", as unless the transaction is committed, a premature failure would result in the message will be picked up again by a scheduler, which could result in some duplicate work.

The parent post's model also tracks "attempts", but this only captures known or acknowledged failures -- unacknowledged failures (i.e. crashes) will not be recorded, so a task which explodes the process would be re-run ad infinitum.

An alternative method could be to record an attempt in a separate transaction, so that the next scheduler execution can detect that the job/message was serviced before, even if the message itself appears fresh.

andybak · on Jan 2, 2020

Side niggle - I used to notice a lot of Django projects would use complex job queues for absurdly low workloads. Beginners would get recommendations to use RabbitMQ and Redis for sites that probably were only going to see a few hundred concurrent users at most.

Seriously - don't add complex dependencies to your stack unless you need them. The database makes a great task queue and the filesystem makes a great cache. You really might not need anything more.

stavros · on Jan 3, 2020

The problem with that is that, even though PostgreSQL has some great functionality for pub/sub and queues, most Django queue systems don't use them by default. Most of the ones I've seen resort to polling (which might be fine, depending on the use case), which is suboptimal.

Otherwise, I agree, I would love to be able to use Postgres for queues, and I think there is a Dramatiq Postgres backend that will do queueing properly (EDIT: Yes, Dramatiq-PG).

sorentwo · on Jan 3, 2020

Postgres PubSub is great for triggering job dispatch, but it isn’t good enough on its own. There are a few caveats that make polling an additional requirement:

1. Message compaction — notifications are deduplicated, so if two jobs are inserted in the same queue the system may only get one notification. 2. Message saturation — with high levels of activity you’ll need to start discarding messages, essentially denouncing, otherwise the database gets throttled. 3. Dedicated connections — pubsub requires a dedicated connection for listening, which requires a single connection with custom dispatching or one connection per queue.

Relying on PG for everything is awesome regardless!

stavros · on Jan 3, 2020

Is this a problem for low-traffic scenarios? My thinking is basically that I should save the extra dependency until I have more than a few tasks per second, at which point I will switch to something like RabbitMQ.

sorentwo · on Jan 3, 2020

I don’t think it is much of a problem until you are pushing 2-3k jobs a second. Only the message saturation is a bottleneck anyhow, everything else I mentioned is handled by how a library is architected.

stavros · on Jan 3, 2020

That makes sense, thanks. I generally switch long before I get anywhere near to where the database is the bottleneck, I use Postgres as a broker only for low-traffic sideprojects, where it's much more worthwhile to not bother with an extra broker.

soperj · on Jan 3, 2020

Yeah, that was my issue. Seemed like a ridiculous amount of stuff just to process a few jobs once a day in the background. Eventually I found django-background-tasks and have bene using that ever since. It uses whatever database you're already using, so very quick and easy.

wojcikstefan · on Jan 3, 2020

At a glance, https://github.com/lilspikey/django-background-task indeed seems like a good and easy way to get started. You can later switch to something more specialized once you hit the right scale or when you need more sophisticated features.

slig · on Jan 3, 2020

Just FYI, this fork is updated https://github.com/arteria/django-background-tasks

andybak · on Jan 3, 2020

Yep. That's the solution I've been recommending over Celery for most people.

Half the time people seem to reach for Celery etc all they want to do is run a long task without blocking the response. That's a helleva lot of code to add to your project to just have that functionality.

roywiggins · on Jan 3, 2020

When I wanted to add processing for background jobs to a Django project, the most obvious solution was Celery, and the most obvious job queue for Celery was Redis.

I looked for a solution that would use Postgres+Celery but there wasn't anything obvious. There is a way to just use the filesystem for Celery, but that doesn't seem ideal.

So that's how I ended up with a Redis dependency.

andybak · on Jan 3, 2020

Exactly my complaint - it's become the "obvious" way to do it than involved several 10s of thousands of lines of code (Celery+RabbitMQ+Redis) plus binary dependencies and associated background daemons.

You can roll your own background task in a few dozen lines of pure Python.

blondin · on Jan 3, 2020

for a long time there were no easy way to do background processing in django. with the asyncio paradigm spreading throughout the community and django starting to add support for it, that will quickly change.

andybak · on Jan 6, 2020

> for a long time there were no easy way to do background processing in django.

There was no easy way to do complex background tasks - but Django is just Python. Spawning a new process is easy as is scheduling via cron or similar. I've used both to implement simple background tasks.

elamje · on Jan 3, 2020

I currently use RQ. Here is the logic that lead me to choosing it, then wishing I had just used Postgres.

I need a queue to handle long running jobs. I looked around the Python ecosystem (bc Flask app) and found RQ. So now I add code to run RQ. Then I add Redis to act as the Queue. Then I realized I needed to track jobs in the queue, so I put them into the DB to track their state. Now I effectively have a circular dependency between my app, Redis, and my Postgres DB. If Redis goes down, I’m not really sure what happens. If the DB goes down I’m not really sure what’s going on in Redis. This added undue complexity to my small app. Since I’m trying to keep it simple, I recently found that you can use Postgres as a pub/sub queue which would have completely solved my needs while making the app much easier to reason about. Using Postgres will have plenty of room to grow and buy you time to figure out a more durable solution.

elcomet · on Jan 3, 2020

Can't you track state in a "stateless" way ? For example having the job write it's progress in a file, so that if your app goes down, you can just re-read the file and know the state of your job?

rcfox · on Jan 3, 2020

Relying on the filesystem has its own drawbacks. If you scale horizontally, the local filesystem might not have the file you created because it's running on a different server. Mounting storage across multiple servers will introduce bottlenecks, and now you have to worry about the storage going down. If you're running within a Docker container, you need to make sure the file doesn't live inside the container or it won't be persisted.

samsbox · on Jan 3, 2020

May I ask, what (python) solution you ended up going with?

elamje · on Jan 3, 2020

I haven’t implemented the Postgres solution yet. It’s on my todo, but I will likely implement it myself, i.e. without a package.

kureikain · on Jan 3, 2020

I got burned by this with RQ.

Say you have an object, the object load some config from a settings module which in turn fetch from env. No matter how many times I restarted RQ, the config won't changed. Due to the code changes, the old config cause the job to crash and keep retrying.

Until I got frustrated and get into Redis, pop the job, and yet all the setting was in there. In other words, RQ serialize the whole object together with all properties.

RQ isn't that good IMHO. You will have to add monitoring, health check, node ping, scheduler, retrying, middleware like Celery eventually it grow into a home grown job queue system that make it harder to on board new devs.

Just use Celery. Celery isn't that bloated. It has many UI to support it and backend and very flexible. Celery beat schedule is great as well.

_verandaguy · on Jan 3, 2020

I used this in a project at a previous job -- I have to say, while the API is simple and useful enough for small projects, it raises some issues with how it's designed.

Instead of relying on trusted exposed endpoints and just invoking them by URL, it does a bytecode dump of task functions and stores those in Redis before restoring them from bytecode at execution time.

This has a few drawbacks:

- Payloads in the queue are potentially a fair bit larger for complex jobs - Serialization for stuff that has decorators (and especially stateful one, like `lru_cache`) is not really possible, even with `dill` instead of `pickle` - It's not trivial, but this exposes a different set of security risks compared to the alternative

I don't want to say this is a bad piece of software, it's super easy to set up and way more lightweight than Celery for example, but it's not my tool of choice having worked with the alternatives.

deckar01 · on Jan 3, 2020

I had to setup a task queue in an old, resource constrained environment once. I ended up forking the client process into a worker when the queue wasn't already running. Any time we need an expensive method call to be throttled asynchronously we just have to add a decorator to it.

https://github.com/deckar01/low_queue

IgorPartola · on Jan 3, 2020

Wait is that how RQ works?! How does it handle code updates? What happens if I update the task code while there are tasks still enqueued?

_verandaguy · on Jan 3, 2020

Can't speak to the code updates since I never went that in-depth into the code.

The pickle logic lives here: https://github.com/rq/rq/blob/80c82f731f57186f8c97239b38100f... so if you've got a minute you can skim that and hopefully get an answer to your question.

yamrzou · on Jan 3, 2020

Which alternatives worked for you?

wakatime · on Jan 3, 2020

Can't speak for op, but for us the ones based on RabbitMQ have performed well:

https://github.com/Bogdanp/dramatiq

Stay away from Celery if you can, or stick to v3.2 because 4.x has a ton of bugs.

scaryclam · on Jan 3, 2020

Celery has always been a massive resource drain, and for pretty much zero gain. I'd suggest just saying "stay away from Celery" as it's easy to get going with, but really hard to scale or work with when you need what it promises. There are better options.

_verandaguy · on Jan 3, 2020

Agree that you should stay away from Celery in most cases, but it does integrate well with Django, and pre-4.x is stable enough.

It's just a fairly higher amount of boilerplate and setup compared to most of the alternatives either way.

_verandaguy · on Jan 3, 2020

So far I've mostly used Celery in Python, but at jobs past there was also Sidekiq (granted, the use cases for that were a bit different, but it's kind of in the same bucket).

Haven't had the opportunity to work with them, but I've heard good things about SQS and other cloud queues, but I think those are fundamentally message queues, so you'd be responsible for writing a job queue layer over it, which can be a huge hassle.

harikb · on Jan 2, 2020

May I ask anyone posting stats on "millions per day" please indicate how many nodes/cpus the entire system uses. For example "8.6 million per day" is only "100 per second". If that takes 100 cpus.... folks are underestimating what a single node modern cpu/network card is capable.

wakatime · on Jan 4, 2020

Task throughput isn't about the queue library. Your queue library should be fast enough, after that the bottleneck is the workload your tasks are doing. That's why you need prioritized queues so your slow long-running tasks run low priority and don't block the high priority tasks that should be fast and usually affect UX. We have 9 worker servers, each server has 6 CPUs, we process over 1M tasks per day.

jholloway7 · on Jan 2, 2020

Not sure if lower-level API of RQ supports this, but I tend to prefer message-oriented jobs that don't couple the web app to the task handler like the example.

I don't want to import "count_words_at_url" just to make it a "job" because that couples my web app runtime to whatever the job module needs to import even though the web app runtime doesn't care how the job is handled.

I want to send a message "count-words" with the URL in the body of the message and let a worker pick that up off the queue and handle it however it decides without the web app needing any knowledge of the implementation. The web app and worker app can have completely different runtime environments that evolve/scale independently.

mattbillenstein · on Jan 2, 2020

Agreed - which is why I don't put my business logic in the webapp, or use models coupled to the web framework.

The web framework is a way to handle http, rest, or graphql - deserializing and serializing those protocols, not a way to handle my business logic.

Decoupling these things lets you write one-off scripts or have task queues that don't need to load the context of a large web framework - they can just be simple python.

mixmastamyk · on Jan 2, 2020

Docs say you can pass a string instead of a function.

nerdbaggy · on Jan 2, 2020

I am a big fan of RQ along with Huey (https://github.com/coleifer/huey)

rsrx · on Jan 3, 2020

I also liked Huey because of it's simplicity, and tried using it in a commercial project, but it couldn't execute periodic tasks in less than a minute intervals which puzzled me a bit. I had need to execute task every 10 seconds to reindex database changes to ElasticSearch.

After opening GitHub issue about it and asking if there were plans to implement sub-minute intervals, library's author (coleifer on github) had dismissive/arrogant attitude, and his reply was something along the lines of "too bad you cannot fork it and do it by yourself" and deleted my thread.

This threw me off from using this library, and I went back to Celery.

_pgmf · on Jan 3, 2020

Couple points:

* how did you settle on 10 seconds? Keeping two databases in sync is a complex process. I'd suggest that the difference between running 1x/min or 6x/min is negligible -- and if it's not then probably you need something more sophisticated than a simple cronjob.

* I am providing free software. You are not contracting with me to provide developer support, so in my book I'm under no obligation to be courteous and polite all the time. I try to be most of the time, but nobody is perfect. Luckily the source is available if you don't want to talk to me. That was my point and I'm surprised that is so triggering to some people.

* Closing an issue is not deleting your thread.

rsrx · on Jan 3, 2020

Thanks for giving me advice on database syncing, but I think that you kinda missed the point.

It doesn't matter how I chose 10 seconds as a periodic task interval, problem was that to a simple question about feature support I got dismissive/borderline rude answer which made me lose confidence in Huey as a project and it's long-term maintainability. And it's not like I asked for something exotic, but a support for subminute periodic tasks which almost every other task queue has.

That issue doesn't seem to be anywhere in Github, so it's deleted: https://github.com/coleifer/huey/issues?utf8=%E2%9C%93&q=is%...

As you said, it's a free software and you can do whatever you want with it, but to me this is a huge red flag and I'm happy to not be part of it. ;)

_pgmf · on Jan 4, 2020

Its right here bro, in all it's undeleted glory:

https://github.com/coleifer/huey/issues/118

markandrewj · on Jan 3, 2020

Do you have a preference? I was asked recently about setting up Redis for use as a job processing queue, I thought RQ might be a good fit. Huey looks good too, but I haven't used it yet.

reassembled · on Jan 3, 2020

We are using Huey as part of our distributed automated test runner, as well as Peewee for SQL persistence of test run data. Coleifer is da man!

zaphirplane · on Jan 3, 2020

And has zero issues in github. Can the project be that responsive to bugs ?!

warvariuc · on Jan 4, 2020

It's simple. Most of the times Charles closes the issues after commenting on them w/o waiting for responses.

infecto · on Jan 2, 2020

Was about to say the same. I love Huey.

srj55 · on Jan 2, 2020

huey is solid. I use it in prod.

_pgmf · on Jan 3, 2020

As always I'm happy to see one of my projects getting repped. Huey was something I started out of frustration with celery. It's more of a celery replacement than an rq replacement, though it covers both. If you haven't heard of it -- it does tasks, cron (periodic tasks), delayed tasks, result-storage, etc., and supports multiple worker models (process/thread/greenlet).

Docs: https://huey.readthedocs.io/en/latest/

wakatime · on Jan 3, 2020

We tried this at WakaTime and it didn't scale well. We also tried https://github.com/closeio/tasktiger but the only ones that work for us are based on RabbitMQ:

Celery<4.0

https://github.com/Bogdanp/dramatiq

sdan · on Jan 3, 2020

Can you please elaborate? I just adopted RQ a few weeks ago and is working fine for me (a few days before launch) with a couple thousand queries (1-2 jobs per 5 seconds) but a ton of connections (both making and taking jobs). Any advice?

I'm thinking that after we go to prod, I'll move to celery and MQ overtime.

wakatime · on Jan 4, 2020

This was multiple years ago so I don't remember details. I think something around task reliability, performance, and I think architecture flaws that meant we couldn't prioritize tasks or some other missing features we needed. We process around 50-1000 jobs per second.

ssl232 · on Jan 2, 2020

I've not much to say other than that this is a great little library! I used RQ for a small project recently and found it to be pretty easy to use. As my project grew bigger I also found it contained extra features I didn't realise I'd need when I started.

mjhea0 · on Jan 2, 2020

Big fan of this library. I think Celery is overkill for most projects. RQ is nice for the majority projects and it can scale right along with your project.

Quick tutorial of Flask + RQ: https://testdriven.io/blog/asynchronous-tasks-with-flask-and...

remlov · on Jan 2, 2020

RQ scales surprisingly well and for certain types of projects is a nice lightweight alternative compared to more complicated job queues such as Celery.

ssl232 · on Jan 2, 2020

Yeah when I was looking at queues I was put off by Celery since somewhere in its docs or a tutorial high up in the Google results it was suggested to use both redis and RabbitMQ as some sort of brokers/results stores rather than just one of those to handle both. I'm sure there were good reasons for that, but I was looking for something simple and not to learn multiple new technologies so in terms of simplicity RQ (which uses redis for everything) is pretty hard to beat.

disiplus · on Jan 2, 2020

for a simple queue the broker is the only important part. i like celery and have projects in multiple languages using rabbitmq.

mperham · on Jan 3, 2020

Most job systems like Celery or Sidekiq are language-specific. If you are looking for background jobs for any language, check out Faktory.

More advanced features like queue throttling and complex job workflows are available.

https://github.com/contribsys/faktory/wiki

wryun · on Jan 2, 2020

https://dramatiq.io/motivation.html

wojcikstefan · on Jan 2, 2020

Shameless plug: Our team at Close has been inspired by RQ when we created TaskTiger – https://github.com/closeio/tasktiger. We’ve been running it in production for a few years now, processing ~4M tasks a day and it’s been a wonderful tool to work with. Curious to hear what y’all think!

hu3 · on Jan 3, 2020

Curious. What's the ROI on developing a task queue system for 50 req/sec?

Please don't take my question negatively. I acknowledge that not everything is about ROI.

For example I often let my team scratch their own itches by developing tools just for the sake of having fun and learning new toys. Perhaps this is the case with TaskTiger?

wojcikstefan · on Jan 3, 2020

It was definitely fun to work on it :) That said, I think we've also seen quite a big positive ROI as well.

It is true that we've spent quite some time working on TaskTiger where we could've used an off-the-shelf task queue system. However, since we've created it, it allowed us to develop new features a lot faster and spend a lot less time debugging issues.

We were using Celery before and it was a big pain with memory leaks, periodic tasks not working as expected, some tasks being lost for no apparent reason, and poor visibility into both aggregate statistics and individual tasks. We spent too much time debugging Celery's issues instead of working on our core business. Here's some more context: https://github.com/closeio/tasktiger/issues/141#issuecomment...

Advanced features like scheduled tasks, unique queues, and task locks have helped us move faster, too.

wakatime · on Jan 3, 2020

We tried Tasktiger but ran into zombie tasks and some small but annoying bugs while testing. Around that same time, we found Dramatiq and currently have Dramatiq + Celery<4.0 in use.

wojcikstefan · on Jan 3, 2020

Interesting, we haven't had issues with zombie tasks at all (and we DID have issues with them when using Celery). Did you manage to find out what was causing them?

> and some small but annoying bugs while testing

Do you recall any details?

Appreciate you trying TaskTiger, even if you've moved on since!

wakatime · on Jan 3, 2020

Here's the list of my issues: https://github.com/closeio/tasktiger/issues?utf8=%E2%9C%93&q...

It came down to I like the features of RabbitMQ:

* RabbitMQ scales messages without needing tons of RAM * I don't have to decide between not persisting messages to disk with Redis vs only using half the machine's RAM [1] * Queue and task visibility is better in RabbitMQ * Support for purging all tasks in a queue * Tasktiger had lower throughput than Celery and Dramatiq, maybe needs lazy-forking?

Things I did like about Tasktiger: * no feature bloat and it's possible to actually read it's source code[2].

[1]: To enable persisting to disk (Redis fork + save snapshotting) you must limit Redis to only using half the available RAM on a machine.

[2]: Celery is split up into multiple convoluted, bloated, difficult to read repos.

sdan · on Jan 3, 2020

Do you use Dramatiq or Celery? Do either fix zombie tasks/zombie workers?

wakatime · on Jan 3, 2020

Both, but unfortunately still mostly using Celery 3.2. Never had any problems with zombie tasks, since that's related to how you're storing tasks and queues and RabbitMQ is excellent at that. Zombie tasks are never a problem when using RabbitMQ.

Especially never had problems with zombie workers. That wouldn't be a problem with your choice of queue library, more an infra problem.

sdan · on Jan 3, 2020

Interesting. At the moment I'm noticing some zombie workers popping up here and there with RQ and was thinking of migrating soon to Celery or Dramatiq. I think I'll experiment a bit and decide between both (but Celery seems like the more popular option, so I'm a bit biased for going for that).

mixmastamyk · on Jan 2, 2020

What’s the difference?

iddan · on Jan 3, 2020

RQ is horrible. We’ve used in K Health and migrated to Celery as configuration, optimisation and monitoring where all really hard with RQ. It takes ten seconds to get started but days to get to production. Not a good trade off!

jeffdico · on Jan 3, 2020

Not sure RQ was the problem here. I have used on dev and productions and I didn't have any issues with it at all.

mattbillenstein · on Jan 2, 2020

I've used rq in prod at a couple places - nice little library.

I really like the design of beanstalkd and I used that at one place, but using rq + redis was one less thing to deploy and/or fail.

nijave · on Jan 3, 2020

Having used resque how production workloads I wouldn't want to use Redis as a job queue. It works fine for small workloads but doesn't have many resiliency capabilities and doesn't scale well (cluster mode drops support for some set operations so probably can't use that). Replication is async and it's largely single threaded so then you get this SPOF bottleneck in the middle of a distributed work system

antirez · on Jan 3, 2020

You may want to check the Disque module I just (a few weeks ago) published. Synchronous replication and strong guarantees of delivery under failures. Also best effort algorithms to minimize re-delivery of at-least-once messages.

adamcharnock · on Jan 3, 2020

Different to RQ – but related - I recently released Lightbus [1] which is aimed at providing simple inter-process messaging for Python. Like RQ it is also Redis backed (streams), but differs in that is a communication bus rather than a queue for background jobs.

[1]: https://lightbus.org

akx · on Jan 3, 2020

We were unsatisfied with Rq (I forget which parts, but I seem to recall there was lots of code that strictly wasn't needed) and wrote http://github.com/valohai/minique instead.

For less simple use cases, just use Celery.

sdan · on Jan 3, 2020

I’ve used RQ in production applications just last week. It’s pretty basic so there’s upsides and downsides, but so far it simply works! I May opt in for celery down the line but for the state of my project, rq helped me ship quick and iterate.

alexnewman · on Jan 3, 2020

We use https://github.com/josiahcarlson/rpqueue and I absolutely love it

orf · on Jan 3, 2020

Just use Celery. I know several teams who made the, in hindsight, very poor choice of using RQ. By the time you realize it’s a bad decision it’s very hard to get out of.

Unless your use case is absurdly simple, and will always + forever be absurdly simple, then celery will fit better. Else you find yourself adding more and more things on top of RQ until you have a shoddy version of Celery. You can also pick and choose any broker with Celery, which is fantastic for when you realize you need RabbitMQ,

nikisweeting · on Jan 3, 2020

Our experience has been different, moving from Celery to Dramatiq saved us a bunch of headaches and stability has been markedly improved.

tschellenbach · on Jan 3, 2020

Celery + RabbitMQ is a great option for Python projects

matisoffn · on Jan 2, 2020

Like Sidekiq but for Python. Neat.

ankut04 · on Jan 3, 2020

For python, celery seems good.