Celery 2.3 released (celeryproject.org)
72 points by timf on Aug 5, 2011 | 33 comments



What is Celery?

Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.

The execution units, called tasks, are executed concurrently on one or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).


Ask described it to me as "a library for discovering bugs in the multiprocessing library".


Yes, but note that multiprocessing rocks. It's just that celery uses it so extensively :) (and now also eventlet/gevent if that suits better)


And someone here has commit privs to fix those bugs too :) (ask does)


I agree with you on both counts. I've used multiprocessing to great effect, and, although it was fantastic for what it did, it was very buggy...

Nowadays, I'd just use Celery instead!


Does Celery support task dependencies? Would be great to specify a DAG of tasks and have them run in parallel. Basically something like Oozie but not restricted to Hadoop.


If I understand your question correctly, you should be able to do this using callbacks.
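
Roughly like this (a sketch following the subtask/callback pattern from the Celery 2.x docs; for a real DAG you'd have to wire up the dependencies yourself):

    from celery.task import task
    from celery.task.sets import subtask

    @task
    def add(x, y, callback=None):
        result = x + y
        if callback is not None:
            # Hand the result off to the next task in the graph.
            subtask(callback).delay(result)
        return result

    # Run add(2, 2), then feed the result into add(?, 8):
    add.delay(2, 2, callback=add.subtask((8,)))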


Maybe it is just me, but I can never bring myself to set up Celery and RabbitMQ and all that for some Django apps that could sorely use it. There seem to be so many moving parts; last I checked there were like 3-4 different services to tie everything together...

Does anyone know of a simpler system or technique for achieving the same thing? Or perhaps a better way to go about using Celery?


IMHO, setting up RabbitMQ and Celery, especially with django-celery, ends up being much simpler and less painful than rolling any kind of custom queue/async processing. In terms of moving parts, for most setups, you can put Celery and RabbitMQ under supervisord and almost never have to worry about them.

If you don't need all the features of RabbitMQ, or just don't want to deal with it, you can use any of the other broker backends: Redis (if you don't need persistence), or even your default database through the Django ORM.
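
For reference, switching brokers is just a settings change (a sketch using the Celery 2.x-era setting names from the django-celery docs; adjust for your setup):

    # settings.py

    # RabbitMQ (the default broker):
    BROKER_HOST = "localhost"
    BROKER_PORT = 5672
    BROKER_USER = "guest"
    BROKER_PASSWORD = "guest"
    BROKER_VHOST = "/"

    # ...or Redis instead (no setup beyond a running redis-server):
    # BROKER_BACKEND = "redis"
    # BROKER_HOST = "localhost"
    # BROKER_PORT = 6379
    # BROKER_VHOST = "0"  # the Redis database number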


FYI, Redis has an append-only file (AOF) option to provide persistence:

http://redis.io/topics/persistence


Yes, but I think what he was referring to was message acknowledgements, which Redis doesn't have. If a worker reserves a message and is abruptly killed, the message is lost.


A "quick and dirty" solution is to use django-celery with the django-kombu (http://pypi.python.org/pypi/django-kombu/0.9.3) backend. This stores the tasks in your database, rather than having to run a separate broker like RabbitMQ. I've been using this for a few months and it's worked pretty well.


The only moving parts are the broker (RabbitMQ, Redis, etc.), your existing app (already moving), and some workers, which for a simple case like yours means running "python manage.py celeryd".

It's worth it, in my opinion. Celery is a best-of-breed tool (even outside the Python world), much like SQLAlchemy.


If you're already using Redis, use that for your broker. I agree that the configuration can appear overwhelming, but it's pretty straightforward. To say that we've had success with Celery would be a big understatement; it's fantastic.


I felt absolutely the same way, but once we started using redis (for other reasons) I finally took the plunge.

We've been quite happy with it so far. It has a bit of a learning curve, but once you buy into the way it does things suddenly a lot of code you thought you had to write disappears. I was just looking at our async code and really was surprised at how short it is.

So far I recommend it heartily.


bryanh, it may seem daunting at first, but everything fits together quite nicely. You don't even have to know anything about Erlang in order to set up Celery.

Nowadays you can install RabbitMQ + Erlang through a package manager like apt-get/homebrew.

After making sure that RabbitMQ is running, add "djcelery" to your project's settings.py + syncdb, add a task (the API is very straightforward and intuitive), run celeryd and voila =) [1].
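
A task really is just a decorated function (a minimal sketch in the Celery 2.x style):

    # tasks.py
    from celery.task import task

    @task
    def add(x, y):
        return x + y

    # Somewhere in a view:
    # result = add.delay(2, 2)   # queued, returns immediately
    # result.get()               # blocks until the worker is done -> 4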

The other services (like celerybeat/camqadm) are for monitoring/administration purposes.

Besides, RabbitMQ is not even required since there are the so-called ghetto queues (db, redis).

But if you're looking for something even simpler, I suggest you take a look at hotqueue [2].

[1] http://ask.github.com/django-celery/getting-started/first-st...

[2] http://richardhenry.github.com/hotqueue/tutorial.html


Same here, it seemed like overkill for my use cases so far. Check out Gearman or Beanstalkd; they're simpler to set up.


You can't really compare Celery with beanstalkd or Gearman. In fact, Celery supports beanstalk as a transport, and adding Gearman support would be simple. A fairer comparison would be between Celery and the Python clients for those services.

Also, the default configuration is good enough in most cases.


One of the best releases so far!


Great work as always asksol!


I have no idea when I would need to use it. Can someone give examples of when I might want to think of using it?


You can use it for any task that takes too long to fit into a web request.

A simplistic example: You have a web form where a new user can choose a bunch of his/her interests or hobbies from a big list. The user clicks submit and then sees a list of other users on your site with similar interests.

For small numbers of users and a simplistic algorithm, this could probably be done right in the web request (i.e. in your Django view), but with millions of users and a complex recommendation algorithm, you could have Celery do the work in a separate process, outside the web request.

You could then immediately forward the user to a page that says "Your recommendations are being calculated." The user can come back later and see if his/her recommendations are done yet. I assume you could even do some AJAX checks that load in the recommendations once the Celery task is done, or display a progress bar.
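
Something like this, hypothetically (the task and view names here are made up for illustration):

    # views.py
    from django.http import HttpResponse
    from celery.result import AsyncResult
    from myapp.tasks import compute_recommendations  # hypothetical task

    def submit_interests(request):
        # Kick off the slow work out-of-band and return immediately.
        result = compute_recommendations.delay(request.user.id)
        request.session["rec_task_id"] = result.task_id
        return HttpResponse("Your recommendations are being calculated.")

    def recommendations_ready(request):
        # Poll this from AJAX until the task has finished.
        result = AsyncResult(request.session["rec_task_id"])
        return HttpResponse("done" if result.ready() else "pending")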


Django-celery even comes with a built-in view to do that polling ;)


We use it extensively, essentially for anything that doesn't need to happen "just this second". It's also replaced everything we used to do in cron, as it's easy to carry around in source control.

Most of our emails, expensive calculations (we have some match-making algorithms that run as tasks), notifications and all sorts of fun things are done in tasks. By moving as much of our code into Celery as possible, it's been much easier to scale (adding a new node is trivial) and our pages are much more responsive.

That Celery makes it all so very easy to do is icing on the cake.


I've set it up for a project where incoming HTTP requests can trigger outgoing HTTP requests. Something I found useful was a wrapper function that attempts to execute the task asynchronously (the_task.apply_async) but falls back on running it in-process (the_task.apply). This way, if RabbitMQ isn't running, everything still basically works. The downside is that if RabbitMQ is running but celeryd isn't, all the tasks just pile up in the queue.
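
The wrapper is just a try/except around the dispatch (a sketch; the exact exception types to catch depend on your broker client):

    def run_task(the_task, *args, **kwargs):
        try:
            # Normal path: hand the task to the broker.
            return the_task.apply_async(args=args, kwargs=kwargs)
        except Exception:
            # Broker unreachable (e.g. RabbitMQ down): run in-process.
            return the_task.apply(args=args, kwargs=kwargs)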



Anybody here use Celery on their production system? Is it reliable/stable? Any weird problems?


We run it for mixcloud.com; it's awesome. It has easily saved us hundreds of hours of dev time.


We use it/love it at Uber. One problem we had was results building up in RabbitMQ and causing the queue to exceed its memory allocation. We disabled results, and fortunately that's now the default with this release.
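
For anyone on an older version, disabling result storage is one setting (per the Celery docs; it can also be set per-task via @task(ignore_result=True), if I remember right):

    # settings.py
    CELERY_IGNORE_RESULT = True   # don't store task return values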


Awesome tool!

We are running some massively distributed tasks with it.


We use it for historious; it has never had any problems. I would definitely, definitely recommend it.


We use it in production and have for a couple of years. It's never once caused an issue.


Great news! Awesome software!



