Celery 2.3 released (celeryproject.org)
72 points by timf on Aug 5, 2011 | 33 comments



What is Celery?

Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.

The execution units, called tasks, are executed concurrently on one or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).


Ask described it to me as "a library for discovering bugs in the multiprocessing library".


Yes, but note that multiprocessing rocks. It's just that celery uses it so extensively :) (and now also eventlet/gevent if that suits better)


And someone here has commit privs to fix those bugs too :) (ask does)


I agree with you on both counts. I've used multiprocessing to great effect, and, although it was fantastic for what it did, it was very buggy...

Nowadays, I'd just use Celery instead!


Does Celery support task dependencies? Would be great to specify a DAG of tasks and have them run in parallel. Basically something like Oozie but not restricted to Hadoop.


If I understand your question correctly, you should be able to do this using callbacks.
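
Roughly like this (a sketch following the subtask/callback pattern from the Celery 2.x docs; for a real DAG you'd have to wire up the dependencies yourself):

    from celery.task import task
    from celery.task.sets import subtask

    @task
    def add(x, y, callback=None):
        result = x + y
        if callback is not None:
            # Hand the result off to the next task in the graph.
            subtask(callback).delay(result)
        return result

    # Run add(2, 2), then feed the result into add(?, 8):
    add.delay(2, 2, callback=add.subtask((8,)))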


Maybe it is just me, but I can never bring myself to set up Celery and RabbitMQ and all that for some Django apps that could sorely use it. There seem to be so many moving parts; last I checked there were like 3-4 different services to tie everything together...

Does anyone know of a simpler system or technique for achieving the same thing? Or perhaps a better way to go about using Celery?


IMHO, setting up RabbitMQ and Celery, especially with django-celery, ends up being much simpler and less painful than rolling any kind of custom queue/async processing. In terms of moving parts, for most setups, you can put Celery and RabbitMQ under supervisord and almost never have to worry about them.

If you don't need all the features of RabbitMQ, or just don't want to deal with it, you can use any of the other broker backends: Redis (if you don't need persistence), or even your default database through the Django ORM.
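
For reference, switching brokers is just a settings change (a sketch using the Celery 2.x-era setting names from the django-celery docs; adjust for your setup):

    # settings.py

    # RabbitMQ (the default broker):
    BROKER_HOST = "localhost"
    BROKER_PORT = 5672
    BROKER_USER = "guest"
    BROKER_PASSWORD = "guest"
    BROKER_VHOST = "/"

    # ...or Redis instead (no setup beyond a running redis-server):
    # BROKER_BACKEND = "redis"
    # BROKER_HOST = "localhost"
    # BROKER_PORT = 6379
    # BROKER_VHOST = "0"  # the Redis database number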


FYI, Redis has an append-only file (AOF) option to provide persistence:

http://redis.io/topics/persistence


Yes, but I think what he was referring to was message acknowledgements, which Redis doesn't have. If a worker reserves a message and is abruptly killed, the message is lost.


A "quick and dirty" solution is to use django-celery with the django-kombu (http://pypi.python.org/pypi/django-kombu/0.9.3) backend. This stores the tasks in your database, rather than having to run a separate broker like RabbitMQ. I've been using this for a few months and it's worked pretty well.


The only moving parts are the broker (RabbitMQ, Redis, etc.), your existing app (already moving), and some workers, which for a simple case like yours means running "python manage.py celeryd".

It's worth it, in my opinion. Celery is a best-of-breed tool (even outside the Python world), much like SQLAlchemy.


If you're already using Redis, use that for your broker. I agree that the configuration can appear overwhelming, but it's pretty straightforward. To say that we've had success with Celery would be a big understatement; it's fantastic.


I felt absolutely the same way, but once we started using redis (for other reasons) I finally took the plunge.

We've been quite happy with it so far. It has a bit of a learning curve, but once you buy into the way it does things suddenly a lot of code you thought you had to write disappears. I was just looking at our async code and really was surprised at how short it is.

So far I recommend it heartily.


bryanh, it may seem daunting at first, but everything fits together quite nicely. You don't even have to know anything about Erlang in order to set up Celery.

Nowadays you can install RabbitMQ + Erlang through a package manager like apt-get/homebrew.

After making sure that RabbitMQ is running, add "djcelery" to your project's settings.py + syncdb, add a task (the API is very straightforward and intuitive), run celeryd and voila =) [1].
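
A task really is just a decorated function (a minimal sketch in the Celery 2.x style):

    # tasks.py
    from celery.task import task

    @task
    def add(x, y):
        return x + y

    # Somewhere in a view:
    # result = add.delay(2, 2)   # queued, returns immediately
    # result.get()               # blocks until the worker is done -> 4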

The other services (like celerybeat/camqadm) are for monitoring/administration purposes.

Besides, RabbitMQ is not even required since there are the so-called ghetto queues (db, redis).

But if you're looking for something even simpler, I suggest you take a look at hotqueue [2].

[1] http://ask.github.com/django-celery/getting-started/first-st...

[2] http://richardhenry.github.com/hotqueue/tutorial.html


Same here, it seemed like overkill for my use cases so far. Check out Gearman or Beanstalkd; they're simpler to set up.


You can't really compare Celery with beanstalkd or Gearman. In fact, Celery supports beanstalk as a transport, and adding Gearman support would be simple. A fairer comparison would be between Celery and the Python clients for those services.

Also, the default configuration is good enough in most cases.


One of the best releases so far!


Great work as always asksol!


I have no idea when I would need to use it. Can someone give examples of when I might want to think of using it?


You can use it for any task that takes too long to fit into a web request.

A simplistic example: You have a web form where a new user can choose a bunch of his/her interests or hobbies from a big list. The user clicks submit and then sees a list of other users on your site with similar interests.

For small numbers of users and a simplistic algorithm, this could probably be done right in the web request (i.e. in your Django view), but with millions of users and a complex recommendation algorithm, you could have Celery do the work in a separate process, outside the web request.

You could then immediately forward the user to a page that says "Your recommendations are being calculated." The user can come back later and see if his/her recommendations are done yet. I assume you could even do some AJAX checks that load in the recommendations once the Celery task is done, or display a progress bar.
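
Something like this, hypothetically (the task and view names here are made up for illustration):

    # views.py
    from django.http import HttpResponse
    from celery.result import AsyncResult
    from myapp.tasks import compute_recommendations  # hypothetical task

    def submit_interests(request):
        # Kick off the slow work out-of-band and return immediately.
        result = compute_recommendations.delay(request.user.id)
        request.session["rec_task_id"] = result.task_id
        return HttpResponse("Your recommendations are being calculated.")

    def recommendations_ready(request):
        # Poll this from AJAX until the task has finished.
        result = AsyncResult(request.session["rec_task_id"])
        return HttpResponse("done" if result.ready() else "pending")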


Django-celery even comes with a built-in view to do that polling ;)


We use it extensively, essentially for anything that doesn't need to happen "just this second". It's also replaced everything we used to do in cron, as it's easy to carry around in source control.

Most of our emails, expensive calculations (we have some match-making algorithms that run as tasks), notifications and all sorts of fun things are done in tasks. By moving as much of our code into Celery as possible, it's been much easier to scale (adding a new node is trivial) and our pages are much more responsive.

That Celery makes it all so very easy to do is icing on the cake.


I've set it up for a project where incoming HTTP requests can trigger outgoing HTTP requests. Something I found useful was a wrapper function that attempts to execute the task asynchronously (the_task.apply_async) but falls back on running it in-process (the_task.apply). This way, if RabbitMQ isn't running, everything still basically works. The downside is that if RabbitMQ is running but celeryd isn't, all the tasks just pile up in the queue.
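
The wrapper is just a try/except around the dispatch (a sketch; the exact exception types to catch depend on your broker client):

    def run_task(the_task, *args, **kwargs):
        try:
            # Normal path: hand the task to the broker.
            return the_task.apply_async(args=args, kwargs=kwargs)
        except Exception:
            # Broker unreachable (e.g. RabbitMQ down): run in-process.
            return the_task.apply(args=args, kwargs=kwargs)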



Anybody here use Celery on their production system? Is it reliable/stable? Any weird problems?


We run it for mixcloud.com; it's awesome. It has easily saved us hundreds of hours of dev time.


We use it/love it at Uber. One problem we had was results building up in RabbitMQ and causing the queue to exceed its memory allocation. We disabled results, and fortunately that's now the default with this release.
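
For anyone on an older version, disabling result storage is one setting (per the Celery docs; it can also be set per-task via @task(ignore_result=True), if I remember right):

    # settings.py
    CELERY_IGNORE_RESULT = True   # don't store task return values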


Awesome tool!

We are running some massively distributed tasks with it.


We use it for historious; it has never had any problems. I would definitely, definitely recommend it.


We use it in production and have for a couple of years. It's never once caused an issue.


Great news! Awesome software!



