Good, basic practices to follow. Here's a few more:
- If you're using AMQP/RabbitMQ as your result back end it will create a lot of dead queues to store results in. This can easily overwhelm your RabbitMQ server if you don't clear these out frequently. Newer releases of Celery will do this daily I think - but it's worth keeping in mind if your RMQ instance falls over in prod.
- Use chaining to build up "sequential" tasks that need doing instead of calling one after another in the same task (or worse, doing a big mouthful of work) in one task as Celery can prioritise many tasks better than synchronously calling several tasks in a row from one "master" task.
- Try to keep a consistent module import pattern for celery tasks, or explicitly name them, as Celery does a lot of magic in the background so task spawning is seamless to the developer. This is very important as you should never mix relative and absolute importing when you are dealing with tasks. from foo import mytask may be picked up differently than "import foo" followed by "foo.mytask" would resulting in some tasks not being picked up by Celery(!)
- Never pass database objects, as OP says, is true; but go one step further and don't pass complex objects at all if you can avoid it. I vaguely remember some of the urllib/httplib exceptions in Python not being serializable and causing very cryptic errors if you didn't capture the exception and sanitise it or re-raise your own.
- Use proper configuration management to set up and configure Celery plus what ever messaging broker/backend. There's nothing more frustrating than spending your time trying to replicate somebody's half-assed Celery/Rabbit configuration that they didn't nail down and test properly in a clean-room environment.
With regards to #1: What happens is that if task_B depends on a value that task_A returns, task_A will insert its value into the queue and task_B will consume it.
If task_C returns a value which no other task cares about, it will insert the value into the queue, and never gets consumed. This is why dead queues (also known as "tombstones") happen.
Always remember to set ignore_result=True for tasks which don't return any consumed value.
In general using an AMQP for the result storage is somewhat of a bad idea i think. But yes I agree about the ignoring results part seeing as most tasks I've seen in the wild don't return anything at all. Hence #6 in the post.
- If you're using AMQP/RabbitMQ as your result back end it will create a lot of dead queues to store results in. This can easily overwhelm your RabbitMQ server if you don't clear these out frequently. Newer releases of Celery will do this daily I think - but it's worth keeping in mind if your RMQ instance falls over in prod.
- Use chaining to build up "sequential" tasks that need doing instead of calling one after another in the same task (or worse, doing a big mouthful of work) in one task as Celery can prioritise many tasks better than synchronously calling several tasks in a row from one "master" task.
- Try to keep a consistent module import pattern for celery tasks, or explicitly name them, as Celery does a lot of magic in the background so task spawning is seamless to the developer. This is very important as you should never mix relative and absolute importing when you are dealing with tasks. from foo import mytask may be picked up differently than "import foo" followed by "foo.mytask" would resulting in some tasks not being picked up by Celery(!)
- Never pass database objects, as OP says, is true; but go one step further and don't pass complex objects at all if you can avoid it. I vaguely remember some of the urllib/httplib exceptions in Python not being serializable and causing very cryptic errors if you didn't capture the exception and sanitise it or re-raise your own.
- Use proper configuration management to set up and configure Celery plus what ever messaging broker/backend. There's nothing more frustrating than spending your time trying to replicate somebody's half-assed Celery/Rabbit configuration that they didn't nail down and test properly in a clean-room environment.