Hacker News

PGQueuer is a lightweight job queue for Python, built entirely on PostgreSQL. It uses SKIP LOCKED for efficient and safe job processing, with a minimalist design that keeps things simple and performant.

If you’re already using Postgres and want a Python-native way to manage background jobs without adding extra infrastructure, PGQueuer might be worth a look: GitHub - https://github.com/janbjorge/pgqueuer




Also https://github.com/TkTech/chancy for another (early) Python option that goes the other way and aims to have bells and whistles included like a dashboard, workflows, mixed-mode workers, etc...

Check out the Similar Projects section in the docs for a whole bunch of Postgres-backed task queues. Haven't heard of pgqueuer before, another one to add!


I always wondered about the claim that SKIP LOCKED is all that efficient. Surely there are lots of cases where this is a really suboptimal pattern.

Simple example: if you have a mixture of very short jobs and longer duration jobs, then there might be hundreds or thousands of short jobs executed for each longer job. In such a case the rows in the jobs table for the longer jobs will be skipped over hundreds of times. The more long-running jobs running concurrently, the more wasted work as locked rows get skipped again and again. It wouldn't be a huge issue if load is low, but surely a case where rows get moved to a separate "running" table would be more efficient. I can think of several other scenarios where SKIP LOCKED would lead to lots of wasted work.
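For context, the pattern under discussion is roughly the canonical Postgres dequeue query (a sketch with illustrative table and column names, not PGQueuer's actual schema):

```sql
-- Each worker claims one unprocessed job. Rows currently locked by
-- other workers (e.g. long-running jobs) are still scanned before
-- being skipped, which is the wasted work described above.
SELECT id, payload
  FROM jobs
 WHERE processed = false
 ORDER BY id
 LIMIT 1
 FOR UPDATE SKIP LOCKED;
```

The more rows that sit locked near the front of the scan order, the more rows every dequeue has to step over before finding a claimable one.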


Good point about SKIP LOCKED inefficiencies with mixed-duration jobs. In PGQueuer's benchmarks, throughput reached up to 18k jobs/sec, showing it handles high concurrency well. For mixed workloads, strategies like batching or partitioning by job type can help.

While a separate "running" table reduces skips, it adds complexity. SKIP LOCKED strikes a good balance for simplicity and performance in many use cases.

One known issue is vacuum: if the load stays high for long periods, dead tuples accumulate faster than autovacuum can reclaim them, leading to table bloat.


>One known issue is vacuum: if the load stays high for long periods, dead tuples accumulate faster than autovacuum can reclaim them, leading to table bloat.

Generally what you need to do there is have some column that can be sorted on that you can use as a high watermark. This is often an id (PK) that you either track in a central service or periodically recalculate. I've worked at places where this was a timestamp as well. Perhaps not as clean as an id but it allowed us to schedule when the item was executed. As a queue feature this is somewhat of an antipattern but did make it clean to implement exponential backoff within the framework itself.
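A sketch of the timestamp-as-watermark approach with exponential backoff built in (column names here are illustrative assumptions, not any particular framework's schema):

```sql
-- On failure, push the job's visibility time into the future;
-- the delay doubles with each attempt, capped at one hour.
UPDATE jobs
   SET attempts  = attempts + 1,
       run_after = now() + make_interval(secs => least(2 ^ attempts, 3600))
 WHERE id = $1;

-- The dequeue only considers jobs whose time has come, so
-- run_after serves as both the schedule and the high watermark.
SELECT id
  FROM jobs
 WHERE run_after <= now()
 ORDER BY run_after
 LIMIT 1
 FOR UPDATE SKIP LOCKED;
```

Scanning from the watermark forward means old, already-processed rows never enter the scan, regardless of how bloated that part of the table is.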


Job rows could have an indexed state column, so you just query for the rows with state "not-started".

This way you won't need to skip over the long jobs that are in state "processing".
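That could be done with a partial index (a sketch; the schema is assumed for illustration, not taken from PGQueuer):

```sql
-- The index only covers rows still waiting to be picked up, so jobs
-- in state 'processing' never appear in the scan at all.
CREATE INDEX jobs_not_started_idx
    ON jobs (created_at)
 WHERE state = 'not-started';

SELECT id
  FROM jobs
 WHERE state = 'not-started'
 ORDER BY created_at
 LIMIT 1
 FOR UPDATE SKIP LOCKED;
```

SKIP LOCKED is still needed to cover the window between a worker locking a row and committing its state change to "processing".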


I'm not 100% confident, but this sounds like it would have unexpected effects.


What are its advantages compared to a more dedicated job queue system?


I think PGQueuer's main advantage is simplicity: no extra infrastructure is needed, as it runs entirely on PostgreSQL. That makes it a good fit for projects already using Postgres, since the operational model is familiar. While it may lack the advanced features or scalability of dedicated systems like Kafka or RabbitMQ, it's a great choice for lightweight background job processing without the overhead of additional services.



