Hacker News

PGQueuer is a lightweight job queue for Python, built entirely on PostgreSQL. It uses SKIP LOCKED for efficient and safe job processing, with a minimalist design that keeps things simple and performant.

If you’re already using Postgres and want a Python-native way to manage background jobs without adding extra infrastructure, PGQueuer might be worth a look: GitHub - https://github.com/janbjorge/pgqueuer




Also https://github.com/TkTech/chancy for another (early) Python option that goes the other way and aims to have bells and whistles included like a dashboard, workflows, mixed-mode workers, etc...

Check out the Similar Projects section in the docs for a whole bunch of Postgres-backed task queues. Haven't heard of pgqueuer before, another one to add!


I always wondered about the claim that SKIP LOCKED is all that efficient. Surely there are lots of cases where this is a really suboptimal pattern.

Simple example: if you have a mixture of very short jobs and longer duration jobs, then there might be hundreds or thousands of short jobs executed for each longer job. In such a case the rows in the jobs table for the longer jobs will be skipped over hundreds of times. The more long-running jobs running concurrently, the more wasted work as locked rows get skipped again and again. It wouldn't be a huge issue if load is low, but surely a case where rows get moved to a separate "running" table would be more efficient. I can think of several other scenarios where SKIP LOCKED would lead to lots of wasted work.
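For context, the pattern under discussion is roughly the canonical Postgres dequeue query (a sketch with illustrative table and column names, not PGQueuer's actual schema):

```sql
-- Each worker claims one unprocessed job. Rows currently locked by
-- other workers (e.g. long-running jobs) are still scanned before
-- being skipped, which is the wasted work described above.
SELECT id, payload
  FROM jobs
 WHERE processed = false
 ORDER BY id
 LIMIT 1
 FOR UPDATE SKIP LOCKED;
```

The more rows that sit locked near the front of the scan order, the more rows every dequeue has to step over before finding a claimable one.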


Good point about SKIP LOCKED inefficiencies with mixed-duration jobs. In PGQueuer's benchmarks, throughput reached up to 18k jobs/sec, showing it handles high concurrency well. For mixed workloads, strategies like batching or partitioning by job type can help.

While a separate "running" table reduces skips, it adds complexity. SKIP LOCKED strikes a good balance for simplicity and performance in many use cases.

One known issue is vacuum: if the load stays high for long periods, dead tuples accumulate faster than autovacuum can reclaim them, leading to table bloat.


>One known issue is vacuum: if the load stays high for long periods, dead tuples accumulate faster than autovacuum can reclaim them, leading to table bloat.

Generally what you need to do there is have some column that can be sorted on that you can use as a high watermark. This is often an id (PK) that you either track in a central service or periodically recalculate. I've worked at places where this was a timestamp as well. Perhaps not as clean as an id but it allowed us to schedule when the item was executed. As a queue feature this is somewhat of an antipattern but did make it clean to implement exponential backoff within the framework itself.
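A sketch of the timestamp-as-watermark approach with exponential backoff built in (column names here are illustrative assumptions, not any particular framework's schema):

```sql
-- On failure, push the job's visibility time into the future;
-- the delay doubles with each attempt, capped at one hour.
UPDATE jobs
   SET attempts  = attempts + 1,
       run_after = now() + make_interval(secs => least(2 ^ attempts, 3600))
 WHERE id = $1;

-- The dequeue only considers jobs whose time has come, so
-- run_after serves as both the schedule and the high watermark.
SELECT id
  FROM jobs
 WHERE run_after <= now()
 ORDER BY run_after
 LIMIT 1
 FOR UPDATE SKIP LOCKED;
```

Scanning from the watermark forward means old, already-processed rows never enter the scan, regardless of how bloated that part of the table is.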


Job rows could have an indexed state column, so you just query for the rows with state "not-started".

This way you won't need to skip over the long jobs that are in state "processing".
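That could be done with a partial index (a sketch; the schema is assumed for illustration, not taken from PGQueuer):

```sql
-- The index only covers rows still waiting to be picked up, so jobs
-- in state 'processing' never appear in the scan at all.
CREATE INDEX jobs_not_started_idx
    ON jobs (created_at)
 WHERE state = 'not-started';

SELECT id
  FROM jobs
 WHERE state = 'not-started'
 ORDER BY created_at
 LIMIT 1
 FOR UPDATE SKIP LOCKED;
```

SKIP LOCKED is still needed to cover the window between a worker locking a row and committing its state change to "processing".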


I'm not 100% confident, but this sounds like it would have unexpected effects.


What are its advantages compared to a more dedicated job queue system?


I think PGQueuer's main advantage is simplicity: no extra infrastructure is needed, as it runs entirely on PostgreSQL. That makes it a good fit for projects already using Postgres, since the operational model is familiar. While it may lack the advanced features or scalability of dedicated systems like Kafka or RabbitMQ, it's a great choice for lightweight background job processing without the overhead of additional services.



