Implementations that use advisory locks like https://github.com/que-rb/que are much more efficient (at least when I last tested) and will easily reach 10k jobs/s on even very modest hardware.
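The core trick can be sketched in plain SQL. This is a simplified illustration, not que's actual query (que uses a recursive CTE over its own que_jobs schema); the jobs table and its columns here are hypothetical:

```sql
-- Grab the first job whose advisory lock is free, without blocking.
-- Advisory locks are session-level and independent of row locks, so no
-- transaction has to stay open while the job runs.
-- Caveat: a naive WHERE clause like this may evaluate the lock function
-- for rows it never returns, acquiring stray locks; que's recursive CTE
-- is structured specifically to avoid that.
SELECT id, payload
FROM jobs
WHERE pg_try_advisory_lock(id)
ORDER BY id
LIMIT 1;

-- When the worker finishes, delete the row and release the lock:
-- DELETE FROM jobs WHERE id = $1;
-- SELECT pg_advisory_unlock($1);
```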
There is a Go port of Que but you can also easily port it to any language you like. I have a currently non-OSS implementation in Rust that I might OSS someday when I have time to clean it up.
This is easily a few thousand dollars' worth of advice. I use PostgreSQL regularly but I would never have been able to come up with this solution myself.
Having queues on Postgres is going to be such a great addition to my tool belt especially since there's always a Postgres instance running somewhere anyway.
I know the first use case that jumps to mind is job queues, but I feel like the method described in the article is quite low level, which means it can be used as a base to solve many use cases.
* I won't have to reach for Kafka/Redpanda unless the rate of events/messages reaches 100k-1000k per day.
* I can add one column called queue_id, where each unique queue_id refers to a separate queue, which means I can use a single table for multiple queues.
* If I add a new column called event_type, then I wonder if it's possible to create a composite event queue, where, for example, the query must return exactly two rows (one with event_type=type_1 and one with event_type=type_2), and both rows are locked and processed exactly once?
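One way to sketch that composite fetch (hypothetical events table; not from the article) is to lock one row of each type in CTEs and make the delete all-or-nothing via a cross join:

```sql
-- Lock one row of each type without blocking on rows other workers hold.
WITH t1 AS (
  SELECT id FROM events
  WHERE event_type = 'type_1'
  ORDER BY id LIMIT 1
  FOR UPDATE SKIP LOCKED
), t2 AS (
  SELECT id FROM events
  WHERE event_type = 'type_2'
  ORDER BY id LIMIT 1
  FOR UPDATE SKIP LOCKED
)
DELETE FROM events e
USING t1, t2
WHERE e.id IN (t1.id, t2.id)
RETURNING e.*;
```

If only one of the two types is available, the cross join is empty and nothing is deleted; the lock taken on the lone row is released when the transaction ends.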
Queuing means different things to different people; e.g. jobs vs events vs messages. Jobs typically run longer, events are typically reactive, messages are often part of the main flow and potentially part of an interactive operation.
I've implemented job queues in PostgreSQL, but I do polling in two phases. One phase to update the rows (using SKIP LOCKED) with metadata around the fact that a job has started, when it started, what agent is running the job, etc. That enables monitoring for stuck jobs, looking at current status, incremental updates on long running jobs, cooperative job cancellation, etc. It requires monitoring for abandoned jobs, but that's not too bad - it also avoids long-running transactions.
The other phase is to copy completed / aborted jobs to a history table for stats, failure reasons etc., keeping the main queue table small.
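A sketch of that two-phase scheme (illustrative schema and status values, not the parent's actual code):

```sql
-- Phase 1: claim a job, recording who took it and when.
-- The claiming transaction commits immediately, so nothing stays open
-- while the job itself runs.
UPDATE jobs
SET status = 'running', agent = $1, started_at = now()
WHERE id = (
  SELECT id FROM jobs
  WHERE status = 'queued'
  ORDER BY id
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;

-- Phase 2: move finished jobs to a history table in one statement,
-- keeping the hot queue table small.
WITH done AS (
  DELETE FROM jobs
  WHERE status IN ('completed', 'aborted')
  RETURNING *
)
INSERT INTO jobs_history SELECT * FROM done;
```

The abandoned-job monitoring the parent mentions then becomes a simple query over status = 'running' rows whose started_at is older than some threshold.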
Another thing those posts rarely specify is what the target throughput is. The proper design for 1 job/s will be very different from the proper design for 10k jobs/s.
For example the database based queue we use in production is even more primitive than what this article describes (no batches, polling once a second, rows never deleted), but works perfectly fine since we're only processing less than 10 jobs/s and don't expect it to grow beyond 100 jobs/s.
Keeping the transaction open while the job is running is probably dubious. Presumably you are using a queue because the job takes a long time to run. Having long-running transactions steals a scarce PG resource (connections) and also causes issues with vacuuming.
> Queueing jobs in Node.js using PostgreSQL like a boss
You can scale much higher in Postgres 14 if you use partitioned tables, both horizontally and in terms of day-to-day stability because old partitions can be dropped in O(1) time, taking all table bloat with them. Obviously more work to set up, though.
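A minimal sketch of the partitioned variant (table and partition names are illustrative; declarative partitioning itself needs Postgres 10+, and any unique constraint must include the partition key):

```sql
CREATE TABLE jobs (
  id         bigserial,
  created_at timestamptz NOT NULL DEFAULT now(),
  payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE jobs_2023_01 PARTITION OF jobs
  FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
CREATE TABLE jobs_2023_02 PARTITION OF jobs
  FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');

-- Retiring an old partition discards its bloat without a VACUUM pass:
DROP TABLE jobs_2023_01;
```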
It's not O(1), it's O(number of gigabytes of table). A DROP TABLE statement has to delete the files backing the table and tables are split into files of 1 gigabyte each.
Does anyone have anything that resembles Celery or Huey for python that uses the nice Postgres semantics for queueing? I know ruby has something like this in the form of Que. My application has very modest needs for its task queue, probably like a few thousand events per year, and celery seems like massive overkill.
At our company we use Redis; it's lightweight and has a list structure that can be used as a push/pop queue.
The basic gist of it is that on one end a producer pushes to the list, and on the other end one or more consumers pop jobs and execute them. Fire-and-forget style.
Yeah, Redis or similar are generally good and battle-tested solutions. Celery and Huey use Redis, as do many other task-queue and scheduling solutions. The issue is then that you have to manage an additional deployment of Redis. That's not so bad if it's a core part of your application, especially if you're hosting on a PaaS or major cloud, but it can make local dev and testing suck a bit more.
Yes, it's true. We already use Redis for several workloads and operations, so it's not really an extra thing to manage; on the contrary, we don't need to learn about, say, RabbitMQ and manage it, just worry about Redis.
Why do you consider it bad for local dev? A Redis instance literally takes about 1 MB of memory when started.
I remember coming across a project that implemented a complete queueing system using SQL functions, so to add jobs you would just call a function and same for polling/listening for jobs. It was very well documented, each function was explained and how to use it. But I can't find it anymore.
Probably not the one you're thinking of, but I wrote something like this for C# once upon a time called OddJob [0]. If nothing else it shows decent patterns for a job queue that is friendly towards multiple producers/consumers.
We’ve implemented two types of queues using Postgres tables in production and it’s been working nicely. Happy to answer questions about our usage if it’s useful :)
We're currently keeping it as simple as possible. We just use a table and this query that runs every second:
"DELETE FROM job_queue RETURNING type, info;"
Works great and reliably so far. I think our current volume is on the order of 100-1000 req/sec. We'll likely switch to graphile-worker when we need more performance, but we're all about avoiding premature optimization. That library has been benchmarked to handle 10k req/sec: https://github.com/graphile/worker#performance
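For reference, the usual next step when a single bare DELETE poller isn't enough is to batch and skip locked rows, so several consumers can poll the same table concurrently (a sketch, assuming the job_queue table has an id column):

```sql
-- Each poller claims up to 100 unclaimed rows; rows locked by
-- concurrent pollers are skipped rather than waited on.
DELETE FROM job_queue
WHERE id IN (
  SELECT id FROM job_queue
  ORDER BY id
  LIMIT 100
  FOR UPDATE SKIP LOCKED
)
RETURNING type, info;
```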
PgSQL table-based job queues should be able to handle that kind of load, but to support it you will need a disk with very high IOPS. If you have an NVMe SSD paired with a decent CPU, it should be able to manage this.
This is mainly because a PgSQL-based queue is durable by default, and depending on how you implement your inserts, every new message will need a disk-based commit.
There are people who think using a database as a queue is bad, but if you keep pressing them for tangible reasons, it usually turns out to be emotional: they just don't like the idea.
Next time someone says it’s a bad idea, keep pressing for real answers, not emotion.
There is a lot of ignorance around all topics, but in technology it seems to be more pronounced, if only because that's where my personal and professional interest lies. I felt inclined to support 10x0five in his statement, as my own experience is identical as it relates to storing files in a database. There were, and likely still are, those who proclaim over and over to never store serially linked chunked data (a.k.a. files) in a database, yet after building and proving such a model in production, long before today's tech, that stance seems to have changed. The benefits of having everything in a database far outweigh the costs, comparable to today's uptake of putting everything under version control. Fear not what others judge about your ideas; let it be more motivation to "press for real answers" and prove yourself correct.
You don’t even need a database to implement an effective queueing mechanism.
A Linux file move (rename) is atomic within the same filesystem, which means filesystem-based queues are perfectly viable. Just save one message per file and move the file to a different directory when it changes state.
I built a prototype queue around this mechanism. Performance was bound by the disk's ability to create small files: around 30,000 messages per second.
This sort of performance rivals some much more sophisticated and complex queuing systems, but with zero configuration.
It has complex failure modes though, unless you're not at all concerned about what happens if the program crashes, power goes out, file gets corrupted, etc.
It's surprisingly difficult to write something that stores changing data on disk correctly and durably.
If you mean they have to solve the same problems with very careful design and programming, yes. It's non-trivial. Even Postgres had a correctness issue around fsync for nearly twenty years.
If you roll your own, that's very hard to get right. I wouldn't have confidence in such a system that I've built myself.