Other than the space for past notifications and/or having to issue a DELETE, are...

brandur · 2024-05-14T10:47:27 1715683647

In Postgres, listen/notify are inherently lossy channels: if a notification goes out while a listener wasn't around to receive it, it's gone, so they should never be relied upon in cases where data consistency is at stake.

I find that the main thing they're useful for is notifying on particular changes so that components that care about them can process those changes sooner, without sitting in a hot loop constantly polling tables.

For example, I wrote a piece here [1] describing how we use the notifier to listen for feature flag changes so that each running program can update its flag cache. Those programs could be sitting in loops reloading flags once a second looking for changes, but that's wasteful and puts unnecessary load on the database. Instead, each listens for notifications indicating that some flag state changed, then reloads its flag cache. They also reload every X seconds so that some periodic synchronization happens in case an update notification was missed (e.g. a notifier temporarily dropped offline).
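
A rough sketch of that "reload on notification, or every X seconds regardless" loop, not the actual implementation from [1]. Everything named here is an assumption for illustration: `FlagCache`, `load_flags` (standing in for a query against a flags table), and the `notifications` queue (standing in for whatever a LISTEN connection would feed).

```python
import queue
import threading


class FlagCache:
    """Keeps flags in memory; reloads when notified or when the interval elapses."""

    def __init__(self, load_flags, notifications, interval=30.0):
        self._load_flags = load_flags        # hypothetical: callable returning {name: bool}
        self._notifications = notifications  # hypothetical: queue fed by a LISTEN connection
        self._interval = interval            # the "X seconds" fallback period
        self._lock = threading.Lock()
        self._flags = load_flags()

    def get(self, name, default=False):
        with self._lock:
            return self._flags.get(name, default)

    def wait_and_reload(self):
        """Block until a notification arrives or the interval passes, then reload.

        Returns True if a notification woke us, False if the timeout did.
        The notification content is ignored; it is only a wake-up hint.
        """
        try:
            self._notifications.get(timeout=self._interval)
            notified = True
        except queue.Empty:
            notified = False
        flags = self._load_flags()  # the database stays the source of truth
        with self._lock:
            self._flags = flags
        return notified

    def run(self, stop):
        """Loop until `stop` (a threading.Event) is set."""
        while not stop.is_set():
            self.wait_and_reload()
```

Because the reload happens on both paths, a missed notification only delays an update by at most one interval rather than losing it.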

Job queues are another example. You'll still be using `SKIP LOCKED` to select jobs to work, but listen/notify makes it faster to find out that a new job became available.

[1] https://brandur.org/fragments/instant-feature-flags

mcqueenjordan · 2024-05-14T10:55:34 1715684134

Got it, thanks for the reply. The feature flag cache reload use case seems like a reasonable one to me.

santicalcagno · 2024-05-14T10:48:57 1715683737

I implemented a queue using both LISTEN/NOTIFY for notifications to the task processor and SKIP LOCKED to sift through the pending tasks in the tasks table.

I think you can eliminate polling if you don't need to retry tasks, by simply processing pending tasks at startup and then just responding to LISTEN events. However, I'm curious if there are any alternatives to polling the queue in cases where you need to support retrying tasks at a given timestamp.

mcqueenjordan · 2024-05-14T10:58:21 1715684301

I personally think polling the queue/table via queries is a very sensible pattern and not something I have a desire to remove. In theory, you could go at it via a push approach by wiring into the WAL or something, but that comes with its own rat's nest of issues.

cryptonector · 2024-05-14T21:28:27 1715722107

One nice thing about `NOTIFY` is that the system is very fast and scales to many `LISTEN`ers that can all get notifications with very little latency. I.e., it's a C10K system.

Because there are no access controls on who can NOTIFY to what channel, you can't rely on the payload, so you really do have to look at a work queue. But if it's just one user, and all you're trying to do is broadcast data fast, then NOTIFY works great.

thom · 2024-05-14T10:47:31 1715683651

The only tradeoff here is the pure NOTIFY approach (if you don't care about losing notifications) can sit there on a single connection, and probably performs a bit better than having a bunch of workers in contention for that connection (at which point you don't really need SKIP LOCKED anyway). But ultimately tuning the level of parallelism of your worker pool and how many connections to dedicate to it doesn't seem a huge hardship.

hot_gril · 2024-05-14T18:42:52 1715712172

Tbh I didn't know about SKIP LOCKED until now, but it looks like you have to hold a xact open the entire time the worker runs, which can be a problem. What I've done before is timestamp cols for start/end. A worker takes any job whose end time is null and start time is not too recent, which makes retries natural and flexible.

A pubsub pattern like pg_notify can definitely make sense depending on the requirements, but I wouldn't jump to it first. The few times I've used pubsub elsewhere, it was when subscribing to some other team's service, not via a shared DB.

mcqueenjordan · 2024-05-15T16:54:27 1715792067

Yeah, you can avoid holding the xact with the means that you mentioned, e.g. SKIP LOCKED and set some value to PROCESSING, then do your processing, then update to DONE at the end. Or as you mentioned, timestamps.
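
Roughly, the status-column variant could look like the sketch below. The SQL is Postgres syntax; the `jobs` table, its `status` and `payload` columns, and the `%s` placeholder style (as used by psycopg) are assumptions, and `conn` is any DB-API style connection.

```python
# Claim one pending job and mark it PROCESSING in a single statement.
# The row lock only lasts for this short transaction, not for the whole job.
CLAIM_SQL = """
UPDATE jobs
   SET status = 'PROCESSING'
 WHERE id = (
        SELECT id
          FROM jobs
         WHERE status = 'PENDING'
         ORDER BY id
         LIMIT 1
           FOR UPDATE SKIP LOCKED
       )
RETURNING id, payload
"""

FINISH_SQL = "UPDATE jobs SET status = 'DONE' WHERE id = %s"


def claim_job(conn):
    """Return (id, payload) for a newly claimed job, or None if nothing is pending."""
    cur = conn.cursor()
    cur.execute(CLAIM_SQL)
    row = cur.fetchone()
    conn.commit()  # release the lock before the actual work starts
    return row


def finish_job(conn, job_id):
    """Mark the job DONE once processing has succeeded."""
    cur = conn.cursor()
    cur.execute(FINISH_SQL, (job_id,))
    conn.commit()
```

One caveat with a bare status column: a worker that crashes mid-job leaves the row in PROCESSING, so some timestamp or timeout is still needed to make retries possible.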

I think the SKIP LOCKED part is really only useful to avoid contention between two workers querying for new work simultaneously.

hot_gril · 2024-05-15T17:28:05 1715794085

Yeah, I can imagine SKIP LOCKED being faster if used that way. Just hasn't been an issue for me yet, so I haven't tested.

orangeisthe · 2024-05-18T09:09:23 1716023363

They can be used in conjunction.