
Yeah, I think more folks should learn about Postgres's LISTEN and NOTIFY functionality (https://www.postgresql.org/docs/current/sql-notify.html). It's great for stuff like this, and there are now client libraries like pg-listen on npm that make it really easy to use.

This is especially useful if you have lots of different teams/code paths that may update your DB: just set up a trigger that causes a NOTIFY message to get sent, then have a client responsible for reading from PG and populating ES. Alternatively, if you can accept a bit more latency, have a trigger that sets a "needsESIndexing" dirty column somewhere and a polling process that picks up rows WHERE needsESIndexing = TRUE and sets it back to FALSE once indexing is complete.
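
Rough sketch of the first approach, in case it helps. This assumes pg-listen on the LISTEN side and the v8 @elastic/elasticsearch client on the other; the table, channel, and index names are made up and it's untested, but it shows the shape:

  // Assumes a trigger along these lines already exists (created in a migration):
  //   CREATE FUNCTION notify_doc_change() RETURNS trigger AS $$
  //   BEGIN
  //     PERFORM pg_notify('doc_changed', json_build_object('id', NEW.id)::text);
  //     RETURN NEW;
  //   END $$ LANGUAGE plpgsql;
  //   CREATE TRIGGER doc_changed AFTER INSERT OR UPDATE ON documents
  //     FOR EACH ROW EXECUTE FUNCTION notify_doc_change();

  import createSubscriber from "pg-listen"
  import { Pool } from "pg"
  import { Client } from "@elastic/elasticsearch"

  const pool = new Pool({ connectionString: process.env.DATABASE_URL })
  const es = new Client({ node: process.env.ES_URL ?? "http://localhost:9200" })
  const subscriber = createSubscriber({ connectionString: process.env.DATABASE_URL })

  // pg-listen JSON-parses payloads by default, hence json_build_object in the trigger
  subscriber.notifications.on("doc_changed", async ({ id }: { id: number }) => {
    // Re-read the row so we index committed state rather than trusting the payload
    const { rows } = await pool.query("SELECT * FROM documents WHERE id = $1", [id])
    if (rows.length === 0) return
    // index() with an explicit id is an upsert, so replays are harmless
    await es.index({ index: "documents", id: String(id), document: rows[0] })
  })

  subscriber.events.on("error", (err) => {
    console.error("LISTEN connection error", err)
    process.exit(1) // let the process supervisor restart us
  })

  await subscriber.connect()
  await subscriber.listenTo("doc_changed")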




It has been several years since I worked on a system that implemented listen/notify, but I recall there was a significant caveat that if the listener was unavailable at the exact moment the NOTIFY executes, the message would be lost.

That’s a significant risk for things that need to be in sync between two systems, so we stuck with listen/notify for info-level ops things and used polling/queue systems that offered better guarantees for more important tasks. Don’t want to be in a position where a quiet hiccup with a deploy or something results in fun bugs like 0.5% of rows being silently out of sync between ES and Postgres.
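
The dirty-column poller mentioned upthread is roughly the shape of what we used for the important stuff. Something like this, with placeholder table/column names (untested sketch, not our actual code):

  import { Pool } from "pg"
  import { Client } from "@elastic/elasticsearch"

  const pool = new Pool({ connectionString: process.env.DATABASE_URL })
  const es = new Client({ node: process.env.ES_URL ?? "http://localhost:9200" })

  async function sweep() {
    // Quoted identifier because the column name is camelCase
    const { rows } = await pool.query(
      `SELECT * FROM documents WHERE "needsESIndexing" LIMIT 500`
    )
    for (const row of rows) {
      await es.index({ index: "documents", id: String(row.id), document: row })
      // Only clear the flag once ES has acknowledged the write
      await pool.query(
        `UPDATE documents SET "needsESIndexing" = false WHERE id = $1`,
        [row.id]
      )
    }
  }

  setInterval(() => sweep().catch((err) => console.error("sweep failed", err)), 5000)

A more careful version would guard against a write landing between the SELECT and the flag reset (e.g. compare an updated_at timestamp before clearing), but the point is that the record of what still needs indexing lives in the table, not in a transient notification.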


You could put a NATS JetStream connection in line and you'd have a good buffer to catch the notifications. Not foolproof, though; you'd still need some kind of batch process that looks for any outstanding changes and syncs them.

We do something like this with our systems. External events get written to the event bus, but all operations on the bus are idempotent, so at night we send another round of the day's events to clean up any inconsistencies.
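
The bridge is basically a LISTEN handler that republishes into a stream, roughly like this with the nats npm client (channel and subject names are made up; the important bit is that the consumer upserts into the search index by document id, so replaying the day's events at night is a no-op):

  import createSubscriber from "pg-listen"
  import { connect, StringCodec } from "nats"

  const sc = StringCodec()
  const nc = await connect({ servers: process.env.NATS_URL ?? "nats://localhost:4222" })
  const js = nc.jetstream()

  const subscriber = createSubscriber({ connectionString: process.env.DATABASE_URL })

  subscriber.notifications.on("doc_changed", async ({ id }: { id: number }) => {
    // JetStream persists the message, so a slow or briefly-offline consumer
    // doesn't lose it the way a plain LISTEN client would
    await js.publish("documents.changed", sc.encode(String(id)))
  })

  await subscriber.connect()
  await subscriber.listenTo("doc_changed")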


Making jobs idempotent forces good design imo.


Yes, agreed. It's (usually) more work, but the gains in robustness and fault tolerance are (usually) well worth it.


Or, a bit more 'web scale': use the WAL (logical decoding) and the outbox pattern to transactionally replicate events to a messaging system (Kafka etc.).
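
The outbox half is just an extra insert in the same transaction as the business write; CDC (or a small relay process) then ships the outbox rows to the broker. Illustrative sketch with made-up table names:

  import { Pool } from "pg"

  const pool = new Pool({ connectionString: process.env.DATABASE_URL })

  async function updateDocument(id: number, title: string) {
    const client = await pool.connect()
    try {
      await client.query("BEGIN")
      await client.query("UPDATE documents SET title = $2 WHERE id = $1", [id, title])
      // The event commits (or rolls back) atomically with the business write,
      // so the messaging system never sees a change that didn't actually happen
      await client.query(
        "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES ($1, $2, $3)",
        [id, "document_updated", JSON.stringify({ id, title })]
      )
      await client.query("COMMIT")
    } catch (err) {
      await client.query("ROLLBACK")
      throw err
    } finally {
      client.release()
    }
  }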


You can also just use CDC to stream changes from Postgres into Kafka with Debezium, and then Kafka Connect on the other end to write into Elasticsearch. No need to write any code at all with this setup, ymmv of course.
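
To give a sense of what "no code" means here: it's two connector configs POSTed to the Kafka Connect REST API, roughly this shape. Hostnames, credentials, topic and table names are placeholders, and exact property names depend on your Debezium / ES sink versions, so check their docs (the unwrap transform flattens Debezium's change envelope into a plain row so the sink can index it).

Debezium Postgres source, POSTed to /connectors:

  {
    "name": "pg-source",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "plugin.name": "pgoutput",
      "database.hostname": "db.internal",
      "database.port": "5432",
      "database.user": "debezium",
      "database.password": "changeme",
      "database.dbname": "app",
      "topic.prefix": "app",
      "table.include.list": "public.documents",
      "transforms": "unwrap",
      "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
    }
  }

Elasticsearch sink:

  {
    "name": "es-sink",
    "config": {
      "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
      "topics": "app.public.documents",
      "connection.url": "http://elasticsearch:9200",
      "key.ignore": "false",
      "tasks.max": "1"
    }
  }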


Sounds like a lot of moving parts though, unless the team is already familiar with maintaining them all.


If you already use Kafka anyway, it's a valid solution; I wouldn't introduce Kafka just for this, obviously. Also, Kafka Connect and Debezium are mature, well-maintained open source projects, in case that wasn't clear.

https://debezium.io/

https://docs.confluent.io/platform/current/connect/index.htm...



