You're right: at scale, Kafka lets you parallelise the workload more efficiently, and it supports multiple consumers for your replication slot. However, you should still be able to use the plug-and-play output processors with Kafka just as easily (the internal implementation is abstracted, so it works the same way with or without Kafka). We currently use the search indexer with Kafka at Xata, for example. As long as the webhook server supports rate limiting, it should be possible to handle a large number of event notifications.
Regarding the transaction metadata, the current implementation uses the wal2json Postgres output plugin (https://github.com/eulerto/wal2json), and we don't include the transaction metadata. However, it would be easy to enable and integrate into the pgstream pipeline if needed, to ensure transactional consistency on the consumer side.
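To illustrate what transactional consistency on the consumer side could look like, here is a minimal sketch that buffers changes until a commit marker arrives and only then hands them over as a unit. It assumes the event stream carries begin and commit markers around each transaction's changes. The `Event` type, its fields, and the `groupByTransaction` helper are hypothetical names for illustration and are not pgstream's actual types.

```go
package main

// Event is a hypothetical, simplified change event. Action is assumed to be
// "B" (begin), "C" (commit), or a row change such as "I", "U" or "D".
type Event struct {
	Action string
	Table  string
}

// groupByTransaction buffers row changes between a begin and a commit marker
// and returns one slice per committed transaction. Changes belonging to a
// transaction whose commit has not been seen yet are not returned, so a
// consumer never observes a partially applied transaction.
func groupByTransaction(events []Event) [][]Event {
	var committed [][]Event
	var current []Event
	inTx := false
	for _, e := range events {
		switch e.Action {
		case "B":
			// Start buffering a new transaction.
			current, inTx = nil, true
		case "C":
			// Release the buffered changes only once the commit is seen.
			if inTx {
				committed = append(committed, current)
			}
			current, inTx = nil, false
		default:
			// Ignore changes that arrive outside of a begin/commit pair.
			if inTx {
				current = append(current, e)
			}
		}
	}
	return committed
}
```

A consumer built this way trades a little latency (it waits for the commit) for the guarantee that it applies all of a transaction's changes or none of them.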
The current implementation of the webhook notifier will call the webhook URL at most once. There's no real reason for it beyond it being a first iteration of the webhook output plugin. I've created an issue in the repo to support retries, since this is a valuable feature (https://github.com/xataio/pgstream/issues/67).
It's worth mentioning that the retry strategy is defined at the processor level. The OpenSearch indexer, for instance, does handle retries, which makes its delivery "at least once".
It can be annoying to have to re-subscribe every N days when your application is working as expected, though. I wonder if one way to work around this could be to "blacklist" servers that have failed consistently for a given amount of time. They could then be deleted from the subscriptions table if they remain inactive after being blacklisted.
Hi there! I'm one of the authors of the pgstream project and found this thread very interesting.
I completely agree, retry handling can become critical when the application relies on all events being eventually delivered. I have created an issue in the repo (https://github.com/xataio/pgstream/issues/67) to add support for notification retries.
Thanks for everyone's feedback, it's been very helpful!
Thanks for your feedback!