lots of Go based CDC stuff going on these days. Redpanda Connect (formerly benth...

mebcitto · 2025-01-03T09:41:11 1735897271

From the newish Go-based Postgres CDC tools, I know about:

* pgstream: https://github.com/xataio/pgstream

* pg_flo: https://github.com/pgflo/pg_flo

Are there others? Each of them has slightly different angles and messaging, but it is interesting to see.

rockwotj · 2025-01-03T09:56:23 1735898183

https://github.com/artie-labs/reader is another I know of

jitl · 2025-01-03T17:18:08 1735924688

No support for streaming PG changes

rockwotj · 2025-01-04T02:33:03 1735957983

It’s in the works from my understanding. I helped build the Redpanda Connect one and it’s quite easy to use

andruby · 2025-01-03T11:44:22 1735904662

Sequinstream is written in Elixir and also pretty recent.

https://github.com/sequinstream/sequin

Any reason we're seeing so many CDC tools pop up?

jitl · 2025-01-03T17:01:23 1735923683

The status quo tool Debezium is annoying/heavy because it’s a banana that comes attached to the Java, Kafka Connect, Zookeeper jungle - it’s a massive ecosystem and dependency chain you need to buy into. The Kafka clients outside of Java-land that I’ve looked at are all sketchy - in Node, KafkaJS went unmaintained for years, and Confluent recently started maintaining an rdkafka-based client that’s somehow slower than the pure JS one and breaks every time I try to upgrade it. The Rust Kafka client has months-old issues in the latest release where half the messages go missing and APIs seem to no-op, and any version will SIGSEGV if you hold it wrong - obviously memory unsafe. The official rdkafka Go client depends on system C library package versions “matching up”, meaning you often need a newer librdkafka and libsasl, which is annoying; the unofficial pure-Go one looks decent though.

Overall the Confluent ecosystem feels targeted at “data engineer” use-cases, so if you want to build a reactive product it’s not a great fit. I’m not sure what performance target the Debezium Postgres connector maintainers have, but I get the sense they’re not ambitious because there’s so little documentation about performance optimization; the data ecosystem feels contemporary with the “nightly batch job” kind of thing vs product people today who want 0ms latency.

If you look at backend infrastructure there’s a clear trope of “good idea implemented in Java becomes standard, but re-implementing in $AOT_COMPILED_LANGUAGE gives big advantage”:

- Cassandra -> ScyllaDB

- Kafka -> Redpanda, …

- Zookeeper -> ClickHouse Keeper, Consul, etcd, …

- Debezium -> All these thingies

There’s also a lot of hype around Postgres right now, so there’s a bit of a VC-funded Cambrian explosion going on, and I think a lot of these will die off as a clear winner emerges.

rockwotj · 2025-01-04T02:32:02 1735957922

BTW I think most of the ecosystem has settled on https://github.com/twmb/franz-go being the best and highest performing Kafka client for Golang (pure Go)

nijave · 2025-01-04T22:18:01 1736029081

>Any reason we're seeing so many CDC tools pop up?

When I looked for something ~1 year ago to dump to S3 (object storage) they all sucked in some way.

I'm also of the opinion Postgres gives you a pretty "raw" interface with logical replication so a decent amount of building is needed and each person is going to have slightly different requirements/goals.

I haven't looked recently but hopefully these do a better job handling edge cases like TOASTed values, schema changes, and ideally full load
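
To make the TOAST edge case concrete: in the pgoutput protocol, each column in a tuple carries a kind byte, and 'u' means "unchanged TOASTed value", so no data is sent for that column. A consumer has to fill the value in from somewhere else. Below is a minimal Go sketch of one way to do that, assuming the consumer keeps the last known row around. The `Column`, `Value`, `Row` types and `ApplyUpdate` are hypothetical names made up for illustration, not taken from any of the tools mentioned in this thread.

```go
package main

import "fmt"

// Column kinds as sent in pgoutput TupleData.
const (
	KindNull      byte = 'n' // column is NULL
	KindUnchanged byte = 'u' // unchanged TOASTed value, no data on the wire
	KindText      byte = 't' // value in text format
)

// Column is a hypothetical decoded column from an UPDATE message.
type Column struct {
	Name string
	Kind byte
	Data string
}

// Value is a hypothetical cached column value.
type Value struct {
	Null bool
	Text string
}

// Row maps column name to its last known value.
type Row map[string]Value

// ApplyUpdate builds the new row image from an UPDATE's columns.
// Columns marked as unchanged TOAST are copied from prev; if prev
// has no value for such a column, an error is returned rather than
// silently emitting an empty value.
func ApplyUpdate(prev Row, cols []Column) (Row, error) {
	next := make(Row, len(cols))
	for _, c := range cols {
		switch c.Kind {
		case KindNull:
			next[c.Name] = Value{Null: true}
		case KindText:
			next[c.Name] = Value{Text: c.Data}
		case KindUnchanged:
			old, ok := prev[c.Name]
			if !ok {
				return nil, fmt.Errorf("column %q is unchanged TOAST but no previous value is cached", c.Name)
			}
			next[c.Name] = old
		default:
			return nil, fmt.Errorf("column %q has unknown kind %q", c.Name, c.Kind)
		}
	}
	return next, nil
}

func main() {
	prev := Row{
		"id":   {Text: "1"},
		"body": {Text: "a very large document"},
	}
	update := []Column{
		{Name: "id", Kind: KindText, Data: "1"},
		{Name: "title", Kind: KindText, Data: "new title"},
		{Name: "body", Kind: KindUnchanged},
	}
	next, err := ApplyUpdate(prev, update)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(next["title"].Text, "/", next["body"].Text)
}
```

The point of returning an error for a missing cached value is that a sink like S3 has no "previous row" to fall back on, so the tool has to decide up front how it handles this case instead of writing a blank column.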