Lots of Go-based CDC stuff going on these days. Redpanda Connect (formerly Benthos) recently added support for Postgres CDC [1], and MySQL is coming soon too [2].
The status quo tool, Debezium, is annoying/heavy because it’s a banana that comes attached to the Java, Kafka Connect, Zookeeper jungle - it’s a massive ecosystem and dependency chain you need to buy into. The Kafka clients outside of Java-land that I’ve looked at are all sketchy - in Node, KafkaJS went unmaintained for years, and Confluent recently started maintaining an rdkafka-based client that’s somehow slower than the pure JS one and breaks every time I try to upgrade it. The Rust Kafka client has months-old issues in the latest release where half the messages go missing and APIs seem to no-op, and any version will SIGSEGV if you hold it wrong - obviously memory unsafe. The official rdkafka Go client depends on system C library package versions “matching up”, meaning you often need a newer librdkafka and libsasl, which is annoying; the unofficial pure-Go one looks decent though.
Overall the Confluent ecosystem feels targeted at “data engineer” use-cases, so if you want to build a reactive product it’s not a great fit. I’m not sure what performance target the Debezium Postgres connector maintainers have, but I get the sense they’re not ambitious because there’s so little documentation about performance optimization; the data ecosystem feels contemporary with the “nightly batch job” kind of thing, vs product people today who want 0ms latency.
If you look at backend infrastructure there’s a clear trope of “good idea implemented in Java becomes standard, but re-implementing in $AOT_COMPILED_LANGUAGE gives a big advantage”:
- Cassandra -> ScyllaDB
- Kafka -> Redpanda, …
- Zookeeper -> ClickHouse Keeper, Consul, etcd, …
- Debezium -> All these thingies
There’s also a lot of hype around Postgres right now, so there’s a bit of a VC-funded Cambrian explosion going on, and I think a lot of these will die off as a clear winner emerges.
>Any reason we're seeing so many CDC tools pop up?
When I looked for something ~1 year ago to dump to S3 (object storage) they all sucked in some way.
I'm also of the opinion Postgres gives you a pretty "raw" interface with logical replication so a decent amount of building is needed and each person is going to have slightly different requirements/goals.
I haven't looked recently but hopefully these do a better job handling edge cases like TOASTed values, schema changes, and ideally full load.
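To make the "raw interface" point concrete, here's a rough Go sketch of the TOAST edge case as I understand it from the pgoutput message format: an UPDATE's new tuple can mark a column as 'u' (unchanged TOASTed value) and send no data for it, so the consumer has to fill it in from somewhere else. The `Column` type, `parseTupleData`, and `mergeUnchanged` are names I made up for illustration and aren't taken from any of the tools above; where the previous row comes from (a cache, a re-read of the table, etc.) is left out entirely.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Column kinds in a pgoutput TupleData section.
const (
	kindNull      = 'n' // column is NULL
	kindUnchanged = 'u' // unchanged TOASTed value; no data is sent
	kindText      = 't' // text-format value follows
	kindBinary    = 'b' // binary-format value follows
)

// Column is a hypothetical holder for one decoded column.
type Column struct {
	Kind byte
	Data []byte // only set for 't' and 'b'
}

// parseTupleData decodes a TupleData section: an Int16 column count, then per
// column a kind byte and, for 't'/'b', an Int32 length plus that many bytes.
func parseTupleData(buf []byte) ([]Column, error) {
	if len(buf) < 2 {
		return nil, errors.New("short tuple: missing column count")
	}
	n := int(binary.BigEndian.Uint16(buf))
	buf = buf[2:]
	cols := make([]Column, 0, n)
	for i := 0; i < n; i++ {
		if len(buf) < 1 {
			return nil, fmt.Errorf("column %d: missing kind byte", i)
		}
		kind := buf[0]
		buf = buf[1:]
		switch kind {
		case kindNull, kindUnchanged:
			cols = append(cols, Column{Kind: kind})
		case kindText, kindBinary:
			if len(buf) < 4 {
				return nil, fmt.Errorf("column %d: missing length", i)
			}
			size := int(binary.BigEndian.Uint32(buf))
			buf = buf[4:]
			if size < 0 || size > len(buf) {
				return nil, fmt.Errorf("column %d: truncated value", i)
			}
			cols = append(cols, Column{Kind: kind, Data: buf[:size]})
			buf = buf[size:]
		default:
			return nil, fmt.Errorf("column %d: unknown kind %q", i, kind)
		}
	}
	return cols, nil
}

// mergeUnchanged fills 'u' columns in next from prev, the last known full row.
// Assumption: the caller supplies prev; this sketch does not say how.
func mergeUnchanged(prev, next []Column) ([]Column, error) {
	if len(prev) != len(next) {
		return nil, errors.New("column count mismatch (schema change?)")
	}
	out := make([]Column, len(next))
	for i, c := range next {
		if c.Kind == kindUnchanged {
			if prev[i].Kind == kindUnchanged {
				return nil, fmt.Errorf("column %d: no known value to fill in", i)
			}
			out[i] = prev[i]
		} else {
			out[i] = c
		}
	}
	return out, nil
}

func main() {
	// Hand-built new tuple: id = "7" (text), body = unchanged TOAST, note = NULL.
	raw := []byte{0, 3, 't', 0, 0, 0, 1, '7', 'u', 'n'}
	next, err := parseTupleData(raw)
	if err != nil {
		panic(err)
	}
	prev := []Column{
		{Kind: kindText, Data: []byte("7")},
		{Kind: kindText, Data: []byte("big value")},
		{Kind: kindText, Data: []byte("old note")},
	}
	merged, err := mergeUnchanged(prev, next)
	if err != nil {
		panic(err)
	}
	for i, c := range merged {
		fmt.Printf("col %d kind=%c data=%q\n", i, c.Kind, c.Data)
	}
}
```

The column-count check is also where a schema change would surface in this sketch; a real consumer would track Relation messages instead of just erroring out.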
1: https://github.com/redpanda-data/connect/pull/2917
2: https://github.com/redpanda-data/connect/pull/3014