
Lots of Go-based CDC stuff going on these days. Redpanda Connect (formerly Benthos) recently added support for Postgres CDC [1], and MySQL is coming soon too [2].

1: https://github.com/redpanda-data/connect/pull/2917

2: https://github.com/redpanda-data/connect/pull/3014




Of the newish Go-based Postgres CDC tools, I know about:

* pgstream: https://github.com/xataio/pgstream

* pg_flo: https://github.com/pgflo/pg_flo

Are there others? Each of them has a slightly different angle and messaging, but it's interesting to see.



No support for streaming PG changes


It’s in the works, from my understanding. I helped build the Redpanda Connect one, and it’s quite easy to use.


Sequinstream is written in Elixir and also pretty recent.

https://github.com/sequinstream/sequin

Any reason we're seeing so many CDC tools pop up?


The status quo tool, Debezium, is annoying/heavy because it’s a banana that comes attached to the Java, Kafka Connect, Zookeeper jungle - a massive ecosystem and dependency chain you need to buy into. The Kafka clients I’ve looked at outside of Java-land are all sketchy: in Node, KafkaJS went unmaintained for years, and Confluent recently started maintaining an rdkafka-based client that’s somehow slower than the pure-JS one and breaks every time I try to upgrade it. The Rust Kafka client has months-old issues in the latest release where half the messages go missing and APIs seem to no-op, and any version will SIGSEGV if you hold it wrong - it’s obviously memory unsafe. The official rdkafka Go client depends on system C library package versions “matching up”, meaning you often need a newer librdkafka and libsasl, which is annoying; the unofficial pure-Go one looks decent, though.

Overall the Confluent ecosystem feels targeted at “data engineer” use-cases, so if you want to build a reactive product it’s not a great fit. I’m not sure what performance target the Debezium Postgres connector maintainers have, but I get the sense it’s not ambitious, because there’s so little documentation about performance optimization; the data ecosystem still feels contemporary with the “nightly batch job” kind of thing, vs product people today who want 0ms latency.

If you look at backend infrastructure, there’s a clear trope of “good idea implemented in Java becomes the standard, but re-implementing it in $AOT_COMPILED_LANGUAGE gives a big advantage”:

- Cassandra -> ScyllaDB

- Kafka -> Redpanda, …

- Zookeeper -> ClickHouse Keeper, Consul, etcd, …

- Debezium -> All these thingies

There’s also a lot of hype around Postgres right now, so there’s a bit of a VC-funded Cambrian explosion going on, and I think a lot of these will die off as a clear winner emerges.


BTW I think most of the ecosystem has settled on https://github.com/twmb/franz-go as the best and highest-performing Kafka client for Go (pure Go).
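
For reference, here's a minimal franz-go consumer sketch; the broker address, group, and topic names are made up for illustration:

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/twmb/franz-go/pkg/kgo"
    )

    func main() {
        // Hypothetical broker, consumer group, and CDC topic names.
        client, err := kgo.NewClient(
            kgo.SeedBrokers("localhost:9092"),
            kgo.ConsumerGroup("cdc-consumers"),
            kgo.ConsumeTopics("pg.public.orders"),
        )
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        for {
            // PollFetches blocks until records arrive or the context ends.
            fetches := client.PollFetches(context.Background())
            if errs := fetches.Errors(); len(errs) > 0 {
                log.Fatalf("fetch errors: %v", errs)
            }
            fetches.EachRecord(func(r *kgo.Record) {
                fmt.Printf("%s/%d: %s\n", r.Topic, r.Partition, r.Value)
            })
        }
    }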


>Any reason we're seeing so many CDC tools pop up?

When I looked ~1 year ago for something to dump changes to S3 (object storage), they all sucked in some way.

I'm also of the opinion that Postgres gives you a pretty "raw" interface with logical replication, so a decent amount of building is needed, and each person is going to have slightly different requirements/goals.
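
To give a sense of how raw it is, here's a minimal sketch using the jackc/pglogrepl library. The connection string, slot, and publication names are made up, and the slot and publication are assumed to already exist; decoding the pgoutput messages, durable checkpointing, and initial snapshots are all left to you.

    package main

    import (
        "context"
        "log"

        "github.com/jackc/pglogrepl"
        "github.com/jackc/pgx/v5/pgconn"
        "github.com/jackc/pgx/v5/pgproto3"
    )

    func main() {
        ctx := context.Background()

        // Logical replication requires replication=database in the DSN.
        conn, err := pgconn.Connect(ctx, "postgres://postgres@localhost/app?replication=database")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close(ctx)

        // Stream from an existing slot via the built-in pgoutput plugin.
        err = pglogrepl.StartReplication(ctx, conn, "cdc_slot", 0,
            pglogrepl.StartReplicationOptions{
                PluginArgs: []string{"proto_version '1'", "publication_names 'cdc_pub'"},
            })
        if err != nil {
            log.Fatal(err)
        }

        pos := pglogrepl.LSN(0)
        for {
            msg, err := conn.ReceiveMessage(ctx)
            if err != nil {
                log.Fatal(err)
            }
            cd, ok := msg.(*pgproto3.CopyData)
            if !ok {
                continue
            }
            switch cd.Data[0] {
            case pglogrepl.PrimaryKeepaliveMessageByteID:
                // Ack progress, or the server eventually drops the
                // connection and WAL accumulates on disk.
                err = pglogrepl.SendStandbyStatusUpdate(ctx, conn,
                    pglogrepl.StandbyStatusUpdate{WALWritePosition: pos})
                if err != nil {
                    log.Fatal(err)
                }
            case pglogrepl.XLogDataByteID:
                xld, err := pglogrepl.ParseXLogData(cd.Data[1:])
                if err != nil {
                    log.Fatal(err)
                }
                // xld.WALData is a raw pgoutput message (begin/relation/
                // insert/update/delete/commit). Parsing it, plus TOAST,
                // schema changes, and full loads, is all your problem.
                pos = xld.WALStart + pglogrepl.LSN(len(xld.WALData))
            }
        }
    }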

I haven't looked recently, but hopefully these do a better job handling edge cases like TOASTed values, schema changes, and ideally full loads.



