They're fixed now, but this is very much beta stuff. It wasn't as magical as we'd hoped so we ended up ripping it out and switching the application to use SSE[1] which streamed into a kafka producer.
Most systems do cache/index invalidation or HDFS archiving from the web app or a set of microservices that gets notified from the web app. So instead of pushing raw DB rows into Kafka they will push generic events e.g. "user deleted account" and let the various services work out how to respond to it.
In this model you have tightly coupled both the choice of DB as well as the DB schema to the events system.
Sounds like a bit of a nightmare if your iterate a lot on your architecture.
We are doing something very similar to enable new use cases on a large database that has many hard to change applications already talking to it. That is its a legacy migration approach. It gets a 'generic' form of the data into an eventing system all at once that would have taken an unreasonable amount of dev resources to get any other way. New producers can talk to the events system in a more straightforward way and consumers can get all the data they need, legacy and modern, from the same place.
That said, I'll warn you, we didn't find Bottled Water to be anywhere near robust enough for our use cases. It seems to make assumptions about the size of your database, the size of your transactions, and the downside risks of logical replication backing up that certainly didn't meet our requirements.
The architecture you describe is going to be extremely sensitive to all the usual asynchronous system problems--message omission, message duplication, and message reordering (you could fix all of them with enough effort and potential slowdown, but probably don't). Having tried it, I'd categorize your system as a nightmare too :)
This makes sense for e.g, when you want to make your database content available for search (via elasticsearch) - for this you may want to push the raw DB rows.
Again it doesn't really make sense because you are tying the physical DB schema to the search engine schema.
Your application domain model should be at the centre of your architecture not the physical database model. For example storing a User object rather than a row from a User table.
I understand why a database company would see a database as the centre of the world. But it really should be your application. Especially if you want to use PostgreSQL and InfluxDB for different domain types and yet have both indexed in ElasticSearch.
It's also pointless to try and convince someone who thinks otherwise either, I found it better to let the those developers shoot themselves in the foot and learn the hard way. It's the only way they'll learn.
You are right that the ES and DB schemas shouldn't be directly tied together. That is why I have specific DB functions that output the correct format for ES and only those functions need to change if the ES or DB schemas change. It is a much simpler way to do it then to have application worry about it in my opinion.
It's funny. We're in a need of something like this. I was thinking my options this morning and went to Hacker News just to see the link posted here. We already have the latest PostgreSQL and Kafka running. What a coincidence!
Haha well it's not "City Building" or some ridiculously common English word. At least, not as far as I know.. lol it was a random series of noises that popped out of my head which, I'd argue, is better than naming something "Bottled Water" or "Hydrogen" or "Rust" lol
I ran into both of these:
https://github.com/confluentinc/bottledwater-pg/issues/53
https://github.com/confluentinc/bottledwater-pg/issues/61
They're fixed now, but this is very much beta stuff. It wasn't as magical as we'd hoped so we ended up ripping it out and switching the application to use SSE[1] which streamed into a kafka producer.
[1] https://en.wikipedia.org/wiki/Server-sent_events