
Not really sure why you would do it like this.

Most systems do cache/index invalidation or HDFS archiving from the web app, or from a set of microservices that get notified by the web app. So instead of pushing raw DB rows into Kafka, they push generic events, e.g. "user deleted account", and let the various services work out how to respond.

In this model you have tightly coupled both the choice of DB and the DB schema to the events system.

Sounds like a bit of a nightmare if you iterate a lot on your architecture.
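For contrast, the web-app-driven pattern usually looks roughly like this (a minimal sketch assuming kafka-python; the topic and field names are made up): the app publishes a domain-level event after the transaction commits, and each downstream service decides how to react.

    import json
    from datetime import datetime, timezone

    from kafka import KafkaProducer  # assumes the kafka-python package

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def on_account_deleted(user_id: str) -> None:
        """Called by the web app after the DB transaction commits."""
        event = {
            "type": "user_deleted_account",   # generic domain event, not a DB row
            "user_id": user_id,
            "occurred_at": datetime.now(timezone.utc).isoformat(),
        }
        # Cache invalidation, search indexing, HDFS archiving, etc. each
        # subscribe to this topic and work out how to respond.
        producer.send("user-events", event)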




We are doing something very similar to enable new use cases on a large database that already has many hard-to-change applications talking to it. That is, it's a legacy migration approach. It gets a 'generic' form of the data into an eventing system all at once, which would have taken an unreasonable amount of dev resources to achieve any other way. New producers can talk to the events system in a more straightforward way, and consumers can get all the data they need, legacy and modern, from the same place.

That said, I'll warn you, we didn't find Bottled Water to be anywhere near robust enough for our use cases. It seems to make assumptions about the size of your database, the size of your transactions, and the downside risks of logical replication backing up that certainly didn't meet our requirements.


The architecture you describe is going to be extremely sensitive to all the usual asynchronous-system problems: message omission, message duplication, and message reordering (you could fix all of them with enough effort and potential slowdown, but probably don't). Having tried it, I'd categorize your system as a nightmare too :)
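To make that concrete, here is roughly the kind of defensive handling each consumer ends up needing (the field names and in-memory stores are hypothetical; duplicates and reordering can be filtered, but omission still needs reconciliation against the source):

    # Hypothetical guards on the consumer side against duplicated and reordered messages.
    # Lost messages can only be caught by periodically reconciling with the source DB.
    seen_event_ids: set[str] = set()       # in practice a persistent store, not memory
    latest_version: dict[str, int] = {}    # entity_id -> last applied version

    def apply_to_local_state(event: dict) -> None:
        ...                                # whatever this particular service actually does

    def handle(event: dict) -> None:
        if event["event_id"] in seen_event_ids:
            return                         # duplicate delivery, drop it
        if event["version"] <= latest_version.get(event["entity_id"], -1):
            return                         # stale message that arrived out of order
        apply_to_local_state(event)
        seen_event_ids.add(event["event_id"])
        latest_version[event["entity_id"]] = event["version"]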


https://en.wikipedia.org/wiki/Change_data_capture

It's mostly useful when you can't change the application code.


This ebook goes into the details of the why - http://www.confluent.io/wp-content/uploads/2016/08/Making_Se...

(authored by the creator of this tool)


This makes sense when, e.g., you want to make your database content available for search (via Elasticsearch); for that you may well want to push the raw DB rows.
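As a rough illustration of that use case (the topic and index names are invented, and this assumes the kafka-python and recent official Elasticsearch Python clients): a small consumer that mirrors row-level change events into a search index.

    import json

    from kafka import KafkaConsumer
    from elasticsearch import Elasticsearch

    consumer = KafkaConsumer(
        "users",                            # topic of row-level changes, one per table
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v is not None else None,
    )
    es = Elasticsearch("http://localhost:9200")

    for message in consumer:
        row = message.value
        if row is None:
            continue                        # tombstone: the row was deleted; a fuller version would call es.delete here
        # Mirror the raw row into the search index, keyed by its primary key.
        es.index(index="users", id=row["id"], document=row)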


Again, it doesn't really make sense, because you are tying the physical DB schema to the search engine schema.

Your application domain model should be at the centre of your architecture, not the physical database model. For example, storing a User object rather than a row from a User table (roughly like the sketch at the end of this comment).

I understand why a database company would see a database as the centre of the world. But it really should be your application. Especially if you want to use PostgreSQL and InfluxDB for different domain types and yet have both indexed in ElasticSearch.
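Something like this, as a rough sketch (the types and field names are invented): whatever the rows look like in PostgreSQL or InfluxDB, everything downstream only ever sees the domain-level User.

    from dataclasses import dataclass, asdict

    @dataclass
    class User:
        """Domain object at the centre of the architecture, independent of any table layout."""
        id: str
        display_name: str
        email: str

    def user_from_pg_row(row: dict) -> User:
        # Column names here are made up; the point is that the mapping lives in one place.
        return User(id=str(row["user_id"]), display_name=row["full_name"], email=row["email_addr"])

    def index_user(es, user: User) -> None:
        # The search schema follows the domain model, not the physical table.
        es.index(index="users", id=user.id, document=asdict(user))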


No.

The application domain model can be buggy and full of holes. The data store is the source of truth.


Couldn't agree more.

It's also pointless to try to convince someone who thinks otherwise; I found it better to let those developers shoot themselves in the foot and learn the hard way. It's the only way they'll learn.


You are right that the ES and DB schemas shouldn't be directly tied together. That is why I have specific DB functions that output the correct format for ES, and only those functions need to change if the ES or DB schemas change. It is a much simpler way to do it than to have the application worry about it, in my opinion.
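From the application side that looks roughly like this (the function name user_search_doc is made up, and this assumes psycopg2 plus the Elasticsearch Python client): the DB function owns the mapping, and the indexing code just passes its JSON output through.

    import psycopg2
    from elasticsearch import Elasticsearch

    conn = psycopg2.connect("dbname=app")
    es = Elasticsearch("http://localhost:9200")

    def reindex_user(user_id: str) -> None:
        with conn.cursor() as cur:
            # Hypothetical DB function returning the ES document as json/jsonb;
            # if the table or index schema changes, only this function changes.
            cur.execute("SELECT user_search_doc(%s)", (user_id,))
            doc = cur.fetchone()[0]        # psycopg2 decodes json/jsonb into a dict
        es.index(index="users", id=user_id, document=doc)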


You're gonna have a lot of fun once you move beyond a single central application and have multiple apps interfacing with your data.


This is probably also quite useful if you have a lot of logic in Postgres itself (e.g. triggers), which can't be captured by the calling code.


and cache invalidation is notoriously easy to get right...



