
Not really sure why you would do it like this.

Most systems do cache/index invalidation or HDFS archiving from the web app, or from a set of microservices that get notified by the web app. So instead of pushing raw DB rows into Kafka, they push generic events, e.g. "user deleted account", and let the various services work out how to respond.

In this model you have tightly coupled both the choice of DB and the DB schema to the events system.

Sounds like a bit of a nightmare if you iterate a lot on your architecture.
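For contrast, the web-app-driven pattern usually looks roughly like this (a minimal sketch assuming kafka-python; the topic and field names are made up): the app publishes a domain-level event after the transaction commits, and each downstream service decides how to react.

    import json
    from datetime import datetime, timezone

    from kafka import KafkaProducer  # assumes the kafka-python package

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def on_account_deleted(user_id: str) -> None:
        """Called by the web app after the DB transaction commits."""
        event = {
            "type": "user_deleted_account",   # generic domain event, not a DB row
            "user_id": user_id,
            "occurred_at": datetime.now(timezone.utc).isoformat(),
        }
        # Cache invalidation, search indexing, HDFS archiving, etc. each
        # subscribe to this topic and work out how to respond.
        producer.send("user-events", event)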




We are doing something very similar to enable new use cases on a large database that already has many hard-to-change applications talking to it. That is, it's a legacy migration approach. It gets a 'generic' form of the data into an eventing system all at once, which would have taken an unreasonable amount of dev resources to achieve any other way. New producers can talk to the events system in a more straightforward way, and consumers can get all the data they need, legacy and modern, from the same place.

That said, I'll warn you, we didn't find Bottled Water to be anywhere near robust enough for our use cases. It seems to make assumptions about the size of your database, the size of your transactions, and the downside risks of logical replication backing up that certainly didn't meet our requirements.


The architecture you describe is going to be extremely sensitive to all the usual asynchronous-system problems: message omission, message duplication, and message reordering (you could fix all of them with enough effort and potential slowdown, but probably don't). Having tried it, I'd categorize your system as a nightmare too :)
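To make that concrete, here is roughly the kind of defensive handling each consumer ends up needing (the field names and in-memory stores are hypothetical; duplicates and reordering can be filtered, but omission still needs reconciliation against the source):

    # Hypothetical guards on the consumer side against duplicated and reordered messages.
    # Lost messages can only be caught by periodically reconciling with the source DB.
    seen_event_ids: set[str] = set()       # in practice a persistent store, not memory
    latest_version: dict[str, int] = {}    # entity_id -> last applied version

    def apply_to_local_state(event: dict) -> None:
        ...                                # whatever this particular service actually does

    def handle(event: dict) -> None:
        if event["event_id"] in seen_event_ids:
            return                         # duplicate delivery, drop it
        if event["version"] <= latest_version.get(event["entity_id"], -1):
            return                         # stale message that arrived out of order
        apply_to_local_state(event)
        seen_event_ids.add(event["event_id"])
        latest_version[event["entity_id"]] = event["version"]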


https://en.wikipedia.org/wiki/Change_data_capture

It's mostly useful when you can't change the application code.


This ebook goes into the details of the why - http://www.confluent.io/wp-content/uploads/2016/08/Making_Se...

(authored by the creator of this tool)


This makes sense when, e.g., you want to make your database content available for search (via Elasticsearch); for that you may well want to push the raw DB rows.
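As a rough illustration of that use case (the topic and index names are invented, and this assumes the kafka-python and recent official Elasticsearch Python clients): a small consumer that mirrors row-level change events into a search index.

    import json

    from kafka import KafkaConsumer
    from elasticsearch import Elasticsearch

    consumer = KafkaConsumer(
        "users",                            # topic of row-level changes, one per table
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v is not None else None,
    )
    es = Elasticsearch("http://localhost:9200")

    for message in consumer:
        row = message.value
        if row is None:
            continue                        # tombstone: the row was deleted; a fuller version would call es.delete here
        # Mirror the raw row into the search index, keyed by its primary key.
        es.index(index="users", id=row["id"], document=row)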


Again, it doesn't really make sense, because you are tying the physical DB schema to the search engine schema.

Your application domain model should be at the centre of your architecture, not the physical database model. For example, storing a User object rather than a row from a User table (roughly like the sketch at the end of this comment).

I understand why a database company would see a database as the centre of the world. But it really should be your application. Especially if you want to use PostgreSQL and InfluxDB for different domain types and yet have both indexed in ElasticSearch.
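Something like this, as a rough sketch (the types and field names are invented): whatever the rows look like in PostgreSQL or InfluxDB, everything downstream only ever sees the domain-level User.

    from dataclasses import dataclass, asdict

    @dataclass
    class User:
        """Domain object at the centre of the architecture, independent of any table layout."""
        id: str
        display_name: str
        email: str

    def user_from_pg_row(row: dict) -> User:
        # Column names here are made up; the point is that the mapping lives in one place.
        return User(id=str(row["user_id"]), display_name=row["full_name"], email=row["email_addr"])

    def index_user(es, user: User) -> None:
        # The search schema follows the domain model, not the physical table.
        es.index(index="users", id=user.id, document=asdict(user))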


No.

The application domain model can be buggy and full of holes. The data store is the source of truth.


Couldn't agree more.

It's also pointless to try to convince someone who thinks otherwise; I found it better to let those developers shoot themselves in the foot and learn the hard way. It's the only way they'll learn.


You are right that the ES and DB schemas shouldn't be directly tied together. That is why I have specific DB functions that output the correct format for ES, and only those functions need to change if the ES or DB schemas change. It is a much simpler way to do it than to have the application worry about it, in my opinion.
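From the application side that looks roughly like this (the function name user_search_doc is made up, and this assumes psycopg2 plus the Elasticsearch Python client): the DB function owns the mapping, and the indexing code just passes its JSON output through.

    import psycopg2
    from elasticsearch import Elasticsearch

    conn = psycopg2.connect("dbname=app")
    es = Elasticsearch("http://localhost:9200")

    def reindex_user(user_id: str) -> None:
        with conn.cursor() as cur:
            # Hypothetical DB function returning the ES document as json/jsonb;
            # if the table or index schema changes, only this function changes.
            cur.execute("SELECT user_search_doc(%s)", (user_id,))
            doc = cur.fetchone()[0]        # psycopg2 decodes json/jsonb into a dict
        es.index(index="users", id=user_id, document=doc)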


You're gonna have a lot of fun once you move beyond a single central application and have multiple apps interfacing with your data.


This is probably also quite useful if you have a lot of logic in Postgres itself (e.g. triggers), which can't be captured by the calling code.


and cache invalidation is notoriously easy to get right...



