
Yes, but here's the problem. Consider common scenarios like:

Master goes down. Slave takes over. Master comes back. Slave goes down 10 minutes later. Repeat.

This is common in e.g. multi-data-center replication and is often due to transient network failures. Netflix has a great open source tool called Chaos Monkey that can induce lots of random failure scenarios like this, or much worse. Don't get me started on transient partial failures due to latency and packet loss spikes.

The manual nature of PG replication setup makes me really nervous here. What happens when the cluster finds itself in a state where manual intervention is needed? You are now down.
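To make the failure mode concrete, here's a toy model (my own sketch, not real Postgres or RethinkDB behavior; the names `rejoin_delay` and `simulate` are made up). The idea: after a failover, the old primary can't simply rejoin — someone has to re-sync it first — and if the new primary dies inside that window, nobody can serve.

```python
# Toy two-node failover model. After a node comes back "up" it needs
# rejoin_delay ticks of re-sync work (think: re-basebackup by a human)
# before it is a usable replica again.

def simulate(events, rejoin_delay=3):
    """events: list of ('down'|'up'|'wait', node) per tick.
    Returns the number of ticks with no serving primary."""
    primary = "A"
    alive = {"A": True, "B": True}
    rejoin_left = {"A": 0, "B": 0}   # ticks of re-sync still needed
    outage = 0
    for ev, node in events:
        if ev == "down":
            alive[node] = False
            if node == primary:
                other = "B" if node == "A" else "A"
                if alive[other] and rejoin_left[other] == 0:
                    primary = other          # clean failover
                else:
                    primary = None           # nobody ready to take over
        elif ev == "up":
            alive[node] = True
            rejoin_left[node] = rejoin_delay  # must re-sync before use
        # ("wait", _) ticks just let re-sync work progress
        for n in alive:
            if alive[n] and n != primary and rejoin_left[n] > 0:
                rejoin_left[n] -= 1
        if primary is None or not alive[primary]:
            outage += 1
            # promote any node that has finished re-syncing
            for n in alive:
                if alive[n] and rejoin_left[n] == 0:
                    primary = n
                    break
    return outage

# The exact scenario above: A dies, B takes over, A comes back,
# B dies before A has finished re-syncing -> outage.
flap = [("down", "A"), ("up", "A"), ("down", "B"),
        ("wait", None), ("wait", None)]
print(simulate(flap, rejoin_delay=3))  # some ticks of total outage
print(simulate(flap, rejoin_delay=0))  # instant rejoin: no outage
```

With any nonzero re-sync delay the flapping sequence produces downtime; drive the rejoin delay to zero (i.e. automate re-join) and the same event sequence produces none. That gap is exactly what automated-failover stores sell.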

This is tolerable for big companies with dedicated SREs and DBAs and enough of them that it's easy to always have someone on call, but it's a nightmare for smaller ventures. Even for larger ventures this adds a lot of cost overhead.

Like I said elsewhere, this was really the true killer feature of the more successful NoSQL document store type databases. Everything else was largely hype.

We recently switched to RethinkDB for this reason. We miss the richness of SQL (to the point that we still use PG too, for warehousing and analytics), but in return we got incredible robustness across three data centers. Of course, our app doesn't need rich queries or strong consistency 99% of the time, so YMMV. For some jobs, ACID and complex queries on live data are not optional.



