Hacker News new | past | comments | ask | show | jobs | submit login

This depends. Even with W=R=1, you still send each read/write to every replica. If the replica that acknowledged the write fails, the other two replicas should eventually receive the write.

But--what happens if the write messages to the other two replicas drop? Well, typically the coordinator will hand the write off to replacement replicas (hinted handoff/sloppy quorums). Alternatively, when you detect the message loss via a timeout you could retry. But what happens if the coordinator dies and can't retry? Then you'd better replicate your coordinator. That gets hairy, so hinted handoff and sloppy quorums are easier. However, whichever way you go, it's definitely possible to handle an arbitrary (still fixed) number of node failures without data loss.

In general, though, sloppy quorums/hinted handoff solve this problem. I haven't heard any data loss complaints with Riak/Cassandra/Voldemort due to replication, but I'm very interested to hear if you have.

You can definitely extend PBS to node failure cases/sloppy quorums/hinted handoff. The main reason we didn't was because we don't have good failure data. There's nothing stopping you, and, as we point out in the paper/backup slides, you can potentially hide this in the tail (e.g. the .01% of stale cases) provided your DB "does the right thing".




Indeed, if your coordinator isn't a replica you obviously need it to fail as well in order to get data loss -- I was thinking of the symmetric case where every replica is a coordinator and W=1 reduces to "store locally, broadcast, and send an ACK".

And yes, of course there are hairy ways to solve the problem; I just would have liked to see it mentioned so that people realize that it exists.


You make a good point. Without additional safeguards/depending on implementation, W=1 can indeed mean "durability of one".

This also depends on your failure model. If your node crashed (RAM cleared) and your durable storage broke, you're in trouble. If the data was durably persisted and you just have to restart the node, it's a better situation.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: