
The Spanner design seems more resilient in the face of server failures. The initial Calvin papers call for taking the entire replica offline if a single server in the replica fails. Are there more advanced versions of Calvin that get around this?



Yes: the current version of Calvin (in the Yale research group) does not have this limitation. We're actually not sure which paper you're talking about, but either way, it's not fundamental to the Calvin approach. In general, if a single server in a replica fails, the other servers within the replica that need data from the failed server can fetch it from the corresponding server in another replica. (We can't speak for FaunaDB, but like the current version of Calvin, it is unlikely to have this limitation.)
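
To make that concrete, here's a minimal sketch (hypothetical names, not Calvin's or FaunaDB's actual code) of routing a read around a failed server: the same partition exists in every replica, so a dead local peer's data can be served by the corresponding server in another replica instead of taking the whole replica offline.

  class Server:
      def __init__(self, data, alive=True):
          self.data = data    # key -> value for one partition
          self.alive = alive

      def get(self, key):
          return self.data[key]

  def read(partition_id, key, local_replica, other_replicas):
      # local_replica / other_replicas: dicts mapping partition_id -> Server.
      server = local_replica[partition_id]
      if server.alive:
          return server.get(key)
      # Local peer is down: fall back to the same partition in another replica.
      for replica in other_replicas:
          peer = replica[partition_id]
          if peer.alive:
              return peer.get(key)
      raise RuntimeError("no live copy of partition %s" % partition_id)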


My understanding was that the whole replica would go down in order to recover the failed server. This was a side effect of the way snapshots and command logging worked: you couldn't just restore the snapshot on the failed node, because the multi-partition commands would have to execute against the entire replica. Instead, you would restore the snapshot on every node and roll the entire replica forward.
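
A hypothetical sketch of that recovery path, assuming a replica-wide consistent snapshot plus an ordered command log: because a logged command can span partitions, replaying it against only the recovered node could diverge from peers that already executed it, so every node is reset to the snapshot and the whole replica rolls forward together.

  class Node:
      def __init__(self, partition):
          self.partition = partition
          self.state = {}

      def restore(self, snapshot):
          # Load this node's partition from the replica-wide snapshot.
          self.state = dict(snapshot.get(self.partition, {}))

      def apply(self, command):
          # Re-execute the logged command's writes against this partition.
          for key, value in command["writes"].get(self.partition, {}).items():
              self.state[key] = value

  def recover_replica(nodes, snapshot, command_log):
      for node in nodes:                   # 1. reset every node to the same point
          node.restore(snapshot)
      for command in command_log:          # 2. replay in original log order
          for node in nodes:
              if node.partition in command["writes"]:
                  node.apply(command)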


Yes. For log data, it's simply a matter of reading from a replica peer of the down node.

For transaction resolution, it's a bit easier if you can assume more about the storage layer's semantics.

For example, if you store versioned values for some bounded period of time (a la MVCC), you can go to other replicas for the version required to resolve a transaction. This removes the restriction that transaction resolution must proceed in lock-step across all nodes, and allows transaction reads to route to live peers, assuming they have the required version of each read dependency.
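
As a rough illustration (all names here are made up, not FaunaDB's API), versioned storage lets a resolver ask any live replica for a value as of the transaction's read version, so a down peer doesn't stall resolution as long as some copy still retains that version:

  class VersionedStore:
      def __init__(self):
          self.versions = {}   # key -> list of (version, value), oldest first

      def put(self, key, version, value):
          self.versions.setdefault(key, []).append((version, value))

      def read_at(self, key, version):
          # Newest value with version <= the requested read version,
          # or None if this copy no longer retains it.
          candidates = [(v, val) for v, val in self.versions.get(key, []) if v <= version]
          return max(candidates)[1] if candidates else None

  def resolve_read(replicas, key, read_version):
      # replicas: list of (is_alive, VersionedStore) copies of the same partition.
      for alive, store in replicas:
          if not alive:
              continue
          value = store.read_at(key, read_version)
          if value is not None:
              return value
      raise LookupError("no live replica retains version %d of %r" % (read_version, key))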



