Serialized transactions over a broad distribution of keys aren't a huge problem, but you're right: this bodes poorly for hot data. I'm more concerned about their CAP semantics: I'm sorry, but claiming multi-DC availability and ACID transactions is not going to work.
As any ACID database must, we choose Consistency over Availability in the CAP theorem sense. This means that when a datacenter is disconnected or poorly connected to the internet, the database is not available for writes in that datacenter. If you are serving a web site to the internet, and need it to stay available if one Amazon AZ fails, this isn't too bad.
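To make that concrete, here is a toy sketch of what a writer in the cut-off datacenter sees. The names (KVStore, Unavailable) and the whole class are invented for this example; this is not our actual client API:

    # Toy model: the local datacenter either can or cannot reach a quorum.
    class Unavailable(Exception):
        """The commit could not reach a quorum of datacenters, so it is refused."""

    class KVStore:
        def __init__(self, local_dc_partitioned):
            self.local_dc_partitioned = local_dc_partitioned

        def write(self, key, value):
            if self.local_dc_partitioned:
                # CP choice: refuse the write rather than accept one the rest of
                # the cluster cannot see (which would break consistency).
                raise Unavailable("cannot commit %r: no quorum reachable" % (key,))
            return "committed"

    print(KVStore(local_dc_partitioned=False).write("cart:42", "3 apples"))  # committed
    try:
        KVStore(local_dc_partitioned=True).write("cart:42", "3 apples")
    except Unavailable as e:
        print("write rejected:", e)  # stays consistent, but not available in that DC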
Some applications really do need disconnected operation, and thus 'AP' semantics. (For example, a grocery list for a mobile phone, or an application that must be available everywhere if the entire Internet partitions.)
So you require a quorum over data centers to stay up? How do you provide transactional consistency between DCs? Multi-Paxos over blocks of transactions with delayed failure?
We require a quorum over coordination servers to stay up. For example, if you only have two datacenters with actual DB servers, a third coordination server out in the cloud somewhere can act as a tiebreaker (it will be lightly loaded).
These coordination servers run Paxos, but are only needed when there are failures or role changes - they don't participate at all in committing individual transactions. Normally, we only need a single geographic ping time to do a durable transaction if it originates in the current 'primary' datacenter.
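To illustrate the tiebreaker arithmetic with a toy sketch (the coordinator names and the can_elect_primary helper are made up for this example; this is not our actual election code):

    # Two datacenters with DB servers plus one lightly loaded cloud tiebreaker.
    COORDINATORS = ["dc-east", "dc-west", "cloud-tiebreaker"]

    def can_elect_primary(reachable):
        """A primary can be (re)chosen iff a majority of coordinators agree."""
        votes = sum(1 for c in COORDINATORS if c in reachable)
        return votes > len(COORDINATORS) // 2

    # Normal operation: everyone reachable, and the coordinators then stay out
    # of the per-transaction commit path entirely.
    assert can_elect_primary({"dc-east", "dc-west", "cloud-tiebreaker"})

    # dc-west drops off the internet: dc-east plus the tiebreaker still form a
    # majority, so writes continue in dc-east.
    assert can_elect_primary({"dc-east", "cloud-tiebreaker"})

    # dc-east is isolated by itself: no majority, so it refuses writes rather
    # than diverge (Consistency over Availability).
    assert not can_elect_primary({"dc-east"})
    print("quorum examples hold")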
For further improved latency, we plan to allow each individual transaction to decide whether it needs multi-datacenter durability. The commit process is the same, but we can notify the client of success earlier if it is willing to take the risk that a WAN partition or meteor strike violates its 'D' guarantee. A, C, and I are guaranteed either way.
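From the client's point of view, the knob we have in mind would look roughly like this. All names and the latency numbers are invented for illustration; this is not our real interface:

    LOCAL_COMMIT_MS = 1      # commit within the primary datacenter (illustrative)
    WAN_REPLICATION_MS = 80  # the "single geographic ping" to a remote DC (illustrative)

    def commit(writes, wait_for_remote_dc=True):
        """Pretend-commit `writes`; return (ack latency in ms, fully durable?).

        The commit pipeline is identical either way; the flag only controls when
        the client is told "success". A, C, and I hold in both cases -- only
        multi-datacenter 'D' is at stake.
        """
        elapsed = LOCAL_COMMIT_MS              # durable in the primary datacenter
        if wait_for_remote_dc:
            elapsed += WAN_REPLICATION_MS      # also durable in a remote datacenter
            return elapsed, True
        # Lower latency, at the risk that losing the whole primary datacenter
        # before replication finishes loses the write.
        return elapsed, False

    print(commit({"k": "v"}))                            # (81, True)  full multi-DC 'D'
    print(commit({"k": "v"}, wait_for_remote_dc=False))  # (1, False)  faster ack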
Gotcha. That sounds solid to me; would be a good explanation to put in your feature list.
Sounds like your coordinators are authoritative masters for transaction ordering. Does that imply a single-machine limit to throughput in a given DC? Presumably the ordering process is much less expensive than the KV store itself, so this might not be a practical issue.
What happens when nodes are asymmetrically partitioned from the coordinator and peers? E.g., a node that is unreachable by a peer but reachable by a coordinator, or vice versa?