This is a much more recent presentation by Kyle (the author of the linked article) with a more mature version of his Jepsen tool. I imagine he'll get around to a written version of his findings soon; until then, it's worth the time to watch this and learn a bit about distributed systems, databases, and testing. https://www.youtube.com/watch?v=XiXZOF6dZuE
From the point of view of a "devops" developer - who has done hard time in QA and as a DBA as well - the requirement to never lose data changes drastically when performance and scaling enter the equation. The scale slides quickly from "lose no data, ever" to "we can lose a few seconds of transactions if it speeds up the web page" to "we can lose a lot of data and still be OK, as long as we're still online".
Seconds, or even minutes, of lost data do cost the company, but not nearly as much as poor performance. And unfortunately, developers tend to over-value data in that equation, leading to decisions that cause a company problems when it comes time to grow.
Ultimately, the best tool for ensuring business continuity (with few exceptions) is redundancy coupled with a set of proper backups.
Wouldn't message queues help you with this? They can be scaled horizontally fairly easily to handle almost any write load, and then the data gets inserted into the database safely over time.
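To make the queue idea concrete, here's a minimal sketch of buffering writes through a queue and draining them into the database in batches. It uses an in-process queue.Queue and SQLite purely as stand-ins for a real broker and a real database; handle_request and writer_loop are illustrative names, not anything from the article.

```python
import queue
import sqlite3
import threading

# In-process queue standing in for a real message broker (RabbitMQ, Kafka, ...).
write_queue = queue.Queue()

def handle_request(key, payload):
    """Web tier: enqueue the write and return to the client immediately."""
    write_queue.put((key, payload))

def writer_loop(db_path):
    """Background consumer: drain the queue and persist rows in batches."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS events (key TEXT, payload TEXT)")
    while True:
        # Block for the first item, then grab up to 100 more that are already waiting.
        batch = [write_queue.get()]
        while not write_queue.empty() and len(batch) < 100:
            batch.append(write_queue.get_nowait())
        db.executemany("INSERT INTO events VALUES (?, ?)", batch)
        db.commit()

threading.Thread(target=writer_loop, args=("events.db",), daemon=True).start()
handle_request("player-42", '{"score": 100}')
```

Whether the broker itself persists enqueued messages durably then becomes its own question, of course.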
You can also shard your database to distribute write load.
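And a rough sketch of the sharding idea: route each write to one of several databases by hashing the record's key, so no single node sees the whole write load. The three SQLite files here are stand-ins for separate shard servers, and shard_for / insert_event are just illustrative names.

```python
import hashlib
import sqlite3

# Three stand-in shards; in production these would be separate database servers.
SHARDS = [sqlite3.connect(f"shard_{i}.db") for i in range(3)]
for db in SHARDS:
    db.execute("CREATE TABLE IF NOT EXISTS events (key TEXT, payload TEXT)")

def shard_for(key):
    """Deterministically map a record key to one shard."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest, "big") % len(SHARDS)]

def insert_event(key, payload):
    """The write goes only to the shard that owns this key."""
    db = shard_for(key)
    db.execute("INSERT INTO events (key, payload) VALUES (?, ?)", (key, payload))
    db.commit()

insert_event("player-42", '{"score": 100}')
```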
All of those absolutely help, at the cost of added complexity, additional points of failure, and extra hardware/VPS spend. And you still risk losing data, or at least data integrity.
Not to mention there are still hard limits on how quickly you can insert data into a database with 100% durability (which is, of course, impossible, but that's another topic entirely), and there are scales where even these mitigation tactics can't help you anymore - online casino games in particular have this problem, since they persist the state of multiple players very frequently.
They've been shipping a closed-source black box and advertising their own demos and tests for a while now. Just like this one. Not sure how much more I would trust that than a marketing statement saying "oh yeah, we're the most stable, fastest, etc."
That and the quite low limit on transaction duration (5 seconds!) make FoundationDB a no-go for me.
Closed source + very narrow limits due to system design makes this a very hard sell.
It's a pity, because they do seem to have some decent ideas in there. I like the layers that build more complex data models on top of a transactional, distributed key-value store.
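For anyone who hasn't seen the layer concept, here's a toy sketch of the shape of it: a tiny table-like abstraction whose rows and columns are packed into keys of a plain key-value store. The KVStore class below is an in-memory stand-in, not FoundationDB's actual API, and the key encoding is deliberately simplistic.

```python
# Toy in-memory key-value store standing in for a transactional,
# distributed KV store of the kind FoundationDB exposes.
class KVStore:
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

# A "table layer": maps (table, row_id, column) onto flat keys, so a richer
# data model rides entirely on top of plain key-value operations.
class TableLayer:
    def __init__(self, store, table):
        self.store = store
        self.prefix = table.encode("utf-8")

    def _key(self, row_id, column):
        return b"/".join([self.prefix, row_id.encode(), column.encode()])

    def set_cell(self, row_id, column, value):
        self.store.set(self._key(row_id, column), value.encode("utf-8"))

    def get_cell(self, row_id, column):
        value = self.store.get(self._key(row_id, column))
        return value.decode("utf-8") if value is not None else None

# Usage: a "users" table built entirely from key-value pairs.
store = KVStore()
users = TableLayer(store, "users")
users.set_cell("42", "name", "alice")
print(users.get_cell("42", "name"))  # -> alice
```

The appeal is that the store only has to get transactions and distribution right once, and every higher-level model (tables, indexes, queues) is just a different key encoding on top.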
Great read! I think this is the first time I've seen a really good, structured failover test that contrasts Postgres, Redis, and MongoDB (I don't know much about Riak). It would have been interesting to see MySQL in there as well.
This is an awesome article. I would love to see the tests packaged in such a way that we could easily port them to other DBs, and perhaps do something like the web framework shootout, but for database consistency.