Millions of Tiny Databases (acolyer.org)
263 points by feross on March 4, 2020 | 20 comments



I'm one of the authors of this paper, along with Fan and Tao. I'm also a huge fan and religious reader of The Morning Paper, so it's really cool to see Adrian feature our work. If you don't know The Morning Paper, be sure to check out some of the other stuff that Adrian writes: deep looks at a mix of systems, ML, and even classic papers.


Congrats on the work!


Another AWS component with TLA+ use. I'll shill it again because I think TLA+ is one of the most practical formal methods toolkits out there at the moment. It's great. Try it out.

You're not getting rid of implementation bugs with TLA+, but it's a huge breath of fresh air as a formal documentation language.


> it's a huge breath of fresh air as a formal documentation language.

Yeah! When I started with TLA+ I was mostly enamored with model checking and proofs. Those turned out to be useful, but the unexpected benefit is probably the bigger one: it's a really great way to write crisp descriptions of protocols and algorithms. That's one of the reasons I tend to choose PlusCal over "raw" TLA+ these days: it's easier for others to read and engage with.

After publishing this paper, I had a great email conversation with Leslie Lamport about this use of TLA+. We talked about TLA+'s use as a "low ambiguity documentation" tool, and some of the cases where we've been able to resolve conversations about ambiguities in our implementation because we had the TLA+ spec to fall back on.
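
For readers who haven't seen PlusCal, here's a toy sketch of the kind of crisp protocol description I mean: a single step of primary failover. This is an invented example, not anything from our spec.

    ---- MODULE Failover ----
    (* --algorithm failover
    variables
      primary = "n1",                      \* which node is currently primary
      up = [n \in {"n1", "n2"} |-> TRUE];  \* liveness of each node
    begin
      Crash:    \* the current primary fails
        up[primary] := FALSE;
      Promote:  \* any node that is still up takes over
        with n \in {m \in DOMAIN up : up[m]} do
          primary := n;
        end with;
    end algorithm *)
    ====

Even someone who has never run TLC can read that and tell you what the protocol is supposed to do, which is the documentation value I'm talking about.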


Fascinating; did you guys stick almost exclusively to PlusCal (except, presumably, for the invariants you check with TLC)?

I'm quite partial to just using straight TLA+ for everything, both because it's what everything ultimately desugars to anyway and because it makes what you put in TLC and what you write for your spec the same language. Plus, once you're in the mindset of TLA+, the syntax of PlusCal has always seemed more of a distraction than anything else, though it does seem that PlusCal is a lot less scary for an experienced developer with no TLA+ experience.
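
To make the desugaring point concrete: a single labeled PlusCal step like `Inc: x := x + 1;` translates to roughly this TLA+ action, with an explicit program counter threaded through (simplified; the real translation adds more bookkeeping):

    Inc == /\ pc = "Inc"
           /\ x' = x + 1
           /\ pc' = "Done"

Once you're comfortable reading actions like that directly, the PlusCal layer can feel like indirection.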


Speaking only for myself, because we don’t have anything approaching a standard here, I do about 60% PlusCal and 40% straight TLA+.

Mostly the tradeoffs are the ones you mentioned. If I was the only audience of what I was writing, I’d pick TLA+ every time, but for a broader audience PlusCal can make this stuff much more approachable.


Makes sense. One last question while I still have you here. What does TLA+ adoption look like within AWS? I imagine it's probably still only a small minority of teams, but exactly how small are we talking: are you guys the only ones, or is it 2-5, 5-10, or 10s of teams?


Just a tiny nitpick: there is no need to qualify TLA+ with anything (especially "raw", as "raw TLA" is already used for something else). There's TLA+ and there's PlusCal, a language that compiles to TLA+. True, TLA+ also refers to the gestalt of the TLA+ toolbox, but whether you're referring to the tools or to the language is often clear from context. E.g.: "At my company we use TLA+, but we prefer writing specifications in PlusCal." "At my company we also use TLA+, but we write the specifications in TLA+."


Reading this makes me think of GitHub. I recall GitHub having a large, distributed MySQL database at the heart of the system, and when a partition developed the whole system faltered[1]. This seems ironic to me; git was designed to be decentralized, and one can imagine a design for GitHub that did not involve a globe-spanning MySQL database, or at least one that didn't directly impact the operation of all GitHub repos when it fell over.

Parts of this blog post also align well with another recent post, Simple Systems Have Less Downtime[2].

[1] https://github.blog/2018-10-30-oct21-post-incident-analysis/
[2] https://news.ycombinator.com/item?id=22471355


I wonder how many people actually need a distributed version control system. It seems to make git more complex than is necessary.


I used to wonder this. Now I don't. The notion that with some other VCS I would not have a complete copy of the entire history available to me locally now seems dysfunctional. You make branches at will and it troubles no one unless you need it to. Git is a great improvement over all that came before. Some bizarre default CLI behaviors are my only complaint.


I said this 3 years ago: the future is having an SQLite DB inside a container for each one of your customers.

/s


> When I think about minimising blast radius, I immediately think of bulkheads

This is an excellent model to have for high-reliability work. There are going to be failures, so the design should provide means of containing the failures.

The paper is also good at recognising the risk of cascading failures in failover systems, where excessive load causes one node to fail, and the process of moving that load elsewhere then overloads the nodes that remain.
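
Given the TLA+ discussion above, it's worth noting that this containment property is exactly the kind of thing you can state as a one-line invariant. A sketch with invented names, not the paper's actual spec:

    ---- MODULE Bulkheads ----
    CONSTANTS Cell, Client,
              Owner       \* Owner[cl] is the cell that serves client cl
    VARIABLES failed,     \* failed[c] = whether cell c is down
              affected    \* affected[cl] = whether client cl sees impact

    \* Blast radius containment: a client can only be affected
    \* when the one cell it is assigned to has failed.
    Contained == \A cl \in Client : affected[cl] => failed[Owner[cl]]
    ====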


I thought this was going to be an article about all the SQLite databases embedded in applications, and on smartphones and watches and other devices. Or about how Archive.org publishes metadata for individual objects in OBJECT_meta.sqlite files (alongside OBJECT_meta.xml).


Very inspiring work and achievement. It is perhaps the equivalent of the application of quantum mechanics in the mid-to-late twentieth century. As such, perhaps "decentralized general computing" is more plausible than I thought.


It's a great paper, but I think you're hugely exaggerating its significance.

As the authors themselves point out, none of the fundamental building blocks of this system are particularly new. For example, the idea of partitioning a very large dataset into lots of independent slices, each of which is handled by its own Paxos group, is the same idea that forms the basis of Google's Megastore and Spanner, the former of which is more than a decade old.
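
For anyone who hasn't seen that pattern, the core idea fits in a few lines: statically split the key space across cells and give each cell its own consensus log, so a step in one cell leaves every other cell's state untouched. A minimal sketch with invented names, not Megastore's or Physalia's actual design:

    ---- MODULE PartitionedConsensus ----
    EXTENDS Sequences
    CONSTANTS Key, Cell,   \* the key space and the set of Paxos groups ("cells")
              CellOf       \* CellOf[k] is the cell responsible for key k
    VARIABLE log           \* log[c] = the sequence of values decided by cell c

    \* Deciding a value in one cell changes only that cell's log,
    \* so a failed or overloaded cell can only affect the keys
    \* that CellOf maps to it.
    Decide(c, v) == log' = [log EXCEPT ![c] = Append(@, v)]
    ====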

Most of the interesting stuff in this paper is the discussion of the nuts-and-bolts of software engineering, such as testing, deployment and monitoring.


People find this stuff exciting because most of our industry works on systems that are at least ten years behind the state of the art. A good example is HDFS, an extremely bad likeness of GFS, which itself wasn't great and died ten years ago. If you describe Colossus/D in detail, many people in our industry will think it's really amazing, but of course that's more than ten years old now too. Many people will choose HDFS for new systems, in new designs, today. You can spend your whole career without getting so much as a whiff of the state of the art.


And what are the better modern alternatives to HDFS among distributed file systems?


I would say that a distributed filesystem is a solution looking for a problem in most cases. Amazon S3 or Google Cloud Storage address some use cases, and Google Cloud Bigtable is a direct drop-in replacement compatible with the HBase API but with dramatically better performance and reliability. There are other alternatives for other use cases; it all depends on what you plan to do with the data on the filesystem, how far you need to scale it, and whether your clients are in your own datacenters or in vendor clouds.


> And what are the better modern alternatives to HDFS among distributed file systems?

For open source solutions, BeeGFS.

If you want to pay and you are an IBM fan, GPFS.



