Looking at TAO just as a data-model (instead of as a database system) and I'm really liking its simplicity.
But when you have a huge system like this, there are always engine-level rules (e.g. referential integrity) and business rules that need to apply: a Location object can't "follow" a Comment, a PM can't "check in" to a User, and a human User can't "friend" a declared bot User (I think?). Given the overall design, I assume the rules about which objects and associations can and cannot exist are enforced by the Association APIs when an association is created, so rejecting an invalid create-association request is straightforward. But how does it handle ever-changing business requirements that apply retroactively to existing objects and associations? Suppose FB decides to allow human users to befriend bot users, so Mark Zuckerberg suddenly has real Friend associations (shots fired), and then they decide to undo that policy change: how is the rule change applied? What happens to the existing objects? What happens if two physical FB servers running different versions of the software try to apply conflicting business rules?
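To make that assumption concrete, here's roughly the check I imagine the Association API doing (all names here are invented; this is my guess, not TAO's actual interface):

```python
from dataclasses import dataclass

@dataclass
class Obj:
    id: int
    otype: str  # e.g. "user", "page", "location", "comment"

# Hypothetical whitelist of (source type, association type, target type)
# triples. Not TAO's real rule engine, just the shape of an API-level check.
ALLOWED = {
    ("user", "friend", "user"),
    ("user", "follow", "page"),
    ("user", "checkin", "location"),
}

def create_assoc(src: Obj, atype: str, dst: Obj) -> None:
    if (src.otype, atype, dst.otype) not in ALLOWED:
        raise ValueError(f"a {src.otype} cannot '{atype}' a {dst.otype}")
    # ...otherwise persist the (src.id, atype, dst.id) edge...
```

But a whitelist like this only gates new writes: flipping an entry says nothing about edges already on disk, which is exactly the retroactivity problem.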
Another thing I don't yet understand about using MySQL (or any SQL-first RDBMS, for that matter) as the storage layer for an ostensibly object/document system (hierarchical structured data) is that you now run into the object-relational impedance mismatch, which has only unpleasant solutions: either use the EAV anti-pattern, which defeats the point of using an RDBMS in the first place, or go all-in on Codd-style normalization and end up with a rigid, strict schema design that is very difficult to update when business requirements change (i.e. one table per TAO object type, and an ALTER TABLE ADD/DROP COLUMN every time you want to update the TAO object schema, which obviously does not scale). I assume each TAO object can instead be treated as a keyed blob with a type name (i.e. a 3-tuple of (id, type, blob)), in which case an RDBMS is overkill and introduces a lot of overhead, so why use an RDBMS? Especially as I understand pretty much every major web service disables things like referential-integrity constraints for the sake of performance anyway (and you can't have a constraint on data inside a blob column in any case).
...so I assume I'm missing something, but what?
> how does it handle ever-changing business requirements that apply retroactively to existing objects and associations
If the change is backwards-compatible:
1. Update the schema definition
2. Run codegen
3. Done
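For instance, adding an optional field with a default is the canonical backwards-compatible change. In an invented schema format (the real definitions are whatever the codegen consumes):

```python
# Hypothetical schema definition, invented for illustration. Adding an
# optional field with a default is backwards-compatible: old readers
# ignore it, and rows written before the change report the default.
USER_SCHEMA = {
    "name":     {"type": "str"},
    "location": {"type": "str"},
    "pronouns": {"type": "str", "default": ""},  # the newly added field
}
```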
If it’s not:
1. Create a new field/assoc
2. Double-write data to the new and old fields/assoc (see the sketch after this list)
3. Backfill data from the old one to the new one
4. Delete the old one
5. Done
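A minimal sketch of steps 2-3 with invented field names (the real fields come out of codegen):

```python
# Sketch of steps 2-3: double-write new data to both fields, then
# backfill rows written before double-writing started.

def geocode(text: str) -> int:
    return hash(text) % 10**9  # stand-in for a real lookup service

class User:
    def __init__(self) -> None:
        self.location_text: str | None = None  # old field, being replaced
        self.location_id: int | None = None    # new field

def write_location(user: User, text: str) -> None:
    user.location_text = text         # old field, so existing readers keep working
    user.location_id = geocode(text)  # new field: the double-write

def backfill(users: list[User]) -> None:
    # One-off job to migrate rows that predate the double-writing.
    for u in users:
        if u.location_id is None and u.location_text is not None:
            u.location_id = geocode(u.location_text)
```

Because every step leaves both fields readable, nothing depends on the migration completing atomically.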
> I assume each TAO object can instead be treated as a keyed blob with a type name (i.e. a 3-tuple of (id, type, blob)), in which case an RDBMS is overkill and introduces a lot of overhead, so why use an RDBMS?
I believe data is stored in tuples, same as most other graph DBs. I can’t speak to why MySQL in particular.
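If I remember the TAO paper correctly, the two tuple shapes are roughly as follows (a sketch, not the actual MySQL schema):

```python
from dataclasses import dataclass

# Roughly the tuple shapes described in the TAO paper: objects are
# typed keyed blobs, associations are typed, timestamped edges.

@dataclass
class TaoObject:
    id: int      # 64-bit globally unique id
    otype: str   # e.g. "user", "comment", "location"
    data: bytes  # serialized key/value payload

@dataclass
class TaoAssoc:
    id1: int     # source object id
    atype: str   # e.g. "friend", "likes", "checkin"
    id2: int     # destination object id
    time: int    # used for time-ordered range queries
    data: bytes  # small serialized payload
```

As I recall, reads are dominated by (id1, atype) range queries ordered by time, which is an access pattern an ordinary RDBMS index serves well.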
Facebook's database is distributed - it's not as simple as "updating the schema definition", because that step alone is non-trivial, with many cases to consider.
The process is more complicated under the hood. I’m describing it from the POV of an engineer that wants to update a field — from that POV, it really is that simple.
The key is in making sure every step-wise change is backwards-compatible.
Creating a new field, double-writing, and backfilling are all relatively easy tasks in a distributed environment. Individual nodes can take their time catching up to the latest schema.
After a day or two, the engineer can verify that all data have been correctly migrated and double-written to the new ent/assoc. They then turn off double-writing to begin deprecating the old assoc. This can be done in an A/B test to ensure no major breakages.
Once the double-writing is fully turned off, the engineer checks one last time that all the data have been migrated. They check that the read/write rate to the old ent/assoc is 0, which then confirms that the old field is now safe to delete.
Overall, the trick is to think of the migration as a series of safe, backwards-compatible, easily-distributed steps, rather than a single atomic operation.
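That last gate is conceptually simple; something like the following (invented names, and in practice it's a metrics dashboard rather than one function):

```python
# Sketch of the final safety gate before deleting the old field/assoc.

def safe_to_delete(read_rate: float, write_rate: float,
                   unmigrated_rows: int) -> bool:
    # Delete only once nothing reads or writes the old data and
    # every row has been copied to the new field/assoc.
    return read_rate == 0.0 and write_rate == 0.0 and unmigrated_rows == 0
```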
Facebook started, like any other web app, with a relational database used as a relational database (FKs, indexes, all that stuff). They built tooling around it and acquired a great deal of knowledge about running MySQL at scale. Switching to a different system is a huge investment with no real benefits. In short, they are using MySQL because of inertia, caused by the difficulty of running such a system at this scale.
> Switching to a different system is a huge investment with no real benefits
But there would be "real benefits": there is significant performance overhead, especially at Facebook's scale, in using a general-purpose RDBMS compared to something written specifically for their use case. If a different storage system had even, say, 10% better performance overall, that would translate to a 10% reduction in hardware costs, which at Facebook's scale is easily tens of millions of dollars per year.
The first thing that comes to mind is RocksDB - of course, I then remembered that Facebook does use RocksDB, via MyRocks (which combines RocksDB with MySQL - though the MyRocks website and docs don't clearly explain exactly how and where MyRocks sits in relation to application servers, MySQL, or the different MySQL storage engines...).
> But there would be "real benefits": there is significant performance overhead, especially at Facebook's scale, in using a general-purpose RDBMS compared to something written specifically for their use case. If a different storage system had even, say, 10% better performance overall, that would translate to a 10% reduction in hardware costs, which at Facebook's scale is easily tens of millions of dollars per year.
And how would you know that? What did you base those numbers on? I find that hyperbole puts me off discussions of this kind.
Of note is that MyRocks halved the storage requirements for the user DB tier. That's a pretty big win.
As for why MySQL persists: I no longer work at FB, so I'm not up to speed on the current thinking, but one thing to remember is that TAO isn't the only thing that talks to MySQL. The web tier itself still has (or had) data types that were backed by MySQL and memcache. Lots of other systems for data processing, configuration, and so on store data in MySQL. Replacing all of that would be a huge undertaking. I doubt there's a 10% win lying around just waiting to be taken with all this work, especially after MyRocks rolls out.