Migrating Facebook to MySQL 8.0

takeda · on July 22, 2021

Perhaps it's due to me being in the PostgreSQL camp, but what happened to MariaDB? I was convinced that's where MySQL developers went and that's where development proceeded, while MySQL was just half dead, because Oracle did not care about it.

lucideer · on July 22, 2021

That was the expectation but the Oracle camp surprised everyone by actually giving it some attention, and meanwhile MariaDB's promise of being a "100% compatible drop-in replacement" came with more and more caveats with each release (as is to be entirely expected with any fork).

When you consider the network effect of all the stacks already heavily invested in MySQL, all Oracle really needed to do was put in a modicum of effort to MySQL to stave off the MariaDB migrations (and from my understanding they've put in quite a bit more than a modicum).

Lastly, Postgres is an interesting player. Migrating from MySQL to Postgres might be a scarier prospect than to MariaDB, but since the Oracle acquisition a lot of MySQL-veterans were looking for something non-MySQL for new projects and in that I get the sense Postgres beat out MariaDB.

munk-a · on July 22, 2021

I carried out a (quite ill-advised in retrospect) migration from MySQL to Postgres on a mature codebase with absolutely no automated test coverage and it went pretty smoothly. We had some bumpy performance directly after the production migration but we were able to reach parity after about two months of observing and adding targeted indices.

I absolutely adore Postgres's tweaks to the SQL language, that dialect is amazing and has served us extremely well. And, since we passed pg10 a while back, the performance tuning you can do on it is pretty amazing.

twic · on July 22, 2021

> I absolutely adore Postgres's tweaks to the SQL language

Do you mean additions to the SQL standard, or bits of the standard that MySQL doesn't implement, or something else?

The reason i ask is that i also enjoy writing SQL for PostgreSQL, but as far as i know, i am sticking to standard SQL. Perhaps there are things i'm missing, or things i like which i haven't realised are nonstandard!

felixfbecker · on July 22, 2021

There is definitely stuff that is non-standard, like the casting with :: in addition to CAST AS.

But it's true that PSQL sticks very closely to the standard or only adds on top of it, while MySQL is very uncomformant.

Funnily enough, I actually first used Postgres over MySQL a long time ago when I moved from PHP to NodeJS, and was annoyed that MySQL uses backticks for identifiers while PSQL uses SQL standard double quotes, because in JS multi-line strings (for SQL queries) are also use backticks and I was annoyed by escaping. Sometimes it's the little things...

petergeoghegan · on July 22, 2021

It is practically essential for any RDBMS to provide non-standard statements and features. For example, even CREATE INDEX isn't in the SQL standard, since the SQL standard deliberately avoids talking about what it considers to be implementation details. This is probably a good goal in many ways. But what counts as an implementation detail is open to interpretation.

You could say the same thing about the isolation levels. They are supposed to be platonic ideals, free from any implementation baggage. But the reality is that you can tell that the people that originally defined how they work were mostly (perhaps entirely) thinking about old school two-phase locking.

I'm not a critic of the standard -- it's imperfections (which are arguably contradictions) reflect real world differences that are hard or impossible to resolve. Just pointing out that this is how it is.

munk-a · on July 22, 2021

Oh, it's totally fair to criticize the standard. Those clowns lock their deliberation and standard definitions behind huge paywalls.

The best way to learn the SQL standard is to read the Postgres docs since they clearly call out non-compliances - there is no other easy to access and clear explanation of the SQL standard.

dzolob · on July 23, 2021

Any “platonic ideals” mention outside its natural scope deserves attention.

dmitryminkovsky · on July 23, 2021

You don't think the backticks lend a nice old school vibe?

paulddraper · on July 22, 2021

> as far as i know, i am sticking to standard SQL

PostgreSQL documentation is very good about comparing each feature to the standard.

For example, the SQL standard specifies that triggers fire in the historical order they are added to the table.

PostgreSQL chose to fire them alphabetically.

Technically nonconformant, but in practical terms, far better.

dijit · on July 22, 2021

Also the c++ bindings are very mature.

This was surprising for my old tech lead who was sheepish on moving away from MySQL.

He would later refer to the MySQL bindings as “brain dead”

ipaddr · on July 22, 2021

Why would alphabetically be better? Wouldn't you want first in first out?

setr · on July 23, 2021

Alphabetical makes it so two databases with the same triggers have the same behavior. FIFO breaks that, because now it depends on trigger construction order (eg when dump/restoring) and is totally invisible to the user.

Additionally, modifying a trigger by drop/creating it would change behavior, unless you also drop/create every trigger (in the original order)

ipaddr · on July 23, 2021

Thank you most important thing I learned today.

zaphirplane · on July 24, 2021

What is an example where the order of execution is required, not contrived or bad practise. Looking for a good use case

setr · on July 24, 2021

Easiest I can imagine is a user-facing id that gets incremented by trigger (e.g. select max(col)+1 — autoincrement/serial doesn’t guarantee contiguity, or even reasonable increments…).

And then any second trigger that relies on that value

paulddraper · on July 24, 2021

Oh, I think it's probably bad practice to knowingly depend on that.

But I also think that it's unnecessarily problematic to make the order non-deterministic.

Pamar · on July 23, 2021

Not only what @setr answered but consider also that by having a consistent ruling you can actually design or alter the order of triggers just by naming them according to a specific scheme.

It just makes easier to understand what happens when triggers are invoked, without having to actually force specific cases by direct experimentation.

I did not even know that the standard prescribed a FIFO order, looks like a pretty bad idea to me, actually.

jolux · on July 22, 2021

It means that the order they fire is imperative and determined by how you run your migrations.

paulddraper · on July 23, 2021

The problem is that it doesn't leave you a great way to adjust order. It's just kinda weird.

(To be fair, table columns share this characteristic.)

marcosdumay · on July 23, 2021

> To be fair, table columns share this characteristic.

The table columns order is meaningless, but the triggers order is very relevant.

paulddraper · on July 24, 2021

Table column order is not meaningless.

``` INSERT INTO foo SELECT * FROM bar ```

The harder problem however, is order in composite types.

marcosdumay · on July 24, 2021

Yeah, I do really recommend people not to do that sort of thing when they ask me.

throwaway2037 · on July 23, 2021

Why was this down-voted? It is a good question. I thought exactly the same myself. And the reply from @setr is helpful. (Hat tip) I never thought about the impact of dump/restore!

gunapologist99 · on July 22, 2021

If you use any of the really amazing JSON or Array stuff, you're stepping outside of the standard.. but it's worth it!

dkersten · on July 22, 2021

I absolutely love Postgres' JSON and array support. Its saved me from needing a document database in a few cases, which drastically simplified the overall deployment (one DB instead of two, or trying to represent relational stuff in a document DB). I really like JSON for stuff that is user defined, that I don't need to query into, but postgres' support means I still can should I ever need to. Which has happened once or twice for debugging.

1propionyl · on July 23, 2021

100%. I've dealt with many small-to-medium sized projects where the usual approach would have been some form of EAV mess.

Just using a BSON column for such attributes hits the Pareto 80/20 sweet spot, _and_ I can query against them much more easily than in an EAV schema (far fewer joins).

They're great for when the schema is mostly normalized, but there's just a little bit of denormalized stuff that doesn't warrant completely architecting the schema around it.

fn1 · on July 23, 2021

> but postgres' support means I still can should I ever need to.

And since postgres supports indices on arbitrary expressions you can use CREATE INDEX for the json-queries in your WHERE:

CREATE INDEX ON publishers((info->>'name'));

dkersten · on July 23, 2021

Yes, Postgres' jsonb and array support lets you not only query into these fields, but also, like you say, index into them too. Its really the best of both worlds: you get a great solid relational database, but if you need document database features, you don't really lose out on anything by using json, except that the queries are a bit more ugly. I'm super happy with postgres these days, its rock solid and full of fantastic features.

mdaniel · on July 23, 2021

The word "any" there is mildly misleading about the JSON things, since there is now SQL/JSON which PG supports: https://www.postgresql.org/docs/13/functions-json.html#FUNCT...

It would be horribly painful to stick to _just_ SQL/JSON for manipulations, and I have absolutely zero idea what the coverage is in other platforms, but portability isn't completely off the table like it used to be

eatonphil · on July 22, 2021

MySQL does not conform to the standard in a number of ways. On the one hand, thankfully you can turn on ANSI MODE. On the other hand if you're using MySQL you've probably got a bunch of devs used to their non-ANSI SQL and turning on ANSI MODE may break stuff they do.

https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sql-mo...

munk-a · on July 22, 2021

Personally, back before JSON became a standard thing - I made heavy use of HSTORE to clean up some insanely cludgey string packing BS we'd been doing - leveraging recursive CTEs using the UNION definition allowed us to move a fixed tier system into a self-referential one that saved us a lot of headaches as well.

Other folks have mentioned :: instead of CAST AS (which is really nice when you need to write it a bunch of times) but it's mostly the diverse built-in function set and constraint support that keeps me an enthusiastic Postgres supporter.

jacobmischka · on July 22, 2021

I assume they mean non-standard additions in PostgreSQL. A lot of the very handy pg functions are non-standard, as well as the rich text matching capabilities, many data types, and even the RETURNING clause, I believe.

tapoxi · on July 22, 2021

We also did a MySQL 5.7 to PG 9.6 migration (thanks to https://pgloader.io/) and we got some pretty immediate speed improvements (2-5x).

We shot ourselves in the foot a few weeks later with some poorly written RLS policies, but it was overall a smooth experience.

dijit · on July 22, 2021

pgloader is great.

Sadly for me when I tried to make a PoC of Postgres for our eCommerce platform I found that we had violated our constraints and lost data.

That was the day I went from MySQL agnostic to MySQL hating.

It largely doesn’t matter that the defaults could be configured to be more strict. We live in a world where people are trying to avoid hiring sysadmin and that database was set up by a developer. It should not have been a default.

evanelias · on July 22, 2021

> It largely doesn’t matter that the defaults could be configured to be more strict. We live in a world where people are trying to avoid hiring sysadmin and that database was set up by a developer. It should not have been a default.

Everyone agrees with this. That's why they changed the default nearly 6 years ago.

Yes, the old default was terrible. But there's no way to change the past, so why complain about it for years and years?

All versions with this bad default have hit end-of-life, what more can be done?

lucideer · on July 22, 2021

I am very much an example of one of those previously MySQL devs that now uses Postgres for new projects, though I've never done a migration between the two.

code_duck · on July 22, 2021

I had an app with a fairly complex and large set of tables and data and migrated it from mySQL to pg with very little trouble. It was a few years back, but from what I recall a re-wrote a few queries in my data layer and that’s about it. Much easier than it was to setup nginx to take over some Apache duties.

sltkr · on July 22, 2021

I was curious so I looked up the differences here:

https://mariadb.com/kb/en/mariadb-vs-mysql-compatibility/

It doesn't sound too bad. The incompabilities seem to be mostly stuff like not being able to generally use replication from MySQL to MariaDB or vice versa. But I would guess that's not a very common case anyway: most users are either developers that want their software to work with both MariaDB and MySQL (which can be achieved by sticking to the very large intersection of supported features), or they are users that want to switch from MySQL to MariaDB (which seems well-supported). Running a mix of MariaDB and MySQL servers and expecting to be able to replicate between them seems like a particularly uncommon setup (though maybe it's useful during a migration).

hellcow · on July 22, 2021

The issues I encountered while using MariaDB in place of MySQL were subtle -- most things worked as expected, which makes the issues so much more unexpected. But at the end of the day, the same migrations produced a different schema.

Arguably MariaDB made the correct decision to fix some MySQL 5.7 bugs that we were unwittingly relying upon, but the different behavior caused issues for us all the same.

Rather than worry about MySQL vs MariaDB and whether the one I need is packaged and available for my distro/OS of choice, I reach for Postgres on new projects.

PeterZaitsev · on July 22, 2021

I think this is a key difference - where MariaDB continues to evolve MySQL 5.6/5.7 codebase Oracle chose the "big leap" with MySQL 8 - introducing data dictionary and many other internal architecture changes. This is one of the reasons migration to MySQL 8 can be more painful than releases before it but for good reason

hodgesrm · on July 22, 2021

Binlog incompatibility makes migration a lot more difficult at scale.

tomxor · on July 22, 2021

> meanwhile MariaDB's promise of being a "100% compatible drop-in replacement" came with more and more caveats with each release

Anecdata: I've been using mariadb as a drop in mysql replacement for over 5 years in production, and it's been working seamlessly.

I'm no DB expert, and perhaps my use case isn't complex enough or written in a highly mysql dependent way - but I felt like someone should chime in since there's not much positivity toward mariadb in this thread so far. I'd be interested to hear from some other people with actual mariadb experience.

cknoxrun · on July 23, 2021

We migrated a high-traffic, complex site with a huge amount of data to MariaDB about 5 years ago as well, and it's been solid so far. We use Galera as well, which has performed and scaled really admirably.

Shorel · on July 23, 2021

I have a lot of stored procedures and functions that were converted from Oracle to MySQL, and they also work just fine in MariaDB.

I guess any actual incompatibility is in fringe cutting edge features, which is where both databases have diverged.

lucideer · on July 23, 2021

I think it's still pretty good. I ran into some issues with json support some years ago, but querying json may be an edge case (as opposed to storing text & creating your own indices)

skrebbel · on July 22, 2021

Anyone on the in-crowd who can share a bit about Oracle's motivations here?

I recall a Bryan Cantrill talk about how Oracle is this completely amoral machine that just wants to make money, and how that drives their every decision [0]. Is this no longer the case? Or is it, and if so, how does improving MySQL and giving it away for free making Oracle money? Or was it always hyperbole and Oracle is improving MySQL because they want to be nice?

[0] https://www.youtube.com/watch?v=-zRN7XLCRhc&t=2047s, that link starts at the beginning of what's popularly known as "the lawnmower rant", it's delightful.

zdragnar · on July 23, 2021

Oracle paid a lot of money to buy Sun, and basically ended up with two products- selling support for Java and MySQL. Letting MySQL rot would have been pretty bad for their pocket book (especially as others have noted that postgres has been cutting into their market heavily).

In Bryan's defense, the lawnmower rant is pretty true- they are a simple company. They focus heavily on making money and abandoning things they aren't in a good position to do so from.

Selling per-cpu-core licenses and support contracts, great! Experimenting on or converting small projects into new markets? Not great. Fleshing out OpenSolaris into a product that could rival linux? Dead end.

Maybe it could have worked out, but the lawnmower didn't care about a product that had too few users to make money from. OTOH, even Facebook uses MySQL... All they need to do is invest enough to maintain market share to get support contracts.

pjmlp · on July 23, 2021

> Experimenting on or converting small projects into new markets?

Bringing MaximeVM out of research labs into the market as GraalVM.

First RDMS to support Perl and Java as stored procedures.

Join venture with Sun for Network Computer thin clients.

> Fleshing out OpenSolaris into a product that could rival linux?

Every OpenSolaris sales is one sale less from Unbreakable Linux, so in a day and age where UNIX === Linux, a company like Oracle has chosen what makes more sense monetarly.

I doubt that IBM is getting lots of new AIX customers.

And in the end, except for a timid offer from IBM, no one cared to rescue Sun.

radicalbyte · on July 23, 2021

FANG would buy them if they had somehow managed to stumble forward to the mid-2010s.

pjmlp · on July 23, 2021

Zero offers from any of them, so.

radicalbyte · on July 30, 2021

I don't think that FB were in a position back then. I was very surprised at the time that Google didn't buy them. Microsoft would never had gotten it past the regulators.

takeda · on July 22, 2021

I guess I'm cynical, but to me looks like Oracle renewed interest in MySQL after PostgreSQL gained momentum. MySQL is not really a competitor to Oracle but PostgreSQL overlaps with both of them.

beberlei · on July 22, 2021

Oracle is selling Enterprise features and Support, i am sure they make a lot of money with MySQL.

kaba0 · on July 23, 2021

Oracle, compared to their image is quite technically adept — they did a splendid job with OpenJDK and they have plenty of long-term R&D projects like GraalVM.

skrebbel · on July 23, 2021

Yeah so, all of that seemed to happen after that Cantrill talk. How did Oracle suddenly turn from a pure license contract squeezing company into a tech leader again? I mean, they had to hire great engineering talent to do all this. And convince them to work for Oracle. What's the narrative I'm missing?

kaba0 · on July 23, 2021

I'm not familiar with the internal stucture of Oracle, just tired of the imo unfair bashing of them (at least compared to all other companies).

But I guess they must have had at least sane management that tried to keep the existing mindshare of acquired projects (as far as I know many people went from Sun to Oracle directly for example), as well as finding new talent. And also, the acquisitions made sense and were done at a comparatively good price so I guess they are good lawnmowers with long-term plans (but terrible PR)?

ksec · on July 23, 2021

>Yeah so, all of that seemed to happen after that Cantrill talk.

The talk was in 2011, Oracle purchased Sun in 2010. So basically their continuous work on OpenJDK has nothing to do with the talk at all.

zaphirplane · on July 24, 2021

I heard they were poaching whole teams with a massive pay bump

truffdog · on July 23, 2021

Mr. Cantrill might be engaging in hyperbole. Big companies are big though- lots of people in charge of lots of things, and often they wander off in different directions. It doesn't make sense to model their behavior as if directed by a single hive mind.

tonyedgecombe · on July 23, 2021

Companies do have a culture though, the reason Amazon warehouse employees have to piss in a bottle can be traced back to Bezos and his laser like focus on controlling costs.

kaba0 · on July 23, 2021

Big enough companies are the stereotypical paper clip AIs everyone fear, but with money. Anything seemingly moral action by them is simply a cost-analysis on how the public opinion will effect their profits.

brianwawok · on July 23, 2021

No it’s 100% still correct.

cerved · on July 23, 2021

likely support and keeping customers from migrating to postgres

and yes, Oracle is 110% focused on the bottom line. Preferably this quarter

pyrophane · on July 22, 2021

I like Postgres but still turn to MySQL for production use most of the time for a few reasons:

* Permission management on Postgres is much more painful. * The need for an external connection pooler makes postgres more annoying to set up. * Better quality docs on performance tuning MySQL.

brianwawok · on July 23, 2021

Do you really need a connection pooler for most PG installs? I run a SaaS company with a lot going on, and haven’t had to touch a pooler. Average about 500 open connections. That’s a few, but on a DB server with 200GB of memory it’s nots really breaking anything.

hans_castorp · on July 23, 2021

> * Permission management on Postgres is much more painful.

but way more flexible and powerful

I only work in the Java world where all application servers have the connection pooling built in, so I never had the need for an external connection pooler.

skeletal88 · on July 23, 2021

I joined a company where most stuff is done in Java and tried to explain to colleagues that sn external connection pool would ne good. They didn't know such things existed at all, and then refused because in their mind the application-specific intetnal pools were all they needed. Which was and is totally wrong - the application server internal pools, like HikariCP only share connections between threads inside the server. What if now we have 4 instances of the same server running (because of a deployment, where 2 new ones are spawned, started and the 2 old ones aren't taken down until the loadbalancer determines that the new o es are running ok. Now you need 2x the number of connections available than you would ususlly need. Once we had run out of connections when for some reason aws went crazy and started deploying too many instances of the application.

freedomben · on July 22, 2021

On Fedora/RHEL you get maria when you "install" mysql. I've never had something fail to "just work." That said I don't do anything crazy, so I'm sure there are cases where they don't work.

z3t4 · on July 22, 2021

When I started working with databases about 20 years ago talking to peers it was common to have a layer in-front of the database so that you could easily replace the database. It felt stupid until I actually had to do a database migration... So do you still put a layer infront of the database so that you can for example easily change from mySQL to Postgress? Given there are so many open source database I think there is less need to do so... Or do you go all in vendor lock in? Or some middle ground?

cubano · on July 23, 2021

I believe you are talking about the Microsoft ODBC API which was language and platform agnostic data acquisition layer. Early versions of it had pretty bad performance issues if I remember correctly, but eventually it became somewhat useful.

I believe that developers back in the 90's wanted a way to "drop in" various low and higher cost DBMS systems to their client-server based apps without the need of hiring full-time database sysops.

astrange · on July 22, 2021

That would be an ORM or something similar, I think. It's not realistic unless you're always testing with two databases from the start, because their implementation details leak through.

It can be useful if your program is simple enough to work with both MySQL and SQLite, but often you should just stick with one DB and get to use all its features.

davecanderson · on July 23, 2021

[ServiceStack OrmLite][1] does just that, all our development is done locally using SQLite but then deployments use MySQL. We do fall into the 'program is simple enough' category though.

[1]: https://github.com/ServiceStack/ServiceStack.OrmLite

ris · on July 24, 2021

You can indeed have that and many people do, but the problem with going that route is you can only use the lowest common denominator of features of any of the target databases, which wipes out a lot of the performance or convenience features you'd get from either.

(arguably SQL itself is/was supposed to be that common layer in the first place, but db vendors always want to be able to present users with new features that are not yet standardized)

scrapcode · on July 23, 2021

I can see the business of open source software support coming to play as well. I went from freelance development to more of a govt contracting role and just recently pivoted my career back towards tech. I never saw this side of the business but it is thriving from where I sit.

agumonkey · on July 22, 2021

sounds like their java strategy until recently

mathnode · on July 23, 2021

MariaDB is massive now.

mysql and MariaDB diverged from versions 5.6 and 10 respectively, although MariaDB can and does still merge some fixes from mysql; oracle don’t do that.

MariaDB and the original InnoDB developers are now doing heavy rewrites with big performance gains, replication options like multi-source, spider for sharding and galera are all built in and actively maintained.

Oracle attempted to extort anyone trying to licence a mysql based product(rip infobright), so infinidb and clustrix, now respectively called columnstore and xpand, have new homes with MariaDB and are core parts of the system. Online schema changes, easy horizontal scale (huge numbers that make db2 and oracle sweat), s3 backed tables, non blocking backups, flashback, versioned tables.

The list goes on, and many industry veterans from ibm, oracle, sybase etc are finding a home there and helping MariaDB grow.

There are new public developer resources and the connectors are feature rich and well loved. There have been a few free online conferences during the pandemic, and MariaDB now has its own place at fosdem not just bundled in with mysql.

If you go enterprise, there isn’t much out there as accessible as maxscale for transaction failover!

buchanmilne · on July 23, 2021

> replication options like multi-source

MySQL 5.7 supports multi-source replication: https://dev.mysql.com/doc/refman/5.7/en/replication-channels...

> spider for sharding and galera are all built in and actively maintained.

Galera is available in MySQL-compatible binaries from Percona (Percona-XtraDB-Cluster): https://www.percona.com/software/mysql-database/percona-xtra...

MySQL also has their own Paxos-compliant clustering solution (called "Innodb cluster"), first available in 5.7, but much better in 8.0:

https://dev.mysql.com/doc/refman/5.7/en/mysql-innodb-cluster...

I played with Spider in MariaDB 10.3, and it didn't satisfy the requirements we had, and it doesn't seem to have progressed significantly since then.

Yes, it's neat that MariaDB merged it and is keeping it working, but there don't seem to be many use cases where it ends up being significantly better than e.g. multi-source replication (the use cases where you're going to run out of disk, in most cases spider's un-implemented push-down join etc. make it too slow to be practical either).

> Online schema changes,

https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-op...

(including "instant column add")

> easy horizontal scale (huge numbers that make db2 and oracle sweat), s3 backed tables, non blocking backups,

MySQL has had "MySQL Enterprise Backup" for a long time. Percona wrote a similar implementation (XtraBackup), which works for MySQL, Percona, and MariaDB. AFAIK MariaDB copied XtraBackup as mariabackup (not criticising them for doing this, just pointing out that Percona did the original heavy lifting on this).

> flashback, versioned tables

Yep, versioned tables are a nice feature that I don't think are available in any other MySQL-like DB.

> If you go enterprise, there isn’t much out there as accessible as maxscale for transaction failover!

Not even ProxySQL ( https://proxysql.com/ )?

To be clear, I like MariaDB, but Oracle isn't doing a bad job on MySQL. MySQL 5.7 is pretty solid, and they have been improving defaults (while leaving the option to enable compat settings which are necessary when migrating a large deployment between versions), and MySQL 8.0 has some really nice improvements. For some of the best features from each, Percona (either standard Percona 8.0, which gives features like MariaDB's thread-pool, more comprehensive encryption including redo-logs etc., or XtraDB-Cluster to go with Galera instead of Innodb Cluster) offers quite a few benefits/improvements (and is contributing a lot of bugfixes/improvements upstream).

(My team runs quite a big installation, currently running Percona 5.7, and migrating to MariaDB probably won't be practical or justifiable considering replication compatibility which is a requirement to do no-downtime migrations)

unknownOrigin · on July 22, 2021

What happened to MariaDB? Not much, but unlike Netbeans (which is a complete disaster under Apache... a dying project maintained by a skeleton crew), it still has a commercial entity behind it, so it's trucking along relatively nicely. I mean, it's not like Oracle is investing tremendous amounts of time and money into MySQL either... there are still many, many things in the MySQL ecosystem that are second class citizens when compared to the big boys. Features like materialized views are yet to materialize etc.

pjmlp · on July 23, 2021

Well, when everyone and their dog is buying InteliJ licenses, most people that still care about doing FOSS work for Java IDE end up contributing to Eclipse instead.

Which is a pity, Netbeans is my favourite IDE and still has features hardly replicated in other Java IDEs, including the InteliJ Ultimate edition.

JetBrains would rather sell two licenses than allow for integrated development and debugging of native methods.

hans_castorp · on July 23, 2021

> but unlike Netbeans (which is a complete disaster under Apache

I disagree with this

PeterZaitsev · on July 22, 2021

I do not think there was the time when MySQL was quite dead under Oracle ownership. As far as I aware Engineering team in particularly have only been steadily growing.

It is fair to say Oracle did not invest in ease of Oracle->MySQL migration but they have been making a steady progress on making MySQL better database for modern applications

thayne · on July 23, 2021

MariaDB isn't even the only alternative. There's also Percona. From what I can tell all three have pretty decent followings and are actively developed, with some cross pollination.

Rapzid · on July 22, 2021

One thing of extreme value to come out of MariaDB is the JDBC driver that explicitly allows you to bundle it wherever you want.

MangoCoffee · on July 22, 2021

MariaDB is getting more corporate. it feel like Michael Widenius's MySql second act.

ksec · on July 22, 2021

MySQL 5.6 was released on 5 February 2013.

MySQL 8.0 was released on 19 April 2018.

>The 8.0 migration has taken a few years so far. We have converted many of our InnoDB replica sets to running entirely on 8.0.

At the scale of Facebook I wonder if they are the largest MySQL user on the planet.

And I take this opportunity to ask, does anyone know how does the MySQL roadmap works? What sort of features are coming or when is 9.0 expected to arrive.

tpetry · on July 22, 2021

So there has not been any major improvement in MySQL for 3 years? There hasn‘t been released any minor version? Can someone explain the current state more?

PeterZaitsev · on July 22, 2021

No. MySQL 8 changed its engineering approach with features released in minor releases.

In this regard it is like MacOS X or Windows 10, whatever you prefer where features are shipped without major version change

tpetry · on July 22, 2021

Sure, and this is great. But The newest release is 8.0.26, so there hasn‘t been any minor release yet - in 3 years. And i can‘t find anything about new MySQL features or planned features except the big 8.0 release.

fipar · on July 22, 2021

There have been several minor releases in the past 3 years: https://dev.mysql.com/doc/relnotes/mysql/8.0/en/

taywrobel · on July 22, 2021

In semantic versioning parlance those would all be considered "patch" releases, not a "minor" release. The minor version segment in a semantic version is the 2nd number, so most people would consider the next "minor release" for MySQL to be 8.1.0.

If following the guidance set out at https://semver.org, a patch release wouldn't add any new functionality, it would just address bugs in a backwards compatible way. A minor version would introduce new functionality in a way that doesn't violate backwards compatibility.

Quickly scanning the release notes for some of the MySQL 8 versions, they seem to be generally sticking to this; most of the versions fixed some bugs or unintuitive behavior, but it doesn't look like much new functionality has been introduced since 8.0.4, the last release candidate for the major version bump.

PeterZaitsev · on July 22, 2021

MySQL does not follow semantic version and there have been features introduced in minor released including some incompatible changes as you can see for example https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-24....

fastball · on July 23, 2021

That's still a patch release not a minor release.

bonzini · on July 23, 2021

If it does not follow semantic versions, it is just a third-level version number. It's like Linux's 2.6.x series.

Tostino · on July 22, 2021

Yeah, no thank you. I'll stick with Postgres.

evanelias · on July 22, 2021

Many major new features have been introduced in 8.0 point releases. Some examples:

* 8.0.12 added INSTANT algorithm for adding new columns to a table

* 8.0.13 added DEFAULT expressions (column default values may now be any arbitrary expression, not just a constant)

* 8.0.16 added CHECK constraints

* 8.0.17 added the CLONE plugin for rapid physical copying of a db

* 8.0.23 added INVISIBLE columns (excluded from SELECT *)

These are just a handful among many others... basically MySQL 8 does not follow SemVer at all, nor does it claim to. (Although personally as a tool developer in the mysql ecosystem, I selfishly wish they did!)

dijit · on July 22, 2021

That’s pretty scary, isn’t it?

evanelias · on July 22, 2021

No, since MySQL 8 doesn't ever claim to use SemVer. They have an explicitly stated policy of including features in point releases. Why would it be "pretty scary" that a non-SemVer software doesn't use SemVer?

SemVer is an arbitrary versioning scheme, not a universal standard. It's the operator's responsibility to understand the versioning scheme prior to upgrading.

throwdbaaway · on July 23, 2021

It is pretty scary because most of these new features or enhancements are buggy. By not adopting SemVer, users of MySQL who prefer stability (oxymoron?) have no obvious upgrade path. They could be happily using version 8.0.123, and then upgrade to version 8.0.126 to get some security fix, and suddenly encounter a bunch of functionality and/or performance regressions.

Some examples...

WL#10310 Redo log optimization:

Feb 2018, initial commit 6be2fa0bdbba, landed in 8.0.11 (Apr 2018)

Jun 2018, crash regression fix commit 270d18368650, landed in 8.0.13 (Oct 2018)

Apr 2019, performance regression fix commit 75c4f7161a56, landed in 8.0.18 (Oct 2019)

WL#5655 - InnoDB: Separate doublewrite file to ensure atomic writes:

Dec 2019, initial commit ce14ef911, landed in 8.0.20 (Apr 2020)

Feb 2020, data loss regression fix commit c1bc61dc7, landed in 8.0.20 (Apr 2020)

May 2020, stall regression fix commit 00b284707, landed in 8.0.21 (Jul 2020)

In comparison, Cassandra during the DataStax days was extremely buggy, but at least it tried to follow SemVer. So the operators can follow some simple guideline, e.g. >=2.0.14 is fine, >=2.1.17 is fine, >=2.2.9 is fine. Of course regression can still happen, but that would be an exceptional case.

kaba0 · on July 23, 2021

This is a misconception that new features introduce more bugs than eg. patches/minor version upgrades. I can only bring up the OpenJDK project as example but this sort of change happened there as well - what used to be JDK 8.xxx and everyone jumped to that without second thought is the exact same thing as JDK 9->10. It’s just management’s backwards thinking that somehow the former is safe to apply while the latter is not.

throwdbaaway · on July 23, 2021

There is certainly no hard and fast rule. In the JDK case, in terms of language design, my observation is that with Brian Goetz in charge, it doesn't matter too much whether it was the slow release cycle back in the pre-9 days, or the shorter one from JDK 9 onward -- whenever a new feature is introduced, all kinds of compatibilities are being maintained at all costs.

(Except Project Jigsaw of course, but I'd like to think that he got overruled by his boss)

As for bugs, I will just quote Gil Tene from https://www.theserverside.com/opinion/Dont-ever-put-a-non-Ja...

> Play with them on your laptop, but don't use a single feature, and wait for the LTS.

Personally I think the LTS model is great. Features get rolled out as soon as they are ready, and then get stabilized after being used by early adopters. Meanwhile, production can stay on LTS and continue to get just the necessary fixes.

evanelias · on July 23, 2021

In my experience, it is quite an exaggeration to claim that "most" post-GA 8.0 features or enhancements are buggy, especially in the general case. Yes, some of them may have edge-case bugs or performance issues, but typically only affecting some users with very specific environmental or workload situations.

Operators must test before upgrading. Luckily some third-party software makes this easier, e.g. Percona's pt-upgrade or ProxySQL's mirroring feature.

I'll state again unambiguously, I'm not personally fond of 8.0's release policy of rolling out new features post-GA. But I view it as an annoyance, and at least one with some upsides (not having to wait 2 years for a massive feature dump all at once) rather than being uniformly negative or "pretty scary". Personally I was careful about testing point release upgrades before 8.0, and I'm still careful about it now.

dijit · on July 22, 2021

I’m not saying semver is a universal standard but this is _wildly_ different to everything else.

Nobody expects defaults to change between patch/minor versions.

This is another example of how MySQL does not give a flying F about the footguns it leaves lying around.

Putting the onus on the user is not reasonable; especially in a world where we’re trying to reduce toil or get rid of ops completely.

evanelias · on July 22, 2021

In another subthread, you're (justifiably) unhappy with how they didn't change the strict-mode default soon enough; but here you're upset that some defaults changed too fast?

Also my comment above in this subthread wasn't even about defaults. I only mentioned some features added, and you responded saying it's scary and then talking about defaults?

You don't like MySQL, I get it. Regardless of reasons, I don't think your view will change, so why enter every HN thread about MySQL just to repetitively bash MySQL? What's the purpose of this?

dijit · on July 22, 2021

The purpose, mainly is to warn people of the pain I’ve suffered. My life should be a lesson here because honestly it is _painful_ to realise you have lost monetary data and have to go hat in hand to the CTO and CFO to explain why.

The strict mode setting should have been the default since the beginning. It is absolutely unthinkable that it wasn’t, the only justification I can think of is that:

a) it wasn’t built with that in mind and thus was experimental.

b) was not enabled to ease adoption (this is kinda evil, in my personal opinion because it teaches bad habits)

c) was considered a power-user feature, which is ridiculous.

What you’ve said just exemplifies the trend I’ve seen before: MySQL does not care about creating footguns, and people who have bought into the ecosystem like to exhalt that “you’re holding it wrong”- which is absolutely not a good warning.

If you listen to my rambling about the pain I’ve seen, and you still want to use it: at least it’s an informed decision. But pretending that all is well is not ideal, it doesn’t lead anyone to make better software or practices (MySQL) or better developers (because they don’t see footguns before they’ve sprung some self-inflicted wounds on themselves)

evanelias · on July 23, 2021

> The strict mode setting should have been the default since the beginning.

I absolutely agree with that. But what's the point in talking about a terrible default that changed nearly 6 years ago, literally over and over again, year after year?

> the only justification I can think of

The real answer is almost certainly "backwards compatibility". Same reason many things in Windows are non-ideal, for example.

Someone made a really bad technical decision early on, but fixing it overnight will break things for an utterly massive number of paying customers, so it can't be rectified without a slow transition plan. It happens.

Yes, it absolutely should have been fixed on a faster schedule regardless, given its importance. But just because it wasn't, doesn't mean that the entirety of the piece of software is hopelessly flawed and everyone involved in its development is an utter cretin. Especially when a piece of software is multiple decades in age and has gone through multiple corporate acquisitions.

> What you’ve said just exemplifies the trend I’ve seen before: MySQL does not care about creating footguns

What I've said? What footguns? I'm asking this honestly: what are you referring to? Let's recap this subthread:

* @taywrobel said "it doesn't look like much new functionality has been introduced since 8.0.4"

* I listed a number of new features introduced in 8.0 point releases, and explained that MySQL 8 is intentionally not doing SemVer

* You said "that's pretty scary" without ever elaborating on which specific thing I said you find scary

* I disagreed, given MySQL's stated written policy of not following SemVer

* You mentioned something about defaults changing (??? this wasn't even a topic I mentioned by that point in the subthread)

Now you're talking about footguns again, but the only one you're mentioning with any specificity was fixed many years ago.

> people who have bought into the ecosystem like to exhalt that “you’re holding it wrong”

Where has anyone said anything remotely like that in this subthread? Are you referring to my statement that the user should understand what's in an upgrade before upgrading? If so, I absolutely stand by that statement; it's utterly crazy to upgrade any database without even glancing at the release notes, let alone having a basic understanding of how the database vendor handles releases and versioning. This is operational basic practices 101, not about "holding it wrong".

And for what it's worth, "people who have bought into the ecosystem" includes a massive chunk of the S&P500 using MySQL as a primary data store. If it's as deficient as you claim, why aren't all these companies going out of business?

buchanmilne · on July 23, 2021

> Nobody expects defaults to change between patch/minor versions.

Can you provide an example of an application-side (IOW, not a DB-infrastructure-type variable) that has changed it's default in a patch version after GA in 5.7 or 8.0?

IIRC, all of the defaults changed in very early non-GA versions.

bonzini · on July 23, 2021

> _wildly_ different to everything else

Linux?

dijit · on July 23, 2021

Linux never changes backwards comparability. Linus is literally famous for saying as much.

thayne · on July 23, 2021

> Why would it be "pretty scary" that a non-SemVer software doesn't use SemVer?

Is there another level of patches below point releases for backported security patches, or are your options to upgrade to something that adds new features and may be backwards incompatible, leave your system vulnerable to known security wholes, or patch it yourself (or maybe depend on your distro to do itl

evanelias · on July 23, 2021

In practice, this is rarely a problem unless you're a cloud vendor with a DBaaS product; the most common concern there tends to be bugs that cause server crashes. But even that is fairly rare, and those platforms (AWS, GCP, Azure, Alibaba, etc) generally maintain custom patch sets anyway.

As far as I can recall, the vast majority of modern MySQL CVEs have been low severity and/or require the attacker to already be in your private network.

For sake of comparison -- does Postgres backport every security patch to every major.minor tree? And aren't HA Postgres upgrades generally more difficult than MySQL ones since logical replication is less widespread in the PG landscape?

anarazel · on July 23, 2021

> For sake of comparison -- does Postgres backport every security patch to every major.minor tree?

Unless the security fix in question is not applicable to some branch, yes.

https://www.postgresql.org/support/security/ https://www.postgresql.org/support/versioning/

buchanmilne · on July 23, 2021

> Is there another level of patches below point releases for backported security patches

No, because its unecessary (in my experience)

> or are your options to upgrade to something that adds new features and may be backwards incompatible

It won't be backwards incompatible, otherwise they would leave the change for a new minor release.

Why would you not want a new feature (e.g. the new "clone plugin") if it gives you value now without any migration required (because of incompatibilities that may arise if all new features are left for minor versions which are reserved for incompatibilities)?

thayne · on July 23, 2021

> No, because its unecessary (in my experience)

Assuming that is true, this sounds like a normalcy bias to me (https://en.wikipedia.org/wiki/Normalcy_bias).

> Why would you not want a new feature (e.g. the new "clone plugin")

Because features can introduce bugs or performance regressions. Sure you can say that's rare, but I still might not want to take that risk with some of the most critical infrastructure in my app.

umanwizard · on July 22, 2021

SemVer is a relatively recent proposal that the vast majority of software does not follow. Calling the three parts of a three-part version “major”, “minor” and “patch” is a SemVer thing, not a universal standard.

JimDabell · on July 23, 2021

SemVer did not introduce major / minor / patch, that was standard usage in free and open source software for decades beforehand.

Example from 20 years ago:

> The most popular version numbering scheme stems from the <major>.<minor>.<patchlevel> scheme

— https://ask.slashdot.org/story/01/01/05/0054230/version-numb...

Example from 1997:

> Python versions are numbered A.B.C or A.B. A is the major version number -- it is only incremented for major changes in functionality or source structure. B is the minor version number, incremented for less earth-shattering changes to a release. C is the patchlevel -- it is incremented for each new patch release.

— https://web.archive.org/web/19970501012343/http://www.python...

wongarsu · on July 23, 2021

There's also Windows with it's A.B.C.D or major.minor.patch.build version scheme. Not sure how far back that goes, but it clearly predates semver as well.

umanwizard · on July 23, 2021

Fair enough, you’re right that it’s more widespread than just SemVer (though IMO still not an ironclad standard)

101011 · on July 23, 2021

Worth calling out here that SemVer has been around for a decade now, so "recent" is relative

rtsil · on July 23, 2021

For some of us, that is recent.

mumblemumble · on July 23, 2021

I'm coming to realize that semver is a lot like scrum. Everyone knows about it, lots of people like it, plenty of people want to use it, nobody seems able to move past arguing about how it's supposed to work and actually get a working implementation off the ground.

unethical_ban · on July 22, 2021

Sure, but you're assuming they follow that standard simply because they have x.y.z.

buchanmilne · on July 23, 2021

MySQL GA dates: 5.5 3 December 2010 5.6 5 February 2013 5.7 21 October 2015 8.0 19 April 2018

The average seems to be about 2.5 years between minor releases, so this doesn't seem unusual.

Usually new minor versions are only released if there are: * Changes to database behaviour defaults (e.g. default sql mode) * Deprecation or removal of features * Compatibility changes (changes to replication formats)

New features are fine, as long as they don't introduce regressions in other areas.

Even MySQL 5.7 received some good improvements in the past 18-24 months (not just bug fixes).

I feel this is a really pedantic complaint from someone who either doesn't use MySQL, or wants to hate it for a silly reason.

PeterZaitsev · on July 22, 2021

There are new features introduced in 8.0.26 as in .25 .24 etc before that check out release notes.

MuffinFlavored · on July 22, 2021

> At the scale of Facebook I wonder if they are the largest MySQL user on the planet

I have to imagine they use other databases as well, right? We're much smaller than Facebook and we have some Mongo and some Postgres and a mix of Go/Rust/Scala/node.js.

I wonder how many boxes they have dedicated to be being MySQL machines. I wonder how big their tables are and how many writes per second they see during the day.

hodgesrm · on July 23, 2021

Their scale is enormous. I first heard the phrase "thundering herds" in a Facebook MySQL talk. (Thundering herds refers to what happens to your infra when something big happens in real life that lights up social networking/search/etc.)

MuffinFlavored · on July 23, 2021

Let's try to put a guess in numbers.

> With roughly 2.85 billion monthly active users as of the first quarter of 2021

> 1.8 billion of Facebook users (66%) use the app on a daily basis

Say 1 database could store the data + handle the reads/writes for 100,000 users (literally just guessing)

2.85 billion monthly active users / 100k users per server = 28,500 servers/"instances" for MySQL

That's not even counting Kubernetes pods (if they use that) running Docker containers (if they use that) for their API / content serving.

I wonder what their average transactions per second looks like.

For Twitter:

> Every second, on average, around 6,000 tweets are tweeted on Twitter, which corresponds to over 350,000 tweets sent per minute, 500 million tweets per day and around 200 billion tweets per year.

> As of the first quarter of 2019, Twitter averaged 330 million monthly active users

Twitter has 11.5% the MAU of Facebook (330m of 2.85b)? I don't know if 1 tweet can be extrapolated to 1 Facebook like/comment/share/post but if it did, 6k tweets per sec * 8.7x the user base = 52.2k writes per second?

Found this:

> Facebook now has about 30,000 servers supporting its operations, hosts 80 billion photos, and serves up more than 600,000 photos to its users every second.

packetslave · on July 23, 2021

Kubernetes, no. Twine (aka "Don't call it Tupperware"), yes

https://engineering.fb.com/2019/06/06/data-center-engineerin...

Also, that "30,000 servers" number is off by... a lot. No comment on by how many orders of magnitude. :-)

dweekly · on July 23, 2021

Yep. Here's how I'd napkin math it.

Facebook has purchased 6GW of renewable energy to power its datacenters [1]. Its datacenters have a PUE of about 1.06 last they published (they unfortunately seem to have taken their dashboards down) so that's about 5.6GW of electricity actually powering real load. Some of that will power the networking fabric so we could conservatively estimate 5GW of compute load. The OCP Yosemite v3 platform power budget is a max of 1.5kW [2]. Also conservatively, let's assume a real average 1kW load. I'll leave the division as an exercise for the reader to estimate the number of servers they operate.

1 https://www.google.com/amp/s/www.euronews.com/green/amp/2021...

2 https://www.opencompute.org/documents/ocp-yosemite-v3-platfo...

MuffinFlavored · on July 23, 2021

5 Gigawatts to Kilowatts = 5000000

5000000 kilowatts / 1kW per machine = 5,000,000 machines internationally?

dweekly · on July 25, 2021

Yes, and that's a conservative lower bound (their web application servers likely draw much less than a kilowatt each and constitute a healthy portion of the fleet).

MuffinFlavored · on July 25, 2021

Global warming wise, should society really be focusing on using paper straws or should Facebook aim to make their server code 10% more efficient?

dweekly · on July 25, 2021

The good news is that electricity still costs a lot, even at scale, so there is lots of financial incentive for hyperscalars to make their datacenters and computation maximally efficient. To wit: there are a lot of people at Facebook (and Google, and Amazon, and Microsoft, and Baidu....) who work on making their server code more efficient. Small changes can result in millions of dollars a year in savings (in electricity cost, infrastructure cost, and opportunity cost in not needing to procure more servers).

daniel_iversen · on July 23, 2021

We used MySQL at Dropbox too, and they probably still do.

Thaxll · on July 22, 2021

Youtube also use MySQL.

gresrun · on July 22, 2021

YouTube used to use a MySQL-based system called Vitess[0]. It no longer uses it and has migrated to Spanner.

[0]: https://vitess.io/

Thaxll · on July 22, 2021

Thanks for the update! I found more details there: https://opensource.google/projects/vitess

> How Google uses Vitess

> Vitess was serving all YouTube database traffic from 2011 to 2019.

wikibob · on July 22, 2021

Ah this is interesting. Do you have any references on their migration to Spanner?

buro9 · on July 22, 2021

gresrun works at YouTube, if by reference you mean citation then his comment is it.

jjnoakes · on July 22, 2021

wikibob was probably looking more for public documentation about the why and how rather than confirmation that the statement is true.

wikibob · on July 23, 2021

Yep, I love watching conference talks or reading Twitter threads, papers, etc about these kind of migrations.

owaislone · on July 22, 2021

Probably because Youtube was using MySQL before being acquired and switched to a better, probably fully managed DB that was much easier (free?) to maintain and scale.

annexrichmond · on July 22, 2021

was that migration a technical decision or a political one?

haimez · on July 23, 2021

This is one of those scenarios where the answer is likely to be “both”, but I doubt you’ll get anyone discussing YouTube internal architecture in enough detail to be satisfying. Just look up what Vitesse is and compare what that implies in terms of operational complexity to using something like spanner if you want a straw man technical rationale.

dekhn · on July 22, 2021

google internally has always had a split mind about mysql. It never really jibed with the underlying google architecture but played an important role in multiple products (at least partly due to its replication features). Replacing it completely has turned out to be a huge challenge. I gained a fair amount of respect for mysql after seeing it applied at scale, but it was also a bit scary.

dheera · on July 22, 2021

Wow. Why don't they use in-house Google databases, which are supposedly vastly more scalable than MySQL?

perfectspiral · on July 22, 2021

YouTube built Vitess, which lets you scale out MySQL writers. If I can scale out both my writers and my readers then what could be "vastly more scalable" than that?

ansible · on July 22, 2021

Spanner paper:

https://static.googleusercontent.com/media/research.google.c...

It is designed from the ground-up to be globally distributed. Some other databases like FoundationDB (now owned and killed? by Apple) and CockroachDB were designed to scale across multiple datacenters.

Correction: FoundationDB is alive as open-source again, yay!

svieira · on July 22, 2021

FoundationDB was re-released as open source a couple of years ago.

https://www.foundationdb.org/

imoverclocked · on July 22, 2021

AFAIK, FoundationDB is alive and well. What makes you think it's killed?

capableweb · on July 22, 2021

It was acquired by a huge company that sometimes kills products. Shortly after the acquisition, the downloads were removed from the website. It did go open source "recently" though which made it longer lasting, but I'd understand if someone would think it was dead by now.

- https://news.ycombinator.com/item?id=9259986 - Apple Acquires FoundationDB - March 24, 2015

- Comparison between March 25 and March 14, 2015 - https://web.archive.org/web/20150314231702/https://foundatio... (14th March) > https://web.archive.org/web/20150325003252/https://foundatio... (25th March)

WJW · on July 22, 2021

As far as I can tell, Vitess still has trouble with cross-server transactions and distributed locking. Presumably something "vastly more scalable" could work around this.

The Spanner team noticeably put out https://cloud.google.com/blog/products/databases/inside-clou... a few years back, which tl;drs that they managed to get a reliable enough network that partitions don't really happen anymore and so you can get both C _and_ A from the CAP theorem. I don't think that Vitess is quite at that level, especially since you can self-host it and thus the devs don't have such network-level uptime guarantees.

perfectspiral · on July 22, 2021

Are cross-server transactions scalable? Would anyone architect a system that did them en masse unless they absolutely 100% had to?

I am just speculating here but I doubt that YouTube migrated to Spanner so that they could do cross-server transactions. I would think consolidating the Ops burden into the GCP org and also serving as a trophy "customer" were higher on the list of reasons. But again, just speculating...

derekperkins · on July 23, 2021

Yeah, Google policy was to migrate everything to Spanner, not a technical decision.

> Are cross-server transactions scalable? Would anyone architect a system that did them en masse unless they absolutely 100% had to?

As a long time Vitess user and contributor/maintainer, this perfectly describes the issue. Minimal effort has been put into solving cross shard / 2PC transactions because a well designed system rarely needs it.

ksec · on July 22, 2021

I guess people from PlanetScale will know. Not sure if they are reading the thread.

jacquesm · on July 22, 2021

Because there is a substantial cost to such migrations. If it works don't fix it.

linspace · on July 22, 2021

Does anything "just work"? Everything is constantly changing, new features always in demand, more users, more data.

lazide · on July 22, 2021

And you need to support that for anything you have ON TOP of any migration work - which is huge, and also a similarly moving target

mastersummoner · on July 22, 2021

While "if it ain't broke, don't fix it" is often a perfectly valid approach, there are going to be times when an upgrade can result in significant performance improvements or fewer admin headaches.

myohmy · on July 22, 2021

If IT is on the value chain this seems insane. If IT is ancillary to the value chain then this is standard.

SQueeeeeL · on July 22, 2021

[flagged]

williamdclt · on July 22, 2021

I don't think the people behind Spanner are a bunch of 20yo throwing together a MongoDB

fowl2 · on July 22, 2021

GitHub also seems to use MySql, as recently as 2020[1]

[1] https://github.blog/2020-05-20-three-bugs-in-the-go-mysql-dr...

derekperkins · on July 23, 2021

Still do

_fnqu · on July 22, 2021

The NSA also uses MySQL for their project XKEYSTORE (digital surveillance)

pbreit · on July 23, 2021

Any idea what Facebooks stores in MySQL? Core user database?

kevinventullo · on July 23, 2021

To a first approximation, everything.

Most product engineers interact with it by querying TAO, which you can read about in various blog posts, but is basically a cache in front of MySQL (this is a vast oversimplification).

notacoward · on July 23, 2021

Hi, Kevin. :) As of the time I left, the bulk storage system I worked on - images, video, machine-learning datasets, backups of other things, etc. - held most of the bytes at Facebook. The MySQL/TAO complex had orders of magnitude more addressable objects, undoubtedly handled many times more requests and bytes in flight per second, but when it comes to bytes at rest they're not even number one.

kevinventullo · on July 24, 2021

Fair enough :)

notacoward · on July 23, 2021

Basically all the small stuff. Every user's identity, preferences, history, and connections. If you post text, that's there too. If you post media, that's somewhere else. Connections are the hard part, because it's a gigantic graph with the potential to make every query - e.g. to construct a user's timeline - hit vast numbers of servers. There's all sorts of special sauce to do with partitioning, caching, etc. to absorb as much as possible of that load before it hits the MySQL servers, but it's still a gigantic service even by Facebook standards.

dafty4 · on July 22, 2021

Thanks, HermanL and PradeepN, for sharing this migration story with HN!

You wrote:

"Despite all the hurdles in our migration path, we have already seen the benefits of running 8.0."

May I ask what are the main performance benefits you have measured/noticed?

alberth · on July 22, 2021

It's interesting to see FB migrating so much over to Oracle.

2 weeks ago was FB use of GraalVM (from Oracle). And now this.

https://news.ycombinator.com/item?id=27782475

hodgesrm · on July 22, 2021

The MySQL 8.0 upgrade is not a migration to Oracle. Facebook has run MySQL for many years at this point and, as the article describes, operates on a private fork. They have employed some of the leading engineers in the MySQL community to support their version of MySQL.

cranekam · on July 22, 2021

It's a stretch to claim this is a "migration". Facebook has used MySQL since its very early days — certainly since before Oracle acquired Sun and thus MySQL AB.

dcolkitt · on July 22, 2021

I guess one could dream of the remote possibility that Facebook outright acquires Oracle. It seems like a weird match, but it'd give FB a toehold in a lot of verticals that it currently has no presence in. Whatever criticisms you have of FB, it's at least a pretty open source friendly company.

rrdharan · on July 22, 2021

Facebook has no interest in infrastructure / enterprise revenue streams (Facebook for Work notwithstanding). See the history of Asana, Cloudera and Phabricator (and even Dropbox, from an employee/recruiting perspective, kinda).

amelius · on July 22, 2021

What would happen to Oracle's infamous legal department then?

munk-a · on July 22, 2021

Then they'd leverage social media oversharing to find more targets to litigate - it'd be a win-win! /s

aloisdg · on July 22, 2021

You mean pretty much the whole company? /s

dylan604 · on July 22, 2021

They'll start serving ads and sharing personal data of all of the people they sue?

rattray · on July 22, 2021

Fabulously lavish retirements?

maxk42 · on July 22, 2021

Oracle has a very complete and much cheaper cloud solution relative to AWS. I've begun migrating my own projects over to their cloud. The savings on egress costs alone pays dividends.

mr_overalls · on July 22, 2021

I find the comments in this article very accurate:

https://news.ycombinator.com/item?id=15160149

> ORA is the elephant's graveyard of software.

> Once something gets bought by them, you know it is done. Slowly, but surely.

> They perform a function akin to the maggots that destroy cadavers in nature. Part of the overall ecosystem.

> ORA stopped being a tech co a while ago, now it is a finance play. Use cash to buy a business for its locked in customers, gut it to squeeze max money out of it until last customer is gone. Rinse, repeat.

BossingAround · on July 22, 2021

And yet MySQL is very much alive..? Java has fully opensource implementations too, and has had innovations like GraalVM and Jakarta EE.

I think I'm too young to hate Oracle with a burning passion. I would be very scared of their sales/corporate team, but I mean, even having large contracts with Google or Amazon is difficult to say the least.

danpalmer · on July 22, 2021

MySQL is alive for companies already using MySQL, but would well informed companies be choosing it today? I'm not so sure.

I think you either want it to have more functionality and you choose Postgres, you want a particular performance profile and you pick MariaDB, or you want the enterprise support and you pick Oracle's database. Why you would pick Oracle-supported MySQL today I'm not sure.

threeseed · on July 22, 2021

> have more functionality and you choose Postgres

PostgreSQL still after all these years has a terrible story when it comes to HA and horizontal scalability. Nothing is built in or supported. And one of the only companies to provide something half decent i.e. Citus is now owned by Microsoft and so that project is now at risk.

At least MySQL has something built-in.

danpalmer · on July 22, 2021

So MariaDB then.

ericyan · on July 22, 2021

Easy to operate, perhaps?

With MySQL group replication, it is pretty easy to set up a HA cluster that allows you to sleep soundly. It can also handle thousands of connections without the need of a connection pooler.

danpalmer · on July 22, 2021

Sure, but MariaDB is the more obvious choice surely?

evanelias · on July 22, 2021

No, why do you think so?

Among the largest users, MySQL (or Percona Server) is chosen far more frequently than MariaDB.

GP mentioned Group Replication, which isn't even available in MariaDB.

kaba0 · on July 23, 2021

That one is more on MySQL, isn’t it? They were the ones that sort of stopped innovating for a time.

abledon · on July 22, 2021

didn't the new basecamp "Hey" email product choose to use MySQL?

danpalmer · on July 22, 2021

I think so, but they have been operating MySQL at scale for many products for about 20 years, so I can understand why they might not choose MariaDB (or Postgres, Oracle, etc).

VintageCool · on July 23, 2021

Most people seem to be using the Percona Server branch of MySQL.

danpalmer · on July 23, 2021

Good point, again this seems like another alternative that makes "stock" MySQL an option that there seems to be no good reason to choose.

adrianmsmith · on July 22, 2021

This is not so with Java. They have moved Java along a lot since they bought it; they released 1.8 finally, and have introduced lots of new features such as type inference since then.

MySQL is also going strong after the acquisition. Lots of new features.

voidfunc · on July 22, 2021

This is such a hilariously myopic HN take.

mr_overalls · on July 22, 2021

I work for a government agency in a rather poor flyover state. We ran Oracle databases and a few other products for multiple decades, until Oracle conducted an audit and decided to shake us down for everything we were worth. We migrated away from their products, but only after being forced into expensive multi-year subscriptions for products that we didn't need.

I will never forget the arrogance and greed of their legal/sales people. They are bloodsucking sociopaths.

abraae · on July 22, 2021

I'm no fan of oracle's audit approach but it sounds like you were expecting special leniency to not pay for their stuff - for decades - because you are a govt agency/ live in a low income place. Not how business works. If you're playing by the rules, no one can force you into any long term contracts you don't need.

myohmy · on July 22, 2021

I'm also a government Oracle dev. I have no way of knowing what I'm licensed for and they don't disable the features we're not licensed for. I literally have to talk to my director to talk to our account rep to figure out that I'm not allowed to use table partitioning.