
I’ve run into serious house-burning-down problems with MyRocks too. Simple recipe to crash MySQL in a way that is unrecoverable: run ALTER TABLE on a big table; it runs out of RAM, crashes, and refuses to restart, ever.

A bit of googling shows people have reported the same restart error several times on mailing lists and elsewhere. What good does reporting it to MariaDB or wherever do? Does FB notice? Seems not.

Here’s hoping someone at FB browses HN...

I don’t get why FB don’t have some fuzzing and chaos-monkey stress testing to find easy stability bugs :(




I am one of the creators of MyRocks at FB. We have a few common MySQL features/operations we don't use at FB. Notably:

1) Schema Changes by DDL (e.g. ALTER TABLE, CREATE INDEX)

2) Recovering primary instances without failover

We use our own open source tool OnlineSchemaChange to do schema changes (details: https://github.com/facebook/mysql-5.6/wiki/Schema-Changes), which is heavily optimized for MyRocks use cases, such as utilizing bulk loading for both primary and secondary keys. ALTER TABLE / CREATE INDEX support in MyRocks is limited and suboptimal -- it does not support Online/Instant DDL (so writes to the same table are blocked during ALTER), and it takes the non-bulk-loading path, trying to load the entire table in one transaction -- which may hit the row lock count limit or run out of memory. We have plans to improve the regular DDL paths in MyRocks in MySQL 8.0, including support for atomic, online and instant schema changes.
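
To sketch the difference (a rough illustration only, not OSC's actual implementation -- table and index names here are made up, and while the rocksdb_bulk_load* session variables are real MyRocks settings, check the wiki for your version's exact semantics), an OSC-style copy looks roughly like:

    -- build a shadow table with the desired schema
    CREATE TABLE t_new LIKE t;
    ALTER TABLE t_new ADD INDEX idx_foo (foo);

    -- enable MyRocks bulk loading for primary and secondary keys,
    -- instead of buffering the whole copy in one huge transaction
    SET SESSION rocksdb_bulk_load_allow_sk = 1;
    SET SESSION rocksdb_bulk_load = 1;

    -- copy rows in primary key order, in chunks, while replaying
    -- ongoing changes from the original table on the side
    INSERT INTO t_new SELECT * FROM t WHERE id BETWEEN 1 AND 100000;
    -- ...repeat for the remaining chunks...

    SET SESSION rocksdb_bulk_load = 0;

    -- atomically swap the tables once the copy has caught up
    RENAME TABLE t TO t_old, t_new TO t;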

I am also realizing that a lot of external MySQL users still don't have auto failover and try to recover primary instances if they go down. This means single-instance availability and recoverability are much more important for them. We set rocksdb_wal_recovery_mode=1 (kAbsoluteConsistency) by default in MyRocks, which actually degrades recoverability (a higher chance of refusing to start even when the instance could be recovered from the binlog). We're changing the default to 2 (kPointInTimeRecovery) so that it is more robust without relying on replicas for recovery.
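
If you want that behavior before the default changes, it's a one-line setting (sketch of a my.cnf entry; rocksdb_wal_recovery_mode is the real variable, but confirm the supported values for your build):

    [mysqld]
    # 1 = kAbsoluteConsistency: refuse to start on any sign of WAL
    #     corruption, even if the instance could otherwise be recovered
    # 2 = kPointInTimeRecovery: recover up to the last consistent point
    #     in the WAL instead of refusing to start
    rocksdb_wal_recovery_mode = 2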

It must have been a really bad experience to hit OOM because of 1) and then fail to restart because of 2). We have relationships with MariaDB and Percona, and will make the default behavior better for users.


Thanks for explaining this! Really appreciate that you joined in here.

We've been test-running our real-time DWH ETLs on MyRocks (and Postgres and Timescale and even InnoDB) to compare with our previous workhorse, TokuDB. We've chewed through CPU-years iterating over every switch and setting we can think of to find the optimum config for our workloads.

For example, we've found that MyRocks really slows down if you do a SELECT ... WHERE id IN (...) with too long a list of ids.
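
To give a concrete picture of the shape (table and column names made up, and the join rewrite is just one common workaround, not a benchmarked recommendation):

    -- noticeably slow on MyRocks once the IN list gets long
    SELECT * FROM events WHERE id IN (1, 7, 42 /* ...thousands more... */);

    -- one common rewrite: stage the ids and join instead
    CREATE TEMPORARY TABLE wanted_ids (id BIGINT PRIMARY KEY);
    -- (bulk-insert the ids here)
    SELECT e.* FROM events e JOIN wanted_ids w ON w.id = e.id;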

So we have lots of thoughts and data points on things my team has found easy, hard, painful, better, etc. I'd be happy to share them with you folks.

(FWIW we are moving from TokuDB to MyRocks now, with tweaks to how we do data retention and GDPR depersonalization and such.)

Ping me on willvarfar at google's freemail domain if that's useful!


That's a long-standing problem with MySQL: DDL statements (ALTER TABLE and such) were not atomic until version 8.0. This required some serious work on InnoDB; you can read the details at https://mysqlserverteam.com/atomic-ddl-in-mysql-8-0/


Likely your DB configuration is very different from what FB uses in production, so they have little incentive to investigate or fix it.


It's that the 'fix' is so unprofessional.

The problem is that the program runs out of RAM. The challenge is to write the data and metadata in such a way that a program crashing at any point for any reason is recoverable.

This is the basic promise of the D (Durability) in ACID, and people using MyRocks expect it.

Rather than actually making sure that MyRocks is durable, they simply slap on a 'max transaction rows' limit to make it unlikely that you run out of RAM. Instead, you just get an error and can't do things like ALTER TABLE or UPDATE on large tables.

Of course it's easy to run out of RAM despite these thresholds, and when you google the error messages, it's easy to find advice that leads you to raise the thresholds and even set a 'bulk load' flag that disables various checks you probably haven't investigated.
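
Concretely, the advice you land on looks something like this (paraphrased from memory; the variable names are real MyRocks settings, which is exactly what makes it so tempting):

    -- "fix" from the forums when a big ALTER/UPDATE blows up:
    SET SESSION rocksdb_max_row_locks = 1000000000;  -- just raise the limit
    SET SESSION rocksdb_bulk_load = 1;               -- skip row locking and
                                                     -- unique checks entirely
    -- ...then retry the statement and hope the box has enough RAM...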

The whole approach is wrongheaded!

A database that crashes should not be corrupt!!! Isn't this reliability 101? Why doesn't MyRocks have chaos monkey stress testing etc etc?

</exasperated ranting>


>A database that crashes should not be corrupt!!! Isn't this reliability 101? Why doesn't MyRocks have chaos monkey stress testing etc etc?

Because Facebook has little incentive to ensure that RocksDB works well in your use case. MyRocks was built for Facebook, and anything that Facebook doesn't do probably isn't particularly hardened. They aren't going to invest time in chaos-monkey stress testing on code paths they don't use. Things like single-instance durability might not be super important to them because they make it up with redundancy.

I remember being burned by something similar during the early days of Cassandra. I’m sure Cockroach has hit the same bugs.



