> RonDB provides Class 6 Availability, meaning its system is operational 99.9999...

ignoramous · on May 10, 2021

Such uptimes aren't in the realms of impossibility. They do remain very hard to design and engineer, however.

Famously, Amazon SLAs Route 53 with a 100% uptime (for its data plane) [0] (not sure if any other AWS service comes close). So, here's at least one KV store that one-ups RonDB.

[0] https://news.ycombinator.com/item?id=22361358

partiallattice · on May 10, 2021

That's an SLA where they expect to end up paying customers back for outages. They've done an admirable job with only a few global outages, but subsets of customers have experienced plenty of outages.

A single 5 min outage would blow through roughly a decade of SLO at 6 9s, which allows only about 31.5 seconds of downtime per year. As far as I'm aware, there does not exist a service that has been up for more than a decade with fewer than 5 min of downtime, and that definitely includes Route 53.
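For anyone who wants to check the error-budget arithmetic, here is a minimal sketch. The function names and the 365-day year are my own assumptions for illustration, not anything taken from an SLA document.

```python
# Sketch: how many years of downtime budget a single outage consumes.
# Assumption: a 365-day year; names here are illustrative only.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def budget_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability target of `nines` nines."""
    return SECONDS_PER_YEAR * 10.0 ** -nines


def years_of_budget_consumed(outage_seconds: float, nines: int) -> float:
    """Years of downtime budget used up by one outage of the given length."""
    return outage_seconds / budget_seconds_per_year(nines)


years = years_of_budget_consumed(outage_seconds=5 * 60, nines=6)
print(f"{years:.1f} years of six-nines budget")  # 9.5 years
```

So a 5 min outage eats about nine and a half years of budget at 6 9s.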

mikaelronstrom · on May 10, 2021

In order to achieve 6 9s you need two levels of replication: synchronous replication with instant failover, plus asynchronous replication to handle site failover. RonDB provides both. NDB is used in lots of telecom services where you can't make a phone call unless NDB is up. These services definitely can at times run for decades without downtime. RonDB is built on top of NDB.
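As a back-of-the-envelope illustration of why replication matters (this is a textbook approximation, not RonDB's or NDB's actual availability model): if replicas fail independently and failover is instant, both of which are strong assumptions, the service is down only when every replica is down at once. The function name and the 99.9% per-replica figure below are hypothetical.

```python
# Sketch: combined availability of N redundant replicas.
# Assumptions (not from the thread): failures are independent and
# failover is instantaneous, so the service is unavailable only when
# all replicas are unavailable at the same time.


def combined_availability(per_replica_availability: float, replicas: int) -> float:
    """Availability of a group where any single live replica can serve."""
    unavailability = 1.0 - per_replica_availability
    return 1.0 - unavailability ** replicas


single = 0.999  # hypothetical: each replica is up 99.9% of the time
for n in (1, 2, 3):
    print(f"{n} replica(s): {combined_availability(single, n):.9f}")
```

Under those assumptions two 99.9% replicas already reach about six nines on paper. Correlated failures (a whole site going away) break the independence assumption, which is where the second, asynchronous level of replication comes in.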

ignoramous · on May 10, 2021

Data durability and high availability are two different metrics.

iirc, S3 SLAs availability at 99.99%; durability at 99.999999999%, and 99.99999999999999% for cross-region replicated buckets.

mikaelronstrom · on May 10, 2021

Correct. Availability is the amount of time you are available to read and write data. Durability is the amount of time your data is not lost. The metrics mentioned in this post are about availability. Most cloud vendors provide SLAs of 99.95% availability. One problem to solve when working with a cloud vendor is that they need to upgrade their OS images every now and then, so to get the highest availability one must integrate with the cloud APIs that announce those changes.

jamesblonde · on May 10, 2021

RonDB is a fork of MySQL Cluster. Here's the reference for 99.999% uptime:

https://www.mysql.com/products/cluster/availability.html

Note that it is the dominant subscriber database in the telecom space. There is a high likelihood you are using it as part of a home location register or similar when you use your phone.

Here's a benchmark comparison with Redis (outperforms it on a single node):

https://www.logicalclocks.com/blog/ai-ml-needs-a-key-value-s...

Here's a YCSB benchmark where it beats all other well-known key-value stores (not reproducible, but all database vendors (except RonDB) have a DeWitt Clause):

https://www.slideshare.net/ocklin/mysql-ndb-cluster-80-sql-f...

https://www.slideshare.net/mikaelronstrom/ndb-cluster-80ycsb...

mikaelronstrom · on May 10, 2021

These numbers are based on NDB customer experiences from operating tens of thousands of NDB clusters for more than 10 years. Obviously, achieving 99.9999% uptime requires operational competence as well as the software. This is why we are building this operational competence to make those numbers accessible to anyone.

willvarfar · on May 10, 2021

RonDB was and is built by telco veterans. They are used to working in a world where meeting that kind of SLA is expected.

0xbadcafebee · on May 10, 2021

Six nines is completely doable if you have the money. Availability is more limited by cost than technical difficulty.

Also consider that availability just means "the service is still running". It may be practically unusable but still available. Always read the fine print.

(Actually their math is wrong: six nines is 31.5 seconds of downtime per year)
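A quick sketch of the nines-to-downtime conversion, for reference. It assumes a 365-day year, and the function name is made up for the example.

```python
# Sketch: allowed downtime per year for a given number of nines.
# Assumption: a 365-day year (31,536,000 seconds).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(nines: int) -> float:
    """Seconds of downtime per year permitted at `nines` nines of availability."""
    return SECONDS_PER_YEAR * 10.0 ** -nines


for nines in range(3, 7):
    seconds = downtime_seconds_per_year(nines)
    print(f"{nines} nines: {seconds:10.1f} s/year ({seconds / 60:.2f} min)")
```

Six nines comes out at about 31.5 seconds per year and five nines at about 5.3 minutes.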

mikaelronstrom · on May 10, 2021

Yep, 30 seconds is easier to remember, as is 5 minutes for 5 9s :)