Show HN: RonDB – fast key-value database in the cloud

jamesblonde · on May 9, 2021

RonDB is the 3rd fork of MySQL (after PerconaDB and MariaDB). However, it is a fork of MySQL Cluster (NDB cluster database engine), not InnoDB. And this time, it's the inventor of the storage engine that is doing the forking. Mikael Ronstrom invented NDB Cluster, a distributed in-memory database, when he was at Ericsson in the late '90s, and MySQL bought it off Ericsson in 2003 (or 2002?). Now, Mikael is working on a cloud-native distribution of NDB, RonDB, at Logical Clocks. NDB always had a reputation as high throughput, low latency, and difficult to configure/operate. RonDB is about fixing those flaws as a cloud native service.

ransom1538 · on May 10, 2021

Well I hope this time when they work on MySQL Cluster they add auto-sharding tables so we can increase writes. Because if they don't, I don't see the point. Sure, when I had to go to a colocation and plug in wires having an auto repairing MySQL Cluster made tons of sense. Now? I can spin up a large disk db machine in 4 minutes flat. We need a way to split db writes across VMs now. GCP and AWS just gave up on this problem. Hell, GCP doesn't even have a reader endpoint. But money is just sitting there for someone to invent.

mikaelronstrom · on May 10, 2021

RonDB can scale both the VM size and the number of VMs as online operations. The auto part requires that we support it in the managed version as well. Hopsworks has added auto-scaling to AI worker nodes, so auto-scaling in RonDB seems like a natural progression. Thx for the suggestion.

an_ko · on May 10, 2021

> world's fastest key value store

Where can I see the data for this comparison? A quick search only brings up benchmarks of this db against itself on different VM setups, and the site doesn't link such a thing from anywhere I've thought to look yet.

cperciva · on May 10, 2021

Indeed, I'd like to see a benchmark I can run against other key value stores for comparison rather than just taking the author's word that RonDB is faster than anything else.

mikaelronstrom · on May 10, 2021

YCSB is one benchmark that is commonly used by key-value stores. See http://mikaelronstrom.blogspot.com/2020/10/ycsb-disk-data-be..., http://mikaelronstrom.blogspot.com/2020/02/ndb-cluster-world...

mikaelronstrom · on May 10, 2021

We are integrating benchmarks into RonDB to make it easy to compare to other products. Currently Sysbench (a standard open source database benchmark) and DBT2 (an open source TPC-C variant) are available; we will add more internal and standard benchmarks later.

tirrex · on May 10, 2021

Too much marketing. Don’t get me wrong but I’m lost among “fast, scalable, low latency, high throughput” marketing paragraphs and images.

There are links to other websites (the company website, I guess), but unfortunately they are the same.

Please consider adding sections/pages about “how did you do it, benchmarks supporting your claim, what is different in your product compared to ndb” etc. If there are links to these somewhere, you may want to make it more visible in your front page.

jamesblonde · on May 10, 2021

Great points. Great claims require great evidence. We just haven't gotten around to building out the site yet.

mikaelronstrom · on May 10, 2021

The release notes of RonDB 21.04.0, found in the RonDB documentation at docs.rondb.com, list all the differences compared to NDB.

tirrex · on May 10, 2021

Thanks, I’ll take a look.

hopsworks · on May 9, 2021

Some sysbench benchmark results are here, comparing results on AWS, GCP, and Azure. Interesting to see AWS is better at lower load (due to better interrupt handling in VMs?), while GCP is best at the highest loads.

http://mikaelronstrom.blogspot.com/

partiallattice · on May 10, 2021

> RonDB provides Class 6 Availability, meaning its system is operational 99.9999% of the time, thus no more than 30 seconds of downtime per year. This ensures that RonDB is always available.

I get marketers stretch the truth all the time, but they can't possibly be serious.

ignoramous · on May 10, 2021

Such uptimes aren't in the realms of impossibility. They do remain very hard to design and engineer, however.

Famously, Amazon SLAs Route53 with a 100% uptime (for its data-plane) [0] (not sure if any other AWS service comes close). So, here's at least one KV store that one-ups RonDB.

[0] https://news.ycombinator.com/item?id=22361358

partiallattice · on May 10, 2021

That's an SLA where they expect to end up paying customers back for outages. They've done an admirable job with only a few global outages, but subsets of customers have experienced plenty of outages.

A single 5 min outage would blow through nearly a decade of SLO at 6 9s. As far as I'm aware, there does not exist a service that has been up for more than a decade with fewer than 5 min of downtime, and that definitely includes Route 53.

mikaelronstrom · on May 10, 2021

In order to achieve 6 9s you need two levels of replication: synchronous replication with instant failover, plus asynchronous replication to handle site failover. RonDB provides both. NDB is used in lots of telecom services where you can't make a phone call unless NDB is up. These services definitely can at times run for decades without downtime. RonDB is built on top of NDB.

ignoramous · on May 10, 2021

Data durability and high availability are two different metrics.

iirc, S3 SLAs availability at 99.99; durability at 99.999999999 and 99.99999999999999 for cross-region replicated buckets.

mikaelronstrom · on May 10, 2021

Correct, Availability is the amount of time you are available to read and write data. Durability is the amount of time your data is not lost. The metrics mentioned in this post are about availability. Most cloud vendors provide SLAs of 99.95% availability. One problem to solve when working with a cloud vendor is that they need to upgrade their OS images every now and then, so to get the highest availability one must integrate with the cloud APIs announcing those changes.

jamesblonde · on May 10, 2021

RonDB is a fork of MySQL Cluster. Here's the reference for 99.999% uptime:

https://www.mysql.com/products/cluster/availability.html

Note that it is the dominant subscriber database in the telecom space. There is a high likelihood you are using it as part of a home location registry or similar when you use your phone.

Here's a benchmark comparison with Redis (outperforms it on a single node):

https://www.logicalclocks.com/blog/ai-ml-needs-a-key-value-s...

Here's a YCSB benchmark where it beats all other well known key-value stores (not reproducible, but all database vendors (except RonDB) have a DeWitt Clause):

https://www.slideshare.net/ocklin/mysql-ndb-cluster-80-sql-f...

https://www.slideshare.net/mikaelronstrom/ndb-cluster-80ycsb...

mikaelronstrom · on May 10, 2021

These numbers are based on NDB customer experiences from operating tens of thousands of NDB clusters for more than 10 years. Obviously, achieving 99.9999% uptime requires operational competence as well as software that is capable of it. This is why we are building this operational competence to make those numbers accessible to anyone.

willvarfar · on May 10, 2021

RonDB was and is built by telco veterans. They are used to working in a world meeting that kind of SLA.

0xbadcafebee · on May 10, 2021

Six nines is completely doable if you have the money. Availability is more limited by cost than technical difficulty.

Also consider that availability just means "the service is still running". It may be practically unusable but still available. Always read the fine print.

(Actually their math is wrong: six nines is 31.5 seconds of downtime)
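
The arithmetic behind these figures is easy to check. Below is a minimal sketch, assuming a year of 365.25 days; the function names are made up for illustration. It gives roughly 31.6 seconds per year at six nines, roughly 5.3 minutes per year at five nines, and shows that a single 5 minute outage uses up about 9.5 years of a six-nines budget.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 s, averaging in leap years


def downtime_budget_seconds(nines: int) -> float:
    """Allowed downtime per year, in seconds, at an availability of `nines` nines."""
    return SECONDS_PER_YEAR * 10.0 ** -nines


def years_of_budget(outage_seconds: float, nines: int) -> float:
    """How many years of downtime budget a single outage uses up."""
    return outage_seconds / downtime_budget_seconds(nines)


if __name__ == "__main__":
    for nines in (4, 5, 6):
        print(f"{nines} nines: {downtime_budget_seconds(nines):.1f} s/year")
    print(f"5 min outage at 6 nines: {years_of_budget(300, 6):.1f} years of budget")
```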

mikaelronstrom · on May 10, 2021

Yep, 30 seconds is easier to remember as is 5 minutes for 5 9s :)

pella · on May 10, 2021

Best wishes!

IMHO: Next step https://jepsen.io/analyses test

jamesblonde · on May 10, 2021

Considering RonDB provides READ_COMMITTED isolation guarantees, there are not as many anomalies to worry about as with stronger isolation models, such as snapshot isolation or serializability.

The design philosophy of RonDB is that we push stronger isolation requirements up to higher levels of the stack using row level locks - shared/write locks. For example, HopsFS builds on RonDB and it provides POSIX-like file system guarantees (built on the weaker read committed guarantees provided by RonDB) by implementing locking algorithms:

https://www.usenix.org/conference/fast17/technical-sessions/...
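
To make the idea concrete, here is a toy sketch of a read-modify-write protected by an exclusive row lock on top of plain reads that take no lock. It is an in-memory illustration only: `ToyRowStore`, `update_with_row_lock`, and `run_locked_increments` are hypothetical names, and none of this is the RonDB, NDB, or HopsFS API.

```python
import threading


class ToyRowStore:
    """In-memory stand-in for a table; NOT the RonDB or NDB API."""

    def __init__(self):
        self._rows = {}
        self._row_locks = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, key):
        # Create the per-row lock on first use, under the guard lock.
        with self._guard:
            return self._row_locks.setdefault(key, threading.Lock())

    def read_committed(self, key):
        """Plain read: returns the latest written value and takes no lock."""
        return self._rows.get(key, 0)

    def update_with_row_lock(self, key, fn):
        """Read-modify-write under an exclusive row lock, so no update is lost."""
        with self._lock_for(key):
            self._rows[key] = fn(self._rows.get(key, 0))


def run_locked_increments(workers=8, per_worker=1000):
    """Increment one row from several threads and return the final value."""
    store = ToyRowStore()

    def work():
        for _ in range(per_worker):
            store.update_with_row_lock("counter", lambda v: v + 1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.read_committed("counter")
```

Because every increment holds the row's lock for the whole read-modify-write, the final value equals `workers * per_worker`. That is the kind of stricter guarantee the application layer adds when the store itself only promises read committed.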

judofyr · on May 10, 2021

For comparison to other databases it should be pointed out that RonDB appears to only support "read committed" concurrency mode: https://docs.rondb.com/rondb_concepts/#consistency-models. Most other "modern" databases these days (CockroachDB/Spanner/FoundationDB/Yugabyte/FaunaDB) are focusing on providing far stricter guarantees.

mikaelronstrom · on May 10, 2021

RonDB supports read-what-you-write consistency, which is actually more than any eventual consistency database provides. Thus when you have written something into RonDB you can trust that it is seen by you and others. RonDB provides row locking, which means that the application can provide a stricter guarantee if desired. Concurrency control and consistency in a database is too complex to handle here; for in-depth coverage of RonDB consistency, see https://docs.rondb.com/rondb_concepts/

brainless · on May 10, 2021

This blog post shares a little more information: https://www.logicalclocks.com/blog/rondb-21-04-0

And here are the docs: https://docs.rondb.com/

mdcallag · on May 10, 2021

How many nines will managed RonDB provide when running in public clouds?

mikaelronstrom · on May 10, 2021

For the moment managed RonDB will provide the availability of the cloud, but remember that the managed version is still in development. Reaching 6 9s requires two steps: 1) integrating with cloud APIs so that we know when the cloud provider will freeze the instances, and 2) providing global replication between cloud regions, with failover handling for it. As mentioned, reaching 6 9s requires both RonDB SW that is capable of it and the operational competence to actually deliver it. This is what we're aiming at: to make this availability reachable for normal users who don't have that operational competence themselves.

z3t4 · on May 9, 2021

What are you scarifying when going LATS? It sounds too good to be true.

mikaelronstrom · on May 10, 2021

Assuming sacrificing: The availability numbers require instant failover, which requires updating all replicas synchronously. Throughput and latency both come from using an asynchronous programming model which has been refined over the years. Today's new blog on this topic is here: http://mikaelronstrom.blogspot.com/2021/05/research-on-threa...

jamesblonde · on May 10, 2021

RonDB (and NDB) favors consistency over availability. But you can configure it to make it HA over availability zones in the cloud, with 1 replica per availability zone. NDB also has asynchronous geographical replication, so inter-region replication will come later.

quickthrower2 · on May 10, 2021

Sacrificing?

devoutsalsa · on May 10, 2021

I wonder how much it costs. I didn't see pricing info on the website.

EDIT: I wonder how much the managed version costs.

pella · on May 10, 2021

GPL License (?)

https://github.com/logicalclocks/rondb

mikaelronstrom · on May 10, 2021

It is based on MySQL NDB Cluster, which is GPL v2 licensed. Thus so are all changes made in RonDB.

sepbot · on May 10, 2021

It's "contact us", i.e. how much is it worth to you?

kokizzu2 · on May 10, 2021

are you sure the fastest? '__') i thought fastest is aerospike, or if you need sql capability: tarantool

jamesblonde · on May 10, 2021

In a soon-to-be published online feature store benchmark with batched read/writes, NDB (RonDB) had 40% higher throughput and 40% lower latency than Aerospike. Then, RonDB offers a SQL API - you can scale partition-pruned index scans linearly (they localize to a single shard). Obviously, index scans across the whole cluster don't scale so well, neither do full-table scans, but there are tricks to use them and make them scale for not-so-big-data (such as fully-replicated tables - replicated at all nodes in the cluster).
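
As a rough illustration of why partition-pruned scans scale while cluster-wide scans do not, here is a toy sketch with hash partitioning. The shard count, the CRC32-based placement, and the `ToyCluster` class are all assumptions for the example and say nothing about how RonDB actually places rows.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(partition_key) -> int:
    """Map a partition key to a shard with a stable hash (CRC32, for the toy)."""
    return zlib.crc32(str(partition_key).encode()) % NUM_SHARDS


class ToyCluster:
    def __init__(self):
        # One dict per shard: partition key -> list of rows.
        self.shards = [defaultdict(list) for _ in range(NUM_SHARDS)]

    def insert(self, partition_key, row):
        self.shards[shard_for(partition_key)][partition_key].append(row)

    def pruned_scan(self, partition_key):
        """Scan limited to one partition key: only one shard is touched."""
        shard = self.shards[shard_for(partition_key)]
        return list(shard.get(partition_key, [])), 1

    def full_scan(self, predicate):
        """Scan without a partition key: every shard has to be visited."""
        rows = [r for shard in self.shards
                for rs in shard.values() for r in rs if predicate(r)]
        return rows, NUM_SHARDS


if __name__ == "__main__":
    cluster = ToyCluster()
    for user_id in range(10):
        for n in range(3):
            cluster.insert(user_id, {"user": user_id, "event": n})
    rows, touched = cluster.pruned_scan(7)
    print(len(rows), "rows from", touched, "shard")
    rows, touched = cluster.full_scan(lambda r: r["event"] == 0)
    print(len(rows), "rows from", touched, "shards")
```

Adding shards leaves the cost of a pruned scan unchanged, since it still visits one shard, whereas a full scan visits every shard.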

svcrunch · on May 10, 2021

One of the features I like about Aerospike is the user defined functions (UDF), which are Lua functions that run directly in the cluster. Compared to pulling the data to my servers and performing the computation, UDFs are much more efficient.

Does RonDB have a similar capability?

Aerospike Community Edition is limited to 5TB per cluster, and I wonder if RonDB has a similar limitation?

Last, how does one compute RAM allocation per machine for RonDB? I assume it varies depending on the number of indexes that are defined? I assume you do not hold all data in memory, and you can take advantage of SSD or NVMe disks?

I'm asking because we utilize Aerospike heavily, and RonDB seems like an interesting alternative.

mikaelronstrom · on May 10, 2021

Regarding memory, a very quick formula (more details exist in docs.rondb.com and in blogs) is around 25 bytes of overhead per row, plus 15 bytes of overhead per primary key index, and an additional 10 bytes per row per ordered index. Non-indexed columns can be disk-based and thus use SSDs or NVMe drives or networked storage. This is decided when creating the table.
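
A back-of-the-envelope version of that formula could look like the sketch below. It assumes the 15 bytes of primary key index overhead applies once per row, and that the result covers a single copy of the data; the function and parameter names are made up for the example, and the docs remain the authority on real sizing.

```python
ROW_OVERHEAD_BYTES = 25            # per row, from the quick formula above
PRIMARY_KEY_OVERHEAD_BYTES = 15    # assumed to apply once per row
ORDERED_INDEX_OVERHEAD_BYTES = 10  # per row, per ordered index


def estimate_memory_bytes(rows: int, avg_in_memory_row_bytes: int,
                          ordered_indexes: int = 0) -> int:
    """Rough in-memory size of one copy of a table, in bytes.

    `avg_in_memory_row_bytes` is the average size of the columns kept in
    memory; disk-based columns are not counted here.
    """
    per_row = (avg_in_memory_row_bytes
               + ROW_OVERHEAD_BYTES
               + PRIMARY_KEY_OVERHEAD_BYTES
               + ORDERED_INDEX_OVERHEAD_BYTES * ordered_indexes)
    return rows * per_row


if __name__ == "__main__":
    size = estimate_memory_bytes(rows=100_000_000, avg_in_memory_row_bytes=100,
                                 ordered_indexes=1)
    print(f"{size / 1e9:.1f} GB for one copy of the data")
```

For example, 100 million rows of 100 in-memory bytes each with one ordered index come to 150 bytes per row, or about 15 GB for one copy of the data.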

mikaelronstrom · on May 10, 2021

RonDB has an interpreter that can execute a set of simple things. It is mostly used to push down filtering and increments/decrements. There is also a pushdown join processor in it. It wouldn't be very hard to build more functionality into the interpreter. The interpreted programs are created by the NDB API and executed by the data owner. Thus the intermediate parts, like the transaction handler, have no idea what they are passing along.

mikaelronstrom · on May 10, 2021

RonDB Community is GPL v2 and there are no limitations on its use. Our business model is to provide the managed service of operating RonDB and providing support for that.

svcrunch · on May 10, 2021

Thanks for taking the time to answer. Is there a way I can contact you offline to discuss estimated pricing for a specific scenario?

jamesblonde · on May 11, 2021

info at logicalclocks dot com

kokizzu2 · on May 10, 2021

oh batched '__') that makes sense.. clickhouse also fast if batched, but not fast when stormed with lots of small requests

mikaelronstrom · on May 10, 2021

Yep, this is a key difference between traditional SQL databases and key-value databases. SQL databases optimise specific queries and have a high overhead per query. Key-value databases have a low overhead per query and optimise on flows of queries instead of on a single query. RonDB is a key-value database with SQL capabilities, so has a bit of both.