ToyDB: Distributed SQL Database in Rust

_6pvr · on July 18, 2021

https://github.com/erikgrinaker/toydb/blob/master/docs/refer...

This is exactly what I was hoping to find. This is great!

Ostrogodsky · on July 18, 2021

I concur, excellent choices of background material.

gigatexal · on July 19, 2021

Yeah great work! I love databases but I've only ever been a user. I am too dumb or too intimidated to peek under the covers and see how the magic works.

But I can read source code (though I'll have to learn Rust) and this is amazing work.

adamnemecek · on July 18, 2021

I've been building a prototype of a relational in-memory database in Rust to replace some involved ad-hoc data structures and in the process, I came across this.

I think that all the state management problems and solutions in all languages/frameworks (react, swiftui, elm etc etc) would be better implemented as a database with changefeeds [0] / change data capture [1].

The advantage compared with something like Redux is that it's more general. You don't need to implement an enum with a case for each action and then have a big dispatcher.

All your update logic would be a big switch statement on change feed from the database which would handle all the logic.

The fundamental problem of state management is as follows: if a leaf view changes some data, how does some view higher in the hierarchy learn about this?

I think that this solves the problem better than alternatives or the observer pattern.

Ideally the API would be something like LINQ methods in order to avoid parsing SQL.

I'm still trying to figure out if this would work.

[0] https://rethinkdb.com/docs/changefeeds/ruby/

[1] https://en.wikipedia.org/wiki/Change_data_capture
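For illustration, the changefeed idea above can be sketched as a tiny in-memory table that records every mutation, plus a single `match` that reacts to each kind of change. This is only a sketch: `Table`, `Change`, `drain_changes`, and `describe` are hypothetical names made up here, not the API of ToyDB, RethinkDB, or the prototype mentioned in the comment.

```rust
use std::collections::HashMap;

/// One entry in the change feed (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Change {
    Inserted { key: String, new: String },
    Updated { key: String, old: String, new: String },
    Deleted { key: String, old: String },
}

/// A tiny in-memory key/value "table" that records every mutation.
#[derive(Default)]
pub struct Table {
    rows: HashMap<String, String>,
    feed: Vec<Change>,
}

impl Table {
    /// Insert or overwrite a row and append the matching change to the feed.
    pub fn put(&mut self, key: &str, value: &str) {
        let change = match self.rows.insert(key.to_string(), value.to_string()) {
            None => Change::Inserted { key: key.to_string(), new: value.to_string() },
            Some(old) => Change::Updated { key: key.to_string(), old, new: value.to_string() },
        };
        self.feed.push(change);
    }

    /// Remove a row; a change is recorded only if the row existed.
    pub fn delete(&mut self, key: &str) {
        if let Some(old) = self.rows.remove(key) {
            self.feed.push(Change::Deleted { key: key.to_string(), old });
        }
    }

    /// Hand all pending changes to the caller and clear the feed.
    pub fn drain_changes(&mut self) -> Vec<Change> {
        std::mem::take(&mut self.feed)
    }
}

/// The "big switch statement": one place that reacts to every kind of change.
pub fn describe(change: &Change) -> String {
    match change {
        Change::Inserted { key, new } => format!("show {} = {}", key, new),
        Change::Updated { key, old, new } => format!("redraw {}: {} -> {}", key, old, new),
        Change::Deleted { key, .. } => format!("hide {}", key),
    }
}
```

A view higher in the hierarchy would call `drain_changes()` after a leaf view has called `put` or `delete`, and feed each entry through something like `describe`. Deleting a key that does not exist produces no change.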

lukevp · on July 18, 2021

> The advantage compared with something like Redux is that it's more general. You don't need to implement an enum with a case for each action and then have a big dispatcher. All your update logic would be a big switch statement on change feed from the database which would handle all the logic.

You seem to have thought a lot about this, but I don't understand the distinction you're making and I'd appreciate help squaring this up in my mind.

In the case of Redux, you have the current state, you have actions and selectors (from a CQRS perspective, an action = a command and a selector = query), and you have reducers that apply the actions to the state then update/notify the selectors. How is this different than what you stated? Are you just suggesting the removal of actions and directly changing the state, like reactive programming, with a changefeed being generated as a side effect of manipulating the state? In that case, how do you handle situations where the update requires knowledge of the current state? Won't you need to read in the current state, apply changes, and then write back? And at some point you will want to be able to pass parameters so you can combine some input (eg. the updated user's last name in a text box) and the current state (the user's first and last name) to update another state property (the user's full name). At that point you've essentially re-implemented the concept of actions and reducers.
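The full-name example in this comment can be written out as a minimal reducer, to show why an update that depends on the current state ends up looking like an action plus a reducer. The `User`, `Action`, and `reduce` names are hypothetical and only mirror the Redux vocabulary; this is not Redux's actual API.

```rust
/// Hypothetical UI state for the example in the comment above.
#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub first_name: String,
    pub last_name: String,
    /// Derived from the two fields above.
    pub full_name: String,
}

/// An action carries the input parameter (here, the text box contents).
pub enum Action {
    SetLastName(String),
}

/// A reducer: current state + action -> next state. It has to read the
/// current first name in order to recompute the derived `full_name`.
pub fn reduce(state: &User, action: Action) -> User {
    match action {
        Action::SetLastName(last_name) => User {
            first_name: state.first_name.clone(),
            full_name: format!("{} {}", state.first_name, last_name),
            last_name,
        },
    }
}
```

Calling `reduce` with a user named "Ada Byron" and `Action::SetLastName("Lovelace".to_string())` returns a new state whose `full_name` is "Ada Lovelace", while the old state is left untouched.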

lukevp · on July 18, 2021

> The fundamental problem of state management is as follows, if a leaf view changes some data, how does some view higher in the hierarchy learn about this?

At the end of the day, if you have a relational model and you want to allow subscriptions arbitrarily on the object graph, something somewhere will have to traverse the tree of the object's parents and apply a notification to them, right?

Or you will have to only allow replacing object changes/subscriptions at a flat level, which is a document-based model rather than relational. In that case, you can filter the notification stream as an optimization for the clients, but the entire document would require replacement on every update. You can do an additional optimization and allow patches of the object, and use diffing to not notify if a specific property was unchanged after the state update, but these are all implementation details.

I don't think this is a fundamental problem of state management, more so a restriction of state management when dealing with state expressed as relational objects, and one of the reasons document databases are easier to reason about.

Bringing this back around to state management frameworks like Redux, each Redux store is essentially a single document and selectors are filtered notifiers of changes.

This project sounds interesting - you should make a blog post about your prototype database with some more details, I would love to read it and see some of the code!
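The "selectors are filtered notifiers" point, together with the diffing optimization described above, can be sketched roughly as follows: a selector projects one part of the document and reports a change only when that part differs from what it saw last time. `Doc`, `Selector`, `observe`, and `title_of` are hypothetical names for illustration, not Redux's implementation.

```rust
/// Hypothetical store contents: a single "document".
#[derive(Debug, Clone, PartialEq)]
pub struct Doc {
    pub title: String,
    pub body: String,
}

/// Watches one projection of the document and reports only real changes.
pub struct Selector<T: PartialEq> {
    project: fn(&Doc) -> T,
    last: Option<T>,
}

impl<T: PartialEq + Clone> Selector<T> {
    pub fn new(project: fn(&Doc) -> T) -> Self {
        Selector { project, last: None }
    }

    /// Returns `Some(new value)` if the selected part changed, else `None`.
    pub fn observe(&mut self, doc: &Doc) -> Option<T> {
        let current = (self.project)(doc);
        if self.last.as_ref() == Some(&current) {
            return None; // unchanged after the update: no notification
        }
        self.last = Some(current.clone());
        Some(current)
    }
}

/// Example projection: select only the title.
pub fn title_of(doc: &Doc) -> String {
    doc.title.clone()
}
```

With `Selector::new(title_of)`, the first `observe` reports the title, an update that touches only `body` yields `None`, and a later title change is reported again.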

brundolf · on July 19, 2021

What you describe sounds a lot like MobX: https://mobx.js.org/the-gist-of-mobx.html

In the Rust world, Salsa operates on similar principles: https://salsa-rs.github.io/salsa/

Personally I believe this pattern has totally solved the problem of responding to changes in state, and its use-cases are just beginning to be explored.

brundolf · on July 19, 2021

Given our industry's track record of things not intended for production becoming essential pillars of the internet, I assume that in 5 years we'll all be shipping ToyDB deployments ;)

mathgladiator · on July 18, 2021

Well done, I'm excited about efforts like this and I hope you make progress towards some production traffic (I'm aware it is a toy, but toys are best when used by others :) )

netsec_burn · on July 18, 2021

As a Rust developer (that's up to their neck in Rust projects) I'm really looking forward to a functioning sqlite clone. To the best of my knowledge, Sqlrite (https://github.com/joaoh82/rust_sqlite) was supposed to be that but it ended up diverging from sqlite. It'll make cross platform Rust builds easier.

fsloth · on July 18, 2021

Sqlite has been battletested for years.

I utterly fail to see the point in reimplementing it in some other language.

Innovate in databases - yes please - but "rewrite" - why?

Not being able to use battletested libraries written in plain C (e.g. sqlite) sounds like a huge mark against rust as a systems programming language.

Is using C libraries an actual issue or just an aesthetic grumble?

mamcx · on July 18, 2021

Sqlite is probably the one C project I totally trust.

But, yeah, "cloning" projects is a losing proposition (even more so if open source). Would I want a sqlite-like in rust? Yeah, not a clone but something in the same spirit (there is so much we can improve in RDBMSs!)

pas · on July 18, 2021

Maintainability? Performance? Tighter integration into Rust projects?

Also, with using SQLite's amazing test harness, it might even make sense to claim that the Rust re-implementation is of similar quality in terms of (lack of) bugs.

zie · on July 18, 2021

only some of the test harness is open source, see: https://sqlite.org/testing.html

erik_seaberg · on July 19, 2021

If the rewrite is distributed over a quorum and columns are strongly-typed, I’m happy to see it.

simlevesque · on July 18, 2021

> battletested

Is software that is always evolving battletested? A new version could break something.

I'm aware that sqlite has one of the best test suites in all of open source, I just wonder if something that was "battletested" is still battletested after a new release.

lukevp · on July 18, 2021

I guess we can’t speak about battle testing in a new release in the “running in production with real user workloads” sense, but if the test suite is properly implemented, has good coverage, and exercises many cases and failure modes, it’s possible that a SQLite release is more battle tested on day zero than other software is when it’s been running for months or years. What do you think? A new version presumably still runs through validation before release.

ithkuil · on July 18, 2021

Here's where the asymmetry of security bugs and feature bugs comes into play.

Introducing a buffer overflow in a small unused corner / edge case of the feature set allows an attacker to fully compromise the process, and it's thus a problem affecting 100% of the users.

Introducing a behavioral bug in the same unused corner / edge case will affect only a very small number of people (possibly none).

jasonwatkinspdx · on July 18, 2021

Freezing the software is no assurance of security either. I'll take ongoing maintenance of the quality of sqlite every time personally.

mirekrusin · on July 19, 2021

sqlite is known for its massive test coverage with 600x+ test to source ratio [0].

[0] https://www.sqlite.org/testing.html

pm90 · on July 18, 2021

I’m curious to understand what you’re looking to benefit from with a SQLite rust clone vs SQLite itself? Would it make writing apps easier? Or would it open up the internals of dB engines to rust developers? Or something else..? Honest question.

netsec_burn · on July 18, 2021

I think I answered your question in my comment, no? It'll make cross platform builds easier instead of the current steps needed for FFI libraries. My experience with cross platform FFI builds is slim but last time I tried it, it was painful (versus Rust handling it all with the right toolchain).

ofrzeta · on July 18, 2021

Can't be as hard as reimplementing SQLite, can it?

wizzwizz4 · on July 18, 2021

But reimplementing SQLite can be done by somebody else, whereas dealing with FFI must be done by me.

azornathogron · on July 18, 2021

I feel like I must be misunderstanding you... why would the FFI be done by you? The rusqlite crate provides wrappers you can use, and there are probably other wrapper crates that I haven't looked at.

wizzwizz4 · on July 18, 2021

But I have to configure how it finds SQLite. Sure, SQLite specifically is fairly easy – I only need to enable the "bundled" crate feature and make sure my Clang version matches the Rust version so LTO works – but it's a little arcane to use Cargo with any non-Rust package in the dependency chain. Cargo Just Works™. Cargo with C FFI doesn't.

azornathogron · on July 18, 2021

Thanks for the explanation!

netsec_burn · on July 18, 2021

Not the parent comment, but I believe they're referring to the cross compilation aspects which wouldn't be resolved by just using the rusqlite crate (I use rusqlite and have this issue).

hxzhao · on July 18, 2021

the RIIR fallacy, again

xpe · on July 18, 2021

> the RIIR fallacy, again

AYATAINWK?

Are You Aware That Acronym Is Not Widely Known?

NoahKAndrews · on July 18, 2021

I believe it's Rewrite it in Rust

xpe · on July 24, 2021

Ah, there is nothing like pejorative acronyms ... they seem to be designed to shut down thinking rather than invite more of it.

Let's not forget that:

1. new programming languages bring excitement, creative destruction, and yes, lots of hobby projects that don't turn into battle-tested products. I'm ok with that.

2. rewriting is not the same as re-envisioning. See the comment above. SQLite was a starting point for inspiration, not the destination.

Also, we should not conflate an underlying technology with the hype around that technology:

a. Just because there is a lot of hype around Rust does not mean the advantages of Rust should be overlooked

b. Some highly-praised technologies deserve the praise. Yes, sometimes even the cynics are wrong.

z3t4 · on July 18, 2021

So does the throughput decrease when you add more nodes? Or am I reading it wrong? This is a general problem with scaling horizontally, that the overhead can kill the performance, and single core/node performance is sacrificed.

pas · on July 18, 2021

Likely you are reading it correctly. What you gain in possible availability (more nodes give some redundancy against node failure) you pay for out of your performance budget, in the work needed to constantly run the consistency protocol.

How much you have to sacrifice, and how linear the scaling is, of course are important quality metrics of distributed systems. In multi-writer optimistic-locking sync-at-commit systems (eg Galera) in case of no conflict, it's possible to have the multi-node throughput exceed the throughput of the single-node version.

AtlasBarfed · on July 19, 2021

So this is a CP system?

pas · on July 19, 2021

Kind of. But it's not really Consistent in the CAP sense, as it only provides snapshot isolation, not linearizability, plus the A in CAP is kind of useless. (CAP is an amazing theoretical milestone nonetheless, just too narrow for real life situations. Basically it's a simple no-go theorem.)

In practice even this ToyDB is likely able to serve requests in a degraded state (probably as long as the Raft leader's timer does not expire, and if there's a quorum of nodes they can reelect a leader). And it seems that if a node falls out of sync it will automatically rejoin and try to replay the logs. (As long as they are available of course.)

https://github.com/erikgrinaker/toydb/blob/master/docs/archi...

https://www.youtube.com/watch?v=hUd_9FENShA

https://blog.acolyer.org/2014/11/07/highly-available-transac...
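As a rough illustration of the quorum point above, here is the majority arithmetic that Raft-style consensus relies on: a cluster can elect a leader and commit writes only while more than half of its nodes are reachable. This is the generic majority rule, not code taken from ToyDB, and the function names are made up for the example.

```rust
/// Smallest number of nodes that forms a majority of `nodes`.
pub fn quorum(nodes: usize) -> usize {
    nodes / 2 + 1
}

/// How many nodes can fail while a majority is still reachable.
pub fn tolerated_failures(nodes: usize) -> usize {
    nodes.saturating_sub(1) / 2
}

/// Whether `alive` nodes out of `nodes` can still elect a leader and commit.
pub fn has_quorum(alive: usize, nodes: usize) -> bool {
    nodes > 0 && alive >= quorum(nodes)
}
```

For a 5-node cluster this gives a quorum of 3 and tolerance for 2 failures. Going from 3 to 4 nodes raises the quorum from 2 to 3 without tolerating any extra failure, which is one reason odd cluster sizes are usually preferred.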

jinmingjian · on July 19, 2021

I recommend one ClickHouse compatible OLAP database project in Rust: [TensorBase](https://github.com/tensorbase/tensorbase/) for anyone who likes working with AP-side DBs on Rust.

FYI, recent information and progress for TensorBase:

1. TensorBase (TB, for short) is not a reimplementation or clone of ClickHouse (CH, for short). TensorBase just supports the ClickHouse wire protocol on its server side.

2. TB's CH-compatible server side, written in Rust, is faster than CH's C++ one. TB enables *F4* in the critical writing path: Copy-Free, Lock-Free, Async-Free, Dyn-Free (no dynamic object dispatching).

The result of TB's architectural performance: the untuned write throughput of TB is ~2x faster than that of CH in the Rust driver bench, or ~70% faster using CH's own `clickhouse-client` command. Use [this parallel script](https://github.com/tensorbase/tools/blob/main/import_csv_to_...) to try it yourself!

3. Thanks to the Arrow-DataFusion, TensorBase has supported good parts of TPC-H. [Untuned TPC-H Q1 result here](https://github.com/tensorbase/benchmarks/blob/main/tpch.md).

4. In simple (no-groupby) aggregation, TensorBase is several times faster than ClickHouse. [Benchmark here](https://github.com/tensorbase/benchmarks/blob/main/quick.md).

5. For complex groupby aggregations, we recently helped boost the speed of the TB engine to the same level as ClickHouse (not released, but coming soon).

6. TB will soon support the MySQL wire protocol, distributed query, adaptive columnar storage optimization... Watch [issues here](https://github.com/tensorbase/tensorbase/issues)

Finally, it is really great to build an AP database in Rust. You're welcome to join!

Disclaimer: I am the author of TensorBase.

Wonnk13 · on July 18, 2021

Wow - I just started my Rust journey this weekend. Any good projects that are even simpler than this to learn from? Maybe a basic k/v store?

adamnemecek · on July 18, 2021

Check out sled

https://github.com/spacejam/sled

It’s surprisingly mature and the person behind it is very committed to the project.

qchris · on July 18, 2021

+1 for sled--I was working with it this morning for a personal project, and it's frankly just really well built. Besides the great documentation and ease of use, I kept finding myself looking to it for architectural tips on structuring Config structs, etc. because I could do a lot worse than end up coding something that is similarly nice to use.

brainless · on July 19, 2021

What are you starting with? I re-started my Rust learning last month. I am very slow at learning but Rust makes digging into systems stuff so much fun. I am from Python/JS background. I start with Rustlings every time.

Now building a Git player - basically watch a Git repo history unfold like a movie, for learning purposes [1].

Please ping me if you have some ideas, looking for learning peers.

1. https://github.com/brainless/gitplay

arpanetus · on July 18, 2021

i hope i aint being toxic

but i wonder what if the same db was written in c++, would it get a post in HN?

asdjlkadsjklads · on July 18, 2021

In this case i'd say yes. It's a DB purposefully written for learning, with good references and documentation.

However if it was just another DB.. especially one competing in an area no one really feels deficient, prob not.

bradleyjg · on July 19, 2021

Nope. But that’s just the way things go. At some point rust will either be boring or gone.

Remember when clojure was going to take over the world?

brainless · on July 19, 2021

I used to think so too. I re-started learning Rust, after giving up in frustration twice. I try to learn new languages roughly 1-2 times a year (over the last decade and a half). I am from a Python + JS background for more than a decade and PHP for half a decade before that. C/C++ in college days.

Rust has some ideas that are pretty easy to see as to why it guards developers. It is not an easy language. It is surely a systems language and allows you to dig deeper into every aspect. You can roll your own implementation for anything to fit the need - want async in resource constrained platforms? Go ahead, write it.

But the guards help you write code that has fewer errors. It is an idea built into the language. But it is not the only thing that sells it - the tooling is fantastic for a systems language. The community is second to none. It is 10 years old now and only getting started - Linux, Microsoft and other big camps are giving it a chance - this has not happened at this level in the history of languages other than C/C++.

So yeah, give it a try and you will see why.

bradleyjg · on July 19, 2021

I’m old enough to remember when java was the hot new language. It was legitimately exciting: memory safety, speed, and reasonable cross platform support among other things. Eventually it became boring, not because it failed but because it succeeded. Maybe that’s how rust will go.

Do people excited about the new thing go over the top and upvote anything even tangentially related to it? Yes, I think you need to admit they do. Is that the worst thing in the world? No, certainly not.

brainless · on July 19, 2021

You are right, Java succeeded. But not in the way that I see Rust heading. I learned Java in the early 2000s, not because it was a nice language to learn, but because the jobs were in it. I remember hating the entire process, the smell of Enterprise in everything.

In Rust, most talks I watch are tiny games, graphics and visualizations, emulators, language interpreters, some web (every language must it seems), experimental databases, kernels, lots of rewrites of other system software... Almost nothing I see in general is production focused, forget enterprise.

These are all from learners, hundreds of people falling in love with the language and the community. I have been in software for almost 2 decades, 15 years of them professionally. I have not seen this in a language that is also slowly going mainstream. Maybe I am wrong, but I am willing to bet a good chunk of my next 10 years to Rust.

Rust will not replace lots of code in Python (my main language), for example. Taking parts out and optimizing them in C++ is something I would have never done. I would do it in Rust. That gives a systems language huge appeal to tons of developers from the top 10 non-systems languages.

moss2 · on July 19, 2021

Very impressive.