Yeah great work! I love databases but I've only ever been a user. I am too dumb or too intimidated to peek under the covers and see how the magic works.
But I can read source code (though I'll have to learn Rust) and this is amazing work.
I've been building a prototype of a relational in-memory database in Rust to replace some involved ad-hoc data structures and in the process, I came across this.
I think that all the state management problems and solutions in all languages/frameworks (react, swiftui, elm etc etc) would be better implemented as a database with changefeeds [0] / change data capture [1].
The advantage compared with something like Redux is that it's more general. You don't need to implement an enum with a case for each action and then have a big dispatcher.
All your update logic would be a big switch statement on change feed from the database which would handle all the logic.
The fundamental problem of state management is as follows: if a leaf view changes some data, how does some view higher in the hierarchy learn about this?
I think that this solves the problem better than alternatives or the observer pattern.
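To make the idea concrete, here is a minimal sketch of a table that records a change feed, with the update logic written as one `match` over the feed. All names here (`Table`, `Change`, `describe`) are made up for illustration and are not taken from ToyDB or from any existing library.

```rust
use std::collections::BTreeMap;

/// One entry in the change feed: what happened to which row.
#[derive(Debug, Clone, PartialEq)]
pub enum Change {
    Inserted { id: u32, new: String },
    Updated { id: u32, old: String, new: String },
    Deleted { id: u32, old: String },
}

/// A hypothetical single in-memory "table" (row id -> value) plus a log of changes.
#[derive(Default)]
pub struct Table {
    rows: BTreeMap<u32, String>,
    feed: Vec<Change>,
}

impl Table {
    /// Insert or overwrite a row and record the change in the feed.
    pub fn upsert(&mut self, id: u32, value: &str) {
        let change = match self.rows.insert(id, value.to_string()) {
            Some(old) => Change::Updated { id, old, new: value.to_string() },
            None => Change::Inserted { id, new: value.to_string() },
        };
        self.feed.push(change);
    }

    /// Remove a row; only an actual removal produces a feed entry.
    pub fn delete(&mut self, id: u32) {
        if let Some(old) = self.rows.remove(&id) {
            self.feed.push(Change::Deleted { id, old });
        }
    }

    /// Hand all pending changes to the caller and clear the feed.
    pub fn drain_feed(&mut self) -> Vec<Change> {
        std::mem::take(&mut self.feed)
    }
}

/// The "big switch statement": all reactions are driven by the feed,
/// no matter which view performed the write.
pub fn describe(change: &Change) -> String {
    match change {
        Change::Inserted { id, new } => format!("row {id} added: {new}"),
        Change::Updated { id, old, new } => format!("row {id} changed: {old} -> {new}"),
        Change::Deleted { id, old } => format!("row {id} removed: {old}"),
    }
}
```

A leaf view would call `upsert` or `delete`, and any view higher up would consume `drain_feed()` without knowing who made the write. A real version would need per-subscriber cursors instead of a single drained log.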
Ideally the API would be something like LINQ methods in order to avoid parsing SQL.
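As a rough illustration of what a LINQ-style query could look like without any SQL parsing, Rust's standard iterator adapters already get close. The `User` type and `names_of_adults` function are hypothetical, just a sketch of the shape of such an API.

```rust
/// A hypothetical row type for the example.
#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub id: u32,
    pub name: String,
    pub age: u32,
}

/// Roughly `SELECT name FROM users WHERE age >= min_age ORDER BY name`,
/// expressed as method calls instead of a parsed SQL string.
pub fn names_of_adults(users: &[User], min_age: u32) -> Vec<String> {
    let mut names: Vec<String> = users
        .iter()
        .filter(|u| u.age >= min_age) // WHERE
        .map(|u| u.name.clone())      // SELECT
        .collect();
    names.sort();                     // ORDER BY
    names
}
```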
I'm still trying to figure out if this would work.
> The advantage compared with something like Redux is that it's more general. You don't need to implement an enum with a case for each action and then have a big dispatcher. All your update logic would be a big switch statement on change feed from the database which would handle all the logic.
You seem to have thought a lot about this, but I don't understand the distinction you're making and I'd appreciate help squaring this up in my mind.
In the case of Redux, you have the current state, you have actions and selectors (from a CQRS perspective, an action = a command and a selector = query), and you have reducers that apply the actions to the state then update/notify the selectors. How is this different than what you stated? Are you just suggesting the removal of actions and directly changing the state, like reactive programming, with a changefeed being generated as a side effect of manipulating the state? In that case, how do you handle situations where the update requires knowledge of the current state? Won't you need to read in the current state, apply changes, and then write back? And at some point you will want to be able to pass parameters so you can combine some input (eg. the updated user's last name in a text box) and the current state (the user's first and last name) to update another state property (the user's full name). At that point you've essentially re-implemented the concept of actions and reducers.
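For reference, the last-name example written as an action plus a reducer might look like the sketch below. This is plain Rust with made-up names (`UserState`, `Action`, `reduce`), not the actual Redux API; it only shows that the update needs both the parameter and the current state.

```rust
/// Hypothetical state for the example: full_name is derived from the other two.
#[derive(Debug, Clone, PartialEq)]
pub struct UserState {
    pub first_name: String,
    pub last_name: String,
    pub full_name: String,
}

/// The action carries the parameter (the text box contents).
pub enum Action {
    SetLastName(String),
}

/// The reducer combines the input with the current state to produce the next state.
pub fn reduce(state: &UserState, action: Action) -> UserState {
    match action {
        Action::SetLastName(last_name) => UserState {
            first_name: state.first_name.clone(),
            // Needs the current first name, so the current state must be read.
            full_name: format!("{} {}", state.first_name, last_name),
            last_name,
        },
    }
}
```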
> The fundamental problem of state management is as follows, if a leaf view changes some data, how does some view higher in the hierarchy learn about this?
At the end of the day, if you have a relational model and you want to allow subscriptions arbitrarily on the object graph, something somewhere will have to traverse the tree of the object's parents and apply a notification to them, right?
Or you will have to only allow replacing object changes/subscriptions at a flat level, which is a document-based model rather than relational. In that case, you can filter the notification stream as an optimization for the clients, but the entire document would require replacement on every update. You can do an additional optimization and allow patches of the object, and use diffing to not notify if a specific property was unchanged after the state update, but these are all implementation details.
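A minimal sketch of that patch-and-diff optimization, assuming a flat document with two fields. `Doc`, `Patch` and `apply_patch` are illustrative names only; the returned list is what a notifier would use to decide which subscribers to wake.

```rust
/// A hypothetical flat document.
#[derive(Debug, Clone, PartialEq)]
pub struct Doc {
    pub title: String,
    pub count: u32,
}

/// A patch: only the fields that are `Some` are candidates for overwriting.
#[derive(Default)]
pub struct Patch {
    pub title: Option<String>,
    pub count: Option<u32>,
}

/// Apply the patch and return the names of the fields whose value actually
/// changed, so unchanged properties do not trigger a notification.
pub fn apply_patch(doc: &mut Doc, patch: Patch) -> Vec<&'static str> {
    let mut changed = Vec::new();
    if let Some(title) = patch.title {
        if title != doc.title {
            doc.title = title;
            changed.push("title");
        }
    }
    if let Some(count) = patch.count {
        if count != doc.count {
            doc.count = count;
            changed.push("count");
        }
    }
    changed
}
```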
I don't think this is a fundamental problem of state management; it is more a restriction of state management when dealing with state expressed as relational objects, and one of the reasons document databases are easier to reason about.
Bringing this back around to state management frameworks like Redux, each Redux store is essentially a single document and selectors are filtered notifiers of changes.
This project sounds interesting - you should make a blog post about your prototype database with some more details, I would love to read it and see some of the code!
Personally I believe this pattern has totally solved the problem of responding to changes in state, and its use-cases are just beginning to be explored.
Given our industry's track record of things not intended for production becoming essential pillars of the internet, I assume that in 5 years we'll all be shipping ToyDB deployments ;)
Well done, I'm excited about efforts like this and I hope you make progress towards some production traffic (I'm aware it is a toy, but toys are best when used by others :) )
As a Rust developer (that's up to their neck in Rust projects) I'm really looking forward to a functioning sqlite clone. To the best of my knowledge, Sqlrite (https://github.com/joaoh82/rust_sqlite) was supposed to be that but it ended up diverging from sqlite. It'll make cross platform Rust builds easier.
Sqlite is probably the one C project I totally trust.
But, yeah, "cloning" projects is a losing proposition (even more so if they're open source). Do I wish for something sqlite-like in Rust? Yeah, but not a clone, rather something in the same spirit (there is so much we can improve in RDBMSs!)
Maintainability? Performance? Tighter integration into Rust projects?
Also, with using SQLite's amazing test harness, it might even make sense to claim that the Rust re-implementation is of similar quality in terms of (lack of) bugs.
Is software that is always evolving battletested? A new version could break something.
I'm aware that sqlite has one of the best test suites in all of open source, I just wonder if something that was "battletested" is still battletested after a new release.
I guess we can’t speak about battle testing in a new release from a “running in production with real user workloads” perspective, but if the test suite is properly implemented, has good coverage, and exercises many cases and failure modes, it’s possible that a SQLite release is more battle tested on day zero than other software is when it’s been running for months or years. What do you think? A new version presumably still runs through validation before release.
Here's where the asymmetry of security bugs and feature bugs comes into play.
Introducing a buffer overflow in a small unused corner / edge case of the feature set allows an attacker to fully compromise the process, and it's thus a problem affecting 100% of the users.
Introducing a behavioral bug in the same unused corner / edge case will affect only a very small number of people (possibly none).
I’m curious to understand what you’re looking to benefit from with a SQLite Rust clone vs SQLite itself? Would it make writing apps easier? Or would it open up the internals of DB engines to Rust developers? Or something else..? Honest question.
I think I answered your question in my comment, no? It'll make cross platform builds easier instead of the current steps needed for FFI libraries. My experience with cross platform FFI builds is slim but last time I tried it, it was painful (versus Rust handling it all with the right toolchain).
I feel like I must be misunderstanding you... why would the FFI be done by you? The rusqlite crate provides wrappers you can use, and there are probably other wrapper crates that I haven't looked at.
But I have to configure how it finds SQLite. Sure, SQLite specifically is fairly easy – I only need to enable the "bundled" crate feature and make sure my Clang version matches the Rust version so LTO works – but it's a little arcane to use Cargo with any non-Rust package in the dependency chain. Cargo Just Works™. Cargo with C FFI doesn't.
Not the parent comment, but I believe they're referring to the cross compilation aspects which wouldn't be resolved by just using the rusqlite crate (I use rusqlite and have this issue).
Ah, there is nothing like pejorative acronyms ... they seem to be designed to shut down thinking rather than invite more of it.
Let's not forget that:
1. new programming languages bring excitement, creative destruction, and yes, lots of hobby projects that don't turn into battle-tested products. I'm ok with that.
2. rewriting is not the same as re-envisioning. See the comment above. SQLite was a starting point for inspiration, not the destination.
Also, we should not conflate an underlying technology with the hype around that technology:
a. Just because there is a lot of hype around Rust does not mean the advantages of Rust should be overlooked
b. Some highly-praised technologies deserve the praise. Yes, sometimes even the cynics are wrong.
So does the throughput decrease when you add more nodes? Or am I reading it wrong? This is a general problem with scaling horizontally, that the overhead can kill the performance, and single core/node performance is sacrificed.
Likely you are reading it correctly. What you gain in possible availability (because now you have more nodes, which give some redundancy with regards to node failure) you pay for out of your performance budget, in the form of the work needed to constantly run the consistency protocol.
How much you have to sacrifice, and how linear the scaling is, of course are important quality metrics of distributed systems. In multi-writer optimistic-locking sync-at-commit systems (eg Galera) in case of no conflict, it's possible to have the multi-node throughput exceed the throughput of the single-node version.
Kind of. But it's not really Consistent in the CAP sense, as it only provides snapshot isolation, not linearizability, plus the A in CAP is kind of useless. (CAP is an amazing theoretical milestone nonetheless, just too narrow for real life situations. Basically it's a simple no-go theorem.)
In practice even this ToyDB is likely able to serve requests in a degraded state (probably as long as the Raft leader's timer does not expire, and if there's a quorum of nodes they can reelect a leader). And it seems that if a node falls out of sync it will automatically rejoin and try to replay the logs. (As long as they are available of course.)
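The quorum condition mentioned above is just a majority check. A tiny sketch follows; the function names are mine, not ToyDB's, and it ignores everything else Raft needs for an election, such as terms and log freshness.

```rust
/// Smallest number of nodes that forms a majority of the cluster.
pub fn quorum(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

/// A leader can only be (re)elected while a majority of nodes can reach each other.
pub fn can_elect_leader(cluster_size: usize, reachable_nodes: usize) -> bool {
    reachable_nodes >= quorum(cluster_size)
}
```

So a 5-node cluster tolerates 2 unreachable nodes, while a 4-node cluster tolerates only 1.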
I recommend one ClickHouse compatible OLAP database project in Rust: [TensorBase](https://github.com/tensorbase/tensorbase/) for anyone who likes working with AP-side DBs on Rust.
FYI, recent information and progress for TensorBase:
1. TensorBase(TB, for short) is not a reimplementation or clone of ClickHouse(CH, for short). TensorBase just supports the ClickHouse wire protocol in its server side.
2. TB's in-Rust CH compatible server side is faster than that in-C++ of CH. TB enables *F4* in the critical writing path: Copy-Free, Lock-Free, Async-Free, Dyn-Free (no dynamic object dispatching).
The result of TB's architectural performance: the untuned write throughput of TB is ~ 2x faster than that of CH in the Rust driver bench, or ~70% faster by using CH's own ```clickHouse-client``` command. Use [this parallel script](https://github.com/tensorbase/tools/blob/main/import_csv_to_...) to try it yourself!
5. For complex groupby aggregations, we recently helped boost the speed of the TB engine to the same level as ClickHouse(not released, but coming soon).
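For readers unfamiliar with the "Dyn-Free" term in the list above: in Rust it generally means preferring generics (static dispatch) over `dyn Trait` objects (dynamic dispatch). The sketch below is a generic illustration of that difference, not code from TensorBase; `Encoder` and `Doubler` are invented for the example.

```rust
/// A hypothetical trait used only to contrast the two dispatch styles.
pub trait Encoder {
    fn encode(&self, value: u32) -> u32;
}

pub struct Doubler;

impl Encoder for Doubler {
    fn encode(&self, value: u32) -> u32 {
        value * 2
    }
}

/// Dynamic dispatch: each `encode` call goes through the trait object's vtable.
pub fn sum_dyn(encoder: &dyn Encoder, values: &[u32]) -> u32 {
    values.iter().map(|&v| encoder.encode(v)).sum()
}

/// Static dispatch: the function is monomorphized per concrete `E`,
/// so the call target is known at compile time and can be inlined.
pub fn sum_static<E: Encoder>(encoder: &E, values: &[u32]) -> u32 {
    values.iter().map(|&v| encoder.encode(v)).sum()
}
```

Both functions return the same result; the difference is only in how the call is resolved.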
+1 for sled--I was working with it this morning for a personal project, and it's frankly just really well built. Besides the great documentation and ease of use, I kept finding myself looking for architectural tips for structuring Config structs, etc. because I could do a lot worse than end up coding something that is similarly nice to use.
What are you starting with? I re-started my Rust learning last month. I am very slow at learning but Rust makes digging into systems stuff so much fun. I am from a Python/JS background. I start with Rustlings every time.
Now building a Git player - basically watch a Git repo history unfold like a movie, for learning purposes [1].
Please ping me if you have some ideas, looking for learning peers.
I used to think so too. I re-started learning Rust, after giving up in frustration twice. I try to learn new languages roughly 1-2 times a year (over the last decade and a half). I am from a Python + JS background for more than a decade and PHP for half a decade before that. C/C++ in college days.
Rust has some ideas that are pretty easy to see as to why it guards developers. It is not an easy language. It is surely a systems language and allows you to dig deeper into every aspect. You can roll your own implementation for anything to fit the need - want async in resource constrained platforms? Go ahead, write it.
But the guards are helping you write code that has fewer errors. It is an idea built into the language. But it is not the only thing that sells it - the tooling is fantastic for a systems language. The community is second to none. It is 10 years old now and only getting started - Linux, Microsoft and other big camps are giving it a chance - this has not happened at this level in the history of languages other than C/C++.
I’m old enough to remember when Java was the hot new language. It was legitimately exciting: memory safety, speed, and reasonable cross platform support among other things. Eventually it became boring, not because it failed but because it succeeded. Maybe that’s how Rust will go.
Do people excited about the new thing go over the top and upvote anything even tangentially related to it? Yes, I think you need to admit they do. Is that the worst thing in the world? No, certainly not.
You are right, Java succeeded. But not in the way that I see Rust heading. I learned Java in the early 2000s, not because it was a nice language to learn, but because the jobs were in it. I remember hating the entire process, the smell of Enterprise in everything.
In Rust, most talks I watch are tiny games, graphics and visualizations, emulators, language interpreters, some web (every language must it seems), experimental databases, kernels, lots of rewrites of other system software... Almost nothing I see in general is production focused, forget enterprise.
These are all from learners, hundreds of people falling in love with the language and the community. I have been in software for almost 2 decades, 15 years of them professionally. I have not seen this in a language that is also slowly going mainstream. Maybe I am wrong, but I am willing to bet a good chunk of my next 10 years on Rust.
Rust will not replace lots of code in Python (my main language), for example. Taking parts out and optimizing them in C++ is something I would have never done. I would do it in Rust. That gives a systems language huge appeal to tons of developers from the top 10 non-systems languages.
This is exactly what I was hoping to find. This is great!