Why you might want a domain-specific database like TigerBeetleDB (twitter.com/phil_eaton)
118 points by tosh on Sept 9, 2022 | 61 comments



I have always been an unabashed fan of the idea of domain-specific databases. Database implementations make a lot of compromises for the sake of generality and avoid useful features that overfit a single domain. The idea of a database engine perfectly optimized and feature-fit for a domain has obvious qualitative advantages, and it isn't controversial that it is eminently possible to build such a database.

So why are they so rare? It is expensive in several dimensions to build a narrowly tailored database engine from scratch. Database engines are not modular; you can't assemble them from arbitrary parts while maintaining control of their basic characteristics. Building an optimized domain-specific database from scratch is not trivial because you will have to do a lot of the really hard parts yourself.

In my head, there has always been a missing piece of software - a "database compiler", that can take the very complex and high-dimensionality abstract specification for a domain and codegen a purpose-built database engine. I also recognize that this is exceedingly non-trivial; I codegen storage engines to spec and that is already difficult enough. Doing that for a full database engine in a specific domain would have hundreds of input parameters, many of which would require expertise in the underlying codegen to know how to use them. It would be awesome if such a thing existed though.
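To make the idea concrete, the flavor I imagine is a compile-time spec from which the engine's layout and index choices are generated. Purely illustrative; no such tool exists as far as I know, and every name in this sketch is made up:

    const std = @import("std");

    // Hypothetical "spec" for a domain; every name here is made up.
    const EngineSpec = struct {
        key_bits: u16,
        value_size: usize,
        append_only: bool,
    };

    // A real database compiler would choose storage layout, index structures,
    // log format, recovery strategy, etc. from a much richer spec; this only
    // shows the shape of compile-time specialization.
    fn Engine(comptime spec: EngineSpec) type {
        return struct {
            const Key = std.meta.Int(.unsigned, spec.key_bits);
            const Value = [spec.value_size]u8;

            index: std.AutoHashMap(Key, Value),
        };
    }

    // Usage: a tiny engine specialized for fixed-size, integer-keyed records.
    const TinyEngine = Engine(.{ .key_bits = 64, .value_size = 16, .append_only = true });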


"In my head, there has always been a missing piece of software - a "database compiler", that can take the very complex and high-dimensionality abstract specification for a domain and codegen a purpose-built database engine."

You nailed it!

And this is in fact our design for TigerBeetle, an Iron Man suit that you can put any state machine business logic into. You get a global consensus protocol and a local storage engine, with all the performance of TigerBeetle, and it's a really nice experience writing your own business logic inside of that. You can even test everything using Deterministic Simulation Testing.

Long term, we want to extract this into a library. Reading the TB source, you'll see all these abstractions are something we're thinking about.


It is expensive in several dimensions to build a narrowly tailored database engine from scratch. Database engines are not modular; you can't assemble them from arbitrary parts while maintaining control of their basic characteristics.

Aren't things like LMDB and RocksDB sort of modular parts you can build databases around?


We dive into this in detail in “Let's Remix Distributed Database Design!”, a talk given at the Recurse Center: https://www.youtube.com/watch?v=rNmZZLant9o

TL;DR: here's also a 10-minute lightning version, given at HYTRADBOI '22, on why technology and research have changed enough that we need new storage engines to take advantage of where things are going: https://www.youtube.com/watch?v=yBBpUMR8dHw


That 2nd video is awful, it's just a talking head with the content repeated as text. It's the worst of both video and text.


???

Don't mind this guy and just watch the video. It's pretty much the same format that is used for conference videos everywhere.

It's fine.


The 2nd vid I downloaded: 18.5 MB. A transcription would be maybe 2 KB? And I could read it at my own speed instead of having to skip back when it covered things too fast, and there could be URLs to useful stuff like papers and wiki pages (like, perhaps, to Eytzinger binary search, which I'd never heard of).
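For anyone else who hadn't heard of it either: my rough takeaway (sketched from memory, untested) is that it's just binary search over a sorted array stored in breadth-first order, like a binary heap, so the memory access pattern is cache- and prefetch-friendly. Something like:

    const std = @import("std");

    // Slot 0 is unused; the children of slot i are slots 2*i and 2*i + 1.
    fn eytzingerLowerBound(tree: []const u32, key: u32) ?usize {
        var i: usize = 1;
        var best: ?usize = null; // last element >= key seen on the way down
        while (i < tree.len) {
            if (tree[i] >= key) {
                best = i;
                i = 2 * i; // go left: maybe something smaller is still >= key
            } else {
                i = 2 * i + 1; // go right
            }
        }
        return best;
    }

    test "finds the first element >= key" {
        // Sorted data {10, 20, 30, 40, 50, 60, 70} laid out in BFS order:
        const tree = [_]u32{ 0, 40, 20, 60, 10, 30, 50, 70 };
        try std.testing.expectEqual(@as(?usize, 5), eytzingerLowerBound(&tree, 25));
        try std.testing.expectEqual(@as(?usize, null), eytzingerLowerBound(&tree, 99));
    }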

The video added precisely nothing over a text transcript and wasted bandwidth by an incredible factor. Just... why?


Humans like me (and a couple billion others on YouTube) enjoy watching another human explain stuff to them.

If plain text is your thing then that's good as well, to each their own.


I'm curious, did you already know what an Eytzinger binary search was?


No.


It’s kind of coming from the opposite direction, but it feels like Postgres extensions solve many of the needs you’re describing. They allow a developer to use the underlying database engine while extending it to include domain specific primitives.


That's the thing, I've written a lot of Postgres extensions over the decades. Those extensions are fundamentally limited by the underlying architecture of Postgres in myriad ways, though I've been very creative about pushing the limits. In the big picture the kinds of extensions that work well in Postgres are for a relatively narrow set of domains that roughly match what Postgres was designed for but for which it may lack some specific features.

The limitations and capabilities of a database are largely architectural. It isn't something you can "extend"; it is either there or it isn't. This is why being able to codegen a purpose-built architecture would be interesting; instead of trying to force a domain into an architecture that doesn't fit well, we could produce an architecture that is a perfect fit for the domain while inheriting many of the basic quality-of-life features you expect.


I'd be fascinated to know what sort of things it is hard to write Postgres extensions for and how that interplays with their existing architecture choices, if you have the time to give some examples.


You know, we almost wrote TB as a Postgres extension!

I think it would have been the right thing to do 5 years ago, before some of the groundbreaking research that came out in 2018 like fsyncgate and “Protocol-Aware Recovery for Consensus-Based Storage” that really changed the way that distributed databases need to be designed [1].

These days we also have io_uring, Deterministic Simulation Testing and safer systems languages (Rust, Zig). And high availability, i.e. consensus, almost has to be part of the (distributed) database going forward.

Beyond this, in the case of TB, a Postgres extension didn't satisfy our storage fault model, or our design goals of gray failure tail latency tolerance and static memory allocation.

What we're excited about with TB, is also this vision that people will one day be writing their own extensions for TB, swapping out the accounting state machine for another, with TB doing all the distributed heavy lifting.

[1] “Let's Remix Distributed Database Design!” https://www.youtube.com/watch?v=rNmZZLant9o


> In my head, there has always been a missing piece of software - a "database compiler", that can take the very complex and high-dimensionality abstract specification for a domain and codegen a purpose-built database engine.

Not just databases; this is a failing in our tooling more generally. "All" you need to do is code something flexible (in context, a feature-rich database), describe the (maybe probabilistic) things you know at compile time, and partially evaluate the first program optimally according to those constraints.

It's the same idea behind the Futamura projections, and unless there's something special that makes tailor-made databases significantly less diverse than arbitrary programs I doubt you'll see solutions here that allow you to mix and match very many components until the general case is also solved.


There's a middle ground: a normal SQL database plus a specific indexing engine tailored to your business case.

We do that for Cheméo: we use SQLite (or PostgreSQL if the customer wants it) as the data store and query engine where it fits, plus some special indexes for particular queries like similarity search [0].

The advantage is that you can use battle-tested SQLite/PostgreSQL (companies love this "standard") with your domain-specific engine on top, which is stateless (we rebuild the indexes directly from the database, so we have a single source of truth and backup is easy).

[0]: https://www.chemeo.com/similar?smiles=CCC%28CCO%29CCC


See also: https://news.ycombinator.com/item?id=32788840

TigerBeetle is designed to keep running even if all machines are experiencing radioactive levels of local storage corruption, or else to shut down safely when it detects that it must. We use automated testing to test TigerBeetle with read/write storage fault injection levels as high as 20-30%. On the other hand, this invariant is not typically provided by other engines, per the storage fault research that's come out of UW-Madison over the past few years. For example, “Protocol-Aware Recovery for Consensus-Based Storage”.

While other engines may have incredible test suites built up over years and years, they were also designed mostly before the advent of autonomous Deterministic Simulation Testing (think Jepsen, except you can speed up time and replay bugs that would otherwise take, say, 10 years to manifest in real time), which is a showstopper.
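The kernel of the technique is tiny, if you haven't seen it before. A toy sketch (nothing like our actual simulator): all nondeterminism is drawn from one seeded PRNG and time is a logical tick counter, so any failing run can be replayed exactly from its seed.

    const std = @import("std");

    pub fn main() !void {
        const seed: u64 = 0x5eed; // in practice random, and printed on failure
        var prng = std.rand.DefaultPrng.init(seed);
        const random = prng.random();

        var tick: u64 = 0;
        while (tick < 1_000_000) : (tick += 1) {
            if (random.uintLessThan(u32, 100) == 0) {
                // inject a fault: drop/reorder a message, corrupt a sector, ...
            }
            // step the simulated cluster by one logical tick (no wall clock)
            // and check invariants here
        }
    }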

Finally, we wanted TigerBeetle to be highly available and distributed. TigerBeetle can run across 3 availability zones with 2 replicas in each, with seamless failover. You can stay running even if you lose a whole AZ plus another replica, thanks to Heidi Howard's Flexible Quorums. Again, this problem is not as simple as slapping Raft on for distribution, because “Redundancy Does Not Imply Fault-Tolerance”, and because Raft makes concessions around storage faults and dueling leaders for the sake of the paper's readability that we didn't want to make for TigerBeetle's actual implementation, hence our choice of Viewstamped Replication (MIT, '88, '12).
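To make the Flexible Quorums arithmetic concrete, here's a sketch; the sizes are illustrative for a 6-replica, 3-AZ cluster, not necessarily our exact configuration:

    const std = @import("std");

    // Flexible Paxos / Flexible Quorums rule (Howard et al.): any replication
    // (commit) quorum must overlap any view-change quorum.
    const replica_count = 6;
    const quorum_replication = 3; // acks needed to commit an op
    const quorum_view_change = 4; // replicas needed to elect a new primary

    comptime {
        // Quorums intersect, so a new primary always learns of every committed op.
        std.debug.assert(quorum_replication + quorum_view_change > replica_count);
        // Losing one AZ (2 replicas) plus one more replica still leaves a
        // replication quorum of 3, so the cluster keeps committing.
        std.debug.assert(replica_count - (2 + 1) >= quorum_replication);
    }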



Wow! I did not expect this to end up on HN. This thread was meant as an informal introduction to a few things I'm excited about regarding TigerBeetle, which was just spun out as a company last week and is where I now work.

Some of the thread is a bit hand-wavy so feel free to subscribe to the mailing list [0] for less hand-wavy posts on the subject in the future. :)

[0] https://mailchi.mp/8e9fa0f36056/subscribe-to-tigerbeetle


Geez Phil! You do this every time. Every week one of your posts makes it to HN and you get surprised by it. At this point this is just normal.


Blog posts are one thing, this wasn't even a blog post though!


I always love it when this happens. Watching Phil's reaction!


TigerBeetle is a beautiful example of why correctness is what we should strive for, instead of hyper-focusing on (and settling for) memory safety.

The design document shows the types of faults they aim to cover:

https://github.com/tigerbeetledb/tigerbeetle/blob/main/docs/...

And here you can see the safety section in their style guide:

https://github.com/tigerbeetledb/tigerbeetle/blob/main/docs/...


Who or what hyper-focuses on memory safety? (And settles for it?) Pray tell, what could the Zig VP of community be alluding to out of the blue without any prompt whatsoever?


Just read any other HN Zig thread and you will find an easy answer to your question :^)


...and there's also an open bug bounty so if you think that Zig is not a meaningful improvement over C in terms of safety, it's certainly going to be easy money for you :^)

https://github.com/tigerbeetledb/viewstamped-replication-mad...


Look at the list of things you can't submit.

Excludes: the reconfiguration protocol, snapshots, and the recovery protocol (todo items, things they know about but can't/won't fix), plus any kind of security issue.

Easy money it is not, and this bounty program doesn't mean what you think it does.


And yet there's so much that is there!

The Normal protocol. The View Change protocol. The CTRL protocol from PAR (that you don't get to see often). Thousands of lines of code that are incredibly hard to get right.

All the fault models. The storage fault model alone is also not something you find many distributed systems attempting, let alone paying bounties for.

It's also not common to find bounties that go out of their way to help you. TigerBeetle's bounty ships with a state-of-the-art Deterministic Simulation fuzzing tool that you can use to explore interesting state spaces more quickly. It will even classify bugs as liveness or correctness for you. It's like your own Jepsen, except you can inject storage faults, speed up time, and replay anything you find from a seed.

Again, the only real reason we were explicit about scope is our own experience doing bounty programs that were underspecified. For example, while it should be clear enough that this is a distributed systems and consensus bug bounty challenge, literally called “Viewstamped Replication Made Famous”, we didn't want anyone to be confused and think it was a security bug bounty. That's the only reason it's excluded, because we want people to break our consensus. Nevertheless, we do have small awards for interesting findings.

So I hope you'll give it a shot! We'd love to announce and award your findings. For example, why not take on the challenge during HYTRADBOI's database jam?

https://www.hytradboi.com/jam


Seems to me that memory safety bugs (eg UAF) can be reported as they would certainly impact correctness.


If you work for one month and you are at the mercy of the other party to decide whether they need to pay you, it's no more "easy money" than the lottery.


I've definitely been there! Finding P1s for full read/write access and then seeing the report downgraded to a P3, and having to have the platform arbitrate and bump it back to P1.

However, it was this experience of mine as a part-time security researcher that actually led to us creating the bug bounty program for TigerBeetle's consensus.

For example, if you're looking at another database and find a correctness bug, there might not be a bounty program at all. Whereas with TigerBeetle, there hasn't been a single valid report that we haven't awarded, at least so far.

It's also why we were careful to rather be upfront and explicit about scope, than disappoint anyone after the fact.

And we recognize that consensus is hard and takes time to learn, hence the $8192 award for correctness finds.

That said, I hope you can see from the leaderboard that we've been generous. For example, Alex Miller found a bug in Apple's O_DSYNC and we nevertheless awarded $1024 because it was such a great find (Apple thought so too!).


$1024 is half my daily rate. I doubt someone can make "a great find" in 4 hours. And that time isn't even guaranteed to be paid.


I think they are pretty open about what they can afford to pay. If you have self selected to be worth $2000 a day due to expertise or necessity, then you know that you're not a good fit for them.

However, many people are willing to put in the work, so why are you so critical of their program?

Do you think they are taking advantage of people who should charge more? Or do you think they will not get anyone good for such a low rate, and thus fool themselves into thinking they are secure?


To be clear, this was out of scope for the bounty (it was a bug in Apple's code), and TB awarded it anyway.


100%


This involves a serious, perhaps even fatal, misunderstanding.

It was not impossible to write safe C. That's sort of ironically illustrated by WUFFS. WUFFS-the-library is literally C source code: C source code for a safe, very high performance data processing library. A human could have written that library in C. But they didn't; humans wrote that software in WUFFS-the-language and used the tool that turns it into C to produce WUFFS-the-library.

So, this bounty can't do what you claim. The existence of TigerBeetle, apparently high-quality software written in Zig, doesn't somehow mean Zig is how you produce high-quality software, any more than the fact that somebody beat a Dark Souls game with a guitar controller means the guitar controller makes Dark Souls easier to beat.


And the highest-level, safest language can be compiled to machine code... you are not proving anything at all. Everyone already understands how compilers work. So they don't care that Brainfuck can in principle (although not in practice) be written perfectly by people.


Wow a domain specific database in Zig. This is literally my side project right now (though a completely different industry).

Good to see proof you can actually make a company out of it.


Thanks!



How well would TB fare as a more general datastore for tasks that aren't specifically accounting? I guess the hard part may be mapping the source domain model neatly into a double-entry accounting representation. Some diverse examples off the top of my head (and my guess): chat messages (no); logging (no); application metrics (yes, but it doesn't fit very well because metrics can trade a lot of features for throughput?); tracking inventory (definitely); tracking deployment and status of cloud resources [like k8s' etcd] (probably?).


You could one day replace the state machine with your own and have, for example, Redis.


> TB comes with flags for things like rejecting transfers where debits would exceed credits on an account.

What about when the inevitable happens and this business constraint needs to be loosened? What should have been business logic is now database schema; what do you do then? Or did I misunderstand what it's implementing?


You misunderstand. It's a database built purposely for ensuring transactions happen in an ordered manner and in as correct a way as possible. The constraints and basic rules of accounting generally change pretty slowly, so it makes sense in a domain that has:

* Very slowly changing validity requirements. Things like (optionally) preventing overspending are pretty intrinsic requirements of the domain (depending on the context; obviously many banks allow overdrafts so they can charge you a fee for spotting you the difference). The validation space is relatively small, such that you'll want to allow most transactions that aren't blatantly wrong

* Emphasizes correctness and ordering over everything else

* The transaction rate is relatively inelastic; that is, there aren't huge swings in how many transactions are requested per second, so dynamic scaling is not a desirable property to have if it means being less reliable/correct. Conversely, it's a known slow operation that most people are fine with not being instant. E.g. I'd much prefer my rent payment take 10s to process and be more or less guaranteed to be correct than take 1s and not be.

That's why it's domain-specific. You can optimize for the things you care about in your domain most while reducing operational complexity to the minimum viable to meet those demands.

It's amazingly well-focused and solves what I assume are hard, limiting problems that the existing financial infrastructure is not equipped to support.
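To make the overspending flag concrete, here's a toy sketch of the kind of invariant being described (the names are made up, not TigerBeetle's actual schema or implementation):

    const std = @import("std");

    const AccountFlags = struct {
        debits_must_not_exceed_credits: bool = false,
    };

    const Account = struct {
        debits_posted: u64 = 0,
        credits_posted: u64 = 0,
        flags: AccountFlags = .{},
    };

    fn debit(account: *Account, amount: u64) error{ ExceedsCredits, Overflow }!void {
        const new_debits = try std.math.add(u64, account.debits_posted, amount);
        if (account.flags.debits_must_not_exceed_credits and
            new_debits > account.credits_posted) return error.ExceedsCredits;
        account.debits_posted = new_debits;
    }

    test "overdraft is rejected when the flag is set" {
        var a = Account{
            .credits_posted = 100,
            .flags = .{ .debits_must_not_exceed_credits = true },
        };
        try debit(&a, 60);
        try std.testing.expectError(error.ExceedsCredits, debit(&a, 50));
        try std.testing.expectEqual(@as(u64, 60), a.debits_posted);
    }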


And in Zig, which is quite interesting.


Isn't this basically the premise of FoundationDB? I'd argue if you know your access patterns and they're not going to change you're always going to be better off using a key/value store and appropriately designing a schema around those access patterns.


Basically, and we're huge fans of FoundationDB!

As you zoom in, you will see differences. For example, TigerBeetle's data structures are all cache line aligned, and we use static allocation etc. The storage fault model is significantly different.

We use the same testing techniques though. FDB are pioneers in the space.


As someone who has used Zig in anger for the last year (to build an in-memory db, coincidentally), I find the idea of using it in production, especially for financial data, absolutely unsettling. The language is awesome, and developing rapidly, but you don't have to wander far to hit a compiler bug, there is no advanced testing infrastructure à la QuickCheck and friends, and documentation and documentation tooling, except for the core reference, are non-existent.

I see a bright future ahead for the language, and I'm glad that I got a Zig version on release day, but if you want to write production code, you might wanna stick to Rust or C for now, or put a small team on just improving the ecosystem.


We think carefully about this.

We do in fact have some extremely advanced testing infrastructure, even going as far as using a deterministic Linux hypervisor to do coverage-guided fuzzing of our compiled binaries from the outside in. At the same time, we fuzz from the inside out with Deterministic Simulation Testing, using a ton of assertions (literally 1,000+ and counting) as a force multiplier for fuzzing. Our experience in all this is that while Zig is early in terms of timeline, the quality is nevertheless extremely high.

Andrew and team know what they're doing. They've got some of the best people in the world in their respective fields. For example, Frank Denis of libsodium heading up Zig's crypto.

Beyond this, we do the basic things, like consciously restricting our use of language features to stable features only (which may be why our experience has not been the same as yours with respect to compiler bugs?), and investing in our own I/O stack around io_uring instead of depending on the std lib, which we know will churn.

Two of our team are Zig core team members and we sponsor the Zig Software Foundation to invest in the ecosystem. Don't forget also that Zig's ecosystem is really C's ecosystem so there's an escape hatch there.

Also, TB is not yet production ready. We'll ship our production release when we're confident that our TB binary is safe, or able to shut down if it detects any safety violation.

At the end of the day, databases are a big investment. It's important to think about the next 20 to 30 years. C would absolutely have been the wrong choice for that future.

As you say, the future is bright for Zig, and we're convinced that Zig is the right choice for a database that follows embedded coding standards with respect to memory. For example, TigerBeetle has to handle memory allocation failure, and only does static memory allocation so we never malloc or free after startup.
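A sketch of what that discipline looks like (illustrative only, not our actual startup code): all sizes are fixed up front, everything is allocated once at startup, allocation failure is an explicit error, and nothing is allocated or freed afterwards.

    const std = @import("std");

    const Journal = struct {
        entries: []u8,

        fn init(allocator: std.mem.Allocator, size_max: usize) error{OutOfMemory}!Journal {
            return Journal{ .entries = try allocator.alloc(u8, size_max) };
        }
    };

    pub fn main() !void {
        var gpa = std.heap.GeneralPurposeAllocator(.{}){};
        const journal = try Journal.init(gpa.allocator(), 64 * 1024 * 1024);
        _ = journal;
        // From here on: no malloc, no free, until shutdown.
    }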


I got a minor glimpse into the mindset of the TB team a while ago -- on the consensus bits and TB's choice of protocol -- and was impressed with the rigorous and ++informed approach that guided the design of that aspect of the system. Joran even managed to create a Garbo Speaks moment by finding Brian Oki and interviewing him. They are quite thorough! /g

https://www.youtube.com/watch?v=ps106zjmjhw

(Joran, I am happy to see your team and product are getting the exposure that imho they certainly deserve.)


Thanks @eternalban!

Awesome to read your comment here—appreciate the well wishes!


Thanks for the insight on how you mitigate these issues!

We're doing a lot of data-structure work involving bit stuffing, i.e. packed struct, which is very unfinished indeed.


Thanks, it's a pleasure! Yes, with packed structs it's important to keep things carefully aligned.

Have you seen this post [1] about struct packing? This is what we did for TB's structures, so that we can switch them to `extern struct` for C ABI compatibility. It also helped sidestep the packed struct bugs.

[1] http://www.catb.org/esr/structure-packing/
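The gist, with made-up fields rather than TB's actual structs: order fields largest-first so the extern struct needs no compiler-inserted padding, and pad explicitly at the end.

    const std = @import("std");

    const Event = extern struct {
        id: u128, // 16 bytes
        amount: u64, // 8
        timestamp: u64, // 8
        flags: u16, // 2
        reserved: [14]u8, // explicit padding out to 48 bytes
    };

    comptime {
        // Largest-first ordering means the compiler inserts no hidden padding,
        // so the C ABI layout is exactly what you see above.
        std.debug.assert(@sizeOf(Event) == 48);
    }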


I feel like there is only one aspect mentioned that is actually a domain specific thing (the double entry bit)... everything else is just a list of features that all sorts of general purpose databases have... am I missing something?


It also holds balances of accounts.


Most things don't really have a global intrinsic ordering outside of transaction serialization.


You're right.


I spent way too long wondering why someone would need a special database to store info about the subfamily Cicindelinae instead of just reading that this is some kind of special accounting database.


ScyllaDB would be much more interesting as a database of sea monsters. Not sure how to feel about CouchDB or CockroachDB.


Some of these recent database names sure are becoming *ahem* interesting. At this point I'm considering creating a database named MurderHornetDB just to avoid missing out on the fun.


Make it open source so I can fork it and release it as AfricanizedMurderHornetDB



