Cool stuff but why didn't you just use an existing LMDB wrapper or write a new o...

benbjohnson · on Dec 11, 2014

We originally used szferi/gomdb which worked well but it didn't give us performance visibility with "go pprof" and it was difficult to debug at times. Bolt also takes the approach of simplifying the LMDB API significantly. There are a lot of advanced, unsafe features that aren't available in Bolt (e.g. WRITEMAP, NOMETASYNC). There's more detail in the README:

https://github.com/boltdb/bolt#lmdb

Bolt was originally just a side project but we found that it fit our needs for our analytics database so we eventually swapped out LMDB for Bolt.

_wmd · on Dec 11, 2014

This is a fairly minor nit, but a persistent one nonetheless: you keep claiming to have removed unsafe features from LMDB, yet you left the most dangerous one in there: NOSYNC provides no consistency guarantees whatsoever, whereas e.g. with NOMETASYNC you merely lose durability. In particular NOMETASYNC retained crash safety, which you don't get at all with NOSYNC.

The way this is worded, it sounds like an improvement, but in reality bolt is no safer to use than LMDB was, it just lacks some configurable tradeoffs that were present in the original engine, and were safer to use than those that remain in bolt.

That said, I'm happy to see APPEND go ;)

benbjohnson · on Dec 11, 2014

That's a fair point. Although I'd argue that WRITEMAP and APPEND are more scary. :)

I try to point out in the docs that DB.NoSync should only be used for offline bulk loading where corruption is typically less of an issue (since you can simply rebuild).

https://godoc.org/github.com/boltdb/bolt#DB

The wording was intended to make it sound like an improvement for people wanting smaller, more concise code bases. Bolt is 2KLOC vs LMDB's 8KLOC. Smaller surface area makes it easier to test thoroughly. It's definitely a tradeoff for people who want that really low level control though.

hyc_symas · on Dec 12, 2014

LMDB author here - what have you got against APPEND? Fast bulk-loading is pretty important when you're initializing a system.

Note that APPEND mode won't allow you to corrupt the DB, despite what the doc says - you'll get a KEYEXIST error if you try to load out of order. We just left the doc unchanged to emphasize the importance of using sorted input.

benbjohnson · on Dec 12, 2014

Thanks, Howard. That's good to know. The docs always scared me away from using it at all.

hyc_symas · on Dec 12, 2014

Perhaps it's time for us to update the doc to reflect reality...

zura · on Dec 12, 2014

Hey Ben, what (refs/books/lectures) would you recommend for someone who is interested to learn theory behind these kind of DBs and implement it from scratch?

benbjohnson · on Dec 12, 2014

I originally started Bolt to learn about the implementation details of low-level data stores. It started as a pet project but became more serious as I filled it out and added additional layers of testing.

I'd honestly suggest reading through the source code of some database projects. That's the best way to learn the structures and understand the engineering tradeoffs. Some are more approachable than others though. Bolt and LMDB are 2KLOC and 8KLOC, respectively. LevelDB is 20KLOC and RocksDB is 100KLOC, IIRC.

If you want to go further then I'd suggest porting one of those over to a language you know well. It doesn't have to be code you release to the world or use in production but just porting the code really helps to cement the concepts in my head.

hyc_symas · on Dec 12, 2014

To get the theory, you could just start on Wikipedia. http://en.wikipedia.org/wiki/B+_tree

To understand why the LMDB implementation does what it does will require much more practical knowledge, not theory. The references in my MDB papers (and the paper itself of course) will give you more of that. http://symas.com/mdb/#pubs

B+trees are trivially simple, in theory. Writing one that is actually efficient, safe, and supports high concurrency requires more than theory. It requires understanding of computer system architecture: CPU architecture/cache hierarchies, filesystem and I/O architecture, memory, etc.

It does not require 100KLOC though. RocksDB is just a pile of gratuitous complexity.

zura · on Dec 12, 2014

Thanks for the good points! Bolt and LMDB seem approachable, and I'm especially interested how ACID compliance is implemented and how to check/prove the correctness of it...