Fast Ordered Collections for Swift Using In-Memory B-Trees

lemoncucumber · on March 2, 2016

Sounds very similar to the copy-on-write-friendly B-trees used in Btrfs, originally researched by Ohad Rodeh: http://liw.fi/larch/ohad-btrees-shadowing-clones.pdf

lumpypua · on March 2, 2016

Absolutely awesome readme, comprehensive and covers a huge chunk of background. I know more about in-memory tree data structures than when I started. Beautiful API. This is how a data structure library should be done.

rurban · on March 2, 2016

I'm very sceptical that in-memory B-trees can beat hashes, given their huge size overhead and cache unfriendliness.

m_eiman · on March 2, 2016

Quoting the readme: "B-trees were originally invented in the 1970s as a data structure for slow external storage devices. As such, they are strongly optimized for locality of reference: they prefer to keep data in long contiguous buffers and they keep pointer dereferencing to a minimum. (Dereferencing a pointer in a B-tree usually meant reading another block of data from the spinning hard drive, which is a glacially slow device compared to the main memory.)"

Sounds like it could be pretty cache friendly. Besides, a B-tree can be used to implement a hash/dict/map.

rurban · on March 2, 2016

I know B-trees. Still, Patricia trees, or their optimized variant Judy hashes, are more cache-friendly, and non-fucked-up hashes even more so. For an OrderedDict it makes sense, but I would still consider Judy or Patricia better.

lorentey · on March 5, 2016

Tries are awesome, but they're more specialized. So, as long as you have to choose which one to implement first (and I do), B-trees provide better bang for the buck.

lorentey · on March 5, 2016

B-trees don't beat hash tables at their own game, but hash tables don't beat B-trees at theirs either. Both are general-use data structures that have their particular niches. The trick is to always select the correct tool for the job.

You might be thinking about red-black trees. The size overhead of B-trees is the same as or better than that of hash tables. As for cache friendliness, B-trees were explicitly designed to work great on two-level storage; the advantage is clearly theirs.