Nice! I'm a big user of various data stores for scientific work. How does this compare to LMDB (http://www.lmdb.tech/doc/), which uses B-trees (and has a Python interface)?
I haven't run any benchmarks yet, but I expect my implementation to be at least an order of magnitude slower.
I've found my implementation to be CPU-intensive: creating Python objects from the raw pages is expensive. That's why bulk inserts and iteration are much faster than calling insert/get in a loop.
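Roughly this (an illustrative sketch; the method names here are placeholders, not necessarily my exact API):

    # Placeholder API: "db" stands for an open store instance.
    items = [(b"key%d" % i, b"value%d" % i) for i in range(100_000)]

    # Slow: every call re-decodes pages into Python objects.
    for key, value in items:
        db.insert(key, value)
    for key, _ in items:
        db.get(key)

    # Much faster: the decoding cost is amortized across the batch,
    # and iteration decodes each page only once.
    db.batch_insert(items)
    for key, value in db.items():
        pass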
Why is this not mentioned very clearly in the README? Seems like willful misrepresentation.
You might also mention that, when replacing large values that use overflow pages, the file can grow without bound: it looks like the old overflow pages are never reclaimed.
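A quick way to check (assuming a dict-style API and a single data file; adjust the names to the real ones):

    import os

    big = b"x" * 1_000_000  # large enough to need overflow pages

    for _ in range(100):
        db[b"big"] = big  # replace the same key over and over
        # if overflow pages leak, this keeps growing on every iteration
        print(os.path.getsize("data.db"))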
Are there any production data stores recommended for low memory usage? What should I use to stream data to disk and back with minimal overhead, preferably with indexed lookups?
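To make it concrete, this is roughly the access pattern I mean, sketched with the lmdb Python binding:

    import lmdb

    env = lmdb.open("scratch.lmdb", map_size=2**30)  # memory-mapped, 1 GiB cap

    # Stream records to disk in a single write transaction.
    with env.begin(write=True) as txn:
        for i in range(1_000_000):
            txn.put(f"{i:012d}".encode(), b"payload")

    # Indexed lookups and ordered scans with little overhead.
    with env.begin() as txn:
        print(txn.get(b"000000000042"))
        with txn.cursor() as cur:
            for key, value in cur:  # key-ordered iteration
                break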
This is pretty cool. As I was reading the code, I really wished for an explanation or a simple ASCII diagram of the serialization formats for the various node/record types, as well as for the frame/WAL format. Given that the poster is the author of the project, I hope you'll consider filling in these kinds of details, as they'd presumably be of interest to the people you're "Show"-ing this to.
Thank you. I know what you mean; I'm not happy with how the serialization is done right now. It's too complicated.
Maybe someone on HN knows of a Python serialization library beyond pickle that lets you describe how the data is laid out and takes care of the rest. It looks like struct is not flexible enough for this use case.
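To illustrate what I mean about struct: anything variable-length forces manual offset bookkeeping (a simplified record layout, not my actual format):

    import struct

    HEADER = struct.Struct(">II")  # key length, value length (big-endian u32)

    def parse_record(buf: bytes, offset: int = 0):
        # struct can only unpack the fixed-size header...
        key_len, value_len = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        # ...the variable-length parts have to be sliced out by hand.
        key = buf[offset:offset + key_len]
        value = buf[offset + key_len:offset + key_len + value_len]
        return key, value

    raw = HEADER.pack(3, 5) + b"foo" + b"hello"
    assert parse_record(raw) == (b"foo", b"hello")

What I'd like is to declare a layout like this once and get the parsing and building generated from it.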