Tunable Delete Aware LSM Engine

nitrobeast · on Aug 8, 2020

It is not clear expiration time per layer is much better than the commonly used strategy of compacting to a single layer periodically.

For KiWi, storing deletes separately could be interesting but I’m not sure about its exact disk layout and how the resorting works.

gopalv · on Aug 8, 2020

> For KiWi, storing deletes separately could be interesting

The deletes in different list is super useful in row-major mode if the data bandwidth is an issue and the delete lists can entirely remove the row from the IO.

Apache Hive switched to splitting up the deletes and insert files a while back to get some row-read savings when parallelizing reads - this is pretty much standard tombstoning (but with an immutable fs), but the LSM style merge sucks when you have a scan project/SELECT + schema evolution across different files.

The delete records always have a fixed schema and a single insert sequence is compacted/inserted in the same schema, which prevents the need to switch schema between rows when being read.

Even if the schema rarely changes, but just assuming it can happen and checking for it is slow across row versions of the same row.

jeffffff · on Aug 8, 2020

assuming that data is being inserted continuously at the same rate, periodic full compaction of an lsm tree with an upper bound on time between full compactions is O(n^2). depending on the quantity and distribution of deletes, you can potentially do much better with incremental compaction strategies while still maintaining an upper bound on the delay in processing a delete.

wizzwizz4 · on Aug 8, 2020

This is pretty impressive. I didn't expect we'd get another innovation in database architectures for a while.

Do you think SQLite 4 will adopt this?

pstuart · on Aug 8, 2020

Work on SQLite 4 has concluded: https://news.ycombinator.com/item?id=15648280