Hacker News new | past | comments | ask | show | jobs | submit login

> We haven't discussed other solutions like Partitioned Elias-Fano indexes or Tree-Encoded Bitmaps. We simply did not investigate them. There are no efficient off-the-shelf implementations of these solutions, so we had to implement one.

I guess they mean there are no implementations in Rust? The Partitioned Elias-Fano (PEF) paper links to the C++ implementation ([1], which I wrote) used for the experiments, it was efficient 9 years ago :) The library was later forked [2] and used in a few other papers, I believe it is somewhat maintained. The EF core algorithm implemented in folly [3] may be a bit faster, and implementing partitioning on top of that is relatively easy.

It would definitely compress much better than roaring bitmaps. In terms of performance, it depends on the access patterns. If very sparse (large jumps), PEF would likely be faster, if dense (visit a large fraction of the bitmap), it'd be slower.

It is possible to squeeze a bit more compression out of PEF by introducing a chunk type for Elias-Fano of the chunk complement (for very dense chunks), but you lose the operation of skipping to a given position, which is however not needed in inverted indexes (you only need to skip past a given id, and that can be supported efficiently). That is not mentioned in the paper because at the time I thought the skip-to-position operation was a non-negotiable.

[1] https://github.com/ot/ds2i/

[2] https://github.com/pisa-engine/pisa

[3] https://github.com/facebook/folly/blob/main/folly/experiment...




Yes, just to confirm there was no off-the-shelf implementation in Rust.

We know the PISA project very well because... it is the fastest engine (though academic) in the tantivy benchmark: https://tantivy-search.github.io/bench/

But I did not know that they were using Partitioned Elias-Fano, maybe the author of the article fulmicoton knew that, I will check that.


Haha I regenerated the results something like last week, and the PISA engine is not in the list anymore :)


any chance this datastructure could be split off into a separate crate? or perhaps your modification ported back to roaring?


Unrelated to article but thanks for publicising the work on PEF. It’s one of those papers and codebases I scan though every now and then attempting to understand it better. I’ll get there one day and hopefully be able to reimplement it myself, for no real purpose other than understanding.


Btw, core EF is quite efficient (perf wise) on the decoding side even on GPUs. I wanted to do PEF, but that seemed a bit more involved and I didn't have the time to do it. Here's a GPU implementation for graph problems if anyone is interested: https://github.com/pgera/efg. I also used folly on the encoding side and it works great.


> It would definitely compress much better than roaring bitmaps. In terms of performance, it depends on the access patterns. If very sparse (large jumps), PEF would likely be faster, if dense (visit a large fraction of the bitmap), it'd be slower.

Just for clarification you mean the access pattern is sparse and not the data itself? How is that relevant?

I did part of the investigation and implementation, but didn't look much into elias fano coding. The select for the dense codec is already really fast (with popcount), there's not much room for improvement on the instruction side (https://godbolt.org/z/dq7WeE66Y), except implicitly by touching less memory. The sparse codec with its binary search should be easy to beat though. Partitioned Elias-Fano indexes may be a superior choice in contrast to the sparse codec in terms of rank and compression, and probably less so for select and code complexity.


> How is that relevant?

Roaring bitmaps and similar data structures get their speed from decoding together consecutive groups of elements, so if you do sequential decoding or decode a large fraction of the list you get excellent performance.

EF instead excels at random skipping, so if you visit a small fraction of the list you generally get better performance. This is why it works so well for inverted indexes, as generally the queries are very selective (otherwise why do you need an index?) and if you have good intersection algorithms you can skip a large fraction of documents.

I didn't follow the rest of your comment, select is what EF is good at, every other data structure needs a lot more scanning once you land on the right chunk. With BMI2 you can also use the PDEP instruction to accelerate the final select on a 64-bit block: https://github.com/facebook/folly/blob/main/folly/experiment...


>Roaring bitmaps and similar data structures get their speed from decoding together consecutive groups of elements, so if you do sequential decoding or decode a large fraction of the list you get excellent performance. EF instead excels at random skipping, so if you visit a small fraction of the list you generally get better performance. This is why it works so well for inverted indexes, as generally the queries are very selective (otherwise why do you need an index?) and if you have good intersection algorithms you can skip a large fraction of documents.

There's no sequential decoding in our variant, every access is independent. The roaring bitmap variant is used only for the optional index (1 docid => 0 or 1 value) in the columnar storage (DocValues), not for the inverted index. Since this is used for aggregation, some queries may be a full scan.

The inverted index in tantivy uses bitpacked values of 128 elements with a skip index on top.

> I didn't follow the rest of your comment, select is what EF is good at, every other data structure needs a lot more scanning once you land on the right chunk. With BMI2 you can also use the PDEP instruction to accelerate the final select on a 64-bit block

The select for the sparse codec is a simple array index access, that is hard to beat (https://github.com/quickwit-oss/tantivy/blob/main/columnar/s...). Compression is not good near the 5k threshold though. PDEP is currently deactivated since ryzen before Zen3 was really slow.

Creation speed is also quite important, do you know how "Partitioned Elias-Fano" performs there?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: