> We haven't discussed other solutions like Partitioned Elias-Fano indexes or Tr...

francoismassot · on July 2, 2023

Yes, just to confirm there was no off-the-shelf implementation in Rust.

We know the PISA project very well because... it is the fastest engine (though academic) in the tantivy benchmark: https://tantivy-search.github.io/bench/

But I did not know that they were using Partitioned Elias-Fano, maybe the author of the article fulmicoton knew that, I will check that.

fulmicoton · on July 3, 2023

Haha I regenerated the results something like last week, and the PISA engine is not in the list anymore :)

WinLychee · on July 4, 2023

any chance this datastructure could be split off into a separate crate? or perhaps your modification ported back to roaring?

boyter · on July 1, 2023

Unrelated to article but thanks for publicising the work on PEF. It’s one of those papers and codebases I scan though every now and then attempting to understand it better. I’ll get there one day and hopefully be able to reimplement it myself, for no real purpose other than understanding.

pgera · on July 2, 2023

Btw, core EF is quite efficient (perf wise) on the decoding side even on GPUs. I wanted to do PEF, but that seemed a bit more involved and I didn't have the time to do it. Here's a GPU implementation for graph problems if anyone is interested: https://github.com/pgera/efg. I also used folly on the encoding side and it works great.

PSeitz · on July 2, 2023

> It would definitely compress much better than roaring bitmaps. In terms of performance, it depends on the access patterns. If very sparse (large jumps), PEF would likely be faster, if dense (visit a large fraction of the bitmap), it'd be slower.

Just for clarification you mean the access pattern is sparse and not the data itself? How is that relevant?

I did part of the investigation and implementation, but didn't look much into elias fano coding. The select for the dense codec is already really fast (with popcount), there's not much room for improvement on the instruction side (https://godbolt.org/z/dq7WeE66Y), except implicitly by touching less memory. The sparse codec with its binary search should be easy to beat though. Partitioned Elias-Fano indexes may be a superior choice in contrast to the sparse codec in terms of rank and compression, and probably less so for select and code complexity.

ot · on July 2, 2023

> How is that relevant?

Roaring bitmaps and similar data structures get their speed from decoding together consecutive groups of elements, so if you do sequential decoding or decode a large fraction of the list you get excellent performance.

EF instead excels at random skipping, so if you visit a small fraction of the list you generally get better performance. This is why it works so well for inverted indexes, as generally the queries are very selective (otherwise why do you need an index?) and if you have good intersection algorithms you can skip a large fraction of documents.

I didn't follow the rest of your comment, select is what EF is good at, every other data structure needs a lot more scanning once you land on the right chunk. With BMI2 you can also use the PDEP instruction to accelerate the final select on a 64-bit block: https://github.com/facebook/folly/blob/main/folly/experiment...

PSeitz · on July 3, 2023

>Roaring bitmaps and similar data structures get their speed from decoding together consecutive groups of elements, so if you do sequential decoding or decode a large fraction of the list you get excellent performance. EF instead excels at random skipping, so if you visit a small fraction of the list you generally get better performance. This is why it works so well for inverted indexes, as generally the queries are very selective (otherwise why do you need an index?) and if you have good intersection algorithms you can skip a large fraction of documents.

There's no sequential decoding in our variant, every access is independent. The roaring bitmap variant is used only for the optional index (1 docid => 0 or 1 value) in the columnar storage (DocValues), not for the inverted index. Since this is used for aggregation, some queries may be a full scan.

The inverted index in tantivy uses bitpacked values of 128 elements with a skip index on top.

> I didn't follow the rest of your comment, select is what EF is good at, every other data structure needs a lot more scanning once you land on the right chunk. With BMI2 you can also use the PDEP instruction to accelerate the final select on a 64-bit block

The select for the sparse codec is a simple array index access, that is hard to beat (https://github.com/quickwit-oss/tantivy/blob/main/columnar/s...). Compression is not good near the 5k threshold though. PDEP is currently deactivated since ryzen before Zen3 was really slow.

Creation speed is also quite important, do you know how "Partitioned Elias-Fano" performs there?