
Does anyone know a way to calculate the intersection of two HLL instances? The algorithm supports merging two sketches, but taking their intersection is not as easy. Inclusion-exclusion, |A| + |B| - |A ∪ B|, is not optimal, so I would love to hear your suggestions.



Neustar Research has a good article on this that may be of interest to you. However, I think their approach is still based on the inclusion-exclusion principle (still a good read, though).

* https://research.neustar.biz/2012/12/17/hll-intersections-2/
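For reference, here is a minimal sketch of what the inclusion-exclusion estimate looks like in Python. `TinyHLL`, `build`, and `intersection_estimate` are made-up names for illustration, not any library's API, and the estimator is the plain HyperLogLog formula without the bias corrections that production implementations add. The thing to notice is that each of the three estimates carries an error proportional to its own size, so the result is only trustworthy when the intersection is large relative to the union.

```python
import hashlib
import math


class TinyHLL:
    """Illustrative HyperLogLog; not tuned, no HLL++ bias correction."""

    def __init__(self, p=14):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash
        width = 64 - self.p
        idx = h >> width                               # top p bits pick a register
        rest = h & ((1 << width) - 1)                  # remaining bits
        rank = width - rest.bit_length() + 1           # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        """Union of two sketches: register-wise maximum."""
        out = TinyHLL(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)          # valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # small-range correction
        return raw


def build(items, p=14):
    sketch = TinyHLL(p)
    for item in items:
        sketch.add(item)
    return sketch


def intersection_estimate(a, b):
    """|A ∩ B| estimated as |A| + |B| - |A ∪ B|."""
    return a.count() + b.count() - a.merge(b).count()
```

With two sets that overlap heavily this gives a usable number. With a small overlap between two large sets the estimate can come out far off, or even negative, which is the weakness the question is about.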

Another approach, if you have the flexibility to change your implementation, is to use a different type of data sketch that is more amenable to set expressions. There is some discussion (and references) here:

* https://datasketches.github.io/docs/Theta/ThetaSketchFramewo...

* https://datasketches.github.io/docs/Tuple/TupleOverview.html
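To give a rough idea of why this kind of sketch handles intersections more naturally, here is a toy KMV-style theta sketch in Python. This is my own simplified illustration, not the DataSketches API: `ThetaSketch`, `from_items`, and `intersect` are hypothetical names, and the real library does a lot more (compact storage, seeds, tighter error bounds). The idea is that each sketch keeps every hash value below a threshold theta, so two sketches can be compared entry by entry.

```python
import hashlib


def _hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1)."""
    digest = hashlib.sha1(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class ThetaSketch:
    """Toy theta sketch: retains all hashes strictly below theta."""

    def __init__(self, theta, entries):
        self.theta = theta
        self.entries = entries

    @classmethod
    def from_items(cls, items, k=4096):
        hashes = sorted({_hash01(x) for x in items})
        if len(hashes) > k:
            # theta is the (k+1)-th smallest hash; keep the k below it
            return cls(hashes[k], set(hashes[:k]))
        return cls(1.0, set(hashes))    # fewer than k items: exact mode

    def estimate(self):
        return len(self.entries) / self.theta


def intersect(a, b):
    """Keep only hashes present in both sketches, under the smaller theta."""
    theta = min(a.theta, b.theta)
    common = {x for x in a.entries & b.entries if x < theta}
    return ThetaSketch(theta, common)
```

The result of `intersect` is itself a sketch, so it can be fed into further set operations. Accuracy depends on how many common entries survive below theta, so very small intersections are still noisy, just in a more predictable way.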


I have a way; will post it soon.


Would love to hear about the solution. Are you going to implement it or publish an article?



