
We used a “popcount”-style operation plus bitwise AND to determine how similar two people were by interest. We had a bitmap where each bit represented an interest: 1 meant the person had that interest and 0 meant they did not. Bitwise AND filters out the non-overlapping interests, and popcount then measures the “intensity” of their similarity (to tease out people who had almost every or almost no interests selected and who matched others like that; a bell-curve kind of thing).
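For anyone who wants to see the shape of that, here is a minimal Python sketch. The interest list, the names, and the helper functions are all made up for illustration; this is not the original system's code, just the AND-then-popcount idea under those assumptions.

```python
# Hypothetical interest catalogue: bit i of a bitmap is 1 if the person
# selected INTERESTS[i], and 0 otherwise.
INTERESTS = ["hiking", "chess", "cooking", "jazz", "cycling"]


def to_bitmap(selected):
    """Pack a collection of interest names into an integer bitmap."""
    bitmap = 0
    for i, name in enumerate(INTERESTS):
        if name in selected:
            bitmap |= 1 << i
    return bitmap


def popcount(x):
    """Count the set bits in a non-negative integer."""
    return bin(x).count("1")


def shared_interests(a, b):
    """Number of interests both people selected: bitwise AND, then popcount."""
    return popcount(a & b)


# Illustrative profiles (assumed data, not from the original comment).
alice = to_bitmap({"hiking", "chess", "jazz"})
bob = to_bitmap({"chess", "jazz", "cycling"})
print(shared_interests(alice, bob))  # 2 (chess and jazz)
```

On Python 3.10 or later, `int.bit_count()` could replace the `bin(x).count("1")` helper.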



Generally, popcnt is useful for implementing the Jaccard index [1] (the Jaccard distance is 1 minus this value):

    J(X,Y) = |X∩Y| / |X∪Y|

So, implementing the sets X and Y as bitsets, it becomes:

    J(X,Y) = popcnt(X&Y) / popcnt(X|Y)

[1]: https://en.wikipedia.org/wiki/Jaccard_index
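As a rough sketch, the bitset form of the Jaccard index might look like this in Python, with plain integers standing in for the bitsets. The function names and the choice to return 1.0 for two empty sets are my assumptions; the formula leaves that case undefined.

```python
def popcount(x):
    """Count the set bits in a non-negative integer."""
    return bin(x).count("1")


def jaccard(x, y):
    """Jaccard index of two bitsets stored as non-negative integers."""
    union = popcount(x | y)
    if union == 0:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return popcount(x & y) / union


x = 0b01011  # bits 0, 1, 3 set
y = 0b11010  # bits 1, 3, 4 set
print(jaccard(x, y))  # 0.5 (2 shared bits out of 4 in the union)
```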


If anyone needs that to go fast for fixed-length bitsets (up to ~4k bits), try out my code, chemfp, at http://chemfp.com/ . It's designed for cheminformatics, but can work with any data set that can be described by a fixed-length byte string and an identifier.



