Hacker News new | past | comments | ask | show | jobs | submit login

Semi-related: Anybody know what's the current state of the art for calculating nearest neighbors in high-dimensional spaces (at least 10D, maybe 100s of D)?



Various approaches based on LSH (locality sensitive hashing) and random projections.

But "the best" really depends on your data profile, as the generic problem formulation, for completely generic vectors, is too… generic. See also the curse of dimensionality [0].

In practical terms, if you're looking for an actual implementation to use, here's a benchmark of some state-of-the-art implementations for high-dimensional vector search: http://radimrehurek.com/2013/12/performance-shootout-of-near... and also the updated https://github.com/erikbern/ann-benchmarks.

In particular, for something good AND open source:

- Leonid Boytsov's NMSLIB [1]

- Erik Bernhardsson's Annoy [2]

- Facebook's FAISS [3]

For a commercial engine focused more on practical use (index management, transactions, versioning, support), see ScaleText [4] (disclaimer: our product).

[0] https://en.wikipedia.org/wiki/Curse_of_dimensionality [1] https://github.com/searchivarius/nmslib [2] https://github.com/spotify/annoy [3] https://github.com/facebookresearch/faiss [4] https://scaletext.com/


Very helpful, many thanks!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: