Hacker News new | past | comments | ask | show | jobs | submit login

What's cosine similarity, what do you use it for and why is it good to have it into your db instead of somewhere else (like a lib)?



Euclidean distance stops "making sense" as the number of dimensions goes up: https://stats.stackexchange.com/questions/99171/why-is-eucli...

Cosine similarity measures the angle between two vectors instead, and doesn't suffer from the curse of dimensionality.

I guess it's important to have this in your DB, so you make "nearby" queries (give me text that's similar to this other text) in an efficient way.


Cosine similarity suffers from a curse of dimensionality, just as distance does (it’s just one dimension less). The angle between two random vectors in N dimensions approaches zero with a power of N. The main reason this metric is useful in practice is because it better relates to how certain neural networks use/train their embeddings internally.


Oh yeah, dimensionality curse wasn't the reason for cosine, it was the NN output.


Exactly. Some other context implied here, these vectors were ML embeddings. In that case, like 100 dimensions that vaguely represented our input data in compact form. There was probably a better solution out there, this was just the most readily available for us.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: