What's cosine similarity, what do you use it for and why is it good to have it i...

pxndxx · 2024-01-26T11:06:05.000000Z

Euclidean distance stops "making sense" as the number of dimensions goes up: https://stats.stackexchange.com/questions/99171/why-is-eucli...

Cosine similarity measures the angle between two vectors instead, and doesn't suffer from the curse of dimensionality.

I guess it's important to have this in your DB, so you make "nearby" queries (give me text that's similar to this other text) in an efficient way.

pama · 2024-01-26T18:05:42.000000Z

Cosine similarity suffers from a curse of dimensionality, just as distance does (it’s just one dimension less). The angle between two random vectors in N dimensions approaches zero with a power of N. The main reason this metric is useful in practice is because it better relates to how certain neural networks use/train their embeddings internally.

hot_gril · 2024-01-26T23:48:42.000000Z

Oh yeah, dimensionality curse wasn't the reason for cosine, it was the NN output.

hot_gril · 2024-01-26T17:48:30.000000Z

Exactly. Some other context implied here, these vectors were ML embeddings. In that case, like 100 dimensions that vaguely represented our input data in compact form. There was probably a better solution out there, this was just the most readily available for us.