Cosine similarity suffers from a curse of dimensionality, just as distance does (it’s just one dimension less, since only direction matters). The cosine of the angle between two random vectors in N dimensions concentrates around zero, i.e. the angle approaches 90°, with a spread that shrinks roughly as 1/sqrt(N). The main reason this metric is useful in practice is that it better matches how certain neural networks use and train their embeddings internally.
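
To get a feel for the effect, here is a minimal NumPy sketch. It assumes "random" means independent standard normal components (which gives directions uniform on the sphere), and the helper name `random_cosine_similarities` is made up for illustration. It samples pairs of vectors at several dimensions and prints the mean and standard deviation of their cosine similarity next to 1/sqrt(N) for comparison.

```python
import numpy as np


def random_cosine_similarities(dim, n_pairs=2000, seed=0):
    """Cosine similarity for n_pairs of random Gaussian vectors in `dim` dimensions.

    Assumption: i.i.d. standard normal components, so directions are
    uniformly distributed on the unit sphere.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_pairs, dim))
    b = rng.standard_normal((n_pairs, dim))
    # Row-wise dot products, then divide by the product of the row norms.
    dots = np.einsum("ij,ij->i", a, b)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dots / norms


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        sims = random_cosine_similarities(dim)
        # The spread of the similarities should track 1/sqrt(dim).
        print(f"N={dim:5d}  mean={sims.mean():+.4f}  "
              f"std={sims.std():.4f}  1/sqrt(N)={1 / np.sqrt(dim):.4f}")
```

As N grows, the similarities should cluster ever more tightly around zero, so the similarity between two unrelated vectors carries less and less contrast.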