Switching to Word2Vec embeddings substantially improved my cosine similarity scores for text similarity, though granted I was measuring actual similarity, not relevance. I tried many different methods and initially got a lot of mediocre results.
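(For reference, the basic setup is cosine similarity over mean-pooled word vectors. A minimal sketch with toy hand-made vectors standing in for trained Word2Vec embeddings, not the repo's actual code:)

```python
import math

# Toy 3-d vectors standing in for trained Word2Vec embeddings;
# in practice these would come from something like gensim's KeyedVectors.
VECS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.9, 0.4],
}

def embed(text):
    """Mean-pool the vectors of the in-vocabulary tokens."""
    words = [w for w in text.lower().split() if w in VECS]
    dim = len(next(iter(VECS.values())))
    return [sum(VECS[w][i] for w in words) / len(words) for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

print(cosine(embed("cat"), embed("dog")))  # high: related words point the same way
print(cosine(embed("cat"), embed("car")))  # much lower
```

The catch mentioned above applies: high cosine similarity here means "similar", which is not the same thing as "relevant to the query".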
Interesting, do you happen to have some quantitative results on this/additional insights/etc?
I've interpreted transformer vector similarity as "likelihood to be followed by the same thing", which is close to word2vec's "sum of likelihoods of all words to be replaced by the other set" (kinda), but also very different in some contexts.
code: https://github.com/jimmc414/document_intelligence/blob/main/... https://github.com/jimmc414/document_intelligence