I looked at this a few months ago. Elasticsearch limits the dense_vector length (1024 dimensions, I think), which rules out a lot of the popular off-the-shelf models. A key insight is that off-the-shelf models don't perform that well compared to a hand-tuned query with BM25 (Lucene's default ranking algorithm). I've seen multiple people make the point that the built-in ranking is pretty hard to beat unless you specialize your models to your use case.
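For reference, the limit bites in the index mapping where you declare the vector field. A minimal sketch with a 7.x-era Python client; the index name, field names, and dimension count are made up for illustration:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    es.indices.create(
        index="docs",  # hypothetical index name
        body={
            "mappings": {
                "properties": {
                    "body": {"type": "text"},
                    # "dims" must stay under the version's cap
                    # (1024 in older releases), so large off-the-shelf
                    # embeddings simply won't fit.
                    "embedding": {"type": "dense_vector", "dims": 768},
                }
            }
        },
    )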
Part of the reason for the size limit is that storing and working with large numbers of huge vectors gets expensive quickly. Even just the raw storage adds up: 10 million documents with 1024-dimensional float32 embeddings is already about 40 GB before any index structures.
And of course running kNN over huge result sets is very expensive, especially if the vectors are large. Being able to filter the result set down with a regular query first and then rank the surviving candidates with a vector query (something like the sketch below) helps keep searches responsive and costs low.
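A minimal sketch of that filter-then-rank pattern, again with a 7.x-era Python client and the script_score / cosineSimilarity approach; the index name, field names, and the example keyword filter are all assumptions for illustration:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Hypothetical 768-dim query embedding; in practice this
    # comes from whatever model produced the document vectors.
    query_vector = [0.1] * 768

    resp = es.search(
        index="docs",
        body={
            "query": {
                "script_score": {
                    # Cheap BM25 match plus a term filter narrows
                    # the candidate set first...
                    "query": {
                        "bool": {
                            "must": {"match": {"body": "neural search"}},
                            "filter": {"term": {"lang": "en"}},
                        }
                    },
                    # ...then only those candidates get scored by
                    # vector similarity (+1.0 keeps scores positive).
                    "script": {
                        "source": "cosineSimilarity(params.qv, 'embedding') + 1.0",
                        "params": {"qv": query_vector},
                    },
                }
            },
            "size": 10,
        },
    )

    for hit in resp["hits"]["hits"]:
        print(hit["_id"], hit["_score"])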
If you are interested in this, you might want to look at OpenSearch as well. They implemented vector search independently of Elasticsearch, and their implementation supports a few additional vector storage and query options via native libraries (nmslib and faiss). I haven't used any of that extensively, but it looks interesting.
Ah, that’s neat, thank you for the input! We’re actually using a homegrown (word2vec descendant) model to build document vectors. It works well on its own, but our attempts to implement an efficient search engine on top of it have proven futile so far.
It's been a minute since I've looked at embeddings in this context, but isn't Johnson-Lindenstrauss applicable here? A random projection should let you get away with 1024-dimensional (or shorter) vectors while approximately preserving pairwise distances.
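If it helps, a minimal sketch of that idea as a Gaussian random projection in NumPy; the sizes are made up for illustration, and JL only promises approximate distance preservation for a target dimension on the order of log(n)/eps^2:

    import numpy as np

    rng = np.random.default_rng(0)

    n, d, k = 10_000, 4096, 1024   # docs, original dims, target dims (illustrative)
    X = rng.normal(size=(n, d))    # stand-in for real document embeddings

    # Gaussian random projection: entries drawn N(0, 1/k) so that
    # squared norms are preserved in expectation (the JL construction).
    R = rng.normal(scale=1.0 / np.sqrt(k), size=(d, k))
    X_low = X @ R

    # Spot-check: pairwise distances should be roughly preserved.
    i, j = 0, 1
    orig = np.linalg.norm(X[i] - X[j])
    proj = np.linalg.norm(X_low[i] - X_low[j])
    print(f"original: {orig:.2f}  projected: {proj:.2f}")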