Isn't one big problem of vector search that it is bounded by the complexity of the kNN problem in many dimensions, whereas keyword search like BM25 is very fast and scalable because of inverted indices? Annoy and other approximate methods need to sacrifice accuracy for speed, I believe. Hybrid search can be used to first filter down to a smaller number of candidates and then rerank them using vector search. Performance-wise, it is naturally pretty slow and hard to scale compared to keyword search.
The elephant in the room is the curse of dimensionality. As the number of vector dimensions increases, "nearest neighbor" search behaves in a surprising manner. In a high-dimensional cube, randomly sampled vectors are mostly near the boundary of the cube, not at all what you would expect based on two or three dimensions. There are other hitches, too.
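You can see this for yourself with a minimal numpy sketch (the point counts and dimensions here are arbitrary): the median distance from a random point to the cube's boundary collapses as dimensionality grows.

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 3, 256, 1024):
    # Sample points uniformly in the unit cube [0, 1]^d.
    x = rng.random((10_000, d))
    # Distance of each point to its nearest face of the cube.
    dist_to_boundary = np.minimum(x, 1 - x).min(axis=1)
    print(f"d={d:5d}  median distance to boundary: {np.median(dist_to_boundary):.4f}")
```

In 2D the median is around 0.15; at 1024 dimensions it is a few ten-thousandths. Essentially every point hugs the boundary.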
When using vector search, you're making two leaps: that your document can really be summarized in a vector (keeping its important distinguishing semantic features), and that it's possible to retrieve similar documents using high-dimensional vectors. Both steps are lossy and come with all sorts of caveats.
What would be great is if these new vector DB companies showed examples where the top-ten "search engine results page" based on semantic embedding / vector search is clearly better than a conventionally tuned, keyword-based system like Elasticsearch. I haven't seen that, but I have seen tons of numbers about precision/recall/etc. What's the point of taking 10ns to retrieve good vector results if it's unclear whether they're solving a real problem?
These vectors are lower-dimensional than traditional vectors, though, aren't they? Vector embeddings have hundreds to low thousands of dimensions (roughly 128-1024), whereas TF-IDF has the same dimensionality as your vocabulary. It's also not just about being flat-out better, but about increasing the recall of queries: you're grabbing content that doesn't contain the keywords directly but is still relevant. You are also free to mix the two approaches together in one result set, which gives you the best of both.
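One common way to mix the two result sets is reciprocal rank fusion (RRF). A minimal sketch, with the function name and example doc ids made up; k=60 is the constant from the original RRF paper:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several best-first ranked lists into one ranking."""
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc9", "doc3"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# doc1 and doc3 rise to the top because both retrievers agree on them
```

The appeal is that it only needs ranks, not scores, so you never have to make BM25 scores and cosine similarities comparable.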
The problems with dimensionality certainly show up even at 256 dimensions. PCA-ing down to a few hundred dimensions still leaves you in curse-of-dimensionality territory, and then you have to deal with PCA lossiness too!
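You can put a number on that lossiness with scikit-learn; a quick sketch, where the random 1024-dim matrix is just a stand-in for real embeddings:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for a real embedding matrix: 10k vectors, 1024 dims.
X = np.random.default_rng(0).normal(size=(10_000, 1024))

pca = PCA(n_components=256).fit(X)
retained = pca.explained_variance_ratio_.sum()
print(f"variance retained at 256 dims: {retained:.1%}")
```

On isotropic random data this retains about 25%; real embeddings are more structured, so PCA keeps more, but whatever variance it drops is gone for good.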
You can try the three methods, keyword, kNN, and hybrid, on a collection in the AI domain here: https://search.zeta-alpha.com/. YMMV: sometimes kNN has more interesting results, sometimes keyword is closer to what you wanted. Hybrid is sometimes a good middle ground, but it fails when one of the retrievers has very bad results (because it will interleave them).
If you only have to sacrifice a tiny bit of accuracy, it's not really a problem, is it? You can quantify these things experimentally to see whether the trade-off is acceptable for your use case.
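The standard experiment is recall@k: compute exact kNN on a sample of queries as ground truth and check how much of it the approximate index recovers. A minimal sketch; `ann_topk` is a placeholder where results from your ANN library (Annoy, HNSW, ...) would go:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(100_000, 256)).astype(np.float32)
queries = rng.normal(size=(100, 256)).astype(np.float32)
k = 10

# Brute-force ground truth: exact top-k neighbors by dot product.
truth = np.argsort(-(queries @ corpus.T), axis=1)[:, :k]

# ann_topk would come from your ANN index; here we reuse the truth
# just to show the metric itself.
ann_topk = truth

recall = np.mean([len(set(a) & set(t)) / k for a, t in zip(ann_topk, truth)])
print(f"recall@{k}: {recall:.3f}")
```

If recall@10 comes out at 0.98 for your data and the index is 100x faster than brute force, the "sacrifice" is easy to justify.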
Given that vector and hybrid search are used at scale in many situations, I think that to make the kind of claims you're making, it would help to quantify exactly when you expect it to be too slow, or to describe the experiences you have had struggling to make it fast enough.
We (at Weaviate) support separate BM25, vector, and combined hybrid search, so I can talk about the performance aspects of all of these a lot. The tl;dr is: it's complicated. What you say may be true in some cases, but not in others. One does not always scale better than the other.
For example, ANN indexing with an index like HNSW scales pretty much logarithmically. A bit oversimplified, but roughly we see search times double with every 10x growth of the search space. In addition, calculations like Euclidean distance and dot product (and therefore cosine similarity) can be parallelized very well at the hardware level, e.g. with AVX2/AVX-512/NEON and similar SIMD techniques.
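To get a feel for how well these distance kernels vectorize, here's a numpy sketch (numpy hands the matrix product to a SIMD- and thread-enabled BLAS; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(100_000, 256)).astype(np.float32)
query = rng.normal(size=256).astype(np.float32)

# Dot products against 100k vectors in one matrix-vector product,
# executed with SIMD instructions under the hood.
dots = corpus @ query

# Cosine similarity is just the dot product on pre-normalized vectors.
corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
cosine = corpus_n @ (query / np.linalg.norm(query))

top10 = np.argsort(-cosine)[:10]
```

The same structure is what makes the per-node distance computations inside an HNSW traversal cheap.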
With BM25, this isn't quite as simple. We have algorithms such as WAND and Block-Max WAND, which help eliminate _scoring_ documents that cannot reach a top-k spot. However, the distribution of terms plays a big role here. As a rule of thumb, the rarer a word is, the fewer documents require scoring. Let's say you have 1B documents and a 2-term query, but each term matches only 100k documents. If you AND-combine those terms, you will have at most 100k matches; if you OR-combine them, at most 200k. The fact that there were 1B documents indexed played no role. But now think of two terms that each match 500M objects. Even with the aforementioned algorithms – which rely on the relative impact of each term – there is a risk we would now have to score every single document in the database. This is, again, over-simplified, but my point is the following (toy sketch below):
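A toy inverted-index sketch of why the candidate set size, not the corpus size, drives the cost. The index and doc-id ranges are made up and scaled down (1,000 docs standing in for 1B):

```python
# Toy inverted index: term -> set of matching doc ids.
index = {
    "rare_a":   set(range(0, 100)),     # matches 10% of docs
    "rare_b":   set(range(50, 150)),
    "common_a": set(range(0, 500)),     # matches 50% of docs
    "common_b": set(range(250, 750)),
}

def candidates(terms, mode="OR"):
    sets = [index[t] for t in terms]
    return set.intersection(*sets) if mode == "AND" else set.union(*sets)

# Only matched docs ever need BM25 scoring, so rare terms stay cheap:
print(len(candidates(["rare_a", "rare_b"], "AND")))    # 50
print(len(candidates(["rare_a", "rare_b"], "OR")))     # 150
# ...while two common terms can force scoring most of the corpus:
print(len(candidates(["common_a", "common_b"], "OR"))) # 750
```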
ANN latency is fairly predictable; BM25 latency depends a lot on the input query. Our monitoring of production cases shows that when running a hybrid search, sometimes the vector query is the bottleneck and sometimes the BM25 query is.
> Hybrid search can be used to first filter a smaller number of candidates and then rerank them using vector search
My post is already getting quite long, but I want to quickly comment on this two-step approach. For Weaviate, that's not the case: BM25 and vector searches happen independently, and then the scores are aggregated to produce a single result set.
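A minimal sketch of what such score-level fusion looks like, assuming min-max normalization and an alpha weight; this illustrates the general idea, not Weaviate's exact fusion formula:

```python
def hybrid_fuse(bm25_scores, vector_scores, alpha=0.5):
    """Fuse two independent {doc_id: score} dicts into one ranking."""
    def normalize(scores):
        # Min-max normalize so scores from both retrievers are comparable.
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}

    b, v = normalize(bm25_scores), normalize(vector_scores)
    docs = set(b) | set(v)
    # alpha weights the vector signal against the keyword signal.
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

print(hybrid_fuse({"d1": 9.2, "d2": 4.1, "d3": 1.0},
                  {"d2": 0.83, "d3": 0.80, "d1": 0.10}))
# ['d2', 'd1', 'd3'] - d2 wins because both retrievers rate it well
```

Because both searches run independently, neither one caps the other's candidate set, unlike the two-step filter-then-rerank approach.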
Yes, you could also use BM25 as a filter first and then re-rank the results with embeddings; however, you would lose the BM25 scores. If you want to use keywords just as filters, you can do that much, much cheaper than through BM25 scoring. In Weaviate, matching keywords to create a filter set only requires AND-ing or OR-ing a few roaring bitmaps, which is orders of magnitude more efficient than BM25 scoring.
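For a feel of what those bitmap operations look like, here's a sketch using the pyroaring library; the posting lists are made up, and Weaviate's internals obviously aren't Python, this is just the same idea:

```python
from pyroaring import BitMap  # pip install pyroaring

# Posting lists as compressed bitmaps of doc ids (made-up data).
term_a = BitMap(range(0, 1_000_000, 3))  # docs containing term A
term_b = BitMap(range(0, 1_000_000, 5))  # docs containing term B

# Filtering is pure set algebra: no per-document scoring at all.
both   = term_a & term_b   # AND filter
either = term_a | term_b   # OR filter
print(len(both), len(either))
```

The intersection and union run over compressed blocks of ids, which is why building a filter set this way is so much cheaper than walking posting lists and computing a BM25 score per document.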