Shameless plug from someone not related to the project. Try https://vespa.ai , fully open-source, very mature hybrid search with dense and approximate vector search. A breeze to deploy and maintain compared to ES and Solr. If I could name a single secret ingredient for my startup, it would be Vespa.
Vespa looks pretty compelling; indexing looks like a dream.
I'd recommend basically anything else over a customized ES / Solr cluster. Some of the least fun clusters to manage. Great for simple use-cases / anything you see in a tutorial. The moment you walk off the beaten path with them, best of luck.
Solr has its quirks for sure, but I've seen multi-terabyte indices running with great relevance and performance. I would call it a mechanic's search engine: very powerful, but you need to get your hands dirty.
Super promising, thanks. Will definitely watch for production readiness. This direction is a big deal for us: we find most data is cold (logs, docs, ...), not hot, and this finally starts looking more relevant than just running ES or rolling our own.
Dense vector search in Solr is a welcome addition, but getting started requires a lot of pieces that aren’t included.
So I made this a couple of months ago to make it super easy to get started with this tech. If you have a sitemap, you can start the docker compose setup and index your website with a single command.
Question: I have ~10,000 128 element query vectors, and want to find the nearest neighbor (cosine similarity) for each of them in a dataset of ~1,000,000 target vectors. I can do this using brute force search on a GPU in a few minutes, which is fast but still a serious bottleneck for me. Is this an appropriate size of dataset and task for acceleration with some sort of vector database or algorithm more intelligent than brute force search?
Use an approximate method like faiss and then do cosine similarity on the results of that.
Short answer is most of these databases use some type of precomputation to make approximate nearest neighbor search faster. HNSW[0], FAISS[1], SCANN[2], etc. are all methods of doing approximate nearest neighbors, but they use different techniques to speed up that approximation. For your use case it will likely result in a speedup.
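For a concrete sense of what that looks like, here's a minimal faiss sketch: an HNSW index over inner product on L2-normalized vectors, so scores are cosine similarities. The M and efSearch values are just illustrative, not tuned recommendations.

    import numpy as np
    import faiss

    d = 128
    targets = np.random.rand(1_000_000, d).astype('float32')  # your 1M target vectors
    queries = np.random.rand(10_000, d).astype('float32')     # your 10k query vectors

    # normalize in place so inner product == cosine similarity
    faiss.normalize_L2(targets)
    faiss.normalize_L2(queries)

    index = faiss.IndexHNSWFlat(d, 32, faiss.METRIC_INNER_PRODUCT)
    index.add(targets)
    index.hnsw.efSearch = 64                # higher = better recall, slower queries

    scores, ids = index.search(queries, 1)  # approximate top-1 neighbor per query

Build time is the main cost; once the graph is built, the 10k queries should run in seconds rather than minutes.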
There's no one answer to this, but I'd say that anything past 10k vectors would benefit greatly from a vector database. A vector DB will abstract away the building of a vector index along with other core database features such as caching, failover, replication, horizontal scaling, etc. Milvus (https://milvus.io) is open-source and always my go-to choice for this (disclaimer: I'm a part of the Milvus community). An added bonus of Milvus is that it supports GPU-accelerated indexing and search in addition to batched queries and inserts.
All of this assumes you're okay with a bit of imprecision - vector search with modern indexes is inherently probabilistic, e.g. your recall may not be 100%, but it will be close. Using a flat indexing strategy is still an option, but you lose a lot of the speedup that comes with a vector database.
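For reference, a rough pymilvus 2.x-style sketch of that flow. The collection name, field names, and index parameters below are illustrative assumptions, not a canonical setup; check the Milvus docs for current APIs.

    from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

    connections.connect(host="localhost", port="19530")

    fields = [
        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128),
    ]
    targets = Collection("targets", CollectionSchema(fields))

    targets.insert([vectors])  # vectors: list of 128-d float lists, batched in practice
    targets.create_index("embedding", {
        "index_type": "HNSW",
        "metric_type": "IP",   # inner product on normalized vectors == cosine
        "params": {"M": 16, "efConstruction": 200},
    })
    targets.load()

    hits = targets.search(query_vectors, "embedding",
                          {"metric_type": "IP", "params": {"ef": 64}}, limit=1)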
I hesitate to mention this, because you probably know it and are doing it this way. But some other poster mentioned "and then do cosine similarity". In this case, you're going to want to preprocess and normalize each row of both matrices to have unit norm. Then cosine similarity is simply a matrix multiply between the two matrices (one transposed), and a pass over the results to find the top-k per query using a max queue type data structure.
That wouldn't really help. Let me explain in a bit more detail. The results depend on the query matrix, which will be different for each set of queries. We have a query matrix Q of dimension 10,000x128, and another vector matrix A that is 1,000,000x128. We preprocess both Q and A so each row has unit norm:
Q[i,:] /= norm(Q[i,:])
A[k,:] /= norm(A[k,:])
So now with that preprocessing the cosine similarity of a given row i of Q and k of A is:
cossim(i,k) = dot(Q[i,:], A[k,:])
If you multiply Q x A.T, i.e. (10,000 x 128) x (128 x 1,000,000), you get a result matrix (10,000 x 1,000,000) with all the cosine similarity values for each combination of query and vector.
If you make a pass across each row with a bounded priority queue of size n, you can find the top-n cosine similarity values for each query in O(1,000,000 x log n) time.
Now you could store the resulting matrix, but Q is going to change for each call, and we really only care about the top-n values for each query, so storing it wouldn't really accomplish anything.
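A small numpy sketch of the above, using argpartition per row instead of a priority queue, and chunking the queries so the full 10,000 x 1,000,000 matrix is never materialized at once (chunk size and top_n are arbitrary here):

    import numpy as np

    Q = np.random.rand(10_000, 128).astype(np.float32)     # query matrix
    A = np.random.rand(1_000_000, 128).astype(np.float32)  # target matrix

    # normalize rows to unit norm so dot product == cosine similarity
    Q /= np.linalg.norm(Q, axis=1, keepdims=True)
    A /= np.linalg.norm(A, axis=1, keepdims=True)

    top_n = 10
    top_ids = []
    for start in range(0, Q.shape[0], 1000):
        sims = Q[start:start + 1000] @ A.T                       # (chunk, 1M) cosine similarities
        idx = np.argpartition(-sims, top_n, axis=1)[:, :top_n]   # top-n per row, unsorted
        top_ids.append(idx)
    top_ids = np.vstack(top_ids)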
I asked GPT-3 about it, using an array of vectors for fragments of this page, weighted by relevance to the query (using np.dot(v1, v2)). That's used to build the prompt for submission to the OpenAI APIs. I'm interested in storing these vectors in a very fast DB for memories.
pastel-mature-herring~> Could this matrix be compressed to binary form for storage in a binary index?
angelic-quokka|> It is possible to compress the matrix to binary form for storage in a binary index, but this would likely decrease the accuracy of the cosine similarity values.
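Roughly what that prompt-building step looks like, in case it helps anyone. embed() here is a hypothetical placeholder for whatever embedding call you use (e.g. the OpenAI embeddings endpoint):

    import numpy as np

    def build_prompt(question, fragments, embed):
        # embed() is whatever embedding call you use -- hypothetical placeholder
        fragment_vecs = np.array([embed(f) for f in fragments])
        query_vec = np.array(embed(question))

        # np.dot(v1, v2) as the relevance weight, as described above
        scores = fragment_vecs @ query_vec
        best = [fragments[i] for i in np.argsort(-scores)[:5]]

        return "\n".join(best) + "\n\nQ: " + question + "\nA:"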
Agreed with fzliu, you can also use https://weaviate.io (disclaimer, I'm affiliated with Weaviate). You might also like this article which describes why one might want to use a vector search engine: https://db-engines.com/en/blog_post/87
You can fit 1M vectors in the free tier of www.pinecone.io if you want to experiment. I'm not sure how fast having that many query vectors would be. (I'm a happy Pinecone customer, but only use a single query vector.)
I just tried this using my brute force embedding search library that runs on CPUs and it does it in 171s (16-core AMD). How often do you need to do this? This may be shorter than the initial indexing time for most approximate libraries if you aren't doing ad hoc queries.
It was to be expected after the recent ES releases. However, dedicated vector search engines offer better performance and more advanced features. Qdrant https://github.com/qdrant/qdrant is written in Rust. Fast, stable, and super easy to deploy. (Disclaimer: affiliated with the project.)
I don't think that there's much preventing Elasticsearch or Solr from being as fast. Vector search is expensive, so combining it with other filtering and querying makes a lot of sense. It's just another tool for search engineers to use. Using vector search exclusively would probably come at the cost of search quality and performance.
Exactly, filtering is vital. That is why Qdrant offers extended filtering that happens at the same stage as retrieval, rather than as pre- or post-filtering, which requires two stages and hurts both performance and quality.
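A rough qdrant-client sketch of filtered vector search, where the filter condition is applied during retrieval rather than afterwards. The collection name and payload field are made up for illustration:

    from qdrant_client import QdrantClient
    from qdrant_client.models import Filter, FieldCondition, MatchValue

    client = QdrantClient(host="localhost", port=6333)

    hits = client.search(
        collection_name="docs",
        query_vector=[0.1] * 128,  # your query embedding
        query_filter=Filter(must=[
            FieldCondition(key="lang", match=MatchValue(value="en")),
        ]),
        limit=10,
    )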
It is not trivial to implement vector search functionality inside a complex framework that is not initially designed for vectors. Besides that, vector search is not only about semantic search.
A much-awaited enhancement. Saves the trouble of having to deploy a separate vector DB like Milvus.
I don't like the query syntax though. Maybe a more developer-friendly indexing+query flow is possible: vectorize fields and queries transparently using a lib like DL4J running in the same JVM. That could further simplify both app development and deployment.