Could you turn this into a PostgreSQL extension? If this were integrated into an actual database that can be used in production, it might have a future. Otherwise no one will touch it, and it'd be yet another cute but useless experiment from academia.
edit: thank you for clarifying; it looks like this is not a new database engine but a cache/query layer.
Thanks for the helpful suggestion! EVA uses a SQL database system, accessed through SQLAlchemy, to manage structured data. It runs on PostgreSQL out of the box; you only need to provide the database connection URL in the EVA configuration file.
Thanks for your candid comment. We take it very seriously. EVA is already being used in production by some collaborators and we would love to support more early adopters :) Please let me know if I can DM you to get more feedback.
I skimmed the documentation and it wasn't clear; it looked like the database was designed from scratch. If this is a caching layer or syntactic sugar over a mix of database and inference queries, it's interesting and feels a lot less risky.
We designed EVA from scratch for managing unstructured data (e.g., video, audio, images). EVA leverages relational database systems to manage structured data and widely used libraries to manage feature embeddings (the FAISS library [1]). We aim to build on decades of experience with relational database systems and reduce risk in production deployments.
Do you support weighted similarity search? That is, when I have several embeddings and need to put a weight factor in front of each cosine similarity when performing a query?
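The weighted search described in this question scores each candidate as the sum, over its embeddings, of a weight times the cosine similarity to the matching query embedding. The minimal sketch below computes that score by brute force in plain Python. It is not an EVA feature; the helper names (`cosine_similarity`, `weighted_similarity`, `rank`) and the idea of named embeddings such as `image` and `text` are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_similarity(
    query: Dict[str, Vector],
    candidate: Dict[str, Vector],
    weights: Dict[str, float],
) -> float:
    """Sum of weight * cosine similarity over each named embedding."""
    return sum(
        weight * cosine_similarity(query[name], candidate[name])
        for name, weight in weights.items()
    )


def rank(
    query: Dict[str, Vector],
    candidates: Dict[str, Dict[str, Vector]],
    weights: Dict[str, float],
    k: int,
) -> List[str]:
    """Return the ids of the k candidates with the highest weighted score."""
    scored = sorted(
        candidates.items(),
        key=lambda item: weighted_similarity(query, item[1], weights),
        reverse=True,
    )
    return [candidate_id for candidate_id, _ in scored[:k]]
```

If every stored embedding is L2-normalized, the same score can also be served by a single inner-product index: store the concatenation of the normalized parts, and at query time concatenate each normalized query part multiplied by its weight. The inner product of the two concatenations then equals the weighted sum of cosine similarities, and the weights can change per query without rebuilding the index.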
Faiss seems like an excellent choice. How do you get the vectors into it from the database? Or are they stored separately? I’m currently using pgvector and it’s not GPU optimized. But the advantage is that it enjoys the same levels of data protection as the rest of the database.
Actually, are there any sample vector similarity search queries? I see the feature extractor, but I can't find any similarity search examples.
EVA does not currently support weighted similarity search. We are working on a notebook that illustrates similarity queries, but EVA already supports queries of this form:
-- Step 1: Extract objects in Reddit images using the YOLO object detector
CREATE TABLE reddit_dataset_object (name, data, bboxes)
AS SELECT name, data, bboxes FROM reddit_dataset
JOIN LATERAL UNNEST(YoloV5(data)) AS Obj(labels, bboxes, scores);
-- Step 2: Build index over features extracted using SIFT
CREATE INDEX reddit_sift_object_index
ON reddit_dataset_object (SiftFeatureExtractor(Crop(data, bboxes)))
USING HNSW;
-- Step 3: Retrieve the 10 objects most similar to the input image
SELECT name FROM reddit_dataset_object
ORDER BY Similarity(SiftFeatureExtractor(Open("input_img_path.jpg")),
                    SiftFeatureExtractor(Crop(data, bboxes)))
LIMIT 10;
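To show what each step of that query computes, here is a minimal plain-Python sketch of Steps 1 and 3. Several assumptions apply. The detector output format (parallel `labels`, `bboxes`, `scores` lists per image), the helpers `unnest_detections` and `top_k_nearest`, and the toy feature vectors are all hypothetical. Ranking uses Euclidean distance in ascending order, on the assumption that `Similarity` returns a distance so that a plain `ORDER BY ... LIMIT 10` puts the nearest matches first. The sketch scans every vector exactly; an HNSW index approximates the same result without a full scan.

```python
import heapq
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def unnest_detections(frames):
    """Step 1 analogue: emit one row per detected object, like LATERAL UNNEST.

    Each frame is (name, labels, bboxes, scores); the three lists are
    parallel, with one entry per detection (hypothetical detector format).
    """
    rows = []
    for name, labels, bboxes, scores in frames:
        for label, bbox, score in zip(labels, bboxes, scores):
            rows.append({"name": name, "label": label, "bbox": bbox, "score": score})
    return rows


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_nearest(query: Vector, features: Dict[int, Vector], k: int) -> List[int]:
    """Step 3 analogue: exact ORDER BY distance ASC LIMIT k over all row ids."""
    return heapq.nsmallest(
        k, features, key=lambda row_id: l2_distance(query, features[row_id])
    )
```

For example, two images with two and one detections produce three object rows, and a query vector close to row 1's features returns row 1 first.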
EVA directly persists the feature vectors in a FAISS index. It does not use a relational database system for this purpose. FAISS supports retrieving the original vector through ID (required for similarity search).
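The storage split described here, with structured rows in a relational database and feature vectors in a separate index keyed by the same row ID, can be sketched with the standard library. In this sketch `sqlite3` stands in for the relational system and the hypothetical `ToyVectorStore` class stands in for FAISS; its method names are illustrative and are not the FAISS API. A search returns row IDs, which are then looked up in the relational table, and a stored vector can be retrieved again by its ID.

```python
import math
import sqlite3
from typing import Dict, List, Sequence

Vector = Sequence[float]


class ToyVectorStore:
    """Hypothetical stand-in for a FAISS index: vectors live outside the
    relational database and are keyed by the same row id."""

    def __init__(self) -> None:
        self._vectors: Dict[int, List[float]] = {}

    def add(self, row_id: int, vector: Vector) -> None:
        self._vectors[row_id] = list(vector)

    def reconstruct(self, row_id: int) -> List[float]:
        """Return a copy of the stored vector for a row id (KeyError if absent)."""
        return list(self._vectors[row_id])

    def search(self, query: Vector, k: int) -> List[int]:
        """Return the ids of the k nearest vectors by Euclidean distance."""
        def distance(row_id: int) -> float:
            return math.dist(query, self._vectors[row_id])
        return sorted(self._vectors, key=distance)[:k]


def lookup_rows(conn: sqlite3.Connection, row_ids: List[int]) -> List[str]:
    """Map ids returned by the vector store back to image names, keeping rank order."""
    if not row_ids:
        return []
    placeholders = ",".join("?" for _ in row_ids)
    found = dict(
        conn.execute(
            f"SELECT id, name FROM images WHERE id IN ({placeholders})", row_ids
        ).fetchall()
    )
    return [found[row_id] for row_id in row_ids if row_id in found]
```

Keeping the vectors outside the database is what allows a GPU-capable library such as FAISS to be used; the trade-off, as the pgvector comment points out, is that the vectors no longer share the database's data-protection guarantees.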
We would love to jointly explore how to support such weighted similarity search queries. Please consider opening an issue with more details on your use case.