I might be missing something, but how is this different from Amazon OpenSearch with UltraWarm storage? I think Amazon launched that about four years ago, right?
For one thing, it claims to support Azure Blob Storage (and GCS), so this project could be useful to audiences who aren't on AWS, and it avoids whatever markup AWS is charging for UltraWarm storage.
Marqo lets you use state-of-the-art e5 embeddings (which perform significantly better at retrieval than the OpenAI embeddings), and will handle the embedding generation and retrieval on Lucene indexes: https://www.marqo.ai/
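A minimal sketch with Marqo's Python client, assuming a local instance on the default port; "hf/e5-large" is the model name from Marqo's registry as I understand it, so check their docs for current names:

```python
import marqo

# Assumes a Marqo instance running locally on the default port
mq = marqo.Client(url="http://localhost:8882")

# Create an index backed by an open-source e5 model; embeddings for
# documents and queries are generated server-side by this model
mq.create_index("articles", model="hf/e5-large")
```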
Marqo has excellent indexing throughput because vector generation and vector retrieval are both contained within a Marqo cluster. You can run it on multi-GPU, CPU, etc., and it's also horizontally scalable.
Just to quickly add to ukuina's comment: Marqo does embedding generation and vector search end to end, so you can put in documents and the embeddings are generated automatically.
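For illustration, the end-to-end flow might look like this; the index name and document fields are made up, and `tensor_fields` (which picks the fields to embed) is a parameter recent Marqo versions require, so adapt to the version you're on:

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Plain JSON in; Marqo generates the embeddings itself at index time
# (assumes an index like "articles" already exists, as in the snippet above)
mq.index("articles").add_documents(
    [
        {"title": "Warm storage", "body": "Older indexes can live on object storage..."},
        {"title": "Dense retrieval", "body": "Embeddings map text to vectors..."},
    ],
    tensor_fields=["body"],  # fields to embed (required in recent versions)
)

# The query is embedded with the same model at search time
results = mq.index("articles").search("how does warm storage cut costs?")
```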
Marqo provides automatic, configurable chunking (for example with overlap) and lets you bring your own model or choose from a wide range of open-source models. I think e5-large would be a good one to try. https://github.com/marqo-ai/marqo
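A rough sketch of what the chunking configuration looks like; treat the field names as illustrative, since the settings schema has changed across Marqo versions, and check the README for the current form:

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Illustrative settings: chunk by sentence, 2 sentences per chunk,
# 1 sentence of overlap between neighbouring chunks
settings = {
    "index_defaults": {
        "model": "hf/e5-large",  # or bring/register your own model
        "text_preprocessing": {
            "split_length": 2,
            "split_overlap": 1,
            "split_method": "sentence",
        },
    }
}
mq.create_index("chunked-docs", settings_dict=settings)
```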
Using Qdrant doesn't require Docker, unlike Marqo (per the README). Any trade-offs between the two? Doc chunking is independent functionality, and there are already libraries that help chunk with overlap etc., and it's also not hard to roll your own.
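To the "roll your own" point, a minimal character-window chunker with overlap is only a few lines (the sizes here are arbitrary):

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# e.g. chunk_text("a" * 1000) -> 3 chunks of <=512 chars, each sharing
# 64 chars with its neighbour
```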
Someone from Marqo here - if you're looking for an end-to-end vector search DB that handles both the vector transformation and the search, you should check out Marqo. https://github.com/marqo-ai/marqo