I just tried this on all the papers I downloaded over the past couple months - c...

alexcg1 · on July 25, 2022

In terms of matching embeddings and performing similarity search on text/images - folks are already using the framework (Jina) for that and getting decent results.

In terms of processing the PDFs and extracting that data. idk. That depends on a lot of factors - e.g. do you need to OCR the PDFs or can just extract text directly? Either way, should be possible to write a module and then easily scale it up (Jina supports shards/replicas). Anyway, lemme know. I'm in talks with folks about this kind of shitshow...uh...use case now.

Jina supports multiple vector database backends, like Weaviate, Qdrant and others. For others (like Milvus), suggest you ask on the Slack [0] - responses tend to be fast.

[0] https://slack.jina.ai

redskyluan · on July 26, 2022

We should probably try to implement a PDF search demo on top of Milvus.. LOL