Wow, this post really took off! If anyone wants to read some of my blog posts on building PDF search engines (and the pain, torment, and anguish that it causes), read:
- https://medium.com/jina-ai/building-an-ai-powered-pdf-search...
- https://medium.com/jina-ai/search-pdfs-with-ai-and-python-pa...
- https://medium.com/jina-ai/search-pdfs-with-ai-and-python-pa...
Great stuff. I went down the rabbit hole of building something similar for synthesizing flash cards + Q/A pairs from textbook PDFs about a year ago, and I would also emphasize that PDF search is a janky nightmare to get within the ballpark of usability :')
I feel your pain, my brother(?) [0] in suffering. That's why I started simple in the notebook. Even trying to go a little more complex just leads to exponential rabbit holes and footguns.
[0] based on typical HN demographics, no assumptions here