
For use in retrieval/RAG, an emerging paradigm is to not parse the PDF at all.

By using a multi-modal foundation model, you convert visual representations ("screenshots") of the PDF directly into searchable vector representations.

Paper: ColPali: Efficient Document Retrieval with Vision Language Models - https://arxiv.org/abs/2407.01449

Vespa.ai blog post https://blog.vespa.ai/retrieval-with-vision-language-models-... (my day job)
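
Roughly, the pipeline looks like this. A minimal sketch, assuming the pdf2image and colpali-engine packages and the vidore/colpali-v1.2 checkpoint (those specifics are mine; the paper and blog post describe the approach, not this exact code):

    # Embed PDF page screenshots with a vision language model and score
    # them against a text query -- no text extraction anywhere.
    import torch
    from pdf2image import convert_from_path          # requires poppler
    from colpali_engine.models import ColPali, ColPaliProcessor

    model_name = "vidore/colpali-v1.2"                # assumed checkpoint
    model = ColPali.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto"
    ).eval()
    processor = ColPaliProcessor.from_pretrained(model_name)

    # 1. Render each PDF page to an image.
    pages = convert_from_path("report.pdf", dpi=150)  # placeholder file

    # 2. Embed the page images (multi-vector: one embedding per image patch).
    with torch.no_grad():
        page_embeddings = model(**processor.process_images(pages).to(model.device))

    # 3. Embed the query and score it against every page.
    with torch.no_grad():
        query_embeddings = model(
            **processor.process_queries(["quarterly revenue by region"]).to(model.device)
        )

    scores = processor.score_multi_vector(query_embeddings, page_embeddings)
    best_page = scores.argmax(dim=1)                  # highest-scoring page per query

Each page becomes a bag of patch-level vectors rather than a single embedding, so the scoring step is ColBERT-style late interaction (MaxSim) instead of one cosine similarity.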




I do something similar in my file-renamer app (sort.photos if you want to check it out):

1. Render the first two pages of the PDF into a JPEG, offline, in the Mac app.

2. Upload the JPEG to ChatGPT Vision and ask what would be a good file name for the document.

It works surprisingly well.
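
In Python terms the flow is roughly this. It's a rough sketch of the same two steps, not the app's actual implementation (which is native Mac code); it assumes pdf2image and the OpenAI Python SDK with a vision-capable model such as gpt-4o:

    import base64
    import io
    from openai import OpenAI
    from pdf2image import convert_from_path

    client = OpenAI()

    # 1. Render the first pages of the PDF to a JPEG, locally.
    pages = convert_from_path("scan.pdf", dpi=120, first_page=1, last_page=2)
    buf = io.BytesIO()
    pages[0].save(buf, format="JPEG")   # just the first page here for brevity
    image_b64 = base64.b64encode(buf.getvalue()).decode()

    # 2. Ask the vision model for a file name suggestion.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Suggest a short, descriptive file name for this document."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)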


I'm sure this will change over time, but I have yet to see an LMM that performs (on average) as well as decent text extraction pipelines.

Text embeddings over extracted text also have much better recall in my tests.


No multi-modal model is really ready for that yet. The accuracy of dedicated tools for extracting tables and text is far superior.


You have detractors, but this is the future.


Is anyone actually having success with this approach? If so, how and with what models (and prompts)?


Claude.ai handles tables very well, at least in my tests. It could easily convert a table from a financial document into a markdown table, among other things.
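
For what it's worth, the same thing works over the API rather than the claude.ai UI. A rough sketch using the Anthropic Python SDK; the model name and file path are placeholder assumptions:

    import base64
    import anthropic

    client = anthropic.Anthropic()

    # Screenshot of a table from a financial document (placeholder path).
    with open("balance_sheet.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    message = client.messages.create(
        model="claude-3-5-sonnet-latest",   # assumed model alias
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
                {"type": "text",
                 "text": "Convert the table in this image to a Markdown table."},
            ],
        }],
    )
    print(message.content[0].text)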



