2big2fail_47 on Sept 11, 2023 | on: Ask HN: Are there any AI-enhanced solutions for co...

When the PDF is very clearly structured, it works just fine. But if the layout has multiple columns and complex formatting, the output gets very imprecise. If the material is scanned, it won't work at all.

svennek on Sept 12, 2023

That is really hard, as there are no such things as columns in PDFs, only text starting at different (x,y) positions.

Hence most (if not all) programs export the text in the order it appears in the file.

And if it is scanned, there is no text at all (but you could OCR it).
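
To illustrate the point about (x, y) positions, here is a minimal sketch of how a tool might guess columns after the fact. It is not how any particular PDF program works. The fragment tuples, the `reading_order` function, and the `column_gap` threshold are all hypothetical, and y is assumed to be measured downward from the top of the page (native PDF coordinates usually grow upward from the bottom-left corner).

```python
from typing import List, Tuple

# Hypothetical fragment: (x, y, text), with y measured downward from the page top.
Fragment = Tuple[float, float, str]


def reading_order(fragments: List[Fragment], column_gap: float = 50.0) -> List[str]:
    """Guess columns from x positions, then read each column top to bottom."""
    columns: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: f[0]):
        # Same column if x is within column_gap of that column's leftmost fragment.
        if columns and frag[0] - columns[-1][0][0] <= column_gap:
            columns[-1].append(frag)
        else:
            columns.append([frag])

    ordered: List[str] = []
    for col in columns:
        # Within a column, sort by y first, then x for fragments on the same line.
        for _, _, text in sorted(col, key=lambda f: (f[1], f[0])):
            ordered.append(text)
    return ordered


# File order interleaves the two columns, as a naive export would.
fragments = [
    (72.0, 100.0, "Left column, line 1"),
    (320.0, 100.0, "Right column, line 1"),
    (72.0, 115.0, "Left column, line 2"),
    (320.0, 115.0, "Right column, line 2"),
]

for line in reading_order(fragments):
    print(line)
```

A fixed gap like this breaks down quickly on real layouts (indented paragraphs, tables, headings that span columns), which is part of why multi-column extraction is so imprecise.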
