
Another partner and I came up with a similar solution. It hinged on detecting the typeface and using a bitmapped (or otherwise rendered) font package to OCR letter by letter.
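To make the letter-by-letter idea concrete, here is a rough sketch of the matching step only. It assumes the typeface has already been identified, the page is binarized, and the glyphs sit in fixed-width cells; the 3x5 bitmaps and the names `GLYPHS`, `split_cells`, `ocr_line`, and `render` are made up for illustration and are not from any real font package. A proportional font would need a real segmentation step instead of fixed cells.

```python
# Hypothetical 3x5 glyph bitmaps standing in for a rendered font package.
GLYPHS = {
    "A": ("010", "101", "111", "101", "101"),
    "C": ("011", "100", "100", "100", "011"),
    "T": ("111", "010", "010", "010", "010"),
}
GLYPH_W = 3  # assumed fixed glyph width in pixels
GLYPH_H = 5  # assumed fixed glyph height in pixels


def hamming(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def split_cells(rows, width=GLYPH_W, gap=1):
    """Cut a line image into fixed-width cells separated by `gap` blank columns."""
    step = width + gap
    count = (len(rows[0]) + gap) // step
    return [tuple(r[i * step:i * step + width] for r in rows) for i in range(count)]


def ocr_line(rows, glyphs=GLYPHS):
    """Match each cell against every known glyph and keep the closest one."""
    return "".join(
        min(glyphs, key=lambda ch: hamming(cell, glyphs[ch]))
        for cell in split_cells(rows)
    )


def render(text, glyphs=GLYPHS):
    """Render text into a line image using the same bitmaps (for testing)."""
    return tuple("0".join(glyphs[ch][r] for ch in text) for r in range(GLYPH_H))


line = render("CAT")
print(ocr_line(line))
```

Picking the nearest glyph by pixel distance, rather than requiring an exact match, is what lets this tolerate a stray pixel or two from rasterization.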

The PDF files that we are dealing with do not have embedded text and are not searchable, but are "digital-native," to use the term that you suggested.

Does this not exist? If not, why does it not exist?!



