Another partner and I came up with a similar solution. It hinged on detecting the typeface and then using a bitmapped (or otherwise rendered) font package to OCR the text letter by letter.
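To make the idea concrete, here is a rough sketch of the letter-by-letter matching step in Python. It is not our actual code, and everything in it is an assumption for illustration: the tiny 3x5 `GLYPHS` bitmaps stand in for glyphs rendered from the detected font, `render` stands in for the page raster, and the font is treated as monospaced so the line can be cut into fixed-width cells. A real page would need proper segmentation and scaling first.

```python
# Hypothetical 3x5 bitmaps standing in for glyphs rendered from the detected font.
GLYPHS = {
    "A": (".#.", "#.#", "###", "#.#", "#.#"),
    "C": (".##", "#..", "#..", "#..", ".##"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}
GLYPH_W = 3  # glyph width in pixels (assumed monospaced)
GLYPH_H = 5  # glyph height in pixels


def render(text, gap=1):
    """Draw text with the templates; a stand-in for the rasterized PDF line."""
    spacer = "." * gap
    return [spacer.join(GLYPHS[ch][y] for ch in text) for y in range(GLYPH_H)]


def score(cell, template):
    """Count the pixels on which a cell and a template agree."""
    return sum(
        c == t
        for cell_row, template_row in zip(cell, template)
        for c, t in zip(cell_row, template_row)
    )


def recognize(rows, gap=1):
    """Cut the line into fixed-width cells and pick the best template for each."""
    step = GLYPH_W + gap
    width = len(rows[0]) if rows else 0
    letters = []
    for x in range(0, width, step):
        cell = [row[x:x + GLYPH_W] for row in rows]
        best = max(GLYPHS, key=lambda ch: score(cell, GLYPHS[ch]))
        letters.append(best)
    return "".join(letters)


if __name__ == "__main__":
    print(recognize(render("CAT")))
```

Because the page was rendered from the same font, each cell should match its template almost pixel for pixel, which is why this feels like it ought to be far more reliable than general-purpose OCR on scanned pages.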
The PDF files that we are dealing with do not have embedded text and are not searchable, but are "digital-native," to use the term that you suggested.
Does this not exist? If not, why does it not exist?!