It doesn't make sense to use OCR for this. Libraries such as Aspose will do much b...

iplaw · on Oct 13, 2016

Apologies for not being clear in my OP. The PDF is a digital-native image produced from a text document, but without embedded or searchable text. Looking at the PDF at full resolution, there are no artifacts, blurry characters, or any of the alignment or uneven-scale issues that are troublesome when attempting to OCR a scan or photograph. It looks exactly like a Word document, but without selectable or editable text.

So, I do have to use OCR, right?

niutech · on Oct 17, 2016

Maybe not. The PDF probably has embedded text (which is why it doesn't blur when zooming in), but the text could have been either converted into vector curves or protected from copying (see the document properties). The easiest way is to change the PDF export settings in Word/Ghostscript/Distiller.
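
If you want to check which case you are in before deciding on OCR, here is a rough stdlib-only sketch. It is a heuristic, not a real PDF parser: the name `classify_pdf` and the three labels are made up for illustration, it assumes streams are either uncompressed or Flate-compressed, and it ignores encryption. It looks for text-showing operators (`BT ... Tj/TJ ... ET`) in the page streams, then for image XObjects.

```python
import re
import zlib

# Bytes between the "stream" and "endstream" keywords of each PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text object that actually shows a string (Tj or TJ operator).
TEXT_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b.*?\bET\b", re.DOTALL)
# An image XObject dictionary entry.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def inflate(raw: bytes) -> bytes:
    """Return Flate-decompressed bytes, or the raw bytes if not zlib data."""
    try:
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        return raw


def classify_pdf(data: bytes) -> str:
    """Guess whether a PDF holds real text, page images, or neither.

    Hypothetical labels:
      "text"              -> text operators found; OCR is probably unnecessary
      "image"             -> no text operators, but image XObjects; OCR likely
      "vector-or-unknown" -> neither; text may have been converted to curves
    """
    bodies = [inflate(m.group(1)) for m in STREAM_RE.finditer(data)]
    if any(TEXT_RE.search(body) for body in bodies):
        return "text"
    if IMAGE_RE.search(data) or any(IMAGE_RE.search(body) for body in bodies):
        return "image"
    return "vector-or-unknown"
```

Call it as `classify_pdf(open("file.pdf", "rb").read())`, where `file.pdf` is a placeholder path. A "text" result on a file you cannot select text in points to copy protection or a missing Unicode mapping rather than a need for OCR; "vector-or-unknown" fits the converted-to-curves case, where OCR on a rendered page is likely still the practical route.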