I really want to see OCR become easier to use, but I don't know why it's such a hard problem in the first place.
I believe it uses Tesseract, Ghostscript, and some other libraries.
Speaking of Ghostscript, one way to deal with problematic PDFs is to print them to a file and work with the result instead.
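For what it's worth, here is a rough sketch of that "print to file" trick done from a script instead of a print dialog. It assumes Ghostscript is installed and reachable as `gs` (on Windows the binary usually has a different name), and the function names `gs_rewrite_command` and `rewrite_pdf` are just ones I made up for illustration. It runs the file through Ghostscript's `pdfwrite` device, which writes out a fresh PDF; I can't promise that fixes any particular broken file.

```python
import shutil
import subprocess


def gs_rewrite_command(src, dst, gs_binary="gs"):
    """Build the Ghostscript command line that re-writes src as a new PDF at dst.

    gs_binary is an assumption: "gs" is the usual name on Linux/macOS.
    """
    return [
        gs_binary,
        "-dBATCH",             # exit when the input has been processed
        "-dNOPAUSE",           # do not prompt between pages
        "-dSAFER",             # restrict file access from within the document
        "-sDEVICE=pdfwrite",   # output device that produces a PDF
        f"-sOutputFile={dst}",
        src,
    ]


def rewrite_pdf(src, dst, gs_binary="gs"):
    """Run Ghostscript on src and return the path of the rewritten PDF."""
    if shutil.which(gs_binary) is None:
        raise FileNotFoundError(f"Ghostscript binary not found: {gs_binary}")
    # check=True raises CalledProcessError if Ghostscript reports a failure.
    subprocess.run(gs_rewrite_command(src, dst, gs_binary), check=True)
    return dst
```

You would call it as `rewrite_pdf("problem.pdf", "clean.pdf")` and then feed `clean.pdf` to whatever tool was choking on the original.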
I'd love to just be able to search a PDF document for a string and get a list of results.
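The searching half of that is the easy part, as far as I can tell. Here is a minimal sketch, assuming you already have the text of each page as a plain string (from OCR or any other extraction step, which is the hard part and not shown here). The name `search_pages` and the `context` parameter are my own invention, not from any library.

```python
import re


def search_pages(pages, needle, context=30):
    """Return (page_number, offset, snippet) for each case-insensitive match.

    pages is a list of strings, one per page, assumed to come from some
    earlier text-extraction or OCR step. Page numbers start at 1.
    """
    if not needle:
        return []
    # re.escape makes the search string literal rather than a regex.
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    results = []
    for page_number, text in enumerate(pages, start=1):
        for match in pattern.finditer(text):
            lo = max(0, match.start() - context)
            hi = min(len(text), match.end() + context)
            results.append((page_number, match.start(), text[lo:hi]))
    return results


if __name__ == "__main__":
    sample = ["Invoice total: 42", "No match here", "TOTAL due, total paid"]
    for page, offset, snippet in search_pages(sample, "total"):
        print(page, offset, snippet)
```

Of course this only finds what the extraction step got right, so a misread character in the OCR output means a missed hit.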