I tried doing this with some case studies in graduate school - any open sourced ...

gizmo686 · on Oct 23, 2012

I just started a project of scanning in and OCR'ing old school newspapers. Tesseract [1] works very well. The result is almost always completely readable, with a few obvious mistakes (which seem like they could be reduced to almost none with a fairly simple post-processor). In terms of usability, it is a terminal program that is run as `tesseract srcFile.jpg dstFile`. The site also lists several GUI front-ends (none of which I have looked at).

[1] http://code.google.com/p/tesseract-ocr/
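
For anyone who wants to script this, here is a rough Python sketch of the workflow described above: run the same `tesseract srcFile.jpg dstFile` command, read the text file it writes, and pass the result through a small post-processor. The function names (`build_command`, `run_ocr`, `postprocess`) and the cleanup rules are illustrative assumptions, not part of Tesseract and not the post-processor the comment has in mind. It also assumes `tesseract` is installed and on your PATH.

```python
import re
import subprocess
from pathlib import Path


def build_command(src_image, dst_base):
    # Same shape as the command in the comment: tesseract srcFile.jpg dstFile
    return ["tesseract", str(src_image), str(dst_base)]


def run_ocr(src_image, dst_base):
    # Tesseract writes its text output to <dst_base>.txt
    subprocess.run(build_command(src_image, dst_base), check=True)
    return Path(f"{dst_base}.txt").read_text(encoding="utf-8")


def postprocess(text):
    # Hypothetical cleanup rules, chosen only for illustration.
    # Rejoin words split by a hyphen at a line break, e.g. "news-\npapers"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Strip trailing whitespace on each line
    return "\n".join(line.rstrip() for line in text.splitlines())
```

Usage would be something like `print(postprocess(run_ocr("srcFile.jpg", "dstFile")))`. Note that the hyphen rule also joins genuine compounds that happen to break across lines, so a real post-processor would probably want a dictionary check.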

bduerst · on Oct 23, 2012

Cool - Tesseract was what I came across last week while working on one of my own projects. It's the only one I know of right now, and it's nice to hear someone confirm it.