
I tried doing this with some case studies in graduate school - every open-source OCR tool I tried was very difficult to deal with.

Of course, this was a couple years ago. Can anyone recommend a library or API that offers a decent OCR?




I just started a project of scanning in and OCR'ing old school newspapers. Tesseract [1] works very well. The result is almost always completely readable, with a few obvious mistakes (which it seems could be reduced to almost none with a fairly simple post-processor). In terms of usability, it is a terminal program that is run as `tesseract srcFile.jpg dstFile`. It also has a list of GUI front-ends on the site (none of which I have looked at).

[1] http://code.google.com/p/tesseract-ocr/
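
For anyone who wants to script it, here is a rough sketch of wrapping that command from Python and adding the kind of simple post-processor I mean. The function names and the cleanup rules are my own assumptions, not anything that ships with Tesseract; the only Tesseract behaviour relied on is that `tesseract srcFile.jpg dstFile` writes its text to `dstFile.txt`.

```python
import re
import subprocess
from pathlib import Path


def run_tesseract(src_image: str, dst_base: str) -> str:
    """Run `tesseract src_image dst_base` and return the recognized text.

    Tesseract appends ".txt" to the output base name on its own.
    """
    subprocess.run(["tesseract", src_image, dst_base], check=True)
    return Path(dst_base + ".txt").read_text(encoding="utf-8")


def postprocess(text: str) -> str:
    """Hypothetical cleanup pass for text OCR'd from newspaper columns."""
    # Rejoin words split by a hyphen at a line break: "news-\npaper" -> "newspaper".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks into spaces; blank lines (paragraph breaks) are kept.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    # Requires the tesseract binary on PATH; file names are placeholders.
    print(postprocess(run_tesseract("srcFile.jpg", "dstFile")))
```

One caveat: the hyphen rule will also glue together genuinely hyphenated words that happen to break at a line end, so a real version would probably want a dictionary check there.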


Cool - Tesseract was what I came across last week while working on one of my own projects. It's the only one I know of right now, and it's nice to hear someone confirm it.



