Hi! Thank you!

Tesseract is trained to recognize only the text in images. I haven't looked into image detection yet, though.

This project fits the situation where you need to digitize a bunch of physical copies or scans of documents. Sometimes these documents contain images, like company logos, that would be useful to include in the final HTML page.

I'll try to look into it; it's a wonderful idea for a part 2. The current post is geared towards helping others transition into the world of data science through OCR by describing every step of the way.



