
Tesseract's true value is being one apt-get command away (i.e. open source). Does Debian host more modern OCR systems in its repos?



Tesseract the tool is one apt-get away, but the trained models are not, and I've found that they are a starting point, not a destination. You still have to do more training on top of them for anything that isn't black text on a crisp white background.


Big mistake on my part; I should clarify that I fine-tuned both PaddleOCR and TrOCR on large amounts of data specific to my domain. I cannot speak to the best out-of-the-box “ready to go” solutions (besides cloud ones, which were quite good with the right pre- and post-processing).
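
To make "pre-processing" a bit more concrete: the comment above doesn't say which steps were used, so the following is only an illustrative sketch of one common step, binarizing a grayscale image with Otsu's threshold before handing it to an OCR engine. The function names (`otsu_threshold`, `binarize`) are hypothetical, and the image is assumed to be a flat list of 8-bit grayscale values rather than anything tied to Tesseract, PaddleOCR, or TrOCR.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Assumption: `pixels` is a flat list of ints in 0..255 (8-bit grayscale).
    Pixels <= t are treated as one class, pixels > t as the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0      # number of pixels with value <= t
    sum_low = 0    # sum of those pixel values
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue            # no pixels in the low class yet
        w_high = total - w_low
        if w_high == 0:
            break               # every pixel is already in the low class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between_var = w_low * w_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels):
    """Map each pixel to 0 (dark) or 255 (light) using Otsu's threshold."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]


if __name__ == "__main__":
    # Hypothetical tiny "image": dark text pixels mixed with a light background.
    sample = [30, 40, 35, 220, 210, 230]
    print(binarize(sample))
```

This only pushes an input closer to the "black text on a crisp white background" case mentioned upthread; it is not a substitute for the fine-tuning described there, and real pipelines would typically use an image library instead of plain lists.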



