The text detection is lacking in comparison to Google's Vision API. Here is a real-life comparison between the two, based on a PDF a user of our website uploaded.
Original text [http://i.imgur.com/CZGhKhn.png]:
> I am also a top professional on Thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well
Google detects [http://i.imgur.com/pSJym1x.png]:
> “ I am also a top professional on Thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well ”
Tesseract detects [http://i.imgur.com/wwbLU6g.png]:
> \ am also a mp pmfesslonzl on Thummack wmcn Is a sue 1m peop‘e \ookmg (or professmna‘ semces We on glg salad P‘ezse see my rewews 1mm my cuems were as weH
Although Google's API is certainly better, Tesseract.js should work similarly if you increase the font size. Screenshots taken on 'retina' devices are around the smallest text it can handle well.
"I am also a top professional on thumbtack which is a site for people looking for professional
services like on gig salad. Please see my reviews from my clients there as well"
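To make the font-size advice above concrete, here is a minimal sketch of upscaling a small screenshot before recognition. It assumes a browser environment and the v1-era Tesseract.js API (recognize() accepting a canvas and resolving with a result object carrying .text); the 3x factor and 'screenshot.png' input are placeholders, not anything from the library's docs:

// Upscale a small screenshot so the glyphs are large enough for Tesseract to segment.
function upscale(img, factor) {
  var canvas = document.createElement('canvas');
  canvas.width = img.width * factor;
  canvas.height = img.height * factor;
  var ctx = canvas.getContext('2d');
  ctx.imageSmoothingEnabled = true; // smooth interpolation avoids blocky edges
  ctx.drawImage(img, 0, 0, canvas.width, canvas.height);
  return canvas;
}

var img = new Image();
img.onload = function () {
  Tesseract.recognize(upscale(img, 3)) // 3x is an arbitrary starting point; tune it
    .then(function (result) { console.log(result.text); });
};
img.src = 'screenshot.png'; // hypothetical input image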
Tesseract.js output for the same screenshot:
Although Googie's API is certaihiy better,
Tesseract.js should work simiiarly if you
increase the font size.
Screenshots taken
on 'retiha’ devices are around the smailest
text it can handie well.
Edit:
A screenshot of the same text at a higher
resolution:
httgs:[[imgurxomZaN/UGu
Tesseract.js
output: httgs://imguricom[a[hiIfM
This is a neat toy, but not impressive compared to the results from tesseract-ocr/tesseract [0]:
$ curl -s http://i.imgur.com/uuFhw90.png \
| tesseract stdin stdout
Although Google's API is certainly better,
Tesseract.js should work similarly if you
increase the font size.
Screenshots taken on 'retina' devices are
around the smallest text it can handle well.
Edit:
A screenshot of the same text at a higher
resolution: https:[ZimguncomlalWHGu
Tesseract.js output:
https:[[imgur.com[a[nilfM
Notice how Tesseract.js results suffer from being unable to differentiate between n's and h's, i's and l's.
That's interesting! Given that Tesseract.js wraps an Emscripten'd copy of Tesseract, I would have expected close to identical performance. This might have to do with the way we threshold images, with the age of the tesseract version we're using, or both. I'll look into it!
Edit: In addition to those differences, I think your font size is still a bit too small. On an unedited screenshot from a macbook (https://i.imgur.com/iv4ZdSt.png) I get
Although Google's API is certainly better, Tesseract.js should work similarly if you increase the font size. Screenshots taken on 'retina' devices are around the smallest text it can handle well.
Edit:
A screenshot of the same text at a higher resolution: https:[[imgur.com[a[W7IGu
Tesseract.js output: https:[[imgur.com[a[niIfM
“I am also a top professional on thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well"
What do you mean by "emscripten'd"? Is Tesseract.js using emscripten to effectively bundle the 150KLOC of C/C++ from tesseract-ocr and the upstream dependency on leptonica [0]? If so, that's amazing!
> This might have to do with the way we threshold images,
> with the age of the tesseract version we're using, or
> both. I'll look into it!
I'd be very interested to hear what's required to make it "match" native functionality. Please do drop me a line if/when you get it figured out! (I'm @jtaylor on Twitter [1])
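Since the thresholding remark keeps coming up: one common binarization step in OCR preprocessing is Otsu's method, which picks the global threshold that maximizes between-class variance of the grayscale histogram. Below is a minimal sketch over canvas pixel data; it is purely illustrative and not Tesseract.js's actual preprocessing code:

// Binarize the pixels of a canvas in place using Otsu's method.
function otsuBinarize(ctx, w, h) {
  var image = ctx.getImageData(0, 0, w, h), d = image.data;
  var hist = new Array(256).fill(0), gray = new Uint8Array(w * h);
  for (var i = 0; i < w * h; i++) {
    // Rec. 601 luma approximation for grayscale conversion
    var g = Math.round(0.299 * d[4 * i] + 0.587 * d[4 * i + 1] + 0.114 * d[4 * i + 2]);
    gray[i] = g;
    hist[g]++;
  }
  var total = w * h, sum = 0;
  for (var t = 0; t < 256; t++) sum += t * hist[t];
  // Scan all candidate thresholds, tracking the best between-class variance.
  var sumB = 0, wB = 0, best = 0, threshold = 127;
  for (var t = 0; t < 256; t++) {
    wB += hist[t];
    if (wB === 0) continue;
    var wF = total - wB;
    if (wF === 0) break;
    sumB += t * hist[t];
    var mB = sumB / wB, mF = (sum - sumB) / wF;
    var between = wB * wF * (mB - mF) * (mB - mF);
    if (between > best) { best = between; threshold = t; }
  }
  // Map every pixel to pure black or pure white.
  for (var i = 0; i < w * h; i++) {
    var v = gray[i] > threshold ? 255 : 0;
    d[4 * i] = d[4 * i + 1] = d[4 * i + 2] = v;
  }
  ctx.putImageData(image, 0, 0);
}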
I used Tesseract for a production OCR project some years ago, and I can also confirm that it just doesn't work at screen resolution. On the other hand, performance on high-DPI photos was quite OK. Google Vision API wasn't around at that time, so I can't compare.
I spent many afternoons trying to get tesseract to read Dwarf Fortress screenshots, such as http://i.imgur.com/32vVhnH.png - including much pre-processing, such as converting the text to black and white. Alas, I never even got close.
Edit: just tried Google's, and it had one mistake for that entire file. That's pretty impressive.
I spent a weekend tying Tesseract together with Tekkotsu (the amazing open framework for the Sony AIBO) in an attempt to teach my robot dog to read. The eventual goal being to hook up the output of OCR --> Text To Speech (TTS) and have him read to me.
Alas, the low resolution of the camera was an insurmountable problem. Poor Aibo needed 40-point fonts and I practically had to rub his nose in the book. Not exactly the user experience I was aiming for.
If you're reading screenshots of a non-changing font, you can quite easily get away with plain template matching. Simply do one run-through and label the data once.
I did something like that a few years ago when making an Eve-Online UI scraper.
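For what it's worth, a minimal sketch of that template-matching idea, assuming equally sized, binarized glyph bitmaps and a template set labeled by hand in one pass; the matchGlyph name and Uint8Array representation are illustrative choices, not anyone's actual scraper code:

// Classify a glyph by counting pixel agreement against each labeled template.
// glyph and every template are Uint8Arrays of the same length (0 or 255 per pixel).
function matchGlyph(glyph, templates) {
  var bestChar = '?', bestScore = -1;
  for (var ch in templates) {
    var tpl = templates[ch], score = 0;
    for (var i = 0; i < glyph.length; i++) {
      if (glyph[i] === tpl[i]) score++;
    }
    if (score > bestScore) { bestScore = score; bestChar = ch; }
  }
  return bestChar;
}

With a fixed screen font there is no scale or rotation variance, so exact pixel agreement is usually enough; fancier similarity measures only start to matter once anti-aliasing or subpixel rendering differs between captures.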
Google Cloud Vision API is very expensive, though. If you can sacrifice some quality, it might make sense to go with the open-source alternative. At $2.50 per 1,000 units the cost adds up fast at volume, and even the free trial expires after 90 days.
This is the correct pricing. One caveat: Even the free tier requires you to enter a credit card.
For higher volumes, there is also the OCR.space API, which offers 25,000 free conversions per month. It is not as good as Google's, but it works fine on screenshots.
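For reference, a minimal sketch of calling OCR.space from the browser, assuming its documented parse endpoint, the apikey and url form fields, and the ParsedResults JSON shape; the key is a placeholder, and current parameters should be checked against their docs:

// POST an image URL to the OCR.space parse endpoint and print the extracted text.
var form = new FormData();
form.append('apikey', 'YOUR_API_KEY'); // placeholder; use your own key
form.append('url', 'http://i.imgur.com/uuFhw90.png');
fetch('https://api.ocr.space/parse/image', { method: 'POST', body: form })
  .then(function (res) { return res.json(); })
  .then(function (json) {
    console.log(json.ParsedResults[0].ParsedText);
  });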