The text detection is lacking in comparison to Google's Vision API. Here is a real-life comparison between the two, based on a PDF a user of our website uploaded.
Original text [http://i.imgur.com/CZGhKhn.png]:
> I am also a top professional on Thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well
Google detects [http://i.imgur.com/pSJym1x.png]:
> “ I am also a top professional on Thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well ”
Tesseract detects [http://i.imgur.com/wwbLU6g.png]:
> \ am also a mp pmfesslonzl on Thummack wmcn Is a sue 1m peop‘e \ookmg (or professmna‘ semces We on glg salad P‘ezse see my rewews 1mm my cuems were as weH
Although Google's API is certainly better, Tesseract.js should work similarly if you increase the font size. Screenshots taken on 'retina' devices are around the smallest text it can handle well.
"I am also a top professional on thumbtack which is a site for people looking for professional
services like on gig salad. Please see my reviews from my clients there as well"
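To make the font-size advice above concrete, here is a minimal sketch of upscaling a small screenshot before recognition. It assumes a browser environment and the v1-era Tesseract.js API (recognize() accepting a canvas and resolving with a result object carrying .text); the 3x factor and 'screenshot.png' input are placeholders, not anything from the library's docs:

// Upscale a small screenshot so the glyphs are large enough for Tesseract to segment.
function upscale(img, factor) {
  var canvas = document.createElement('canvas');
  canvas.width = img.width * factor;
  canvas.height = img.height * factor;
  var ctx = canvas.getContext('2d');
  ctx.imageSmoothingEnabled = true; // smooth interpolation avoids blocky edges
  ctx.drawImage(img, 0, 0, canvas.width, canvas.height);
  return canvas;
}

var img = new Image();
img.onload = function () {
  Tesseract.recognize(upscale(img, 3)) // 3x is an arbitrary starting point; tune it
    .then(function (result) { console.log(result.text); });
};
img.src = 'screenshot.png'; // hypothetical input image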
Tesseract.js output for the same screenshot:
Although Googie's API is certaihiy better,
Tesseract.js should work simiiarly if you
increase the font size.
Screenshots taken
on 'retiha’ devices are around the smailest
text it can handie well.
Edit:
A screenshot of the same text at a higher
resolution:
httgs:[[imgurxomZaN/UGu
Tesseract.js
output: httgs://imguricom[a[hiIfM
This is a neat toy, but not impressive compared to the results from tesseract-ocr/tesseract [0]:
$ curl -s http://i.imgur.com/uuFhw90.png \
| tesseract stdin stdout
Although Google's API is certainly better,
Tesseract.js should work similarly if you
increase the font size.
Screenshots taken on 'retina' devices are
around the smallest text it can handle well.
Edit:
A screenshot of the same text at a higher
resolution: https:[ZimguncomlalWHGu
Tesseract.js output:
https:[[imgur.com[a[nilfM
Notice how Tesseract.js results suffer from being unable to differentiate between n's and h's, i's and l's.
That's interesting! Given that Tesseract.js wraps an Emscripten'd copy of Tesseract, I would have expected close to identical performance. This might have to do with the way we threshold images, with the age of the tesseract version we're using, or both. I'll look into it!
Edit: In addition to those differences, I think your font size is still a bit too small. On an unedited screenshot from a macbook (https://i.imgur.com/iv4ZdSt.png) I get
Although Google's API is certainly better, Tesseract.js should work similarly if you increase the font size. Screenshots taken on 'retina' devices are around the smallest text it can handle well.
Edit:
A screenshot of the same text at a higher resolution: https:[[imgur.com[a[W7IGu
Tesseract.js output: https:[[imgur.com[a[niIfM
“I am also a top professional on thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well"
What do you mean by "emscripten'd"? Is Tesseract.js using emscripten to effectively bundle the 150KLOC of C/C++ from tesseract-ocr and the upstream dependency on leptonica [0]? If so, that's amazing!
> This might have to do with the way we threshold images,
> with the age of the tesseract version we're using, or
> both. I'll look into it!
I'd be very interested to hear what's required to make it "match" native functionality. Please do drop me a line if/when you get it figured out! (I'm @jtaylor on Twitter [1])
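Since the thresholding remark keeps coming up: one common binarization step in OCR preprocessing is Otsu's method, which picks the global threshold that maximizes between-class variance of the grayscale histogram. Below is a minimal sketch over canvas pixel data; it is purely illustrative and not Tesseract.js's actual preprocessing code:

// Binarize the pixels of a canvas in place using Otsu's method.
function otsuBinarize(ctx, w, h) {
  var image = ctx.getImageData(0, 0, w, h), d = image.data;
  var hist = new Array(256).fill(0), gray = new Uint8Array(w * h);
  for (var i = 0; i < w * h; i++) {
    // Rec. 601 luma approximation for grayscale conversion
    var g = Math.round(0.299 * d[4 * i] + 0.587 * d[4 * i + 1] + 0.114 * d[4 * i + 2]);
    gray[i] = g;
    hist[g]++;
  }
  var total = w * h, sum = 0;
  for (var t = 0; t < 256; t++) sum += t * hist[t];
  // Scan all candidate thresholds, tracking the best between-class variance.
  var sumB = 0, wB = 0, best = 0, threshold = 127;
  for (var t = 0; t < 256; t++) {
    wB += hist[t];
    if (wB === 0) continue;
    var wF = total - wB;
    if (wF === 0) break;
    sumB += t * hist[t];
    var mB = sumB / wB, mF = (sum - sumB) / wF;
    var between = wB * wF * (mB - mF) * (mB - mF);
    if (between > best) { best = between; threshold = t; }
  }
  // Map every pixel to pure black or pure white.
  for (var i = 0; i < w * h; i++) {
    var v = gray[i] > threshold ? 255 : 0;
    d[4 * i] = d[4 * i + 1] = d[4 * i + 2] = v;
  }
  ctx.putImageData(image, 0, 0);
}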
I used Tesseract for a production OCR project some years ago, and I can also confirm that it just doesn't work at screen resolution. On the other hand, performance on high-DPI photos was quite OK. Google Vision API wasn't around at that time, so I can't compare.
I spent many afternoons trying to get tesseract to read Dwarf Fortress screenshots, such as http://i.imgur.com/32vVhnH.png - including much pre-processing, such as converting the text to black and white. Alas, I never even got close.
Edit: just tried Google's, and it had one mistake for that entire file. That's pretty impressive.
I spent a weekend tying Tesseract together with Tekkotsu (the amazing open framework for the Sony AIBO) in an attempt to teach my robot dog to read. The eventual goal being to hook up the output of OCR --> Text To Speech (TTS) and have him read to me.
Alas, the low resolution of the camera was an insurmountable problem. Poor Aibo needed 40-point fonts and I practically had to rub his nose in the book. Not exactly the user experience I was aiming for.
If you're reading screenshots of a non-changing font, you can quite easily get away with plain template matching. Simply do one run-through and label the data once.
I did something like that a few years ago when making an Eve-Online UI scraper.
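For what it's worth, a minimal sketch of that template-matching idea, assuming equally sized, binarized glyph bitmaps and a template set labeled by hand in one pass; the matchGlyph name and Uint8Array representation are illustrative choices, not anyone's actual scraper code:

// Classify a glyph by counting pixel agreement against each labeled template.
// glyph and every template are Uint8Arrays of the same length (0 or 255 per pixel).
function matchGlyph(glyph, templates) {
  var bestChar = '?', bestScore = -1;
  for (var ch in templates) {
    var tpl = templates[ch], score = 0;
    for (var i = 0; i < glyph.length; i++) {
      if (glyph[i] === tpl[i]) score++;
    }
    if (score > bestScore) { bestScore = score; bestChar = ch; }
  }
  return bestChar;
}

With a fixed screen font there is no scale or rotation variance, so exact pixel agreement is usually enough; fancier similarity measures only start to matter once anti-aliasing or subpixel rendering differs between captures.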
Google Cloud Vision API is very expensive, though. If you can sacrifice some quality, it might make sense to go with the open-source alternative. At $2.50 per 1,000 units the cost adds up fast at volume, and even the free trial expires after 90 days.
This is the correct pricing. One caveat: Even the free tier requires you to enter a credit card.
For higher volumes, there is also the OCR.space API, which offers 25,000 free conversions per month. It is not as good as Google's, but it works fine on screenshots.
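For reference, a minimal sketch of calling OCR.space from the browser, assuming its documented parse endpoint, the apikey and url form fields, and the ParsedResults JSON shape; the key is a placeholder, and current parameters should be checked against their docs:

// POST an image URL to the OCR.space parse endpoint and print the extracted text.
var form = new FormData();
form.append('apikey', 'YOUR_API_KEY'); // placeholder; use your own key
form.append('url', 'http://i.imgur.com/uuFhw90.png');
fetch('https://api.ocr.space/parse/image', { method: 'POST', body: form })
  .then(function (res) { return res.json(); })
  .then(function (json) {
    console.log(json.ParsedResults[0].ParsedText);
  });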