Hacker News

The text detection is lacking in comparison to Google's Vision API. Here is a real-life comparison between Tesseract and Google's Vision API, based on a PDF a user of our website uploaded.

Original text [http://i.imgur.com/CZGhKhn.png]:

> I am also a top professional on Thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well

Google detects [http://i.imgur.com/pSJym1x.png]:

> “ I am also a top professional on Thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well ”

Tesseract detects [http://i.imgur.com/wwbLU6g.png]:

> \ am also a mp pmfesslonzl on Thummack wmcn Is a sue 1m peop‘e \ookmg (or professmna‘ semces We on glg salad P‘ezse see my rewews 1mm my cuems were as weH




Although Google's API is certainly better, Tesseract.js should work similarly if you increase the font size. Screenshots taken on 'retina' devices are around the smallest text it can handle well.

Edit:

A screenshot of the same text at a higher resolution: https://imgur.com/a/W7IGu

Tesseract.js output: https://imgur.com/a/niIfM

"I am also a top professional on thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well"
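For anyone wanting to try the upscaling step themselves, it's easy to sketch. Here's a minimal nearest-neighbor upscale in pure Python — a hypothetical stand-in for a real image library's resize (in practice you'd use bicubic interpolation from something like Pillow before feeding the result to OCR):

```python
def upscale_nearest(pixels, factor):
    """Nearest-neighbor upscale of a 2D grid of grayscale pixels.

    Each source pixel becomes a factor x factor block, so small
    screen-resolution glyphs gain enough area for OCR to segment.
    """
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in pixels
        for _ in range(factor)
    ]

# A 2x2 "image" upscaled 2x becomes 4x4.
tiny = [[0, 255],
        [255, 0]]
big = upscale_nearest(tiny, 2)
```

Nearest-neighbor keeps edges crisp, which tends to matter more for OCR than smoothness.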


Your comment (zoomed in Chrome on Win 10): http://i.imgur.com/uuFhw90.png

Tesseract.js analysis:

    Although Googie's API is certaihiy better,
    Tesseract.js should work simiiarly if you
    increase the font size.
    Screenshots taken
    on 'retiha’ devices are around the smailest
    text it can handie well.
    
    Edit:
    
    A screenshot of the same text at a higher
    resolution:
    httgs:[[imgurxomZaN/UGu
    
    Tesseract.js
    output: httgs://imguricom[a[hiIfM

This is a neat toy, but not impressive compared to the results from tesseract-ocr/tesseract [0]:

    $  curl -s http://i.imgur.com/uuFhw90.png \
        | tesseract stdin stdout

    Although Google's API is certainly better,
    Tesseract.js should work similarly if you
    increase the font size.
    Screenshots taken on 'retina' devices are
    around the smallest text it can handle well.
    
    Edit:
    A screenshot of the same text at a higher
    resolution: https:[ZimguncomlalWHGu
    Tesseract.js output:
    https:[[imgur.com[a[nilfM
Notice how Tesseract.js results suffer from being unable to differentiate between n's and h's, i's and l's.

[0] https://github.com/tesseract-ocr/tesseract


That's interesting! Given that Tesseract.js wraps an Emscripten'd copy of Tesseract, I would have expected close to identical performance. This might have to do with the way we threshold images, with the age of the tesseract version we're using, or both. I'll look into it!
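For context, "thresholding" here means binarizing the image before OCR. A rough sketch of the simplest version — a fixed global cutoff — purely for illustration (this is not Tesseract.js's actual code, which may well use something adaptive like Otsu's method):

```python
def binarize(pixels, cutoff=128):
    """Global threshold: map each grayscale pixel to pure black or white.

    A cutoff that is slightly off can merge thin strokes or break them
    apart, which is one way 'n' vs 'h' and 'i' vs 'l' confusions arise.
    """
    return [[0 if p < cutoff else 255 for p in row] for row in pixels]

gray = [[10, 200, 130],
        [127, 128, 90]]
bw = binarize(gray)
```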

Edit: In addition to those differences, I think your font size is still a bit too small. On an unedited screenshot from a MacBook (https://i.imgur.com/iv4ZdSt.png) I get

  Although Google's API is certainly better, Tesseract.js should work similarly if you increase the font size. Screenshots taken on 'retina' devices are around the smallest text it can handle well.

  Edit:

  A screenshot of the same text at a higher resolution: https:[[imgur.com[a[W7IGu

  Tesseract.js output: https:[[imgur.com[a[niIfM

  “I am also a top professional on thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well"


What do you mean by "emscripten'd"? Is Tesseract.js using emscripten to effectively bundle the 150KLOC of C/C++ from tesseract-ocr and the upstream dependency on leptonica [0]? If so, that's amazing!

    > This might have to do with the way we threshold images,
    > with the age of the tesseract version we're using, or
    > both. I'll look into it!
I'd be very interested to hear about what is required to make it "match" native functionality. Please do drop me a line if/when you get it figured out! (I'm @jtaylor on twitter [1])

[0] http://www.leptonica.org/

[1] https://twitter.com/jtaylor


Here's the repo with their build process for the core Tesseract Emscripten build:

https://github.com/naptha/tesseract-emscripten/blob/master/j... (specifically the line for leptonica)


Hey, just a reminder that this is a Show HN, which has its own rules. Honesty is okay, but aim to be respectful.


I used Tesseract for a production OCR project some years ago, and can confirm that it just doesn't work at screen resolution. On the other hand, performance on high-DPI photos was quite OK. Google Vision API wasn't around at that time, so I can't compare.


I spent many afternoons trying to get tesseract to read Dwarf Fortress screenshots, such as http://i.imgur.com/32vVhnH.png - including much pre-processing, such as converting the text to black and white. Alas, I never even got close.

Edit: just tried Google's, and it had one mistake for that entire file. That's pretty impressive.


Since we're trading "I failed" stories...

I spent a weekend tying Tesseract together with Tekkotsu (the amazing open framework for the Sony AIBO) in an attempt to teach my robot dog to read. The eventual goal was to hook the OCR output up to text-to-speech (TTS) and have him read to me.

Alas, the low resolution of the camera was an insurmountable problem. Poor Aibo needed 40-point fonts and I practically had to rub his nose in the book. Not exactly the user experience I was aiming for.

Never got around to the TTS part.


I completely missed the word "robot" then and was pretty impressed that you wanted to teach your dog to read.


I'm trying to fathom how you thought he loaded Tesseract into a live dog.


There are two openings. One accepts anything, really. It's dangerous to try to use the other one, though.


If you're reading screenshots of a non-changing font, you could quite easily get away with plain template-matching. Simply do a run through and label the data one time.

I did something like that a few years ago when making an Eve-Online UI scraper.
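For the curious, the core of that template-matching approach is tiny. A sketch, assuming a fixed bitmap font and pre-segmented glyph cells (the templates below are hypothetical 3x3 glyphs, not a real font):

```python
def match_glyph(cell, templates):
    """Return the label of the template with the fewest differing pixels.

    cell and each template are same-size 2D grids of 0/1 pixels.
    """
    def distance(a, b):
        return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

    return min(templates, key=lambda label: distance(cell, templates[label]))

# Hypothetical 3x3 glyphs for two characters.
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
noisy = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]  # an 'I' with one flipped pixel
```

With a non-changing font there's no real ambiguity, so even a plain pixel-difference metric like this is usually enough.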


Upscaling worked ok for me:

Upscaled image: https://imgur.com/a/4IQA7

Result on demo page: http://imgur.com/a/A0v5C

The hammerman Tikes flsosushsath: Greetings. My name is Tikes Leafsilk.

You: Rh. hello. I'm Stasbo Murderknower the Craterous Trance of Fins. Don't travel alone at night. or the bogeyman will get you.

You: Tell me about this hall.

Tikes: This is The flccidental Ualley. In 123, Stasho Steamdances ruled from The flccidental Ualley of The Council of Cobras in Ueilapes.

...


This seems like a font issue. Would training the model for this console font help?


Google Cloud Vision API is very expensive though -- if you can sacrifice some amount of quality, it might make sense to go with the open source alternative. At $2.50/unit the cost is absurd, and even the free trial expires after 90 days.


I just checked because I couldn't believe that price.

"Price per 1000 units. Unit volumes are based on monthly usage." It's $2.50 per 1000 units, so 0.25 cents per unit.

Edit: And, according to the pricing page [1] the first 1000 units are free.

[1]: https://cloud.google.com/vision/pricing
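Spelled out, the per-unit arithmetic:

```python
price_per_1000 = 2.50  # USD per 1000 units, per the Cloud Vision pricing page
per_unit = price_per_1000 / 1000
print(f"${per_unit:.4f} per unit ({per_unit * 100:.2f} cents)")
# -> $0.0025 per unit (0.25 cents)
```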


Note that's $0.0025 per unit, not 25 cents per unit. Your post is correct, but, ya know, Verizon math is a real thing.


My mistake -- thanks for pointing that out. I think I was in shock, and didn't take a closer look.


This is the correct pricing. One caveat: Even the free tier requires you to enter a credit card.

For higher volumes, there is also the OCR.space API. It offers 25,000 free conversions per month. It is not as good as Google, but works fine on screenshots.


We spent a decent amount of time evaluating Abbyy vs Tesseract vs Cloud Vision. Cloud Vision wins hands down and is very reasonably priced.


There is a Chrome Extension for Cloud Vision which works well and seems to be free.

Written by a Google employee, see top link at http://www.imjasonh.com/projects


How do you test sample images with Google's Vision API? Do you have to sign up for the 90-day trial, or do they let you upload images for a trial?



