What do you mean by "emscripten'd"? Is Tesseract.js using Emscripten to effectively bundle the 150KLOC of C/C++ from tesseract-ocr, along with its upstream dependency Leptonica [0]? If so, that's amazing!
> This might have to do with the way we threshold images,
> with the age of the tesseract version we're using, or
> both. I'll look into it!
I'd be very interested to hear what is required to make it "match" native functionality. Please do drop me a line if/when you get it figured out! (I'm @jtaylor on Twitter [1].)
[0] http://www.leptonica.org/
[1] https://twitter.com/jtaylor