Show HN: Gogosseract, a Go Lib for CGo-Free Tesseract OCR via Wazero

yklcs · on Nov 5, 2023

I wrote a short blog post[1] on this method a while ago. I do think running WASM in embedded runtimes is a pretty good option, but overhead remains high, and WASI remains somewhat fragmented between compilers and runtimes.

I think this method really shines in Go as not having CGo simplifies a lot of things, and as a decently performant JITed runtime exists in the form of wazero.

[1]: https://yklcs.com/blog/universal-libs-with-wasm

iampims · on Nov 5, 2023

To me, this is the real value of Wasm: platform independent libraries with a standard interface that doesn’t require C.

slimsag · on Nov 5, 2023

WASM runtimes miss out on a _lot_ of optimizations that a battle-tested C compiler will perform, and sometimes require machine emulation (e.g. Go compiled to WASM results in a virtual machine/emulation layer to run Go code).

It can work, but it's not the fastest thing in the world.

I think languages that make working with C/C++ code much more seamless, e.g. as nice as working with Go code can be, are a better approach. Zig does this well and feels quite natural coming from Go. It can also be used to make CGO cross compilation 'just work' and alleviate many of those pains.

dmos62 · on Nov 6, 2023

I feel like inefficient but convenient has been the default trade-off in so many places during the last couple of decades. WASM is opening the doors for all kinds of new solutions. I wonder what kind of cultures will develop around it, as regards efficiency.

iampims · on Nov 5, 2023

Yes, Zig is best in class for C-interoperability.

Go’s FFI support is alright, but I find using WASM/WASI more pleasant.

richieartoul · on Nov 5, 2023

This is awesome and one of the things I’m really excited about with WASM, and specifically Wazero. The Wazero team is top notch. Now someone just needs to do this with zstd and make it fast…

mappu · on Nov 5, 2023

There's a pure-go zstd at https://github.com/klauspost/compress - it's likely faster than running the upstream zstd under Wazero.

anuraaga · on Nov 6, 2023

Just for reference I did give it a try

https://github.com/wasilibs/go-zstd

Mostly because I hadn't realized that `compress` supports zstd. Wazero performed reasonably well against the cgo library but was indeed much slower than this proper pure Go port.

mappu · on Nov 5, 2023

Another really interesting way to approach this problem would be to adapt wasm2c to emit Go output. It should result in better performance than wazero.

dlock17 · on Nov 5, 2023

You mean this? https://github.com/WebAssembly/wabt/blob/main/wasm2c/README....

That seems like quite an undertaking. But at that point, it would make sense to cut out WASM entirely like https://datastation.multiprocess.io/blog/2022-05-12-sqlite-i...

ncruces · on Nov 5, 2023

Disclosure: I'm working on alternative Cgo-less bindings for SQLite, using wazero.

https://github.com/ncruces/go-sqlite3

One of the problems of the modernc approach (IMO) is that they're not just transpiling CPU/compute stuff, but entirely OS/platform stuff.

Each Go file of theirs is a xxx_os_arch.go that starts with 100s of OS-#defines-as-consts, and goes on to transpile fully #ifdefed code.

It also implements antithetical (in Go) stuff like goroutine local storage, because libc pthreads can't live without it.

And all IO is via direct syscalls that will never play nice with the Go scheduler, because again, this is OS level stuff.

WASM defines a cross platform CPU and an ABI, and using that for compute and the bottom OS layer in Go you get (IMO) a nicer end result.

Given the hard task of generating decent code from WASM at load time (wazero's compiler is pretty naive, a better one is being developed, but it will take seconds to generate good code for anything non trivial like SQLite) I wouldn't mind having a solution that translated to Go, or Go ASM, at build time.

donatj · on Nov 5, 2023

Oh awesome. I was really hoping a native OCR would pop up but this really is the next best thing and a more realistic avenue.

dlock17 · on Nov 5, 2023

Exactly, I expected to find one but couldn't, so I put together my own. It's not the fastest, but it'll do for my purposes.

tommiegannert · on Nov 5, 2023

Thanks for sharing!

Since OCR is a somewhat slow process, how does the WASM approach compare to running libtesseract in a subprocess and using some IPC layer to talk to Go? It would require a separate C++ compiler, but not CGo.

> one of the largest Open Source OCR

Tangential, but are there others as large as Tesseract? It seems to pop up anywhere I look.

layer8 · on Nov 5, 2023

> Tangential, but are there others as large as Tesseract?

The one serious competitor is PaddleOCR, which is faster on GPU, and also works better for Chinese and other non-Western scripts.

There are some newer ML-based projects like DocTR that have been catching up, at least for some use cases.

dlock17 · on Nov 5, 2023

My intention was a "pure Go" approach, but a subprocess is probably more performant.

I imagine just calling the Tesseract CLI from Go would be simplest if that's all you wanted.

abdullahkhalids · on Nov 5, 2023

Is Tesseract currently the best open source OCR library? Best in terms of accuracy.

How much difference is there between Tesseract and the best proprietary solutions?

ianhawes · on Nov 5, 2023

Tesseract is the current best open source OCR library.

When looking at the “best” proprietary solution, there are a few worth mentioning:

- If you are looking for the best OCR to DOCX solution, ABBYY OCR SDK is the front runner. Their OCR engine is not AS accurate as others I’ll mention, but their output engine (i.e. taking data beyond just the character, like bold or underlined or font name) is probably the best in the market.

- Google Document AI/Cloud Vision is probably the best all-around OCR. The 2 flavors determine whether you want to handle scanned PDFs/images (DocAI) or generalized photos (Cloud Vision). I believe they also have some level of training capabilities via Vertex but I haven’t checked it out.

- IRIS OCR.. Meh

- AWS Textract and Azure Vision are worth mentioning as contenders, but just like Google Document AI, they’re cloud based and that may factor into your decision.

- I haven’t tried DocTR or Paddle OCR

abdullahkhalids · on Nov 5, 2023

Thanks for the detailed answer.

honkotime · on Nov 5, 2023

It mentions that this is a rewrite of gosseract; however, it is not a drop-in replacement, so it's more of a separate library in my opinion.

dlock17 · on Nov 5, 2023

Technically I said reimplementation. But you are right in that it's not supposed to be a drop-in replacement at all.

The only feature missing right now is Bounding Box detection, which I plan to add in the future.

technics256 · on Nov 5, 2023

Off topic, but in general, how does something like this compare to cloud-hosted OCR solutions?

layer8 · on Nov 5, 2023

Tesseract is worse than most commercial solutions, and/or requires more pre- and postprocessing.

breadchris · on Nov 5, 2023

this is sick