Hacker News new | past | comments | ask | show | jobs | submit login

MiniCPM-V 2.6 is based on Qwen 2 and is also great at handwriting. It works locally with KoboldCPP. Here are the results I got with a test I just did.

Image:

* https://imgur.com/wg0kdQK

Output:

* https://pastebin.com/RKvYQasi

OCR script used:

* https://github.com/jabberjabberjabber/LLMOCR/blob/main/llmoc...

Model weights: MiniCPM-V-2_6-Q6_K_L.gguf, mmproj-MiniCPM-V-2_6-f16.gguf

Inference:

* https://github.com/LostRuins/koboldcpp/releases/tag/v1.75.2




Should the line "p.o. 5rd w/ new W5 533" say "p.o. 3rd w/ new WW 5W .533R"?

What does p.o. stand for? I can't make out the first letter. It looks more like the f, but the nodge on the upper left only fits the p. All the other p's look very different though.


'Replaced R436, R430 emitter resistors on right-channel power output board with new wire-wound 5watt .33ohm 5% with ceramic lead insulators'


Thx :). I thought the 3 looked like a b but didn't think brd would make any sense. My reasoning has led me astray.


Yeah. If you realize that a large part of the llm's 'ocr' is guessing due to context (token prediction) and not actually recognizing the characters exactly, you can see that it is indeed pretty impressive because the log it is reading uses pretty unique terminology that it couldn't know from training.


I'd say as an llm it should know this kind of stuff from training, contrary to me, for whom this is out of domain data. Anyhow I don't think the AI did a great job on that line. Would require better performance for it to be useful for me. I think larger models might actually be better at this than I am, which would be very useful.


Be aware that a lot of this also has to do with prompting and sampler settings. For instance changing the prompt from 'write the text on the image verbatim' to something like 'this is an electronics repair log using shorthand...' and being specific about it will give the LLM context in which to make decisions about characters and words.


Thanks for the hint. Will try the out!




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: