Yes, I periodically try to get scanned images of Medieval Latin and Hebrew books OCR’d and translated by Gemini and ChatGPT. Sometimes the results are amazing, but you have to proofread it all because they occasionally go off the rails. They will either skip sentences or start regurgitating sentences from another, similar text that they must have been trained on. Sometimes, after helping me with several pages, Gemini will suddenly announce “I’m just an LLM, and I can’t process images”, and I have to encourage it to try anyway. It’s strange. Still, overall a time saver.
As for segmenting the images (header/footer/table/main text), I’ve been using ABBYY and it’s generally pretty good at it. Unfortunately, it often fails at footnotes in much the same way as described in the post, so it won’t get you past that hurdle.