I see. I thought you also wanted to stress the difference between "lab conditions" OCR (recognizing rendered text, e.g. a PDF, not necessarily just websites) and the typical OCR over scans of old pages, with real-font typography, bleeding ink, stains, etc.