I don't want to get drawn into a protracted argument about this (this is my last post on the topic), but the simple act of scanning the formatted page before OCR infringes the publisher's copyright, if the publisher's rights have not yet expired.
You can't OCR from nothing. At some point you need a direct copy, be it in memory or on disk, of the thing being copied.
You might think so, but mere copying is infringement, at least in the United States.
There's even solid precedent that copying computer software from disk to memory for the purpose of running it creates a copy covered by copyright. A manufacturer sued a third-party computer maintenance company, alleging that its technician infringed copyright simply by turning the machine on, and won.
There's even a specific carveout for software, section 117(a)(1), but the same decision held that since that section applies only to the "owner" of a copy, it doesn't apply to licensed software.
It's a pretty bonkers case. Congress explicitly overruled it by adding _another_ carveout to the same section, 117(c), which essentially says that computer repair people don't infringe the copyright in the OS by turning the computer on.
Regardless, mere copying with nothing else can be infringement.