
I bet I worked with the same company when I was with the government. We had a subcontractor who'd been hired to digitize something like 200 million paper records (they made it about 50 million in before we ran out of funding). But a small fraction of the TIFF files they generated wouldn't work with any of the tools we had on hand.

It turned out that Windows 98 shipped with an Imaging program (licensed by MS, not written by them) that predated the standardization of the JPEG-in-TIFF subformat; its authors had basically guessed at how it would work and shipped that. The final spec (and the version of JPEG-in-TIFF nearly everyone else implemented) ended up being different. So basically nothing else could read those files.
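For anyone triaging a pile of scans like this, a rough way to sort files is to read the Compression tag (259) from the first IFD. This is only a sketch: it assumes the problem files carry the old-style JPEG value 6 from the TIFF 6.0 spec rather than the value 7 defined later in TIFF Technical Note 2, which the story above doesn't confirm. The function name `tiff_compression` is made up for illustration.

```python
import struct

COMPRESSION_TAG = 259  # TIFF "Compression" tag

# A few well-known Compression values from the TIFF spec and TechNote 2.
COMPRESSION_NAMES = {
    1: "uncompressed",
    3: "CCITT Group 3",
    4: "CCITT Group 4",
    5: "LZW",
    6: "old-style JPEG (TIFF 6.0)",
    7: "JPEG (TechNote 2)",
}


def tiff_compression(data: bytes) -> int:
    """Return the Compression value from the first IFD of a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")

    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (bad magic number)")

    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, _field_type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == COMPRESSION_TAG:
            # Compression is a SHORT, stored in the first 2 bytes of the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return 1  # the spec's default when the tag is absent: no compression
```

Called on the raw bytes of a file, it returns the numeric value, which `COMPRESSION_NAMES.get(value, "unknown")` turns into a label. It only looks at the first page, so multi-page files would need the IFD chain followed as well.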

We ended up calling them up every time a customer found one of these files and having them print out that image on one of their Windows 98 machines, then scan the printout back in using one of the newer machines. Sure, we lost some quality, but at least the customers could access the data now.

For a time reference, these broken images were still showing up in newly scanned documents in 2011 (when we stopped working with them due to massive fraud), so they must have been using their Win98 scanner systems even then.




No. As best we could determine (and we had a guy who liked to get into the weeds of CCITT Group 3 and 4 compression), it was the raw images themselves. There was nothing wrong with them; some just tickled a bug. If I remember correctly, their API required stripping off the header and presenting the OCR code with some metadata and the compressed image. It's been way too long for me to remember the details, except that it was fairly obnoxious to interface to; I couldn't just hand it a TIFF in some way (which helped us VARs really "add value" and earn our keep :-).

We were producing our own TIFF files using our own software that drove monster Kodak ImageLink scanners (software I in fact took over, redid the SCSI driver of, and eventually did a clean rewrite of the engine on Sun workstations), so the images and their compression came straight from Kodak, and going further, I don't recall those 600 pound beasts ever screwing up at that level.

And this was way before Windows 98: it was Windows 3.0, or by then 3.1, around 1992, when Windows was utterly naive about document image files. Which I can see was a blessing (although maybe it was losing quality; I'd long switched to NT by the time 98 came out).


We also had weird CCITT Group 4 issues, because someone was trying to be extra smart and convert TIFF to PDF without a recompress (PDF supports Group 4 compression too, so you can turn a Group 4 TIFF into a Group 4 PDF by just swapping the header!)

I didn't mean it was definitely the same company, just a similarly annoying TIFF issue.



