Hacker News

But if you have a huge tax document, you're likely not going to screenshot it page by page. Yes, there are ways to automate this. But if you're a 50-year-old divorce attorney, you're going to click on the "OCR" button in your PDF reader, and it will not work.



You don’t have to screenshot every page. Convert the PDF to a PNG/TIFF image for every page, and OCR those. This is very easy to automate. If this is working with Unicode code points, you’re not blocking OCR, you’re obfuscating text. Anything that renders the PDF to a raster format will produce an OCR-able document.
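To make "very easy to automate" concrete, here is a rough sketch of the render-then-OCR approach. It assumes the `pdftoppm` (Poppler) and `tesseract` command-line tools are installed; the function names, the `page` file prefix, and the 300 DPI default are my own choices for illustration, not anything from the tool being discussed.

```python
import subprocess
import sys
from pathlib import Path


def render_command(pdf_path, prefix, dpi=300):
    """Build the pdftoppm call that writes one PNG per page (prefix-N.png)."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(prefix)]


def ocr_command(image_path):
    """Build the tesseract call; the output base 'stdout' prints the text."""
    return ["tesseract", str(image_path), "stdout"]


def ocr_pdf(pdf_path, workdir):
    """Rasterize every page, then OCR each image. Returns one string per page."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    prefix = workdir / "page"  # hypothetical prefix, any name works
    subprocess.run(render_command(pdf_path, prefix), check=True)
    pages = []
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    for image in sorted(workdir.glob("page-*.png")):
        result = subprocess.run(
            ocr_command(image), check=True, capture_output=True, text=True
        )
        pages.append(result.stdout)
    return pages


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python ocr_pdf.py input.pdf scratch_dir
    for number, text in enumerate(ocr_pdf(sys.argv[1], sys.argv[2]), start=1):
        print(f"--- page {number} ---")
        print(text)
```

Since this only looks at rendered pixels, whatever the PDF does to its embedded text layer should not matter; OCR quality then depends on the resolution and the font, not on the obfuscation.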

If you’re a divorce attorney who used this to convert documents in response to a discovery request, and the opposing side had a valid reason for needing the unobfuscated text, then you’re probably going to end up having a nice conversation with the judge about acceptable formats.

Sending compressed TIFFs would probably be just as good. The files would be a bit larger, but it would be just as effective at stopping automated scraping of text. Also, less likely to piss off a judge. Any opposing firm sophisticated enough to automate scraping the text from a normal PDF would be able to OCR these files just as easily.

Or maybe you have a second site that sells the decoder, so you get to sell to both sides. Not a bad business model, if you can work it.


I don't know why you think divorce attorneys are stupid. Some are probably very well versed in tech; those who aren't know others who are. They won't simply sit there and think "oh, for some reason I can't copy-paste from that PDF, better give up the case then".

... And most attorneys simply print documents. Once the PDF is on paper, OCR-ing it back into text is just one scanner away.


I understand that. And I understand your personal use case was valid. But I think your "Human Eyes Only" name and domain are a little deceptive.


I’m not sure what reader you are talking about, but that button is most certainly not doing any kind of OCR if your technique stops it. Real OCR works from the rendered pixels, not the embedded text.



