In some use cases, like OCR, the accuracy of these guesses can be established in...

roughly · on March 13, 2021

I agree; I'd say two things in response, though:

1. However good the guess is, it's still just that: a guess. Taking the standard of "evidence in a murder case", the OCR can and probably should be used to point investigators in the right direction so they can go and collect more data, but it should not be considered sufficient as evidence itself.

2. OCR is a relatively constrained solution space - success in those conditions doesn't mean the same level of accuracy can or will be reached outside of that constrained space.

To be clear, though - I'm making a primarily epistemic argument, not one based on utility. There are a lot of areas for which these kinds of machine guessing systems are of enormous utility; we just shouldn't confuse what they're doing with actual data collection.

sanj · on March 13, 2021

Unless they’re not: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...

reanimus · on March 14, 2021

Did you read that article? That wasn't an OCR issue; it was an image compression issue.

sanj · on March 14, 2021

I did, and I’m aware it wasn’t OCR that was the underlying problem.

But the issue manifests as characters being incorrectly identified because of an algo t

viraptor · on March 14, 2021

Same thing in a way. OCR does lossy compression from pixels to text. Both could make similar mistakes for pretty similar reasons.

iujjkfjdkkdkf · on March 14, 2021

I'm not sure about the OCR example, but there are information / sampling theory limits on what can be discerned in an image, based on sampling rate (pixels basically) and optics. Any extrapolation outside these limits is provably guessing.

Edit - re OCR: do you mean, e.g., that from a picture of a blurred license plate we could rule in or out a subset of possible numbers, depending on how blurred it is, like a B could be an 8 but not an L? (And sorry if your example is unrelated.) This is valid, and unrelated to super resolution; you can do this analysis with Nyquist and point spread functions.
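
For anyone who wants to see the sampling limit concretely, here is a rough sketch in plain Python. It is only a 1-D toy, not an image pipeline: the `sample` helper, the rate of 10 samples per unit, and the two frequencies are all made-up values for illustration. It shows that a detail above the Nyquist limit (half the sampling rate) produces exactly the same samples as a coarser one, so nothing in the sampled data can tell them apart.

```python
import math

def sample(freq, rate, n):
    """Sample a unit-amplitude cosine of frequency `freq` at `rate` samples per unit."""
    return [math.cos(2 * math.pi * freq * k / rate) for k in range(n)]

RATE = 10.0  # hypothetical sampling rate (think: pixels per mm); Nyquist limit is RATE / 2

low = sample(2.0, RATE, 20)          # pattern below the Nyquist limit
high = sample(2.0 + RATE, RATE, 20)  # finer pattern above the limit, aliases onto 2.0

# Largest difference between the two sampled sequences (floating-point noise only).
max_diff = max(abs(a - b) for a, b in zip(low, high))
print(max_diff < 1e-9)
```

Since the two sequences match sample for sample, any method that claims to recover the finer pattern from these samples alone is choosing between equally consistent answers using something other than the data.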