If Google is using those pictures as captchas, it is probably because they could not write an algorithm to decipher them.
If, in the worst case, this results in spammers creating even better pattern recognition algorithms than the current cutting edge, it is certainly worth all the effort.
User responses to the pictures are not checked for correctness. Every time you see a reCAPTCHA, you can just enter a garbage value for the word that is less distorted and the system will accept it.
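A minimal sketch of the scheme being described, assuming the classic reCAPTCHA setup where one word (the control) has a known answer and the other does not. The function and variable names are hypothetical, not Google's API. It shows why garbage typed into the unchecked slot still passes: only the control word decides the outcome.

```python
from collections import defaultdict

# Hypothetical store of answers collected for words the system cannot read yet.
unknown_answers: dict[str, list[str]] = defaultdict(list)


def verify(control_truth: str, control_answer: str,
           unknown_id: str, unknown_answer: str) -> bool:
    """Pass/fail depends only on the control word; the unknown word is recorded, not checked."""
    passed = control_answer.strip().lower() == control_truth.strip().lower()
    if passed:
        # Keep the unknown-word answer only from users who got the control word right.
        unknown_answers[unknown_id].append(unknown_answer.strip().lower())
    return passed
```

For example, `verify("morning", "Morning ", "w1", "garbage")` passes and simply files "garbage" as one more vote for word `w1`; the user is never told which of the two words was the control.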
It's not always the less distorted one. As for data integrity, presumably Google sends each word out to multiple users, so over time they get an idea of what the correct response is.
Exactly. If they do it 100 times, they should get a sharply peaked distribution of answers. They should also be able to tell which images are hard for humans to read from the shape of that distribution (how skewed or spread out it is) and from the approximate location of the number in their GPS database.
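A rough sketch of that aggregation, assuming simple majority voting over normalized answers; `min_votes` and `min_share` are made-up thresholds, and the GPS cross-check is not modeled. A high share for the top answer corresponds to the "pointed" distribution, while a low share flags an image humans find hard to read.

```python
from collections import Counter
from typing import Optional


def consensus(answers: list[str], min_votes: int = 5,
              min_share: float = 0.7) -> tuple[Optional[str], float]:
    """Return (label, share) if the answers agree strongly enough, else (None, share)."""
    if not answers:
        return None, 0.0
    counts = Counter(a.strip().lower() for a in answers)
    label, top = counts.most_common(1)[0]
    share = top / len(answers)  # fraction of users who gave the most common answer
    if len(answers) >= min_votes and share >= min_share:
        return label, share
    # Too few votes or too flat a distribution: keep collecting, or mark as hard to read.
    return None, share
```

With eight users answering "1204" and two giving misreadings, this returns `("1204", 0.8)`; four votes split between "17" and other readings return no label yet.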
They could get a ton right and very, very few wrong with this system.
The real question is, how does their system identify numbers in the photo without actually knowing what the numbers are?
I'd guess that they can detect that something is a house number (look for oval shapes, or things that look like characters in the usual spot on a house, etc.) without being highly confident about which exact number is depicted.
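That guess amounts to separating detection ("is there a number here?") from recognition ("which digits?"). A hedged sketch, assuming two hypothetical model scores per image crop and invented thresholds: crops that are confidently detected but poorly read are the ones worth sending to humans.

```python
from dataclasses import dataclass


@dataclass
class Crop:
    image_id: str
    detect_score: float  # hypothetical confidence that the crop contains a house number
    read_score: float    # hypothetical confidence in the best guess at the digits
    best_guess: str


def route(crop: Crop, detect_min: float = 0.9, read_min: float = 0.95) -> str:
    if crop.detect_score < detect_min:
        return "discard"      # probably not a house number at all
    if crop.read_score >= read_min:
        return "auto-label"   # the machine is sure enough on its own
    return "ask-humans"       # a number was found, but it can't be read reliably
```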
Are you suggesting that detecting text from these mediocre-at-best photos is easy or that it's easy to determine which of the two sides is the control photo?
I'm suggesting that it's easier to detect text in these "mediocre at best" photos. Object classifiers often have to deal with low-quality images; in fact, they have to handle the objects they're trying to classify at many different resolutions. Captchas are hard to read even for humans because they're contorted and confusing, whereas the numbers in these addresses are ordinary standardized digits, just in grainy surroundings.
I would even go as far as to argue that if these become widely used, we're going to see algorithmic "solvers" for this captcha in a matter of weeks.
Maybe Google is just crowdsourcing an algorithm for this problem! Then they will move on to another of their image recognition problems, and get the spammers to build the algorithms that improve their data...
I don't really understand that response. What I'm saying is that if these addresses become one of the human-verification challenges that is actually checked, the captcha will be easier to circumvent, because those numbers look comparatively easy for a machine classifier to evaluate.
The house numbers are easier for a computer to classify than the messy, weird contorted letters.