I'm really not a fan of captchas in general and Recaptcha in particular. Captchas are user hostile enough, benefiting the service provider at the user's expense. Recaptcha is worse because it is even more of a drag on the user and now also benefits Google.
I run across a lot of captchas where I would be very surprised if the site owner actually had a problem being addressed by the captcha.
You'd be surprised. With automated posting tools, the cost of probing many different sites and posting massive amounts of spam (in an effort to raise search engine rankings) is very low indeed. I've run several small forums, and without fail, eventually they'd be overrun with spam and I'd have to either 1) install authentication or 2) install a captcha.
I agree that captchas are user-hostile. But, when weighed against the alternative - a forum or comments section so filled with spam and nonsense as to be unusable - captchas are by far the lesser of two evils.
As a developer, I agree that captchas can be a drag on users, and I think we should put serious thought into whether or not their UX-cost is justified. Personally, I try to avoid using them on sites I build.
It's a clever way to digitize books (and now street signs) while keeping out spambots, and when I combine that knowledge with my affinity for Luis von Ahn as a person I find myself less annoyed every time I have to prove I'm not a robot.
> It's a clever way to digitize books (and now street signs)
It used to be a cool idea when it benefited everyone, because it was used to digitize public domain books that everyone could retrieve. Now, it's only a clever way for Google to use your brain to do stuff for their own purposes.
I think everybody might be looking at this backwards. Google already has very precise GPS-correlated maps and Street View data, so it doesn't seem like they would need our help to determine street addresses.
Recaptcha pairs a known item with an unknown one and verifies you against the known one. Maybe the addresses are the known factor.
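To make the known/unknown pairing concrete, here is a minimal sketch of how such a check could work. This is not Google's actual implementation: `Challenge`, `grade`, and the `votes` store are hypothetical names invented for illustration, and the normalization rule is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    control_answer: str  # the "known" item: its answer is already trusted
    unknown_id: str      # the "unknown" item: nobody knows its answer yet


def normalize(text: str) -> str:
    # Assumption: whitespace and letter case are ignored when comparing.
    return "".join(text.split()).lower()


def grade(challenge, control_response, unknown_response, votes):
    """Pass or fail the user on the control item only.

    The response to the unknown item is never checked for correctness.
    It is only recorded as a vote, and only when the control was answered
    correctly, so obvious bots contribute nothing.
    """
    passed = normalize(control_response) == normalize(challenge.control_answer)
    if passed:
        votes.setdefault(challenge.unknown_id, []).append(
            normalize(unknown_response)
        )
    return passed
```

Under this scheme a garbage answer for the unknown item is still accepted, which would match what people observe, but it only counts as one vote among many.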
Usually if I input an address into Google Maps, it places a marker somewhere near the actual address. If I use Street View, it seems to point me in a random direction, usually not toward the address I wanted. I think this is aimed at that.
If Google is using those pictures as captchas, it is probably because they could not write an algorithm to decipher them.
If, in the worst case, this results in spammers creating even better pattern recognition algorithms than the current cutting edge, it is certainly worth all the effort.
User responses to the pictures are not checked for correctness. Every time you see a reCAPTCHA, you can just enter a garbage value for the word that is less distorted and the system will accept it.
It's not always the less distorted one. As for data integrity, presumably Google shops each word out to multiple users so over time they get an idea of what the proper response is.
Exactly. If they show each image 100 times, they should get a sharply peaked distribution of answers, and they should be able to tell which images are hard for humans to read from how skewed that distribution is and from the approximate location of the number in their GPS database.
They could get a ton right and very, very few wrong with this system.
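The "show it to many users and look at the distribution" idea can be sketched as a simple majority vote. This is only an illustration: the `consensus` function and its `min_votes` and `min_share` thresholds are made-up assumptions, not anything Google has documented.

```python
from collections import Counter


def consensus(responses, min_votes=5, min_share=0.8):
    """Return (answer, share) if one answer clearly dominates, else None.

    A sharply peaked distribution (one answer holding most of the votes)
    is treated as settled. A flat or split distribution suggests the
    image is hard to read and needs more votes or a closer look.
    """
    cleaned = [r.strip() for r in responses if r.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough data to trust any answer yet
    answer, count = Counter(cleaned).most_common(1)[0]
    share = count / len(cleaned)
    return (answer, share) if share >= min_share else None
```

With ten votes where nine say "306" and one says "308", this accepts "306". With votes split between several readings, it returns None, so a handful of garbage entries should not poison the result.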
The real question is, how does their system identify numbers in the photo without actually knowing what the numbers are?
I'd guess that they can detect that it's a house number (look for oval shapes, or stuff that looks like characters in the usual spot on a house, etc.) but not be highly confident what the exact number depicted is.
Are you suggesting that detecting text from these mediocre-at-best photos is easy or that it's easy to determine which of the two sides is the control photo?
I'm suggesting that it's easier to detect text from these "mediocre at best" photos. Classifying objects in a visual space often deals with low quality images. In fact, the algorithms have to deal with the objects they're trying to classify at many different resolutions. While captchas are generally hard to identify even for humans, because they're contorted and confusing, the numbers in these addresses are all standardized numbers in grainy environments.
I would even go so far as to argue that if these become widely used, we're going to see algorithmic "solvers" for this captcha in a matter of weeks.
Maybe Google is just crowdsourcing an algorithm for this problem! Then they will move on to another of the image recognition problems they have, and get the spammers to build the data-improving algorithms for them...
I don't really understand that response. What I'm saying is that if these addresses become one of the actually-checked human-verifiers, it will be easier to circumvent, because those numbers look comparatively easy for a virtual classifier to evaluate.
The house numbers are easier for a computer to classify than the messy, weird contorted letters.
I'm curious how they're able to identify what parts of images are house numbers. To me, that seems like it would be a harder problem than determining what numbers are present. But I have very little knowledge in this domain so I might be completely wrong.
From a privacy standpoint, what effect does this have on us? I kind of cringed at the thought of my house number being spread around the Internet for strangers to solve. Then I realized I don't really have a problem with people solving my address so much as I have a problem with Google having a database of photographs of everyone's house.
My house isn't necessarily invisible, but having photographs of my house (and every other house) sitting in some company's database makes me uncomfortable.
It's kind of like those cars that drive around recording everyone's license plate. Sure, that information is sitting out in the open anyway, but streamlining the collection of that information gives me the creeps.