Google Now Using ReCAPTCHA To Decode Street View Addresses

DavidChouinard · on March 30, 2012

Slightly related, but if you haven't yet seen Luis von Ahn's TED talk on reCAPTCHA, I recommend it highly: http://www.ted.com/talks/luis_von_ahn_massive_scale_online_c...

pbreit · on March 29, 2012

I'm really not a fan of captchas in general and Recaptcha in particular. Captchas are user-hostile enough, benefiting the service provider at the user's expense. Recaptcha is worse because it is even more of a drag on the user and now also benefits Google.

I run across a lot of captchas where I would be very surprised if the site owner actually had a problem being addressed by the captcha.

quanticle · on March 30, 2012

You'd be surprised. With automated posting tools, the cost of probing many different sites and posting massive amounts of spam (in an effort to raise search engine rankings) is very low indeed. I've run several small forums, and without fail, eventually they'd be overrun with spam and I'd have to either 1) install authentication or 2) install a captcha.

I agree that captchas are user-hostile. But, when weighed against the alternative - a forum or comments section so filled with spam and nonsense as to be unusable - captchas are by far the lesser of two evils.

schiffern · on March 30, 2012

> Recaptcha is worse because it... now also benefits Google.

I fail to see how this is a down-side. Maybe it's not an up-side, but is it actually a loss for anyone?

pittsburgh · on March 30, 2012

As a developer, I agree that captchas can be a drag on users, and I think we should put serious thought into whether or not their UX-cost is justified. Personally, I try to avoid using them on sites I build.

That said, as a user, it's hard to hate ReCaptcha after hearing Luis von Ahn talk about and defend his invention: http://www.youtube.com/watch?v=-Ht4qiDRZE8

It's a clever way to digitize books (and now street signs) while keeping out spambots, and when I combine that knowledge with my affinity for Luis von Ahn as a person I find myself less annoyed every time I have to prove I'm not a robot.

a3_nm · on March 30, 2012

> It's a clever way to digitize books (and now street signs)

It used to be a cool idea when it benefited everyone, because it was used to digitize public domain books that everyone could retrieve. Now, it's only a clever way for Google to use your brain to do stuff for their own purposes.

smackfu · on March 30, 2012

Many of those in the examples at the bottom look like trivial work for OCR.

nkassis · on March 30, 2012

Could be to see how reliable the results are for the experiment or the OCR I guess. How well they agree.

peteretep · on March 30, 2012

Which is weird - the whole strength of RECAPTCHA comes from the examples being ones that computers have specifically failed to solve...

notatoad · on March 30, 2012

I think everybody might be looking at this backwards. Google already has very precise GPS-correlated maps and Street View data, so it doesn't seem like they would need our help to determine street addresses.

Recaptcha pairs a known with an unknown to verify; maybe the addresses are the known factor.
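For anyone unfamiliar with the known/unknown pairing, here is a minimal Python sketch of the idea as described in this thread. It is not Google's implementation: the `Challenge` class, the `check` function, and the choice to record the unknown guess only when the control passes are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Challenge:
    control_answer: str  # the "known" item, whose answer is already trusted
    unknown_id: str      # hypothetical identifier for the "unknown" item


def check(challenge, control_guess, unknown_guess, votes):
    """Pass or fail depends only on the control item.

    The guess for the unknown item is never verified; it is only
    recorded as a vote (assumption: votes are kept only from users
    who got the control right).
    """
    passed = control_guess.strip().lower() == challenge.control_answer.lower()
    if passed:
        tally = votes.setdefault(challenge.unknown_id, Counter())
        tally[unknown_guess.strip().lower()] += 1
    return passed


votes = {}
challenge = Challenge(control_answer="1428", unknown_id="img-7")
print(check(challenge, "1428", "garbage", votes))  # control matches
print(check(challenge, "9999", "305", votes))      # control does not match
```

Under these assumptions, a user who knows which side is the control can type anything for the other side and still pass, which is the behaviour several comments here describe.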

hbar · on March 30, 2012

Usually if I input an address into Google maps, it places a marker somewhere near the actual address. If I use street view it seems to point me in a random direction and usually not toward the address I wanted. I think this is aimed at that.

sukuriant · on March 29, 2012

Wow. That's even easier to write a bot to circumvent...

herge · on March 30, 2012

If Google is using those pictures as captchas, it is probably because they could not write an algorithm to decipher them.

If, in the worst case, this results in spammers creating even better pattern recognition algorithms than the current cutting edge, it is certainly worth all the effort.

anonymoushn · on March 30, 2012

User responses to the pictures are not checked for correctness. Every time you see a reCAPTCHA, you can just enter a garbage value for the word that is less distorted and the system will accept it.

ceejayoz · on March 30, 2012

It's not always the less distorted one. As for data integrity, presumably Google shops each word out to multiple users so over time they get an idea of what the proper response is.
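A rough Python sketch of that "shop each word out to multiple users" idea, assuming a simple majority vote. The function name `consensus` and the thresholds `min_votes` and `min_share` are hypothetical, chosen only to illustrate the aggregation; nothing here is known about how Google actually does it.

```python
from collections import Counter


def consensus(responses, min_votes=5, min_share=0.8):
    """Return the agreed-upon answer, or None if agreement is too weak.

    responses: raw strings typed by different users for the same image.
    min_votes: assumed minimum number of non-empty responses required.
    min_share: assumed fraction of responses that must agree.
    """
    normalized = [r.strip().lower() for r in responses if r.strip()]
    if len(normalized) < min_votes:
        return None
    answer, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= min_share:
        return answer
    return None


print(consensus(["1428"] * 9 + ["1423"]))  # strong agreement
print(consensus(["17", "11", "17", "11", "71", "77"]))  # no clear winner
```

With thresholds like these, a handful of garbage entries gets outvoted, and images where people disagree can be flagged as hard to read instead of being accepted.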

wtvanhest · on March 30, 2012

Exactly. If they do it 100 times, they should get a very sharply peaked distribution, and they should know which images are hard for humans to read based on how skewed the distribution is and the approximate location of the number in their GPS database.

They could get a ton right and very, very few wrong with this system.

The real question is, how does their system identify numbers in the photo without actually knowing what the numbers are?

ceejayoz · on March 30, 2012

I'd guess that they can detect that it's a house number (look for oval shapes, or stuff that looks like characters in the usual spot on a house, etc.) but not be highly confident what the exact number depicted is.

ericlevine · on March 30, 2012

Are you suggesting that detecting text from these mediocre-at-best photos is easy or that it's easy to determine which of the two sides is the control photo?

sukuriant · on March 30, 2012

I'm suggesting that it's easier to detect text from these "mediocre at best" photos. Classifying objects in a visual space often means dealing with low quality images. In fact, the algorithms have to deal with the objects they're trying to classify at many different resolutions. While captchas are generally hard to identify even for humans, because they're contorted and confusing, the numbers in these addresses are all standardized numbers in grainy environments.

I would even go so far as to argue that if these become widely used, we're going to see algorithmic "solvers" for this captcha in a matter of weeks.

justincormack · on March 30, 2012

Maybe Google is just crowdsourcing an algorithm for this problem! Then they will move on to another of the image recognition problems they have, and they get the spammers to make the data-improving algorithms...

schiffern · on March 30, 2012

reCAPTCHA has always checked only one side. The new system isn't any less secure.

sukuriant · on March 30, 2012

I don't really understand that response. What I'm saying is that if these addresses become one of the actually-checked human-verifiers, it will be easier to circumvent, because those numbers look comparatively easier for a virtual classifier to evaluate.

The house numbers are easier for a computer to classify than the messy, weird contorted letters.

schiffern · on March 30, 2012

Then they'll do what they did for books – introduce distortion. reCAPTCHA has already solved this problem.

Permit · on March 30, 2012

I'm curious how they're able to identify what parts of images are house numbers. To me, that seems like it would be a harder problem than determining what numbers are present. But I have very little knowledge in this domain so I might be completely wrong.

majmun · on March 30, 2012

Let me guess: there is another captcha that tests users to find house numbers in a picture.

ceol · on March 30, 2012

From a privacy standpoint, what effect does this have on us? I kind of cringed at the thought of my house number being spread around the Internet for strangers to solve. Then I realized I don't really have a problem with people solving my address so much as I have a problem with Google having a database of photographs of everyone's house.

Are my concerns unfounded?

notatoad · on March 30, 2012

In what way does tying the appearance of your house to its street number have anything to do with your privacy?

ceol · on March 30, 2012

It's more so the photograph of my house than knowing my street number.

enjo · on March 30, 2012

What about that makes you uncomfortable exactly? I'm genuinely curious, as I've heard that sentiment expressed many times before.

It's not like your house is otherwise invisible...

ceol · on March 30, 2012

My house isn't necessarily invisible, but having photographs of my house (and every other house) sitting in some company's database makes me uncomfortable.

It's kind of like those cars that drive around recording everyone's license plate. Sure, that information is sitting out in the open anyway, but streamlining the collection of that information gives me the creeps.

__alexs · on March 30, 2012

Go remove it then? I did it for mine just to see if they would and it was gone within 48 hours.

notatoad · on March 30, 2012

I don't think most countries have this option.

__alexs · on March 30, 2012

America and Europe do at least. Use the "Report a Problem" link when looking at your house on Street View.

mquander · on March 30, 2012

If you even hinted at what your concerns actually are, then I guess people might be better at speculating about whether they are unfounded or not.