Yahoo CAPTCHA Cracked.

derefr · on Jan 30, 2008

A CAPTCHA method that just came into my head at this moment, and thus is completely unfounded and horrible: "Please compose a haiku on the topic of 'togetherness' [or something equally vague]. Your work will then be passed through a Bayesian filter specifically trained on that topic."

dawnerd · on Jan 29, 2008

I really like to use reCAPTCHA. Sure, it's not the most secure, but it's for a good cause, and it keeps out the basic bots.

For anyone who doesn't know what recaptcha is: http://recaptcha.net/

nickb · on Jan 29, 2008

I wrote about reCaptcha before and the big problem with it is that it can be hard. And by hard I mean that the majority of people can't solve it. If you have a big site and you have a spam issue, then by all means, use it. But if you're a small site that's trying to grow and a site that considers every potential user to be precious, stay away from it because the last thing you want is to frustrate people!

Since you will not be a target early on, consider not using captcha at all since spammers don't target sites that have no traffic.

dawnerd · on Jan 31, 2008

I have to agree with you on that. I only use the captcha when I absolutely need to. What I've found to work well is to have the mail go through a good spam filter before you get it.

thomasswift · on Jan 29, 2008

Do you have a link? I'd love to read your thoughts. I have been thinking about using reCAPTCHA, primarily because of the little refresh button and of course the captcha.

nickb · on Feb 1, 2008

it's somewhere on N.YC but I can't find it anymore :(

curi · on Jan 29, 2008

comment spammers came to my blog. which has no traffic.

kf · on Jan 30, 2008

I'm surprised the Russian hackers gave away their implementation for free. I suspect it was an unintentional leak.

Anyone else remember that Chinese website that was selling captcha hacks for 20 different types of captchas?

dejb · on Jan 30, 2008

Oh my god! The singularity has begun!

ajkirwin · on Jan 29, 2008

A long time ago I decided that if I were ever to have a captcha system, it wouldn't use ones of that style, which are getting so randomized these days that they're hard for a human to read.

Frankly, I think knowledge or logic based captchas are the way to go.

"Todd is three times as old as Jane is. When Jane was ten years younger, Todd was five times as old as Jane is. How old is Todd?" for example.

Sure, takes longer to think about, but if someone can write a script to start parsing logic puzzles like that, quickly, and use it to defeat website signup authentication methods?

I'll buy that person a pint.

cglee · on Jan 29, 2008

If myspace used that system, they'd have about 99 million less users.

marcus · on Jan 29, 2008

I'd take you on that bet. Write the system and I'll write the solver. I drink Guinness :)

A better system would be to ask the user to identify the gender of a person based on an image, or to tell whether the picture is of a dog or of a cat, and so on. These tasks are trivial for a person and very, very hard for a program.

apgwoz · on Jan 29, 2008

My introduction to AI class required us to write a classifier for people in the news (20 different images of 10 different people). We were given the location of the people's faces, but we were able to get 60% accuracy using SVMs and the 32x32 block of pixels (nose, eyes, mouth region). This was the "baseline" system. Some systems were getting nearly 85+%. I must admit though, that this was a restricted dataset, but the faces were not all looking straight ahead the way eigenfaces are, and I'm sure with enough data, and enough features that sort of CAPTCHA could be defeated a large percentage of the time.

marcus · on Jan 29, 2008

You are confusing two different tasks. Identifying a person out of a small comparison group is relatively easy: just deconstruct the face and compare certain facial features. There has been a ton of research on the subject and even some working commercial products (my school's AI lab uses one we built as a lock).

Identifying gender is significantly harder.

If you want something a lot harder, have them click on the picture of the more attractive person, using data from a hotornot-type site (just make sure the data isn't public). Good luck solving that with Support Vector Machines. If you want to generate more data, just build a reCAPTCHA-type system.

apgwoz · on Jan 29, 2008

I don't see why it'd be harder with good features, and after looking at the article again, 35% accuracy was considered a success. Obviously, I'm not as qualified as you in this sense, but it seems logical based on results I've seen (again, admittedly not the same quality as you've probably seen).

hhm · on Jan 29, 2008

I checked on the "obviously, I'm not as qualified as you in this sense" by looking at the user info, and I remembered that "Ideas to monetize new artifical intelligence" thread... so Marcus, sorry if it's offtopic, but how did you solve that problem? Are you doing captchas maybe? :)

marcus · on Jan 29, 2008

Doing CAD - Computer Assisted Diagnosis - working on improving early cancer detection in Mammography.

marcus · on Jan 30, 2008

In a way I understand that trying to apply the algorithm manually for each client is wasteful, and negotiating with each client is tiring, especially when it's with a 7B company.

I'm thinking about using the idea I got of building a web-service around it, and letting people find their own uses for it.

Considering applying to YC with the idea.

hhm · on Jan 30, 2008

Thanks a lot for your reply and details! I was very interested in it.

Lockheed · on Jan 30, 2008

For photograph-based captchas, the diversity will be too low, and they will be uneconomical to implement on a large scale.

How about animated gifs instead of static images as captchas?

utnick · on Jan 29, 2008

i could write a script to solve your system 50% of the time, which is probably good enough for spammers

marcus · on Jan 29, 2008

I didn't say you would only have to describe one picture. Make it 16 pictures and you've got a 1/64k chance to succeed.
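
As a rough check of the arithmetic in this exchange: assuming each picture is an independent two-way choice (for example cat/dog) and the bot guesses uniformly at random, the chance of passing n pictures is 1/2^n. The function name below is illustrative only, and the independence assumption is mine, not something stated in the thread.

```python
from fractions import Fraction

def random_guess_success(num_pictures, choices_per_picture=2):
    """Probability that uniform random guessing gets every picture right.

    Assumes each picture is an independent question with the same
    number of equally likely answers (an assumption for illustration).
    """
    return Fraction(1, choices_per_picture) ** num_pictures

print(random_guess_success(1))    # 1/2, the 50% figure for a single picture
print(random_guess_success(16))   # 1/65536, i.e. roughly "1/64k"
```

A classifier that does better than chance per picture would raise these odds, so the 1/64k figure is best read as the floor for a bot with no image understanding at all.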

bayareaguy · on Jan 29, 2008

I've always thought that for blog comments, the captcha should actually take the form of a few short SAT-style questions which would test reading comprehension. E.g.

"Which one of the following is not one of the points the author is trying to make"

"Which of the following best describes the author's opinion of Java?"

"What did the author cite as his source for his labor statistics?"

boredguy8 · on Jan 30, 2008

Not only do you cut down on spam, but respondents would have actually READ the article.

That's convergence if I've ever seen it!

tomjen · on Jan 29, 2008

You'd better have a really good porn site if you expect me to jump through those kinds of hoops to use it.

Oh and Todd is 60.

derefr · on Jan 30, 2008

"Todd was five times as old as Jane is."

I believe that that semantic flaw sadly classes Todd out of existence by 15 years (that is, t = -15).
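
The two answers in this sub-thread come from two readings of the puzzle, and both can be checked with a little algebra. Writing T for Todd's age and J for Jane's, the first sentence gives T = 3J. The second sentence is either T - 10 = 5(J - 10) (the presumably intended reading, "as old as Jane was") or T - 10 = 5J (the literal wording, "as old as Jane is"). The sketch below solves both; the function and parameter names are made up for illustration.

```python
from fractions import Fraction

def solve_todd(jane_offset):
    """Solve T = 3J together with T - 10 = 5 * (J - jane_offset).

    jane_offset = 10 compares against Jane's age ten years ago (intended puzzle);
    jane_offset = 0 compares against Jane's current age (literal wording).
    Returns (todd, jane).
    """
    # Substituting T = 3J:  3J - 10 = 5J - 5*jane_offset
    # so 2J = 5*jane_offset - 10
    jane = Fraction(5 * jane_offset - 10, 2)
    todd = 3 * jane
    return todd, jane

intended = solve_todd(10)  # Todd 60, Jane 20
literal = solve_todd(0)    # Todd -15, Jane -5
print(intended, literal)
```

So "Todd is 60" answers the puzzle as it was presumably meant, while t = -15 is what the sentence says as written.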

paulgb · on Jan 29, 2008

The problem is that problems that are easy to generate with a computer are usually easy to solve with one. Even if you rotated through 100 different variations, solving 10 of those would give the spammer a 10% success rate. If it is just for one small site, no one is going to break it, but there are easier ways to prevent spam on small sites (comment spam bots don't understand JavaScript for instance - something that I take advantage of on my blog).

A system like that might be good for keeping commenter quality high though :).

whacked_new · on Jan 29, 2008

Consider an image-based, interactive captcha using drag and drop.

Show a map with a bee on it, with instructions like: "Please guide this bee to the flower, passing the pond, the bear, and the scarecrow. Now bring the bear to the beehive." You would generate a hash string based on the sequence of visits. This may involve multiple back-and-forths, which is probably bad, but it seems fairly brainless and robust, for as little thought as I have given it.
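
One way to read "a hash string based on the sequence of visits" is a keyed hash over the ordered list of landmarks, compared against the value the server expects. This is only a sketch of that reading: the landmark names come from the comment, but the secret key, the separator, and the choice of HMAC-SHA256 are assumptions, not anything the commenter specified.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be random per challenge.
SECRET_KEY = b"per-challenge-secret"

def sequence_token(visited, key=SECRET_KEY):
    """Keyed hash of the ordered landmarks the bee was dragged past."""
    message = "|".join(visited).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()

expected = sequence_token(["pond", "bear", "scarecrow", "flower"])
same_path = sequence_token(["pond", "bear", "scarecrow", "flower"])
wrong_order = sequence_token(["bear", "pond", "scarecrow", "flower"])

print(hmac.compare_digest(expected, same_path))    # True
print(hmac.compare_digest(expected, wrong_order))  # False
```

The hash only checks the order of visits; how hard the challenge is for a bot would still depend on how difficult it is to locate the landmarks in the image.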

buro9 · on Jan 29, 2008

And highly inaccessible.