A long time ago I decided, that if I were to ever have a captcha system, it woul...

cglee · on Jan 29, 2008

If MySpace used that system, they'd have about 99 million fewer users.

marcus · on Jan 29, 2008

I'd take you on that bet. Write the system and I'll write the solver. I drink Guinness :)

A better system would be to ask the user to identify the gender of a person from an image, or to tell whether a picture is of a dog or a cat, and so on. These tasks are trivial for a person and very, very hard for a program.

apgwoz · on Jan 29, 2008

My introduction to AI class required us to write a classifier for people in the news (20 different images of 10 different people). We were given the location of the people's faces, but we were able to get 60% accuracy using SVMs and the 32x32 block of pixels (nose, eyes, mouth region). This was the "baseline" system. Some systems were getting around 85% or more. I must admit that this was a restricted dataset, but the faces were not all looking straight ahead the way eigenfaces are, and I'm sure that with enough data and enough features, that sort of CAPTCHA could be defeated a large percentage of the time.

marcus · on Jan 29, 2008

You are confusing two different tasks. Identifying a person out of a small comparison group is relatively easy: just deconstruct the face and compare certain facial features. There has been a ton of research on the subject, and there are even some working commercial products (my school's AI lab uses one we built as a lock).

Identifying gender is significantly harder.

You want something a lot harder? Have them click on the picture of the more attractive person, using data from a hotornot-type site (just make sure the data isn't public). Good luck solving that with Support Vector Machines. If you want to generate more data, just build a reCAPTCHA-type system.

apgwoz · on Jan 29, 2008

I don't see why it'd be harder with good features, and after looking at the article again, 35% accuracy was considered a success. Obviously, I'm not as qualified as you in this sense, but it seems logical based on results I've seen (again, admittedly not the same quality as you've probably seen).

hhm · on Jan 29, 2008

I checked on the "obviously, I'm not as qualified as you in this sense" by looking at the user info, and I remembered that "Ideas to monetize new artifical intelligence" thread... so Marcus, sorry if it's offtopic, but how did you solve that problem? Are you doing captchas maybe? :)

marcus · on Jan 29, 2008

Doing CAD - Computer Assisted Diagnosis - working on improving early cancer detection in Mammography.

marcus · on Jan 30, 2008

In a way I understand that trying to apply the algorithm manually for each client is wasteful; negotiating with each client is tiring, especially when it's with a 7B company.

I'm thinking about using the idea I got of building a web-service around it, and letting people find their own uses for it.

Considering applying to YC with the idea.

hhm · on Jan 30, 2008

Thanks a lot for your reply and the details! I was very interested in it.

Lockheed · on Jan 30, 2008

For photograph-based captchas, the diversity will be too low, and they will be uneconomical to implement on a large scale.

How about animated gifs instead of static images as captchas?

utnick · on Jan 29, 2008

I could write a script to solve your system 50% of the time, which is probably good enough for spammers.

marcus · on Jan 29, 2008

I didn't say you would only have to describe one picture. Make it 16 pictures and you've got a 1/64k chance of succeeding.
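The arithmetic behind the 1/64k figure can be sketched quickly. This assumes each picture is an independent two-way question and that the script guesses uniformly at random; neither assumption is spelled out in the thread, and the function name is made up for illustration.

```python
from fractions import Fraction


def random_guess_success(num_pictures: int, choices_per_picture: int = 2) -> Fraction:
    """Chance that uniform random guessing gets every picture right.

    Assumes the pictures are independent and each has the same number
    of equally likely answers (hypothetical model, not from the thread).
    """
    if num_pictures < 0 or choices_per_picture < 1:
        raise ValueError("need num_pictures >= 0 and choices_per_picture >= 1")
    return Fraction(1, choices_per_picture) ** num_pictures


if __name__ == "__main__":
    print(random_guess_success(1))   # one binary picture: 1/2
    print(random_guess_success(16))  # sixteen binary pictures: 1/65536
```

With one picture the guesser passes half the time; with sixteen the odds drop to 1 in 65,536, which is the "64k" above.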

bayareaguy · on Jan 29, 2008

I've always thought that for blog comments, the captcha should actually take the form of a few short SAT-style questions which would test reading comprehension. E.g.

"Which one of the following is not one of the points the author is trying to make?"

"Which of the following best describes the author's opinion of Java?"

"What did the author cite as his source for his labor statistics?"

boredguy8 · on Jan 30, 2008

Not only do you cut down on spam, but respondents would have actually READ the article.

That's convergence if I've ever seen it!

tomjen · on Jan 29, 2008

You'd better have a really good porn site if you expect me to jump through those kinds of hoops to use it.

Oh and Todd is 60.

derefr · on Jan 30, 2008

"Todd was five times as old as Jane is."

I believe that that semantic flaw sadly classes Todd out of existence by 15 years (that is, t = -15).

paulgb · on Jan 29, 2008

The problem is that problems that are easy to generate with a computer are usually easy to solve with one. Even if you rotated through 100 different variations, solving 10 of those would give the spammer a 10% success rate. If it is just for one small site, no one is going to break it, but there are easier ways to prevent spam on small sites (comment spam bots don't understand JavaScript for instance - something that I take advantage of on my blog).
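The 10% figure above follows from a simple ratio. A minimal sketch, assuming the site picks one of its variations uniformly at random per request and the bot always passes the variations it has a solver for and always fails the rest (the function name and this model are illustrative assumptions):

```python
from fractions import Fraction


def bot_success_rate(total_variations: int, solved_variations: int) -> Fraction:
    """Expected pass rate for a bot that can solve only some variations.

    Assumes challenges are drawn uniformly from total_variations and
    that a solved variation is passed every time (hypothetical model).
    """
    if total_variations <= 0:
        raise ValueError("total_variations must be positive")
    if not 0 <= solved_variations <= total_variations:
        raise ValueError("solved_variations must be between 0 and total_variations")
    return Fraction(solved_variations, total_variations)


if __name__ == "__main__":
    rate = bot_success_rate(100, 10)
    print(rate, float(rate))  # 1/10 0.1
```

Because spam bots can retry cheaply, even a partial solver gets through at that rate per attempt.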

A system like that might be good for keeping commenter quality high though :).

whacked_new · on Jan 29, 2008

Consider an image-based, interactive captcha using drag and drop.

Show a map with a bee on it: "Please guide this bee to the flower, passing the pond, the bear, and the scarecrow. Now bring the bear to the beehive." You would generate a hash string based on the sequence of visits. This may involve multiple back-and-forths, which is probably bad, but it seems fairly brainless and robust, for as little thought as I have given it.
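One way the "hash string based on the sequence of visits" could look, as a rough sketch only. The comment does not say which hash to use; a keyed hash (HMAC-SHA256) with a server-side secret is an assumption made here so a client cannot precompute tokens, and the landmark names, the secret, and both function names are hypothetical.

```python
import hashlib
import hmac
from typing import Sequence

# Separator that should not appear in landmark names, so that
# ["ab", "c"] and ["a", "bc"] do not hash to the same token.
_SEPARATOR = "\x1f"


def sequence_token(visits: Sequence[str], secret: bytes) -> str:
    """Turn an ordered list of visited landmarks into a hex token."""
    message = _SEPARATOR.join(visits).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_correct_path(visits: Sequence[str], secret: bytes, expected: str) -> bool:
    """Compare the submitted path's token with the expected one."""
    return hmac.compare_digest(sequence_token(visits, secret), expected)


if __name__ == "__main__":
    secret = b"per-challenge secret"  # hypothetical; would be generated server-side
    expected = sequence_token(["pond", "bear", "scarecrow", "flower"], secret)
    print(is_correct_path(["pond", "bear", "scarecrow", "flower"], secret, expected))
    print(is_correct_path(["bear", "pond", "scarecrow", "flower"], secret, expected))
```

The token depends on the order of visits, so visiting the same landmarks in a different order fails the check.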

buro9 · on Jan 29, 2008

And highly inaccessible.