Breaking the Silk Road's Captcha

lstyls · on Sept 26, 2014

This is a great writeup and super easy to follow along with. The figures are really nice!

One observation: training a neural net to classify segmented characters is probably overkill. The author observed that the font never changed but never ended up exploiting this fact. After the very effective preprocessing, thresholding, etc., the characters are almost identical to the 'average' representations the author generated!

I bet it would be enough simply to classify an unknown character by the letter that it shows highest correlation with.
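A minimal sketch of that suggestion, assuming each segmented character and each 'average' template are same-sized grids of pixel intensities. The helper names (`correlation`, `classify`) and the toy 3x3 templates are illustrative, not taken from the article's code. It scores each template by Pearson correlation and picks the best one:

```python
from math import sqrt
from typing import Dict, List, Sequence

Glyph = Sequence[Sequence[float]]  # 2-D grid of pixel intensities (assumed same size)


def _flatten(glyph: Glyph) -> List[float]:
    return [float(p) for row in glyph for p in row]


def correlation(a: Glyph, b: Glyph) -> float:
    """Pearson correlation coefficient between two same-sized glyphs."""
    x, y = _flatten(a), _flatten(b)
    if len(x) != len(y):
        raise ValueError("glyphs must have the same dimensions")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    denom = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if denom == 0:
        return 0.0  # a uniform glyph has no shape to correlate against
    return sum(p * q for p, q in zip(dx, dy)) / denom


def classify(glyph: Glyph, templates: Dict[str, Glyph]) -> str:
    """Label `glyph` with the template letter it correlates with most strongly."""
    return max(templates, key=lambda letter: correlation(glyph, templates[letter]))


# Toy, hypothetical templates; real ones would be the averaged character images.
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]  # an "I" with one stray pixel
print(classify(noisy_i, templates))  # → I
```

With a fixed font, every template lookup costs one pass over the pixels, so there is nothing to train.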

throwaway15213 · on Sept 26, 2014

He added the corpus to the repo so you can try it: http://www.reddit.com/r/programming/comments/2hisfk/breaking...

malgorithms · on Sept 26, 2014

Agreed!

Also, given the successful character extraction, and the knowledge that (1) the font doesn't change, and (2) they're just translated and rotated, I think performing those operations on the individual characters could have yielded near-perfect accuracy. Simply try a whole bunch of shifts and rotations on a given character until it matches a reference almost exactly.
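A minimal brute-force sketch of that idea, assuming the extracted characters are binary grids the same size as clean reference glyphs. Nearest-neighbour rotation, the shift and angle ranges, and the helper names (`transform`, `mismatch`, `best_match`) are illustrative choices, not anything from the article:

```python
from math import cos, radians, sin
from typing import Dict, List, Optional, Sequence, Tuple

Grid = List[List[int]]  # binary image: 1 = ink, 0 = background


def transform(glyph: Sequence[Sequence[int]], dx: int, dy: int, angle_deg: float) -> Grid:
    """Rotate `glyph` about its centre by `angle_deg`, then shift it by (dx, dy).

    Uses inverse mapping with nearest-neighbour sampling; output pixels whose
    source falls outside the grid become background.
    """
    h, w = len(glyph), len(glyph[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    c, s = cos(radians(angle_deg)), sin(radians(angle_deg))
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the shift, then undo the rotation to find the source pixel.
            ux, uy = x - dx - cx, y - dy - cy
            sx = round(c * ux + s * uy + cx)
            sy = round(-s * ux + c * uy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = glyph[sy][sx]
    return out


def mismatch(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> int:
    """Number of pixels that differ between two same-sized binary grids."""
    return sum(p != q for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_match(
    glyph: Sequence[Sequence[int]],
    references: Dict[str, Sequence[Sequence[int]]],
    shifts: Sequence[int] = range(-2, 3),
    angles: Sequence[float] = range(-30, 31, 5),
) -> Optional[Tuple[int, str, int, int, float]]:
    """Try every (angle, dx, dy) against every reference; keep the lowest mismatch."""
    best = None
    for label, ref in references.items():
        for angle in angles:
            for dx in shifts:
                for dy in shifts:
                    score = mismatch(transform(glyph, dx, dy, angle), ref)
                    if best is None or score < best[0]:
                        best = (score, label, dx, dy, angle)
    return best
```

`best_match(glyph, references)` returns `(mismatch, label, dx, dy, angle)`; a mismatch of 0 means some combination reproduced a reference exactly. The search cost grows with the product of the shift and angle ranges, so this stays cheap only because the distortions are small and known in advance.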

mieko · on Sept 27, 2014

This is all true. The first thing I built was an awesome hammer, and I had a large number of nails to deal with.

In retrospect, I'm sure there were a good number of more focused, surgical solutions that would have saved some overhead.

contingencies · on Sept 27, 2014

Same thinking here: IIRC there are some pretty good open-source OCR programs that would make short work of the individual characters, with no need to re-assemble and/or neural-networkify. However, the boilerplate code he gives is generally applicable to less braindead captchas, which makes it a great resource for others.

patio11 · on Sept 27, 2014

FYI: Captchas are generally considered "broken" once automated approaches succeed between 1% and 10% of the time, because attackers can run hundreds of thousands of requests, generally "for free" at the margin. There is no practical difference in the amount of abuse suffered by a site whose captcha bots solve 90% of the time and one whose captcha they solve 9% of the time -- the latter just requires 10X as many HTTP requests to abuse.

This is one of the unfortunate "math favors the bad guy" consequences in a lot of anti-abuse filtering tasks. (Anti-spam research has similar problems, which is why the main innovation wasn't making filters better but radically increasing the cost of getting caught, via burning the reputation of the offending IP. IP addresses are a lot more expensive to acquire in quantity than packets.)
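The 10X figure above can be checked with a quick sketch. Reading the percentages as automated solve rates, and assuming each attempt succeeds independently with probability p, the expected number of attempts per success is 1/p, so the request count scales inversely with the solve rate. The function name and the 1,000-success target are illustrative:

```python
def requests_needed(successes: int, solve_rate: float) -> float:
    """Expected captcha attempts to reach `successes`, assuming independent tries."""
    if not 0 < solve_rate <= 1:
        raise ValueError("solve_rate must be in (0, 1]")
    return successes / solve_rate


strong_bot = requests_needed(1_000, 0.90)
weak_bot = requests_needed(1_000, 0.09)
print(f"{strong_bot:.0f} vs {weak_bot:.0f} requests ({weak_bot / strong_bot:.0f}x)")
# → 1111 vs 11111 requests (10x)
```

Since requests are nearly free for the attacker, an extra factor of ten in volume costs almost nothing.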

mieko · on Sept 26, 2014

Author here. Here's the corresponding proggit thread: http://www.reddit.com/r/programming/comments/2hisfk/breaking...

oftenwrong · on Sept 26, 2014

That was a surprisingly simple and easy-to-follow write-up. I will have to try some captcha-breaking for myself soon.

praeivis · on Sept 26, 2014

Silk Road used ReCaptcha long ago, and it ended badly: http://krebsonsecurity.com/2014/09/dread-pirate-sunk-by-leak...

bencoder · on Sept 26, 2014

No, it didn't. I believe that image is just an example of "a captcha" for the article.

goldmouth · on Sept 26, 2014

Very cool and well-written post.

I've created many similar programs to defeat captchas. I would classify this as a medium-severity bug; you would still need to brute-force the passwords over a terribly slow and intermittent connection.

magerleagues · on Sept 26, 2014

I feel like a much smarter programmer after reading that.

blueintegral · on Sept 26, 2014

Could that last step be considered a kind of Levenshtein distance measurement?
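For reference, a textbook Levenshtein distance sketch: the minimum number of single-character insertions, deletions, and substitutions that turn one string into another. Whether the article's final step is equivalent depends on how it compares candidates, which the thread doesn't spell out; the function name is illustrative:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between `a` and `b` using a rolling-row dynamic program."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # → 3
```

Keeping only two rows holds memory to O(len(b)) while the running time stays O(len(a) * len(b)).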

usrname · on Sept 26, 2014

And more, plus the reddit captcha:

https://github.com/dawjan/Open_Me/tree/master/Captcha%20Crac...

Also /.. is php tor

krispyfi · on Sept 28, 2014

The lesson? Include a developer API with your site, so people don't have to undermine your security to use it.

_RPM · on Sept 27, 2014

I believe that the Silk Road was built on the CodeIgniter framework for PHP.

ultramancool · on Sept 26, 2014

Why wouldn't you just pay a captcha-breaking service to get a near-100% success rate? It's less noticeable for botting, and $10 will buy you around 10k captchas on antigate or deathbycaptcha. You don't really need to log in and out that much, so that'd probably be plenty.

gwern · on Sept 26, 2014

What a question to ask on HN of all places.

ultramancool · on Sept 29, 2014

Am I to assume people on HN have no desire to do something productive or useful with their time?