Breaking the Silk Road's Captcha (github.com/mieko)
238 points by DavidChouinard on Sept 26, 2014 | 19 comments



This is a great writeup and super easy to follow along with. The figures are really nice!

One observation: training a neural net to classify segmented characters is probably overkill. The author observed that the font never changed, but never ended up exploiting this fact. After the very effective preprocessing, thresholding, etc., the characters are almost identical to the 'average' representations the author generated!

I bet it would be enough simply to classify an unknown character as the letter with which it shows the highest correlation.
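A minimal sketch of that idea, assuming each segmented character has already been reduced to a fixed-size binary grid. The tiny 5x5 templates, the `parse` helper, and the function names are all made up for illustration; they are not taken from the author's repo, and real templates would be the per-letter averages from the write-up.

```python
from math import sqrt

def correlation(a, b):
    """Pearson correlation between two equal-sized glyphs (lists of pixel rows)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a blank or fully filled glyph carries no shape information
    return cov / sqrt(vx * vy)

def classify(glyph, templates):
    """Return the label of the template that correlates best with the glyph."""
    return max(templates, key=lambda label: correlation(glyph, templates[label]))

def parse(rows):
    """Hypothetical helper: turn '#'/'.' strings into a 0/1 pixel grid."""
    return [[1 if c == "#" else 0 for c in row] for row in rows]

# Made-up 5x5 templates standing in for the 'average' letter images.
TEMPLATES = {
    "I": parse(["#####", "..#..", "..#..", "..#..", "#####"]),
    "L": parse(["#....", "#....", "#....", "#....", "#####"]),
    "T": parse(["#####", "..#..", "..#..", "..#..", "..#.."]),
}

# A 'T' with one stray pixel left over from thresholding.
noisy_t = parse(["#####", "..#..", "..#..", ".##..", "..#.."])
print(classify(noisy_t, TEMPLATES))
```

Correlation is used here instead of a raw pixel count so that a glyph with slightly more or fewer lit pixels is not penalised for that alone.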


He added the corpus to the repo so you can try it: http://www.reddit.com/r/programming/comments/2hisfk/breaking...


Agreed!

Also, given the successful character extraction, and the knowledge that (1) the font doesn't change, and (2) the characters are just translated and rotated, I think performing those operations on the individual characters could have yielded near-perfect accuracy. Simply try a whole bunch of shifts and rotations on a given character until it matches a reference almost exactly.
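A rough sketch of that brute-force search, assuming binary glyphs on a small fixed canvas. The shift range, the candidate angles, the nearest-neighbour rotation, and the 7x7 reference shapes are all assumptions chosen for illustration, not values from the write-up.

```python
from math import cos, sin, radians

def transform(glyph, dx, dy, angle_deg):
    """Rotate about the canvas centre, then shift by (dx, dy); nearest-neighbour."""
    h, w = len(glyph), len(glyph[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    c, s = cos(radians(angle_deg)), sin(radians(angle_deg))
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map each output pixel back to the source pixel it came from.
            ux, uy = x - dx - cx, y - dy - cy
            sx = round(c * ux + s * uy + cx)
            sy = round(-s * ux + c * uy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = glyph[sy][sx]
    return out

def mismatch(a, b):
    """Number of pixels on which two equal-sized glyphs disagree."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def best_alignment(glyph, reference, shifts=range(-2, 3), angles=(0, -10, 10)):
    """Try every shift/angle combination; return (mismatch, dx, dy, angle)."""
    best = None
    for angle in angles:
        for dy in shifts:
            for dx in shifts:
                score = mismatch(transform(glyph, dx, dy, angle), reference)
                if best is None or score < best[0]:
                    best = (score, dx, dy, angle)
    return best

def classify(glyph, references):
    """Pick the reference letter that the glyph can be aligned to most closely."""
    return min(references, key=lambda k: best_alignment(glyph, references[k])[0])

def parse(rows):
    """Hypothetical helper: turn '#'/'.' strings into a 0/1 pixel grid."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]

# Made-up 7x7 reference glyphs.
REFERENCES = {
    "L": parse([".......", "..#....", "..#....", "..#....",
                "..###..", ".......", "......."]),
    "T": parse([".......", ".#####.", "...#...", "...#...",
                "...#...", ".......", "......."]),
}

# The 'L' reference moved one pixel right and one pixel up.
shifted_l = parse(["...#...", "...#...", "...#...", "...###.",
                   ".......", ".......", "......."])
print(classify(shifted_l, REFERENCES), best_alignment(shifted_l, REFERENCES["L"]))
```

The search is exhaustive, which is affordable only because the glyphs are tiny and the range of plausible shifts and rotations is narrow.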


This is all true. The first thing I built was an awesome hammer, and I had a large number of nails to deal with.

In retrospect, I'm sure there were a good many more focused, surgical solutions that would have saved some overhead.


Same thinking here: IIRC there are some pretty good open source OCR programs that would make short work of the individual characters, with no need to re-assemble and/or neural networkify. However, the boilerplate code he gives is generally applicable to less braindead captchas, which makes it a great resource for others.


FYI: Captchas are generally considered "broken" at between 1% and 10% rates of success with automated approaches, because attackers can run hundreds of thousands of requests, generally "for free" at the margin. There is no practical difference in the amount of abuse suffered by a site with a 90% captcha and a 9% captcha -- the first one just requires 10X as many HTTP requests to abuse.

This is one of the unfortunate "math favors the bad guy" consequences in a lot of anti-abuse filtering tasks. (Anti-spam research has similar problems, which is why the main innovation wasn't making filters better but radically increasing the cost of getting caught, via burning the reputation of the offending IP. IP addresses are a lot more expensive to acquire in quantity than packets.)
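The arithmetic behind the 1% to 10% threshold can be sketched as follows, assuming every attempt succeeds independently with the same probability, so the mean number of requests per solve is 1 divided by the success rate. The function name and the target of 1,000 solves are illustrative assumptions.

```python
def expected_requests(success_rate, solves_needed=1):
    """Mean number of HTTP requests needed to collect `solves_needed` solves,
    assuming each attempt independently succeeds with `success_rate`."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return solves_needed / success_rate

for rate in (0.01, 0.10, 0.90):
    requests = expected_requests(rate, solves_needed=1000)
    print(f"{rate:.0%} solver: about {requests:,.0f} requests for 1,000 solves")
```

A tenfold drop in solver accuracy only multiplies the request count by ten, which is cheap when requests are close to free at the margin.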


Author here. Here's the corresponding proggit thread: http://www.reddit.com/r/programming/comments/2hisfk/breaking...


That was a surprisingly simple and easy-to-follow write-up. I will have to try some captcha-breaking for myself soon.


Silk Road used ReCaptcha long ago and it ended badly: http://krebsonsecurity.com/2014/09/dread-pirate-sunk-by-leak...


No, it didn't. I believe that image is just an example of "a captcha" for the article.


Very cool and well-written post.

I've created many similar programs to defeat captchas. I would classify this as a medium severity bug: you would still need to brute force the passwords on a terribly slow and intermittent connection.


I feel like a much smarter programmer after reading that.


Could that last step be considered a kind of Levenshtein distance measurement?
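For reference, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below shows only the metric itself, using the standard dynamic-programming recurrence and made-up example strings; whether the write-up's last step amounts to this is the open question.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))
```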


And more + reddit captcha

https://github.com/dawjan/Open_Me/tree/master/Captcha%20Crac...

Also /.. is php tor


The lesson? Include a developer API with your site, so people don't have to undermine your security to use it.


I believe that the Silk Road was built on the CodeIgniter framework for PHP.


Why wouldn't you just pay a captcha-breaking service to get a near-100% success rate? It's less noticeable for botting, and $10 will buy you around 10k captchas on antigate or deathbycaptcha. You don't really need to log in and out that much, so that'd probably be plenty.


What a question to ask on HN of all places.


I'm to assume people on HN have no desire to do something productive or useful with their time?



