Hacker News
How to break a Captcha system in 15 minutes with Machine Learning (medium.com/ageitgey)
257 points by ageitgey on Dec 13, 2017 | 99 comments



>Since we have the source code to the WordPress plug-in, we can modify it to save out 10,000 CAPTCHA images along with the expected answer for each image.

This is the key to doing this in the easiest way possible. The training set is created almost automatically!
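The same idea can be sketched in a few lines of Python. This is not the article's code; `render_captcha` below is a hypothetical stand-in for the plug-in's real generator, and Pillow is assumed to be installed:

```python
import os
import random
import string

from PIL import Image, ImageDraw  # pip install pillow

# The article notes the plug-in never uses "O" or "I".
ALPHABET = "".join(c for c in string.ascii_uppercase if c not in "OI")

def render_captcha(text, size=(100, 40)):
    """Hypothetical stand-in for the WordPress plug-in's generator."""
    img = Image.new("L", size, color=255)
    draw = ImageDraw.Draw(img)
    draw.text((random.randint(2, 10), 10), text, fill=0)
    return img

def build_training_set(out_dir, n=10_000):
    """Save n labeled CAPTCHAs, encoding the answer in the filename."""
    os.makedirs(out_dir, exist_ok=True)
    for i in range(n):
        answer = "".join(random.choices(ALPHABET, k=4))
        render_captcha(answer).save(os.path.join(out_dir, f"{answer}_{i}.png"))

build_training_set("captcha_dataset", n=100)
```

Because the label rides along in the filename, no human ever has to annotate anything.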


Hah! Come to think of it, CAPTCHA is basically the definition of "security through obscurity," except worse: it's "security through the difference between obscurity to machines and clarity to humans," and that's a game that will keep getting harder to win.


This is supposedly a feature and not a bug of CAPTCHA, at least according to the apocryphal story John Lafferty told me. IIRC, this was at CMU so he must have been referring to the 2003 CAPTCHA claim by von Ahn et al. The idea was primarily to stop spammers, but also secondarily to make them more useful. The argument was that if spammers managed to "break" CAPTCHA, then whatever technology was used to break it would necessarily be a useful (compared to 2003 knowledge) advance in AI, so it was a win-win whether it stopped spammers or just made them do free out-of-band research.


Yes, back in the day, CAPTCHA was a good way to use human time trying to submit forms to both block spam-bots, and do useful work (e.g. label training data). This worked well when ML was bad at image recognition, but that's no longer the case. Unfortunately, as ML got better at the tasks, CAPTCHAs forced humans to do more work, and we're now at a point where either the machines are better, or it's not worth the humans' time. If your site has a CAPTCHA, there's a good chance I'll just close the tab and move on.


If one uses Google's captcha, then most likely that captcha already has enough data on you, so it doesn't ask anything.


Greetings, comrade! I see by your comment that you don't often use VPNs, or tor, and even limit your use of incognito/private tabs[1]. Thank you for being a good citizen! Carry on!

/s NNAlphabetPopulationSentimentBOT 0.1alpha-3zulu-beta-5-mark20

[1] We are getting better all the time at tracking activity in private/incognito tabs, but there are still some gaps particularly when users manually enable extensions like ublock in incognito mode. This hurts the user's experience on the web, and we suggest only enabling google extensions in incognito. It's incognito even without privacy-enhancing extensions. We promise. Nevermind that we know enough about you to let you bypass our captchas.


Brave is my primary browser, running under Firejail. I do not use private tabs; rather, I just delete all browser files from time to time. After that, Google's captcha does ask questions, but after a couple of rounds it stops.


What a ...relief?


I had to put Google's captcha on a site running a forum with about 300 daily visitors. Otherwise, the number of bots that passed email confirmation and tried to post comments with ads was 20-30 daily. It removed all the bots.


I can't find the link, but I remember someone doing some experiments a while back suggesting that the "I'm a human" checkbox just checks whether you have Google's tracking "PREF" cookie installed. I don't remember if there seemed to be device fingerprinting involved as well. In any case, if I use one of those and don't have to click on street signs, I know it's a sign I should tidy up my browser state.


For whatever reason, probably my use of AdBlock/uBlock/PrivacyBadger/Disconnect/etc, I get the captcha 100% of the time from google.


I use Brave as my browser. It blocks ads much better than extensions do. In addition, my primary search engine is DuckDuckGo. Still, most Google captchas don't ask me questions.


In this argument, how do we keep AI from destroying the internet?


If by "the internet" you mean "the ad-supported web," then we can't, and that may not be a bad thing.


I have a tangential question regarding this post.

I have noticed recently that I am asked to identify objects (signs, roads, etc.) over multiple iterations. It's not unusual to be asked to identify these on 3 separate iterations, despite identifying all cells correctly on both the first and second passes.

I don't believe it was always this way. What is the reason for this? Is there some heuristic in the captcha program that decides to ask for further identification or is this randomly generated?


Google is literally using you as unpaid labor to label/validate their image datasets.

My guess is that either (1) your first attempt at validation didn't match what their automated system identified so it didn't have high enough confidence to proceed or (2) the first image is one you are classifying from scratch but the second/third image is the one that you are only validating against their guess.

In either case, it's amazing to me that Google is able to get the whole world to do unpaid menial work for them by offering a free product to website owners.


The "identify the grid squares which contain a sign" type CAPTCHA isn't clear about whether you should be selecting all grid squares that contain mostly sign, or all grid squares that contain ANY of the sign. I imagine that particular one causes a fair discrepancy in what people select.

In fact, I want to know what other people choose to do for those. I'd guess that Hacker News types would disproportionately choose the latter option.


A million times this.

- Do the posts or supports of the signs count as part of the signs?

- What if it's a sign but not a street sign?

- If the camera is looking across a road, but the pavement isn't visible, does that image contain a road? I suspect humans are answering that differently, but it's not as bad as the sign one.

- Storefronts. Sometimes I can't tell what I'm looking at. There's a building with lettering on it, but I can't tell whether it's a store much less a storefront.

Google needs to add a tutorial for humans on how to answer those captchas, because there's not enough context for us to figure out what google really wants.

If they'd dogfood this and require their employees to solve a captcha to login to their workstations, this problem would have been solved yesterday.


I think they shift the selection grid from user to user, so they get a statistically precise picture of where the sign really is. Any data points that stand out can simply be ignored.


Yeah, I go with "any of the sign".


Interesting; my assumption has been that it must reside mostly or entirely in the square. Now I'm wondering: are the instructions intentionally vague?


> it's amazing to me that Google is able to get the whole world to do unpaid menial work for them by offering a free product to website owners.

I'm surprised that Google, which bought YouTube in 2006, doesn't let people add translation subtitles to videos (something people have historically been willing to do for free), compare multiple submissions against each other to gauge accuracy, and then map the result to the audio of the video to train machine-translation datasets. It always seemed like a killer way to leverage what people would do for free to advance the state of the art.


Ha, I believe Duolingo does the same thing: getting its users to translate huge bodies of text while learning new languages.


The creator of Duolingo was also the creator of ReCaptcha I think. He has a really interesting ted talk about it here: https://www.ted.com/talks/luis_von_ahn_massive_scale_online_...


>In either case, it's amazing to me that Google is able to get the whole world to do unpaid menial work for them by offering a free product to website owners.

It's an example of the https://en.wikipedia.org/wiki/Principal%E2%80%93agent_proble....


Historically, the most reliable way to break a CAPTCHA was to pay a human to solve it via a sketchy service, for e.g. $3 per 1,000 solves.

The round-trip time of those services can be quite long, often barely within the CAPTCHA timeout window. By forcing some users to solve three iterations of the puzzle (the exact instructions are "stop when there are no more images containing <cars>"), Google forces multiple round trips to a CAPTCHA-solving service. The solving service and its client also need to account for the expectation that the final image will contain no <cars>, increasing complexity.

This increases cost 3x, slows the solving rate 3x, etc. Ultimately, Google can never stop people from outsourcing CAPTCHA solving to a human, so its best option is to increase the cost and complexity of doing so.


This comes at the expense of normal users who have to solve three iterations of the puzzle.

One co-op term I wrote a simple VBA script to open up Google and search for some addresses. It was a small 2-3 day research project. Unfortunately for me, I subsequently had to solve captchas every day for the next month and a half, and although about half of them would be solved instantly, when I had to redo CAPTCHAs I could feel my sanity slipping away.


Google is stingy with search results, even though its index exploits the content we all create. They run millions of automated requests on my server - in fact Google is using more CPU than my regular visitors, but I can't run on their servers even a few thousand automated queries.

They don't allow even mild automated searching, the price per thousand searches is prohibitive, and you can't get past the 1,000th result, so you need to use combinations of words and other imperfect means to go deeper.

I would have liked to do deep web searches in order to build datasets for ML, but unfortunately there's no search engine for bots (or it's too expensive).

It's 2017, years after the Snowden leaks, and we still use Google and don't have a f*#ng search engine for creators; we only have a search engine for 'customers'. We need our own search engine for many things: privacy, control over filtering, mass search, deep search, creating novel projects, being sure Russians haven't tampered with it, and so on.

I'd say Google's refusal to accept bots is akin to a lack of net neutrality, and it places limits on what kinds of search-based open-source projects can be created. I know it sounds entitled, and maybe I should just build my own crawler and index, but you don't solve a more complex problem as a preamble to solving a simpler one, and we don't want thousands or millions of private search engines pulling the same content and duplicating indexes; we still need a community solution.

If they love net neutrality, they should support search neutrality as well, for the maker community. It's the same argument as for net neutrality: protect innovation from incumbents. Search is like the last mile, and Google is like a nasty throttling ISP.


>"They don't allow even mild automated searching, the price per thousand searches is prohibitive and you can't get past the 1000'th result so you need to use combinations of words and other imperfect means to go deeper."

Can you elaborate on what you mean by price per 1K searches and a 1K search result limit? Is this a B2B Google offering for search? Is this offering posted somewhere?


I think he means that the rate limiting when scraping Google gets way more aggressive past the first page.


I’ve often felt the same as you, and I think there is room to disrupt google in this area: a search engine with an API, and a search engine that covers the “deep web” (not Tor, but content that is not currently indexed, like a lot of content in apps).

FWIW Bing has an API.


ReCAPTCHA has a "Security Preferences" parameter where you can set the level of difficulty of the test. If you set it to "Most Secure" most or all users are presented with the multiple image identification scenario you describe. Easiest will simply present the "I am not a robot" checkbox modal which does some browser fingerprinting and other tests in the background. I believe there is also a heuristic that determines the difficulty of the test to present users if you leave the setting on the default medium difficulty. Could be mistaken, but this is my takeaway from implementing reCAPTCHA a few times.


Oh this is interesting. I am almost certain this is what I am seeing. The funny thing is that the times I have gotten the most secure setting is for something like buying an ebook or RSVP'ing for an event. Nothing in the least bit sensitive. So I'm guessing people think they are practicing good security when all they are doing is making for a terrible user experience.


This makes me think: why not use trained humans to come up with challenges that can't be solved easily or efficiently with training data? As an example, a team of 2,000 humans could come up with about 10,000 challenges an hour to be solved by 1,000,000 visitors an hour. The key is humans trained to generate questions and challenges that they themselves would not have to solve as part of any regular activity or event.

The labor can obviously be outsourced (or not), and it makes sense to charge sites for it as a service.

However, this makes me ask one more question: what if the popularity of ML creates menial jobs for humans, similar to my theoretical solution? Everyone looks at the benefits of ML, but all good technologies get abused, and it's hard to imagine even more ML countering abused ML.


Creating menial jobs isn't a problem. People being in situations where they need to settle for menial jobs is the problem.


What questions?

It would need to be 1) not easily searchable, 2) not math-based, 3) easy, and 4) internationalizable.

I would see "spot the grammar mistake" as one that fulfills 1-3, but I can't figure out one that does all 4.


Cool writeup! Although I agree those captchas are fairly trivial.

In college I wrote a term paper on breaking Microsoft's captcha (which is a little harder but not by much) twice: first with a simple template-based classification method and then a CNN approach.

https://www.dropbox.com/s/jfp5xbv3eh589f6/6_857_CAPTCHA.pdf?...

At the end, we go over approaches that would help captchas fight attacks. I think the quick flickering approach would work best (split the image into uneven parts, flicker them quickly so the human eye can read the aggregate image but any single slice doesn't show the full picture, and the superimposed image is incorrect)
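The slicing part of that idea can be illustrated with NumPy. This is just a sketch of the splitting step under my own assumptions (random per-pixel assignment of a grayscale image to frames), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def split_into_slices(img, n_slices=4, background=255):
    """Partition the pixels randomly across n_slices frames. Each frame
    shows only its own pixels (the rest is background), so no single
    frame contains the full picture; flickering them quickly lets the
    eye integrate the aggregate image."""
    assignment = rng.integers(0, n_slices, size=img.shape)
    frames = []
    for k in range(n_slices):
        frame = np.full_like(img, background)
        mask = assignment == k
        frame[mask] = img[mask]
        frames.append(frame)
    return frames

# Toy "CAPTCHA": black text pixels (0) on a white background (255).
img = np.full((20, 60), 255, dtype=np.uint8)
img[8:12, 10:50] = 0

frames = split_into_slices(img)
# The obvious counter-attack: a per-pixel minimum (or mean) over a few
# captured frames reconstructs the original image exactly.
recovered = np.minimum.reduce(frames)
```

This naive version is exactly what frame averaging defeats, which is presumably why the paper also superimposes misleading content on each slice.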


Cool idea and thanks for sharing!

One of the challenges here (which I'm sure you are very aware of) is that perception tricks that fool computers like flickering images also can block out users with different types of visual impairments. Sometimes users with even minor or infrequently-symptomatic visual impairments won't be able to read an image[1] that uses a special "trick" like this.

For example, consider the risk of triggering an epileptic seizure with flickering. At a certain point it becomes an accessibility/legal issue.

[1] The animated example from nytf3's paper - please note that it contains strong flickering: http://people.csail.mit.edu/recasens/images/captcha.gif


Would flickering even be necessary? Why not just overlay several transparent GIFs/PNGs? It’s still hackable (so is the flickering solution), but you could also add in a few more tricks to make it more work for the hackers. For example, combine the layers dynamically into a single image with a separate HTTP request to retrieve the (random) positions of each layer within that image. (Just a thought...you could make it as simple or as complex as you want.)


At that point, you could have your captcha-breaker wait for the page to finish rendering, screenshot the relevant portion of the page, and solve from there. Seems easier than trying to download and stitch together the transparent GIFs or decode the jumble of HTTP requests.


That seems more like security by obscurity - as soon as somebody realises you are doing that, they can visit your site with headless Chrome and break it easily.


But whatever process the human eye uses to piece together a flickering image should itself be fairly simple, right?


Couldn't you just capture a sample of frames and take the mean?


I really liked this article. Even with a very limited basic understanding about machine learning or image processing, I was able to understand what the author was talking about. Well done.


Good article! I don't know what you call those new Google captchas, where you have to press "I am not a robot" and then you're given a bunch of images and a question like: "Select all images that contain cars." Do you consider that a CAPTCHA? Or is it something else? Now that's one system I would like to see beaten. From the article:

    Yep, it generates 4-letter CAPTCHAs using a random mix of four different fonts.
    And we can see that it never uses “O” or “I” in the codes
    to avoid user confusion.
this seems to be a very simple CAPTCHA system to beat from an ML perspective, right?


Yep, this is a pretty simple CAPTCHA picked to illustrate a concept for teaching purposes. But more complex CAPTCHA images don't necessarily make anything more secure depending on how the whole system works.

In fact, one of the most effective attacks against Google's reCAPTCHA is simpler than anything in this article: you just request the audio version of the captcha, feed it to Google's own Speech Recognition API (or a competitor's), and give Google back its own result. [1]

[1] https://github.com/eastee/rebreakcaptcha


According to the linked blog, this worked for about a week before Google flagged the IP and started delivering an obfuscated audio challenge that can't be recognized by their own recognition API.


> Do you consider that a CAPTCHA?

Yes: "Completely Automated Public Turing test to tell Computers and Humans Apart"

But funnily enough, I believe that data is used to help machine learning identify the objects, so we are CAPTCHA'ing ourselves into harder and harder CAPTCHAs.

The "I am not a robot" types are interesting because they don't just test you, they monitor your behavior. "If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck."


I would like this NOT to be broken. reCAPTCHA is currently one of the few captcha systems that still work to some extent. It locks out most spam bots and keeps my sites clean. Of course, it's a good idea to try to break it, to see how "secure" it is.


Unfortunately the latest revision of reCAPTCHA has become such a pain that I've come very close to just giving up on signing up for whatever service it is I'm trying to use that has it implemented.

I have to go through several attempts to verify myself because Google didn't like that I missed one box with a car in it, and I have to start all over looking for street signs.


I often use a VPN and browser extensions that make the new one always ask me to verify. The new tests are just terrible (choose all photos with an apartment building??). It will sometimes take me a few minutes to complete the test when it asks me to check all boxes with a car and the slow fade animation that takes 5 seconds starts. Then it decides that I didn't choose well enough and starts over again. The old CAPTCHA was much better, and I didn't feel like I was just feeding Google's Street View.


I've found that disabling JavaScript actually makes the test easier. I often fail the js version but I rarely fail the no-js one.


Next version will ask you to come wash some cars at the Googleplex...


Yeah, it frustrates me a lot too. If I need to pick the squares with street signs, about half the time they want me to choose the squares that have the pole, and half the time not! Ditto for the square with the tiny corner of the sign.


I became so good (or so bad) at them that I once had to go through 8 rounds of reCAPTCHA to get through. I seriously thought it was never going to end!


One thing I find works really well is context-based question captchas. You ask human visitors some very simple question about your site, and unless the robot brute-forces it, it's very hard for robot spammers to get it right.


Yep, I saw this on rtl-sdr.com! They ask, for instance, what the S in SDR stands for, which obviously any legitimate commenter on such a niche site knows.
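For anyone curious, the whole scheme is only a few lines; a toy sketch in Python (the question set here is illustrative, not rtl-sdr.com's actual one):

```python
import random

# Site-specific questions that any genuine visitor can answer
# but that a generic spam bot won't know to look up.
QUESTIONS = {
    "What does the S in SDR stand for?": {"software"},
    "What does the R in SDR stand for?": {"radio"},
}

def new_challenge():
    """Pick a random question to show with the comment form."""
    return random.choice(list(QUESTIONS))

def check_answer(question, answer):
    """Accept any allowed answer, ignoring case and surrounding whitespace."""
    return answer.strip().lower() in QUESTIONS.get(question, set())

question = new_challenge()
```

The weakness, of course, is that a spammer targeting your site specifically can hard-code the answers, so this only filters out generic bots.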


I think it mostly works by inspecting your browser environment to see if it looks like a bot. I'm not really convinced that the pictures do much.


Many times I don't even get to the second step. I only click "I am not a robot" and I've passed.

Are captchas a paid service, and are my examples sites buying into only the lowest level of ID?

Or is this something akin to a decision made based on my access frequency/behaviour, where the algorithm decides it's reasonably sure I'm a human, so there's no need to waste bandwidth on sending and confirming images?


If you're not asked anything after checking the box then your browser's fingerprint has already been determined to be a real human by enough partner sites using the Captcha. Try opening a private browsing window and checking the box to see the second verification part.


They're run by google, so if you're a known "good account" it just detects that you have their cookie/are logged in and lets you through. Pop it open in incognito and you should get one.


I imagine that those are hard to beat with machine learning precisely because they stump Google's own algorithms. By answering them, you're helping train their neural networks (which is why I rather resent having to answer them).


> I don't know what you call those new Google captcha

Dunno what to call them, but I consider them educational interactions with AI [0]

[0] https://xkcd.com/1897/


I'm pretty sure you could break these trivial captchas with nothing more than a kNN classifier. There is no need to involve deep neural networks here.


The model used in the post (LeNet) is very small compared to what we typically think of in terms of modern deep neural networks. The LeNet architecture itself was published in the 1990s and was designed for character recognition so this is a natural application of it. kNN would work for many characters in this example, but you would run into a problem with overlapping text. kNN on raw pixel intensities requires a nice segmentation of the ROI. More image processing/segmentation/morphological operations could be applied to help in that case, but given a small network architecture that will naturally learn these filters a tiny CNN works well with little preprocessing.
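To make the comparison concrete, here is roughly what that kNN baseline looks like on raw pixel intensities, using scikit-learn's small bundled digits set as a stand-in for cleanly segmented CAPTCHA characters:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 8x8 grayscale digits flattened to 64 raw pixel intensities each.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)
print(f"kNN accuracy on clean, pre-segmented characters: {accuracy:.3f}")
```

On clean, centered glyphs this lands in the high 90s; the CNN earns its keep once the segmentation gets messy.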


You might be right. But who cares if using a deep neural network takes 15 minutes? There’s also no need to use a chainsaw to cut down a Christmas tree, but if you’re already holding a chainsaw...


Is dnn a chainsaw? I dunno. More like a Home Depot multipurpose power tool.


When someone solves a problem with a thing and you're not that impressed, the best response is "cool" or "learning new things is fun" or, like, nothing.

Telling someone they didn't need to do it with that tool presumes that a) you know why they did it, b) you are able to judge the suitability of that tool for that purpose and under those constraints, and c) there is no one else who has ever needed to use that exact tool to solve that exact problem under those same constraints.


While kNNs are simpler conceptually, DNNs are fairly easy to put into practice, thanks to Keras and other libraries. A kNN approach would likely involve more code, for things like data prep.


The advantage of DNNs is that they are easy to understand, very well supported, solve a wide variety of problems, and GPU costs are coming down very rapidly. Plus you can use them to play games!


Very likely; something like PCA + kNN does well on MNIST. I think where it will start to get tricky is rotation invariance.
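A quick sanity check of the PCA + kNN claim, run on scikit-learn's bundled digits set rather than full MNIST (much smaller, but the same flavor of task):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
# Project the 64 raw pixels down to 20 principal components before kNN.
model = make_pipeline(PCA(n_components=20),
                      KNeighborsClassifier(n_neighbors=3))
scores = cross_val_score(model, X, y, cv=5)
print(f"PCA + kNN cross-validated accuracy: {scores.mean():.3f}")
```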


The book is $645?!?!?


Hi, Adrian here, author of the book you are referencing. You are referring to the highest tier of the book. There are other lower tiers as well that are cheaper.

The highest tier (again, which you are referring to) includes 800+ pages, detailed experiment journals on how to reproduce the state-of-the-art publications (ResNet, SqueezeNet, VGG, etc.) on ImageNet (which is 1.2 million images). I demonstrate how to implement each model from scratch and then train them, detailing which parameters to change and when. The highest tier is for people looking to train really large networks on massive datasets where you could be spending thousands of dollars in the cloud for GPU costs (you can't train these networks without a GPU, or ideally multiple GPUs). I've also included the pre-trained models as well if people want to get started with them and skip training. This tier is really for researchers/practitioners who need to save time and finances by starting with experiment journals that detail how to replicate the results.

The lower tiers are for people just (1) getting started with deep learning in context of computer vision and/or (2) looking to apply best practices. Each book also includes video tutorials/lectures once I have finished putting them together. Realistically I should rebrand the book as a course as it's much more in line with something you would get from Udacity (only with more theory and more detailed code and implementations).

If anyone has any questions about the book do feel free to ask.


No offense, but your book's website looks a lot like a late-night TV ad, and frankly it leaves a bad taste.

The way I like to buy a book is to go to Amazon, look at the table of contents, read a few pages, and most importantly read some reviews. Your book currently doesn't even appear in Amazon search (or even Google search). Despite being quite active in the field, I had never heard of your book before (and I know of at least a dozen other books on the subject). I wonder if this is why you have relatively low volume and such a high price to make up for your revenue target. I would think putting your book on Amazon would increase your volume by an order of magnitude (or two) and help reduce the price to maybe 1/7th or 1/8th without requiring tiered pricing (which, again, is a huge turnoff), while actually increasing your net revenue more than before (probably by an order of magnitude). You might want to look into the theory and economics of price-volume curves.

The biggest problem with your book's website is that you as an author come across as a hard-selling, hard-charging marketer who wants to maximize profits and close a sale with anyone walking by, like an old-school car salesman, rather than as a calm, experienced expert for whom learning, teaching, academic honesty, and integrity are more important than making money. Again, I'm not saying this is who you are; it just feels that way from the style and content of the book's website. Hope this helps.


I feel this advice is well intentioned but fails to reflect the realities of building and marketing these types of niche products and courses.

Setting aside issues of style and what other sites are doing, it's very clear what the book is about, what's included, and what makes that valuable.

The prices certainly are steep compared to a "book", but it really is quite a bit more than that (more like a self-study course in ML), and it's really targeted at businesses that need to make this happen within their organizations.


I don't know about his books (courses?), but his site PyImageSearch is one of the better-known sites on computer vision. I certainly got a lot of help from it (thanks, Adrian).


Thanks, I'm glad you've found the site helpful! :-)


It's pretty standard landing page style these days for self-published books (plus ____), because it works.

If there was a way to tone it down for the referrals coming through Medium from HN (tough to track through 2 sites) or using some is-logged-in-to-HN hack (which would probably piss off people even more if detected) maybe it could be dialed back a hair to catch people like you, but it's probably not worth the effort.

The reply from the author seems to be perfect: https://news.ycombinator.com/item?id=15914307


I see. Rebranding as a course might help. If I were you I'd also consider A/B testing without the hard-sell/marketing hype style page. Personally I equate these sorts of 'You're probably wondering... "Is this book right for me?"' things as massive scammy red flags and a complete turn-off.


A/B testing is absolutely on my roadmap. I'm finishing up a few bonus chapters for the book/course now. Based on previous A/B tests on the site (for other products) I have a good idea of what works well for the general audience of my viewers. But I totally understand that the messaging will not work for everyone and it's something I hope I can address in 2018. Thanks for the comment.


You should absolutely rebrand as a course/training material, since it seems you're shipping much more than a printed book. As a researcher, experiment journals are really valuable, since in countless occasions I find the final, published articles / sample codes are 100% "draw the rest of the fucking owl" [1] material.

PS: Typo in the #release_bar: "has been offically released!"

---

[1] http://i0.kym-cdn.com/photos/images/original/000/572/078/d6d...


Thanks for pointing this out! Typo has been fixed. I also appreciate the suggestion.


Since Machine Learning via CNNs and MCTS became mainstream, is there any CAPTCHA that makes any sense today?


There are some tasks that haven't been solved well by machine learning algorithms yet. Like visual question answering. However I'm not sure how easy it would be to get them into the form of a CAPTCHA without making it easy for a bot to just guess and get it right a lot of the time. And even though they haven't been solved well, it's probably good enough to get past a CAPTCHA a lot of the time.

I expect these sort of systems will move to 'social proofs' - do you have a long-standing account etc.


Why not ask the kind of questions Cyc was designed for? Maybe it will spur research in that area.

https://en.m.wikipedia.org/wiki/Cyc


Great post, but could you not have saved a lot of time by generating CAPTCHA images with single characters, instead of separating them after the fact?


They could have done that but then the data wouldn't have been the same as it would be when they were trying to solve real CAPTCHAs. The OpenCV part where they found the characters in the CAPTCHA leads to some messiness in the training data, which will also be there in the 'real' CAPTCHA data when the system is tested. I'd say training the model on this messy data would lead to better results, especially for the case where the letters overlap.


I think the point is to use a realistic example


One wouldn't even need to generate those 10k images.

Simply integrate the code together and generate on the fly. Much faster and simpler!
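That on-the-fly approach fits naturally as a Python generator; a sketch, with `render_captcha` a hypothetical stand-in for the ported plug-in code:

```python
import random
import string

import numpy as np

ALPHABET = "".join(c for c in string.ascii_uppercase if c not in "OI")

def render_captcha(text):
    """Hypothetical stand-in for the plug-in's generator; returns a
    grayscale image as a NumPy array."""
    rng = np.random.default_rng(sum(map(ord, text)))
    return rng.integers(0, 256, size=(40, 100), dtype=np.uint8)

def captcha_stream(batch_size=32):
    """Endless stream of (images, labels) batches: nothing ever touches
    disk, so there's no need to save 10,000 files first."""
    while True:
        labels = ["".join(random.choices(ALPHABET, k=4))
                  for _ in range(batch_size)]
        images = np.stack([render_captcha(t) for t in labels])
        yield images, labels

images, labels = next(captcha_stream())
```

A generator like this can be handed straight to Keras (via `fit_generator`, as the API was called in 2017), trading disk space for a little CPU per batch.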


The problem is that CAPTCHAs have evolved. For some you have to click objects in an image; for others you have to click something to prove you are not a robot.

Unfortunately, I think those mentioned in the post will be a thing of the past.


And sadly it seems the bots have improved even more than the CAPTCHAs. Google's latest variation of "click on all boxes that contain X until none are shown" takes a significant amount of time to complete. It also seems that the faster you are at clicking the images, the longer the next image takes to load, often leading to me hitting submit only to have to start all over again because the next image still contained what Google was looking for.

Sometimes I think it might be easier to just run a ML algorithm to complete them for me...


To me it's very strange that anyone uses an alternative to Google's captcha, which has 2 obvious advantages:

1) it's not hackable

2) with every click you contribute to improving driverless cars' vision.


> with every click you contribute to the driverless cars vision improvement

Interesting I hadn't heard that. Supporting links:

--neural nets captcha analysis--

https://spectrum.ieee.org/tech-talk/robotics/artificial-inte...

--obligatory XKCD--

https://xkcd.com/1897/


I think it's kind of obvious: most of the pictures you face in the captcha are bridges, cars, and street signs, and these are free training data for Google's cars.


The author did not explain how he created X_train, Y_train, X_test, Y_test.

I am pretty sure he was fast enough to create them in 10 seconds or less.


How difficult is it to include the separation step as part of the ML pipeline to make it end to end?


It can be almost as easy, but it can be made as hard as you want. These kinds of CAPTCHA exercises are a fun way to test ideas, especially if you want to work on "attention" models (spatial transformers, ROI, deformable convolutions, soft attention, hard attention). You can also try to "read" one letter at a time using an RNN. You could even test the new capsule networks for their rotation invariance. All these different network architectures encode different strategies one can use to decode a picture. Obviously all will work at >99% on the simple cases, but as you increase the captcha difficulty you will see that the more modern architectures can trade computation for increased accuracy.


This is exactly what the "you only look once" (YOLO) model does. It will give you bounding boxes and a classification for every object it recognizes in the image.

https://pjreddie.com/darknet/yolo/


I'll be impressed when they can decipher my grandmother's Suetterlin handwriting.



