This is irrelevant. Speech-to-text costs $0.006 per invocation (for < 15 seconds) [1], or you can solve 166 captchas for $1. There are already services out there which will solve captchas for $0.50/1000 [2], an order of magnitude cheaper. The fact that Google has a service which will do this inefficiently changes nothing about the threat/cost ecosystem. CAPTCHAs aren't about being a perfect defense, they're about increasing cost to operate at scale.
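The arithmetic behind those numbers, using the prices cited:

```python
# Back-of-envelope: cost per captcha via the speech-to-text API [1]
# versus a commodity human solving service [2].
stt_cost_per_call = 0.006   # USD per clip under 15 seconds
farm_cost_per_1000 = 0.50   # USD per 1000 solved captchas

captchas_per_dollar_stt = int(1 / stt_cost_per_call)       # 166
captchas_per_dollar_farm = int(1000 / farm_cost_per_1000)  # 2000
```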
There are free speech recognition services, such as Wit.ai, and a number of others that have generous free tiers. Also, it's likely that spammers will not bother finding a free public API, and will just use the endpoints of Google Voice Search, or any other similar service from other companies.
The unCaptcha paper and the team's research is very much relevant, because it informs the public about the effectiveness of these security systems, and it helps website admins consider these threats and possibly adapt to them.
I would love to pay money (or do some sort of proof-of-work hashing) rather than solve Google's infuriating, privacy-hostile CAPTCHAs. I rather suspect that Google, as an advertising and consumer-surveillance firm, gets rather a lot of information out of the system.
I definitely preferred it when reCaptcha was helping archive.org digitize books for the good of humanity. It felt much more altruistic than helping Google train its neural networks.
That's a funny, very accurate point. I would absolutely pay 10c to have a service automatically bypass Google's CAPTCHA for me every time I encounter one.
From your link, it actually says $2.99 per 1000 ReCaptchas so $0.30 for 100. Still cheaper than the automated method of the author.
Also, the automated method is probably more reliable than humans and much faster. And the cost of the speech-to-text API could be lowered by using cheaper services or an in-house model.
(Looking at other services, they all seem to agree on the $0.20-$0.40 range, mostly dictated by the hourly wage of their workers)
Funny story: I can no longer moderate Disqus on my site because it pops up reCaptcha, and I don't even attempt reCaptcha anymore. And I can't ask for support because it pops up reCaptcha. And I can't export my data and delete my account because -- you guessed it -- it pops up reCaptcha.
One of these days I'll gird my loins and go into battle to convince a bot that I'm not a bot. One last time.
Sometimes it won't accept correct answers either, and you're just wasting minutes training their classifier. I found the audio challenge much faster to get past, so I switched to it a couple of months ago, then finally decided to automate it with a browser extension.
If you manage to force google to serve you the noscript version, it tends to accept correct answers the first time with no re-challenges. The javascript version of recaptcha won't let me through no matter how many times I give it correct answers.
However websites must specifically allow the noscript version to be used; by default it's disabled for all websites.
What do I have to do to force the noscript version to activate? At this point it's getting impossible for me to pass captchas, and the audio challenge is no help since I just get half a word or some other unintelligible utterance.
Sometimes disabling javascript using e.g. umatrix can do it, but only in cases where that website operator has elected to use the most permissive setting (this seems to be rare.)
4channel.org is one site that allows noscript captcha, you can try it out there. But I've rarely seen it possible on other sites.
They also punish you for blocking cookies, using first party isolation, fingerprinting resistance, etc. ReCaptcha v2 operates much like ReCaptcha v3 (which does away with the question/answer interrogation entirely) without telling users it works that way. v2 will frequently reject correct answers just to punish users who aren't sucking up to the Google surveillance system.
Is this related to the Google account? Because since switching to firefox and everything not-google I have been constantly harrassed by recaptcha everywhere.
I think it's mostly the google account (and not blocking access to its cookies), but (as an FF user) I feel that just being on FF instead of Chrome is also a strike against you.
I saw this a week ago on reddit. The researchers told Google about this vulnerability and Google doesn't care about it, they are totally OK with it. You can see here that the captcha doesn't block robots, but blocks people and makes browsing inconvenient. reCaptcha is a way Google mines data from us for free.
> Google doesn't care about it, they are totally OK with it
Google hasn't said they don't care about it; where did you see that?
They merely allowed the code to be released despite it still working against the current version. Previous experience (namely the original unCaptcha) shows that they intend to find a way to fix it.
> You can see here that captcha doesn't block robots, but blocks people and makes browsing inconvenient.
Total BS. Remember, it's not Google that uses it, it's website owners (us). If what you claim were true, we wouldn't be using it; we'd use something else that did what we wanted.
> reCaptcha is a way google mines data from us for free.
Of course, through the visual selection it displays when "unsure". Although I don't know the details, it seems pretty obvious that even once it's sure you're human, it sometimes asks you to identify things in pictures anyway so as to provide training data (for Maps, Waymo, image search, whatever...).
I 100% agree, but it's not like captchas are complicated things, or that we didn't all more or less switch to them from something else.
I can say for my current needs right now that if I created a page tomorrow where non-subscribers could post content, I would use a captcha because it removes enough bots to be worth it, and I would go with reCaptcha because I find it the better one for end users (as a user, I prefer seeing it on websites compared to other solutions).
100% nonsense. When I use Firefox with third-party cookies blocked, it does not accept my correct answers. I waste 5 minutes of my life begging Google to let me in.
Allow third-party cookies, log in to Google, and I only have to check a box.
Same when I use a VPN: it does not accept any of my correct answers.
So when Google sees that I am trying to protect my privacy, it punishes me by making me work for them.
One more thing: if I try to use the audio challenge in the first case, it tells me outright that I am using some automated method to solve the captcha and won't allow it. So much fun.
I have literally never been asked to solve one of those image recognition recaptchas in my main browser profile. (While it happens once a month in incognito windows.)
So it's not at all obvious that known humans are being asked to solve captchas just for the purposes of training.
It barely asks you if you use Chrome and/or are logged into a Google account. ReCaptcha is how you make Firefox and IE/Edge users without a google account hate you.
Because believe me, if I get asked to click another 50 cars without good reason (3 failed logins would be a good reason), I'll blame your site for being dumb, not Google.
I have only been served image captchas for as long as I can remember; I honestly thought the warped-text captchas had been phased out entirely.
I think the warped text ones have been phased out, but AFAIK it's in favor of the ones people mentioned above and some black magic for detecting humans without needing to click things.
(I work for Google, on nothing related to browsers or recaptcha, this is purely my impression from encountering it logged in and out.)
For the record, the ones I meant are the second round of solving: sometimes reCaptcha asks you one challenge (which I believe is genuine), and then after you succeed it asks again with another set of pictures (sometimes another question), which I believe is for training.
I get it semi-regularly (like once or twice a week), but I also have some automated tooling using my account AND I travel quite often, so location checks probably flag me as weird.
Remember that the original reCaptcha also did that with text to help train OCR: it would send a known word and an unknown word; if you succeeded at the known word, it would record your answer for the unknown one, and after enough people gave the same answer, it would adopt it as the proper OCR'ed text.
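That scheme can be sketched like this (a sketch only; the data structures and the agreement threshold are made up for illustration):

```python
from collections import Counter

AGREEMENT_THRESHOLD = 3   # hypothetical number of matching answers to accept

known_answers = {}        # word_id -> accepted transcription (ground truth)
pending_votes = {}        # word_id -> Counter of candidate transcriptions

def submit(known_id, known_answer, unknown_id, unknown_answer):
    """Record one user's pair of answers. The unknown word's answer only
    counts if the user got the control (known) word right."""
    if known_answers.get(known_id) != known_answer.strip().lower():
        return False                        # failed the control word
    votes = pending_votes.setdefault(unknown_id, Counter())
    votes[unknown_answer.strip().lower()] += 1
    answer, count = votes.most_common(1)[0]
    if count >= AGREEMENT_THRESHOLD:
        known_answers[unknown_id] = answer  # promote to ground truth
    return True
```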
I feel that this depends on how fingerprintable your browser is. Signed in to Chrome? You’ll likely see nothing. But logged out and using Safari’s Private Browsing? You will likely have to do it multiple times.
I have the feeling this is also configurable somewhere.
A few months ago, my bank added the image-clicking one to its login screen. I've always gotten past that one on the first attempt. But with the same profile on the same Firefox, all other sites always take multiple tries.
Google allows website operators to configure how sensitive they want v2 or v3 to be, but in practice it makes little difference. The only apparent effect on v2 is that the least sensitive setting permits the use of the noscript version. The least sensitive setting of v2 will still harshly hassle and punish users who aren't in compliance with google, unless they use the noscript version. Then and only then, it lets people through for correct answers every time.
The 'owner' of the recaptcha API key can set a slider on a scale from "Easiest for users (some security features turned off)" to "Most secure (all security features turned on)"
> They merely allowed the code to be released despite it still working against the current version. Previous experience (namely the original unCaptcha) shows that they intend to find a way to fix it.
I'm using Buster[0] for this purpose, and it relies on the same method. Available on Firefox, Chrome and Opera in their respective add-on stores, and no additional steps are needed (like in this project).
From my experience, it works perfectly in a default session and not at all in private browsing mode. I've never bothered to figure out why that is (possibly some other add-on interfering).
reCAPTCHA relies on things like Google cookies to lower the "user is a bot" risk score. Higher risk scores (such as when you go via a blank slate browsing session) result in more/more difficult challenges.
That's just code for "it rejects correct answers to frustrate you." If you manage to get the noscript version of the captcha with otherwise the same browser state it will accept a correct answer the first time nearly every time. Presumably this is because they didn't bother to implement their "hassle the user" code in the noscript version; it's probably neglected by google since it's disabled by default.
For instance, the sloooow fade in of challenge tiles... what legitimate purpose does that serve? That's not there to make it harder for bots. That's there just to hassle and punish real humans that google dislikes because they don't buy into the google 'ecosystem'. The more they dislike you, the slower the fade in gets. The fade-in can be several seconds long in severe cases.
I run a combination of uBlock Origin, Privacy Badger and Firefox's tracking protection. Can confirm, tiles take 5 seconds to fade in, I have to do 3-5 rounds of it, and unless it's really important I'll just tell reCaptcha to piss off.
I've built the same in the past to solve ReCaptchas and my question is:
Why on earth did they publish this?
I've kept it secret because Google will close this loophole and probably make it more difficult for disabled people to verify that they're humans. And Google is not dumb: They already know that speech recognition "breaks" their bot detection, just like screen readers - this is about accessibility. Publishing stuff like this will increase the pressure so they will be forced to "improve" their bot detection system - which simply means that even more people won't be able to solve those captchas.
Heck, a few weeks ago I tried to solve a ReCaptcha for literally 10 minutes! My answers were right; it was a matter of discrimination. My point is: my bot automation can solve a captcha faster than a human being can. This is silly and ineffective.
And about the people who published this: they think they're doing someone a favor. But I can't see how it's in anybody's interest to release this to the public (especially on a site like HN, where Googlers are reading).
If they proposed a better solution for website owners to secure their sites, fine.
But everyone who's talking about "vulnerabilities" like this makes it more difficult for real people to access the websites that they want to use. I know disabled people who can't solve those captchas - it's just too much of a hassle while it's easy for my bot automation to do it.
We should really ask ourselves what we're really trying to improve here.
I used to work at SoundHound, and 3 years ago we had some weird, illegitimate-looking accounts using our Houndify platform... it turned out they were for breaking reCaptcha. It was a bittersweet verification that our voice recognition was ahead of Google's, but we had to put in protections against that sort of abuse so we weren't enabling spammers...
From what I can read in the tweet and on GitHub, the researcher hasn't proven this works at scale.
The point of recaptcha is blocking "captcha farms" or automated bots from abusively creating accounts, buying tickets, etc.
The author hasn't demonstrated that this attack is effective in those scenarios. The only thing he has shown is a very convoluted way for a human to solve a recaptcha (harder for 99.9% of humans than the standard recaptcha experience)
That would explain why Google didn't care about them publishing this.
Because the tweet summarizes the interesting bit, while the project page not so much. Which is a good way to motivate people to learn more about the project, right?
It's ironically sad that this protection is most easily defeated through its accessibility purpose. The pessimist in me thinks no good deed is possible in this world.
You would think they could add a hidden track at a frequency imperceptible to humans, or a pattern of volume changes, to prevent the audio from being passed through their API.
Of course the next step will be to resample the sound to remove the steganography...and the arms race continues.
Steganography would only prevent people from using their exact service, whereas making the audio more difficult in general would prevent people from using any existing service.
I think a simpler solution would be to check whether a speech-to-text API request yields the correct answer, as part of validating a captcha submission.
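One way to read that idea (a sketch; `transcribe` is a stand-in for whatever commodity speech-to-text call an attacker would use): if a recognizer can solve the clip, don't trust a correct answer on it, and serve a fresh clip instead.

```python
def validate_audio_answer(audio_clip, expected, submitted, transcribe):
    """Validate an audio-captcha submission, treating clips that a
    commodity speech-to-text service can solve as compromised.
    `transcribe` is a placeholder, not a real API."""
    normalize = lambda s: s.strip().lower()
    if normalize(submitted) != normalize(expected):
        return "wrong"   # plain incorrect answer
    if normalize(transcribe(audio_clip)) == normalize(expected):
        return "retry"   # machine-solvable clip: issue a new challenge
    return "ok"
```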
Isn't the text based ReCaptcha pretty ancient at this point? Google's current version is either the simple checkbox (which I assume is checking various things) or the image based version where you have to click on traffic signs. I like to assume that's a live feed from a Waymo car and I'm saving lives.
> "Isn't the text based ReCaptcha pretty ancient at this point?"
That was v1, they shut it down in March 2018. You won't see it anymore anywhere.
> "Google's current version is either the simple checkbox (which I assume is checking various things) or the image based version where you have to click on traffic signs."
That's v2. v2 will present you with a simple checkbox if you're very compliant with the google surveillance system, or will present you with image challenges if you're not (or if it's just in the mood). v2 is very capricious and will reject correct answers from users google wishes to punish for, e.g., using firefox, using adblockers, using resistfingerprinting, blocking google's cookies, etc.
The recently released v3 is the worst of them all; it does away with the image challenges of v2 completely. The user never interacts with it directly, never has an opportunity to persuade v3 that they're a real human by answering any sort of questions. It's nothing more than a measure of how compliant you are with google's surveillance.
> The recently released v3 is the worst of them all; it does away with the image challenges of v2 completely. The user never interacts with it directly, never has an opportunity to persuade v3 that they're a real human by answering any sort of questions. It's nothing more than a measure of how compliant you are with google's surveillance.
I did a very quick experiment with reCAPTCHA v3:
* using Firefox in private browsing mode (not logged-in to anything)
* with a VPN
* using uBlock Origin
* Do-Not-Track on, disabling 3rd party trackers
* Only went to one page and filled one form with garbage data
My score was 0.7, which is pretty decent I would say.
I did a similar experiment using Ghost Inspector (a platform for automating browser testing, something similar to Selenium, but not sure what they use exactly), and my scores were consistently 0.1.
I'm also a bit suspicious of Google, and have trouble with the fact that this is the only solution on the market, and it's free for websites to use. But I'm not sure your statement is entirely accurate judging from my very limited experience.
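For reference, the score in these experiments comes from Google's server-side verification endpoint. A minimal sketch (the siteverify URL and the `success`/`score` response fields are per the reCAPTCHA v3 docs; the threshold helper is purely illustrative, since the cutoff is whatever the site owner picks):

```python
import json
import urllib.parse
import urllib.request

def recaptcha_v3_score(secret, token):
    """Verify a reCAPTCHA v3 token server-side; returns the score
    (0.0 = likely bot, 1.0 = likely human) or None on failure."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(
        "https://www.google.com/recaptcha/api/siteverify", data=data
    ) as resp:
        result = json.load(resp)
    return result.get("score") if result.get("success") else None

def passes(score, threshold=0.5):
    """Site-chosen cutoff: below it, the site can require extra checks."""
    return score is not None and score >= threshold
```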
On Firefox, if I disable third-party cookies, it never accepts my correct answers. I have to fight with Google for 5 minutes to prove that I am human. Same when I use a VPN. Allow third-party cookies, log in to a Google account, no VPN, and it's just one checkbox.
Edit: if I try to use the audio challenge in the first case, it tells me outright that I am using some automated method to solve the captcha, and they won't allow it. So much fun.
I think nothing appears if my score is above Google's pass mark (they decide my score in the end). If not, a reCaptcha appears, which is what happens to me, and I go through the same process.
My mistake, I checked just now. With a VPN, Firefox, and third-party cookies blocked, I got a 0.3 score, which is quite bad. Logged in to Google, with third-party cookies allowed and a VPN, gives 0.9.
I get 0.3 with firefox right now when blocking their cookies, using first party isolation and resist fingerprinting. In the past I've gotten as low as 0.1 with a similar configuration.
This happens to me whenever I use the Galaxy Browser. First request is always "rate-limited". When I immediately reload the page, it goes through. I always thought it was a cheap tactic to have people install their app.
A refresh usually helps, and this "feature" isn't exclusive to Firefox.
For some reason, it is exclusive to the links opened from another apps and doesn't appear when accessing a link directly (via a refresh). That might narrow down your search a bit.
It happens to me often when I open a reddit link from twitter in an embedded WebView (using the bacon reader app) so it's unlikely to be a mozilla issue
"The team has allowed us to release the code, despite its current success."
First off, permission was never needed to release the code. Second, Google's interest in captchas is not to protect websites but to further its machine-learning algorithms. Unless the captcha mechanism were discredited to such an extent that nobody used it, they would be happy to accept captcha requests from automated systems. Google may seem like it sometimes, but it is not your friend.
[1] https://cloud.google.com/speech-to-text/pricing
[2] https://2captcha.com/, the first hit I found with the search [captcha solving service price]
Disclosure: I work for Google on security and cloud, but not on anything related to captchas or speech to text.