It used to be some benign OCR task; now it's "please identify street signs from photographs we've taken". Once they can correctly classify street signs, what else will they want? "Please click on the faces you recognise"? "Please click on the faces of your friends that are labelled with the correct names"?
I've started opting out of Google Captchas wherever doing so isn't too burdensome. If I see a Google Captcha and I don't really want to get to the other side, I just close the tab and forget about it, rather than subject myself to their capricious gatekeeping.
Some people I know online have basically started buying "captcha tokens", the same ones that spammers use when they want to bypass captchas in bulk. Depending on your morals (keeping in mind these services are mainly third-world-country sweatshop labour), it might be a better choice than spending your own brainpower on such things, and a way of giving Google a virtual middle finger.
I use the Tor browser pretty frequently so I've gotten pretty good at these. It's much easier once you realize that there are three sources of data that determine correctness: basic computer vision, hurried/lazy humans, and a lot of bots. It's not about truth; it's about following the crowd.
- Forget the slivers. If it's not more than 75% of the square, forget about it.
- Anything not in the foreground might as well not exist.
- Poles != signs. If you saw a pole sticking out of the ground, you wouldn't call it a sign, would you?
- It's not about whether it's a storefront, it's about whether it obviously looks like one. If you spend literally any time thinking about it, it's a no.
- Mark the most prominent vehicle and skip everything else. Only mark a second vehicle if they're equally sized.
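The first rule is easy to put into code. Here is a rough sketch that treats the challenge image as a grid and only marks a tile when the object's bounding box covers more than 75% of it. The function name `tiles_to_mark`, the 4x4 grid, the 400-pixel image and the example box are all made up for illustration, and nothing here reflects how reCAPTCHA actually scores answers.

```python
def tiles_to_mark(box, grid=4, size=400, threshold=0.75):
    """Return (row, col) tiles covered more than `threshold` by `box`.

    `box` is (left, top, right, bottom) in the same units as `size`.
    The grid layout and the threshold are assumptions for illustration.
    """
    x0, y0, x1, y1 = box
    tile = size / grid
    marked = []
    for row in range(grid):
        for col in range(grid):
            tx0, ty0 = col * tile, row * tile
            tx1, ty1 = tx0 + tile, ty0 + tile
            # Width and height of the overlap between the box and this tile.
            w = max(0.0, min(x1, tx1) - max(x0, tx0))
            h = max(0.0, min(y1, ty1) - max(y0, ty0))
            if (w * h) / (tile * tile) > threshold:
                marked.append((row, col))
    return marked


# Hypothetical sign whose left edge pokes 10 pixels into the first column.
sign = (90, 0, 300, 200)
print(tiles_to_mark(sign))
```

The 10-pixel sliver in the first column covers only a tenth of each tile there, so those tiles are skipped and only the four fully covered tiles get marked.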
So that means we have to think like bots, not humans. Otherwise it wouldn't have these rules and people would easily solve these. What's the point of a captcha, then?
Aside from that, I'm pretty sure his very first sentence, "Apparently I am incapable of identifying street signs", is correct. He keeps clicking on traffic signs, not street signs.
I remember 15 years ago making a neural net captcha OCR cracker (for DNS registrations). After about 3 hours I had my accuracy, recall and confusion numbers and I wasn't really getting them any better. This was a captcha like the ones from those old PHP libraries. Many colors (sometimes bright yellow on white GRMBL), varying background color, a few things on the background that were clearly not letters (like large circles).
Now in order to train this network I had to download 1000 or so captchas and solve them manually, so I had written a quick program that allows a person to solve captchas. So I figured "let's see how good it is", and used the program to test myself (I figured, worst case I have some more raw data).
Turns out the neural net was 2% better than I was. It actually scored 2% better on captcha solving than I did, and this was when I was trying really hard, pausing when there's doubt and giving it 5 seconds thought. I was a bit surprised that trying really hard or just going with my first guess didn't make more than a percent difference.
Then I got someone who had nothing to do with IT to solve 50 captchas and calculated ... precision was 8% below the neural net (also known as she'd do about 1 in 6 wrong, whereas I only had one in 10-12 or so. The neural net only had one in 14-15 wrong. It was a pretty sizable difference).
And lastly I figured, this is not possible. I'm somehow messing this up. After all, I should have made that same 8%+ errors on the original set of 1000 captchas I downloaded. Did the network really learn to read better than I can, while being fed close to 10% falsehoods*? So I generated the set where it consistently errs. And ... Yep. It did. About 40% of those were just me doing the ground truth wrong. I fixed them, but it only improved the CNN's performance slightly.
* Of course, those 10% wrong would still almost always have 4 out of 5 letters correct.
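For anyone who wants to sanity-check the "1 in N wrong" figures against the percentage gaps, here is a quick back-of-the-envelope conversion. It assumes the midpoints of the quoted ranges (one in 11 for "10-12", one in 14.5 for "14-15"); those midpoints and the helper name `error_rate` are assumptions, not measured numbers.

```python
def error_rate(one_in_n):
    """Convert 'one in N wrong' into a fraction of captchas answered wrong."""
    return 1.0 / one_in_n


rates = {
    "non-IT tester": error_rate(6),
    "author": error_rate(11),         # assumed midpoint of "one in 10-12"
    "neural net": error_rate(14.5),   # assumed midpoint of "one in 14-15"
}

for who, rate in rates.items():
    print(f"{who}: {rate:.1%} wrong, {1 - rate:.1%} right")

# Gaps relative to the neural net, in percentage points.
gap_author = rates["author"] - rates["neural net"]
gap_tester = rates["non-IT tester"] - rates["neural net"]
print(f"author vs net: {gap_author:.1%}, tester vs net: {gap_tester:.1%}")
```

With these midpoints the author-versus-net gap comes out near the quoted 2%, and the tester-versus-net gap a little above the quoted 8%, which is about as close as "1 in N or so" figures can be expected to land.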
I feel like users should be compensated for giving Google their time and mental energy since this is literally making users manually tag their machine-learning corpus for free.
I always feel dirty when completing them, for this exact reason. Most places I'll just leave, unless I absolutely must reach the website.
The way I see it, you have two options:
- Allow Google to extensively track all your activity online, or
- Do free work for Google
And Google's convinced many webmasters to embed their recaptcha, so no matter where I go I have to give something up to Google. At least I felt better about it when I was helping OCR books.
> That's just how the internet works. Content must be paid for somehow.
Not true. Some (most?) of the best content on the internet is there completely for free, because people believe in it and put it out there out of their own pocket.
The internet is worse, because of the content that is paid for by the methods you describe, than the internet of the 90s when content was there by the methods I just described.
I do think there's some value in paying artists and writers for their work, with, you know, money, because those people have to eat too. But I refuse to buy into this narrative where "I pay you for content and you give me content" (a consensual transaction) is the same as "I request content from you and you pretend to give it to me for free while secretly violating my privacy, stealing my attention, and otherwise screwing me without my consent". These are not the same thing.
This is not "just how the internet works", it's how a few people have decided they want the internet to work because it's profitable to them, and we don't have to put up with it.
> The internet is worse, because of the content that is paid for by the methods you describe, than the internet of the 90s...
Yeah, when people comment that online content must be subsidized, I always find it a little ironic. They're contributing information and analysis to a discussion. Who's paying the subscription fee?
How could such a lifestyle, where they give away their mental energy for free, possibly be sustainable?
The frequent rebuttal of "but my comments are low quality and barely worth anyone's time!" is fun too.
Let us analyze this further. Suppose that Alice browses 4chan. 4chan uses a Google image-classifying captcha. Alice is viewing user-generated content posted for free, including 4chan advertisements. When Alice chooses to post, she completes a captcha for Google and her post is sent to 4chan.
Who gets paid for 4chan's content? Alice does not profit. 4chan's users do not profit. 4chan 'profits' via advertising revenue but they actually operate at a loss.
Who is left to get paid? Google. Google gets a human-completed captcha for its machine-learning projects.
I look forward to the demise of the Web and its replacement with a content-addressed alternative where your views will be technologically obsolete.
Alice profits by not having her post be drowned out by a flood of automated spam. 4chan's users benefit by not having to look for interesting content among a flood of automated spam. 4chan's operators benefit by not having to moderate a flood of automated spam.
If captchas weren't useful to 4chan, they wouldn't make use of them. When you see a system that's obviously not beneficial to anyone, it's useful to take a step back and think about the incentives that keep things as they are. Because if it's so obviously wrong, why hasn't it been changed yet?
I'm not sure how a content-addressed web would handle these problems, but there'll need to be some hoops to jump through before a significant number of other people become willing to look at your content.
The PlayStation store, where literally the point is to pay for content, puts a reCaptcha on log in, and (for me) it quite often triggers the "Please flag all images containing cars/signs/shop fronts" version.
I miss the days when you just had to help them digitally encode writing. Now I'm in the same "is the pole part of the sign" situation as this guy all the time.
Does anybody know if captchas are triggered or made harder by uBlock? It seems to me like captchas are easier on my work Chrome profile, which does not have any extensions since I only use it for internal sites.
I think they are made more difficult when Google observes "bot-like" activity coming from you. When they started the "I'm not a robot" checkbox, I used to be able to just check the box and it would let me through, every time. Then, one day I had to download several dozen files, and each download had one of these checkboxes. The first 10 or so files, it let me through like it always did. Then it started giving me one CAPTCHA screen for the next 10 files. Then 2 screens. Then 3, and so on. That was several years ago, and ever since then it always gives me a long series of difficult CAPTCHAs every single time. I guess Google still thinks I'm a bot even after several years of solving these things pretty consistently (I'd say I have about a 90% success rate).
For the record, I do tag boxes containing only sign posts, or only tiny slivers of signs.
Part of the reason these are so difficult is that it will intentionally fail you sometimes even when you're right, to prevent malicious bots from learning off of it. Also, a decent number of them are not fully solved by Google (they show you both ones they know and ones they don't).
I get these all the time when using a VPN/proxy. These are basically harassment for users.
They replaced the text captcha with these. Google knows that these stop humans from getting in. But they don't fix it because they need to harvest data. So, Google is evil. There is no doubt about that.
I think it depends a lot on your IP too. I tried to do some Google captchas in Tor Browser once and it took forever! Just like this video, even when you got it right it still kept asking for more.
Happens relatively frequently when using a VPN. If, for whatever reason, the IP I'm using is currently getting the captcha-hell treatment, shifting to a different exit address clears it up 95%+ of the time.
I recall that hackers were sending captchas to porn sites. "Want to see this picture, then solve this captcha." And then the answers were redirected back to the original, legitimate site.
Technically the sign support sticks aren’t signs. And some of the signs clicked on weren’t street signs, if you define street signs as those put up by a government agency to regulate or assist with driving on a street. Also, he clicked a hotel, which isn’t a storefront. When Google says storefront they mean a multi-tenant store facing a street with a big sign indicating the store’s name or purpose. Sometimes these locations are closed, but they rarely look like hotels, houses or apartments. I have also had to pass, twice, by saying nothing was relevant and not clicking anything. I’ve also had it show me streets with cars and had it ask me to select storefronts... (Trick questions, maybe to see if I click wrong then change my mind?) And I’m much more likely to get it wrong on my iPhone than on a computer; perhaps Google doesn’t have as much mouse movement data that way, and can’t fingerprint an iPhone as uniquely?
Yes. This. And, is it a storefront when it's a lemonade stand outside a house? Is it a road sign if it's a for sale sign? Is it a road when it's the road into a carpark?
We are working on hashcash.io, a replacement for captcha. No more frustration guessing what Google meant, just a computer proof of work and you are good to go.
Please, do not drain my batteries, waste resources of my planet and make me (or my employer, or my coffee shop, or my train company, or my library) pay for electricity to prove I'm not a robot.
Please, do not make the Internet more of an energy hog than it already is.
Your last statement theoretically could be addressed by tweaking the algorithm settings such that it consumes at most [the time it takes for a human to solve reCaptcha] x [average power consumption of a smartphone]. Not sure if it would still hold its purpose, though.
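To make the "tweak the settings" idea concrete, here is a minimal sketch of a hashcash-style proof of work. It assumes SHA-256 and a "leading zero bits" target; the names `solve`, `verify` and `leading_zero_bits`, and the `challenge:nonce` string format, are invented for illustration and are not taken from hashcash.io, whose actual scheme isn't described in this thread.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve(challenge: str, bits: int) -> int:
    """Client side: brute-force a nonce whose hash has >= `bits` leading zeros."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce


def verify(challenge: str, nonce: int, bits: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


# Hypothetical challenge string; a real server would issue a fresh random one.
nonce = solve("example-challenge", bits=12)
print(nonce, verify("example-challenge", nonce, bits=12))
```

In this kind of scheme the expected cost is about 2^bits hashes, so each extra bit roughly doubles the client's work while verification stays at one hash. Capping the energy spent would then come down to choosing `bits`, though whether a cap that low still deters bots is exactly the open question.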
Looks interesting, and fairly attractive presentation... but how does it work? This is essential to describe, but there are no details about that on your front page!
Also, your text needs some work; there are grammatical errors and some phrases could be improved. (I can help with this, if you want.)
So at the end, when he says despondently, "I'm not a robot!"...
Paranoid fantasy: it's a PhilipKDick/BladeRunner thing. You see, actually he is a robot (as, apparently, are most of us), and this test is designed to prove it. It's a precursor to the Voight-Kampff machine...
I suspect the bad guys using captcha won't go to the good low-barrier captcha because there's no microcent transaction in it. It's a Gresham's law case: bad captcha with revenue drives out good captcha with low overhead.
Hopefully more sites will upgrade to the new Invisible reCaptcha by Google. I am using them on my sites and it is certainly less annoying and the user experience is better.
Interesting side note, when Google isn't completely convinced you're not a robot, it'll actually swap the invisible captcha out for these photo picker ones.
Whenever I see this, I intentionally select incorrect boxes. Let the AI eat that incorrect data.
If enough people do this, this AI freeloading will stop.