Hacker News
Google’s new reCAPTCHA has a dark side (fastcompany.com)
1091 points by ProAm on June 27, 2019 | 542 comments



Google has been doing the same with reCAPTCHA v2 [1]. They are aware of the legal risk of outright blocking users from accessing services, so reCAPTCHA v3 has no user-facing UI. Google merely makes a suggestion in the form of a user score, so the responsibility to delay or block access, and the legal liability that comes with it, falls on websites.

reCAPTCHA v2 is being superseded by v3 because v3 presents a broader opportunity for Google to collect data, and to do so with reduced legal risk.

Since reCAPTCHA v3 scripts must be loaded on every page of a site, you must send Google your browsing history and detailed data about how you interact with sites in order to access basic services on the internet, such as paying your bills, or accessing healthcare services.

Needless to say, the kind of data collected by reCAPTCHA v3 is extremely sensitive. Those requests contain data about your motor skills, health issues, and your interests and desires based on how you interact with content. Everything about you that can be inferred or extracted from a website visit is collected and sent to Google.

If you refuse to transmit personal data to Google, websites will hinder or block your access.

[1] https://github.com/w3c/apa/issues/25


Your comment adds a lot to the conversation, so I don’t want to be more contrary than necessary.

It’s nonetheless a shame that it’s so universally misunderstood how ad-supported megacorps make their money that even highly sophisticated users of the web still talk about the value of personal data (source: I ran Facebook’s ads backend for years).

Much like how the highest information-gain feature for the future price of a security is its most recent price: ad historical CTR and user historical CTR (called “clickiness” in the business) are basically the whole show when predicting per-user, per-ad CTR. The big shops like to ham up their data advantage with one hand (to advertisers) while washing the other hand of it (to regulators).

As with so many things, Hanlon's Razor cuts deep here: if your browsing history can juice CTR prediction then I've never seen it. I have seen careers premised on that idea, but I've never seen it work.
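
To make that concrete, here's a toy sketch of what "clickiness is basically the whole show" means in practice. Every weight and rate below is invented purely for illustration; it is not any real model:

    # Toy CTR predictor: all weights and rates are illustrative, not any real model.
    # The point: two historical rates (ad CTR and user "clickiness") carry most of
    # the predictive signal; a browsing-history feature has nowhere useful to go.
    import math

    def logit(p):
        return math.log(p / (1.0 - p))

    def predict_ctr(ad_ctr, user_ctr, w_ad=0.9, w_user=0.8, bias=3.0):
        z = bias + w_ad * logit(ad_ctr) + w_user * logit(user_ctr)
        return 1.0 / (1.0 + math.exp(-z))

    print(predict_ctr(ad_ctr=0.02, user_ctr=0.05))  # not very "clicky" user
    print(predict_ctr(ad_ctr=0.02, user_ctr=0.30))  # very "clicky" user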


> It’s nonetheless a shame that it’s so universally misunderstood how ad-supported megacorps make their money that even highly sophisticated users of the web still talk about the value of personal data (source: I ran Facebook’s ads backend for years).

That may be the case for some people, but that is not my complaint, nor that of many folks I know.

I simply don't care how FB, Google and other surveillance outfits make money. I don't care about marketers' careers or their CTRs. I don't even care about putting a dollar value on my LTV to them.

I care about denying them visibility into my datastream. It is zero-sum. They have no right to it, and I have every right to try to limit their visibility.

Why? None of your business. Seriously - nobody is owed an explanation for not wanting robots watching.

But I will answer anyway. It is because of future risks. These professional panty sniffers already have the raw material for many thousands of lawsuits, divorces and less legal outcomes in their databases. Who knows what particular bits of information will leak in 10 years, or when FB goes bankrupt? I have no desire to be part of what I suspect will become a massive clusterfuck within our lifetimes.

If you're correct that this data has so little value, then it is more likely it will leak. FB and Google are the equivalent of Superfund sites waiting to happen, and storing that data should be considered criminal.


If I could upvote this comment twice, I would. This succinctly summarises my views on the subject. We shouldn't have to justify _why_ we don't want our private information harvested by these companies. I would still feel remarkably uneasy even _if_ Facebook and Google were demonstrably benevolent citizens of the online world, but we've seen time and time again how invasive and malicious they can be. The fact that both of these companies have political ambition makes the entire situation much scarier. Count me out.


> They have no right to it, and I have every right to try to limit their visibility.

That's entirely fair! But also: You have no right to use my website, and I have every right to limit your access.

Recaptcha is simply part of this negotiation.


Is that so? What about the webmaster who simply wants to combat the bots using his page? Is the extent of data gathering on Google's behalf just part of the deal? What if selling user data is against the webmaster's ethics? "Don't use it, I guess"? Sure, except that no one in the exchange was told the extent to which this data is used, or what for. Users of Google's captcha aren't told about this exchange. I disagree entirely that it's a matter of voluntarily opting in and out of Google's domain. Their business model depends on becoming inescapable, and they're not being honest about how their services collect our data.


> Their business model depends on becoming inescapable

What will happen with v3 if I block gstatic.com? Will I be given the highest threat score?


Wait, now you have me wondering: If this is just javascript from another domain, what's preventing bots from proxying requests, intercepting this one, and replacing it with a dummy function that returns a "no threat" score?



So... the answer is, nothing prevents it?


When the API reports a failed verification, the webmaster knows that the response has been tampered with?


> You have no right to use my website

Of course.

> Recaptcha is simply part of this negotiation.

It is only a negotiation if I know it is there.


I'm sure it will be mentioned in the 40 page privacy and cookie policy that pops up on every website asking you to agree before continuing.


Which is not compliant with GDPR.


And I'm sure it won't be explicitly stated, but simply rolled up under some paragraph as a blanket statement, something to the effect of:

"We track all of your activities and provide third parties the ability to do so as well - to provide a better user experience - and we may or may not sell or distribute the collected data at our own discretion and continued use of this site grants us permission in perpetuity. Further, should you decide to sue us, you agree to binding arbitration at a venue chosen by us, conducted by an arbiter of our choosing, in which case you promise to lose regardless of outcome. If you disagree, please leave the site now but just know, by being here and reading this, you have already granted us this power and we've mostly already collected what we needed from you. Thank you. Stop wasting our bandwidth now. Fuck off!"


> It is only a negotiation if I know it is there.

You're not going to get through a site with a properly implemented captcha just by blocking it.
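
To spell out what "properly implemented" means here (a minimal sketch, not any specific site's code; the secret key and the 0.5 cutoff are placeholders): the browser only produces a token, and the site's backend sends that token to Google's siteverify endpoint and acts on the score that comes back, so a dummied-out client script just yields a token that fails verification.

    # Minimal server-side check of a reCAPTCHA v3 token (sketch; values are placeholders).
    import requests

    SECRET_KEY = "recaptcha-secret-key"  # stays on the server, never shipped to the page

    def verify_token(token):
        resp = requests.post(
            "https://www.google.com/recaptcha/api/siteverify",
            data={"secret": SECRET_KEY, "response": token},
            timeout=5,
        )
        result = resp.json()
        # For v3 Google returns a score in [0.0, 1.0]; the cutoff is the site's choice.
        return bool(result.get("success")) and result.get("score", 0.0) >= 0.5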


The point is that if you see a captcha, you can actively decide not to use the site. If the website has no such box, then you can't decide not to use the captcha.


That's not a sinister "land grab" by Google; that's a fundamental aspect of the web, predating the advent of JavaScript. You reveal your identity quite thoroughly to the individual hosting services.

And it's difficult to imagine legislating that away, as it's sort of fundamental to all network computing.


Privacy Badger will know it is there


You’re commenting on HN, you know it’s there.


I'm commenting on HN. I've been developing for the web for 24 years. I don't know how and when my data is collected or shared most of the time.

The idea that FB and Google are openly making a trade with users is ludicrous. I'm horrified that you either sincerely believe that there's a fair negotiation happening or that you don't care (given your employment history).


And what about the other 99.99999% of people that use the web? Do they also understand what is going on behind the scenes?


I'm here, and struggle to follow many of the threads on HN. As a father, I don't really see how I can effectively prepare my kids for a surveillance internet.


I didn't, but assume this is the case with everything. I mostly care about giving my data away for free (cut me in please), but none of my non-HN commenting roommates knew. Is their privacy less important than mine?


I do, and I can make an informed choice. Unless your website has a very eclectic audience, I’m not the only one using your services.


Except it happens on government-owned sites from the local to the national level, sites that I have EVERY right to visit, especially as my tax dollars are paying for them and they provide services that are available to the general public.


That's not totally true. If you provide access to your web site to people then in many places in the world you can't limit that access in a way that discriminates against protected classes.

In the US for example you can't set up your web site in a way that accessing it discriminates against people with disabilities.


That's what's great about GDPR. It makes privacy a fundamental right that can't be bargained away, much like you can't sign a contract binding you to slavery and you can't accept a bonus from your employer in exchange for losing your mandated breaks.


You don't have a legal right to limit access for the disabled, which is what services like reCAPTCHA v3 are doing.


The bigger issue, for me, is things like Facebook gathering data on third-party sites that I had no idea were feeding the information back to Facebook.

An even bigger issue, for me, is having my face added to their facial recognition algorithms, despite never once tagging myself in a photo. Is there a way to opt out of this?


Recaptcha v3 is an invasion of privacy and a blatant violation of the GDPR.

It's as much of a negotiation as offering someone the option to pay through perpetual indentured servitude: it's illegal and immoral.


This. Absolutely this!

Do I think Facebook/Google/etc are abusing my data right now? Probably not.

But do I think that large-scale collection of my data could be abused in the future? Most definitely. If the Cambridge Analytica scandal has taught us anything, it's that having access to this data is ripe for abuse, and often the abuse might happen in unexpected ways.

And do I owe an explanation for wanting some basic privacy? Absolutely not. If a random stranger stopped me in the street and asked me lots of personal questions there wouldn't be an expectation that I have to respond. Yet the likes of Facebook and Google seem hell-bent on turning the discussion around when it's data collected online.


> FB and Google are the equivalent of Superfund sites waiting to happen, and storing that data should be considered criminal.

I would think that in the EU, under GDPR, collecting, transmitting and storing that data is in fact criminal, or at least subject to heavy fines. And under GDPR it won't help to just note the data collection in the TOS or ask the user for permission (under threat of not allowing access to the service). So I really wonder how Google plans to run this in Europe.


Exactly! And this is my problem with Apple too - sure, they do some things right and are more conscious of "user privacy" than others, but at the same time they have also started abusing this to further spy on their users.

What use is "we are transparent with our users about the data we collect" when the user does not want you to collect the data in the first place? And they give you no option to opt out of such data collection? (And for what - just so that they can create a better ad network that can better exploit us with our own data?)

(And don't get me started on Safari spying and all their "anonymous" cookie collection crap without giving the user any choice in the matter, essentially forcing everyone of their users to opt-in to be profiled through their browsing history).


And you never know what those corporations could do with the data politically. Recently there has been a lot of talk about how these types of companies seem to favor certain politics. Who's to say that they won't use this data in the future for influence?


You could either stop using these services or, if (as I suspect) you find them too valuable to dismiss entirely, quarantine them to a VPN/incognito interaction in less time than it took to type that comment.

I don't want to single you out personally, but there's a broad trend on HN of bitter-sounding commentary on the surveillance powers of these companies by people who can easily defeat any tracking that it's economical for them to even attempt, let alone execute. That commentary reeks of sour grapes that a mediocre employee at one of these places makes 3-20x what anyone makes (as a rank and file employee) anywhere else.

Again, you’re not likely part of that group, but seriously who hangs out on HN and can’t configure a VPN?


> You could either stop using these services or

How do you stop using a service when you have little or no indication that it does something like this beforehand, and afterwards the privacy is already gone?

If I use a site and view my profile page, and the URL contains an account ID or username, and some Google or Facebook analytics is loaded, or a like button is sitting somewhere, how am I to know that before the page is loaded? What if I'm visiting the site for the first time after it's been added?

It doesn't even matter if I have an account on Google or Facebook, they'll create profiles for me aggregating my data anyway.

> quarantine them to a VPN/incognito interaction

Which does very little. I spent a few hours this morning trying to get a system non-unique on Panopticlick, but the canvas and WebGL hashing is enough to dwarf all the other metrics. There are extensions to help with that, but for the purpose I was attempting they were sub-optimal (and the one that seemed to do time-based salting of the hashes wasn't working right).
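
Rough arithmetic for why those two hashes dwarf everything else (the per-attribute bit counts are illustrative, loosely in the spirit of what Panopticlick-style studies report):

    # Back-of-envelope fingerprint entropy; per-attribute bit counts are illustrative.
    attributes = {
        "user_agent": 10.0,   # bits of identifying information
        "timezone": 3.0,
        "screen_size": 5.0,
        "fonts": 7.0,
        "canvas_hash": 8.0,
        "webgl_hash": 7.0,
    }

    total = sum(attributes.values())
    stripped = total - attributes["canvas_hash"] - attributes["webgl_hash"]

    # ~33 bits already singles out one browser among ~8.6 billion people.
    print(f"all attributes:       {total:.0f} bits -> 1 in {2 ** total:,.0f}")
    print(f"without canvas/WebGL: {stripped:.0f} bits -> 1 in {2 ** stripped:,.0f}")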

So, I don't have any confidence that a VPN and incognito really does much at all.


> How do you stop using a service when you have little or no indication that it does something like this beforehand, and afterwards the privacy is already gone?

It is small comfort for the average user, but the way you do it is to use NoScript. It makes the web awful, sure, but it won't happen to you.

> It doesn't even matter if I have an account on Google or Facebook, they'll create profiles for me aggregating my data anyway.

I sort of wonder what you envision this actually meaning. If I spam your website and you add a DoS filter for my IP, should I complain you made a profile of me? If when a user tries to log in I check the referrer to see if it contains a proper URL, have I violated your privacy?


> I sort of wonder what you envision this actually meaning.

I meant it as a response to the common reply people sometimes give in conversations like these, which is "that's why I don't use Facebook" or "that's why I stopped using Google services". For this conversation, whether you use Facebook or not is irrelevant; they still gather your information, in the same way the myriad other advertisers (or however they bill themselves) do through online tracking. Google and Facebook are large, and have a portion that's easily visible, but they are not the whole problem by a long shot.

> If when a user tries to log in I check the referrer to see if it contains a proper URL, have I violated your privacy?

No. Noting which door a customer came into your store seems fine to me. That by default customers come in wearing the logo of the last store they visited is weird, but entirely something they can control. Having people shadowing all your customers while in the store looking and listening for tidbits they can report back on to get more info about those people is pretty creepy. As you suggest, the way to get around most of that is to dress blandly and say nothing.

Here's the thing: we're a market economy. There's a transaction going on, where we're trading away something (our information and privacy) to a company for some product, or possibly the right to view a product we might consider buying. How many people are actually aware of this transaction? If they aren't aware of the transaction, there's a name for that when it's a regular good, and it's theft (or fraud). The difference here is that most of our government systems don't apply any rights of ownership to this information, so our regular rules don't apply. I admit they may not make sense to apply entirely, but at the same time, it's obvious that something is lost in the transaction, whether the person losing it realizes it at the time, or views it as important enough to make a big deal about when they notice.


> Google and Facebook are large, and have a portion that's easily visible, but they are not the whole problem by a long shot.

I meant more like in a literal sense, but okay. Point taken.

> No. Noting which door a customer came into your store seems fine to me. That by default customers come in wearing the logo of the last store they visited is weird, but entirely something they can control. Having people shadowing all your customers while in the store looking and listening for tidbits they can report back on to get more info about those people is pretty creepy. As you suggest, the way to get around most of that is to dress blandly and say nothing

These human metaphors are powerful, but don't map at all to basic analytics concepts. There is no person watching you. There is no intelligence judging you. There are a series of conditions in a deterministic system provoked by your actions. If we could have done this before now, we would have because it's a whole hell of a lot more ethical.

> Here's the thing: we're a market economy.

I dunno where you are but I'm in the US which is most definitely not "a market economy" without a whole hell of a lot of qualifiers.

> There's a transaction going on, where we're trading away something (our information and privacy) to a company for some product, or possibly the right to view a product we might consider buying. How many people are actually aware of this transaction?

Roughly as many, I imagine, as folks who realized the shopkeeper could see them enter and leave. Most folks know local proprietors can and will kick you out and put up a photo if you act up.

> The difference here is that most of our government systems don't apply any rights of ownership to this information, so our regular rules don't apply.

This is just flatly false. I don't know what you're thinking writing this, but it's clearly neglecting copyright and patents. For what it's worth, I think the latter is a bad system and the former is in desperate need of reform to sharply limit it.

> it's obvious that something is lost in the transaction, whether the person losing it realizes it at the time, or views it as important enough to make a big deal about when they notice.

I am trying to read your comment in the spirit it was intended rather than the literal delivery, so please forgive me if there is a subtle impedance mismatch here but...

Welcome to the future, I guess? The top 50% of earners in the world have access to computers that would have once bankrupted a nation to produce, and the options are still surprisingly good for the next quartile. With that power, it means that the people around you are going to start noticing things and making decisions about them with the information they can now process.

Ideally, this will be a distributed thing, but right now, due to the nature of our society, authority of this sort is highly concentrated. But the dam has broken. A total surveillance system for up to a modestly sized city, with real-time tracking and long-term data storage, is well within the reach of anyone with $10,000 USD to spend on hardware. They can self-host it. The banality of this cannot be overstated. It's boring to do this now. It's not new ground. So much so that average people can monitor their homes with it, or know if their friends have gone missing with it.

To some extent, there is just no undoing this. Society will have fewer secrets and those secrets will be much more deliberate, and the only response that can work is to change your attitude.


> There is no person watching you. There is no intelligence judging you. There are a series of conditions in a deterministic system provoked by your actions.

I don't think it's creepy because there's a (theoretical) person watching me, I think it's creepy because they're cataloguing all my actions in a systematic way which pierces the veil of perceived privacy (mostly through anonymity).

> I dunno where you are but I'm in the US which is most definitely not "a market economy" without a whole hell of a lot of qualifiers.

I'm not sure how to respond to this without a specific criticism of how you think it's incorrect. That said, it's somewhat tangential to the point, even if it would be an interesting conversation.

> Roughly as many, I imagine, as folks who realized the shopkeeper could see them enter and leave.

I don't know. If every time I entered my local 7-Eleven someone picked up a clipboard, flipped to a specific page, looked back at me, nodded to themselves and then marked something on the page, I might decide to go somewhere else, at least most of the time. If I knew the info was shared with all the other 7-Elevens, and the local grocery chain, and some hardware stores, that makes me want to use all those places less.

> This is just flatly false. I don't know what you're thinking writing this, but it's clearly neglecting copyright and patents. For what it's worth, I think the later is a bad system an the former is in desperate need of reform to sharply limit it.

I said "this" to qualify what I was referring to (personal information) and distinguish it from other types of protected information, of the type you reference.

> To some extent, there is just no undoing this. Society will have fewer secrets and those secrets will be much more deliberate, and the only response that can work is to change your attitude.

I don't think that's the only response that can work. It's the only one that works completely, as deciding to not care is always a solution to caring, if you can pull it off.

The alternative is new laws. Are they perfect? No. Will they solve the problem adequately? Likely not. Do they have a chance of making a positive difference across the board for massive amounts of people by empowering them with regard to their own information? I dunno. Maybe? I think it's worth pushing for though. Otherwise, why do we have minimum wage and labor laws? At some point we could have thrown our hands up and said "screw it" about that stuff, but people pushed for it, and while they aren't perfect, I think we're all better off for them.

I don't believe there will be any perfect solution to this ever, or even a good or acceptable solution all that soon. I do think it's still worth raising my voice over, because I think there are some possible futures that are better than others with regard to privacy and personal information, and I think that's worth pushing towards.


You use something that blocks scripts (like uMatrix) with an aggressive ruleset. On some sites you'll need to allow things to make them work. If they are loading trackers from the same servers that they load content from, you can't do much without wasting more time than you want. I'd say it breaks most of the tracking though.

More sites than you'd expect work without js or with first-party js only. It's annoying when you need to read a news site, because those are usually bloated garbage. Not a huge loss.


This was already with uBlock Origin. Also tried combinations of Ghostery and Privacy badger. All of it made very little difference for panopticlick, and that's probably a low-bar compared to what's common these days.


I don't care if every site that I browse using this VM knows that I'm Mirimir. I don't even try to hide that.

What matter is that my personas using other VMs, through other VPNs or Tor, don't get linked to my meatspace identity, to Mirimir, or to my other personas. And that's doable, I think.


Yes, and you go through quite a lot of effort to achieve that, given your other comment.

My main point is that the amount of effort you have to go through to achieve that is very high, and I wish it were considerably lower. There are technological changes that could help with this, and legal changes that could help with this.

I think a comfortable place would be this: if you visit the same online location with your main browser on one IP, and with a private browsing instance of that same browser on another IP (through a VPN, proxy, or just a new public lease), there would be some expectation that the site didn't immediately have a high degree of certainty you were the same individual. For the general populace, this falls on its face.

Tor has quite a few mitigations to help here (e.g. simulated window/screen values), and Firefox has started to adopt some of them, but as mentioned here on HN frequently, Firefox sometimes has problems with CAPTCHAs and certain sites (I haven't had those problems, but I'm also not usually using it through a VPN), and I know Tor is sometimes blocked outright.

The point is that until most of these protections (technological and hopefully some legal) are mainstream, completely protecting yourself is a double-edged sword, since you also ostracize yourself from some sites and services. Tor is the equivalent of walking around in padded, baggy clothes and a ski mask. Sometimes, like in the snow, it may seem fairly normal. Other times, like at the beach, it may preserve your privacy, but it's very uncomfortable and may cause people to avoid you, if not outright shun you and run you off. If everyone starts wearing masks and covering their hair, then if you do the same you probably have a fairly high degree of anonymity and privacy through it.

In summary, I think Tor is a useful and necessary tool, but nowhere near sufficient for where I think we need to be generally.


> Yes, and you go through quite a lot of effort to achieve that, given your other comment.

That's true. However, it's mostly one-time effort. There are Linux and TrueOS workspace VMs, pfSense VMs as VPN gateways, and Whonix gateway and workspace VMs. All in VirtualBox.

There's ~no configuration required for the Whonix VMs. You just need to point the gateway VM to the pfSense VM that ends the desired nested VPN chain. And if there are multiple Whonix instances, rename the internal network that the gateway and workspace VMs share.

For the Linux and TrueOS workspace VMs, it's just like any OS install. You do have more machines to maintain, but mainly that's just keeping packages up to date. All of the devices are virtual, so you don't have driver issues.

Setting up the pfSense VMs is the hardest part. But once that's done, you can use them for years. pfSense is pretty good about preserving setup for OS upgrades. And there's a webGUI for changing VPN servers. But it's harder than using a custom VPN client.

So yeah, it's not so easy. However, someone could write an app that papered over most of the ugly parts, one that even automated VM setup and management.
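
As a taste of what that app could start from (a sketch only; the VM names and the network label below are made up, not a recommended layout), the pieces are already scriptable through VirtualBox's CLI:

    # Sketch of automating the internal-network wiring described above.
    # VM names and the intnet label are placeholders for illustration.
    import subprocess

    def vbox(*args):
        subprocess.run(["VBoxManage", *args], check=True)

    def wire_whonix_instance(gateway_vm, workstation_vm, intnet_name):
        # Give each persona's gateway/workstation pair its own internal network
        # so traffic from different personas never shares a virtual segment.
        vbox("modifyvm", gateway_vm, "--nic2", "intnet", "--intnet2", intnet_name)
        vbox("modifyvm", workstation_vm, "--nic1", "intnet", "--intnet1", intnet_name)

    wire_whonix_instance("whonix-gw-persona1", "whonix-ws-persona1", "persona1-net")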


I assure you that a clean browser and IP will break any surveillance that I know about.


No. Even with a clean browser and IP, the combination of which fonts you have installed, how your video card renders a canvas and WebGL instance (which may be affected not just by the video card you have, but by the driver version used with it), your screen size, and a few other system-level items that come through may or may not be enough to uniquely identify you. Add linking to a prior profile if you screw up one time (or load a URL that has identifying information they can use), and you're busted.

So, sure, a clean browser and IP and never logging into a site you're previously visiting might be enough, but who does that, and doesn't that halfway defeat the purpose?


You gotta compartmentalize.

My meatspace identity uses a desktop that hits the Internet directly. It displays no interest in technical matters. Just banking, cards, shopping, general news, etc. It never accesses HN, or any of the other sites that Mirimir uses. Or that any of my other personas use.

Mirimir uses a VM, on a different host machine, and hits the Internet through three VPNs, in a nested chain. Some other personas use different VMs on the same host, connecting through different nested VPN chains. Some are Whonix instances, connecting via Tor, and reaching Tor through nested VPN chains.

So basically, each persona that I want isolated uses a different host machine and/or VM, a different browser, and a different IP address.


I appreciate the information-theoretic validity of your argument, but if you think that one of these firms cares enough about your buying preferences to burn enough compute to find that correlation then you either work for the CIA or are mistaken.


It doesn't take a lot of compute resources to keep multiple profiles and, when evidence with a high assurance level appears (a referring URL that is known to designate a specific user of a major service), to link one profile with other profiles that also carry that designation.

To me, that seems par for the course for any service that's generating profiles of browsing behavior and trying to make any sort of decisions based on it. It reduces cruft and duplicate profiles while also providing more accurate information. Why wouldn't it be done?

> the information-theoretic validity of your argument

The portion about canvas, WebGL and AudioContext hashing is not theory at all; it's well-known practice from years ago. Just the other day here there was a story about some advertiser on Stack Overflow trying to use the audio hashing for tracking purposes.

Hell, if you get enough identifiable bits of entropy, you can probably assume weak-to-strong matching using a bit-level Levenshtein distance that's low enough.
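
A sketch of what that kind of matching could look like, using a Hamming distance over per-attribute hash bits as a stand-in for the bit-level distance I mean (the features, hashing scheme and threshold are all made up for illustration, not a description of any real tracker):

    # Sketch of fuzzy profile linking over fingerprint bits; everything here is illustrative.
    import hashlib

    def fingerprint_bits(features):
        # Hash each attribute separately so a change in one attribute only
        # perturbs that attribute's bits, not the whole fingerprint.
        bits = ""
        for key in sorted(features):
            digest = hashlib.sha256(f"{key}={features[key]}".encode()).digest()
            bits += format(digest[0], "08b")  # keep 8 bits per attribute
        return bits

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    visit_a = {"canvas": "9f2c1e", "webgl": "NVIDIA/431.60", "fonts": 212, "tz": "UTC+2"}
    visit_b = {"canvas": "9f2c1e", "webgl": "NVIDIA/436.02", "fonts": 212, "tz": "UTC+2"}

    distance = hamming(fingerprint_bits(visit_a), fingerprint_bits(visit_b))
    print("likely same device" if distance <= 8 else "weak or no match", distance)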


GitHub is always at your disposal. NV doesn't sell the consumer cards to enterprises, so on AWS a multi-GPU box will cost you about 12 dollars an hour. If you can disambiguate, let's just say, 85% of profiles absent IP or cookies, well, I think you just broke the academic SOTA and I'd love to make some calls.

Cheat sheet: you can’t.


I sorta understand the sentiment, given that there are tools like:

https://amiunique.org/
https://browserprint.info/

whose results would have you believe that one's footprint is very unique. I'd be interested in hearing more about why this is hard to turn into an efficient process.


> GitHub is always at your disposal. NV doesn’t sell the consumer cards to enterprises. So on AWS a multi-GPU box will cost you about 12 dollars an hour.

I don’t see how this is related to the claim, since it doesn’t solve the problem. But the advertising company that I let run code on my website will certainly do the job pretty well, I’d say.


I was pointing out that it's a commercially applicable test of a very strongly worded claim, one that I know would be expensive to run because I'm optimizing GPU-intensive code at the moment. I don't know where in this thread I generated so much ill will for trying to add knowledge to the conversation, but I'm not making shit up.


There are tools that will supposedly do this to a high degree of accuracy. Are you saying that they are fake/don't work as well as they'd want us to believe?


not much effort goes into de-duping customers? genuinely curious

I would've thought that would be a pretty useful exercise


> You could either stop using these services or ...

Are you serious? Have you tried not using their services? Try blocking Google Analytics, Tag Manager, ReCaptcha, fonts, gstatic,... What you will see is that you can no longer access much of the Internet. Want to participate in StackOverflow? Good luck if you block Google.

My beef is not with them trying to find my data when I'm on their site(s). They are, however, everywhere, on almost every site I visit. Coupled with their (impressive) technical prowess it is beyond creepy, and there is simply no way one can avoid them.

I don't know what the solution is or will be, but as far as I'm concerned, this should be illegal.


> Try blocking Google Analytics, Tag Manager

Blocking those two doesn't seem to break much, does it? I have uBlock Origin and/or Privacy Badger block them everywhere.

ReCaptcha on the other hand…

Just this week I needed it to complete the booking of an airline ticket and just now buying a high chair for my son. And today I've completed the blasted thing ten times in a row because of a game installer that was failing at a certain point (GTA V's Social Club thing); each attempt to figure out what was wrong meant completing the ReCaptcha again.

Fire hydrants, parking metres, pedestrian crossings, road signs, hills, chimneys, steps, cyclists, buses — that's what the internet looks like in 2019.


Unfortunately politically acceptable regulation only deters new ventures because it makes the costs of compliance too high.

The right vehicle for this is antitrust, but if you think you can sell that in this climate then I’ve got a great deal for you on the London Bridge.


The costs of compliance are not too high. Compliance is actually ridiculously easy for new companies: they need to collect only the data they need. That is all there is to it.



Yes. Your point? It’s actually ridiculously easy to be compliant with GDPR.

Edit: That is, ridiculously easy for new companies. Incumbents have been hoarding data for too long and it was actually harder for existing companies to become compliant.


If you don’t think that lawyer fees scale linearly with regulation complexity you’re either an early Uber employee or mistaken.

When you’ve built a social consumer business in Europe that is profitable after compliance, send me a term sheet.


I enjoyed reading what you said as a different perspective on the backend of ad technology vs privacy up until this comment thread.

I didn't build a profitable social consumer business in Europe after compliance, but I was part of a team that implemented compliance for a long existing company within the US due to them having clients and client's clients in Europe. They're profitable. Do you want my term sheet? Or are you weakly attempting to flex while complaining that people's basic right to privacy is preventing you from earning obscene amounts of money?


As I’ve mentioned I think elsewhere in the thread I left that business in no small part because it didn’t feel right to be in anymore. It was at a significant cost. I’m really lost on where in the thread I started to sound like a shill for business practices I (knowledgeably) don’t care for.


What do you estimate the implementation costs of GDPR are? I've seen some research that put the numbers in the tens of billions, IIRC.

It feels like a regulatory moat for the big players who can afford it. Sorta like a complex VAT policy.


Those numbers are for existing companies who have been hoarding and selling user data with utter disregard to existing laws and user privacy.

If you do everything right from the start, the costs are minuscule.


Why would you need lawyer fees? How is GDPR complex?

It literally is:

- you only store data you require to run your business

- you delete data if customer requested deletion

- you give the customer their data if they ask for it

If your profitable business is built upon selling customer data wholesale to third parties, then good riddance.


Google and Facebook et al store and process PII on non-customers, without informed consent given by users.

It's still early days. We'll see what will happen when the DPAs and the courts have fielded a few high-profile cases.


This! I hope it costs them dearly. I have never (willingly) given them consent to have my data, yet I know they have loads of it, just because other people I know are careless with data about me.


> You could either stop using these services

No you can't. Facebook creates shadow profiles for every single person in the world. If any single one of your friends has WhatsApp, Facebook has your phone number. They have your phone number and the entire address book of your friend, who probably has friends in common. If two of your friends have WhatsApp and they both have your number...

You see where I'm going here? There are pictures of me on Facebook that I did not put there. From friends or friends of friends.

I'm not even scratching the surface of what Google knows with GPS and WiFi connections.

No one consented to any of this bullshit.


There’s a reasonable argument in there, but it applies to any world in which digital cameras are cheap.

This is in a sense the worst kind of argument: superficially correct but really meant to tap into a popular groundswell of sentiment.

The question isn’t “can FB use an off-the-shelf CNN to identify me personally” but rather:

“If it weren’t FB who would be doing it instead?”

and:

“Should cheap digital cameras be illegal?”


> The question is “If it weren’t FB who would be doing it instead?” [...] “Should cheap digital cameras be illegal?”

Those are complete non sequiturs.

Facebook (and Google) analyse every single photo that goes through their system with state-of-the-art ML (it's so good that it almost beat humans at matching faces ~5 years ago). This is a scale of surveillance which the human race has never encountered before in our history[+], and is a serious problem that we (as a society) need to make a decision on. In many countries, car license plates are OCR'd and automatically tracked whenever they travel on almost any main public road. Facial recognition in public places and on public transport is becoming a prevalent problem. And wearing masks is illegal in many countries -- meaning there is no way of "opting out" of the pervasive surveillance in the physical world. None of these things were nearly as commonplace (or even technologically plausible) ~30 years ago.

Cheap digital cameras are a completely unrelated topic. And if such large-scale surveillance was made illegal then nobody would be doing it legally, and those doing it would be held accountable for the public health risk they pose. We don't let people build buildings with asbestos any more.

[+] The Stasi and KGB only really had filing cabinets for tracking people and physical surveillance measures. The Gestapo didn't even have that (the Third Reich had census data which was tabulated using IBM machines in order to track who was Jewish within the Third Reich).


I think you overestimate the degree to which SOTA computer vision is applied to a lot of images online, and I think bringing East Germany into it is pretty out of line.


I think you're feigning offense to avoid addressing the substance of his comment.


There's a very good reason to consider negative outcomes of the past in discussions such as this. Let's pretend companies like Google and Facebook are totally on the up and up; pretend the company that aims to build a user-tracking search engine for China, one that does things including literally blacklisting searches such as "human rights", is on the up and up.

The reason what these amazingly benevolent companies are doing and collecting matters is because the systems we build today are precisely what will power the dystopias of tomorrow. As the GP mentioned, Nazi Germany used census data to select and track their victims, aided by some primitive computational technology built for the Nazis by IBM. In spite of how primitive all of this technology was, it ended up being quite effective at enabling them to achieve their ends.

Now compare this to the systems we're building today. Genuinely bad people do, and will, manage to take power in any system. It's not a question of if, but when. And these systems that we're building will be at their disposal. It's the same reason that in politics, if you're considering granting the government more power, you shouldn't think about today, but about tomorrow. Not "do I want this administration to have those powers", but "do I want future administrations, whom I will vehemently disagree with, to have those powers?"


Most people here can avoid the impact of climate change - do you think we shouldn't talk about that either?

These are societal problems. It's good to care about people beyond yourself, and to talk about the professional ethical responsibilities of software engineers with regards to corporate mass-surveillance.


How about our friends and family? Should we configure a VPN for them too?

Btw, the argument you just made applies to any form of surveillance or censorship. Just because you can still find functional VPN services for China, is China's great firewall OK?

And what happens when web services start blocking VPNs?

Netflix does it quite successfully. And I'm sure Cloudflare could provide such a service for free.


Just so you know, that commercial VPN is almost certainly spying on you.


I’m not making a moral argument for the surveillance state, I wear Curve25519 on one arm and the word “citizenfour” on the other.

I agree that there is a vast and almost impossible to regulate overreach by these companies. Your argument is extremely compelling.

But when HN users complain about being spied on I smell a FAANG rejection letter.


> But when HN users complain about being spied on I smell a FAANG rejection letter.

You’re projecting Ben.


> when HN users complain about being spied on I smell a FAANG rejection letter

I work at a FAANG: here’s my complaint about being spied on.


So your comments should be at the top. You’re knowledgeable on the subject and even if we disagree that should be upweighted in a perfect world.


:/


People care about others, not just themselves.


Unless the topic is affordable housing, that is.


I agree, but search “HN levels.fyi” to understand that we’re in the minority on that.


If you think their argument is compelling, why are you insulting them?

(This is a genuine question. Many of your comments have added to the discussion and I've upvoted them. But I've also downvoted many that haven't.)


> reeks of sour grapes that a mediocre employee at one of these places makes 3-20x what anyone makes (as a rank and file employee) anywhere else

This is not an argument and moreover not even true: there are companies that pay well and don’t collect reams of data on their users.


You didn’t address my argument and unless you’ve been on more comp committees than me then I would annotate that as sources needed.


What argument? What possible position could "if you can't configure a VPN, you're probably mediocre" be an argument for?


Like I said: it’s not an argument, it’s an attack. Plus I’m sure that there’d be many people here able to counter your claim regardless of the compensation number you drop.


A VPN will not help you against advanced behavioral browser fingerprinting like in this new captcha. Not only do they have lists of VPN servers anyway; if you inadvertently log into your Google account once from the VPN (e.g. by launching your browser from your normal account), then the VPN IP(s) will be forever associated with your account and normal IPs, and they already know from the captcha data that you're one and the same person. All the VPN does is add the information that you sometimes use VPN servers of company such-and-such.


This is a ridiculous argument. Advanced technical competency can not be a prerequisite for maintaining personal privacy.


We’re on a site premised on entrepreneurship, and you’re pointing out what sounds like a big market gap. I angel invest now and then, if you have a plausible way to make two billion people care about something that we agree could be better my email is in my profile.

Even from the inside I didn’t see a way, but I’ve been wrong before.


Yes, looks like the industry cannot solve that problem alone, just like the electricity and chemical industries somehow didn't achieve clean air and water out of the goodness of their hearts. Another market gap. Or, wait, a case for government regulation.


Your HN profile appears to be blank, actually. Is this you? https://github.com/benreesman


It takes less time to lock my front door than to configure a VPN, but burglarizing an unlocked house still is and should be illegal.


> or (as I suspect) you find them too valuable to dismiss entirely quarantine

You are wrong. I block the known IP blocks of the big surveillance shops and a lot of the small ones[1].

> sour grapes that a mediocre employee at one of these places makes 3-20x what anyone makes

Are you sincerely saying you believe people who are uneasy about surveillance are just jealous?

[1] Twitter is currently an exception, I was playing with something. But I'm going back to blocking them soon.


> sour grapes

seriously?


search “HN levels.fyi”


I appreciate your comments in this thread. But could you please stop baiting people on this point? If there's one thing I've learned from running HN it's that the generalizations about the community that people come up with are invariably wrong. They're overgeneralized from a small sample of what the generalizer happened to notice—and since we're far more likely to notice what rubs us the wrong way, the results always have sharp edges. In other words, people remember most the things they most dislike, then tar the whole with it. To borrow your phrase, the actual TLDR is less interesting.


Thanks for the mild rebuke dang, I think you do a great job meta-moderating this community.

I wish I had stayed out of this from the beginning, I see no merit in arguing about whether HN has some themes. I’ve been watching it daily for a long time as you can tell from the age of the account.

If you want to do something that would be both a good call as a mod and a favor to a longtime user, just whack this whole thread. I was trying to chime in with some knowledge but just wound up pissing everyone off.


I have to say I strongly disagree—I thought your contributions were excellent, and HN lucky to have you contributing on a topic that you know a ton about. If I contributed to your feeling otherwise then I wish I hadn't posted!

One thing I can offer from years here is: never underestimate the silent readership (I'd say silent majority but...associations). The vast majority of readers don't comment and most don't vote either. It doesn't mean they aren't following and getting a lot out of what you wrote. Usually it's only the most-provoked segment of the long tail that is motivated to respond. That's fine, it's the cycle of life on the internet—but it doesn't represent the whole community.

Please comment more.


> Again, you’re not likely part of that group, but seriously who hangs out on HN and can’t configure a VPN?

Recaptcha tracks users / devices, not IPs. A VPN won't help, it'll only lower your score. At that point: not allowing them to track you just means you can't use large parts of the web.

"You don't want that GPS tracker installed into your skull? Well, we won't force you, of course, but public transportation, government services and most grocery stores can only be used by GPS-skull-people"


As an aside, I read recently that gps is a system where your device reads signals sent from space and does not reveal your location. Very neat.


Wild speculative hyperbole hurts the case of people like you and I who care about doing something positive on the ground today.


Is it though? I'm somewhat lucky, because my government is generally technologically behind and loves literal paper trails, but yours isn't. Plenty of .gov sites use recaptcha. Sure, you can still visit those sites, it's just that, unless you pass a captcha test, they can't verify that you're actually a person (and not a Russian bot) and can't let you do certain things. If you want to use those government services, you need to allow Google to track you, or maybe they'll add a "sign in with Facebook" option so you have a choice.

With invisible captchas, you can't even sit down and solve a higher number of riddles to prove that you're really human and know what a fire hydrant looks like, even though you look kinda strange. If Google doesn't believe that you are human, tough luck. Unless you have a personal connection or a solid Twitter following that can amplify your concerns, nobody at Google cares. Does your government care? It makes their life easier, and normal citizens never really had problems with it.

DHL makes me solve a captcha to login and buy postage stamps. There probably are, or will be, public transportation companies that use recaptcha. It helps them to combat voter fraud (crime, abuse, election meddling, fake news, lots of things) if they know where (on the web, for now) you've been in the last 6 months.

You don't like the "implanting" part, because that's unrealistic? Just wait 20 years, and it may not be your head, but an RFID chip in your hand (yeah, those exist already). Until then, carry your gps tracker around and install their software on it, so it can collect data on your behavior to make sure that you're not a criminal.


It is not "wild speculative hyperbole" not to give the benefit of the doubt to companies that have repeatedly demonstrated that they are not entitled to the benefit of the doubt.


GPS tracker installed in people’s skull sounds hyperbolic to me.


How about people's pockets? It's hyperbole, but not a huge reach.


I think it's worth pointing out that the comment you replied to didn't mention money, advertising, or CTR. People are concerned about data collection for more reasons than that. You've seen these attempts and entire careers about it without "juicing" CTR, so perhaps that isn't the true intent.


I admit that I inferred the proposed intent for grabbing maximum personal data, but if you’re interested in anecdotes from the trenches: no one below senior director level gets a couple million in stock for any other reason than they pushed CTR by a few basis points. What I was trying to say is that seen through the lens of mechanism design no one is incentivized to query the like button table because there’s no upside in it.


I'm not sure I understand correctly. Are you saying that all the personal user data is in reality not as valuable as everyone says it is? That is, all those megacorps are collecting terabytes of mostly useless data?

Then why is this data collected and archived in the first place?


I was never involved in those decisions, but I suspect that when you've got a multi-dollar CPM and your biggest pain in the ass is pouring concrete and running power fast enough, a few PB of spinning disks are cheap enough that you hang onto the data in case you ever find a way to make it useful.


That sounds logical. It’s also exactly the reason many of us don’t want to give up our information to these companies. There is absolute uncertainty as to how it will be used in the future.


Because it costs practically nothing. If the cost is zero and the expected value is greater than zero, then no matter how little value it has it's still rational to collect it.

The problem is that the individual bears a very small risk of something very bad happening: "consider the hypothetical case of a gay blogger in Moscow who opens a LiveJournal account in 2004, to keep a private diary. In 2007 LiveJournal is sold to a Russian company, and a few years later—to everyone's surprise—homophobia is elevated to state ideology. Now that blogger has to live with a dark pit of fear in his stomach."

https://idlewords.com/talks/haunted_by_data.htm

The individual of course gets no benefit from the small chance the company can monetize on this data trove. So even though chances are they aren't harmed at all by this data collection, arguably the expected value of the benefit/harm to the individual is negative (harmful). But that doesn't change the data collector's calculation, of course. That's why government regulation is necessary.
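
The asymmetry is easy to put numbers on (every figure below is invented purely to show the shape of the incentive, not an estimate of any real company's economics):

    # Toy expected-value asymmetry; all numbers are invented for illustration.
    users = 1_000_000_000
    storage_cost_per_user = 0.0001        # dollars: effectively free at scale
    p_data_ever_monetized = 0.10
    revenue_per_user_if_monetized = 0.50

    company_ev = users * (p_data_ever_monetized * revenue_per_user_if_monetized
                          - storage_cost_per_user)

    p_harmful_leak = 0.001
    harm_if_leaked = 10_000               # dollars-equivalent: lawsuit, outing, job loss
    user_ev = -(p_harmful_leak * harm_if_leaked)

    print(f"company expected value:  +${company_ev:,.0f}")
    print(f"per-user expected value: -${-user_ev:,.2f}")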


Yes, that's exactly what he's saying.

It's mostly all collected because it's easier for them to collect it than not to collect it, and nobody is stopping them from collecting it.


The fact that so much potentially sensitive data exists in a few repositories is in itself a bit foreboding. Who knows what companies will be able to glean from it one, five, or twenty years down the road?

My behavior on the web being tracked by corporations with little incentive to do right by me is worrisome.


I’m more concerned that they’re designing the next version of the Web right under our noses than that they know what kind of sneakers I’m 8% more likely to buy.


However, I'm concerned that the data also allows them to see that I am 93% more likely to vote for a certain political candidate, 22% more likely to contract a chronic disease in the next ten years, and 16% more likely to have a friend who is homosexual.


I'm not thinking about ad delivery, I'm thinking about behavioral analysis. Knowing how a person thinks and acts can be a very useful weapon in the wrong hands, and FB and the like have done nothing to make me think their hands are the right ones (I don't think any are really.)


I'm not sure where to add this comment, but I just wanted to briefly say that I appreciate your contributions to this topic. Both in terms of content and tone/delivery. These seem like constructive and valuable comments to me, so thanks!


What other reasons do you think the original post was implying for Google to collect all this data?


I think the implication was that the leadership is hanging onto all that data because of an immediate fiduciary obligation. I suspect it's more that, when you're running a business in which a few hundred million QPS is slow, you archive everything in case it ever becomes useful.


Unless the point of your comment was to deny that Google is collecting this data at all (because, according to you, there's no financial incentive), I don't see the relevancy of your criticism. The complaint of the top level comment was that Google is collecting extremely personal data on us. Your response is that Google doesn't have an immediate financial incentive to do this. If you're not actually denying that Google collects this data, why does that matter? For most of us, the fact that our personal data has some financial value to a corporation is irrelevant to the fact that we don't want them to have it.


That's the annoying part of it. They try to collect everything about me, down to my favorite color and the brand of tea I am drinking, and they can't even deliver a semi-relevant ad. The best they can do is to bombard me with shoe and riding-class ads for 6 months after I search for "weight of a horseshoe" and stuff like that. They kill privacy, they make 99% of sites unusable without an ad blocker, and in the end it doesn't even amount to them making relevant ads...


If I were you I’d be more worried that Google de facto controls whatever we’re calling HTTP these days than that they have a BigTable entry that ties a browser you once used to a preference for Earl Grey.


This is a false dichotomy: I don’t see why I can’t be worried about both.


do you deviate from social norms in any way more significant than tea preference? Have you ever? No extreme political opinions? No fetishes? There's really nothing in everything big tech companies have about you that would be worse for you to have leaked?


I am worried by both. In fact, I am worried by more than these two things about Google, but listing them all would probably take this discussion way off course.


No offence, but your posts in this thread appear to be projections, and they derail the conversation.

The main topic we discuss is corporate surveillance. We are concerned about all the personal data that leaves our control. We are worried that evading this type of surveillance becomes increasingly difficult.

Some HN users may know how to mitigate these risks, but most people may not know how to defend themselves against corporate surveillance.

This is why we must speak up now, and not just for ourselves.


I am falling behind replying to all the comments that this has generated.

For the record I am inked all over with anti-equation group stuff: I agree that these companies are too big and powerful (and I would know).

I just don’t see a solution with the present judiciary. If anyone has a bright idea my email is in my profile.

I will thank you all in advance for not shooting the messenger.


Your HN profile appears to be blank. I think I found your email on GitHub though?

For the record, many of your comments here have been thoughtful, and I've upvoted them. I've also downvoted many where instead of responding to other people's thoughtful comments, you just insult them instead. Those are also the ones that other people seem to be downvoting. I don't think anyone is shooting the messenger here.


I am not experienced at replying to several comments a minute, and I'm sure I made some errors in judgement in the process. But I think this thread is as active as it is because people want to know how this shit works, not because I'm the apex troll of pushing people's buttons. FB and Google are in a dubious market position here, but they take a lot of flak on HN for how highly they pay and how hard the interview used to be.


Can you provide any evidence that personal data doesn't improve CTR prediction for companies like Google/Facebook?

You state yourself that Google/Facebook publicly claim to advertisers that personal data improves CTR prediction. So I have a hard time believing that personal data isn't useful.


I’m already on a shaky limb being so candid about how the business actually works. If you want the opinion (albeit a little dated but still relevant) of someone who doesn’t give a fuck about who the truth pisses off I recommend a book called “Chaos Monkeys” written by a former YC (exited) founder.


> If your browsing history can juice CTR prediction then I’ve never seen it. I have seen careers premised on that idea, but I’ve never seen it work.

Isn't demographic targeting exactly that, based on your browsing history? Will showing an ad for a car wash have the same CTR for people that liked car products as for people that did not like car products? Or is your point that it still has to be a human that inputs "this is about car things, please show it to people that like car things" and it's not a magic AI that optimizes it automatically? And in that case: isn't that just a matter of time? Build the profile today, build the tech that uses it tomorrow?


Part of the point of objecting to big data surveillance is that we ultimately don't know how it's being used, despite what companies claim about its use.

Can I believe you? ...even if you're telling the truth, big corps can hide their most malicious practices from most of their own employees.

To me, it doesn't matter how e.g. Facebook actually uses my data today, because even if they're telling the truth they could change their policies tomorrow, or get hacked, or some third party (incl. the gov't) could get hacked, etc. It's better as a user to try and prevent such data from ever existing in the first place.


> if your browsing history can juice CTR prediction then I’ve never seen it

That's great. So if my browsing history is useless then you won't mind not trying to snoop on it.


Anecdotally, I keep no browser history and do not feel my experience with captchas is any different from that of a user who does.


I would contend that the reason for that is that none of the engineers involved get paid more if that experience is different.


> source: I ran Facebook’s ads backend for years

Why would anyone ever trust a goddamn thing you have to say about their data?

Unless they pay your salary and are asking you to give your expertise on hoarding and abusing user data, obviously.


We've banned this account for repeatedly breaking the site guidelines and ignoring our many requests to stop.

If we allow users to harass and attack people who have genuine expertise for posting here, does that make HN better or worse? Obviously worse. Mob behaviors like this are incompatible with curiosity.

https://news.ycombinator.com/newsguidelines.html


Me spilling tea about the business is far more in the spirit of a whistleblower than a shill.

I have nothing to gain and everything to lose by shedding light on one of the most powerful entities in existence.

But TLDR it’s not as interesting as people like to think.


You can gain internet points on a social website…


Check my join date; I have less than 500 points. If I was just like "foo is evil" every time foo came up, I'd have like 10^3 more.


> Since reCAPTCHA v3 scripts must be loaded on every page of a site, you must send Google your browsing history and detailed data about how you interact with sites in order to access basic services on the internet, such as paying your bills, or accessing healthcare services.

> If you'll refuse to transmit personal data to Google, websites will hinder or block your access.

I wonder how true this really is. 20% or so of web users have ad blockers, and most ad blockers block scripts like Google Analytics out of the box. It isn't hard to see that most of them will not make exceptions for a new Google tracking script. So any site that does any kind of testing at all is going to see that ~15% or so of their users drop off if they block users who don't have a reCaptcha v3 score. The only sane business decision in response to this is to go with some alternative.

(Of course, there will be some sites that continue to block users, it's just that they will mostly be the sites that already block users running ad blockers.)


Even UBO doesn't block ReCaptcha by default, so I don't see Rv3 being added to easylist anytime soon.


It doesn't block it because it's generally not active on all pages of a site. The description of v3 sounds more like Google Analytics and will probably be treated similarly.


I find a v3 block to be unlikely for the same reason v2 isn't in easylist - too much friction for the list users. Websites will likely break completely when performing actions if they don't receive any sort of verify token from the browser. It would probably be best to have a list for recaptcha v3 as a built-in optional filter so that users know they've enabled it and know why websites might be breaking.


On the other hand, the lists haven't removed ad/tracker blocking for sites that block ad-block users; I still see "disable adblock to view this site" occasionally. Keeping rules that result in blocked access to some sites, while letting a Google tracking script through on every page because blocking it would have the same result, would be a huge misjudgment on the part of the list maintainers.


I'm not sure there's a significant legal difference in the end. If someone could demonstrate that alternative browsers regularly get a lower score than Chrome, that seems like a pretty good antitrust case.

Or were you referring to the risk that individuals would sue Google for getting blocked from random, potentially essential websites?


Not GP but they most likely meant the second. The V2 prompt blocks people from accessing services, which could be construed as damages at scale.

You do bring up a good point about the V3 being potential antitrust issue, but that has always been a potential problem even with earlier versions of recaptcha. With V3, it's also deferring the liability to the webmaster. The action that the website takes with the score is up to them - in the end it's just a number.
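For what it's worth, the "just a number" part is visible in the verification step itself: the site posts the token to Google's siteverify endpoint, gets a score back, and everything after that is the site's own policy. A minimal sketch of what that might look like on the webmaster's side (Node 18+ for the global fetch; RECAPTCHA_SECRET and the 0.5 cutoff are placeholders, not anything Google prescribes):

    // Minimal sketch: verify a reCAPTCHA v3 token server-side and act on the score.
    // RECAPTCHA_SECRET and the 0.5 threshold are placeholders chosen for illustration.
    async function isProbablyHuman(token) {
      const res = await fetch('https://www.google.com/recaptcha/api/siteverify', {
        method: 'POST',
        body: new URLSearchParams({
          secret: process.env.RECAPTCHA_SECRET, // the site owner's secret key
          response: token,                      // the token the browser sent up
        }),
      });
      const data = await res.json(); // e.g. { success, score, action, hostname, ... }
      // Google only hands back the number; whether 0.5 means "block", "add friction",
      // or "log and ignore" is entirely the site's own decision.
      return data.success && data.score >= 0.5;
    }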


From the service provider and devops perspective I find reCAPTCHA beautiful. It brings down malicious form fill, form spam, user creation and password brute forcing rates.

Also, as a VPN user, I found that migrating to a more expensive, higher-grade VPN solved a lot of my problems.

In the end it is not privacy, nor your VPN, that matters from the service provider's point of view. It matters that your IP address is spewing malicious garbage. I do not want to spend time sorting that out, as I'd rather focus my activities on revenue-generating tasks. Harming some cheap-VPN users in the process is collateral damage, but I'd rather take it than build a form with perfect attack mitigation at 10x the cost.

I hope to see an alternative to reCAPTCHA that does not come with such strong privacy risks. hCAPTCHA https://www.hcaptcha.com/ seems interesting, also from a monetization point of view. But they are not yet a well-established company and I do not know what other risks their approach would bring.


I don't even use a VPN and have lots of issues solving google's captcha...


Potential other causes

- Your ISP is a source of a lot of malicious traffic

- You have some browser extension or other adjustments that makes it harder to analyse you as a genuine web browser

For example, using browser automation like Selenium for testing triggers the "hard" reCAPTCHA. I'm not sure if this is because of the automation API that Selenium exposes, or just because your browser profile looks virgin (no cookies, no prior reCAPTCHA solves).
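On the "automation API" guess: WebDriver-based automation does announce itself through navigator.webdriver, which any page script can read, so detection doesn't have to rely on cookies alone. A purely illustrative sketch of the kind of signals an anti-bot script could collect (not reCAPTCHA's actual code):

    // Illustrative only - a few browser-side signals an anti-bot script could read.
    function automationHints() {
      return {
        webdriver: navigator.webdriver === true,   // true under Selenium/WebDriver control
        noPlugins: navigator.plugins.length === 0, // headless profiles often report none
        noCookies: document.cookie.length === 0,   // "virgin" profile, no prior solves
      };
    }
    console.log(automationHints());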


I use pretty standard extensions... uBlockO, decentral eyes, smart referrer... I just wish that companies would stop using Google's reCAPTCHA service.

Also my IP address rarely changes and I don't think that any malicious traffic is coming from it.

And I have Comcast, so I hope that they didn't blacklist all of us...

(I did talk bad about Google a few times though, maybe that's it)


Those aren’t extensions that an average user would install.


Just Smart Referer alone is a likely culprit. Masking or having no referer is a prime attribute for low-level bots.


Oh... so I should not be able to use any websites because of the extensions I use?


You should not be able to use any website that the host doesn't want you to use. That seems pretty straightforward. There's a strong correlation between profiles that look like yours and bots. Why should the web admin do free labor for you to put together a sufficiently nuanced bot-detection system to tell the difference, when the one they have is clearly good enough for them?


Stop using Smart Referer. It has no legitimate purpose; referrer URLs are not the problem. You look like a bot, and you're going to get locked out of sites.

If you're actually concerned about that kind of data leakage, you want NoScript, full stop.


> Since reCAPTCHA v3 scripts must be loaded on every page of a site, you must send Google your browsing history and detailed data about how you interact with sites in order to access basic services on the internet, such as paying your bills, or accessing healthcare services.

I don't believe this is true. You only need to include the JavaScript on pages which actively use the reCAPTCHA score. For example, you might only include it on the login and user registration pages.


Did you read the article?

> To make this risk-score system work accurately, website administrators are supposed to embed reCaptcha v3 code on all of the pages of their website, not just on forms or log-in pages.


Google recommends that you include the code on multiple pages, however, it makes it clear in the official docs that this is absolutely NOT required for the reCAPTCHA v3 system to work.

So if the article stated that websites were required to put the code on multiple pages (as the comment I replied to did) then the article is factually incorrect.
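For reference, the per-page integration described in the official docs is just a script include plus a call to grecaptcha.execute, so it really can be limited to the login and registration pages where you actually want a score. A rough sketch based on those docs (YOUR_SITE_KEY, the "login" action name, and the hidden field id are placeholders):

    // Loaded via: <script src="https://www.google.com/recaptcha/api.js?render=YOUR_SITE_KEY"></script>
    // YOUR_SITE_KEY, the "login" action name, and the field id are placeholders.
    grecaptcha.ready(function () {
      grecaptcha.execute('YOUR_SITE_KEY', { action: 'login' }).then(function (token) {
        // Attach the token to the form so the server can pass it to siteverify.
        document.getElementById('recaptcha-token').value = token;
      });
    });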


Isn't the idea that they can decide whether it's a user or a bot based on what the user does in general, not just whether their browser executes JS on this page that you want to protect?

Running headless Chrome is trivial, so having the script sit only on the one page you need to check won't help much. Collecting more data on the user's actions across your site provides a much clearer picture, much like video of somebody walking through a store helps you decide whether they're trying to steal something far better than a single picture of them standing at the checkout.


The big "if" here is whether or not Google is actually factoring the user's activity into the score. For all we know, there could be an 80/20 split between "Google account activity" and "human-like behavior on the website" when Google outputs a trust score.


Is it that different from the way Google Analytics works?


It is, in the sense that it's easy to disable Google Analytics by disabling tracking in Firefox, and there's no consequences. If a website uses reCAPTCHA, and you have tracking disabled, the website will break.


Works for me. Assuming one uses something like Privacy Badger, and it were programmed to block reCaptcha, the websites that require it will go the way of anti-adblocker popups: people will simply say no, hit the X, and go to their competitors.


Sure, my gov (Brazil) uses reCaptcha on the page where you can check your electoral status (For example: if you can vote, where, and if not, what is missing). Where can I find a competitor for that?


Ask your political representative why they're relying on a foreign ad service to manage their government websites.


You should expect a similar impact on your privacy.

The important difference is that unlike Google Analytics, reCAPTCHA v3 is inescapable. You cannot prevent the collection of your personal data, because then you would lose access to large portions of the web.


You can’t block recaptcha!


Why not? Is it always self-hosted?


I think they meant "you can't block reCAPTCHA and still access services behind it" - technically you could add a rule to uBlock Origin etc. to block it, but then you'd be unable to use those site/services.


> Since reCAPTCHA v3 scripts must be loaded on every page of a site, you must send Google your browsing history and detailed data about how you interact with sites in order to access basic services on the internet, such as paying your bills, or accessing healthcare services.

From a technical POV, how does one access a user's browsing history from client-side JavaScript? Isn't that something the browser should protect? Or do you mean that since reCAPTCHA gets loaded on each page, Google can track what that IP is visiting by where reCAPTCHA gets loaded?


If the script is loaded in the host site's context (it would have to be), it knows your current location, and because it runs on every page it effectively sees your browsing history on that site. On the first page visit it can also identify what website sent you there via the referrer, and it can register event listeners to watch what links you click and how you interact with the page.
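To make that concrete, here is an illustrative sketch of what any script embedded in the host page's context can read with plain DOM APIs. It cannot see your cross-site history, but on a site that loads it on every page it effectively observes your whole visit (generic browser behaviour, not reCAPTCHA's actual implementation):

    // Standard DOM APIs available to any script the site embeds directly.
    const visit = {
      url: location.href,            // the page you're on right now
      referrer: document.referrer,   // where you came from, on first navigation in
      cookies: document.cookie,      // this site's non-HttpOnly cookies
    };
    // ...and it can watch behaviour as you browse:
    document.addEventListener('click', (e) => {
      const link = e.target.closest('a');
      if (link) visit.lastClickedLink = link.href;
    });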


"reCAPTCHA v2 is superseded by v3 because it presents a broader opportunity for Google to collect data, and do so with reduced legal risk."

And if you use something to prevent tracking - in my case Brave - reCAPTCHA is a huge pain that often takes dozens of clicks to make it through - delayed by Google to wait out bots.

Sometimes I think reCAPTCHA's main goal is to bring those opposing tracking back into the fold of Chrome through painful captchas.


This comment has been detached, originally it was a reply to https://news.ycombinator.com/item?id=20295333.


In other words, this is a call-out to all webmasters:

please consider not using reCAPTCHA.


The number of sites asking me to identify fire hydrants and traffic lights has dramatically increased since I turned on the content blocking that ships out of the box in Firefox. I had already been blocking aggressively before (a combo of uBO, Nano Defender and Steven Black's hosts file), but since turning this on inside FF... there is no peace.

Anyone got some URLs I can use to block all captcha attempts, or does it mean I also have to sinkhole www.google.com[1]?

(I don't have a problem with not being able to access captcha-enabled sites.)

[1] A quick check tells me I would have to banish this endpoint, which sucks because I'd have to parse the URL on every request and can't do it in DNS: https://www.google.com/recaptcha/api.js
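One possible workaround for the DNS limitation: uBlock Origin's static filters do match on paths, so something like the untested rules below should catch the reCAPTCHA script (and the gstatic resources it pulls in) without sinkholing all of google.com. Expect any site that insists on a token to break:

    ||www.google.com/recaptcha/$script
    ||www.gstatic.com/recaptcha/$script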


If the v3 script is supposed to be installed on all pages of the website, in order to track the user's actions, I don't understand how that can be done without explicit user consent under GDPR.


As long as the ONLY processing of the data is for fraud detection/prevention, then GDPR specifically allows it as a “Legitimate Interest”

Recital 47: “The processing of personal data strictly necessary for the purposes of preventing fraud also constitutes a legitimate interest of the data controller concerned…”

Recital 71: “decision-making based on … profiling should be allowed where expressly authorised by … law … including for fraud or tax evasion monitoring and prevention purposes”


First of all the "legitimate interest" part only works if the publisher can prove that the user data is only used for the stated purpose.

The fact that a third party server handles this is a problem. Because then the publisher has to have a data processing agreement in place with the third party.

This is what makes Google Analytics problematic too. The collection of analytics for improving the service can be a legitimate interest, but the data processing amendment for Google Analytics basically passes the blame to the publisher. I don't think many publishers read Google's data processing amendment carefully, otherwise they would drop Google Analytics. Actually, most publishers aren't even GDPR-compliant for more serious reasons, like not anonymizing the user's IP or sharing data with Google for the purposes of ad targeting.

And there are many questions to be asked here.

Is that data private, for the use of the publisher in question, or is this a shared pool of knowledge between publishers?

If the latter, then we have a problem, because even if there is a legitimate interest, it only applies to the publisher being visited. Can a user be blocked due to a profile that was built on another website? We are in murky waters.

---

Then there's always the question ... does the publisher really have a legitimate interest?

Claiming that you can have one under the law, doesn't mean you actually have it. There's a set of conditions that you have to comply with.

For example for the purposes of preventing fraud, at the very least you have to be able to show that fraud is possible. Just because you have a login form that's about managing the user's color preferences on the website doesn't mean that you can transmit the user's traffic to Google.

The requirements for legitimate interests are hard to comply with. And I have a hunch that in this case many websites won't comply.


There are a lot of sites that are totally unusable on Firefox, regardless of how much you use FF.

I do all my mobile browsing on FF, yet when I try to use some websites I always get this "Recaptcha failed" error(1), while it works flawlessly on Chrome even though I hardly ever use it. Try it; maybe it will happen for you too.

Same happens on most sites which show you that "checking your browser" page via cloudflare too.

The web is very unusable unless you're using chrome because of such antics.

(1) https://cdn3.imggmi.com/uploads/2019/6/27/0dd96b25707ce6e236...


It's even worse when you're running a VPN (especially one of the major public ones). When I see reCAPTCHA I basically give up as sometimes I have to go through 6 or 7 full sets to be let into a site. It's the evil of the internet this.


reCAPTCHA on VPN is difficult, but on the Tor network, they are downright impossible. I've never been able to get past it, even after a few dozen painful attempts. That means Google services are entirely off-limits over Tor, even Search, which is a disgrace.


> That means Google services are entirely off-limits over Tor

If only it was Google services alone. CloudFlare loves serving up a ReCAPTCHA for Tor users before they can even passively read site contents. That hugely expands the damage done.


Install the PrivacyPass Firefox or Chrome extension. It was developed by Cloudflare, Firefox, and Tor in partnership. It has you answer a ReCAPTCHA and using some crypto magic, generate a bunch of CAPTCHA bypass tokens that can't be traced to your specific computer.

https://support.cloudflare.com/hc/en-us/articles/11500199265...

https://blog.cloudflare.com/cloudflare-supports-privacy-pass...

https://blog.cloudflare.com/privacy-pass-the-math/

https://github.com/privacypass/challenge-bypass-extension


Does not work with Tor.

The plugin requires "privacy passes". Those passes can be obtained by solving captchas, but when trying to do so, one is greeted with this message about being blocked: https://i.imgur.com/qXJfl6J.png


Slightly off-topic, but the users who use Tor regularly, how do you do that? For me, it has been terribly slow every time I tried to use it.


On Tor I get roughly 700 KB/s speeds, which isn't terrible for me


And what is that compared to any regular browser?


Try rebuilding your Tor circuit when this happens.

https://tb-manual.torproject.org/managing-identities/


This sort of breaks tor though, doesn't it? Tor works really well if you stay on the same circuit for a while since it reduces the chances you have a compromised circuit. If you start getting recaptcha to block every exit node except those you control, you essentially have amplified your effective strength on the tor network.


Tor is already broken for an adversary with that capability.


This sounds pretty good, but you still have to pass a captcha in order to get a pass, and sometimes that is impossible (or at least I just give up because I lost interest after 20 puzzles).

If it was developed in conjunction with Tor, how come it doesn't come bundled with the Tor browser or Tails?


They have a patent on giving out unbeatable challenges when the computer thinks it's dealing with a 'malicious agent'.

https://patents.google.com/patent/US9407661B2/en


So if you're running the wrong combination of addons/VPNs/browser you're denied access to half the web because Big G says so? And now they're aggressively pushing sysadmins to install silent data harvesting scripts on every page of their sites? WTF more will it take to get people interested in breaking up these monopolies?


Not denied access, they just keep serving up challenges to you until you give up, WAY worse than saying "You appear to be a bot, sorry".


I hear Google is a fun place to work and that they pay well.

Until software developers care -- nothing will happen.


Were Carnegie Steel or Bell fun places to work? Probably, they had cash spilling out their ears.

Monopolies need to be broken up because they threaten the free market and consequently our way of life - not because employees revolt.


From what I've seen (and most of it's anecdotal) things do appear to be changing. There are already people who won't go anywhere near Facebook now for personal ethical reasons, and even concerns that it might hurt future career prospects.


That's Juniper, not Google.

Juniper patented saying "No" to a client.


oh, sorry about that, it's conveniently grayed out in the corner :)


Tor users don't want to be running reCAPTCHA at all. There's a few privacy problems for people who run that or other ambitious cross-site snooping. Usual stuff (requests, cookies, JS fingerprinting, etc.), behavioral fingerprinting, and very detailed monitoring of what information you were accessing/reading and possibly even entering.


You can hardly blame anyone for blocking Tor traffic. You might not be using it for abuse but a large volume of abuse originates from it.


>You can hardly blame anyone for blocking Tor traffic.

Yes I can and do. It's bad enough that some websites won't let you do certain things over Tor, but preventing access to the website entirely is unacceptable. I made this account and comment entirely over Tor.

I don't see how it's okay to block Tor. That generic claim is made, but how are your spam measures doing if you couldn't handle Tor spam?

>You might not be using it for abuse but a large volume of abuse originates from it.

There is infinitely more ''abuse'' coming from Google, and yet it seems most every page I visit contains Google malware.

On principle, I hold the idea that Tor should be a first-class citizen and not disadvantaged in any way. Notice that Google's ''HTTP/3'' is over UDP, which Tor doesn't work with; I don't find that a coincidence.


https://blog.cloudflare.com/the-trouble-with-tor/

> like all IP addresses that connect to our network, we check the requests that they make and assign a threat score to the IP. Unfortunately, since such a high percentage of requests that are coming from the Tor network are malicious, the IPs of the Tor exit nodes often have a very high threat score.


And that will never change if significant services keep blocking Tor users. Thus we have a feedback loop effectively fighting privacy...


Somehow I doubt most Tor users are really just in it for privacy for general browsing, especially since it's so slow and limited. You can get a VPN for that. Unless you're a total privacy purist, there's not much incentive to use Tor unless you're buying drugs/something else illegal or just curious to look around the dark web.


Tor is free with no signup / cc required. This makes a huge difference, especially for younger users. Did for me back then, at least.

Initially it was slow, yes. But totally fine the last few years for normal browsing and reasonable downloads. Speedtest.net, speedtest.googlefiber & fast.com just now gave me 5, 6 & 10Mbps for whatever server in Ghana i got. Only the high ping causes loading times to still be a bit annoying.

But right now the biggest reason not to use Tor for anything "legit" is the many services blocking you, since indeed most current Tor users are not what those services want and the race to the bottom of Tor will continue, if we haven't reached it already.


Tor is slow if you're used to browsing with a 50 MB internet connection.

My own connection doesn't go over 1.6 MB/s download speed, and only if the weather is clear and I have the wind at my back.

You can now achieve 500 KB/s or more on most Tor connections, which is enough for a comfortable browsing experience, IMO.

The real downside is the Google captcha, which sometimes won't even let you attempt to solve a captcha in the first place, even for web pages where there is no user input.


>You might not be using it for abuse but a large volume of abuse originates from it.

Given that Tor is a tiny percentage of Internet traffic, most of the abusive volume out there has little to do with Tor.


Sometimes Google will flag you as a robot and never allow you to pass, no matter how many you get right. It's total horseshit.


I'm assuming you are not logged into a Google account during this? What happens if you create a throwaway Google account while on Tor? Or is that also impossible?


I remember that these days google requires a phone number. Finding a throwaway number is hard, especially in some countries.


I find they don't want a phone number if you sign up to youtube and opt to create a new gmail address instead of providing an existing email addr. Whether this works consistently, though ...

edit: also didn't try it over tor


Thanks for verifying that for me. I thought it was just me being horrible at figuring out what they want.


No, the same thing happens to me. I often run ProtonVPN + Firefox with uBlock Origin and a couple other privacy-related addons.


I prefer to see the silver lining in this. If Google wants to break the web for Firefox, fine. I'll keep using (and evangelising) FF, and the sites that are broken won't get FF traffic. I believe that FF is doing the right thing for users, and Google, while in a powerful position, is currently on the losing side of history with respect to privacy. Apple is taking that fight to them, and putting budget into convincing average internet users that privacy is cool and that Google abuses your privacy.

The walled garden approach worked for a while for Microsoft, and it's working for now for Google, but eventually, it stops working. Once people leave, walled gardens keep them away.


Only, the majority of the Internet isn't a walled garden, is it? It's more like a minefield, because you don't know whether a site is going to use reCAPTCHA and block or hinder your access.

You can't just opt out of using half the Internet because you value privacy, and nor should you have to. This requires legislation to stop.


As long as government sites don’t use it, it’s fine. You don’t need legislation for every single offense, perceived or otherwise. If you don’t like ReCaptcha, block it with an ad blocker and if a site requires it, don’t use it and let them know why. Also let all your friends and random strangers on the internet know why.


I have the same experience, some pages don't work on FF but fine on Chrome. I like to apply Occam's Razor, but with so many users it seems to me as if that's either by design, or certainly there is little desire to fix the issue.


The worst part is my Chrome installation is 100% fresh with no browsing history, while FF has cookies and history going back more than a year... and still Google trusts Chrome more than FF?


If they looked for identifying information in cookies or browsing history, people would be even more upset, and spammers would just simulate it with browser bots... which is why I believe it takes a black-box approach to each detection, regardless of external state. Besides, obviously, the cookies set within the reCAPTCHA iframe itself.

This of course doesn’t help explain why Firefox is so heavily targeted by what’s supposed to be a neutral utility like Google Analytics...


I've heard that being signed into your Google account can make the challenges simpler, presumably reducing things like the noise and the slow-fade load animations.


That too could be isolated to a single reCAPTCHA session, keeping within the scope of a single iframe or page load.

The idea of tracking your history across multiple reCAPTCHA loads across multiple domains to build a user profile is what sounds like a giant privacy red flag, even though it's entirely possible given the current implementation.

Additionally asking hosts to include JS directly onto their domain which sets 3rd party cookies/data across every page in addition to tracking referring domains is equally a bad idea. reCAPTCHA 2/3 does require loading 3rd party JS directly on page, which I'd imagine is necessary to create callbacks in the frontend upon verification (as iframe content messaging is very awkward):

https://developers.google.com/recaptcha/docs/v3

Ideally the JS simply loads an iframe of the captcha HTML and handles the callbacks from events in the iframe. That's it. It shouldn't be touching anything else on your website. I'd be curious to see a reverse engineering to see how much the JS really does...
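For comparison, the iframe-only pattern being described would look roughly like the sketch below: the host page embeds nothing but an iframe and listens for a postMessage callback. All the names and origins here are hypothetical; this is the generic pattern, not how reCAPTCHA is actually wired up:

    // Host page: embed only an iframe and listen for its verification message.
    const frame = document.createElement('iframe');
    frame.src = 'https://captcha.example.com/widget';        // hypothetical widget origin
    document.getElementById('captcha-slot').appendChild(frame); // hypothetical placeholder div

    window.addEventListener('message', (event) => {
      if (event.origin !== 'https://captcha.example.com') return; // only trust the widget
      if (event.data && event.data.type === 'captcha-token') {
        onVerified(event.data.token); // hypothetical callback into the host page
      }
    });

    // Inside the iframe, after the user passes the challenge:
    // parent.postMessage({ type: 'captcha-token', token }, 'https://host-site.example');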


To be fair, it's not super-hard to follow the incentives there...


reCaptcha isn't able to read your non-Google cookies or history, so most of that isn't being considered.


https://codelabs.developers.google.com/codelabs/reCAPTCHA/in...

Yeah, no. It certainly can read non-google cookies on the page (not httpOnly cookies, though).


I'm not sure what the link is meant to show, but "cookies on the page" is very different than the years worth of user history and cookies that GP mentioned.


I was under the impression sites using Google Analytics were included as a reCaptcha signal.


The signals aren't documented (for obvious reasons), but I'd be surprised if Google Analytics were a signal. These things are usually kept separate, and Analytics is a lot less user-specific under GDPR as the anonymizeIP flag is now very common.

That said, I've no evidence one way or the other!

My understanding is that it comes down to information they can read about your browser (does this look like a bot environment?), and heuristically how the user has behaved since the JS has been loaded (mouse movements, time between actions, etc).


I know if I were running a Mechanical Turk or bot farm, I'd be using a Chrome user agent via Puppeteer. I'm not sure WTF they are doing other than being malicious against non-Chrome browsers.


Also Google punishes Firefox users by forcing them to click on pictures 2-3 times more than Chrome.


Same with Brave: I'm logged in into a gmail account and a custom domain hosted on gmail, yet every time there's a reCAPTCHA widget on the site, I have to do it 2 or 3 times before I'm let in.

One trick that seems to help fool that awful piece of tech: click slowly on the images, as if you were thinking a second or two before each click. Maybe click a wrong image and deselect it again. In other words, behave like a slow human, and it seems to work better than if I solve it as quickly as possible.


I also have the feeling that making mistakes — selecting an image that looks like a traffic light but isn’t — sometimes results in faster admittance than being surgically accurate.

Again, being slower and more error prone seems to be rewarded.


I don't even know what the right answer is in a lot of cases. There's a bit of the traffic light casing at the edge of a square, does that count as a traffic light, or only the lamp itself?


The right answer isn't what it's looking for. It's more looking at the way you try to work out the answer.


Which I intentionally and repeatedly fail anyway, because I'm not training Google's AI so they can sell it for use in drone strikes.

If this reduces the world Google allows me to access, it doesn't diminish mine because of it.


Other than the occasional reCAPTCHA gaslighting (which does occasionally block some service if it gates logins behind it) that we're all familiar with, I have completely excised Chrome from my life and am able to go to most any website without issue. That's with uBO and Privacy Badger running


That's odd. I never had issues with it on Firefox. Most of the time I just check the box and it's happy, sometimes I have to do an image puzzle. And that's with ublock origin. Maybe it depends on country or isp? My work place has its own /16.


Kickstarter's login has not worked for me with firefox for over a year. People just don't test with anything but chrome and it shows.


Funny enough I had to wait on the 5 second Cloudflare check to access that image. However I am using Chrome. That check I have found to be rather annoying. I assumed it would do it once, but it seems I have to go through it daily on sites I use regardless of which browser or device I use.


That image link is protected by cloudflare too! Irony intended?


The sad and depressing state of the Web in 2019: proprietary third-party JavaScript and cookies are required to view a simple, innocent JPG.


And it'll only get worse. From the article:

> "If you have a Google account it’s more likely you are human"

So, in the future, if we don't stay signed into our Google account (and let Google know every article we read and every website we browse), we'll be cut off from half of the internet or even more. The amount of control a handful of companies have over the internet is suffocating to think about!


I have better luck if I log into Gmail :(


You can view your reCaptcha V3 score here: https://recaptcha-demo.appspot.com/recaptcha-v3-request-scor...

I get .7 on my iPhone, I’m guessing that my liberal use of Firefox containers and the cookie auto-delete extension on my desktop will give me a much lower score and cause me to have to jump through extra hoops at websites that implement it, just like the reCaptcha V2 does.

Edit: I also got 0.7 on Firefox with strict content blocking (which is supposed to block fingerprinters), uBlock Origin, and Cookie AutoDelete. I get 0.9 from a container which is logged into Google.


With Firefox fingerprint resisting turned on and with Ublock Origin/UMatrix, I get a score of 0.1. And I'm not even on a VPN; I'm sure on my home network I'd have an even lower score.

To me, it feels like Google's entire strategy behind reCaptcha is to make it harder to protect your privacy. We've basically given up on the idea that there are tasks only humans can do, and to me V3 feels like Google openly saying, "You know how we can prove you're not a robot? Because we literally know exactly who you are." I don't even know if it should be called a captcha -- it feels like it's just identity verification.

I don't think this is an acceptable tradeoff. I know that when reCaptcha shows up on HN there's often a crowd that says, "but how else can we block bots?" I'm gonna draw a personal line in the sand and say that I think protecting privacy is more important than stopping bots. If your website can't stop bots without violating my privacy, then I'm starting to feel like I might be on the bots' side.


> it feels like Google's entire strategy behind reCaptcha is to make it harder to protect your privacy

For the irony, I'm still logged into GMail and it still works perfectly, as basic HTML, even with google.com forbidden to run scripts. But it's the flippin' reCaptchas all over the place that make me temp-allow google.com, and then a reload later, temp-allow gstatic.com and reload again. Only then I get to use someone else's site normally, and I can disallow again... it's irritating. And then, this.

BTW that page plainly says the scores are samples and not related to reality. Refresh a few times and watch it change. 0.3, 0.7, and 0.9 seem to be my lucky numbers. I see everyone else getting those and 0.1.

Please stop reading things into it oh it's too late. Maybe they suddenly started seeing this page hundreds of times in the referrer and added that bit afterward, I don't know.


Dunno if it's changed recently or if I just didn't refresh enough before, but I'm now seeing basically random numbers as well.

If anyone wants a fun weekend project, I would love for there to be a few public sites I can reliably check my production score on.

I'm not sure it matters though, since I'm just ignoring most sites that use reCaptcha now. For sites I can't ignore, I've taken to emailing them with my requests instead -- recently I tried to use Spotify's internal data export tool and it wouldn't let me past. If you're not going to let me use a website to manage my existing account, then your support team can do it for me.


I see 0.9. I loaded https://recaptcha-demo.appspot.com/recaptcha-v3-request-scor.... several times and the score did NOT change.


Not sure how much Ublock Origin makes a difference. I have a score of 0.9 with it turned on.


I think this score is fishy. Ran the test three times and got three different scores.


I get the exact same score no matter what browser I use, despite uBlock Origin & Privacy Badger & Decentraleyes, even in private mode and with a VPN connection from a country I normally don't use. Hmmmmm...


When I just keep reloading, I get either 0.9 or 0.1. I get 0.1 more often. Interesting.

Maybe some browser extension can monitor the score and tell me what it currently is on each page load, when reCaptcha is used on some website. I'd just keep reloading, until it's good, and then try the captcha.


Same. FF dev, uBlock, Decentraleyes

Changing the FF content policy from Standard to Strict appears to have no impact on the score.

Opening in a Private window drops it to 0.7 for me. I have a bunch of add ons allowed in Private Browsing, so not surprised it only dropped a little.

Of course, if you have 3rd party frames and scripts disabled globally via uBlock, it doesn't even load.


Ublock Origin + NoScript on FF 60.7.2esr and got 0.9 as well.

[edit] tried in a private window and got the same score.


Does it change if you set privacy.resistFingerprinting=true in about:config?


FF private window + UBlock + Resist Fingerprinting = 0.1 for me

In my main FF window with UBlock + Resist Fingerprinting, logged into a ton of Google accounts, I also got 0.1

Going to guess that without fingerprinting data they are probably going to give you a 0.1.


Do you need to restart FF with that? After setting it to true and using a private window, FF still registers a score of 0.9.


First try in Vivaldi's private mode got me still a 0.3 . Then I tested it while being logged into Google and it went to 0.9 . However, when I tried it again in private mode, I got 0.9 there too. Temporary fingerprints show quite the effect.


I also get 0.1 with the same config as you, except that I had uMatrix disabled (which if anything, should improve the score in Google's eyes)...

So why are they having you solve image puzzles if they know they are going to fail you, even if they know that you are human...


Firefox Focus, 0.3. You seem to have triggered something outright penalising.


It seems totally reasonable that Google knows you're not a bot if you have a Google account. This isn't the problem, although it hides the problem.

The problem is that they aren't trying harder for users who aren't logged in.


I'm just waiting for the AI-generated fake people and whatever way they will come up with to monetize that!


Your privacy isn't nearly as important as you think, and as long as you continue to overvalue it, you'll continue to be unwilling to trade it for convenience.

That's on you, not Google.


Using Firefox with uBlock and Cookie-Autodelete I get 0.1

Using Chrome, even incognito and with uBlock I get 0.7

(╯°□°)╯︵ ┻━┻. F you, Google, this is blatant bullying and technically unjustifiable abuse of your stranglehold over the whole web platform.


To offer a different datapoint:

On FireFox with uBlock on and logged into my corporate gmail I get 0.9, switching to a private tab I get 0.7. This is with every privacy setting turned on in the FF options.


I also have a similar result (0.7) using my browser at work. I am using containers, uBlock, privacy badger and auto-delete cookies.


> NOTE:This is a sample implementation, the score returned here is not a reflection on your Google account or type of traffic.


This comment should probably be higher up in the thread.


It is both funny and sad to read this thread.


Using chrome on my phone I get 0.9, but if I switch to Firefox I get 0.1.

This is essentially going to let Google gatekeep the web if you aren't using their services.


Really? I don't think so. I get a 0.9 on Google Chrome, and a 0.7 on Firefox. I heavily use Chrome and I have not used Firefox apart from maybe testing some local websites. Despite this I still got 0.7 on there. I expected lower since I don't use the browser.


On a flip side: you really should check privacy settings in your Firefox, it seems Google can track you easily there. ;)


I use Firefox with Google container and uBlock Origin and Privacy Badger and also get a score of 0.7

How can I get better privacy settings?


I was being sarcastic - high score on captcha probably means G knows too much about you. That said, I don't think the scores are reliable. It is possible (probable even) that G is still running experiments.


I get 0.1 continuously, possibly because I have resist fingerprinting enabled in Firefox. I'm not changing anything to compensate that score; it shows I must be doing something right. If I encounter a reCAPTCHA I will continue to (usually) just leave the site it's on.


Same, the way to look at a low score is "I'm getting privacy right".


Contrary to the results here, using Firefox + uBlock with DNT and tracking protection enabled, I get a score of 0.9. In private browsing mode it's 0.7.

I wonder how many people here are using a VPN or accessing from a non-western country -- I'd bet those are much bigger factors


Were you logged into your Google account? That seems to almost guarantee a .9


Yes, although not when private browsing of course.


FF logged into Google account: 0.9

FF incognito window not logged into Google account: 0.7

FF incognito window not logged into Google account through VPN: 0.3

FYI I have uBlock, pi-hole and a bunch of privacy widgets enabled


This looks like a RNG: I got 0.7, 0.9, and 0.1 successively. It can't make up its mind whether I'm almost certainly not a bot (0.9) or almost certainly a bot (0.1)?


Perhaps the rapid, repeated identical requests outweighed the initial factors which gave you a positive response


Might very well be. I also get errors on hacker news about "can't process requests that fast". When asking about it (initially because I thought votes didn't work randomly), the limit is a few requests per second. Turns out I click faster than that, either by reading a whole comment thread and making up my mind whose comments were most helpful (to upvote all at once) or by navigating too fast.


from the link

>the score returned here is not a reflection on your Google account or type of traffic

I got random scores as well. It looks like this is just a sample of the data structure that the service returns, not the actual score.


That would be a useless site, but that's not how I read it. I understand it as "this is not that Google thinks your account is a bot, it's that this request might be made by a bot. And since you didn't use this site as a normal website, it also doesn't score your type of traffic, just this one request". You might be right, but it really does seem to be doing a request to their API.


>That would be a useless site

looks like it is a demo of the API for people wanting to consume it. knowing what the payload looks like is not useless at all in this case.


Documenting requests' format and their return values is documentation and doesn't require an interactive site that looks totally real and makes you expect a real (rather than a dummy) answer. Which is not to say it's impossible, but it would be weird/unlikely. Usually when there is an example api request in documentation, it's a real (live) request, too, and this isn't even a documentation page.


> This looks like a RNG

Come on, how is everyone in this chain so blind. It's literally in bold and the single largest block of content on the page:

NOTE:This is a sample implementation, the score returned here is not a reflection on your Google account or type of traffic. In production, refer to the distribution of scores shown in your admin interface and adjust your own threshold accordingly. Do not raise issues regarding the score you see here.


> Come on, how is everyone in this chain so blind

Please see the sibling comments (that were there before yours) where this is already being discussed, before being insulting.


I too got 0.1 even though I'm not on a VPN, and have a stock FF installation with just uBlock addon. I think my ISP may have some part in it but still 0.1 score is 100% bot right?

I'm also logged into google and fb which also doesn't affect my score. Only shows how broken their algorithm is :(

edit: just tried it with chrome and my score jumped to 0.9! So definitely not my ISP. It's just my browser that Recaptcha doesn't like. If you put two and two together that's really evil shit, even for Google!


I got 0.7 on FF, 0.3 on Opera and Chrome, all in incognito mode. Maybe they have just a few values and return them based on AND/OR logic over 2-4 variables. Or maybe they are just playing around trying to gather some stats, for some "Don't be Evil" purpose!


Google putting a number on us is honestly some Minority Report-level dystopia. Google is already using this to make life hell for anyone who cares about their privacy; we need to do something about this before they finish putting up their iron curtain over the web. Would it be possible to sue website owners for requiring such invasive measures? I'd love to see this ruled an abuse of monopoly power and Google broken up, but that's probably not very realistic, so we would probably do better to make using Google captchas more expensive in court costs alone than building their own solutions to fight bots would be.


Work Firefox which I use all the time, no addons (including any adblockers): 0.1

Almost unused Chrome installation, also without addons: 0.7


Seeing what everyone else has posted, I'm very surprised that I received a 0.3 using Chrome on Android. I'm logged in to Google, and most of my browsing is via Chrome or a Chrome-based webview. At least on my phone, I've never cleared my cookies or done anything special.


This is total bullshit. My score of 0.1 in firefox shoots up to 0.9 if I change my user agent to ChromeOS. No other changes - same set of ghostery/ad blocker/fingerprinting prevention, etc. What a scam.


Ding ding ding ding. Google's way of killing the other browsers in the market for good, killing off ad blockers with the new manifest, and literally becoming the entity which monitors the internet as much as the NSA...


Oscillates between 0.1 and 0.7 for me, and I'm changing nothing on my end (just hitting "Try again"). Does it have to do with refresh speed, I wonder?

Privacy Badger and ABP on my work (less-locked-down) Mac.


Hitting the same URL over and over again is bot-like behaviour. When working with reCaptcha on forms I usually start getting hit after 4-5 test submissions.


I get .9 in Firefox on my MBP with UBlock Origin installed. I wondered if it was because I was logged in to Google, so I tried Incognito and got .7. In a never-before-used container I also get .7.


I get a 0.7 on my computer on Firefox. If I use the same website in Chrome (which is signed into a Google account) I get a 0.9. I guess it's a [0,1] scale?


I'm guessing their a-listers came up with something like this:

    // TODO: add impressive-looking math
    if (signedin && trackedEverywhere) {
        return 0.9;
    } else {
        return 0.7;
    }
I think we give Google way too much credit for their talent. This is the same company that didn't feel like finishing their website for two decades and subsequently stole $75 million from their users even when Google knew [1].

The same company that somehow still doesn't reconcile amounts owed and just keeps the money when they randomly-ban users and hide behind fake support emails, but they did feel like paying $11 million to keep that away from scrutiny [2].

[1] https://www.businessinsider.com/google-emails-adtrader-lawsu...

[2] https://www.searchenginejournal.com/adsense-lawsuit/248135/


Google consistently gives me the impression of a company that (I suppose) has tons of smart people in it, but has badly broken management & incentive structures leading them to constantly do bafflingly stupid stuff at both large and small scales, even by the standards of a bigcorp, to the point that they survive only because they've got one hell of a golden goose.


Good info. Thank you.

And in keeping with recent revelations on Google's manipulation of search results, I think they have really gone beyond the pale. I un-archived my old iPhone two days ago and went back to iOS after the James O'Keefe/Project Veritas revelations. I now cannot, in good conscience, use anything Google. I always knew about the tracking and all that because, after all, they are an ad company. I'm now in the process of moving all of my domains over to Fastmail, which I've used since 2002. I'm using Qwant, Startpage, and DDG for search. FF for browser with many about:config tweaks and several add-ons.


You know Project Veritas is a load of shit right?


Please explain. Even without the revelations from PV, it's patently obvious Google, et al are biased. Anyone can see it. Silicon Valley is a bloody echo chamber. If the videos by PV were not damning in the least, why did 4 different companies take them down and remove the accounts of PV?

Sunlight is the very best disinfectant. People have a right to know if searches are being manipulated to one side.


If I sign out of my google account in Chrome it drops from 0.9 to 0.7.

I could have sworn I'd never signed in to Chrome using my google account, but I guess I must have mistakenly signed in to gmail or something.

I use FF as my main browser and only drop back to Chrome sporadically, or when I really want tabs to be completely isolated (there is some annoyingly CPU/power-intensive stuff I do from time to time, and I can just renice Chrome while I get on with other things).


> I could have sworn I'd never signed in to Chrome using my google account, but I guess I must have mistakenly signed in to gmail or something.

Chrome 69 tricked users into signing into the browser, myself included - https://lifehacker.com/how-to-disable-chromes-automatic-sign...

That was the last straw to uninstall Chrome from all my devices and I've been a happy Firefox user ever since. Well, except now reCAPTCHA hardly ever works.


I believe that's a "feature" they added a while back: auto-signing you into Chrome as soon as you logged into Gmail.


The GP post's IP address or other fingerprint may be validated from other Google properties they might have visited, so I wouldn't put so much stock in the 0.7.

Honestly... if it's the same team that did ReCaptcha 2.0, this is a team that pulls out all the stops. Per https://github.com/neuroradiology/InsideReCaptcha ... they implemented a freaking VM in Javascript to obfuscate the code that combines various signals. There's a lot going on here that's likely highly obfuscated and quantized before it's displayed to us.

EDIT: non-paywall link for [1] in the parent post: https://outline.com/aA7HS5
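To give a flavour of the "VM in JavaScript" trick: instead of shipping readable logic, you ship a small interpreter plus an opaque array of numbers, so the code that combines the signals never appears in the page source. A toy sketch of the pattern (nothing to do with reCAPTCHA's real opcodes or bytecode):

    // Toy bytecode interpreter - the obfuscation pattern, not Google's actual VM.
    const OPS = {
      0: (vm) => vm.stack.push(vm.code[vm.pc++]),                 // PUSH next literal
      1: (vm) => vm.stack.push(vm.stack.pop() + vm.stack.pop()),  // ADD top two values
      2: (vm) => { vm.done = true; },                             // HALT
    };

    function run(code) {
      const vm = { code, pc: 0, stack: [], done: false };
      while (!vm.done) OPS[code[vm.pc++]](vm);
      return vm.stack.pop();
    }

    // The "program" is just numbers; a reader of the script sees no meaningful logic.
    run([0, 2, 0, 3, 1, 2]); // => 5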


I get 0.9 on Firefox which is my main browser and 0.7 on Chrome which I use only for hangouts.


So, I still have to whitelist Google in uMatrix and allow cookies for this to work. Even after doing so, I get a 0.1. I reloaded the page to check for variation as some other users mentioned but get the same score each time. I guess Google is saying I shouldn't be allowed to use the internet.


I got a 0.9. What's it out of? 1? Sorry if I completely missed that somewhere already.


Yes, it is out of 1. From https://developers.google.com/recaptcha/docs/v3, > reCAPTCHA v3 returns a score (1.0 is very likely a good interaction, 0.0 is very likely a bot).


0.3 with Brave on Android, no extensions. 0.9 with Chrome on the same device, same connection.

Brave isn't particularly "unusual", and is even based on Chromium - surely this is Google blatantly punishing non-Chrome users?


Interesting.

I get a 0.7 on Chrome with no account logged in and uBlock Origin installed.

Same browser, same plugin but incognito it's 0.1.

Papa google needs my data to trust me. Makes complete sense but still interesting that you can affect your score by giving in.


What is most odd is I get 0.7 on iOS Safari which I use for 100% of my purposeful mobile browsing, but I get .9 on iOS Chrome, which is only used when I accidentally click on links from gmail (so very, very rarely).


Not really odd at all - if you're using the gmail app, there's a shared authentication cookie in all Google apps - including Chrome, so Google knows who you are in Chrome.


It seems a lot of iOS users get 0.7.


A consistent 0.3.

> error-codes": ["score-threshold-not-met"]

Not sure if happy or not happy with that. I will conclude happy enough.

Linux, on VPN, Firefox. Not logged into any Google services. Cleared caches (still same IP), no difference.


Stock Qutebrowser 0.7, FF w/ all the usual extensions (ublock origin) 0.7. Don't know if it matters but I'm rolling Arch. Just adding another point of data for those curious.


From my computer, where I browse fairly equally with all three of Chrome, Safari, and Firefox (albeit different sites), I get the following scores:

Chrome: .9

Safari: .7

Firefox: .1

I have adblock running on all three, and I use containers on Firefox.


Interesting: my score is 0.9 if I allow Google to track me using cookies; if I block the cookies it goes to 0.7, and if I enable content blocking in Firefox it drops to 0.1.


With desktop Chrome I get a 0.3. My browser sends Do Not Track, has PrivacyBadger extension, and has that useless google-profile-in-the-browser feature disabled.


I got 0.9 on Chrome, logged into google. I also got 0.9 on Firefox, not logged into google.

In incognito mode in chrome, I sometimes get 0.9 and sometimes 0.7 when I reload.


Using desktop Safari incognito without a Google account and Ghostery enabled, I get 0.7 too. Interestingly, disabling CSS drops me to 0.1...


Interestingly enough, I got 0.9 on Edge with uBlock Origin installed. Perhaps this has something to do with Edge being based on Chromium now?


I got 0.9 in my Android phone running chrome. When I opened it in incognito mode, my score was reduced to 0.7


It gives me 0.7 on Safari (uBlock Origin) while 0.3 on Chrome (uBlock Origin) - both macOS Mojave.


Firefox mobile w/ ublock: 0.9


Firefox with uBlock O I get 0.9. Don't know what everyone else here is talking about.


I get 0.7 in both desktop (linux) chrome and firefox. I get 0.3 from android chrome.


>Please upgrade to a supported browser to get a reCAPTCHA challenge

I guess this is a 0 for me then


I use the same extensions on desktop and get 0.3 on my android Firefox


The first time it failed; the second time I got a 0.7. iPhone XS.


iPhone with a good (not amazing) adblocker: 0.7

Safari macOS with the same adblocker: 0.7

Firefox macOS with a lot of adblockers: 0.1


I get 0.9 on my Firefox


It didn't load for me and I couldn't figure out why.

Then I remembered that I put this in my /etc/hosts a few weeks ago and forgot about it.

    127.0.0.1       google.com
    127.0.0.1       www.google.com
[Edit] So if nothing shows up for you on that page, check for that. Also I just generally recommend it. Google has some unethical practices and duckduckgo.com is pretty good.


I got "reCAPTCHA script loading".

You don't need to use the hosts file to block it; uMatrix can do it by itself.


There are government services, such as the USPTO, that rely on Google reCAPTCHA. The new reCAPTCHA has made it difficult for me to access documents, and sometimes they think that I'm a bot and thus deny me access entirely.

Does the government realize the consequences of this? Both that it pushes users to use Chromium-based browsers, and that they're helping to solidify a company that already has a near monopoly in the browser space?

Further, this quote is very creepy:

> To make this risk-score system work accurately, website administrators are supposed to embed reCaptcha v3 code on all of the pages of their website, not just on forms or log-in pages.

With AMP, Google Ads, and reCAPTCHA, Google now has access to pretty much everything that people do on the web.


To be fair, government at all levels did the same for Adobe by mandating that things be done in the PDF file format. Consistency of government operations sometimes requires that certain private companies be preferred vendors, and of course there's going to be a snowball effect there as big players get increasingly larger shares of available funds. Every government in history has had a government-industrial complex with winners and losers - some of this boils down to human nature.


> Consistency of government operations sometimes requires that certain private companies be preferred vendors

In which case those private companies should now be deemed an extension of the government and fall under all rules a government organization has to abide by. If they do not like it they can forbid the government from using their software/products and can sue if the government does not abide.


You are required to use reCAPTCHA on the California DMV website when making appointments and other functions.

https://www.dmv.ca.gov

It will also log you in to google on the first page.

Additionally, the stations at the DMV all have tablets on stands, showing Google logins for some operations.


When did this start? I used the DMV page without a google account in February with no issues, and the local DMV has no tablets as of May.


It's been quite a while. At some point in the last year, maybe, with Google blocked, it became impossible to register for a DMV appointment. Filling in the forms and pressing submit ended up with an unhelpful "Server Unavailable" and "Call xxx-xxx-xxxx during business hours".


I was amused that Elizabeth Warren's campaign site wouldn't display the content for me unless I permitted scripts from google.com (with uMatrix), since she is promoting breaking up Google.


Although you can be pro break-up-Google while using one, or even many, of their services.

So I don't really see the amusement.


The belief these days is that when someone does something wrong everyone must shun them and not do business with them. Her website didn't have to use Google services as there are many alternatives.


Reminds me of Matt Bors' Mister Gotcha: https://thenib.com/mister-gotcha


It seems foolish to me to target Google while simultaneously sending a constant feed of data about people visiting your campaign site.

In this case the only "service" it appeared to be using was hosting for jquery...


It's typical of politicians not to run their own organizations in the same way as they say the world should be run; campaign organizations are a bit fly-by-night, and political platforms are more or less flexible by necessity.

I think it comes from the sad fact that, generally, ambitious politicians create organizations to get themselves elected, rather than previously-existing and purposeful organizations presenting candidates that represent that organization's values to a larger audience.


Makes me wonder if this could cause those sites to run afoul of the ADA? (I'm admittedly not very familiar with the requirements, but thought it was interesting to consider).


I do wonder what these government offices can do otherwise to prevent spam - I can run `curl` on the Georgia DDS appointment listing page and I get back all available slots. Assuming there isn't a captcha later in this process, it would be trivial to build a bot that books fake appointments for the next 5 weeks.


The other tradeoff is you're giving Google an extraordinary amount of power to decide who is allowed and not allowed on your website with no transparency on how this decision was made. Not sure what company is willing to blindly trust Google with that power.


Unfortunately, the answer is (basically) all of them. Combined with CloudFlare, even websites that aren't explicitly making that decision are still opting their users into both CloudFlare and Google's tracking.

I use Tor fairly regularly and it's a complete nightmare. I sometimes spend 5-15 minutes solving reCAPTCHA (since your Tor circuit changes every 10 minutes this can result in having to solve the reCAPTCHA several times).


Crummy solution, but get the FF user agent switcher in TBB.

And then set it to Windows/Chrome. And all those Scroogle-captchas are easy-peasy.


I can't help but wonder if that's because of the nearly unique user-agent-string + user-agent-feature-detection combination, allowing it to identify you as belonging to a very small group of people.


can you expound on this? not sure I grok


     1. Download Tor Browser Bundle
     2. Connect to the Tor network
     3. Download a user-agent switcher in the plugin store in TBB
     4. Change user-agent to "windows, chrome"


Plenty of small companies/private website owners might make a simple cost/benefit calculation: 15% fewer users, but in return X fewer false accounts/traffic/...


This is a problem that has to be solved by government regulation, “the market” has already made up its mind: It doesn’t give a fuck.


What is there to regulate? Bots are already illegal and "the market" has decided it does care, which is why reCAPTCHA exists and is widely used. Those companies aren't rejecting legitimate users/customers out of spite.


The last thing the internet needs is "government regulation". If that were to happen, you could kiss everything you love about the internet goodbye. Governments do a horrible job regulating basically everything else other than critical, necessary services, and the jury is still out even on that one. Why would they suddenly do such a better job regulating something we're still all trying to figure out?


Bank of America is...


I'm torn on this. reCAPTCHA v2 (mostly useless[0]) and v3 function largely on browser fingerprinting plus a few other heuristics (e.g., whether or not you have a Google cookie). Any meaningful privacy measures to resist fingerprinting end up with a low reCAPTCHA score. I personally run into a wall on most sites using it.

That said, it's one of the most effective means of combatting automated spam and credential stuffing attacks. In a recent implementation I did, having 2FA active for your account bypasses the captcha requirement, but the vast majority of users are still too non-technical to use 2FA and are subject to the frustrations of reCAPTCHA.

[0]: https://github.com/dessant/buster
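
For what it's worth, the gating logic described above is roughly this shape (a sketch; `has_2fa_enabled` and the 0.5 threshold are made-up names, not anything from reCAPTCHA itself):

    def needs_captcha(user, recaptcha_score, threshold=0.5):
        """Skip the captcha for accounts that already present a second factor."""
        if user.has_2fa_enabled:        # hypothetical flag on the account model
            return False
        # Otherwise fall back to the risk score returned by siteverify.
        return recaptcha_score is None or recaptcha_score < threshold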


It is used irresponsibly.

A responsible spam protection system should, by default, allow all traffic from an ISP: the spam and, consequently, the responsible users along with it.

If an ISP shows signs of abuse, then show a CAPTCHA or another system that will block some spam while also blocking some valid users. This is an evil-for-the-greater-good measure. Do not fool yourself into thinking it is a solution without caveats.

Impacted users can complain to both the service provider (you) and their ISP. Failing that, they can switch ISPs (i.e. vote with their wallet; how that happens in a monopoly is another discussion).

Bottom line: if you show a captcha for all users (even for ISPs that are not showing signs of spam), you are intentionally blocking some users for no good reason. And you are part of the problem. Sadly, this includes the US government, as they blanket censor all their forms (from visa requests to DMV visits) behind Google(R) captcha(tm) at all times.


It seems that your suggestion is that ISP is a good signal for detecting spam, but it's not obvious to me that this is true. For example a site targeted by a botnet could be hit with traffic from a wide range of otherwise legitimate looking ISPs, in which case you're going to be getting a lot of spam on your website.


In our experience, country of origin is a better heuristic for possible abuse than an individual ISP is. Most malicious traffic comes from a fairly small number of countries, many of which are the obvious culprits. That kind of data is never guaranteed accurate, though.


lots of salty people in denial about being part of the problem ;)


I hate the v3 reCAPTCHA. On FF, I usually KNOW I am answering correctly and it says I failed. I always have to go through it multiple times. It's maddening. It often leaves me second guessing myself... is that sliver of car counted? is a crossing signal a street light? What about those streetlights way off in the distance, do I select those two in addition to the ones front and center? That RV looks sort of like a bus, should I select that too?


It's not really about getting the questions right. The challenges they present aren't that hard for modern computer vision systems. It's more about verifying that you consider the question for a "human" amount of time, make your mouse move like a human might, etc.


Captchas are hell on the blind and vision impaired. There are add-ons for this, but they mostly suck. Things like Webvisum exist but they are invite only.


You're thinking too hard, reCAPTCHA is made for regular users who won't sit there thinking if they're right or not. I always make sure to select a few wrong squares out of principle and after 6 or 7 attempts I'm usually allowed into whatever site I was visiting. Nobody knows the exact algorithm, so trying to please it, or worse - agonising that you answered wrong - is completely foolish. Just click a few squares that seem right and roll the dice. Beating yourself over not solving reCAPTCHA correctly makes the terrorists win.


You're describing v2, not v3.


I intentionally get a few wrong (to subvert their harvesting of manually labeled training data).

It still either lets me through, or doesn’t even manage to display images for me to click on because it doesn’t like my browser settings. (The latter is more common than the former these days...)

I wonder how much legitimate traffic bounces because of reCaptcha. Can sites even measure this?


Apparently one of the Twitch streamers I sometimes watch had it do that to her just the other day whilst testing out her own site's reCAPTCHA-protected form: https://clips.twitch.tv/BitterTentativeRamenChefFrank


They want all tiles with bicycles... but are showing me a road painted bicycle crossing pictogram...


At that moment I don't care if their AI gets the wrong feedback on what a bicycle is, as long as I get past the captcha.


The subheadline gets it right:

> It’s great for security—but not so great for your privacy.

For individual users, security and privacy frequently go hand-in-hand. But for site operators, user privacy makes security a lot harder. The more you know about a user, the easier it is to figure out if they're an adversary.


Wouldn't reCaptcha V3 also make things much more difficult for Google competitors, assuming that site owners place it on every page? I'm guessing it will block any sort of scraper (since scraper access patterns don't look human) with some sort of whitelist for Google's scrapers.


They typically won't completely deny access, but only disallow certain actions (posting a comment etc). Outright denying access will likely get site owners into trouble, at least in the EU - you need to be able to access privacy information, publisher info etc. I'm also not sure whether non-opt-in usage of reCAPTCHA v3 is GDPR compliant.


Yes it means Google has p0wnd the web.


I guess another question is why we really need captchas. What are we trying to protect against that can't be accomplished with rate limits, voting systems, or other ways to regulate meaningful use of a website?

Ultimately why does it matter if the user is a human or bot, as long as they are being a valuable user? What's wrong if a bot buys some of your inventory, pays for it and everything? What's wrong if an NLP bot responds to discussion threads with scientific facts and citations?


> What's wrong if a bot buys some of your inventory, pays for it and everything?

100% of the time, a bot buying things from a store is doing so to test a database of stolen credit cards the bot's owner has purchased/stolen. Accepting those sales means you'll get hit with chargebacks a few weeks later as the real owners of those cards see their statements. Then your store gets shut down for exceeding the maximum 1% chargeback ratio mandated by Visa and MasterCard. So preventing this scenario matters a lot, and when someone targets one of my stores for testing like this, enabling a CAPTCHA on the payment page is one of several, often-essential mitigations. Blocking IPs, blocking whole countries, including a nonce in the form, etc are on their own insufficient most of the time: the readily-available tools for this kind of attack already handle rotating IPs, retrieving a new form nonce on each try, spoofing the proper referrer, etc.
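
For context, the form-nonce mitigation mentioned above usually looks something like this (a sketch, assuming server-side sessions; as noted, attack tools already fetch a fresh nonce per attempt, so on its own it mostly stops only the dumbest replays):

    import secrets

    def issue_checkout_nonce(session):
        """Attach a single-use token to the rendered payment form."""
        nonce = secrets.token_urlsafe(32)
        session["checkout_nonce"] = nonce
        return nonce  # embed as <input type="hidden" name="nonce" value="...">

    def check_checkout_nonce(session, submitted_nonce):
        """Reject submissions that don't echo back the nonce we issued."""
        expected = session.pop("checkout_nonce", None)
        return expected is not None and secrets.compare_digest(expected, submitted_nonce or "")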


Our company has industry-leading fraud rejection rates and we don't use a captcha at all.


Would you be able to say how your company accomplishes that?


Honestly, statistics from about 2010 (ie before the age of neural network hype) and limited human observation.


Human moderation and ad-hoc heuristics seems to make the difference at Reddit too, rather than the CAPTCHA at registration.


I get a recaptcha when trying to sign up for a new account:

https://i.judge.sh/Flutter/45DyMRuL.png

maybe this is related to some other heuristic they're using for determining whether or not to show recaptcha (although this is in a no-extension Chrome on a residential IP address).


Right, they have that at registration but it's either superfluous or it only catches the really easy stuff because they rely on an army of human moderators who spend all day cleaning up after bad actors able to click buses.


In practice it is a major pain to keep up to date, and bots slip through all the time, at least on the subreddit I help moderate. It's a lot of manual volunteer work.


Do you have any stats on cart abandonment rate, and/or how that changed after you enabled recaptcha?


"Cart abandonment"

Why is this a bot-specific thing? I abandon shopping carts as a human all the time, including especially:

* If you require a registration and login to checkout

* If your UI is too hard or clunky to use

* If shipping fees are higher than what I think is fair

* If there are additional non-upfront fees


I think that was the parent's point: humans will likely abandon carts more often if they need to solve a captcha to advance to the check out.


> What's wrong if a bot buys some of your inventory, pays for it and everything?

In the book "Spam Nation" (Brian Krebs), a group of students try to fight fake online pharmacies. To do this, they created an army of bots and placed thousands of fake orders every day. The goal was to create so many fake orders that the human processors (many fake pharma stores were not fully automated) had to spend a significant amount of time clearing out the fake orders before getting to the real orders.

Now imagine this on a legitimate website. Not every website is automated, and not every organization has the same resources that Amazon does. CAPTCHAs are a great way to ensure orders are coming from real people. They could still be faked, but the bar is a bit higher.


Don't think bot; think botnet. Ratelimits do not work against botnets since they appear to be independent actors. e.g. if you think it's fine for everyone to do something 1-3 times, then you are letting a botnet of 10k hosts do something 10k-30k times.

[edit]

Also, NAT means that there could be hundreds or thousands of individual users on the same IP address (many dorms at smaller colleges are set up this way), so you don't want to rate limit by IP address either.
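
To make that concrete, the naive per-IP limiter most sites reach for looks roughly like this; a 10k-host botnet never trips it, while a dorm behind one NAT address trips it constantly (a sketch, in-memory only):

    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60
    MAX_REQUESTS = 3          # "1-3 times seems fine" -- but this is per IP, not per person

    _hits = defaultdict(deque)

    def allow(ip_address):
        """Sliding-window limit keyed on IP: weak against botnets, harsh on NAT."""
        now = time.time()
        window = _hits[ip_address]
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_REQUESTS:
            return False
        window.append(now)
        return True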


Shouldn't open aggregate listing and blocking of botnets be effective? How many 10k node botnets are there?

Also: egress hygiene should be a thing. Block subnets and ASNs if toxic behaviour is detected.


> How many 10k node botnets are there?

About 10 years ago it was somewhere around 10 million bot computers. Obviously the distribution isn't uniform there, but that gives you an idea of the order of magnitude.

Also that was 10 years ago, before smart fridges and tvs. So, possibly more now?

I don't think "just block all the bots" is a feasible solution here.


Not how many bots, but how many botnets.

The nets themselves take resources (time, effort) to set up and maintain. Presumably they're engaged (at least for the purposes of generalised website defence, though specific niches such as targeted commerce fraud may not apply) in systematic behaviour, which leaves signatures.

Systemic cross-site collaborative detection (essentially what Google's CAPTCHA systems are, though there are others, such as CZ.NIC's Turris project) could identify these and, via network-based domains of authority (ASN and CIDR assignments), assign reputations and target anti-fraud or anti-abuse countermeasures.

Durable, privacy-respecting reputation tokens might be another approach.

Present systems are far more primitive and reflect an earlier world-state.


Are there potentially more effective ways to combat this? I'd rather combat malicious behavior than stereotype bots as malicious.

That's almost like saying laws don't work against [members of certain race] or [members of certain religion]. Rather we just need some combination of better education and better enforcement strategy instead of stereotyping.


Indeed, if they need to pay, there is no need for a CAPTCHA. As for responding to discussions, sure, you'd ban any human that posts spam the same as any bot. However, bots can spam your site faster than your human moderators can keep up with, so by using a CAPTCHA, only humans can post (ideally, of course), and thus moderators can keep up.

As a security consultant, it is not uncommon to recommend a CAPTCHA for things like three successive failed login attempts from a single IP address within a certain time period. But I do agree that CAPTCHAs are used too frequently, and some security people recommend them for just about everything. As someone who blocks a lot of tracking and feels the pain of these tracking monsters (that's what CAPTCHAs are these days, more than the Completely Automated Public Turing test they're supposed to be), I always think very carefully about whether a CAPTCHA is the only option, and I'm sure to recommend CAPTCHAs that fit the situation but are less invasive than a third-party one.

Edit: oh, right, credit cards are common in many countries and banks set chargeback limits. It's still crazy to me that your 'public key' is also the only thing needed to withdraw money from your account, thereby necessitating a chargeback system. I guess for credit cards a CAPTCHA might be useful too.


> if they need to pay, there is no need for a CAPTCHA

> As a security consultant

Hm? How does a security consultant not know that credit card numbers can be stolen and used by bots?


Sure, I know credit cards exist and that the numbers are stolen because they're so trivially easy to abuse, but payment systems are not very related to my work. I very rarely come across a product where I have to test payment features, and when I do, it's out of scope. Either the payment is handled by some third party (the usual case), or it has already been tested years ago and they're now asking to test some new feature (all other cases).

More typical projects are testing traffic filtering solutions (firewall-like), blockchain startups (those are the worst), back-end (no payment) or b2b (contract-based payment, not online) applications... Even in the months that I was consulting at a bank, I never touched any sort of money system, there were a thousand other applications, services, websites, infrastructure things, and mobile apps to test. To give you a random example, they wanted to give people advice when buying a house through an app (where the final screen goes "and that's where $bank comes in: we can finance all this!"), so they had some external company prototype an app that was riddled with bugs in the login system. Or some internal service that POSTs data from one system into another. Or some API endpoint used for statistics. Etc.

So the cases that I see are as a consumer, where I pay either by bank transfer (logging into my own bank's website), via iDeal (which also redirects you to your own bank's website to complete the transaction, but that one is instant instead of having to wait a working day), or sometimes via PayPal if that is the only option (I guess paypal do their own bot detection? No idea). So from my perspective, when I paid for something, the money is in the hands of the merchant and only customer support or a lawsuit would get it back.


Here is an answer from the hncynic bot:

I had problems with my spinner at first. However, this is one of the things that is really annoying about using captchas instead of passwords.

It is possible to create a secure account for something useless without using the system and you probably won't get spammers anymore.

What a dumb idea, you would want to implement it yourself, because the people working and maintaining the system(s) will all have some way of doing that already.

I don't know how much the government can take away from a site like this as well. But it's a bit like trying to ban a kid because the kid got an old friend on their facebook because his parents were "bad".

So, even though you have a big idea about voting systems, it would be a better option to require a system that only exists to be able to be used for good reasons.


So this is probably a bit off topic, but why don't more site owners just create their own unique anti-spam system? In my opinion, if they were simpler, yet all unique, there would be less bots that could mass spam and privacy would be improved.

Even something as simple as a question: "How many legs does a spider have?" ____

And then cycle through different types of free form questions of things that most people should know. Perhaps block the IP after {n} failed attempts for an hour.
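
A sketch of what that could look like (the questions, block duration and names here are all made up; answers are normalised before comparison):

    import random, time

    QUESTIONS = [
        ("How many legs does a spider have?", {"8", "eight"}),
        ("What colour is a ripe banana?", {"yellow"}),
        ("What is two plus three?", {"5", "five"}),
    ]

    MAX_FAILURES, BLOCK_SECONDS = 5, 3600
    _failures = {}  # ip -> (count, time of first failure)

    def pick_question():
        """Serve a random question; remember which one was shown alongside the form."""
        return random.choice(QUESTIONS)[0]

    def check_answer(ip, question, answer):
        """Accept a normalised answer; count misses per IP and block for an hour after too many."""
        count, since = _failures.get(ip, (0, time.time()))
        if time.time() - since > BLOCK_SECONDS:
            count, since = 0, time.time()          # failure window expired, start over
        if count >= MAX_FAILURES:
            return False                           # temporarily blocked
        accepted = dict(QUESTIONS).get(question, set())
        if answer.strip().lower() in accepted:
            _failures.pop(ip, None)
            return True
        _failures[ip] = (count + 1, since)
        return False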


Forum software like vBulletin and Invision often has this feature built in, and I've used it on a forum I help run. Unfortunately, after writing four or five custom questions, I soon found server logs showing spam bots blowing through the questions in seconds -- I suspect that since this is a common enough strategy, it's worth their time to pay someone $0.10 to pick the correct answer, then save the question and answer pair in a database somewhere for future use.


I don't remember what forum software it was, but they'd render a number or word using all periods (kind of like ASCII art) and ask you to enter it in a text field. I wonder how well that worked.


Here is a rough silly first pass using figlet. [1] It just displays a word. Next I need to accept a post of that word and do something. Maybe another version will use a game.

[1] - https://tinyvpn.org/up/d/
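
If anyone wants to try the same idea without shelling out to figlet, pyfiglet does the rendering from Python (a sketch; the word list is obviously illustrative):

    import random
    import pyfiglet  # pip install pyfiglet

    WORDS = ["orange", "kettle", "maple", "violin"]

    def ascii_challenge():
        """Render a random word as ASCII art and return (art, expected_answer)."""
        word = random.choice(WORDS)
        return pyfiglet.figlet_format(word), word

    art, answer = ascii_challenge()
    print(art)   # show this in a <pre> block on the form
    # then compare the submitted value against `answer`, case-insensitively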


One possible reason: site owners nowadays use the likes of Squarespace/Wix/whatnot, which means reCAPTCHA etc. are just a checkbox item. Custom anti-spam may be beyond their technical capabilities or interest levels.


I think this would fail under any directed attack. It’s too hard to generate a database that’s large enough.


This is the answer. It seems that most website owners are somehow super scared of a targeted attack, since it is indeed trivial to bypass (and they realize that), even if nobody will take the time.

I've heard stories from people that own small sites and still have someone targeting the site with custom scripts, but never anyone I know (not even a friend of a friend, only ever random people on the internet). But there is also the (much larger, from what I can tell) group of people that never had these issues. But people don't like risks, and installing a tracking captcha from google is made very easy. "Everyone does it, that ought to work!" (Meanwhile I hear of a 90% success rate from a recaptcha browser plugin, but who cares about that right?)


I've received attacks from custom scripts posting spam to a blog that nobody read. I changed my custom robot tests a couple of times, and each time it took a few days for the bots to adapt. In the end I removed the comments section, so there was nothing to attack.


This is exactly the kind of story I'm talking about. I'm sorry about your experience, and I don't doubt that you're real, but this is the kind of confirmation/hindsight bias that makes people misjudge risks. I expect you are an outlier, but I have no idea.

Might be interesting to poll random people that have websites with <100 unique visitors a month for this sort of thing to get us any sort of idea of how necessary an invasive CAPTCHA like Google's is.


XRumer (forum spamming) software had a feature over a decade ago that would reload a /register page on different proxies to generate a list of these sorts of questions. You'd run it for a moment, feed an answer for each question into XRumer, and then continue on your merry spammy way.

These ReCaptcha topics on HN really illuminate how few people have dealt with any real spam, much less targeted human or botnet attacks.


Can you teach it to play hangman? I'm thinking about digging out and dusting off my old perl cgi games. It might even keep humans out that don't have my sense of humor.


Most sites use reCAPTCHA to protect against bot registrations, sometimes targeted to their specific registration system. Any simple solution would defeat the purpose of a CAPTCHA.


Would it though? If each system was unique and rotated through different types of challenges, the bot would have to be custom tailored to handle every challenge type.

i.e. free form questions, count the gray dots (vision impaired friendly), math questions, play tic-tac-toe and get a stalemate, ascii hang-man ... I could think of hundreds of different challenges. The bot would have to constantly adapt and the bot developer would have to really love puzzles. The bot consumers would have to update constantly and would have to learn all the challenges of each site owner.

As a bonus, google the all knowing, is less knowing and no longer a gate-keeper.


The spammers can also use cheap overseas labor to update the bots.


I would be honored to help them feed their families. It would be a fun game of cat and mouse. Based on discussions here, it sounds like people have already automated Google captcha. I will go ahead and work on a few of my own and see what happens.

Maybe we can turn this into a public competition.


Looks like you have a great business plan at hand. Beat Google with a potentially superior product and help employ some people in developing countries!


idk, recaptcha solvers are about $3/1000 or something. That sounds like a very low hurdle to mess with most websites if that's their line of defense.


We still lock our car doors even though you can jimmy them open in a few seconds.


Yeah, but you won't build a business on that, you'll add GPS trackers, make renters show you their drivers licenses etc. A captcha isn't "wrong" in general in my mind, it's just not something you add and you're all done, and it shouldn't be your first line of defense. It can be part of a multi-pronged anti-abuse strategy, but it's a very tricky part: it doesn't offer a lot of protection but creates a lot of friction for actual, legit users. Running a DNA test on somebody can be a good way to verify their identity. But asking for their ID card and looking at the picture is a lot quicker, cheaper and less intrusive.

I don't see captchas a lot, because I'm not frequenting sites that use them. A friend of mine apparently does, so often that he pays for a captcha solver while he's sitting in front of his computer. He just can't be bothered to play Google's mind games, so he'd rather pay a few cents a day to not deal with it.

We've come to the point where humans are paying for services that were created for bots so they can bypass technological hurdles that were meant to tell humans from bots.


I remember a site that had such questions, specific to the knowledge of the site in question. Imagine HN asking "The L in SQL stands for: _____"


Disabling browser APIs that ad blockers/privacy protectors use. Fingerprinting users with ads on Stack Overflow. Now collecting information on how users navigate web pages. This is a scary trend.

And I am not even talking here about how Android and Android apps (which Google allows) track users.


https://hcaptcha.com/ seems to be a viable alternative. If you are a developer, please consider using something other than reCaptcha. Not only is it annoying, but a privacy nightmare as well.


> earns website owners money

> an open decentralized protocol for human review that runs on the Ethereum blockchain.

Is this basically a JS crypto miner?


No, it is similar to reCAPTCHA -- users select pictures matching a given description. In the case of reCAPTCHA, the work is free labor for Google, whereas in this case it is paid to the site owner.


We provide dataset annotation services and pay out to sites based on what companies pay us.


I guess the big question is accuracy -

If you have a brand new dataset, couldn't bots assess the first few thousand images randomly and get through (since there is little or no basis for what is an accurate selection)? And if they do, how would that affect future real human selections (assuming it learns over time what selections are accurate)?

Another concern is that it's very likely that Google's existing Cloud vision ML could handle most classification challenges your clients are trying to train (since you're basically working against a much wider-deployed mechanical turk dataset, recaptcha). High-profile websites (such as ecommerce sites) may have attackers (such as those with stolen CC's) willing to spend the money needed to run all of your images through Cloud Vision. So I guess my question is: are other data points collected to prevent bots from getting through?

I would understand if you can't answer some of these as they may fall under "trade secret" territory.


I work on bot detection, so I should be careful not to leak all of our approaches, email me at amir@imachines.com and we can have a more in depth offline conversation.

Since our captcha provides an opportunity for website monetization, we expect different uses aside from just bot detection, for example as a replacement for the "disable ad-blocker" popup or replacing paywalls with micropayments. This means there will be a broader set of users who are not strictly focused on attacking our dataset and polluting it with bad results. This allows us to have a confidence model initially based purely on the site.

Having a state-of-the-art AI is table stakes for a captcha product. We already run our datasets through visual recognition systems and run our captcha with an AI model-in-the-loop. In beta now, we offer websites under attack offline bot data in the background, currently as a batch report, and soon as a webhook. This approach has a game theoretic advantage of not leaking results to attackers, and allows us to run non-causal analysis of different attacks over a wide period of time. By combining this approach with a variety of rotating challenges we can identify patterns of behavior consistent with bots as they continue their attack strategy against only the mix of challenges they have seen.

There are also services where you can pay for people to solve captchas for you and this is a different sort of attack from bots, since they are in fact humans signing up for hundreds of accounts. If your goal was to prevent fraudulent signups, or to host a give-away for example, then we can have days of time to perform an extensive analysis offline, and perform an epidemic analysis of the traffic.


Thanks. I work on hCaptcha; we are hoping to provide an unbiased bot-detection system with different incentives from an advertising company. Let me know if you have any questions.


Have any services like AC or 2captcha integrated against you?


We've seen people asking them to integrate us on social media. You should see my comment about epidemic analysis above.


It won't take long if you guys get popular. I don't say this as a rip - I just have never come into contact with your CAPTCHA.


Google is trying to kill the competition by purposely introducing weird bugs here and there, taking advantage of the fact that they own the most visited sites on the web.

They've been doing it for a while now, Tech Altar even had a video about it the other day: https://www.youtube.com/watch?v=ELCq63652ig

Along with the censorship and privacy issues, I guess it's time for them to change their payoff, "don't be evil".


>I guess it's time for them to change their payoff, "don't be evil".

They dropped that years ago, literally and effectively.


Ah, really—wasn't aware, thanks.


I'm... actually struggling to see what this dark side is. The data is collected under a non-reuse agreement. It's specifically there to make a good captcha. There are other captcha vendors, and they don't make that promise (and I can think of at least one who admits they collect and resell data via captcha).

So the downside here is that no one has a credible way to compete with Google? Maybe because their Google cookie actually is a pretty good indicator of humanity?

That's nonsense. Tons of people do. There's LOADS of great research on captcha that isn't implemented by any vendor. The roadblock is that NO ONE WANTS TO, because it's a thankless, unprofitable task that puts you dead in the crosshairs of a ton of very organized people who will devote huge resources to circumventing or breaking your offering.

"A land grab," sure. Of a nuclear wasteland covered in small arms battles.


It's a land grab of the general web and browser market, not of the captcha market. They're using the captcha to disincentivize users from using browsers other than Chrome or from not having a Google account. And now with v3, it's supposed to happen on every page of the web? It shouldn't be too hard to see it's a disaster.


> They're using the captcha to disincentivize users from using browsers other than Chrome or from not having a Google account.

It has absolutely nothing to do with Chrome. And anyone who is sane has switched to Firefox and is now patiently enduring how lousy it is by comparison because ad blockers are sacrosanct.

> And now with v3, it's supposed to happen on every page of the web? It shouldn't be too hard to see it's a disaster.

Yes, site owners gotta opt in to captchas. Most sites already have enough connections to Google on every page they could already do most of this work. But that's unethical.

Ultimately, an increasingly sophisticated statistical analysis of users is the only reliable way to get robots out of spaces meant for humans. Our social media is crippled by robots masquerading as humans for the profits of various agencies whose names you aren't even privileged to know, but you're concerned about opt-in countermeasures because... Why again? That in a dark future every mom and pop web shop is gonna have sophisticated log analytics at their disposal, either because free software finally gets off its ass or because state capitalism does what it does and awards all the business to 1-2 competitors?

To me, you're arguing about the color of the insulin bottle rather than pointing out how absurd the system that can cheerfully jack its price 10x is.


> The data is collected under a non-reuse agreement

Oh you sweet summer child. Would you by any chance be interested in buying a bridge?

Despite that, even assuming it's true and we'll have a lovely, accurate AI captcha system: the big downside is that captchas are breaking the programmable web. I maintain a lot of small software crawlers, from simple notification applets to bigger analytic crawlers, and the web in the past few years has been increasingly hostile towards this. API endpoints are disappearing, which in general makes distributing free apps very difficult. The web crawlers are super simple to write and even maintain, but services like Cloudflare, Distil, and captchas break them, and while there are always solutions to these systems, they are very hard to distribute to users (you can't really pack Puppeteer, Selenium or some other web-engine automation stack in with your app).

Public data should be public. These sorts of idiotic measures are not compatible with the web protocol. The web only knows one thing - 1 IP address == 1 person - and it should be encouraged, not dismissed.


> Despite that, even assuming if it's true and we'll have a lovely accurate AI captcha system. The big down-side is that captcha is breaking programmable web.

I actually find this argument to be a bit compelling, if I'm being selfishly honest. It's super annoying that crawlers are so awkward to write these days, and I miss the days when they worked better.

> but services like cloudflare, distils, captcha break them and while there's always solutions to these systems they are very hard to distribute to users (you can't really pack in pupeteer, selenium or some other webengine automation stack with your app).

I don't disagree, but I also think we may be asking to keep model T's or gasoline driven 1-person bikes around. These technologies made more sense once, but make much less sense now.

> Public data should be public.

Sure, but what you don't get to mandate is how they're public. If someone wants to make public information available in a specific way and you don't like that way, the burden is on you to republish it. Outside of a very narrow accessibility scope, I'm neither legally nor morally obligated to cater to your specific needs. And indeed, as a service or data provider I have my own problems.

It's by no means an imaginary threat CAPTCHAs are solving. This is not a classical phantom security issue that statism uses to justify authoritarianism. It's equivalent to locking my doors when I leave a shop or making sure that my wares are properly labeled and not spoiled.

> These sort of idiotic measures are not compatible with web protocol.

The web protocol as you envision it hasn't been compatible with reality for a long time now. Hell, your crawlers are themselves a violation of the spirit of the original web. You are part of the very problem you're railing against!

> The web only know one thing - 1 IP address == 1 person and it should be encouraged not dismissed.

I suspect this statement is why you got downvoted, for what it's worth.


No, I'm getting down-voted because Hacker News is a notoriously pro-corporate medium - of course people don't care about public data and data freedom here.

> It's super annoying that crawlers are so awkward to write these days, and I miss the days when they worked better.

It has never been easier to write crawlers, with the exception of purposefully built-in barriers. Just look at youtube-dl.

> I don't disagree, but I also think we may be asking to keep model T's or gasoline driven 1-person bikes around. These technologies made more sense once, but make much less sense now.

What are you on about? For example to get around some crawler protections you need to execute js with some specific stack of libs. Distributing crawler.py vs distributing a whole stack is much more difficult.

Your logic makes absolutely no sense. On the web there is no distinction as to who is behind the IP address. It's a net of IP addresses and headers, right? If I'm asking for a resource that you choose to serve publicly, I only need to give you my IP and some HTTP cruft, right? So now it turns out you don't want to serve _some_ IP addresses.

Now you have to introduce an extra layer that is not part of the web - a layer that is incompatible with your goal. You need to use JavaScript to fingerprint your client - except you know what? The client is the one executing your fingerprint code, so they can send whatever they want to you. I've never seen a more idiotic medium. On one hand I get job security; on the other, the web is absolutely broken by complete buffoons who have zero logical capabilities.


How can I complain? Where to go? How can we organize? I want to do something about this!

This is complete and utter bullying. Bullying on user privacy, bullying on Firefox.

Somebody please tell me where to go?


Apart from the privacy nightmare, couldn't this also result in discrimination?

> For instance, if a user with a high risk score attempts to log in, the website can set rules to ask them to enter additional verification information through two-factor authentication.

Seems to me, this could easily flag genuine users who access the site through a non-standard flow - e.g. because they use assistive technologies. In the worst case, this could result in impaired users being forced to jump through additional hoops - or being blocked completely.


Smells a lot like using Google's virtual monopoly on bot detection as a way to push users into using one of their other products. Likely not a wise idea when the government is itching for an excuse to bring an antitrust case against you.


Google's captcha system is overkill for most websites. If I want to filter out bad actors (on a simple, straightforward site), there are other, simpler and easier-to-solve captcha systems out there. They might not have the rigour of Google's system, but they do the job, and well.

I would however use Google's system if the site is massive and there is the possibility that someone is using a script or some program to algorithmically bypass the (simple) captcha, and register accounts en-masse and trying to create a psyop[0], or disinformation campaign, or even a sockpuppet army.

[0] https://en.wikipedia.org/wiki/Psychological_Operations_(Unit...


There are diminishing returns once a CAPTCHA gets past a certain point. Bad actors can (and do) just pay humans to fill out captchas all day. We get some spam submissions on our sites that I'm 99.9% certain are people in developing countries copy/pasting spam templates and filling out captchas by hand.


What happens is the captcha is farmed out to live operators who solve it.


I disagree. Adding Google's captcha is a 15-minute exercise. If I remember correctly, you copy/paste a snippet and then add a callback in your own code. Whereas rolling your own captcha implementation would take much longer and be worse.


And it is this very convenience that has countless sites using it. As I said, there are other systems which are just as easy to implement as Google's and which are not overkill and also more privacy friendly (Google's CAPTCHA is known to fingerprint the user using heuristics like mouse movements, screen resolution, etc).


> there are other systems

Like what?


There are many easy-to-use libraries specific to different languages (like https://www.phpcaptcha.org/ for PHP) and frameworks. These are not as secure as reCAPTCHA, but in most cases they do the trick. There are also services similar to reCAPTCHA, like Solve Media and hCaptcha. I believe hCaptcha is a drop-in replacement (https://hcaptcha.com/docs).


Which Captcha systems would you recommend?


On that note, has anyone tried hCaptcha? I am considering using it on a project


I work on hCaptcha, let me know if you have any questions.


> trying to create a psyop[0], or disinformation campaign, or even a sockpuppet army

That's not a viable reason. Anyone doing so is going to have a budget and human reCAPTCHA solving is less than $0.01 per CAPTCHA. It costs very little for mass account creation, reCAPTCHA or not.


Stupid question: why do companies care so much about bots to the point of degrading the customer experience significantly? I can understand for things like public forums. But like why would an ecommerce website ever put a captcha between you and your order (or a news website)?


An example from another comment: bots checking stolen cc numbers which then results in high numbers of charge backs and the potential for getting blocked by visa/mastercard.


For example:

* Bots sign up with email addresses that are owned by other people who don't appreciate your welcome/activation/etc. mails.

* All that automatically generated data can start to hurt performance. Especially on a smaller site, having millions of useless users in your database can slow things down significantly.


That's one thing, but like why would the FT put a captcha on the login page. I am not signing up. I just want to access a website I already paid for. This is just terrible UX.


I think it's again to mitigate against potential bad actors attempting to access legitimate users' accounts.

You could use other methods but there's always tradeoffs, e.g., let's say that instead of using a captcha you just temporarily block login attempts to some account after X failed login attempts. This has the advantage that it's faster for legitimate users as you don't need to complete the captcha; however, the main disadvantage is that you can then get an attacker brute-forcing logins (even if they don't really care about getting users' credentials) which can disrupt your website by preventing potentially thousands of users from signing in.

In my opinion the captcha is the least bad option from a security point of view, as long as it has an alternative accessible mechanism for example for blind users.
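
To make the lockout tradeoff above concrete, the naive per-account version is roughly the following sketch (names and limits invented); the comment in the middle is the whole problem:

    import time

    MAX_ATTEMPTS, LOCKOUT_SECONDS = 5, 900
    _failed = {}  # username -> (count, time of last failure)

    def login_allowed(username):
        count, last = _failed.get(username, (0, 0))
        if count >= MAX_ATTEMPTS and time.time() - last < LOCKOUT_SECONDS:
            # An attacker who only knows the username can keep this tripped,
            # locking the legitimate user out indefinitely.
            return False
        return True

    def record_failure(username):
        count, _ = _failed.get(username, (0, 0))
        _failed[username] = (count + 1, time.time())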


An attacker can use a botnet to try to log in to your site with a list of username/password pairs from the latest big data breach.

They will likely get access to a small percentage of those accounts, and you have to do damage control.


Entirely disregarding how it is to browse with Tor and hit reCAPTCHA so often due to Cloudflare, it's a bother even on a website I regularly use over my normal connection.

Bandcamp is an online music store and I'm prompted for a Google ReCAPTCHA every time I try to log in, which really causes me to do it less often than I normally would, as I must permit Google JavaScript for it to succeed.

I've wanted to send a complaint about this to Bandcamp, but their email is hosted by gmail and none of my messages get to them because I host my own email. Adding reverse DNS and SPF is enough for many email servers, but not Google.

I find it a bad situation that my experience with a business is worse, due to Google, and I can't even contact them to let them know, due to Google.


> “Google is so deeply integrated with the internet,” Khormaee says. “We want to do anything we can to protect it.”

This is outright creepy.


I know people are (rightfully) worried about centralisation on the Internet, but I still wonder how come there's virtually no "competition" to reCaptcha. Even from one of the "centralised" players.

For example, even Cloudflare, which has its own "checking your browser" protection, still uses reCaptcha in some other cases... Why doesn't Cloudflare offer a reCaptcha alternative to their customers? (a transparent one, more like reCaptcha v3 rather than the intrusive 5-second one...).


> According to two security researchers who’ve studied reCaptcha, one of the ways that Google determines whether you’re a malicious user or not is whether you already have a Google cookie installed on your browser. It’s the same cookie that allows you to open new tabs in your browser and not have to re-log in to your Google account every time.

I try to never be logged into my google account as a matter of principle. Maybe I'm just fooling myself thinking this will make tracking me more difficult.


You don't need to log-in, to be tagged with a cookie and subsequently be identified. Sometimes referred to as "anonymous authentication".


Or "shadow profiles".


I'm much more worried about ideological persecution than targeted advertising. It's very easy to think you are doing the right thing, everyone thinks that.


Next step: reCAPTCHA Pro. $29 a year (by CC or Googlecoin of course!) to browse the web free of recaptchas and be able to use your favourite sites again.


$29 to spam the web to my heart's content? I'll take infinity of them!


You did read the ToS? It can be cancelled any time for any reason, and you'll never find out why.


reCAPTCHA apparently doesn't stop bots well at all.

When I published a brand new site, I got thousands of bot sign-ups in the first couple of weeks. reCAPTCHA apparently had no effect on stopping them. The bots signed up with real users' emails, causing my site to send unsolicited email to them, which affected my domain's email reputation significantly.

I rolled my own invisible CAPTCHA and immediately stopped ALL the bot traffic.
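
The parent doesn't say what their invisible CAPTCHA does, but a common DIY version is a honeypot field plus a minimum fill time, roughly like this (a sketch; field names and thresholds are invented):

    import time

    MIN_FILL_SECONDS = 3

    def render_extra_fields(session):
        """Add a decoy field (hidden via CSS) and remember when the form was served."""
        session["form_served_at"] = time.time()
        return ('<input type="text" name="website_url" style="display:none" '
                'tabindex="-1" autocomplete="off">')

    def looks_like_bot(session, form):
        """Bots tend to fill every field and submit instantly; humans do neither."""
        if form.get("website_url"):                      # decoy field was filled in
            return True
        served = session.get("form_served_at", 0)
        return time.time() - served < MIN_FILL_SECONDS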


This is 100% accurate. There is no way to stop bots through reCAPTCHA, doesn't matter the version. It's honestly trivial and can be done with even the most basic web automation skillset.

Rolling your own CAPTCHA is a fantastic option, because folks like 2captcha are never going to take the time to integrate against a one-off solution. When you do something like that you drastically increase the barrier by introducing the need for a reverse engineering skillset to bypass your unique solution... and that skillset is expensive lemme tell ya ;)


I had to test a simple contact form implementation with reCAPTCHA yesterday on an office worker's PC that was running Chrome.

I spent all day clicking on sidewalks and traffic lights! Or buses. No idea what they want the clicks on buses for.

I actually had problems upstream of the CAPTCHA. This was on a Wordpress site and I was patching up the Contact Form 7 implementation on there.

What shocked me was how naff WordPress is. After however many years, it does not come with a contact form built in. Comments yes, but a contact form, no. Then the fairly de facto Contact Form 7 would not work with Google reCAPTCHA v3 and the latest v5 WordPress. So there must be hundreds of sites out there with contact forms that do not work. Then there are people cussing reCAPTCHA when there is this hideous mess of bloat going on.

Contact Form 7 didn't even use HTML5 form validation, and styling it was a nightmare.

I eventually went to reCAPTCHA v2 with the box you tick. Having the v3 box in the bottom right of the screen on every page was not what the client wanted. Plus it didn't work with this kludge known as WordPress.

I think the issues raised in the article are not that big a deal. If you have logged in to Chrome and you are on your normal device and IP address then you can get a free pass. Why not?

I seriously advise anyone to test their implementations on a non-logged in PC, it is an eye opener. And a time consumer. But forms have to be made to work. You can't have people locked out.

There is a lot to be said for backend validation based on form data. I like to make forms unique with a hidden timestamp in them that is hashed with MD5. You can then see whether someone has spent long enough on the form for it to be 'real'.
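
That technique, in sketch form (the parent says MD5; signing with an HMAC over a server-side secret does the same job and can't be forged by recomputing a bare hash -- all names and limits here are invented):

    import hashlib, hmac, time

    SECRET = b"change-me"            # server-side secret, never sent to the client
    MIN_SECONDS, MAX_SECONDS = 5, 3600

    def form_token():
        """Hidden fields for the form: the issue time plus a signature over it."""
        ts = str(int(time.time()))
        sig = hmac.new(SECRET, ts.encode(), hashlib.md5).hexdigest()
        return ts, sig

    def form_token_valid(ts, sig):
        """Reject if the signature doesn't match or the form was filled too fast or too slow."""
        expected = hmac.new(SECRET, ts.encode(), hashlib.md5).hexdigest()
        if not hmac.compare_digest(expected, sig):
            return False
        age = time.time() - int(ts)
        return MIN_SECONDS <= age <= MAX_SECONDS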


Any app should be clear and upfront about what data it collects and how it collects it, and what it does with it.

The platforms (web browser or operating system) that run these apps (web app or native app) for the benefit of that user should provide a well-understood, intuitive experience around what is allowed/possible to be collected and used from the user's device.

Now, technical mechanisms are one major part of the solution. In this regard, these megacorps should be held to a higher standard, as they run the platforms as well as the biggest apps on those platforms.

But we also need legal protections that make both the application owners and the platforms owners responsible for any abuse of the user.

This particular case is eerily similar.

Credit card fraud prevention companies do the same thing - they say they need to know as much transaction data as possible in real time to know which is a legitimate transaction and which is fraud. There is misdirection and fog around how they justify this, with thinly veiled technical explanations about network effects, and around criticisms that it is monopolistic by design.

The reality is that fraud can be prevented by designing the product differently in the first place: chip & PIN, multi-factor authentication, etc. The technology is present to prevent theft and fraud without having to collect so much data centrally.

In this case, similarly, to prevent DDoS attacks, there are other anonymous non-data collection oriented solutions possible. More research and collaboration is needed to evolve the Internet architecture to react to DDoS attackers and other types of technical abusers of your app, catch them and prevent them from growing. Instead, we get these centralized monopolistic solutions.


>For instance, if a user with a high risk score attempts to log in, the website can set rules to ask them to enter additional verification information through two-factor authentication.

This defeats the purpose of using 2FA: to require a second factor every time you log in. It completely negates the benefits of 2FA if a hacker has gotten my username/password through a keylogger. It's easy enough to get a good score if you just disable tracking protections and log in to a Google account, and then a hacker can easily break into your account. I was thinking it had to be the author not understanding how 2FA worked, but Google is actually advocating this ([0]):

> login With low scores, require 2-factor-authentication or email verification to prevent credential stuffing attacks.

You would think the people behind recaptcha would understand how 2FA is supposed to work.

[0]: https://developers.google.com/recaptcha/docs/v3#score


I think "real" 2fa isn't going to care about what ReCaptcha v3's trust score is, the case for this is "this user has a recaptcha score of 0.2, they don't have any cookies for this site, and [other system] also thinks they are suspicious. Let's require extra email verification and/or ask to answer a security question".


Recaptcha is only used on 25% of top 10k websites? Anyway, I’m very angry about the way the web is made to work, especially identity, and usually I would spew a bunch of anger and swear words describing how stupid it is and all the people who blindly support it but I don’t want this to be censored so instead I will behave myself!

This is hilarious because this is the worst, most needlessly complicated solution to identity that one could ever imagine. It's funny how it apparently takes a PhD to tell you that Google isn't just analyzing your behavior on the one website: they track you across every webpage you visit that has their code running. And it goes way beyond cookies. Look at the filter bubble research that DuckDuckGo did -- they have an idea of who you are regardless of what cookies you have or whatever else. And this data has informed captcha results before this latest iteration. It's complicated, needless, and also gives a bunch of sensitive data to a private company that shouldn't have it. Nobody cares.

Identity services are the alternative to this. Have a company that has multiple ways of verifying identity including operating physical locations where you can show up and prove beyond any doubt that you are person X. Once identity is established, something like a yubikey or whatever can be used to authenticate various things like making an account on a website or what have you. If you get hacked then you can rectify by engaging in one of the identity verifications tasks, up to and including coming in person and being biometrically verified with absolute certainty. The company would make money with modest fees to users and charging websites to use their service to verify their users.

It should be that the government has all this in place, and you can use secondary identification numbers for websites. But in the United States identity is broken, based on a shitty SSN where if it's stolen you are basically fucked.


> According to two security researchers who’ve studied reCaptcha, one of the ways that Google determines whether you’re a malicious user or not is whether you already have a Google cookie installed on your browser.

This makes sense to me. The presence of cookies is a strong indicator of normal human browsing, and Google would only be able to see their own cookie.


Except a lot of people don't like Google having persistent cookies that track your web usage. Why should giving up that data be a prerequisite for accessing a website?


It's not a prerequisite for accessing the website, it's just a prerequisite for skipping the captcha puzzle.


> Because reCaptcha v3 is likely to be on every page of a website, if you’re signed into your Google account there’s a chance Google is getting data about every single webpage you go to that is embedded with reCaptcha v3—and there may be no visual indication on the site that it’s happening, beyond a small reCaptcha logo hidden in the corner.

There’s a potentially bigger risk being overlooked. Google can execute first-party script. This means they get every user’s session credentials and can freely impersonate that user. So can all the other trackers in use.
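To make the first-party point concrete: the reCAPTCHA script is pulled in with an ordinary script tag, so whatever Google serves runs in the page's own origin. Nothing technically prevents such a script from doing something like this (illustrative TypeScript sketch; the "session" localStorage key is made up):

    // A first-party script can read whatever the page itself can read:
    const cookies = document.cookie;                          // every non-HttpOnly cookie
    const token = window.localStorage.getItem("session");     // "session" is a hypothetical key
    const pw = document.querySelector<HTMLInputElement>("input[type=password]")?.value;
    // HttpOnly cookies aren't readable directly, but the script can still issue
    // same-origin requests that carry them, i.e. act as the logged-in user.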

I don’t understand how anyone thinks this is remotely okay.


I get a score of 0.7 in Safari, 0.3 in Chrome if not logged in (or paused), but 0.9 if logged in. I always hate that in non-logged-in Chrome browsers I have to fill out those picture questions a hundred times, seemingly endlessly.


> Google encouraging site admins to put reCaptcha all over their sites

hahah, wow. Put this tracking software suite on every page please - it's for your own good!

I used to kinda believe in Accelerationism[1], the philosophy where you encourage and accelerate flawed systems to push them to a breaking point. However, it turns out our society doesn't really have a breaking point.

1 - https://en.wikipedia.org/wiki/Accelerationism


Yet I can't visit websites because I don't want to be tracked. What's worse is that this stupid thing even shows up on government and university websites, etc.


I can't seem to find which rules for uBlock Origin and the like will block reCAPTCHA v3.

I'm noticing that little blue box in the bottom right corner of more sites now, and I think I just want to block it entirely. If a site won't let me log in because of it, I guess I'll just stop using it. If it's something like my bank... well, I guess I've got to get a new bank.

Nothing really excuses this level of potential tracking.
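The closest thing I can think of is a set of network filters along these lines (untested sketch in standard uBlock Origin static-filter syntax; sites that hard-require the token will then likely just refuse to let you through):

    ||www.google.com/recaptcha/
    ||www.gstatic.com/recaptcha/
    ||www.recaptcha.net/recaptcha/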


Not sure how this is any worse than the Facebook "like" button that's on every website on the planet.

Sure, we hate that, but it's there, and plus, Google probably already has a cookie on your computer anyway.

Given all that, this seems like a very minor privacy loss to whine about compared to all the other crap that Google/FB/etc does.

Please help me if I'm missing something unique about this particular issue.


Same box, same physical location. In Chrome I get a score 0.2 higher than in Firefox, despite the fact that I never use Chrome for anything...


If it takes me longer than 5 seconds to access the content I'm looking for, I close the tab. Am I alone in this?


I doubt you're consistent with it. You'd probably wait 10+ seconds to post an HN comment for example.

Easy to say you'd bounce on a website you didn't care about. That's a bit of a tautology.


reCAPTCHA is often used at sign-up time. Unless I'm paying a parking ticket or something, if I can't get through your sign up process in a few seconds, then you've already lost my interest.

There are so many SaaS products now, and it doesn't take much to lose out on potential users. It doesn't really impact my life much because, as you said, I probably don't have much investment in it in the first place.


How else do you want to detect humans in a widely used, centralized service like reCAPTCHA? This is the result of website developers' laziness. There could be thousands of custom captcha implementations, but instead most devs just drop reCAPTCHA in and they're done.
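For a sense of what "roll your own" can look like, here's a minimal sketch of a self-hosted challenge (TypeScript, assuming Express and express-session as dependencies; obviously not bot-proof, just free of third parties):

    import express from "express";
    import session from "express-session";

    const app = express();
    app.use(express.urlencoded({ extended: false }));
    app.use(session({ secret: "change-me", resave: false, saveUninitialized: true }));

    app.get("/signup", (req, res) => {
      const a = Math.floor(Math.random() * 10);
      const b = Math.floor(Math.random() * 10);
      (req.session as any).captchaAnswer = a + b;   // the answer never leaves the server
      res.send(`<form method="post" action="/signup">
                  What is ${a} + ${b}? <input name="captcha">
                  <button>Sign up</button>
                </form>`);
    });

    app.post("/signup", (req, res) => {
      if (Number(req.body.captcha) !== (req.session as any).captchaAnswer) {
        return res.status(400).send("Captcha failed");
      }
      res.send("OK"); // proceed with account creation
    });

    app.listen(3000);

Trivial for a determined bot author to solve, of course, but the decision and the data stay on your own server.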


Could an anti-competition suit be brought against Google over reCAPTCHA? They're aiming to have (if they don't already) a crawler-blocking tool that obviously doesn't block their own crawler.

Doesn't that sound a bit dystopian?


You can block Google's crawler with robots.txt.
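Something like this at the site root asks Googlebot to stay out entirely (standard robots.txt; it only stops well-behaved crawlers, of course):

    User-agent: Googlebot
    Disallow: /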


That's not very relevant to my proposed problem. Here's an imaginary scenario: reCAPTCHA blocks all crawlers except Google's own. How are competitors supposed to compete if they can't crawl anything, regardless of robots.txt or whatever?


The purpose of recaptcha is to block malicious bots from accessing parts of a site that automated bots are not supposed to go to.

Why would someone be using recaptcha to block bots from the general part of their web site? Your imaginary scenario would require web masters to become hostile to crawlers.


> web masters to become hostile to crawlers

Have you been living under a rock for the past decade? Webmasters are definitely hostile towards web crawlers. There's an entire plethora of "web-crawler" protection services; Cloudflare, for example, is probably the biggest one.


`It works best with context about how humans and bots interact with your website, so for best performance include reCAPTCHA in many places` O.o


My confidence in Google reCAPTCHA is diminished by their admin site: you can't edit/change the "reCAPTCHA type: v3" setting.


Is it possible to inject code that feeds reCAPTCHA fake but legit-looking data and bypass its check that way?


Time for a separate computer just for when I am forced to interact with websites using Google's reCAPTCHA for paying bills, etc.


What are the open source alternatives to this?


Luckily I don't need any of the sites that use this obnoxious tool.

Just vote with your feet and tell the site to drop it or lose you.


Yet you're on a website that uses ReCaptcha.

Just tried to register via Tor and got this: https://i.imgur.com/svjfLqo.png

Kind of toothless to say "I'll NEVER use ReCaptcha (except for websites I want to use)!" In fact, I'd go a step further and assert that, while you complain about ReCaptcha, you're actively benefiting from HN using ReCaptcha since you see less spam day-to-day because of it. ;)


I don't remember using recaptcha on HN.


What about Google Analytics? They already have all my browsing history. reCAPTCHA data is not even as detailed as GA.


The difference is that you can block Google Analytics and websites continue to work.


You can roll your own captcha as well. Nobody forces you to use reCAPTCHA v3.


Don't 'whatabout' us and change the subject; we're talking about reCAPTCHA now. If you must know, the key difference is that you can block Analytics and still use the site, but you're denied access if you block CAPTCHAs.


For the same reason, I now leave any website that has Google reCAPTCHA unless it's really necessary.


I just want to shout out a previous article posted here on HN that sparked quite an interesting conversation.

https://news.ycombinator.com/item?id=20158386

Long story short: don't use a captcha if you don't need one, and most of the time your website doesn't need one.


Is it just me or is the site broken? I see it for a quarter second, then the page turns blank.


Wait until they make Google WebAssembly blobs mandatory; it will get even worse.


Add this to the list of how Google's open-source engines work! E.g., how they treat captchas in other browsers (easier captchas/validations for Chrome users while beating the crap out of Firefox users).


Everything Google does has a dark side. They are a parasitic corporation.


“Google’s ... has a dark side” is valid for almost everything they do.


Not sure how Google can keep making their products worse and keep getting more customers...


Every 3rd-party service has a dark (or grey) side. This is the trade-off for offloading some functionality onto a 3rd party.


you will identify yourself to google, or you will be denied the web.


There is a light side?


And the bright side?


Congratulations to the reporter who found reCAPTCHA after 10 years


Is this a surprise to anyone in this community?


It also doesn't block bots


there is a light side?


Remember every time you complained about China's citizen score system?

Now remember that every interaction with the government must happen online (from requesting a US visa to going to the DMV), and all those forms are behind a Google(R) Captcha(TM) censorship system, which ranks users based on how well Google(R) can monetize the current user's browser session. Let that sink in.


This is probably good for proxy users, since it probably isn't just tracing a polluted IP anymore.

But I'm not sure, since the browser sessions for some proxy users, like Tor exit nodes, are so short.


> one of the ways that Google determines whether you’re a malicious user or not is whether you already have a Google cookie installed on your browser.

... can we please get a serious antitrust investigation now?


So you are so afraid of being tracked? You don't want to give up an ounce of your data, but you still want full access to the web, and for free?! The world doesn't work this way. Either pay for what you get, or be prepared to accept ads/tracking. It is that simple.


I pay money to plenty of businesses and most of them still embed loads of advertising/analysis/tracking scripts. Seems less simple than you make out.



