hCaptcha now runs on fifteen percent of the internet (hcaptcha.com)
605 points by fab1an on Nov 25, 2020 | 363 comments



I dislike the widespread use of captcha regardless of provider.

I realize anything connected to the internet will be subject to automated abuse, and it's impossible to run some types of services without taking some steps to defend against it, but it seems to me there's usually a way to handle that without invading the user's privacy or wasting their time. The exact details will vary based on the type of service, of course.

One particularly egregious misuse of captcha in a service I use presents one after I enter a correct username and password. An incorrect login says so without presenting a captcha. The potential reward for an attacker who successfully gains access to an account is high, so it seems almost certain anyone running a targeted attack would defeat this by handing it off to a human upon detecting that they had a good account.


> it seems to me there's usually a way to handle that without invading the user's privacy or wasting their time

As much as I agree with your dislike of captchas, I don't think this is true at scale (unless universal online identities existed, which could and should include anonymous identifiers by design). When you need to accept information from anonymous users (comments, votes, forms, registrations), there's no way to avoid both invading users' privacy and wasting their time, unless you are manually filtering / moderating all the input data, in which case you can't really say it scales. You might say emails can solve the problem. Well, they don't really solve the problem against dedicated attackers / spammers, and they do invade privacy for the average user. You can use statistical approaches to try to reduce privacy invasion or other costs, but I don't know of anything that really solves the problem without manual identity verification at some point.


I built an alternative[0] that takes a proof of work approach. As a site owner you set the difficulty that makes sense for you: so perhaps you would want 20 seconds of computation before you can submit. The nice thing is that this can happen entirely in the background while the user fills in the form.

Also with multiple requests from the same IP in a short timespan, the difficulty increases.
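
For readers unfamiliar with the idea, here is a minimal hashcash-style sketch of the approach described above. It is not Friendly Captcha's actual algorithm: the SHA-256 puzzle, the function names, and the "one extra bit per 5 recent requests" escalation rule are all assumptions made for illustration.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce until the hash has enough zero bits."""
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def difficulty_for(base_bits: int, recent_requests: int) -> int:
    """Hypothetical escalation rule: each 5 recent requests from an IP
    adds one bit, which doubles the expected work."""
    return base_bits + recent_requests // 5
```

Solving takes about 2^difficulty_bits hashes on average, while verifying takes one, which is why the site owner can tune the cost for the client without paying it on the server.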

There are downsides to any captcha, but in my opinion proof of work makes a much better tradeoff. Accessibility and privacy are respected, and there are no annoying tasks.

[0]: https://friendlycaptcha.com


proof of work really doesn't work well in practice. spammers have huge farms of compute, often on residential ips, and legit users are accessing the service from a device that is often power-constrained (like a phone). you end up either hugely penalizing legitimate users, or having to employ many of the standard antispam techniques (IP/ISP reputation, captcha, rate limiting etc) on top, so the proof of work adds a lot less incremental value.


It's not perfect, and you are right about the downsides. These resources that spammers have can be applied as easily to re/hcaptcha (either through ML or clickfarms). No CAPTCHA will actually lock out targeted attacks.

The difficulty increase per IP can be seen as a form of soft rate limiting, it's shared between all websites (which is where it's different from ordinary rate limiting). In the future we may use IP reputation lists to guide the initial difficulty too - but we haven't implemented that yet.

I think that no perfect captcha can exist, which is inherent to the problem. Proof of work makes different tradeoffs, and perhaps it is cheaper to attack still - I think it's a much more friendly solution for users though (accessibility, privacy, simplicity, fairness, UX).

Maybe in the future the solution would be something like this: a long PoW-based captcha that runs in the background as well as a vision task for the user, whichever gets solved first.


I get re-captcha'ed all the time from the same IP. And if I don't use Chrome, the captcha count is like 4x-5x higher just for using Firefox.


That's why I have even stopped using google services. If I literally have to get another browser to use your snowflake site, then why would I use your service anyway?


This reminds me of a similar solution I saw on PH last year, I think it's a great alternative for smaller websites that are less likely to be targets for spams/bots

But say, there's a website and it's a likely target, you implement IP protection, fine, the user uses residential proxies. Now your best bet is to go off fingerprinting, but there are marketplaces which sell those too in bulk.

Maybe I'm wrong, but wouldn't the best approach be to stick to human interaction puzzles, which are hard and don't have a set way to be solved by a machine (for now)?


Bangladeshi click farms[0] are cheaper to use to bypass captcha than renting residential proxies to solve PoW. Also, image captchas cannot scale automatically in difficulty (as an incident response), but PoW can (see how bitcoin adjusts with the miners).

[0] https://2captcha.com/


Just did the math from the numbers on their site and on average a "worker" doing captchas for them gets paid 0.2$/hour.

Adjusting based on average monthly salary in Bangladesh (157$) [1] and the US (4056$) [2] that would be similar to an American making 5.2$/hour which is surprisingly close to the current minimum wage in the US (7.25$/hour) [3]
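
The adjustment above can be reproduced as a short calculation. This sketch only uses the figures quoted in the comment; the function name is made up for illustration, and scaling by the ratio of average monthly salaries is the comment's own (rough) method.

```python
def adjusted_hourly_wage(local_hourly: float,
                         local_monthly_avg: float,
                         us_monthly_avg: float) -> float:
    """Scale a local hourly wage by the ratio of average monthly salaries."""
    return local_hourly * us_monthly_avg / local_monthly_avg


# Figures quoted in the comment: $0.20/hour, $157 and $4056 average monthly salary.
equivalent = adjusted_hourly_wage(0.20, 157, 4056)

# Gap relative to the $7.25/hour US federal minimum wage cited above.
shortfall = 1 - equivalent / 7.25

print(f"US-equivalent wage: ${equivalent:.2f}/hour")
print(f"Below minimum wage by: {shortfall:.0%}")
```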

So I guess this must be a fairly decent way to earn money if you're young/poor in Bangladesh...

[1] https://tradingeconomics.com/bangladesh/wages#:~:text=Wages%....

[2] https://www.thestreet.com/personal-finance/average-income-in...

[3] https://en.wikipedia.org/wiki/Minimum_wage_in_the_United_Sta...


5.2 is not close to 7.25. That's like 30% less than the minimum wage. Minimum wage itself is a massive struggle but 30% less is just plain offensive and dehumanising.

> So I guess this must be a fairly decent way to earn money if you're young/poor in Bangladesh...

Solving dumb captchas is never a fair or decent way to earn money, not when you are poor and definitely not when you are young. Creating living conditions for other human beings where they can be easily exploited and used for mindless degrading work such as solving dumb captchas is one of the most grotesque things of the 21st century.


It's a bit like locking your bike. That doesn't work against targeted attacks, but the presumptive thief is more likely to choose another bike that has a smaller or no lock.

The arms race is bad for everyone, in both examples, but the underlying problem is a fundamental one of misaligned incentives.


Ignoring the other criticisms because they generally seem valid, to everyone saying that proof of work doesn't matter because bots can just use more machines, that depends a lot on the economics of any specific automation project. I scrape a little data here and there, and a reliable proof of work system costing ~20s on a commodity core would make some of my personal projects cost tens of thousands of dollars monthly. Maybe that's worth it to someone (e.g. if they have an army of hacked machines without anything better to do), but I think it'd keep a lot of the riffraff out.
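
To make the economics concrete, here is a rough cost model. The 20 seconds per request comes from the comment; the price of $0.04 per core-hour is an assumed placeholder rather than a quoted cloud rate, and the function names are hypothetical.

```python
def monthly_pow_cost(requests_per_month: float,
                     seconds_per_request: float,
                     dollars_per_core_hour: float) -> float:
    """Compute cost of solving one proof-of-work challenge per request."""
    core_hours = requests_per_month * seconds_per_request / 3600
    return core_hours * dollars_per_core_hour


def requests_for_budget(budget_dollars: float,
                        seconds_per_request: float,
                        dollars_per_core_hour: float) -> float:
    """How many requests a given monthly budget pays for."""
    return budget_dollars / (seconds_per_request / 3600 * dollars_per_core_hour)


# Assumed price: $0.04 per core-hour (placeholder, adjust to your own numbers).
print(monthly_pow_cost(1_000_000, 20, 0.04))
print(requests_for_budget(10_000, 20, 0.04))
```

Under that assumed price, a million requests a month costs a couple of hundred dollars in compute, and a bill in the tens of thousands corresponds to tens of millions of requests. The cost scales linearly with both volume and difficulty.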


> FriendlyCaptcha will prevent 99.9% of spam

For someone who has little expertise in this specific field, how are you calculating this?


As a counter-point, the uncaptcha[0] research project used Google's free Speech-to-Text service to solve reCAPTCHA at a reported 85% success rate.

I'm convinced CAPTCHA are no better than fake/dummy security cameras.

[0] https://github.com/ecthros/uncaptcha


Admittedly it's not calculated, so it may be a stretch. It's based on the assumption that the vast majority of spam out there just looks for forms to submit without smarts (which is also why honeypots can be pretty effective, especially if you have a small website that nobody will make the effort to work around).

I've seen people report that they have reduced spam to near nothing already with just a honeypot, but of course I can't verify those claims.


Judging by the downvotes (despite answering the question truthfully), I see it's not a good way to present ourselves, and frankly we don't have to make that claim. It's hard to estimate the real percentage, our customers are happy but measuring what is no longer there is tricky in the real world.

I will change the wording on the website and remove the percentage.


People take quantitative claims seriously. I wouldn't make them without being able to defend them in an intellectually rigorous way.


> I've seen people report that they have reduced spam to near nothing already with just a honeypot, but of course I can't verify those claims.

Can verify from personal experience. I once implemented a simple honeypot approach on a small blog site. It immediately cut down automated "drive by" comment spam to almost nothing. I never tried to quantify it, but it was the difference between dozens of spam comments a day and maybe one or two a week (which I assumed were probably manual submissions).

Most spam bots are pretty unsophisticated it seems, and do not pay any attention to a honeypot field being hidden either by CSS or JS.
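
A minimal sketch of the server-side half of a honeypot, assuming the form contains an extra field that is hidden from humans with CSS. The field name `website` and the function name are hypothetical.

```python
# Hypothetical name of the hidden form field. Humans never see it, so they
# leave it empty; naive bots fill in every field they find.
HONEYPOT_FIELD = "website"


def looks_automated(form: dict[str, str]) -> bool:
    """Treat any non-empty value in the hidden field as a bot submission."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())


print(looks_automated({"name": "Ann", "comment": "Nice post", "website": ""}))
print(looks_automated({"name": "x", "comment": "buy now", "website": "http://spam.example"}))
```

This only stops bots that fill forms blindly. Anything that renders the page or was written for the specific site will leave the hidden field alone.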


It should be fairly easy to set up two open wordpress blogs, one with the captcha and one without.

After a few months you check how much spam arrived at either and get your number?


How do you handle low-end devices? Do you reduce the difficulty for them, and can this be abused by pretending to be a low-end device when you really aren't?


Everybody gets the same difficulty initially which you determine as a site admin, so one should base this on their audience (e.g. Gitlab would have a different device profile from a government website).

The solving can be a few times slower on a low end device which you should keep in mind. To aid with this when setting the difficulty for your website it shows you an estimate for various device types. This is indeed a downside of PoW approaches.

There is one factor that helps: you can start solving as soon as the form loads, so as the user enters their details/comment it can start solving - I have a hunch that people on mobile devices are inherently slower at entering their data, which should help a bit.

Anyway - if you set the difficulty quite high and the solving takes 30 seconds, it takes the user 15 seconds to enter the form - the user would still have to wait 15 seconds. That's not very different from the time to solve image captchas (it's actually lower and doesn't come with a 2MB payload download which isn't great on phones either, and they can keep their privacy + sanity). You could give the user something to do that makes sense for your website (ask them for feedback?).


> I have a hunch that people on mobile devices are inherently slower at entering their data.

In general, I use form autocomplete to fill this quickly. And on the contrary, my mobile signups are faster than desktop because the lastpass firefox extension on desktop takes longer to detect the form before the autocomplete can begin than my phone does.


Why bother with a proof of work scheme when you can just rate-limit directly? It accomplishes the same thing, while eating way fewer CPU cycles, doesn’t require JavaScript, and guarantees uniform cost between all client types.
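
For comparison, direct rate limiting can be sketched as a per-key sliding window. The class name and limits below are illustrative assumptions, not taken from any particular service.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per key within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10)
print([limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 10)])
```

Note that the limit applies to whatever the key is. If the key is an IP address, everyone behind a shared address counts as a single client.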


This sibling comment was responding to you: https://news.ycombinator.com/item?id=25215024


So, if your app takes off on a college campus, you’d block them?


Do you collect metrics on how many people end up waiting for the PoW and share that with the admins?

In all, this sounds really promising though. I’d venture that most spammers already have higher end machines than end-users to solve existing captchas.

I’d probably approach this with a different strategy. I’d send an encrypted time stamp as the nonce. On the client, I’d first do a few easy PoW tasks and estimate the PoW difficulty for the given machine that would take at least X seconds to do. Then, send the PoW, the encrypted time stamp, and difficulty to the backend. If it’s been shorter than X (with a margin of error), or the PoW is wrong, it’s not valid.

In this scheme, it doesn’t matter how powerful the device is, a core is going to do some work for X seconds or at least be throttled by time.
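
A rough sketch of that scheme, with two simplifications that are assumptions rather than part of the comment: the timestamp is HMAC-signed instead of encrypted (the server only needs to detect tampering, not hide the value), and the difficulty is passed to the server check directly instead of being estimated on the client. The secret and all function names are hypothetical.

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical key, known only to the server


def _meets_difficulty(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0


def issue_challenge(now: float) -> str:
    """Server: hand out a timestamp plus a tag proving the server issued it."""
    stamp = f"{now:.3f}"
    tag = hmac.new(SECRET, stamp.encode(), hashlib.sha256).hexdigest()
    return f"{stamp}:{tag}"


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client: brute-force a nonce for the issued challenge."""
    nonce = 0
    while not _meets_difficulty(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


def check_submission(challenge: str, nonce: int, difficulty_bits: int,
                     min_seconds: float, now: float) -> bool:
    """Server: reject forged stamps, too-fast submissions, and bad work."""
    stamp, _, tag = challenge.partition(":")
    expected = hmac.new(SECRET, stamp.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return False  # timestamp was not issued by this server
    if now - float(stamp) < min_seconds:
        return False  # came back faster than the minimum wait
    return _meets_difficulty(challenge, nonce, difficulty_bits)
```

The time check is what throttles a fast machine: finishing the work early gains nothing, because the submission is still rejected until `min_seconds` have passed. A real deployment would also need to stop a challenge from being reused.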


> Why bother with a proof of work scheme when you can just rate-limit directly?

Tad amusing after all this time people still don't understand why proof-of-work schemes exist.

Rate limiting has zero cost to an adversary. PoW has physical costs. It's in the name :)


And Cloudflare already does that—that's what the "Checking your browser before accessing xyz—Please allow up to 5 seconds" message means. It's clearly not enough for them though, because they then go to also require CAPTCHAs.


First one to make this mine an altcoin for proof-of-work wins.

But seriously I like the idea, although it seems trivial for someone to attack a protected site by exhausting its subscription level? Are there any protections against that?


We don't disable the service if a protected site goes over their limit.

Right now we manually look at the limits and are reasonable with overages - also we can see how many captchas were unsolved.


Wow, this is an awesome idea. I can imagine this could be extended to solve tasks to mine cryptocurrency. If you get attacked by a botnet, you would actually make a profit!


Proof of work by itself is nearly useless, unfortunately. Compute is cheaper than people. This is one reason why CAPTCHA services will likely be with us always.

As someone working in the field, I also doubt your claim "will prevent 99.9% of spam" is based on real data. Modern headless browser spambots are not deterred by this kind of approach.

(Edit: looks like the poster admitted this number was entirely made up later in the thread.)


I just tried loading the demo of Friendly Captcha in 8 browser windows, clicking the verify button, refreshing the window, and repeating that for about 3 minutes. Not once did it tell me that I'm a robot, so it seems your alternative fails the most basic captcha function: stopping people/machines from spamming functionality that the website owner wants to be limited.

Maybe not everyone but a lot of people use captcha services to prevent automation from being used to extract/insert data. I know as a developer that there is always a chance of bypassing this, even with Google's reCaptcha, but your service seems to make this trivial, so many won't even go beyond your demo.


>Not once did it tell me that I'm a robot

Right, unfortunately you've completely misunderstood the point of Friendly Captcha, a question which is answered right there on its main page.

>>How does FriendlyCaptcha tell apart bots from humans?

>>It doesn't, FriendlyCaptcha adds a small cost and complexity for spammers that becomes large at scale.


Right, I guess it's time for you to upgrade the UI of your tool then, as when it's inactive it says "Anti-Robot Verification" and once the challenge is done it says "I'm not a robot", while in reality, none of those things are true, as you said yourself.

You might also want to rebrand to use a different word than "Captcha" as you're not actually telling robots and humans apart, you're simply adding PoW to an action, nevermind if they are robots or humans.

So instead of blaming users for misunderstanding your message, maybe try working on making your messaging a bit clearer, so that people who know what a captcha is don't get confused by your own definition of it.


Actually the user you're replying to is not the author of the service, from what I can tell.


Oh dear, it seems so. Thanks for letting me know, I guess I just assumed it would be the creator of the service who would defend it, not someone else, but seems you're right.


It doesn't work for me, comes back with the error: Verification failed: Background worker error undefined

I'm using the latest Firefox on GNU/Linux. Admittedly I've got a lot of stuff blocking all sorts of things, and I'm not really sure what's kicking in to block background workers, but I'm glad it's blocked. Anyway, after disabling literally all the blocking tools that I have, it still refuses to load.


That's not good, could you maybe provide more details in the Github repo [0]? The widget is open source, hopefully we can figure out what is blocking it here.

We test the captcha in browsers up to 8 years old and on many devices, do you perhaps have background workers disabled entirely? Here is a link to the widget on its own [1], does that have the same behavior? How about a minimal worker example [2]?

[0]: https://github.com/FriendlyCaptcha/friendly-challenge

[1]: https://unpkg.com/friendly-challenge@0.6.1/index.html

[2]: https://jsfiddle.net/christopheviau/90syrp0q/


Didn't try the other links but the jsfiddle link just says Preparing worker in Firefox here and neither button ever does anything.


Curious why it wouldn't start 'verifying' immediately on load? The fact that it runs in the background is really key--I'd hate to fill out an entire form, click the button at the end, and still have to wait around to submit.


You can change this behavior of the widget (data-start="auto" instead of default data-start="focus"), or you can start it programmatically.

The reason you wouldn't always want to start it in the background is if the user may not intend to submit the form (perhaps it's a form that is in your footer of every page and only a small percentage of users intend on sending it). Starting it on focus of the form is a good default.


So your solution is technically to waste electricity to replace captcha? It's for sure an interesting concept, but that first point and low-end devices requiring 20+ seconds to pass are not very good selling points for your service.


You're right that there is an electricity cost to solving this type of captcha - the same as there is an electricity cost to loading 2MB of JS+images and clicking the pictures with the fire hydrants (and the infrastructure behind that). It's hard to estimate how they compare (and what value you assign to the human labor performed and privacy loss).

20 seconds would be a fairly high difficulty. It's up to the site owner to decide what makes sense for them.

If anybody comes up with a useful computational task with a small bundle size that can be verified cheaply that would be the holy grail - until then the computation is only there as a form of hashcash.


20s doesn’t matter when it’s someone else’s hardware (eg spammers using malware installed on victim machines).

It’s also nonsense to compare the computational cost of N seconds of sustained, maxed out useless computation to the milliseconds of compute time needed to decode an image, or the minimal power usage of waiting on network data.


Electricity is a lot cheaper than my time.


Right? I think they’re describing those crypto mining scripts people were being inflicted with a while back :)


This is very interesting. Can you change the questions in the form? Those questions seem too personal and are offputting.


This looks very interesting and clean. Well done!


CAPTCHA does not scale. CAPTCHA spams real people with requests and wastes my VALUABLE time, and still labels disabled people as subhuman. It's offensive. It's ineffective. It's outdated.

It's reaching a point where encapsulating a VPN with anti-captcha is something I'd pay for.


> CAPTCHA is the worst option, except for all the others that have been tried.


I would happily pay 15¢ or so per site to bypass captchas if it were done in a way that would preserve my privacy. Has something like this ever been offered?


> unless universal online identities existed, which could and should include anonymous identifiers by design

Yes but no. Anonymized identifiers can be deanonymized. They should utilize zero-knowledge proofs in such a way that they can prove "yes, I have an identity verified by entity X (and Y and Z) (based on passport/phone number/...)", without disclosing any of those details.

It could, optionally, yield an identifier unique to each requester and unlinkable to others unless an explicit proof of the link is provided. Though if this is included, there has to be some mechanism to avoid huge ad networks sharing the same "requester entity".

This is a solved problem. All that's left is politics, implementation and alignment.


Maybe "anonymous identifiers" has some very technical and exact meaning that I didn't know, but when I said "anonymous identifiers" I did it very abstractly, no need to assume a specific underlying implementation from those two words.

I have actually discussed the concept in the past [0], and I exchanged some emails with the guy in that thread to talk more about technical details. We all seem to agree that design and political will are the problems, not technology.

What I was basically saying in the comments, in general terms, is that you might have one primary identifier, and then somehow you can get more identifiers that are tied to your main one, but that might have different expiration periods, might grant access to different levels of information about you, and might be limited to a certain number for each service you use. Of course, there are quite a few ways to implement such a system. And that's precisely why I'm more focused on the design, usability and characteristics than the underlying technical implementation; I think the best we can do if we ever want to see this happen is to spread the idea in terms that anyone can understand [1]. I mean, I'm interested in the technical details too, so I'm just complementing and contextualizing a bit here.

[0] https://news.ycombinator.com/item?id=22180120

[1] ...or discuss more the idea among those that are interested and setup a demo website to make it easier to spread the word, even if there's no actual implementation behind it and it's just a mock-up. I'm quite busy at the moment, but I'll definitely do something along those lines when I have some time.


>"yes, I have an identity verified by entity X (and Y and Z) (based on passport/phone number/...)", without disclosing any of those details.

Actually, I would prefer: "Yey, this one-use temporary ID is tied to an identity which is known to behave on public websites." Or "... is tied to an identity which is known to be knowledgeable on topic X". Or whatever information is needed at the time.

Next time a new ID will be generated and the identity provider will vouch for it. No passport or phone number should be required.

Edit: fixed spelling.


One such solution would be a small payment, something like 1 cent for access. That's not too much, because I am already paying 3 cents to a service solving captchas for me.


Maybe but how would you transfer 1 cent in a way that's fast enough not to impair UX and cheap (where the transfer doesn't cost more than the validation fee)? Additionally (as with all online payments) there are privacy concerns.


Please, what is the service? I want to pay someone to solve Captchas for me.


https://anti-captcha.com/ is one such service. There are others, but this one has browser plugins for visually impaired people in addition to APIs.

I've used the service in the past, though it's far enough in the past all I can say is it worked once upon a time, no clue if it's still reliable.


WTF Did you see the super man like guy shooting at the sweatshop workers? This looks pretty bad...

https://imgur.com/a/CvYyBQH


Holy crap that's terrible and offensive.


Websites spamming me with captchas is terrible and offensive.

"Because scaling. It is our God-given right."


It is possible to solve captchas without glorifying violence against workers.


Interestingly, it takes them under 20 seconds to solve a recaptcha and 70 for hCaptcha.

I wonder if they’ve partially automated recaptcha, or if hCaptcha is just a bigger pain in the neck. (I usually can’t solve a reCaptcha in 15 seconds...)


I use the Buster browser extension with the free IBM Watson speech-to-text node; the free limit is 500 minutes of speech per month, which is plenty for solving captchas. The catch with all this privacy (blocked 3rd party cookies) is that I need to solve 3 reCaptchas every time.


For a lot of people, they want to run a service and not have to spend a significant amount of time and energy investing in anti-abuse. In general anti-abuse work is not nearly as useful as product work, a day off, or a variety of other things.

I agree, there should be better ways to do anti-abuse. Yet I find myself coming up empty when I try to find better options for the common scenario where people would really rather invest deeply in their service than in anti-abuse.

I would love to hear some ideas about how to solve this nasty general problem while also respecting user time and privacy. Unfortunately, I've found that entirely too often the vague sense that there must be a better way fails to translate into a substantive better way.


Better way? I'd be hard pushed to come up with a worse way.

The number of things that are "wrong" with reCaptcha etc. have been mentioned on here ad nauseam. In fact, I'll quote myself from another debate on the subject, a while back:

  >1: It's never made clear exactly what you're supposed to click on. For example. If I'm told to click on "traffic lights" does that mean just the lights?... or the poles as well?... and what about a square that only has a tiny bit in it? Does that count too, or is it only squares which are mostly filled by the object in question?

  >2: They make no concession to non-US English speakers. I've been asked to identify things before, where I had to guess what the word means because the same thing is called something completely different in UK English.

  >The only thing that approaches the level of rage that reCaptchas instil in me are those captchas where you've got to transcribe what's in a photo of some letters & numbers and where they NEVER fecking tell you whether it's case sensitive or not, or where they use identical characters for zero and letter O, one and letter I, etc.


So in other words you have no better ideas either?


I have lots of better ideas.


Are you executing on them? Otherwise you should share them here, so that others may.


Also, the outcome of the captcha is only loosely correlated to whether you answer correctly.


  >One particularly egregious misuse of captcha in a service I use presents one after I enter a correct username and password

Is it eBay by any chance?

That recently started randomly showing reCaptchas to me when I'm already logged in and have been using the site for some time. When this happens, it descends into a never-ending cycle of more login screens and then more reCaptchas.

But thankfully eBay have taken note of the dozens of complaints about this on their user forums, dating back to 2018 and rushed their best people in to fix it.

[That last sentence was dripping with sarcasm, in case anyone unfamiliar with the company thought eBay ever took any notice whatsoever of their users' concerns]

I'm not a violent person at all. But if I ever meet the person who spawned reCaptcha and all its equally annoying clones, which are a pox on the internet, I won't be responsible for my own actions.


I'm about as anti-Google as it comes, but I didn't even mind the first incarnation of reCaptcha as a concept. You prove that you're human, and you also help transcribe books so that they're more accessible/searchable! Sure, it's in Google's interest in that it improves Google Books, but it at least seems like a symbiotic exchange (to, e.g. humanity in general.)

Contrast that with today's form of reCaptcha where you identify stop signs/crosswalks/etc. for Google's benefit, but at the same time you're also improving...oh, wait, Google again. It almost seems like forced labor, in a sense.


It is forced labour (to a very light degree, but still).

It is additionally resource-theft, when recaptcha-protected sites are used for business purposes. You are stealing valuable business time (possibly very valuable business time, if the person in question is a high-paid role like a CEO or surgeon) to power your pet "spot the crosswalk" project.


Yeah, I certainly don't disagree. I was just trying to use 'forced labor' in a literal sense rather than try to imply any of the awful things that usually come to mind...


>eBay

>I'm not a violent person at all. But if I ever meet the person who spawned reCaptcha and all its equally annoying clones, which are a pox on the internet, I won't be responsible for my own actions.

Be careful, or else they'll start sending you dead horse heads and planting GPS trackers in your cars https://www.justice.gov/usao-ma/pr/two-former-ebay-executive...


With our hCaptcha Enterprise product (https://www.botstop.com), showing a CAPTCHA actually only happens in rare cases (relatively speaking..) - vast majority of bots are caught and stopped in the background (using ML), and most users will never see one.


I'm curious how rare it is / what triggers it. In my experience, at least Google triggers hard mode if you use any sort of privacy-preserving technology, e.g. uBlock, Brave, etc. It's very frustrating.


Gave up on google search because of this ... not a big loss.


I have a VPN so like ~50% of web sites present a captcha to me ... had to subscribe to a service solving captchas automatically.


> had to subscribe to a service solving captchas automatically.

What a bizarre world we've made for ourselves.


Do you allow by click type?


Not sure what you mean by click type?


I find that when I solve a Captcha too quickly, I get another one. And another one. And another one. So instead, I wait a short time, click a few wrong boxes, then enter the correct Captcha. Maybe this is part of it, but I don't like it.


If the Buster plugin can't solve the reCaptcha for me [It does fail from time to time] then I just don't bother visiting that website. Or if it's a site I need to use, then I'll try again later and see if I either get let in without being asked to jump through hoops, or get a reCaptcha Buster can solve.

I simply refuse to waste my time and drive up my blood pressure by doing unpaid training work for Google's AI, in order to visit some crappy website. I really wish more people would start boycotting any site which uses reCaptcha [or its derivatives], so we could get rid of this blight on the internet.

I've spotted this new hCaptcha junk show up recently on a couple of sites I used to frequent. I don't visit those sites any more. So well done webmasters. Apparently annoying the shit out of visitors to your site tends to drive them away. Who'da thunk it?!


I sent an email to my representative, which got me automatically added to her newsletter. But the unsubscribe link doesn't work without solving one...


Where? That would not be legal in many countries and I suggest you try reporting it.


Let me guess... You can try reporting it to the authorities, but they need you to solve a CAPTCHA first.


As in the US House of Representatives. Cheri Bustos, to be exact, in case someone works with her.


Can you provide an alternative that doesn’t involve my contact form getting spammed to hell with crap?


I have my email address listed in plain text on my website, and with a simple regular expression to reject the standard pharma/bitcoin/etc. spams at the SMTP level based on subject, there is at most a couple of spam emails per day. Hardly takes any time to go through that.
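
A minimal sketch of that kind of subject filter. The keyword list and function name are made up for illustration, and wiring it into an actual mail server (for example as a header check) is left out.

```python
import re

# Hypothetical keyword list; a real one would be tuned to the spam you see.
SPAM_SUBJECT = re.compile(
    r"\b(viagra|cialis|pharmacy|bitcoin|crypto)\b",
    re.IGNORECASE,
)


def reject_at_smtp(subject: str) -> bool:
    """Return True if the subject line matches a known spam keyword."""
    return SPAM_SUBJECT.search(subject) is not None


print(reject_at_smtp("Cheap VIAGRA now"))
print(reject_at_smtp("Question about your blog post"))
```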


Use the recaptcha after form submission only, rather than on the whole website. Then at least the user is incentivised to do it as a last step of a process, as opposed to being stopped in their tracks before they even got to visit the website.


What service are you using? Some browser plugin I assume?


    One particularly egregious misuse of captcha in a
    service I use presents one after I enter a correct
    username and password.
That's nothing.

eBay will CAPTCHA me after I enter my e-mail address, and then again after I enter my password too. Every time. And I'll be damned if I don't "fail" this CAPTCHA at least once a week, with it telling me to try again.

Come on, there are only so many mountains/hills, taxis, traffic lights, bicycles, and cross-walks I can look at before I go cross-eyed.

They even have the nerve to suggest that I can avoid this by using the latest version of my browser (Firefox), which I already am and always do.


> The potential reward for an attacker who successfully gains access to an account is high, so it seems almost certain anyone running a targeted attack would defeat this by handing it off to a human upon detecting that they had a good account.

Then it may surprise you to know that simply preventing automation makes many types of account takeover attacks infeasible in practice. It won't mitigate the attack if you are personally a high value, named target. But most account takeover attacks operate en masse and are coordinated after large security breaches, so having to hand over accounts to a human operator as part of the auth loop would make the campaign uneconomical. It also introduces another step at which an attack can be logged, recognized, fingerprinted and stopped by an incident response team.

This is something your security team would probably gladly tell you about if you asked them. There's also a bunch of talks about this presented at conferences like Blackhat, DEFCON, USENIX, etc.

Stated in another way: not all potential rewards for successful account takeover are high. The modal account in the modal campaign is low value, which is made up for by volume and particular purpose of accessing accounts. If you model these campaigns economically, you can eliminate entire classes of "low margin, high volume" attacks simply by introducing friction that mitigates automation.

Then there is a natural cost-benefit tradeoff as to how much friction is allowable on a per-user basis to prevent the most common types of account takeover attacks.
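
To make the economics concrete, here is a rough sketch in Python. Every number below is a made-up assumption chosen only to show the shape of the argument, and `campaign_profit` is a hypothetical helper, not a model taken from any of those talks.

```python
def campaign_profit(attempts, value_per_account, cost_per_attempt, success_rate):
    """Expected profit of a bulk account-takeover run (all inputs hypothetical)."""
    revenue = attempts * success_rate * value_per_account
    cost = attempts * cost_per_attempt
    return revenue - cost

# Fully automated: each attempt costs a tiny fraction of a cent (assumption).
automated = campaign_profit(1_000_000, 2.00, 0.0001, 0.001)

# Same campaign, but every attempt needs a paid human captcha solve (assumption).
with_captcha = campaign_profit(1_000_000, 2.00, 0.02, 0.001)
```

With these inputs the automated run comes out positive and the captcha-gated run comes out negative, because the per-attempt cost is paid on every attempt while the payoff only arrives on the rare success.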


>One particularly egregious misuse of captcha in a service I use presents one after I enter a correct username and password.

I run a problem validation community platform. A couple of days back an individual launched an automated spam/DDoS attack, posting an abusive, demoralising comment on every single thread from many newly created users.

Fortunately, I had systems in place to identify and mitigate it with Cloudflare, so in this case even genuine users would have received a captcha. I found out soon enough from the firewall who the attacker was: he had earlier created an account under his own name and was using the same IP to attack. After I blocked his IP he tried a couple of other IP addresses, including Tor, but stopped after a couple of hours.

I generally don't like reCAPTCHA because it takes cultural background for granted (e.g. 'pie' is not a common food worldwide), has accessibility problems (I'm a disabled person myself), and has no mitigation for captcha-solving farms.

But in nuisance cases like the one I detailed above, captcha is the easiest method available en masse.


Why can't they just allow automated user agents? I should be able to scrape websites if I want to. Why do user agents have to be browsers?


Exactly, or be able to just use a text-mode browser.

Or wget to save a set of pages for later.

I understand protecting commenting with captcha, or contact forms. But captcha on regular read-only access to public web pages in the style of Cloudflare is a bit ridiculous.

One thing contact forms should have is a static indication there's a captcha in use. I've filled all too many forms that just sent my written text to void, because I block some domains.


This doesn't mix well with the ad-based compensation model.

Sadly, there still doesn't seem to be much in the way of micropayment infrastructure.


That's a feature if you ask me. The whole point of scraping websites is to get the data I want while discarding noise like interface chrome and advertising directly to the garbage.

If they'd like me to pay for access, they should return HTTP 402 Payment Required instead of letting me download the page for free. Perhaps they could also rate limit the network connection to prevent denial of service. Why straight up block automated user agents though? That sucks.


> they should return HTTP 402 Payment Required

That's illegal.

They must first return HTTP 418 Know Your Customer Required.


> there still doesn't seem to be much in the way of micropayment infrastructure.

Anti Money Laundering regulation killed it: KYC doesn't scale down to micropayment levels.

If you want to fix the web, you have to roll back the AML/KYC insanity. Until that happens, the web will stay broken, because paying with attention (ads) is magically exempt from the AML/KYC insanity, whereas paying with money or anything money-equivalent (fungible and transferable) is not.


> the AML/KYC insanity

Just finished reading about this and I completely agree. I can't imagine having a company and being literally obligated by law to violate everything I personally believe in about privacy and freedom just to help the government be even more efficient at marginalizing people.


Being scraped isn't free, if it's at a large enough scale.

Plus, it's not just benign read-only scrapers. Have you looked at the spam folder of your email recently? That's what every comment section and user bio and god knows what else would look like if you just blindly allow all automated traffic.


It's exactly how email spam filters evolved.

They used to be completely local, some even DIY, then evolved to signature updates, but eventually the attacks grew so advanced that only online services could be updated quickly and aggressively enough, which is of course how Gmail took over the internet with a near-perfect spam filter (when was the last time you checked a Gmail spam folder?).

The last generation of local spam filters were pretty good though. Anyone remember Eudora and Spamnix?


Local spam filtering still works quite fine. It just needs a lot of data most users probably don't have when starting out.

I just use bogofilter, and it worked almost perfectly from the start, just because I had saved years upon years of SPAM and HAM. Tens of thousands of messages each.

It got slightly worse over years, because I incrementally only train it on new SPAM but not on new HAM, because of laziness.

People probably have HAM archives, but they don't usually save their SPAM, so they can't start using Bayesian spam filters right away with great results.

Personally I find it much better than whatever Google uses. I don't even bother with SMTP level domain/IP blacklists, or reverse IP/domain checks anymore. All mail is just passed right to the mailbox and is then pre-filtered by a bogofilter to SPAM folder that I check once weekly, and barely find any HAM there. I receive about 500k mails a year.
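
For anyone who hasn't seen how this kind of filter works, here is a toy sketch in Python. It is a plain naive Bayes with add-one smoothing, not bogofilter's actual statistics, and `TinyBayes` and `tokens` are names invented for the example.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word-ish tokens; a real filter tokenizes headers and bodies more carefully."""
    return re.findall(r"[a-z0-9$']+", text.lower())

class TinyBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each token once per message (document frequency).
        self.counts[label].update(set(tokens(text)))
        self.docs[label] += 1

    def spam_score(self, text):
        """Log-odds that the text is spam; above 0 leans spam, below 0 leans ham."""
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in set(tokens(text)):
            p_spam = (self.counts["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.docs["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score
```

The point about archives shows up directly: with only a handful of training messages most tokens are unseen and contribute nothing, so the score barely moves until both the SPAM and HAM piles are large.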


And don't spammers just click farm captchas out to Facebook users filling out "what Hogwarts House are you?" quizzes, anyway?


That's a little amusing just to imagine: 'Which Hogwarts house are you? Identify these traffic signals and we'll sort you into the proper house!'


Here's a thought experiment. This one requires some long-term thinking, outside the box and well past recent history and the status quo.

What if the majority of internet usage is non-interactive, from so-called "bots", what we may refer to as "automated use". Google and Facebook, among others, rely on the use of automation and "bots". The non-interactive clients ("bots") being used by these companies are not asked to solve captchas. (In turn, after collecting data from public sources, these websites attempt to prohibit the use of automation by their users wishing to access it. What is interesting is that neither company provides any definition of "automated" nor any clearly stated limits on the speed at which a user may access resources or the quantity of resources they may access in a stated time period. One might be apt to find such limits associated with an "API".)

In 2013 an Incapsula report suggested that the majority of internet usage is in fact automated and not "malicious"^1 -- what if public information sources on the internet catered to the use of automation rather than trying to limit such use, e.g., with speed bumps^2 like "captchas". What if servers treated all clients equally, instead of having data forcibly collected by a few large clients that receive preferential treatment, then siloed and protected from "automation". What effects would this have on "centralisation" and levelling the playing field.

"Do not ask for permission, ask for forgiveness." What does it really mean when applied to the internet. Perhaps it means there is an endemic lack of clarity about "the rules". Prohibiting "automation" is far too vague and in many cases it makes no sense. The growth of computers and the internet is the growth of automation. Both servers and clients may have concerns about resource utilisation. Websites do not ask for permission when they decide to use large amounts of the user's computer resources.

Consider that a Google could not exist without being "given permission" to use automation. Does the GoogleBot have to solve captchas. No automation means no company such as this could exist. How useful would the web be without anyone being able to use automation to create an index. Based on the HN comments about web search I have read over the years, I would guess that for many commenters, it means the usefulness of the web would be dramatically reduced.

Imagine an automation-friendly internet. The truth is, I think (the data shows) we already have one, except we are in denial that "the rules" actually allow it. An early metaphor for internet and web use was "surfing". It may be that those who are constantly fighting against automation are fighting against the waves instead of riding them. Time will tell. It stands to reason, IMO, that every internet user, whether a server or a client, should be expected to use automation.

1. https://www.incapsula.com/blog/bot-traffic-report-2013.html

2. An early metaphor for the internet was a "superhighway". Speed bumps would seem out of place on a superhighway.


Could the captcha be there to keep spam bots from posting? Sometimes it is trivial to get a new or just valid account, so just checking for that wouldn't stop spam bots.


It's easier for me to switch from Google to DDG than to actually complete a captcha. I don't understand why businesses don't understand this.


There's a good reason for what you're identifying as misuse.

If you show a captcha after a failed password, you need to show one after a correct password as well. Otherwise you leak information. You can have other solutions, e.g. in a login flow that splits the username and password entry, it's advantageous to put the captcha between those two steps. But even in those solutions the display of the captcha must be independent of password correctness.


There's a lot of arguments against captchas, but I do not agree with this one. You will always leak whether or not a password is correct based on how your app behaves - a correct password will grant entry to the application. If you only ask for a captcha when a user account exists but fail to ask if they use a made up username, that's an information leak.


The trick is to ask for captcha before validation.


> If you show a captcha after a failed password, you need to show one after a correct password as well. Otherwise you leak information.

Presumably, if the person has entered the right username and password they're going to get access to the service at which point they'll know they entered the right one. What information exactly is leaked here?


The reason you'd want a captcha on a login page is to protect against brute-forcing of some sort. For example credential stuffing or a dictionary attack.

The information the attacker is looking for is the validity of the password. If you want to use a captcha to protect against this, the outcome must be the same whether the password is valid or not. Because if you only show the captcha for failed logins, the attacker can find out that the password was incorrect without solving a captcha, which by symmetry means they can also find out if it's correct without solving one.
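
A minimal sketch of that property in Python, assuming a toy in-memory user store and a stand-in `verify_captcha` function (both hypothetical; a real service would store salted password hashes and call its captcha provider's verification endpoint):

```python
import hmac

# Hypothetical user store for illustration only.
USERS = {"alice": "correct horse"}

def verify_captcha(token):
    """Stand-in for a real captcha verification call (assumption)."""
    return token == "solved"

def login(username, password, captcha_token):
    # The captcha decision is made before, and independently of, the password check,
    # so an unsolved captcha reveals nothing about whether the password was right.
    if not verify_captcha(captcha_token):
        return "captcha_required"
    expected = USERS.get(username, "")
    matches = hmac.compare_digest(expected.encode(), password.encode())
    return "ok" if matches and username in USERS else "invalid_credentials"
```

Without a solved captcha, a right and a wrong password produce the same response, which is the whole point.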


Usually when you Captcha on a failed attempt, you captcha every request from that IP (or other session identifier) for a period of time. Try Google Accounts for instance. They behave this way.

You don't captcha the success path because you don't need it. You captcha the pre-login flow once you have a failed attempt. It's a trip switch that is a prelude to the flow.
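
Roughly, the trip switch looks like this in Python. This is a sketch under assumptions: the 15-minute window, the in-memory dict, and the function names are all made up, and it says nothing about how Google actually implements it.

```python
import time

CAPTCHA_WINDOW_SECONDS = 15 * 60  # hypothetical cool-down period
_tripped_until = {}  # ip -> timestamp until which a captcha is required

def record_failed_login(ip, now=None):
    """Trip the switch: this IP must solve a captcha before further attempts."""
    now = time.time() if now is None else now
    _tripped_until[ip] = now + CAPTCHA_WINDOW_SECONDS

def captcha_needed(ip, now=None):
    """Checked at the start of the login flow, before any credentials are examined."""
    now = time.time() if now is None else now
    return _tripped_until.get(ip, 0) > now
```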


But this entire thread is about a case where the captcha happens after password entry!

The point is that it is an entirely legit design, and kind of is the way you have to go when the username and password are entered together. As long as the captcha is shown regardless of the password validity, both the security properties and the amount of user annoyance due to having to solve unnecessary captchas is the same as if you had had to pass a captcha up front.

The example of Google Accounts is interesting, because they use split username and password entries. So there is indeed a natural point in the flow to show the captcha between the username and the password, which is what they do. But at least up to a year ago they were doing it after the password. So enter username + submit, enter password + submit, and if the login attempt was sufficiently dodgy get shown a captcha regardless of the validity of the password.


Sorry, because of how common the method I described is and how absurd the idea of showing a captcha only to give you a login failed message is, I "corrected" it before responding.


You use rate limiting to stop brute force attacks, not a captcha
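
Something like a sliding-window limiter keyed on username or IP, sketched here in Python. The class name and the limits are illustrative assumptions, not a recommendation for specific values.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_attempts per key within the last window_seconds."""

    def __init__(self, max_attempts, window_seconds):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        recent = self._attempts[key]
        # Drop attempts that have aged out of the window.
        while recent and recent[0] <= now - self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False
        recent.append(now)
        return True
```

Keying on the username slows down guessing against one account; keying on IP slows down one client. Neither helps much against an attacker spreading attempts across many accounts and many addresses.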


That's not what's happening:

> An incorrect login says so without presenting a captcha.


Thanks, I misread. Then that indeed makes no sense!


I think it's great. So many sites sit behind Cloudflare now and Cloudflare now uses hCaptcha, which is a big win. And the hCaptchas themselves are easy to complete. No more wondering if you actually clicked on 'all' the traffic lights anymore, yay!

I inspected the source code of Google's reCaptcha offering and was disgusted at how many bits of information they were collecting. They also seem to be fingerprinting users so they can't keep registering new accounts on a platform, locking out anonymous users who are usually the best types of users on the platform, as IMHO anonymous voices are (usually) the best voices, or at least the more interesting of voices.

Google's reCaptcha code seemed to be very keen on knowing my 'cadence' or the way I used my mouse and how quickly (or how slowly) I completed the captcha. It also looked at things like timezone, screen resolution, battery charge level etc., so they could determine if it was 'you' who was using the captcha, soon after, in a separate session (even on a different device!)


> Google's reCaptcha code seemed to be very keen on knowing my 'cadence' or the way I used my mouse and how quickly (or how slowly) I completed the captcha. It also looked at things like timezone, screen resolution, battery charge level etc., so they could determine if it was 'you' who was using the captcha, soon after, in a separate session (even on a different device!)

I'd bet a good amount that they store that along with all the other personally identifying info they have on you (and Google of course has a massive amount of that), which is basically why after a single reCAPTCHA solve, you won't see them prompt you again for ages - they know who you are.


Just turn on "Resist Fingerprinting" in Firefox and you'll find ReCAPTCHA _really_ annoying! I have to solve 3-5 "panes" of a ReCAPTCHA on _every_ page... It's very annoying that preserving privacy comes with this cost.

I almost want to just add a "DeathByCaptcha" extension to handle these for me and pay a few cents for every page I visit, lol


It's not a cost. Google doesn't want you to protect your privacy from them. It's a punishment.


> It's very annoying that preserving privacy comes with this cost.

It doesn't necessarily have to if Google supported privacy pass like hcaptcha does. The problem is that they don't.


Why would they? Supporting privacy-preserving options is not within their business interests.


It's way cheaper than that, you'll pay significantly less than a cent for each captcha.


> which is basically why after a single reCAPTCHA solve, you won't see them prompt you again for ages - they know who you are.

If only. If the same site has reCaptcha across more than one page, within mere minutes of having to slog through multiple screens of one, I can guarantee I'll be doing it again.

And I'm never sure if Google has served me either a very long sequence of reCaptchas, or whether they've decided I'm not a person and are serving me an infinite reCaptcha.


Being on a VPN, having blockers on, or not being logged into Google are a few things that will increase the captchas you’ll see.


Add "running Firefox" to that list.


And that's how you know that your VPN and adblockers work :)


Also using a screen-reader and using anything other than Chrome.


I'm guessing that the comment you're responding to uses Chrome/ium, and that you don't.


Just looked on Takeout and there doesn't seem to be any reCAPTCHA data there. I wonder what a GDPR request would produce.


Someone on here has tried to get all their data from both Facebook and Google. I wish I could find the blog post. The tech companies are claiming their Takeout/equivalent is sufficient under the GDPR and anything extra we ask for is not being provided due to it being "non user understandable" or in a "machine format". IIRC.


That's still their data though, no?


That is an investigative blog post I would like to read too.


reCAPTCHA only needs to make a determination that the user is some human, not that they are any particular human. And reCAPTCHA is usable without being logged into Google’s identity system. The profiles it builds are clearly not associated with Google’s primary identity database, and it’s trivial, if you don’t need to preserve identity, to one-way hash every piece of data that GDPR considers user-identifying at the entry point to the system and store only the hash. The EU isn’t shy about handing out billion-dollar fines to Google, so while Google can match a user to a stored profile there’s no reason to suspect that Google has a way to reverse that mapping.
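
To illustrate what hashing at the entry point could look like, here is a sketch in Python using a keyed hash. The key, the function name `pseudonymize`, and the choice of HMAC-SHA256 are my assumptions for the example; nothing here is a claim about what reCAPTCHA actually does.

```python
import hashlib
import hmac

# Hypothetical server-side secret; without a key, low-entropy identifiers
# such as IPv4 addresses could simply be enumerated and hashed by anyone.
SECRET_KEY = b"example-key-not-for-production"

def pseudonymize(identifier: str) -> str:
    """Map an identifier to a stable token that cannot be read back directly."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
```

The same input always yields the same token, so a returning profile can still be matched, but the stored value no longer contains the identifier itself. Whoever holds the key can still test guesses against it, so this is pseudonymization rather than anonymization.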


ReCaptcha seems to use your logged-in status (and, I’d guess, account reputation), along with an ip reputation score when deciding whether to serve a captcha or not.

A GDPR request naming an IP address should allow them to provide those scores.

If not, it’s easily demonstrable that they are storing and using information that they’re not including in a GDPR response, and they deserve their multi-billion dollar fine.

Also, ReCaptcha’s behavior is obviously anticompetitive: it uses Google’s dominant position in some markets to establish dominance in unrelated markets.

This is anti-trust lawyer candy.


Logged in status and account reputation are obviously useful input, but reCAPTCHA objectively works without requiring login. IP reputation also obviously makes sense to take into account, but the fact that Google collects all of this fingerprinting information demonstrates that’s again only a part of what factors in. The same profile can show up from behind a different IP.

Not sure what you’re getting at about GDPR and IP reputation. GDPR says that an IP address is PII if it can be associated with an individual, but that doesn’t mean an IP address is a “subject” for the purpose of filing an Article 15 Data Subject Access Request. And it doesn’t mean that stored information that is keyed by an IP address is personal data, even if the IP address can be associated back to a particular individual.

I also find it strange that you’re talking about reCAPTCHA being “anti-trust candy” in the comments of an announcement about how a different captcha service now handles 15% of the entire internet.


>GDPR says that an IP address is PII if it can be associated with an individual, but that. . . doesn’t mean that stored information that is keyed by an IP address is personal data, even if the IP address can be associated back to a particular individual.

Okay, I’m usually pretty good at understanding law, but this doesn’t make sense to me. This seems like a very shallow, arbitrary definition of subject data. So GDPR didn’t grant ownership of one’s PII? What did it grant ownership of? I’m very confused by this.


A subject (a person) can make a request for subject data (data about the person) that a company has stored. The data has to be about the person and associatable with the person to be covered by GDPR. If I keep track of say, average CPU time for handling requests from a particular IP address, the CPU metric doesn’t become the personal data of the individual that is associated with the IP address.


That's no secret, ... that's exactly how it is supposed to work.


I've had hCaptcha recheck me repeatedly and fail to work at all on VPN. I much prefer ebay's simple slide the puzzle piece method.


The slide is just as invasive as reCAPTCHA, if not more so, in terms of privacy, cross-site tracking and fingerprinting.


That's annoying. Have you tried using their accessibility feature?

https://www.hcaptcha.com/accessibility


Anti-feature as you need to sign up for this.


Yes.

hCaptcha is not easy as is being claimed here. I have lost a lot of time and been blocked from much content due to hCaptcha.


I love this extension: https://github.com/dessant/buster

It won’t solve the privacy issues but at least you’re not working on google’s training set anymore and captchas are automatically solved for you.


reCaptcha has also gotten increasingly annoying lately.

I forgot my password to one site and tried about 2 or 3 different passwords and in-between each it asked me to do about 7 or 8 of those labelling exercises. I finally just gave up and left the site.

Not only that, but the labelling exercises weren't clear. It wanted me to label a "公交車" which means more like a public city bus and there were also school buses which would normally not be called that in Chinese so I didn't label them but Google thought they were part of that class, and wouldn't let me proceed without me labelling them, and furthermore, punished me with more "hard" exercises like that. I guess they are trying to turn me into a stupid bot.


Recently Google's captcha asked me to mark all the traffic meters on the photos, and amongst the choices was a photo of a mailbox. It didn't let me through until I marked it as a meter as well.

Good luck to whatever self driving car they are training using this data.


Maybe it would be kind of fun if we, the users, could form a coalition to deliberately mislabel photos on captchas on a mass scale.

It just seems there's no way to make it happen beyond the hacker community.


4chan users already used to do this for the old text based ReCaptcha. The idea was everyone put in a particular racial slur (they're not very imaginative) for the second word. I doubt it had any impact.


If I remember correctly, it did have some impact.

I believe at one point you could use that racial slur in place of the second word in the captcha and it would accept it.


I do this. I just click random boxes and it usually lets me through. It’s more clicks, but not necessarily more time.


I would venture that the repeated attempts from multiple users to not mark the mailbox will help with that.


What's a traffic meter? Maybe it's just important that the car doesn't drive into it.


I think he meant a parking meter.


yes, thank you!

:facepalm:


Well that might be my handiwork! About half the time I have to do a reCaptcha I mislabel one of the photos. Because I’m not being paid, I’m being held hostage to an automated system and I will rebel.


That literally happened to me 5min ago!


I’m not particularly fond of reCaptcha either but I disagree that it’s an obviously good thing for someone to be able to repeatedly make new accounts with no restrictions. Abusive users use this to bypass account bans.


As a nojs browser, I only think it's great in the sense that I don't have to waste my time with sites which don't care about accessibility.


yes we need captcha that supports browsers like links. NOJS browser ought to make a comeback. display image, text and video. For many sites that's ALL we need.

For sure enable javascript to get the fancy stuff, but mostly we just want to read the text, view the picture and see the video.


Is there a way to delete collected reCaptcha data from google accounts? I looked around and I don't see a specific setting for that.


Lots of websites sitting behind a single company, which has now switched to using its own captcha, is a good thing?


Creating a monoculture makes it easier to implement systems that automatically bypass the captchas, so it’s good for end users, especially people that are visually impaired, or otherwise unable to solve captchas.


It's not ideal but it is an improvement.


And you are also contributing, without pay, to G$$gle AI and their plans of world domination.


Using hCaptcha is also contributing to some ML model for different companies serving unknown (to us) purposes.


well the slight improvement in usability is a win, but is it really any more anonymous than recaptcha?


Unlike Google, which is making hundreds of billions with ads, we have zero reasons to track users - customers pay us to stop bots, and that's the product we provide.


one significant issue with recaptcha is that it punishes users who are not easily identifiable. the puzzle isn't enough for recaptcha - it needs to know who you are. is hcaptcha better in this regard? for example does it work behind tor? does it work with third party cookies disabled? is the challenge im ostensibly solving really the only thing required to verify my humanity? if so, i'm impressed! but if not, maybe we should just do away with the puzzles...


Worth noting that this title is primarily due to Cloudflare having switched to them from ReCAPTCHA, and Cloudflare is... well, relatively popular, to say the least.

I'm curious what kind of data may exist on the experience of switching for larger providers; do the users like it? how much more/less time do they spend solving? do they care, let alone even notice that it's not Google's ReCAPTCHA?

Regardless, as ReCAPTCHA is not only terribly annoying but also built for surveillance from the ground up, I still view this as a good improvement.


Disclaimer: I've been an engineer at hCaptcha for a few years now building out the service. I'm just as interested as you are in hearing customer and user success/pain stories!

> Worth noting that this title is primarily due to Cloudflare having switched to them from ReCAPTCHA, and Cloudflare is... well, relatively popular, to say the least.

That's definitely a part of it, but we also have a number of other large sites and services that use hCaptcha to protect against bots, and more that get added every day because of our more advanced bot detection special sauce.

> I'm curious what kind of data may exist on the experience of switching for larger providers; do the users like it? how much more/less time do they spend solving? do they care, let alone even notice that it's not Google's ReCAPTCHA?

From what we've seen, the integration process is generally smooth, especially if you're a previous reCAPTCHA user, since we keep the interface and workflow largely the same.

Solving is roughly the same although we have a number of other protections that irritate bot maintainers and get activated when we detect them.

Not sure if the majority of people are aware of the change, I'm sure some technically savvy people pick up on it more than not.

> Regardless, as ReCAPTCHA is not only terribly annoying but also built for surveillance from the ground up, I still view this as a good improvement.

That's actually one of the top reasons we've had a lot of customers come over to us; we put a heavy emphasis on user privacy / security, including adopting/supporting privacy-preserving protocols (PrivacyPass, Tor), and minimal retention of data (see our data privacy policy on our site).


Your CAPTCHA accessibility leaves much to be desired. You require screen reader users to register an account to create a magic cookie that itself requires Safari users to disable security protections in their browser in order to use -- and then it doesn't actually work.

Please do better. You're blocking off a non-trivial amount of the Internet to blind users. You will eventually be sued for this.


We actually spend quite a lot of time on this, and regularly work with blind users to test and improve these flows.

Most vision-impaired users have no issue in our testing, and it is a much more accessible option than audio challenges, which discriminate against those with auditory processing impairments.

(disclosure: work there.)


Your cookie approach requires:

> If you are using the very latest version of Safari on either the recently released OS X 10.15 or iOS 13.4, Apple has just changed the behavior of Safari related to third-party cookies, blocking all of them by default. We are implementing a solution, but in the meantime please visit Safari Preferences, Privacy section, and uncheck "Website tracking: Prevent cross-site tracking" to enable the accessibility cookie to function as expected. [0]

[0]: https://www.hcaptcha.com/accessibility

So while you're patting yourself on the back for not "being like Google", your accessibility workaround exposes blind users to third party trackers like Google.


Using any kind of privacy/adblock extension that supports domain-level whitelisting (e.g. uBlock Origin) works fine, and this is what we suggest in the accessibility FAQ. Apple didn't build fine-grained controls into their browser before making this recent change, unfortunately.

That said, we're working with the browser makers on native support for our next gen privacy-preserving approach to this via Privacy Pass.


> uncheck "Website tracking: Prevent cross-site tracking"

Holy moley! Yeah, that's a deal-breaker. I agree that this is entirely unacceptable.


I usually just bounce when I see a captcha (if I get one, I usually get a string of them, so I don’t bother).

However, I checked secondary markets where you can pay a human to solve a captcha.

It takes a professional captcha solver 70 seconds to solve an hCaptcha but only 15-20 seconds to solve a reCaptcha. Is that typical? That seems horrible.

The market rate for a captcha solution is 1-3 cents, which is clearly worth it, until you think of the ethics of paying someone slave wages so you can browse the internet slowly, but at least without breaking concentration.

Have you considered a more ethical approach, like micropayments that go to charity or something?


Have you tried using privacy pass? Having to spend 70 seconds solving one hcaptcha every couple of days might be a good middle ground.


Love the response, happy to see that it's going well then! After reading a lot of feedback I got from 'You (probably) don’t need ReCAPTCHA' (https://nearcyan.com/you-probably-dont-need-recaptcha/), it started to seem pretty obvious to me that there was an open market space for some better competitors, so I'm glad hCaptcha got around to being adopted with such success sooner rather than later. Hopefully the challenges of the future go just as smoothly as things are going in the present.


>do the users like it?

This is completely anecdotal (and seems antithetical to the typical HN response to hCaptcha vs ReCAPTCHA), but I feel like I end up spending at least twice as much time trying to solve hCaptchas successfully because they have a lot less consistency in the objects you're searching for. I always have to zoom in to the modal and carefully search through each image, which invariably breaks whatever flow I'm in (moreso than other captchas).

For example, here's a screenshot from the hCaptcha website's "try it out" section [1] -- I barely recognized either boat in image #1 because it was so small. I missed image #3 because I didn't realize it was a huge cruise-esque boat (so big you can't even see any water) and I spent a good amount of time deliberating on #4 because, well, it looks like a car + windshield but... on the water? If it's a boat, I can't really tell, but I marked it as one solely because of the water in the background. Not sure if it was right or not.

It also seems to occasionally provide "find all the X" challenges without there actually being any X, which feels super cognitively weird ("am I just not seeing it?!").

I'd say ReCAPTCHA's main problem is deciding whether mostly-consistent objects being partially in-frame is enough to "count", whereas hCaptcha's main problem is actually recognizing the widely-varying objects in the frame. I think the former is a little more frustrating when you get something wrong, but the latter is mentally "harder" and takes more time on average, for me at least.

[1] https://i.imgur.com/uyqvs5u.png from https://www.hcaptcha.com/


Honest question: How do you view it as an improvement? The same data is being shared, and the only difference is that Cloudflare isn't immediately behaving in the same evil ways as Google. But once you concentrate power in an entity, perhaps bad things might happen?

... If there was an on-premise captcha implementation that actually worked, that would be great.


Unlike Google, hCaptcha isn't running an ad network "on the side" of their bot management business :) joking aside, hCaptcha is an extremely privacy-conscious operation, Google is not.


I'm not a lawyer, but can you explain how their privacy policy is privacy-conscious now and going forward, and how centralization of network transit with Cloudflare isn't a bad thing?

https://www.hcaptcha.com/privacy


hCaptcha is more focused on technical solutions to privacy that minimize required trust. A privacy policy is one thing, but a mathematical guarantee is quite another.

We are working through the IETF and directly with browser makers to support provably private options like Privacy Pass, and are currently the only CAPTCHA service to support this.

Similarly, on the enterprise side we offer various technical options to let our enterprise customers guarantee exactly what data we can and cannot see.

(disclaimer: work there, comments not official, etc.)


How does support for Privacy Pass interact with services that pay humans pennies per thousands of captcha solves? Wouldn't it be easy to buy a ton of these blinded tokens then have an extension that provides them on demand to the captcha service?


hCaptcha works on Tor, sometimes.


For site operators, they don’t like the change since users are more likely to complain to the website than directly to CF. The following community post has 20k views and >100 replies asking Cloudflare to move back to recaptcha in some form.

https://community.cloudflare.com/t/stop-using-hcaptcha/15896...


To be fair it doesn't seem to be _that_ bad on this thread: There's the very vocal OP as well as a "discussion" between various users that ranges from "please switch back to ReCaptcha" to "please keep hCaptcha".

For a change that affects "15% of the internet" this seems like very little negative feedback in a period of 8 months.


> hCaptcha is making cloudflare money by earning them Human Tokens on the Ethereum blockchain

> Most people do the convenience from Google CAPTCHA, although they sell some kind of info, but they won’t hurt you

I can't even...this is the Cloudflare forum wow.

I've personally had a few hiccups with hCaptcha quite some time back, as I "wasn't sure what I was looking for" and consistently failed on VPNs. But in recent months there's definitely been substantial improvement, and needless to say I hope to see hCaptcha become the majority provider.


Maybe the user should have the option to choose which CAPTCHA to solve.


Worldwide? And since when? I've never hit one of these, I get reCaptcha'd to ~death~ anger all the time.

Although having said that, maybe I am hitting it and that I've been unaware and uninterrogated is high praise! Hm.


> Do the users like it?

Absolutely. Having to solve only one captcha every few days beats solving 5 or 6 on each page visit. hCaptcha supports Privacy Pass but reCAPTCHA doesn't.


OP here, and full disclosure I work with the hCaptcha team. Yep, Cloudflare is a big part of this, but you'll find our enterprise offering (BotStop.com) running on many many other large sites and apps. If you've used the internet in 2020, you almost certainly interacted with our products :)


I'm really starting to hate all the captchas with a burning passion. Partly because the corporation I work for seems to have gotten our NAT addresses onto a blacklist so I get captcha'd constantly, and partly because my close up vision is getting noticeably weaker (pushing 50, that's why) and without hunting down my reading glasses it can be difficult to make out the smaller details necessary to solve the puzzle. Especially when I'm on my phone.

I really wish we could find something relatively foolproof that didn't rely heavily on tracking or really good vision.


Similar deal where I am at present in India: the small ISP uses carrier-grade NAT, so there’s malware and related activity occurring every day from at least one of the who-knows-how-many people behind this one IP address. Last time I was here in 2016 it was actually a lot worse than it is now (then, any Cloudflare site would trigger it, so I’d be hitting dozens of challenges per day), but I still get the occasional hCaptcha here (e.g. the Audacity wiki), and they’re awful. I normally take two or three attempts (quite apart from the regular times when you finish the challenge and press submit, and it just does nothing), guessing things like whether they want to count this particular dark smudge as a motorcycle or not, or whether this fragment of a motorcycle should count or not.

I wish people would just face up to the reality that challenge-based CAPTCHA techniques have failed, and stop using them.


Try using privacy pass. It's designed for use cases like yours.


That still requires you to fill out a CAPTCHA at least once, and also requires that you install a browser extension, which I baulk at doing, especially when it needs the “do anything on any website” permission.


Not available for Safari though. Or any mobile browser?


We've moved to hCaptcha from reCAPTCHA after Google surprised us with their pricing (blog[1], hn discussion[2]), and couldn't be happier. We use it in invisible mode and it does a great job at finding bots while getting out of users' way.

Also top-notch customer support. The CEO was personally in the slack channel helping us. Highly recommended.

[1]: https://blog.repl.it/anon

[2]: https://news.ycombinator.com/item?id=25004476


> We use it in invisible mode and it does a great job at finding bots while getting out of users' way.

Interesting, I didn't realize this was a thing hCaptcha did[0]. It's basically reCAPTCHA in terms of tracking which sites you visit then, no?

0: https://docs.hcaptcha.com/invisible


Compared to reCAPTCHA v3, our approach does not depend on tracking your visit history. (disclosure: work there.)


Would you pass this feedback on? Having completely opaque pricing is a big red flag. I assume it is very expensive, or else you couldn't afford to route inquiries through a sales rep. If you aren't hiding a high price, you should publish it. Also, and this is a strong personal preference, I never ever want to talk to or hear from a sales rep.


Couldn't agree more. When comparing my options, I usually discard the ones that hide their prices if at least one has published theirs. I don't even bother contacting them. I don't have the time.


I think parent means hCaptcha enterprise in passive mode, where hCaptcha is detecting bots in the background using ML: botstop.com


Surely this ML must be presented a wide set of data on the user and their browser to make this determination? So just like recaptcha, they determine if they should admit you based on passively snooped data rather than active challenges.


My knowledge of this is that hCaptcha uses anonymized user data to ensure that the user doesn't look like a bot. What ReCAPTCHA did differently is not only not anonymizing data, but specifically trying to find out which person was presented with their captcha by matching cookies and profiles to Google accounts (which the majority of users would have and be logged into for many reasons). When you combine this with Google owning everything from Gmail to Youtube to Android to Chrome, it gets extremely pervasive.


hCaptcha's ML is 90% "Is the user agent the newest version of Chrome? -> Not a bot, Otherwise -> Bot"


My mom and dad's shared IP (somewhere in Europe) repeatedly gets on Cloudflare's IP ban list, meaning my mom keeps having to solve these hCaptchas. hCaptcha is a lot more difficult to complete than Google's reCAPTCHA and she has a lot of trouble with it.

I think they get on these IP lists because it's a general consumer ISP, and probably a lot of people on it are infected by botnets.


I've learned that folks end up unintentionally installing software which acts as essentially "proxy server as a service". I've heard of browser extensions doing this, but I would be unsurprised if mobile apps did it. Every holiday I do a sweep of my parents' devices to make sure they haven't installed anything silly (my mom somehow always has three or four different weather apps). I'd suggest giving it a look the next time you can.


Or, it could be that their computers or home network is infested with malware or bots.

The majority of people complaining about captchas need to look at their own systems first. Of course any detection system has false positives, but the false positive rate is not a double-digit percentage in the vast majority of cases.


Install Privacy Pass on their computers. It won't eliminate the captchas completely but will decrease the number of times they'll see them. I believe Cloudflare gives you 30 passes for each captcha solved.


Which service they use has a big effect on whether they get captcha'd. Try switching services.


As someone who scrapes, captchas are pretty silly. One of the sites we scrape implemented hCaptcha, and it was a breeze to get around. There are a few things that make my life more difficult, but captchas aren't one of them, and nothing can stop scraping altogether.


Meh, there's always going to be a longtail of targeted abuse so it's not much to boast over. Xrumer software in 2001 could even let you sit at your computer and fill out those common PHP-lib captchas (like on EZBoard) while Xrumer spammed internet forums and blogs. You could even hire a cubicle farm of humans to manually abuse a web service.

Captchas filter out the 90% bulk of automated abuse.

Btw, web scraping is on the nearly harmless side of abuse.


That makes sense that there's longtail abuse, thanks. What would you say is more harmful abuse? Spamming endpoints for SQL and other injection attacks?


How'd you do it?


I'm not OP but there are cheap solving services that will fill them in for you. The cost is trivial and they have decent APIs for automatic integration into scrapers.


many captchas let you fill them in once and then use the resulting cookie to do whatever else you like on the site. Just manually copy the cookie to your bot hosts.


That's great to see! At Plausible Analytics, we had a wave of spam attacks two months ago or so and hCaptcha saved us. Great product and great service both for companies and for users. We're very happy with how it works. And great to have a quality de-Googled alternative for this use case!


I don't understand why anyone likes hCaptcha. With reCaptcha, I rarely got more than the checkbox. Now I get a series of puzzles every time I want to look at a web page. When that happens, I'm just closing out, and going to a better website.


Because many of us get the same puzzles over and over with reCaptcha, and often get sent into the infinite puzzles zone despite being both a human and answering correctly. The assumptions built in to reCaptcha just aren't good. There are people that don't accept cookies for anything, but who aren't bots. There are real humans using text-only browsers. There are people whose mouse and keyboard events fall far outside the expected range of normal human users, or who use aids such as macro keys that play back canned responses and confuse Google's methods. So I can definitely imagine anyone who's dealt with that shit for years being happy to have something different to try. Even if it's just as annoying, at least it's not the same damn thing!


I would say it's not just as annoying, it's worse. It has all the same annoyances, except now I get the worst outcome every time.


My favourite failure mode was when I was using a roller as a mouse and it constantly failed. It took me a while to figure out that it was probably failing because my mouse would move in a straight line.

Naturally if I was logged into my google account I wouldn't have much of an issue, because I would be feeding the surveillance machine.


We need more auto-solver services. Pay a subscription and let developing country workers solve captchas for us. Extra expense on top of internet connectivity plan.


I personally found hCaptcha harder to pass than reCaptcha, to the point where I will leave a site that demands one if that's a realistic option (e.g. not really an option if it's my bank, totally an option if it's one of many stores selling an item).

It's possible that I just got unlucky (one of my recent experiences was a site that didn't let me in even after solving it, which really soured me), but I feel like the main reason it's hated less is because people haven't seen it as much yet.

Edit: TIL how offputting a single bad experience can be. From going through my HN history, I found out that this terrible experience was 6 months ago.


I largely agree. Cloudflare requires hCaptcha if you mistype your password a single time when accessing their dashboard. The UX of hCaptcha is not good, especially in a flow where I'm already fixed on a goal (doing something in my dashboard). If I need to stop thinking about DNS settings or caching to pick out photos with bicycles, that's a really expensive context switch for my brain. And in my experience, you need to do two "pages" to complete the captcha.

By comparison, I've been asked to complete recaptcha exactly zero times day to day (perhaps I'm lucky?). The last time must have been many months ago.

It's hard to speak to the specific UX qualities of the captcha itself, but I find recaptcha generally less difficult. But a captcha that I don't need to complete always wins out over one that I do.


You're probably leaking lots of data to Google. Browse the Web with strong privacy protection (temporary container addon in firefox, ublock, privacy badger, decentralize...) and you'll see that recaptcha UX is just worse (slowly loading new pieces to identify after you solve them, one after the other). I can't count the number of stairs, hills, fire hydrants, cars, trucks, bicycles, traffic lights, pedestrian crossings... I had to point out just this month.


I've completed both of them, and I am firmly of the opinion that hcaptcha is harder to complete than recaptcha. That is when I'm being prompted to complete it.

I'm comfortable with the amount of data I'm exposing online. And where I'm at, hcaptcha is not better. And even if recaptcha prompted at the same rate, I'd still prefer recaptcha over hcaptcha.


Honestly, captchas seem pointless to me.

Literally every time I'm in a situation where I'm required to use a captcha to access a site it is impossible to successfully solve the captcha in any sane amount of time.

This happens both with google and cloudflare.

Tbh. if they don't trust my connection can't they just tell me so instead of pretending to provide a "I'm not a robot" test which is practically (close to) unsolvable???

(Note that this post only refers to captchas guarding access to a site when they somehow don't trust your connection, not "I'm not a robot" captchas on forms or similar.)


From the privacy policy

We collect the following categories of information:

Information that can be used to identify or contact an individual ("Personal Information"), such as name, email address, and country.... We may also verify the identity of our Integrators and Customers by comparing personal information against third party databases or official legal documents.

Information collected automatically as a result of an Integrator’s or Customer’s use of the our Sites or the Services ("Analytics Information"), such as IP addresses, browser type, Internet service provider, platform type, device type, operating system, date and time stamp of access, and other similar information. Some Analytics Information is collected on our behalf by third parties we engage for that purpose, and some Analytics Information is collected through a variety of tracking technologies, including cookies

In the preceding 12 months, we have shared the following categories of information with third parties for a business purpose:

Identifiers. A real name, unique personal identifier, online identifier, Internet Protocol address, email address, account name, or other similar identifiers. Shared with Service Providers

Personal information categories listed in the California Customer Records statute (Cal. Civ. Code § 1798.80(e)). A name, credit card number, debit card number, or any other financial information. Shared with Service Providers

Commercial information. Records of products or services purchased, obtained, or considered. Shared with Service Providers

Internet or other electronic network activity. Browsing history, information on a consumer's interaction with an internet website, application, or advertisement. Shared with Service Providers

Note: Fraud risk associated with an individual IP address may be shared with an Integrator upon request.

https://www.hcaptcha.com/privacy


Wait, that's the hCaptcha policy?

Can you link to it on their website?


Updated it at the end. By the way, I am not in any way related to hCaptcha. Just sharing it because they don't seem to be the saints that everyone here believes them to be.


Please read the second line at the very top of that page:

"Not Applicable to Third Party Websites. Please note that this Privacy Policy does not apply to any website, offering, product or service of any third party, even if it links to our Site or incorporates the Service – please refer to the applicable privacy policies before deciding to provide any information to third parties."

End user data is governed by the Data Processing Agreement, linked here: https://www.hcaptcha.com/terms


I used to think that reCAPTCHA was bad, but then I had to solve numerous CAPTCHAs from hCaptcha. Now I think that they're all bad.


Accessibility - I am surprised nobody mentioned it.

We looked at hCaptcha, and the feedback we got was that their approach to accessibility is simply unacceptable.

If you can't solve the challenges, you have to sign up on their website, in advance, and provide them with your email address.

https://www.hcaptcha.com/accessibility

We just couldn't justify that sort of privacy imposition.

Personally, I think all Captcha needs to go.


I really hate all these captcha codes

Why can’t they do something like a reverse SSL where we have to authenticate ourselves as humans?

For example if I have an Apple account on my Apple devices, why can’t they figure out a way to authenticate me as a human from that information?

This doesn’t work for all scenarios (eg throwaway accounts), but it could work for the majority?


Forget adding more draconian identity requirements. 95% of CAPTCHA use is simply unnecessary and could be straightforwardly removed or replaced with rate limiting login attempts per IP. Never mind sites that use it to prevent scraping. If serving static pages is that much of a burden that you want to discourage automated means of retrieving information that you're trying to publish, then work on your website performance instead of adding more user-hostile roadblocks.
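
A minimal sketch of what rate limiting login attempts per IP could look like, assuming a single process with an in-memory store. The class name, the limit and the window length are illustrative assumptions, not taken from any particular site:

```python
import time
from collections import defaultdict, deque


class LoginRateLimiter:
    """Sliding-window limit on login attempts per client IP (illustrative only)."""

    def __init__(self, max_attempts=5, window_seconds=60.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._attempts = defaultdict(deque)  # ip -> timestamps of allowed attempts

    def allow(self, ip):
        """Return True and record the attempt if `ip` is under the limit."""
        now = self.clock()
        recent = self._attempts[ip]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False
        recent.append(now)
        return True
```

A login handler would call `allow(client_ip)` before checking credentials and reject the request when it returns False. This only limits attempts from one address; it does nothing against attempts spread across many addresses.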


Exactly. The vast majority of captcha usage is completely unnecessary. Just remove it. As you say I've even seen captchas for static content, which is pure nonsense.

I run several long-lived (decades) sites with mostly static content but also some dynamic pages and a commenting mechanism. Here's what I do to prevent abuse: nothing. It's fine.

If you run a massively popular site or something politically controversial then you'll be targeted for abuse. If you're specifically targeted, I don't know how much captchas will help.

For the rest of the 99.9% of sites, just stop it. You don't need it.


> could be straightforwardly removed or replaced with rate limiting login attempts per IP

This is very outdated intuition. Fresh IP addresses cost peanuts.

For example, your solution still allows an attacker to run a 50k item /login combolist against one of your users with $5 of botnet time, each IP address trying a single uname/pass combo.

Here you pay $18/GB to multiplex your abuse (cred stuffing being classic non-volumetric abuse example) across 72 million residential IP addresses. https://luminati.io/


50k attempts is tiny, like a 3 character password or a single account run through /usr/share/dict/words.
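
A rough back-of-the-envelope check of the 3-character comparison, assuming passwords drawn uniformly from a fixed alphabet (the alphabet sizes below are illustrative choices):

```python
ATTEMPTS = 50_000


def keyspace(alphabet_size, length):
    """Number of distinct strings of `length` symbols over the alphabet."""
    return alphabet_size ** length


for name, size in [("lowercase", 26), ("lowercase+digits", 36), ("alphanumeric", 62)]:
    total = keyspace(size, 3)
    covered = min(1.0, ATTEMPTS / total)
    print(f"{name}: {total} three-character passwords, {covered:.0%} covered by 50k guesses")
```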

If you're worried about passwords reused from other sites that have leaked, then the real answer is to generate something like a username for each user that they won't be able to share between sites. In fact with the prevalence of password managers, generating users' passwords for them might just be the better approach these days. And just fall back to email auth every time if they don't want to store it.

Duct taping your broken system by throwing up an annoyance for every user who doesn't want to be tracked is not the way.


This scenario is not realistic, as you can just lengthen time between subsequent login attempts per username.
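
A sketch of one way to lengthen the wait between attempts per username, assuming exponential backoff and an in-memory store. The class name, method names and delay values are hypothetical:

```python
import time


class UsernameBackoff:
    """Exponentially growing delay between failed logins for one username."""

    def __init__(self, base_delay=1.0, max_delay=3600.0, clock=time.monotonic):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock  # injectable so the behaviour can be tested
        self._state = {}  # username -> (failure_count, time_of_last_failure)

    def seconds_until_allowed(self, username):
        """Return how long the next attempt for `username` must still wait."""
        if username not in self._state:
            return 0.0
        failures, last_failure = self._state[username]
        # Delay doubles with every consecutive failure, up to max_delay.
        delay = min(self.base_delay * 2 ** (failures - 1), self.max_delay)
        return max(0.0, last_failure + delay - self.clock())

    def record_failure(self, username):
        failures, _ = self._state.get(username, (0, 0.0))
        self._state[username] = (failures + 1, self.clock())

    def record_success(self, username):
        # A successful login clears the penalty.
        self._state.pop(username, None)
```

This only slows repeated guesses against a single username; it has no effect on an attacker who makes one attempt per username.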


Attackers only need one attempt per username.

They will use a leaked list of millions of username and passwords, then use a botnet to try them all on another website.


I would start with blocking those 72 million addresses for starters :)


Many of these ip addresses would also be shared by servers of legitimate businesses and VPNs.


True, but blocking them stimulates them to stop renting them out to such services.


That’s exactly what I’m thinking too! A lot of it is to prevent scraping, and a lot of it is unnecessary.


Maybe there's a good solution somewhere there but the problem - as far as I can see - with this is that you either have a central party that knows who signed up for what or you have every account tied to a permanent super-identity. Both of which aren't great.


Webauthn may make it easy because it supports platform authenticators like Touch ID / Face ID. Two good demos are at http://webauthn.me/ and https://webauthn.io/

It's now supported in all major browsers but platform authenticators are likely not supported on all OS yet.


This makes a lot of sense - exactly what I was thinking


I'm not sure it will be as secure as captchas though. Browsers will probably allow dummy platform authenticators for ease of development, which can be used by bots as well.


Well, the point is to avoid automated abuse.

Your Apple account on your Apple device doesn't stop you from unwittingly being part of a botnet, for example.


Why wouldn’t it? Unless they hack my account

Genuine question - I don’t know much about this field


So we use 15% of the internet for captchas and I lose 15% of my life identifying crossroads, stoplights, and buses to computers. There's got to be a better way


You should take a Waymo while solving the puzzles to get back the time


you spend 100% of your life on the internet?


Not keen on hCaptcha because I almost always need to solve 2 sets of the puzzle vs 1 with reCAPTCHA. There's a thin line between privacy and convenience, and most of the time I've felt hCaptcha to be on the less convenient side.


Only 2? I usually need 5+.. not that recaptcha was any better.

I sincerely hope they, along with all other companies providing captcha services, go bankrupt.


"hCaptcha now ruins fifteen percent of the internet"

I loathe captchas, especially Google's, which seems to punish my use of privacy extensions (e.g. uBlock Origin, Ghostery Lite).

Captcha is an OK tool when you have valid reasons to assume the user is a bot (multiple failed logins, unusual traffic, password resets etc.). Used as a default it only antagonizes users.


I hate hCaptcha: those pictures are messy, they pop up every time, and there are 2 slides of them... Why not just use the honeypot method?
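
For anyone unfamiliar with the honeypot method, a minimal sketch, assuming the form includes a field that is hidden from humans with CSS so that only scripts fill it in. The field names and the extra timing check are illustrative assumptions, not part of any specific library:

```python
def looks_like_bot(form, honeypot_field="website", min_fill_seconds=2.0):
    """Heuristic honeypot check on a submitted form (a dict of field -> value).

    Assumes the page renders `honeypot_field` hidden from humans via CSS and
    adds an `elapsed_seconds` value measuring how long the form was open.
    """
    # Humans never see the hidden field, so any content suggests a script.
    if str(form.get(honeypot_field, "")).strip():
        return True
    # Assumed extra heuristic: forms submitted implausibly fast are suspicious.
    try:
        elapsed = float(form.get("elapsed_seconds", "0"))
    except (TypeError, ValueError):
        return True
    return elapsed < min_fill_seconds
```

A check like this costs legitimate users nothing, but it only catches generic form-filling scripts; a bot written for one specific site can simply leave the hidden field empty.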


Reading their website:

>Presented Challenges:

>Comparison - Select all images that match query

>Bounding Box - Define bounding area for objects

>Categorization - Identify the corresponding labels

>..and other simple tasks.

No, hCaptcha, no way am I going to train your neural networks for free, so please join Google on your journey to hell.


hCaptcha makes money by having humans label things to teach machines. This suggests that at some point, the machines will be nearly as good as the humans at labeling. At this point, until the humans are tasked with a different training exercise, a bot will be effectively indistinguishable from a human via hCaptcha.

If there's value in it, it sounds like a spammer could train an hCaptcha-defeating bot via hCaptcha.


I think it's likely they'll add new types of exercises as this starts to happen.


The exercises are dependent on what researchers are paying for. The spammer needs to keep up. Which they can do by sampling hCaptchas and figuring out what tasks are being done.

The question is how much value spamming provides. If it's significant, then it pays for the arms race. Given the investment in beating reCaptcha, it seems that spam provides high value.

Edit: And! And! hCaptcha could become victim to a malicious bot that feeds bad data into other researchers' training sets. The bot would need to reply to an overwhelming number of hCaptcha captures, such that it becomes the dominant validation set. It would be a self-fulfilling model. It'd even be performed at the expense of the legitimate researchers, who must pay ethercoin for every hCaptcha response. Much like cryptocurrency networks, hCaptcha is susceptible to a cartel. In this case, it'd be a bot-cartel, feeding bad data. Humans could be locked out, as hCaptcha might be convinced the humans are bots and the bots are human.


Wait they actually use the labeling for something other than admittance? Well that takes care of my concern about this tech then, gotta go back and change some of my other comments...


Google uses reCAPTCHA labels exclusively for themselves, and is extracting hundreds of free person years of labor from internet users every single day via this. hCaptcha lets anyone access this type of service, provided they follow certain ethical AI guidelines


How long has the content of the recaptcha puzzles been entirely unchanged, despite being shown to billions of users? Like five years? It should be painfully obvious that there is no actual labeling going on at this point, they have all the training data they'll ever need for traffic lights...

And by a corollary, since they haven't started labeling different kind of data, it's clear that either they no longer need any kind of labels at all, or this is actually not a cost-effective way of doing it.


For me at least, the images on recaptcha have been getting much much worse, to the point of often being almost indistinguishable. So although they are still typically asking for the same things (cars, buses, stop signs, traffic lights) they do seem to be actually still making progress on the labelling effort.


Anecdotally I don’t think I’ve ever seen the same image twice on recaptcha.


How does the reCAPTCHA button help train AI?


This is how Skynet starts.


I see there's several "captcha solving services" (Google the term, I don't want to link any) that charge in the $1's per 1000 solves. This makes me wonder how effective captchas really are. Do they just raise the bar high enough so only spammers actually making money attack the more lucrative sites?

Can anyone talk about their experience running a (large) service with a spam problem where adding a captcha helped? How about still battling bots/spam/abuse despite having a captcha?

Supposedly these services have humans solving these. I remember hearing about a bypass in use where the attacker would pass-through captchas and present them to users on their own pirate/torrent/porn/etc sites, and then when the user solved it, they'd get at the content and the spammer would do whatever they were doing on the original site. I wonder if that technique is still in use, or if there are people specifically sitting there solving captchas all day being paid fractional pennies per solve?


Captchas are not effective at preventing attacks at all. There are many human and AI powered services as well as secret exploits that attackers can use. Captchas only stop spam, amateur scripts and web-scrapers.

It's a real shame that because of this "safety feature" the web is losing its programmability. You can't even curl many pages these days, let alone write programs that connect to the web. This whole bloated scene of browser emulation had to be spawned, and now instead of serving 1 kB of HTML to a few friendly bots, several megabytes of junk traffic and countless processing cycles are wasted on menial tasks like retrieving sports match results from the internet.

The problem is that these captcha services are _free_, which means people just throw them anywhere. Imagine a world where land mines are free for everyone at the tip of their fingers: you could hardly go outside! Well, you can hardly go online now.


A classic case where captcha is essential is DDoS attacks. By sheer volume, attackers are not able to bypass captcha, as they can't solve captchas at a million QPS. Rate limiting doesn't work under DDoS, as the attack can overwhelm the rate limiting itself, and there are still many proxies, so simple IP-based rate limiting will cause large collateral damage. Captcha is a very effective tool for DDoS protection.

For spam/abuse, captcha is mainly about raising the cost of attack, not about eliminating completely, while still minimizing the collateral damage. Captcha is never meant to be a protection against any targeted narrow scoped attacks anyway.

If the attack is small enough that attackers can pay captcha solving service, it's not big enough to matter.

Captcha is here to stay. It is fundamentally a technical means of dealing with the tragedy of the commons, and thus won't disappear anytime soon.


Captcha is a terrible 90s technology, it should have been completely destroyed in year 2000.

it's really annoying.


I couldn't agree more, and it seems to be on the rise. I used to get once a week maybe, like for a new signup to a forum. Fine, understandable. Now I get them multiple times per day.

It used to be a minor annoyance and I understood the reason for them in those cases. Now they've moved way beyond the 'pain threshold' and it's moved on to a blind hatred of them for me.

A similar thing happened with ads. At first they were OK and I was fine with some ads paying for a site. Then they became more intrusive Flash crap. I started getting annoyed. Then all the tracking came in, and full-page motion and sound ads, and they really started abusing their privileges. Now I hate them so much I will never turn off my adblocker again for any site. They've just overstayed their welcome too much.

The same thing is happening now with captchas. Started as a good cause, but totally took advantage of the users to solve a problem that's just as well solved on the back end in most cases.


Don't worry, the next iteration of CAPTCHAs will rely heavily on browser feature detection, ensuring that you're not using an unauthorized version of Chrome that can be automated. Only authorized and trusted browsers will pass the tests.


Which will mean even more restrictions on the types of browsers we can use, and gathering more information. Not OK either.


This was my point. Google is already doing this[1] for account access, and they will prevent competitors' browsers from logging in.

[1] https://news.ycombinator.com/item?id=25172755


If it should have been destroyed what is the reason you think it hasn't been? People aren't adding captchas to websites for fun, it is clearly solving a problem for them. So what do you propose instead?


It's solving a problem for THEM at OUR (the users') expense. We should fight this more heavily, just like we do ads and tracking, and hopefully this practice will at least stop expanding. The number of sites now showing captchas is crazy.

Or introduce a law where they have to pay us for using up our 'brain time'.


A law that they have to pay you for using their free service?


I see the product is offered in 2 plans, though the only paid plan is enterprise, without publicly disclosed pricing. Does anyone have info on whether it's also affordable for small and bootstrapped businesses, or is it primarily focused on larger enterprises?


Last time I enquired they quoted a starting price of $999/mo for 10m verifications...


Thank you. That's not a small sum, though they probably have some internal tiers per number of requests so for lower traffic scenarios the monthly price might be more affordable.


I asked and they said that’s the minimum, and they had no plans to introduce smaller self-serve tiers.


I was a bit surprised to not find a reference to the single most irritating thing about Google's recaptcha - that it treats me - the one trying to authenticate - as a free source of data labels for its ML systems. I guess I'm unlikely to be the only one irked by another "identify all the bicycles" challenge.

Do the labels I provide belong to me or to Google? I signed no job contract with them to provide that information.

edit: aha! https://www.hcaptcha.com/labeling - that's why it wasn't mentioned. One more labelling service that I won't like.


"hCaptcha has grown into the largest independent cybersecurity service in the world"

I feel like Cloudflare might have something to say about that, given that Cloudflare is an independent cybersecurity service and uses hCaptcha.


I appreciate at least having a good option that doesn't involve putting a Google product on my website. We tried to use our own captcha back in the day, and then used some third party, and they just weren't good enough. I'm glad to see them getting this market share because it means that they will get the opportunity to improve based on a large set of users similar to how Google is able to make their products so good.


Just tried the demo and found it somewhat confusing

"enter your name and your favourite vegetable" is to lure bots into responding?

When I click "I am human", I am asked to select boats and fail because I didn't tick the images with ships.

I would not like to add a captcha that comes across as a smart aleck, tends to trigger the same in respondents or, worse, makes customers feel stupid. reCaptcha somehow seems better at avoiding this.


Never heard of them before, but I went to their main site and got served the Romanian version, which is so bad I think it could be used as comedy. Also loved how their front page image shows a barrier with their logo stopping some robots, but the robots are facing the backside of their logo/stop sign!

Like, hire me for more!

I'm curious how they can train ML models while preserving privacy. Where does the corpus come from?


Is that meant to be a proud boast?

Because, to me, it equates to someone bragging that 15% of the people they've slept with now have syphilis.


Is this mainly thanks to getting Cloudflare as a customer?

Congrats, although I have to say I had fewer problems with recaptcha captchas. I experienced a couple of cases of hcaptcha just not working correctly and being unable to access something despite the captcha success, which never happened with recaptcha (in my experience).


Is hCaptcha less of a pain for users who block tracking (with Brave, Privacy Badger, uBlock Origin, etc) than reCaptcha is? I'm not sure I've ever solved an hCaptcha, but I find reCaptchas to routinely be incredibly time-consuming, and I suspect it's because I block their trackers.


I'm using Brave plus NextDNS, and am on a CG-NAT IPv4 address, so reCaptcha absolutely hates me. I always need to do at least 5 rounds of matching before it'll let me past. My record has been 12 rounds.

hCaptcha seems to pretty consistently only make me do 2 rounds, so I certainly prefer it.


Google reCaptcha likely discriminates against non-Chrome users, hCaptcha does not. Also, we support and co-develop https://www.hcaptcha.com/privacy-pass !


While privacy is definitely my main concern here, it's not just privacy that's the issue here--I believe people should be compensated for the work that they offer society and if Google is using a captcha to create driverless cars then it's obviously antithetical to this premise.

I always try to miss some of the obvious items or make mistakes and I (almost) always get through. There's only one service that uses a Google captcha that I continue to use, so it's not really a huge issue for me anyways, and I have decided to stop using it!

It's not too difficult to host your own captcha, I don't see why this can't be an open-source effort.[1]

[1] https://github.com/dchest/captcha


It’s good to hear that alternatives to any Google tech are gaining market share.

For me, all captchas are a stain on the web - in most cases, shifting (and multiplying) the wasted human hours from the company collecting data (eg the owner of the contact form) to the user (the person completing the contact form).

The company is saved from filtering through contact form responses from bots (spam and injection attempts) but simply shifts the work to the user who they hope to pay for their service, losing countless enquiries from frustrated users.

In my opinion, the only acceptable use for captchas is when you’re making a useful, free, no-login-required service available to the public and even then should only be brought in after bursting reasonable rate limits.
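
A minimal sketch of the "rate limit first, captcha only after the burst" idea, assuming a per-client token bucket. The class, the `decide` helper, the client key and the capacity/refill numbers are all hypothetical and chosen for illustration, not taken from any real service:

```python
import time


class TokenBucket:
    """Per-client token bucket; capacity and refill rate are illustrative."""

    def __init__(self, capacity=5, refill_per_sec=1.0, now=None):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def take(self, now=None):
        """Refill based on elapsed time, then try to spend one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


buckets = {}  # hypothetical in-memory store, keyed by IP, session, etc.


def decide(client_key, now=None):
    """Return "serve" while within the limit, "captcha" once it is exceeded."""
    bucket = buckets.get(client_key)
    if bucket is None:
        bucket = buckets[client_key] = TokenBucket(now=now)
    return "serve" if bucket.take(now) else "captcha"
```

With these numbers a client gets five requests in a burst without seeing anything, and only the sixth within the same second would be challenged. Keying on IP alone has the collateral-damage problem with shared addresses mentioned elsewhere in the thread, so the key is deliberately left abstract.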


> For me, all captchas are a stain on the web - in most cases, shifting (and multiplying) the wasted human hours from the company collecting data (eg the owner of the contact form) to the user (the person completing the contact form).

I run a contact form for a small business. Explaining to them why their tiny website has tens of thousands of spammy requests filled with porn keywords is not easy. Of course webmasters add reCAPTCHA, because they're vastly outnumbered by bots and users.


I did this too. Providing contact forms for lots of websites.

In my opinion, back end filters are the solution to the problem you’re describing. Not making the genuine users jump through hoops.
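
A rough sketch of what such a back end filter could look like, assuming a hidden honeypot field named `website`, a fill-time check and a small keyword list. The field names, weights, keywords and threshold are made up for illustration and would need tuning against real traffic:

```python
import re

# Illustrative values only; a real deployment would tune these.
SPAM_KEYWORDS = {"viagra", "casino", "porn"}
URL_RE = re.compile(r"https?://", re.IGNORECASE)


def spam_score(form, seconds_to_submit):
    """Score a contact form submission; higher means more likely spam."""
    score = 0
    # Honeypot: a field hidden from humans via CSS that bots tend to fill in.
    if form.get("website", "").strip():
        score += 5
    # Humans rarely complete a form in under a few seconds.
    if seconds_to_submit < 3:
        score += 2
    message = form.get("message", "")
    # Several links in one message is a common spam signal.
    if len(URL_RE.findall(message)) > 2:
        score += 2
    # Each distinct blocklisted keyword adds to the score.
    words = set(re.findall(r"[a-z]+", message.lower()))
    score += 2 * len(words & SPAM_KEYWORDS)
    return score


def is_spam(form, seconds_to_submit, threshold=4):
    return spam_score(form, seconds_to_submit) >= threshold
```

Submissions over the threshold could be dropped or held for review, so genuine users never see a challenge. This won't stop a dedicated attacker, but it costs the user nothing.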


Their demo doesn't even work for me. It displays an error message in my native language similar to 'Rate limited or network error. Please try again.'. Reloading doesn't change the situation (maybe related to my ad blocker).

Doesn't make the best impression...


I like that this isn’t Google and isn’t tracking people. I tried this on its own homepage. I didn’t find it a whole lot easier (I was asked to choose photos with boats in them), but it was a little easier than the blurry photos on reCAPTCHA.

What I didn’t like: when I looked for pricing information, I saw that there’s a free tier and there’s a “Contact Sales” tier (for enterprise). There is no intermediate level if you want finer control and just want to know how much that could cost. If anyone from hCaptcha is reading this, I’d strongly recommend adding one or two more tiers or expanding the feature set of the current free tier, at least for some level of granular control.


I vastly prefer hCaptcha to recaptcha because it doesn't treat me any differently just because I'm using Tor. Recently (last year or so), Google has started to explicitly refuse service to me when I use Tor (only after forcing me to solve a CAPTCHA for 3 minutes).

Question for dang -- why does HN continue to use recaptcha? It's impossible to signup via Tor, and Google is a user hostile company.

One thing I will note, is that hcaptcha seems to be more loose in what answers it accepts. Sometimes I click random images and it still lets me pass.


Aren't Apple and Google almost obviating the need for this, since they control their duopoly platforms?

https://developer.apple.com/documentation/sign_in_with_apple...

PS: Anyone have the corresponding Google link?


Is there any place where you can just solve captchas from hCaptcha? I was curious what kind of questions/labels it displays.


https://www.hcaptcha.com/ has a "Try it out" section on the homepage you can repeatedly solve for examples.


How is hCaptcha for page performance? Google Recaptcha ruins your page speed rankings, as it ideally has to load a chunk of JavaScript on every page at page load to continually monitor user behaviour. You could e.g. only load it when someone starts filling out a form, but this kind of integration isn't standard.


You only put hCaptcha / BotStop where you actually need the protection, and it certainly doesn't follow your users around unnecessarily


I cannot stand captchas. I cannot even solve them myself sometimes and I think I am human. Or when they show "1/5", then at the 4th they add + 2 because why not... Stop wasting my time. This is actually done by hCaptcha.


I haven't tried Firefox + AdBlocker in a while but I always had the idea that in a very Mechanical Turk way I had to earn my few cents worth income for the machine before I was allowed to have no ads and live outside of the Google Empire.


I'd really love these captcha services to allow me to download an extension that allows me to bypass their captchas. This extension could verify my identity and/or device, or just keep an eye on my behavior to validate that I'm human.


Are you aware of their accessibility cookie? https://www.hcaptcha.com/accessibility

It's not exactly what you were asking for, but it can make their disease less painful


The sub-header of this blog post is "You can beat Google by putting privacy first", and at this point it's interesting to think of how many businesses have done exactly this for an entire range of Google products with success.


I can’t be the only one offended by the unpaid labor involved in CAPTCHAs (training self-driving AIs or whatever).

I wish Apple would offer a way for sites and services to verify that a client is indeed human via Touch ID/Face ID.


Please die a screaming death. All of you captcha services. I hate you.


reCaptcha is horrible. If you are a business, please stop using it. It feels like working for Google. why the hell should I have to do work for Google when I want to do business with you?


Almost all of the users that Cloudflare shows captchas to are Tor users. Is there any reason why this couldn’t be done without Javascript?


I don't believe this to be true. I've never touched Tor, and am constantly hounded by different Captcha mechanisms. I do aggressively block tracking, though.


tbh no amount of captcha will help a popular platform. If you go black hat, for around $150 in private proxies + poster bot + spinner + (insert captcha service here because I don't want to advertise them) you can pretty much spam anything for a while.

That being said it does make it harder to spam if you don't have a budget to start with.


Curious what people think about hCaptcha?


For me I find hCaptcha harder and more ambiguous than what I have experienced with reCaptcha.

The ambiguity might be because my phone is set to Japanese. I got asked to identify 電車 (Densha, electric train), but there were also images of diesel trains. Is it a translation error, or is it really asking me to identify electric trains? (The correct term for trains in general in Japanese is 列車, ressha.)


Great catch. If you'd like, you can actually submit a better translation via a pr here: https://github.com/hCaptcha/hcaptcha-i18n


I don't think you can? That repo has the translations of the strings used by the "meta"-part of the service, stuff like "accessibility information". I can't see any text used by the actual tasks.


Far, far superior to reCaptcha from a user's perspective using Tor. I can solve a few puzzles with hCaptcha and I know I'll get through and see the content. With reCaptcha I might solve 3 puzzles and then get denied anyway, or I might solve 10 puzzles with no end in sight and give up.


Not sure if I'm seeing patterns in noise, but with recaptcha I felt that timing and mouse movements are just as important as clicking the right images. It feels like solving the puzzle quickly and using minimal mouse movements has a higher chance of being denied compared to waiting some time between clicks and adding some superfluous mouse movements.


It seems to be this way to me too. If I am using Tor, it seems like I need to swirl my mouse around a few times to pass a recaptcha.


But also far superior to reCaptcha from a bot's perspective using Tor. Don't get me wrong: I also hate solving a gazillion captchas only because I'm using a VPN, or getting outright denied because my IP address happens to be a Tor exit node. At the same time you have to acknowledge that captchas don't stop bots, they only slow them down or increase their operating costs. People in third world countries happily solve one image recognition challenge by Google or hCaptcha for far less than a penny. If Google's goal is to drive the costs up for malicious bot operators, then they're definitely doing the right thing. In the end, it won't stop them either, since buying a couple thousand infected computers is probably not that expensive, but it is yet another stepping stone for anyone trying to bypass their captcha.


On the other hand, many Tor users disable Javascript. Out of curiosity, does anyone know a good CAPTCHA that doesn't use JS?


BotStop by hCaptcha (enterprise). AFAIK there's also a Google reCaptcha frontend that doesn't require JS. Neither company offers those solutions because of privacy or security benefits, but to support old IE versions of course.


> BotStop by hCaptcha

Cool, I didn't know about this! Too bad there doesn't seem to be a strong open source option, but I'll check this out.


Personally I find it much more difficult than recaptcha. The tests themselves aren't much worse, but since I have a Google account hCaptcha shows up more frequently.


Small price to pay to prevent Google from having tracking scripts and metrics on users of every major website. hCaptcha at least somewhat values user privacy.


We switched a couple of months ago. Integration wasn't more or less difficult, and it works alright with our test suite. Some users give feedback like "Why do I have to identify boats?" but otherwise we found no difference in the number of blocks or in user behavior.


Dislike it, to say the least. Since it doesn't seem to remember that I'm a human (reCaptcha did), now I have to constantly solve captchas on CF sites.


With hCaptcha (enterprise ver) this is entirely under the control of the customer.

We're not "remembering" in the same way, but have good enough instantaneous scoring to correctly guess whether or not a challenge is required most of the time.

Some customers may disable that option to meet their requirements. Not much we can do about that :)


Thank you for clarifying that, hope that CF decides to use this feature.


It's been terrible. When Cloudflare switched to hCaptcha it not only broke our app, but it seems every time they make a change it breaks something else. We might be a special case, but it's hard to say since all they say is that it's "quirks" and they keep making changes without announcing anything.


I really don't like it. They are generally harder and if I visit a CF site multiple times, I have to solve them multiple times. Even if this is up to the customer, it's annoying and should be discouraged.

I didn't mind solving a reCaptcha once, I mind forcing myself through these every 10 minutes.


I don't like it because Buster doesn't work on it.

Also since only cloudflare uses it, and I dislike cloudflare, I have this irrational hatred of it.


>because Buster doesn't work on it

Assuming you mean Debian Buster: Then get a newer Firefox version from backports. This is more related to Firefox ESR than Buster.

Edit: Nevermind... After reading other replies, I think eznzt referred to https://github.com/dessant/buster


It is more annoying than reCAPTCHA if you don't aggressively block tracking, because I rarely need to solve reCAPTCHA manually, while that is not the case with hCaptcha.


Yesterday I noticed CloudFlare uses it so maybe that is the reason for 15 percent of the internet.


Honestly, I thought hCaptcha was being run by OpenAI in order to provide the benefits of millions of people training AI, but they seem to be completely unaffiliated.

What happened to OpenAI? Do they belong to Microsoft now or something?


Anyone have an example of hCaptcha usage?


15% of the web, surely?


hydrant car airplane crosswalk traffic light nightmare


15% of the web. And probably only of the commercial web.


ugh, the craigslist ones are annoying


ublock should include captcha


thanks for making something that has capacity to break google recaptcha v3.


Anyone know why there seems to be military imagery on hCaptcha? I don’t want to train anything for the military...


This sounds kinda nitpicky but I think it's an important distinction. We're talking about the web, not the internet, right? Or is hcaptcha also used for iOS apps, Android apps, etc.?


hCaptcha runs anywhere, even works on Telegram groups: https://github.com/hCaptcha/telegram-bot


That distinction is already blurred beyond recognition. There's nothing that makes a phone app more or less "web" than a SPA, for example.


The distinction between iOS apps built in Swift and distributed on App Store (for example) and a website built in JS/HTML/CSS and distributed via the web's decentralized architecture is pretty clear IMO. Case in point I don't see hCaptcha/reCAPTCHA on iOS apps as frequently as I do on websites. The way I was taught in my elementary networking class was: the web is an application of the internet. The internet as a concept is much broader than the web.


"The web's decentralized architecture" isn't a thing. There are clients and servers and communication protocols, that's it. There are plenty of web apps distributed from custom stores. And the web doesn't mandate using JS/HTML/CSS.

A client (whether browser, CLI, phone app, bot or anything else) making resource requests identified by a URL to an HTTP server is as "web" as it gets.

The reason you see fewer captchas on iOS and Android is that it is a lot harder to automate them for spam, that's all.


I mean Wikipedia also phrases the web as a subset of the internet, but as I worried, I think we're just starting to debate semantics. Using "web" interchangeably with "internet" is just confusing to me and that's what I was trying to clarify. I did learn however that Telegram's usage of hCaptcha suggests that this is a technology bigger than what we usually call the web.

> The Internet carries a vast range of information resources and services, such as the inter-linked hypertext documents and applications of the World Wide Web (WWW), electronic mail, telephony, and file sharing.

Edit: but to be fair I totally brought this semantics debate upon myself by raising the question of the difference between the web and the internet. Not sure why I took on a whiny tone about that, haha



