mCaptcha: Open-source proof-of-work captcha for websites (mcaptcha.org)
207 points by notpushkin on Aug 8, 2023 | 145 comments



I don't understand the premise. The point of a CAPTCHA is to tell Computers and Humans Apart, that's what the CHA stands for. You cannot hope to do this test using a proof-of-work system where the work is computer work.

Call this a client rate-limiter, or whatever else, but it is obviously not a CAPTCHA and cannot function in this way.

Another obvious problem is that server hardware is vastly more powerful than the average user's device. If you set your challenge to an amount of work that doesn't meaningfully drive users away and/or drain their batteries, you are allowing a malicious server to pass your challenge tens of thousands of times an hour.


Telling computers and humans apart is a wrong goal. Every request comes from a computer that is commanded by some human. And why shouldn't users be allowed to use automated user agents when they don't do it for spamming or anything malicious?

CAPTCHA is essentially a proof-of-work variant where challenges are designed to be solved by humans rather than computers, and same as any PoW it works by means of consuming some limited resource (human time, processor time, energy).


A lot of the time the purpose is rate limiting rather than disallowing bot access. The goal of telling them apart rests on the premise that humans are a lot slower than bots.


In our SaaS we have usage limits and rate limits. Have never needed to implement "bot detection" for this reason


How do you rate limit a botnet coming from tens of thousands of different IP addresses?


For anonymous/free users we have very strict usage limits and the functionality is more limited to only operations that cost us less money. So a very targeted attack would do damage but that is true of basically any system and we could flip on bot blocking in Cloudflare if needed and if that would help


Cloudflare's bot blocking uses CAPTCHA... By your own admission, the only reason you don't have a CAPTCHA is that you haven't needed one yet.


Again, we have rate limits and usage limits in place. You know that you can pay to have Captchas automatically solved, right? It's not the solution to all problems. Obviously if a targeted DDOS happens then some changes would be required.

Also, it is no longer the case that Cloudflare uses Captchas for bot blocking. That's the legacy mode.


The fact that you can pay for both doesn't make them equivalent. To have a similar cost for spammers, you would need to request a challenge that takes many minutes to solve, which you just can't do. There is a strict limit on how long a user will wait for your security check and you can't pretend otherwise.

Let's stop pretending that all things are in the same bucket because "you can pay to have it solved". That's such a weird claim. For the right price you can have someone rob a bank for you, that doesn't mean it's as safe as your $2 padlock.


Way to completely miss the point

At this point you are just arguing for the sake of it. What is it you are even trying to debate at this point?


The point is way upthread, it's literally the top comment on this submission. I don't know where you got lost on the way.


We already do rate limiting. We don't need a captcha that can be automated away for that.


I always figured that CAPTCHAs worked because they limited on a resource that was harder to steal - human attention.

Rate limit by IP, and you get attacked by a botnet that "steals" IP addresses with malware.

Rate limit by PoW and you get people stealing AWS accounts, or using aforementioned botnet. See bitcoin mining.

Rate limit by CAPTCHA and you have to get a lot more clever (see things like setting up porn sites and proxying CAPTCHAs there)

So while you can pay to have CAPTCHAs solved, you actually DO have to pay and can't just steal your way in, so it means your target has to be more valuable.


> So while you can pay to have CAPTCHAs solved, you actually DO have to pay and can't just steal your way in, so it means your target has to be more valuable.

None of these things you listed above are available for free. They all require either effort to obtain or paying someone to do the work.


Someone did the math down thread: https://news.ycombinator.com/item?id=37056504

Unless you set your challenge to many minutes of work, you are not competitive with the human-centric solutions.


Can you steal AWS accounts with no effort?

And keep stealing them after you get blocked on the first ones?


The main goal is usually something like anti-spam or anti-scraping.

Some shops (for example, concert ticket sellers) have very limited supply and high demand, and don’t want automated buying.


I see you don’t understand why people make websites or systems. Or why people make bread.

I don’t make applications so that users benefit or to make them happy. I make applications so that I can earn money.

Earning money requires having a human on the other side. Just as you don’t make bread only to throw it into a shredder.

If someone has a scheme where automation is beneficial, they will create an API for their system. You should use the API if I provide one. But when I create a UI, I create it for people to use.


> I don’t make applications so that users benefit or to make them happy. I make applications so that I can earn money.

This is why most commercial software is so bad.


And open source maintainers are burning out or writing rants about how no one wants to pay.

There is no “non-commercial software” that is better; even if commercial software is bad, it is still better than software that doesn't exist.


Why not both, make money and benefit people. I think that’s what earning money means. Otherwise you’re just making money at someone else’s cost.


You always have to do software in a way that people will benefit because otherwise they will not pay.

Reread my downvoted post and think about that sentence in the context of the post where "Fice" wrote: "Telling computers and humans apart is a wrong goal.".

Then add to that the topic of CAPTCHA, and the fact that CAPTCHA is annoying for users, so adding a CAPTCHA is not beneficial for users. It is a specific case, discussed in that context.


Is server hardware vastly more powerful? If you use a hashing algorithm that isn't easily parallelized, then you're dedicating a single CPU core to that exercise. Now a server may have more cores, but they are often slower per-core than a client machine. And dedicating server resources has a cost. You'd slow a brute force attack to a relative crawl, especially if the target has a large volume of pre-defined work and answers.

PBKDF2 at 100k iterations, for example, can easily pin a CPU core for a few seconds. This is part of why I always have my authentication services separate from my applications; it reduces the DDoS vector. Now, you can shift work to the client as kind of an inverse-DDoS rate limiter.
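As a rough illustration of that kind of CPU-bound key stretching, here is a minimal Python sketch using the standard library's `hashlib.pbkdf2_hmac`. The password, the salt and the `time_pbkdf2` helper are made up for the example. The elapsed time depends heavily on the hardware and on whether the implementation is native code or JavaScript in a browser, so treat any number it prints as indicative only.

```python
import hashlib
import os
import time


def time_pbkdf2(iterations: int = 100_000) -> tuple[bytes, float]:
    """Derive a key with PBKDF2-HMAC-SHA256 and return (key, seconds elapsed)."""
    salt = os.urandom(16)  # random per-challenge salt (illustrative only)
    start = time.perf_counter()
    key = hashlib.pbkdf2_hmac("sha256", b"example-password", salt, iterations)
    return key, time.perf_counter() - start


if __name__ == "__main__":
    key, elapsed = time_pbkdf2()
    # PBKDF2 is sequential by construction, so one derivation keeps one core busy.
    print(f"derived {len(key)} bytes in {elapsed:.3f}s")
```

Raising `iterations` scales the work linearly, which is the knob a client-side rate limiter of this kind would turn.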

Combine that with a websocket connection, where the browser is sending user events like mouse movement, touch, scroll, focus/blur and input/paste... the two, combined with event timing analysis, can give you a pretty good guess as to whether something is a real user. And if it isn't, you are definitely slowing down bots.


Even if your server is not vastly more powerful, your 1 second of proof-of-work means a single server can pass your challenge 3600 times an hour.

The point is: a CAPTCHA has to be something that is easy for humans and hard for bots. This is at best the same level of effort from human('s devices) and bots. And realistically, more, because bots aren't battery-powered. It can't work.


> a CAPTCHA has to be something that is easy for humans and hard for bots

Do you know of any such things? Because I routinely find captchas difficult now.


I've had this problem a lot when I use a VPN. You're served a captcha that is impossible (I choose all of the correct squares and it still fails), and then I'm given a captcha with the ultra-slow click and reload images. At this point, I think it's more of an IP rate limiter than a human-bot detector.


but then some other services don't degrade like that and still offer you some easy 2-step puzzle "rotate a pic until panda is not upside down" or "find a panda"


Yes, due to the emergence of better bots, traditional CAPTCHAs aren't very good at being CAPTCHAs anymore either. It's a hard problem to solve, and it's a moving target.


> Even if your server is not vastly more powerful, your 1 second of proof-of-work means a single server can pass your challenge 3600 times an hour.

A decentralized CAPTCHA that reduces an attacker to one request per second is a lot better than nothing! Why are you dismissing this as useless?

At the end of the day, all CAPTCHAs can be circumvented by paying humans to solve them. So all CAPTCHAs have a price, and in this case it’s the price of the power used by the CPU as well as renting the CPU (or the depreciation on a CPU you own).


But it does not. It reduces it to 1 request per second, at least, per core, per machine that the attacker controls. A single attacker can still send millions of requests per hour at very low cost, limited only by compute resources, which is what CAPTCHA is supposed to work around (by challenging the human not the machine).
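To make the scaling argument concrete, here is a back-of-the-envelope sketch. The function name and the core and machine counts are hypothetical; only the "1 second per challenge" figure comes from the thread.

```python
def challenges_per_hour(seconds_per_challenge: float, cores: int = 1, machines: int = 1) -> float:
    """Upper bound on solved challenges per hour if every core works in parallel."""
    return 3600 / seconds_per_challenge * cores * machines


if __name__ == "__main__":
    # One core, one machine: the 3600/hour figure from upthread.
    print(challenges_per_hour(1.0))
    # Hypothetical attacker with 10 machines of 32 cores each.
    print(challenges_per_hour(1.0, cores=32, machines=10))
```

The bound grows linearly with whatever compute the attacker controls, which is the point being made: the per-request delay is fixed, the aggregate rate is not.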

Downthread, emurlin has done the calculation for the actual cost of the deterrent and how bad it is compared to CAPTCHA: https://news.ycombinator.com/item?id=37056504


Similar to how many security features work, it doesn't have to be 100% (or it may even be impossible to make it 100%), it just has to be good enough/make the attack expensive enough to deter it. There aren't really any easy tasks left for humans that a suitably trained ML algorithm couldn't do, and anything more complex would just annoy people. Even if there is such a task, the line moves quickly -- back then reading some colored digits from an image was infeasibly hard/expensive for bots. Nowadays your phone extracts text from your images in the background.

In this vein, anything requiring ML/expensive computation is still a worthwhile addition, as today the primary purpose of a CAPTCHA is to slow down/rate limit bot activity. Your single server use case is not really realistic -- it can be easily reverted (it won't come from 3600 IP addresses, otherwise the rate would be much lower), and 3600 times an hour is... not a lot for a computer. So it seems to do its job well.


> Is server hardware vastly more powerful?

Actually no. The server CPU has a lower clock speed and server memory is slower due to ECC.

But a server has a lot more bandwidth to handle concurrent processing.


The average user is on a 3-year-old Android phone with 40% battery. The average server has 32 processors and industrial-grade cooling.

Sure, it is possible that your gaming PC beats the average server in terms of CPU frequency. But that's not what the average website visitor is using, and you can't scale the proof-of-work out of their reach.


That would work for some desktops and very few laptops only... and only if the task cannot be ported to GPU. Other than that, the Javascript code would be ported to C.

This very case is far worse as it uses SHA-256, exactly what bitcoin ASICs love.
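For readers who haven't seen one, a SHA-256 proof of work generally looks like the hashcash-style sketch below: the client searches for a nonce whose hash has enough leading zero bits, and the server checks it with a single hash. This is a generic illustration, not mCaptcha's actual protocol; the challenge encoding, the function names and the difficulty value are assumptions.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty_bits hashes on average)."""
    for nonce in count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


if __name__ == "__main__":
    nonce = solve(b"example-challenge", 16)
    print(nonce, verify(b"example-challenge", nonce, 16))
```

The inner loop is nothing but repeated SHA-256, which is why the work ports so easily to native code, GPUs or dedicated hardware.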


¯\_(ツ)_/¯

It's a semantic expansion. It happens all the time in language. That's not a meme! That's just an image with a caption on it!

CAPTCHA is widely known as a thing that is implemented to prevent spam [0]. This is a thing that is used to prevent spam. It's CAPTCHA now. Here, the concept of preventing spam is communicated through the word CAPTCHA.

"mRateLimiter: Open-source proof-of-work rate limiter for websites"

Huh? What is this thing, what does it do?

[0]: Speaking of the word spam... You're not spamming! Spamming is when you send junk email! You're just pressing a button on your controller over and over again!


It's typical HN: word definitions don't matter and can be tortured to death to mean anything, unless one wants to nit-pick, in which case people had better use the most academic, agreed-upon and official meaning of a word.

Now back to updating the sophos captcha appliance at work.


Human language is a thing of beauty


> Speaking of the word spam... You're not spamming! Spamming is when you send junk email! You're just pressing a button on your controller over and over again!

The gaming use seems to precede the email use by quite a bit, and be part of the route between the Monty Python sketch and the email use, FWIW.


I'd say you're too pedantic. Given both computer work (calculating hashes) and human labor (filling out reCAPTCHA) have a price point, it is only a matter of making automated actions more expensive to scale. It's only natural then that the word definition has shifted.

Let's just declare that captcha now stands for Completely Automated Public Thingy to Make Spammers And Fraudsters Life A Bit Harder.


But it doesn't catch fraudsters!

The point of a captcha is to make sure that there is a human, e.g. writing this comment or creating an account.

If I used this (admittedly cool and useful) rate limiter instead of a real captcha I would have 1000s of AI-generated posts and 100s of new accounts. Yes, it would be rate limited and spread over a day or week, and servers would easily handle it, but that's not the point. I don't want this fake activity at all - that's the point!

This seems like a good alternative/addition to cloudflare and their anti ddos features though (?)


But a traditional captcha doesn't solve that either. Even if the captcha really is too hard for a bot, you can pay other humans to solve captchas for you at a click farm. Or even just generate content and automate everything except the captcha, and solve those yourself.


A dead comment thinks you're making a no true Scotsman argument, but you're right. The key is that the workarounds you're listing are very cheap and easy, not just possible.


There are no easy/non-annoying tasks left that could easily differentiate between a human and a bot, and any that may exist will only work for a short time. The only thing left, as mentioned, is to move the price point for an automated attack: I'm sure creating a fake account on your site is not worth, say, 1000$ for those 1000 accounts. Remember, a troll can also register by hand 10-20 accounts, with any kind of captcha, so it's not zero sum either.


large scale spammers are just going to use free cloud credits they got for pennies on the dollar, it won't stop anyone


The problem is that traditional audio/video captchas are not proof of humanity either. Captchas are a method for increasing the amount of work that an automated client needs to do to access your site. They do not block bots, they just impose a cost.

They're designed to block bots, sure, I agree. But we are burying our heads in the sand if we think that captchas imply humanity. They don't. The tests that they impose are not rigorous or strong enough to do that. What audio/video captchas do in practice is impose a cost in front of automated access.

We'd like them to do more than that, but the tech hasn't really ultimately worked out in that direction so even though we'd like a captcha to prove that a user is a human, what the captcha enforces is just a cost-per-request. Sometimes that involves paying a human pennies to solve the captcha, sometimes it just means turning on accessibility features and piping the captcha into a text-to-speech service. Either way, the final request can still be trivially coming from a bot (and regularly is).


It’s not worse than others, computers are better than I at solving the cancer that is ReCaptcha and hCaptcha. It’s why I let them do it.

edit: To be fair, as another comment mentioned, this would be cheaper to solve.


PoW captchas are usually stupid ideas. You have to set the work factor low enough that low powered devices can do it without significant latency, but high enough that it actually stops attackers. Typically robots, unlike humans, don't care about doing things in real time.

It might stop the really low effort attacks of people who are spamming billions of pages where the cpu time becomes expensive, but I don't think the economics work for most scenarios.

The current price for solving a js captcha is $3/1000 https://2captcha.com/ . The cost of cpu time for your PoW captcha is probably much lower. If people are willing to pay the $0.003 for a human to do it, they are going to be ok with buying the much cheaper compute.


> It might stop the really low effort attacks of people who are spamming billions of pages where the cpu time becomes expensive, but I don't think the economics work for most scenarios.

FWIW this is already where captchas are primarily effective and afaik this is 99% of why people use a captcha at all.


One advantage of PoW captchas is that they don’t require the user to click anything. Assuming all captchas are borderline useless (so blocking stuff doesn’t actually matter) that’s a big improvement!

However, that doesn’t apply here, since, for some reason, mCaptcha’s dialog box contains a check box.


I've not looked at the specific implementation here, but Tor's implementation[0] includes dynamic difficulty scaling.

When under attack, legitimate users will experience a moderate delay whilst attackers will need to scale their compute.

[0]: https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/p...


> The current price for solving a js captcha is $3/1000 https://2captcha.com/ . The cost of cpu time for your PoW captcha is probably much lower.

The cheapest AWS instance I can find with a dedicated CPU costs $0.000565 per second. So running four cores for two seconds already makes it more expensive — and much less effort and time for a human to solve than a traditional CAPTCHA.
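Written out, the comparison is simple arithmetic. The sketch below uses only the two figures quoted in this thread and assumes the $0.000565 price applies per core per second, which is how the multiplication above reads; the constant and function names are made up.

```python
AWS_PRICE_PER_CORE_SECOND = 0.000565   # USD, figure quoted above (assumed per core)
SOLVER_PRICE_PER_CAPTCHA = 3 / 1000    # USD, the "$3/1000" solving-service figure


def pow_cost(cores: int, seconds: float, price: float = AWS_PRICE_PER_CORE_SECOND) -> float:
    """Compute cost in USD of solving one proof-of-work challenge."""
    return cores * seconds * price


if __name__ == "__main__":
    cost = pow_cost(cores=4, seconds=2)  # about $0.0045
    print(f"PoW: ${cost:.5f} vs human solver: ${SOLVER_PRICE_PER_CAPTCHA:.5f}")
    # Core-seconds at which PoW costs as much as a paid human solve.
    print(SOLVER_PRICE_PER_CAPTCHA / AWS_PRICE_PER_CORE_SECOND)
```

The result is only as good as the price input: cheaper hosting or stolen compute moves the break-even point a long way.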

> It might stop the really low effort attacks of people who are spamming billions of pages where the cpu time becomes expensive, but I don't think the economics work for most scenarios.

How are regular CAPTCHAs any different if they cost $0.003 for an attacker to solve?


AWS is some of the most expensive infrastructure available. And people who break these things tend to rely on stolen CPU time.


honestly I think captchas that track you are stupid ideas.

For pow, can't you just turn the dial to more work?


If your captcha takes 20 minutes to complete, your users will leave.


Because obviously there's no middle ground between 0 and 20min…

If your users need to wait for 1s, then it will probably not affect traffic too much (especially if you do it concurrently with loading your bloated webpage that already takes a few seconds to load), but you've effectively lowered your spammer's throughput to 1qps (if you're using a memory-hard PoW scheme using Argon2 or equivalent, the attacker cannot really speed things up by using a beefier machine, by design of the cryptographic protocol).
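A minimal sketch of what a memory-hard variant could look like. Argon2 is not in the Python standard library, so this swaps in `hashlib.scrypt`, which is also memory-hard; the parameters, the function names and the challenge format are assumptions, not anything mCaptcha does.

```python
import hashlib
from itertools import count


def memory_hard_hash(challenge: bytes, nonce: int, n: int = 2**14) -> bytes:
    """One scrypt evaluation; with r=8 it needs roughly 128 * n * 8 bytes of RAM."""
    return hashlib.scrypt(str(nonce).encode(), salt=challenge, n=n, r=8, p=1, dklen=32)


def leading_zero_bits(digest: bytes) -> int:
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def solve(challenge: bytes, difficulty_bits: int, n: int = 2**14) -> int:
    """Search for a nonce; each attempt costs a full memory-hard evaluation."""
    for nonce in count():
        if leading_zero_bits(memory_hard_hash(challenge, nonce, n)) >= difficulty_bits:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int, n: int = 2**14) -> bool:
    return leading_zero_bits(memory_hard_hash(challenge, nonce, n)) >= difficulty_bits


if __name__ == "__main__":
    nonce = solve(b"example-challenge", 4)
    print(nonce, verify(b"example-challenge", nonce, 4))
```

Each attempt is bounded by memory bandwidth and latency more than by raw compute, which is the property being claimed. Note that the attacker can still run many instances in parallel as long as they have the RAM for it.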


Something that takes less than 20 minutes on grandma's old laptop will take less than a second on a powerful server.


Not with memory-hard schemes, that's the point of it actually! And that's because you don't have a 1200x increase in memory latency between your grandma's laptop and a server, you barely have an order of magnitude in the most extreme scenario.


Meanwhile a spammer with the same machine will leave 72 messages every day that you will have to clean up for however long you keep the site up.


And, through the magic of cloud computing, they can multiply 72 by an arbitrarily large number without increasing their cost per captcha.

I get the impression this is mostly only useful against ddos attacks. They do start ramping up pow cost at 5000 requests per second.


The whole point of this system is that the difficulty is set to automatically increase when it detects it's under attack. That's why expected traffic and likely failure traffic are configurable options - when the server is experiencing a higher than normal load the difficulty is ramped up to dissuade those attacks. Yes, when this is happening a real user will also have a slower experience, but they would anyway if the server was being kept busy by the DDoS.
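A sketch of how such load-based scaling can work. I have not checked how mCaptcha implements it; the `Level` type, the thresholds and the difficulty values below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    min_requests_per_second: int  # load at which this level kicks in
    difficulty_bits: int          # work demanded from each client at this level


# Hypothetical configuration: normal traffic, elevated traffic, likely attack.
LEVELS = [Level(0, 12), Level(500, 16), Level(2000, 20)]


def difficulty_for(requests_per_second: float, levels: list[Level] = LEVELS) -> int:
    """Pick the hardest level whose threshold the current load has reached."""
    ordered = sorted(levels, key=lambda level: level.min_requests_per_second)
    chosen = ordered[0]
    for level in ordered:
        if requests_per_second >= level.min_requests_per_second:
            chosen = level
    return chosen.difficulty_bits


if __name__ == "__main__":
    for load in (10, 750, 10_000):
        print(load, difficulty_for(load))
```

Under normal load everyone gets the cheap challenge; the expensive ones are only issued while the server is already struggling.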


Because spammers use a single piece of hardware matching the exact spec of those low-powered devices.


And then spammers will also leave. Problem solved, one way or another.


I recently conducted an experiment - I removed the client-side CAPTCHA from a form that used reCAPTCHA v2, which was still letting a ton of spam through, and instead sent the content to Akismet for scanning. It cut the spam getting through to 0.

It made me think, are client-side CAPTCHAs really worth it? They add so much friction (and page weight - reCAPTCHA v3 adds several hundred KB) to the experience (especially when you have to solve puzzles or identify objects) and are gamed heavily. I know these get used for more than form submissions, to stop bot sign ups etc…

I feel like it’d be just as/more effective to use other heuristics on the backend: IP Address, blacklisting certain email domains, requiring email validation or phone validation, scanning logs, analyzing content submitted through forms


Then you'd just give visitors of your websites no recourse and no information whatsoever on how to fix the problem. The benefit of client-side CAPTCHA is that humans at least can pass it and fix the problem even if something they don't control (such as their IP address having bad reputation due to shitty ISP) is causing problems.

As a website operator it's easy to look at the spam that is getting through and be happy that it's zero. But do you get any idea how many actual humans that you have incorrectly rejected? You don't have that data and it's really easy to screw up there.

Of course if your website is small nobody cares. If you are bigger like Stripe you simply get bad publicity on HN. People on HN love to hate on mysterious bans and blocks just because they do something slightly unusual and your backend-only analysis flags them as suspicious.

Abuse fighting is hard.


>Then you'd just give visitors of your websites no recourse and no information whatsoever on how to fix the problem.

This is a weird assumption. What's preventing a backend system from saying "Hey, we think you're a bot. Here's an alternative way to contact us."

You obviously don't want to give away enough to help bot developers get through your system, but that's not the same as no recourse and no information.

>But do you get any idea how many actual humans that you have incorrectly rejected?

Yes - like I said in my other comment, this new system actually logs all submissions. It just puts the ones it identifies as spam into a separate folder. Akismet also has the ability to mark things as false positives or false negatives.

I think that automated form submissions are very context specific. So, the example I wrote about is for a marketing site, and it's a business that primarily targets other businesses. Most of the spam it gets is for scummy SAAS software, SEO optimization, etc...

But my personal website has a very simple subscribe by email form. There were definitely a few spam submissions - someone just blasting out an email address and signing it up to whatever form would accept it. When I implemented double opt in - gone entirely.

My larger point was that as an industry, we seem to have just capitulated to client side CAPTCHAs. And it sucks. It's one of the many shitty things about the modern web. But I think it's become just an assumption that it's needed, and we haven't reexamined that assumption in a while.

I think it'd almost be better for there to be something you could spin up in a container that has a base machine learning model, but can "learn" as you manually indicate messages etc... and then you can also choose a threshold based on your comfort level.


I think the idea here is that the 1% of users with the shitty ISP are going to have a much worse experience than anyone was having with the captcha. This is super context-dependent, but being told I need to "contact an administrator" when I submit a form on a website is a good way to make me log out and look for alternatives to whatever service I'm using.

To me, the question is this: would you rather give 100% of your users a kinda shitty experience, or 99% of your users a normal experience and 1% a nightmarish awful shitty experience. The answer probably depends on use case.


> This is a weird assumption. What's preventing a backend system from saying "Hey, we think you're a bot. Here's an alternative way to contact us."

Not a weird assumption, but a necessary assumption based on considerations of scale.

A small-scale website that doesn't receive too many spam attempts can have human agents classify spam manually. A medium-scale website can use a CAPTCHA to let some visitors through and send the rest to human verification. You appear to be in this bucket. When the scale is huge, no alternative way to contact exists. CAPTCHA becomes your only tool.

In other words, CAPTCHA is only necessary because of scale; what do you think the first A stands for? But because of scale, alternate ways stop working.


>When the scale is huge, no other alternative way to contact exists

1. This still doesn't preclude giving a blocked user recourse or information. Like how a streaming website will say "Hey, you're using a VPN. We don't allow that" - the user's recourse is to turn off the VPN, or find a new VPN that their service won't detect.

2. The case you're outlining is different from the scenario in which most users are presented with a CAPTCHA. I encounter it when I am using a VPN and Googling something in Incognito mode. That means Google has already applied some heuristics and thinks that chances are higher than normal that I'm a bot (not logged in, no cookies allowed, masking IP address) before presenting the challenge. In those cases, you're probably correct that presenting a CAPTCHA is a reasonable option. I just think it's weird to have CAPTCHA be the default/first line in many cases. Especially with the focus on things like converting users.


> Like how a streaming website will say "Hey, you're using a VPN. We don't allow that" - the user's recourse is to turn off the VPN, or find a new VPN that their service won't detect.

No, the user's recourse is to stop using the streaming website and go back to piracy instead.

Any speedbump to UX is a lost customer. You can not and should not assume that users are going to jump through hoops, because the overwhelming majority will not.


I mean, the vast majority of people will not "go back" to piracy. Piracy isn't an option that's on the table for them. But you're missing the point.

>Any speedbump to UX is a lost customer. You can not and should not assume that users are going to jump through hoops

So... CAPTCHA isn't a hoop? Both scenarios are hoops.


> What's preventing a backend system from saying "Hey, we think you're a bot. Here's an alternative way to contact us."

In what way? If I got flagged and had to take additional steps to remedy a form submission I would probably just never go back to the site. The only way this could work is by identifying the issue in real-time and then sending a CAPTCHA to be completed by the user client-side while they're still handling the form.


> to remedy a form submission

The correct way to deal with an error in general is to return the form as filled by the user on the error page. Sadly so many SPAs just ignore this, that I usually manually copy before submitting any meaningfully sized text.

> I would probably just never go back to the site

Of course you would need to measure and reasonably reduce false positives. But in case you are serious about getting users to report them, an effective solution I've found with minimal friction is to fully use the mailto protocol scheme. Online shops can be static sites up to a scale, by adding product IDs and quantities to the mailto body, and having the customer order via e-mail.

> The only way this could work

Depending on your target audience, a CAPTCHA might not be possible.


>an effective solution I've found with minimal friction is to fully use the mailto protocol scheme.

This is another assumption that I am not so sure is valid anymore: that there are hordes of bots out there scraping every email they can find.

My personal website has had a raw mailto "Contact" button for several years now. Earlier this year I changed that email to an address I only use for the website (it's an alias) just as a way of tracking what comes through, and I have not received a single spam email to that address. Maybe I am tempting fate by putting that out there, and some asshole is going to try to ruin it for me. But it's my experience.

I'm more likely to get an email from a recruiter who has used a tool to scrape my email off of Github (though I've made that significantly harder, plus nobody wants to hire programmers anymore: the ultimate spam control!) than an email from them having clicked through to my website and using the Contact button.

I've gotten several real people sending me legitimate emails through it though. Sometimes they read something I post here, or they find a post from Google. Or it's someone I haven't talked to in a long time and they don't have a current email for me but they are able to reach me via my website.

Here's some cold emails I've been sent over the past 1-2 years: https://i.imgur.com/rMfOqb2.png

My website isn't big, but it is fully indexed by Google and other search engines and has backlinks. If the internet was really this dark forest where there are masses of bots out there ingesting every email they can find, surely I'd have gotten something by now.

I don't research this stuff, I can only share my anecdotal experiences. But it makes you wonder, right?


And just to be clear, it doesn't need to be either captchas or doing heuristic abuse detection on the backend. In the ideal case you're making a decision in the backend using all these heuristics and signals, but the outcome is not binary. Instead the outcome of the abuse detection logic is to choose from a range of options like blocking the request entirely, allowing it through, using a captcha, or if you're sophisticated enough doing other kinds of challenges.
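As a sketch of that non-binary outcome: the backend reduces its signals to a risk score and maps the score to one of several actions. The score itself, the thresholds and the names below are placeholders, not a recommendation.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CAPTCHA = "captcha"
    BLOCK = "block"


def decide(risk_score: float, captcha_at: float = 0.4, block_at: float = 0.9) -> Action:
    """Map a risk score in [0, 1] to an action instead of a plain yes/no."""
    if risk_score >= block_at:
        return Action.BLOCK
    if risk_score >= captcha_at:
        return Action.CAPTCHA   # uncertain: give the visitor a way to prove themselves
    return Action.ALLOW


if __name__ == "__main__":
    for score in (0.1, 0.5, 0.95):
        print(score, decide(score).value)
```

The middle band is where a challenge earns its keep: visitors who would otherwise be silently rejected get a way through.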

But proof of work has basically no value as an abuse challenge even in this kind of setup, the economics just can't work.


Client side captchas are obfuscated code, so the bar of “the user can debug the problem and fix it themselves” is pretty high.

Also, reCaptcha definitely engages in both hell-banning and allows incorrect answers to pass the test. I assume the logic for those things is mostly server side.


I added this trivial honeypot field to a site’s register-interest form in late 2021, and it has been very effective at culling spam: I had been getting an average of around one spam message a day, but after adding it it took a year and a half for one to get through, and no others have got through.

  <style>.pot{display:none}</style>
  <div class="f pot">
      <label for=username><b>If you are human, leave this field blank:</b> <em>(required)</em></label>
      <input name=username id=username>
  </div>

The whole field is hidden with CSS; the “if human, leave blank” instruction is thus only used for text browsers, but I still prefer to have it.
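The markup only helps if the server discards submissions where the hidden field has been filled in. That part isn't shown above, so here is a minimal sketch assuming a standard form-encoded POST body; the function name is hypothetical and only the field name `username` comes from the snippet.

```python
from urllib.parse import parse_qs

HONEYPOT_FIELD = "username"  # name of the hidden input in the markup above


def is_probably_bot(form_body: str) -> bool:
    """Treat any non-blank value in the honeypot field as an automated submission."""
    fields = parse_qs(form_body, keep_blank_values=True)
    return any(value.strip() for value in fields.get(HONEYPOT_FIELD, []))


if __name__ == "__main__":
    print(is_probably_bot("email=a%40example.com&username="))         # human left it blank
    print(is_probably_bot("email=a%40example.com&username=spammer"))  # bot filled it in
```

Flagged submissions can be dropped silently or logged to a separate folder so false positives can still be reviewed.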


I wonder if this is filled out automatically by browsers like Chrome when doing auto fill...


I would be extremely surprised if it filled it: `display: none` makes the field not be rendered, and autofill should only fill stuff the user could fill.


As a text browser user I thank you for adding that label.


Akismet is a paid service and their apis are tailored for comments. An advantage with comments is that you can just mark as spam if some contents have dubious links or keywords.

An issue you have in many forms (e.g.: login form) is that there is limited data to decide if it's a real user or a bot.


Agreed on all points. That's why I said in my original comment: "I know these get used for more than [contact] form submissions, to stop bot sign ups etc…"

I picked up Akismet because it's been around forever, and while it is paid, it is very cheap for my use case.

This is a bit of an aside, but I feel like Automattic is sitting on several companies/products and not doing a whole lot with them.

Akismet could be expanded into a more fully featured server side spam detection SAAS with a flexible API etc...

Gravatar could be expanded into something like OpenID.

Just seems like a waste to me.


Did you also look at the false positives, e.g., how much non-spam content was filtered by Akismet?


Of course. It would be bonkers not to. It just doesn't send a notification if the submission is flagged as spam and puts it in a separate folder. So I have the ability to look at every submission.

I put the system into effect on August 1st. There have not been any false positives. There was even a submission to the form that was clearly a B2B sales pitch, but because it was an actual person submitting the form and not an automated system it went into the "real" entries list (I think this is reasonable. Any business is going to have to field B2B sales solicitations)

I put together a few rows in a spreadsheet of legitimate submissions (with info blocked out): https://imgur.com/a/stxja1Z

Here's an example of one flagged as spam by Akismet that was submitted about an hour ago: https://imgur.com/a/PmN3t80

Overall, removing reCAPTCHA has increased the total amount of submissions to the form, but the amount of submissions actually being seen by a real person who then has to waste time reading it, identifying that it's spam and discarding it has dropped to 0.


I wonder if it would be a decent approach to scale CAPTCHA tests based on how likely an LLM thinks a post is spam.

People who WRITE LIKE THIS might just end up ACCIDENTALLY BEING FILTERED and using click as an imperative is a complete red flag.

I think you could probably filter a number of these out with just regular NLP approaches/models even.
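
As a rough illustration of how simple such signals can be, here is a sketch using nothing but regular expressions. Both heuristics and the word list after "click" are invented for this example; they are not what Akismet or any real filter does, and on their own they would misfire often.

```python
import re


def shouting_ratio(text: str) -> float:
    """Fraction of words (two or more letters) written entirely in capitals."""
    words = re.findall(r"[A-Za-z]{2,}", text)
    if not words:
        return 0.0
    return sum(word.isupper() for word in words) / len(words)


def spam_signals(text: str) -> dict:
    """Collect a couple of cheap, hypothetical spam indicators."""
    return {
        "shouting_ratio": shouting_ratio(text),
        # "click" used as an imperative, e.g. "click here"
        "imperative_click": bool(
            re.search(r"\bclick\s+(here|now|this|the)\b", text, re.IGNORECASE)
        ),
    }
```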


Sounds like you should write an article about this! I've never heard of Akismet so I'd be curious to see more info on your findings.

CAPTCHAs today are horrible. I don't even bother with most webpages that require them at this point. In a similar vein, I don't bother with sites that send me in a never-ending Cloudflare loop just because I lock down my browser to limit what sites can and cannot do. It's particularly tyrannical when I am either sent in a loop or asked to complete a captcha for a non-interactive page.


Some spam is really low effort. Even a non-obfuscated text image (e.g. something that can be read by Tesseract out of the box) still stops a surprising number of spammers.


The answer is something like "yes, and..." because reCAPTCHA already decides whether and how to challenge the user based on its own internal risk score.


But if a server side only solution seems to work fine, why add a client side element?

I ran into this because I was doing some freelance work on a website that had worked its ass off to cut loading size as small as possible. For reCaptcha to develop its risk score, you have to load it far in advance of a form - you can't lazy load it, they specifically say not to: https://developers.google.com/recaptcha/docs/loading.

It also spawned its own service worker on top of adding like 300kb of page weight. The API documentation is garbage, you have to fuck around with Google Cloud to get API keys now too which is confusing. It also pollutes the global scope of the page. It's all around terrible to work with.


I think that's one of the benefits of using things like Auth0.

They have thousands of companies so their heuristics are really good.


Damn, for some reason when I saw "Try mCaptcha" I was hoping it would show me the captcha right away so I could really try it in action. The saddest part is that on the login screen that follows, entering the wrong passwords for accounts doesn't trigger the captcha! And only then do you find out that there are credentials for some demo account that lets you into the dashboard. Do you think you can finally see the captcha then and try solving it yourself before putting it in front of your users? Not at all.

The idea is great though and I hope they will be able to push back some of the reCaptcha dominance. A lot of good work is needed still of course. My advice for the team: show it better guys. Everybody knows how reCaptcha looks, but not your new mCaptcha, so you must present it in all its glory! And better show it right away.


I had the same idea. If you look at the sitekeys you can click "View deployment".

It's just a "I'm not a robot" checkbox. If you click it, it does some POW stuff in the background and that's it.

It's more of a rate limiter than a CAPTCHA provider like others have said, but given how cheap CAPTCHA-solving services are currently, I don't know if there's much of a difference in practice anyway.


SHA256-based Hashcash seems like a poor choice of PoW for a captcha that's supposed to incur a nontrivial cost for spammers. They can simply employ a SHA256 ASIC to crack the captcha at practically no cost.
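
For readers who haven't seen Hashcash, a minimal sketch of the general idea: find a nonce so that SHA-256 of challenge plus nonce starts with enough zero bits. The encoding of the nonce and the way difficulty is expressed are assumptions made for this example and not necessarily how mCaptcha does it.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """One hash: cheap for the server regardless of difficulty."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Brute force: about 2**difficulty_bits hashes on average."""
    for nonce in count():
        if verify(challenge, nonce, difficulty_bits):
            return nonce
```

The inner loop is nothing but SHA-256, which is exactly the operation existing ASICs are built to do fast.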


I agree, but you literally have no other option: https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypt...

Web cryptography is stuck in the 1990s. PBKDF2 is the only available algorithm, which gives an attacker's GPUs a big advantage over honest users, let alone ASICs.

Maybe a webassembly-based solution, implementing something like Bcrypt or Scrypt/Argon2, is comparable to a browser-native implementation, but that would have to be verified before taking my word for it. These algorithms provide varying amounts of memory-hardness (Bcrypt only 4KB but even that proved surprisingly effective), which causes contention on the GPU memory bus (they're only a bit faster than the CPU's memory bus, making the GPU have only a small advantage, on the order of 5x instead of 100x) and causes larger ASIC die sizes (which the Argon2 paper argues is what causes cost for the attacker).

Source for the latter: https://github.com/P-H-C/phc-winner-argon2/blob/16d3df698db2... section 2.1

> We aim to maximize the cost of password cracking on ASICs. There can be different approaches to measure this cost, but we turn to one of the most popular – the time-area product [4, 16]. [...]

> - The 50-nm DRAM implementation [10] takes 550 mm² per GByte;

> - The Blake2b implementation in the 65-nm process should take about 0.1 mm² (using Blake-512 implementation in [11]);

I understand from this that adding more memory on the ASIC/chip is more expensive than adding more computing power.


Ohhh, actually mCaptcha seems to be using WASM already: https://mcaptcha.org/docs/api/browser

So I think it should be possible to add another algo?


A better choice would be a memory hard PoW (that's still instantly verifiable), where the performance gap between consumer and custom hardware can be limited to one or two orders of magnitude.


> that's still instantly verifiable

Good point, current verification of password hashes takes as long as generating the hash. I seem to remember that there was a technique to avoid this, but it wasn't usable for passwords or something. Do you happen to have a pointer for what algorithm has this property?


Asymmetric PoW algorithms, such as Cuckoo Cycle [1] or the poorly named Equihash [2] (which is not a hash function) do not lend themselves to password hashing, since a given problem instance can have 0 or 1 or many solutions.

[1] https://github.com/tromp/cuckoo

[2] https://en.wikipedia.org/wiki/Equihash


What if the consumer has almost no free space?


Then you can trade-off processing power within reason. Modern websites are so heavy, it's not unusual to need a gigabyte of memory to use some of the heavier webpages. Using some megabytes (150MB I'd consider an upper bound of where the advantage will have leveled off for the coming years) is not typically impossible, and even 4KB is a lot better than no memory hardness at all.


You do have other options, you can build your PoW on repeated squarings instead.


Not sure I understand. Numbers in JavaScript can't hold infinitely large numbers, so after a few squarings using x*x or Math.pow(x, 2) you're at float max (or at least lost some precision, breaking the proof) and would need to resort to custom code again rather than a hardware-accelerated browser-native operation.


He's talking about repeated squarings in a modular field, like integers mod N where N is the product of two large primes.
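
A minimal sketch of what that looks like, using arbitrary-precision integers and reducing mod N after every step so the numbers never grow. The two primes here are small, well-known Mersenne primes chosen only so the example runs; a real puzzle would need large secret primes, and x is assumed to be coprime to N.

```python
# Toy parameters: real deployments would use large, secret primes.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q


def solve_slow(x: int, t: int, n: int) -> int:
    """Compute x^(2^t) mod n with t sequential modular squarings.

    Each squaring depends on the previous one, so the work cannot be
    spread across many cores the way a hash search can.
    """
    y = x % n
    for _ in range(t):
        y = y * y % n
    return y


def solve_with_trapdoor(x: int, t: int, p: int, q: int) -> int:
    """Shortcut for whoever knows the factors of n.

    Reduces the exponent 2^t modulo phi(n) first, which is valid by
    Euler's theorem when x is coprime to n.
    """
    n = p * q
    phi = (p - 1) * (q - 1)
    exponent = pow(2, t, phi)
    return pow(x, exponent, n)
```

The server, knowing P and Q, gets the answer with two modular exponentiations; the client, knowing only N, has to grind through all t squarings.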


Exploring Proof of Work (PoW) as a substitute for CAPTCHAs is an interesting idea (PoW was originally conceived as a spam deterrent, after all), and one that I have considered (and use) in some web properties I manage. Not only does it obviate 'trusted' third parties, but it also has the potential to reduce the risk of accessibility issues often associated with traditional CAPTCHA. It also seems like a solution that scales nicely, as each 'proof' is made by the client and verification is cheap, and like a solution that finally ends the arms race against malicious traffic by bypassing the need to 'prove humanity'.

However, it's one of those solutions that look good on paper, but upon close inspection break down entirely or come with rather substantial tradeoffs. Ignore the environmental discussion about energy consumption for a moment, and let's face the reality that computational power is ridiculously inexpensive.

As a thought exercise, imagine you're trying to use PoW to ward off spammers (or the attack du jour), and you decide that a 1-cent expenditure on computation would be a sufficient deterrent. Let's say that renting a server costs $100/month (a bit on the higher end), or 0.004 cents per second.

So, if you wanted a PoW system that would cost the spammer 1 cent, you'd need to come up with a computational task that takes about 250 seconds, or over 4 minutes, to solve. That kind of latency just isn't practical in real-world applications. And that ignores that 1 cent is probably a ridiculously low price for protecting anything valuable.

Of course, you may consider this as an alternative to regular CAPTCHA services. A quick search gives me that this costs something like $3 for 1000 CAPTCHAs solved, or 0.3 cents per CAPTCHA. This changes the above calculation to about 1 minute of compute, which still seems rather unacceptable considering that you might, e.g., drain your users' battery.

So, overall, while I'd like for something like this to work, it probably only acts as a deterrent against attackers not running a full browser and who also aren't targeting you in particular.
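
The arithmetic above as a small sketch, assuming a 30-day month. Without rounding the rate to 0.004 cents per second first, the figures come out at roughly 259 seconds for 1 cent and 78 seconds for 0.3 cents, which doesn't change the conclusion.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def cents_per_second(monthly_cost_dollars: float) -> float:
    """Cost of one second of a rented server, in cents."""
    return monthly_cost_dollars * 100 / SECONDS_PER_MONTH


def seconds_to_cost(target_cents: float, monthly_cost_dollars: float) -> float:
    """Seconds of compute needed before the attacker has spent target_cents."""
    return target_cents / cents_per_second(monthly_cost_dollars)


if __name__ == "__main__":
    print(f"{cents_per_second(100):.4f} cents/s")
    print(f"{seconds_to_cost(1.0, 100):.0f} s to cost 1 cent")
    print(f"{seconds_to_cost(0.3, 100):.0f} s to cost 0.3 cents")
```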


You can scale the difficulty based on how likely an LLM thinks a post is spam or not (just don't only use OpenAI's please)


That makes no difference, you'd have to scale the challenge to many minutes as GP explained, which is not something any user will go through. What's the point of issuing a challenge only spammers will pass?


Note that this was the original intent of proof of work (or very near it) [0].

Should you want to visit a site that has proof of work as a requirement but allows it to be done offline/deferred/transferred, then you've essentially re-invented some of the major aspects of cryptocurrency.

[0] https://en.wikipedia.org/wiki/Proof_of_work#cite_note-DwoNao...


It's not really reinventing if it's the original idea. Hashcash came before Bitcoin.


Funnily enough I've recently implemented [0] a little proof-of-work generator (or more specifically a time-lock puzzle [1] generator), which is the base building block on top of which something like this can be built.

It's a very cool idea imo: you generate a cryptographic puzzle that's cheap for you to make, cheap for you to verify if it's solved, and potentially cheap enough for legitimate users to solve, but expensive enough that users making too many requests would find solving them prohibitively expensive.

I wish something like this was bolted onto email protocols; it would just cost more to be a spammer than it'd be worth.

Interesting how mCaptcha seems to be based on SHA-256. I don't know enough, but it would be worth checking how much the algorithm can be sped up with (already existing) dedicated ASICs: if the attacker can solve the puzzle something like 10000x faster than normal users, you just can't crank the difficulty of the puzzle high enough, and for motivated attackers this becomes basically useless. Basing this on repeated squarings, like the RSW paper on time-lock puzzles did, seems potentially better.

[0]: https://github.com/fabiospampinato/crypto-puzzle

[1]: https://people.csail.mit.edu/rivest/pubs/RSW96.pdf


Proof of work = waste electricity.

Basically, we are incentivizing people to waste electricity. The proof of work is basically proving you wasted electricity doing something useless.

While I appreciate the goals behind this, I think proof of work is unethical in our current energy situation.


Other captchas also waste electricity (yours and the captcha provider's). For example, reCaptcha requires tons of resources to track your moves to ensure you're "not a robot". Sure, the data is also used to serve you ads, but resources are still wasted.


But as efficiency improves, the electricity usage goes down.

With proof of work, as efficiency increases, the work increases to keep on wasting the same electricity.

If electricity prices drop, the work increases to waste more electricity to keep the attack expensive enough.


Do you think this needs more electricity than the whole recaptcha cloud? I seriously doubt it. My laptop would waste something like 0.006 watt hours per proof (0.5 seconds at 40 watt). Also, per the default setting, the proof complexity is lowered to almost zero when the server is at normal load.


obviously not because no-one's using it yet - how much energy do you think it would waste if it completely replaced recaptcha?


My guess is that mcaptcha reduces energy consumption by an order of magnitude when compared with recaptcha. But could be wrong of course. It also saves human productivity.


> proof of work is unethical in our current energy situation

Stop generating electricity using coal etc.

Generate electricity only from:

- Solar

- Wind

- Hydro

- Nuclear

And other non-fossil sources.

Using fossil sources for electricity is unethical, full stop. Doesn’t matter if you are using the electricity for PoW, or for baking cookies or for feeding kittens, or what-have-you.

Fossil energy is the problem. Not PoW.


Yes, so as long as coal is being used to generate electricity, wasting electricity is unethical.


I would say they are both a problem.


When the alternative is sacrificing privacy or anonymity, I think it's at least useful, even if not ideal given the current energy situation.


The cost of this kind of proof of work is several orders of magnitude smaller than crypto's, so frankly, it is a drop in the ocean. Do you use your phone on minimum brightness at all times? If not, you are also wasting electricity.


Privacy is worth the cost.


Related:

mCaptcha – Proof of work based, privacy respecting CAPTCHA system - https://news.ycombinator.com/item?id=32340305 – Aug 2022 (96 comments)

MCaptcha: FOSS privacy focused captcha system using proof-of-work - https://news.ycombinator.com/item?id=32340590 – Aug 2022 (5 comments)


Proof of work proves not to work (2004) https://www.cl.cam.ac.uk/~rnc1/proofwork.pdf


TL;DR / TooAnnoyingPdf;Didn'tDownloadAndTryToReadOnAPhoneScreen:

> We analyse [anti-email-spam PoW] both from an economic perspective, “how can we stop it being cost-effective to send spam”, and from a security perspective, “spammers can access insecure end-user machines and will steal processing cycles to solve puzzles”. Both analyses lead to similar values of puzzle difficulty. Unfortunately, real-world data from a large ISP shows that these difficulty levels would mean that significant numbers of senders of legitimate email would be unable to continue their current levels of activity.

So it wouldn't work for mass senders, I think this means in the abstract? Reading into the details, page 6 says:

> We examined logging data from the large UK ISP for those customers who use the outbound “smarthost” (about 50,000 on this particular weekday).

Not sure I agree with the conclusion if this is their premise. This smarthost (an SMTP server sitting on the edge doing $magic, preventing client PCs from directly sending email to any which internet destination) is handling a ton of emails for free. Why should it solve the PoW? The residential client that is really trying to send the email is the one that wants to send the email and should attach the PoW already before sending it on to a relaying server.

I do agree it is probably undesirable to require that honest senders outcompete attackers on CPU power (=electricity =CO2, at least in the immediate future) to get any email delivered


Pdf download warning.


I respect your dislike of direct download links, but what browsers still download PDFs these days?


All of them. Sometimes they open the downloaded PDF in the internal viewer maybe, but PDF is a pretty dangerous format either way. It's getting safer to open them though.


Firefox on Android, for example. I guess they just didn't want to add mobile UI to pdf.js?


Firefox on Android supports PDF viewing (using pdf.js) as of version 111: https://www.mozilla.org/en-US/firefox/android/111.0/releasen...


I'm on Brave mobile and it downloaded the PDF automatically when I clicked the link.


Where is the code? I couldn't find a link.

edit: https://github.com/mCaptcha/mCaptcha


For the antiquarians on the site:

http://www.hashcash.org/


> Try mCaptcha without joining

> user: aaronsw password: password

mixed feelings about this.


Not mixed feelings here: I wish they wouldn't do that. No doubt it was meant as respectful homage, but the effect is to flatten and obscure his particular qualities, because they have no specific connection to this product.


What’s wrong? It’s a demo.


Note, there are credentials for a test account, to try it without signing up.

This is listed on the sign in page [1], just not very visible.

> user: aaronsw password: password

[1]: https://demo.mcaptcha.org/login


Logging in shows a dashboard asking you to implement it on your site.

That's not what I expect from a "demo" of a captcha.

A recommendation would be: please use mcaptcha on the mcaptcha account sign-up page at the least. It'd then provide an instant demo of the user experience, which is one of the major pains of captchas.

Or at least provide a link to the widget [1] as a UX demo, where inspecting the network calls also shows the API calls in action.

[1] https://demo.mcaptcha.org/widget/?sitekey=pHy0AktWyOKuxZDzFf...


"Account not found" :<


It says 'Account not found'


This makes some sense for straight-up DDoS attacks.

For bots/spam my intuition is this could not work. Computation is extremely cheap, and your average legit user is going to have a significantly lower threshold than your spammer.


Funny coincidence: I made a similar thing into a web component a few weeks ago for a different purpose, namely click-to-reveal buttons to prevent scraping of public emails in static websites. It works by encrypting the content using the same deriveKey method, varying the iterations to determine the time 'cost'.

Imo it's not really fit for most captcha situations, since you can easily parallelize the execution by simply running multiple tabs/browsers or even hooking crypto/subtle up to a GPU/ASIC with a bit of hackery.
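
A rough sketch of the principle, not the actual web component: PBKDF2 with a tunable iteration count sets the cost of revealing the content. To stay within the standard library, the derived bytes are simply XORed with the content, which is a toy stand-in for the real cipher that a derived key would feed and should not be used as encryption. All names and parameters here are made up for the example.

```python
import hashlib


def derive_keystream(secret: bytes, salt: bytes, iterations: int, length: int) -> bytes:
    """PBKDF2-HMAC-SHA256; the iteration count sets the time cost."""
    return hashlib.pbkdf2_hmac("sha256", secret, salt, iterations, dklen=length)


def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream))


def hide(content: str, secret: bytes, salt: bytes, iterations: int) -> bytes:
    """Toy obfuscation: XOR the content with the derived bytes."""
    raw = content.encode("utf-8")
    return xor_bytes(raw, derive_keystream(secret, salt, iterations, len(raw)))


def reveal(blob: bytes, secret: bytes, salt: bytes, iterations: int) -> str:
    """Anyone who clicks pays for the same derivation again."""
    return xor_bytes(blob, derive_keystream(secret, salt, iterations, len(blob))).decode("utf-8")
```

Doubling `iterations` roughly doubles the time per reveal, for a scraper as much as for a visitor, which is why parallel hardware undercuts it so easily.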


So make a Captcha with generative AI itself. There is no training data if stable diffusion just created a CAPTCHA which has only been seen a few hundred times across the world.


I could see my org adopt this if there was an out of the box way to add this to WordPress.


What is a good captcha for gdpr safe deployment that doesn't require a cookie banner?

Edit: Open source would be great.



Also similar wehatecaptchas.com



