To be clear, they didn't compare this to the naive solution, which is to just run recaptcha.render and boom, it issues you a token.
The problem they're solving with RL isn't the "click the tiles with the stop sign", it's the "click the checkbox to prove you're a human". The token score is mainly derived from your env (medium impact on score), Google cookies (high impact on score), and IP quality (high impact on score). Mouse movement is barely factored in at all, and can be ignored for botting purposes.
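For context, the score in question is what the target site reads back from Google's siteverify endpoint. A minimal sketch of that server-side check (secret and token are placeholders; assumes Node 18+ for global fetch):

```js
// Sketch of the server-side verification a site runs on the token.
// RECAPTCHA_SECRET and token are placeholders, not real values.
const RECAPTCHA_SECRET = "<site-secret>";
const token = "<g-recaptcha-response from the client>";

const res = await fetch("https://www.google.com/recaptcha/api/siteverify", {
  method: "POST",
  body: new URLSearchParams({ secret: RECAPTCHA_SECRET, response: token }),
});
const { success, score } = await res.json();
// For v3 keys the response includes a 0.0-1.0 score; that's the number
// the factors above (cookies, IP, env) feed into.
console.log(success, score);
```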
So for now, there's no added value here over the status-quo real-world solution. That said, for future systems which use more behavioral analysis, this research might be helpful.
If I'm being honest, I was being charitable to the paper in the spirit of HN guidelines. Figure 1 and Figure 2 clearly show interaction with reCAPTCHA v2. The language of the paper also evokes reCAPTCHA v2 and suggests that those figures were not just for reader enrichment:
> Abstract: We present a Reinforcement Learning (RL) methodology to bypass Google reCAPTCHA v3. We formulate the problem as a grid world where the agent learns how to move the mouse and click on the reCAPTCHA button to receive a high score.
> 2.2 Settings: To pass the reCAPTCHA test, a human user will move his mouse starting from an initial position, perform a sequence of steps until reaching the reCAPTCHA check-box and clicking on it.
So I was responding for reCAPTCHA v2, where you'd call something like the following (a minimal sketch; the container id, sitekey, and callback are placeholders, and it assumes Google's api.js script has already loaded):
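```js
// Renders the v2 "I'm not a robot" checkbox into #captcha-container.
// Assumes <script src="https://www.google.com/recaptcha/api.js"> has loaded
// and exposed the global `grecaptcha`.
grecaptcha.render("captcha-container", {
  sitekey: "<your-v2-sitekey>", // placeholder
  callback: (token) => {
    // `token` is the g-recaptcha-response you'd POST to your server.
    console.log("got token:", token);
  },
});
```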
But honestly, there are so many methods not discussed in this paper that sort of invalidate the conclusion. They don't "contradict" it, they just don't validate it. One critical one is how they chose their sitekey(s).
Because when a sitekey is first created, it hands out 0.9/0.7 scores very often. Then over time, it adapts to the "normal" traffic for that sitekey. If they had used a sitekey from an actual site with real traffic (bots and humans), they would have needed cooperation via that site's reCAPTCHA admin panel. They didn't document that, so they probably made a fresh sitekey.
> Mouse movement is barely factored in at all, and can be ignored for botting purposes.
Partially because they also have to do something about low-cost human clickers being hired to complete captchas in India etc. So, besides checking google reputation and the other forms of reputation you've mentioned, Google gets free mturk if the click farms manage to bypass these reputation checks.
I skimmed through the article twice, trying to figure out how they were using RL to detect fire hydrants, crossroads, bikes, etc. Thanks for explaining it. It's been a while since clicking "I am not a robot" resulted in a pass and not a captcha challenge for me, so I forgot it's actually an option.
One thing we did for the Wells Fargo captcha, for example, was to pick up the audio version that's there for accessibility and parse that instead. Which makes me wonder now: these more hardcore captchas like Google's, where you have to mark buses and traffic lights, how do they get around accessibility legislation? Whichever large service is relying on them can presumably get sued for a lot of money, because some of them are very difficult even for a non-impaired person.
I'm as critical of ReCAPTCHA as the next HN user, but their audio version is actually reasonable compared to the alternatives out there. It plays sensible and seemingly real audio (I'm assuming random extracts from YouTube videos), generally meaningful and audible, instead of random words with silly distortions.
reCAPTCHA allows for audio challenges as an accessibility fallback, and this has long been a loophole for solving them automatically via Google's own speech-to-text API: https://github.com/dessant/buster
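A rough sketch of that Buster-style flow, assuming you've already downloaded the challenge audio and converted it to 16 kHz LINEAR16 WAV (the file name and language code are my assumptions):

```js
// Transcribe the downloaded challenge audio with Google Cloud Speech-to-Text.
// Assumes `challenge.wav` is the audio challenge, converted to 16 kHz LINEAR16.
const { SpeechClient } = require("@google-cloud/speech");
const fs = require("fs");

async function transcribeChallenge(path) {
  const client = new SpeechClient();
  const [response] = await client.recognize({
    config: { encoding: "LINEAR16", sampleRateHertz: 16000, languageCode: "en-US" },
    audio: { content: fs.readFileSync(path).toString("base64") },
  });
  // Join the top alternative of each result into the answer to type back in.
  return (response.results || [])
    .map((r) => r.alternatives[0].transcript)
    .join(" ");
}

transcribeChallenge("challenge.wav").then(console.log);
```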
No data on the actual model; they just gave the mathematical formulation of reinforcement learning (which can simply be copied from an undergraduate textbook). The REINFORCE policy gradient algorithm still has to operate on a differentiable model.
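For reference, the textbook REINFORCE estimator is

$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\right]$$

and that $\nabla_\theta \log \pi_\theta$ term is exactly where a differentiable policy model is required, which the paper never specifies.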
This last week I had a "click here to confirm you are not a bot" check fail for the first time.
I have no idea why that would be the case; I have a static IP and don't do anything like botting websites, so I have to assume the algorithm isn't holding up as well as it used to.
Or just pay another human to solve it for you; there are lots of companies in developing countries for that. It's very cheap, with easy APIs to integrate.
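These APIs typically look something like the sketch below, loosely based on 2Captcha's documented in.php/res.php flow (the key, sitekey, and URL are placeholders, and details may have drifted):

```js
// Submit a reCAPTCHA job to a human-solver service, then poll for the token.
// Loosely follows 2Captcha's documented API; key/sitekey/url are placeholders.
const API_KEY = "<solver-api-key>";

async function solve(sitekey, pageUrl) {
  const submit = await fetch(
    `https://2captcha.com/in.php?key=${API_KEY}&method=userrecaptcha` +
      `&googlekey=${sitekey}&pageurl=${encodeURIComponent(pageUrl)}&json=1`
  ).then((r) => r.json());

  // Poll until a human has solved it (usually well under a minute).
  for (;;) {
    await new Promise((r) => setTimeout(r, 5000));
    const res = await fetch(
      `https://2captcha.com/res.php?key=${API_KEY}&action=get&id=${submit.request}&json=1`
    ).then((r) => r.json());
    if (res.status === 1) return res.request; // the g-recaptcha-response token
  }
}
```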