This is probably good to avoid general low-hanging-fruit spam, but anyone who spends even a second manually configuring something to spam your exact site will be able to get past this with extreme ease.
So for most sites, it'd probably be a good idea to put things like this in place but then to also be ready to roll out reCAPTCHA at any given moment, in the event a serious spam campaign begins.
Yes, it'd be easy to get around this for a single site.
But the people writing bots really do not want to have to tweak their bot for any single site. And most of the people using bots cannot write their own.
I've been using a very simple negative captcha for many years with great success. I just don't get bot spam, and before I put it in place I got lots of bot spam.
There are some gotchas, though -- the main one that this project doesn't seem to account for is form-fillers, like Google Chrome, RoboForm, etc. -- i.e., bots that you want to be able to use your form.
My first version of a negative captcha tried to be sneaky -- I called the field "name" or something like that, and called the real name field "name2". This was a disaster; the form-fillers all put values into the name field, and suddenly tons of users (especially Google Chrome users) were unable to submit comments to me... and so quite probably a lot of them were unable to find a way to even tell me it was broken. Problems like this are a very good reminder to always use helpful, kind error messages even when you think you've just caught a spammer or some other nasty. You may have caught an actual customer.
My current version names the field something obviously NOT a standard field name, and gives it a label which is also not a standard field name (this was important), and I even clear the field with JavaScript on form submission just in case. The name of the field is hard-coded, as are all of the other field names on the page.
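Concretely, the shape of it is something like this (a sketch with made-up names, not my exact markup or code):
// Rendering the trap field: the name and label deliberately aren't standard
// field names, so browser/plugin form-fillers leave it alone. The .trap class
// is pushed off-screen by the stylesheet, and a tiny onsubmit handler clears
// the field just in case.
echo '<p class="trap">'
   . '<label for="leave_this_alone">Leave this box empty</label> '
   . '<input type="text" name="leave_this_alone" id="leave_this_alone" autocomplete="off">'
   . '</p>';

// Handling the submission:
if (trim($_POST['leave_this_alone'] ?? '') !== '') {
    // Friendly message, because it might be a real person with a weird plugin
    exit('Sorry, something went wrong submitting the form. Please email me if you are human.');
}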
I have been ready to roll out more clever versions, but the years go by, and there's still no need yet.
Unless you have a site that's a big target for bot spammers, I highly recommend you just roll your own, and leave it extremely simple unless the spam returns (someday it will, presumably, but it has been at least 6-7 years and I'm still on the so-simple-it's-silly version).
This library could work with some tweaks and simplification, but currently I'd worry about form-fillers in real users' browsers. It hashes all of the real field names and makes the honeypot fields look valid. So a user who is accustomed to having their name/address filled in will first find that function is broken -- next they'll find themselves accused of being a bot when they submit the form (after tediously filling it out by hand).
I have a simple mechanism like this in the comment fields of a few of my sites, and it shoots down 99.9% of mechanical spam. Yet, although my sites are not that popular, I get a few tens of spam attempts per day, and once in a while one manages to get through.
What's interesting is that they do seem to monitor the result. If I leave the successful spam comments up for a while, the attacker adapts their strategy to follow the pattern that succeeded. (Usually the initial attempts are just neutral comments, irrelevant to my content but without any URLs or product names. Once one gets through, they start spamming with URLs.)
I'm not sure whether there are humans behind this, monitoring the bots' activity, or whether the bots are actually sophisticated enough to adapt.
Strategies that work for small sites are really different than what you need for large sites, as you say. Everybody seems to look at these things with the mindset of "will this work for Google?" when it's not at all necessary.
For a long time, I protected my blog comments with a simple "captcha" that was just a field labeled: "Enter the word elbow." The word wasn't even dynamic, it was just "elbow" every time. This still worked to filter out nearly all spam for years, because my site wasn't popular enough to merit any specific attention.
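(The entire check was on the order of this, in whatever language; a PHP-flavoured sketch with a made-up field name:)
// The form just had: <label>Enter the word elbow:</label> <input name="magicword">
if (strtolower(trim($_POST['magicword'] ?? '')) !== 'elbow') {
    exit('Please answer the anti-spam question.');
}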
It depends on the kind of spam you're fighting against.
I speak from the perspective of someone who often creates websites that various groups of people, for whatever reason, would very much like to cause chaos on. I use "spam" to mean both "advertising spam" as well as "distributed flooding" (sometimes in the form of so-called "shitposting", and/or just random text and images).
Heavy spamming can be an effective form of DoSing if it's not limited or controlled well enough. There are many people out there who take great personal pleasure in disrupting or reducing the quality of a service.
If you want to protect against typical pharma spam, this will be good. If you want to protect against the kind of thing I described above, then only a service like Cloudflare can help, and even then it can only help you so much.
(It's not easy to bypass Cloudflare from a straight bandwidth DDoS perspective, but it's not too hard if you just want to get through its anti-bot filtering to post spam. In such a case you can enable reCAPTCHA from the Cloudflare security panel, though.)
I should add -- I'm protecting comment forms and sign-up forms, which certainly do get bot spam, but are lower-value targets than forums, etc. where the spammer's content can actually be posted to a wider audience.
Protecting forums is particularly hard because there are often real people involved in spamming them, sometimes with the aid of bots but not always. Some of them will fill in captchas, register user accounts, and post spam links along with half-assed original comments. Negative captchas are obviously useless against these; I've managed by having lots of moderators, but it's still unpleasant.
I tried to be pretty explicit about that in the README. However, the project is now 5 years old and I have yet to hear of a single case where this approach has failed to weed out the vast majority of spam submissions.
I'd urge real caution in using a honeypot field with a meaningful name. Various crappy browser plugins will helpfully auto-fill a field named "name" even if it's hidden. Also consider how the field will look to screen readers.
<div style="position: absolute; left: -2000px;"> will definitely cause problems with screen readers.
These problems can be mitigated to some extent if you have multiple tests and allow clients to fail some of them. That's the approach I used in my latest project. There are approx. 10 tests. Some use CSS and JS. Some use random tokens and hashes. Some are based on assumptions about the target demographic, such as their location, preferred language, and browser capabilities. (e.g. If you're accessing a mobile site with an IE7 user agent, something's fishy.)
Fail any one of them, and your submission is accepted but flagged. Fail two, and your submission is rejected. Fail two X times in Y minutes, and your IP is banned for Z hours. (The fail2ban approach is useful when someone tries to brute-force your hashed field names.)
Occasionally, of course, you're going to run into a legitimate client who fails two or more tests simultaneously, e.g. someone who uses a screen reader with a crappy autofill plugin. But those cases should be exceedingly rare. In any case the rules can be easily adjusted.
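In rough outline the scoring looks like this (a sketch, not my production code; the individual tests, helper functions, and thresholds are stand-ins):
$checks = array(
    'honeypot_empty'   => trim($_POST['extra_note'] ?? '') === '',
    'token_matches'    => hash_equals($_SESSION['form_token'] ?? '', (string)($_POST['form_token'] ?? '')),
    'not_too_fast'     => (time() - (int)($_SESSION['form_shown_at'] ?? 0)) > 3,
    'plausible_client' => stripos($_SERVER['HTTP_USER_AGENT'] ?? '', 'MSIE 7.0') === false,
    // ...roughly ten of these in total
);

$failures = count(array_filter($checks, function ($passed) { return !$passed; }));

if ($failures === 0) {
    acceptSubmission();                           // hypothetical helper
} elseif ($failures === 1) {
    acceptSubmission(true);                       // accepted but flagged for review
} else {
    rejectSubmission();                           // hypothetical helper
    recordFailureForIp($_SERVER['REMOTE_ADDR']);  // feeds the fail2ban-style IP ban
}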
That sounds too complicated and fragile for me to use.
Really, I was just getting at that you should probably label the field as "Do not type anything in this box" so users with assistive devices don't get trapped. I'd also provide a useful, non-cryptic error message for when the captcha fails.
That, and a lot of bots are created with point-and-click builders, such as ZennoPoster or uBot Studio; when someone trains a bot to do a specific task with those, these negative captchas are easily missed.
I've been using this technique with great success for a while now, and still catch hundreds of fake registrations a day[1]. The few accounts that slip through are fairly easy to detect algorithmically, and it seems that they're mostly created by real people, usually in China or Russia, presumably for a small fee.
* Which anti-captcha did you use? How does it work?
* You only showed 1 day after the change. Was the drop in spam account registration permanent? I would expect it to immediately bounce back the next day once the adversary figured out what you changed.
The graph shows all the attempted spammer registrations I caught over the last month, the count of '1' for today is because the clock had just ticked over when I generated the image.
I implemented this years ago for kink.com (nsfw). It really helped quite a bit with the automated logins. It also shows a captcha after a number of unsuccessful logins or if someone is clearly sharing an account. We called it the 'cockblocker'. Heh.
Porn sites are prey to brute-forcers, who use thousands of proxies to scan for working accounts. I know people who do this and I can say that neither negative nor regular CAPTCHAs stop them. What you would need is a lazily loaded CSRF field, which breaks their brute-forcing applications.
We just showed a captcha after enough failures. If we saw a massive spike of failures, we'd just add their IP to our firewall and disappear off the net for them for a period of time. This stops 99.99% of the attacks. For the rest of them, the feeling was that if they wanted free porn badly enough, then it was free marketing for the company.
My understanding is that once the page shows a CAPTCHA, the bot either tries to solve it with OCR or just rotates to the next proxy. By rotating through, say, 7,000 proxies, by the time it has gone through all of them the first ones no longer get the CAPTCHA challenge.
One funny thing I remember being told is that if the main site seems too hard to configure their tools against, they just switch to the mobile version of the site, which usually has less security.
For them it really is free porn as well as a hobby; I've seen log files with hundreds of subscribed accounts. Some people steal the credit card information attached to them, while others just maintain a giant library of porn accounts on demand.
When I noticed this problem I coded a service which would have prevented all the attacks made by the tools they use. I contacted various porn and file-hosting sites, but none of them ever replied to me. It's a pity that the site owners either don't care or can't address the problem.
In the case of kink, we changed things up quite a bit. An account is meaningless until you add a subscription or kinks (the micro-currency we developed) to it, and your account never goes away. This is different from 99.99% of the sites out there, which generally tie an account to a subscription: when the subscription is up, so is the account. Those sites generally just add/remove lines from a simple .htpasswd file, and it is those files which usually get stolen from badly configured servers and sent around the forums. If someone shares an account on kink (different IPs/browser agents over a period of time), they are all automatically cockblocked... by changing the user's password, thus making them do the password recovery dance.
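The sharing check boils down to something like this (a sketch of the rule, not our actual code; it assumes a $pdo database handle and a logins table with account_id, ip, user_agent, and created_at columns, the window and threshold are illustrative, and the query is MySQL-flavoured):
$stmt = $pdo->prepare(
    'SELECT COUNT(DISTINCT CONCAT(ip, user_agent)) FROM logins
     WHERE account_id = ? AND created_at > NOW() - INTERVAL 7 DAY'
);
$stmt->execute(array($accountId));

if ($stmt->fetchColumn() > 5) {
    // Looks shared: scramble the password and force the recovery dance
    forcePasswordReset($accountId); // hypothetical helper
}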
It is no surprise nobody contacted you back... most sites are either run by people who have no clue about tech (and thus wouldn't be able to do anything with you) or they are smart and don't trust some random person emailing them saying they can fix their security issues.
I host a site to stop forum/blog spam but I suspect there would be value in a general-purpose login-test. Of course with a botnet the owner could configure each node to login only twice, as they do with SSH dictionary-attacks.
I am afraid that password managers like 1Password will get caught in this. I always use it not only to remember my password, but also to fill in sign up forms with my data. I am sure it would fall for the honeypot and it would be very hard for me to find out what went wrong. I am sure the experience would be incredibly frustrating.
How effective are these honeypot form fields? There are only so many ways to make a form field invisible. Simple inspection of the DOM and CSS ought to be enough to determine whether the field is visible and nearby the other fields.
If you actually RTFM, you'd realize that the purpose of the honeypot text fields is to create them so that they look visible to bots, but when rendered on a page they are invisible to a human (for example: the text field may be under another element, positioned off the viewport, or have its background set to match the page background). Building a bot to detect all of those would be incredibly difficult. To start, the bot itself would have to actually parse the CSS/HTML/JS and render the page, and then run heuristics to figure out which fields should be ignored.
I realize you only just threw this together, but this only detects the CSS display attribute. The visibility and opacity rules are also easily detected, and a large negative left/margin value could be detected with a small amount of code. However, detection becomes more difficult when you start to consider other techniques such as z-index. It's not as simple as you seem to be implying.
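To make the point concrete, a naive checker that only looks at inline styles is roughly this (my own sketch, not anyone's actual bot; $html holds the fetched HTML of the page with the target form) -- and it says nothing about external stylesheets, stacking order, or off-screen positioning:
function looksHidden(DOMElement $input) {
    // Only inline style attributes are visible to this kind of check
    $style = strtolower($input->getAttribute('style'));
    return strpos($style, 'display:none') !== false
        || strpos($style, 'display: none') !== false
        || strpos($style, 'visibility:hidden') !== false
        || strpos($style, 'visibility: hidden') !== false;
}

$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings on sloppy real-world markup
foreach ($doc->getElementsByTagName('input') as $input) {
    if (!looksHidden($input)) {
        // a bot using this heuristic would happily fill the field,
        // even if an external stylesheet or z-index trick hides it
    }
}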
I only imply it might be simple. I'll reserve judgement until a good counterexample is demonstrated. z-index sounds good!
My intention was to pose a challenge to prove that honeypot fields do work, and to distinguish between good and bad implementations (e.g., {display: none}).
Been doing this for years; the way we handle it is to hide the element via JavaScript. Then most screen readers won't "see" it, but generic form bots aren't that smart.
Adding a <label> saying "Do not fill this out, antispam filter" should probably be enough to mitigate that. (Disclaimer: I am lucky enough not to have to use screen readers etc.)
I think that it would be easily fixable. Every time you disable an account, you send an email to the address they registered with, letting them know about it, with a link to reactivate it. For the reactivation, you can even add a normal captcha to prevent bots exploiting the loophole.
I just label my fields appropriately, and put a "skip to form" link (like a "skip to content" link). Other options exist, like using media detection or JavaScript (very few bots run JavaScript on pages; it slows down the process and requires more smarts than a generic HTML parser).
No negative captcha will stand up to a targeted attack, but if you're not running a huge site, targeted attacks are very unlikely.
This is a form of security through obscurity. A bot could easily (relative to a positive captcha bot) be created to check if the form fields are visible.
I see this method as similar to changing the ssh port to something other than 22 on a server. Sure, an attacker would be able to discover your custom port if they tried, however that requires significantly more work than performing a dictionary attack on the defaults.
Totally agree. Bots vs humans is not an issue of security at all in a cryptographic sense (which that phrase refers to). For this particular task all we have are tricks that have practical value. It isn't even clear if the problem is meaningful in an absolute sense, while cryptographic protocols can be clearly defined and reasoned about.
Would it even be possible to solve this problem in a serious way? If you could, would that mean strong AI is not possible? If not, why don't we figure out something better, like asking users to actually pay for things, so we don't have to solve these philosophical quandaries? If it's too hard for people to pay for things, then let's focus on that problem instead. If you don't want money and just want to rate-limit, then look into proof-of-work puzzles.
CAPTCHA is broken and, at least in my experience, does more to harm legitimate users than inconvenience bot makers. You can farm out CAPTCHA solving to China for less than $1 per thousand solved.
Unless you hide the field via Javascript. In which case the bot implementation would certainly become a lot more complex and I imagine this technique working well. (Though I haven't used it myself)
I wrote something like this as a module for Drupal a few years ago, and it still works very well for most sites: https://drupal.org/project/honeypot
The honeypot (or honey trap) is somewhat effective against most bot software, and the timestamp protection is actually pretty effective against most human spammers. However, there are many sites that will require active spam prevention due to their popularity, and thus targetability.
Also, regarding accessibility, there are many ways of implementing these spam prevention techniques without impacting accessibility, even without using JS.
I feel like having this knowledge be widespread is somewhat self-defeating: if it were to catch on to any real extent, bot makers would improve their bots' ability to detect which fields in a form are "valid".
I do this all the time when installing off-the-shelf forum/blog software.
I'll modify the form to have a field not in the original. Bot authors assume the forum is stock and don't customize their bots for mine, and that's enough to block almost all spam.
This feels very hacky and not like a long term solution to the problem. With some effort on the part of spammers they could write bots that simply read the field labels or placeholder text, check css attributes, etc. Then you have the problem again and your code is bastardised all to heck for your effort.
Just use positive captcha for now until an actual solution comes along but don't start screwing around with the output of your markup. That will cause problems when you want to change or even just run tests on things.
You're dismissing it too quickly. I've used this method on my stock contact forms and it has worked 100%. Nobody, nobody bothers writing custom bots for your site, so no matter which method you use, if it's not something common, it will stop 100% of spammers as long as you're a low-profit site for them.
If you're Craigslist, sure, it won't work, but you probably aren't. All my clients' contact forms thank me for using (something like) this method.
a = "cat"
b = "dog"
c = a + integer_to_string(234234) + b
then you have to submit the c value with the form. i get no spam. (a prior somewhat simpler did get some spam). obviously this would do nothing if i was google, but i'm a small blog and no one is going to customize a bot for me, and apparently none of the bots want to run a full js interpreter.
there are lots of simple solutions to spam as long as no one cares about you. and if they do care it's hard.
i do think making you do some math (maybe something serious that takes a quarter second to run) to be able to submit comments is a good approach in general though. basically humans have lots of spare CPU, but i'm not sure spam bots do currently. and if posting in a lot of places required running some math, it would rate limit spam bots.
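For anyone curious what the server side of such a proof-of-work check could look like, here's a minimal sketch (my own illustration, not what I actually run; the field names, session keys, and difficulty are made up):
// The form handler issues a random challenge when rendering the form, e.g.:
//   $_SESSION['pow_challenge'] = bin2hex(random_bytes(16));
// The client's JS then has to find a nonce such that the hash below starts
// with enough zero hex digits, and submit it along with the comment.

function powIsValid($challenge, $nonce, $difficulty = 4) {
    // 4 leading zero hex digits ~ 65k hashes on average: cheap for one human,
    // expensive for a bot posting everywhere
    $hash = hash('sha256', $challenge . ':' . $nonce);
    return substr($hash, 0, $difficulty) === str_repeat('0', $difficulty);
}

if (!powIsValid($_SESSION['pow_challenge'] ?? '', $_POST['pow_nonce'] ?? '')) {
    http_response_code(400);
    exit('Sorry, your comment could not be verified. Please try again.');
}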
I've been working on something which will automatically hash the field names in a form, generating all my forms with JavaScript. I hadn't considered adding hidden fields, but that might be a good idea. And probably reCAPTCHA when all that inevitably fails.
Unfortunately I think spam detection is always going to be a moving target. A lot of these techniques depend on spammers not wanting to go through the trouble of tweaking something for your particular site, but if your site turns out to be popular or if they have that sixth can of Red Bull and decide to get clever there's not much you can do.
This is basically extending the encoded input field trick with another (CSS-)hidden field.
You can get rid of the (CSS-)hidden field by using just the encoded field names instead. That will prevent someone from simply copying the HTML form and mass-submitting it from multiple IPs, etc.
E.g., encode all the input field names using the session ID and some salt (maybe the URL of the page?). I've done something similar in PHP previously:
public function encodedFieldName( $name ) {
    // Hash the real field name with an application secret and the client's IP
    return hash( 'ripemd160', $name . FIELD_KEY . $this->IP() );
}
...where FIELD_KEY is a pre-defined random string unique to the application (or you can set it to the user's session/cookie, etc.), and IP() is, well, just getting the IP.
You can then retrieve the submitted value for a given field using something like:
public function encodedFieldValue( $name, $fields = array() ) {
    // $fields is the submitted data, e.g. $_POST, keyed by the hashed names
    $enc = $this->encodedFieldName( $name );
    foreach ( $fields as $k => $v ) {
        if ( $k === $enc ) {
            return $v;
        }
    }
    return NULL;
}
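Usage would then be roughly this (a sketch; $form stands for whatever object holds these methods, and 'email' is just an example field):
// When rendering the form, the input gets the hashed name:
echo '<input type="text" name="' . $form->encodedFieldName( 'email' ) . '">';

// When handling the submission, look the value up by its real name:
$email = $form->encodedFieldValue( 'email', $_POST );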
There's some work published by some friends of mine called "Botnet Resistant Coding", where they combine several techniques to defeat the average script kiddie with a bot and force them to target someone else out of laziness and ignorance. This seems very similar, but if you find their paper you can incorporate more advanced techniques.
I think everyone is aware that we aren't going to be able to completely eradicate spam. As bots become more intelligent our defence against them must improve as well.
I like that this is a non-intrusive technique that seems to have met with some success. It doesn't matter that the method is not perfect, nor that it is possible for bots to engineer their way past. Innovation in the bot detection space is the only way to keep up with the spammers.
I would be more interested to find out if the method has any impact on accessibility, for example if screen readers are unable to use these forms. I would guess that anything designed to be visible to bots will also be visible to screen readers.
1. Browser auto-completion of forms will break, because that typically uses the form name to identify the data that's expected, and this plugin hashes the form name.
2. The page would not be cacheable, because the hash key changes for each request.
3. The question of form-labels isn't addressed in the plugin. Perhaps the "real fields" do have labels - at least they should - but the negative-captcha fields don't? That would give a clear signal on fields to avoid. If they were to have labels, what label should be used? A dictionary word? That would have usability/accessibility implications.
The hidden form field has worked very well for us. We just gave it a likely-sounding name like "address2" and set visibility:hidden in the site's main style sheet. If the field has a value, we discard the entire form submission.
Spam submissions dropped quite a bit when we implemented this.
The knock against this is that it's easily circumvented. True! But the value is that it greatly reduces the "script kiddie" background noise. Not only is that just nicer overall, but it also makes it easier to tell when someone is purposefully targeting our site.
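The whole server-side check is about this small (a sketch, not our exact code; the field name and CSS rule match what's described above, the rest is illustrative):
// The honeypot input; the main stylesheet has #address2 { visibility: hidden; }
//   <input type="text" name="address2" id="address2" autocomplete="off">

if (!empty($_POST['address2'])) {
    // A human never sees this field, so any value means a bot
    // (or, as noted elsewhere in the thread, an over-eager form filler)
    exit; // silently discard the submission
}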
I once implemented a registration page that asked for year of high-school graduation (it was relevant to the site, and wasn't mandatory). After a couple of days, someone noticed that there were a bunch of registrations with something other than a year in that field (usually the name again), and I was told to check whether the form was messing up. My form worked fine; those were all spam bots. It turned out to be a fairly effective way to filter for spammers.
Negative Captchas are only good against bots that hit many different sites and don't specialize in yours. Which probably means they are fine for when you are starting out.
However! If your site relies on people not creating lots of fake accounts and spamming the site, you may want to require verification of a cellphone number. After all, cellphone numbers are expensive to obtain, right?
Or are they? I heard Craigslist tried this and people somehow defeated it?
I worked on something like this 4 years ago for a site I built called http://www.mftranslator.com. The concept that my buddy and I created was a Nomadic Post. I should post the spec for it online, but we essentially eliminated all bot spam posted to our site, without requiring accounts or captchas, and it never interrupted user flow.
This is a cool idea, but I am skeptical of how effective it is long-term. If I am understanding the design correctly, this method is comparable to simply changing the name attributes on input tags. It'll take a while for spammers to catch on, though I believe this negative captcha's effectiveness will decrease dramatically with adoption.
I have the completely unscientific idea that these kinds of negative captchas have a property that's the exact opposite of security code: they work best when you code them yourself.
Someone can code a bot that understands the exact tricks that this gem performs. Make up different but similar tricks, and don't publish them.
I thought that this was going to be a security feature where I have a piece of software on my computer which I use to challenge a site to make sure that it is what it says it is, or that the customer service rep I'm talking to is a human...
Is anybody else having trouble opening this link? (I am trying from Bangalore, India, behind an enterprise NAT. I am unable to open any github or facebook.com page; the page loads partially and never completely finishes loading.)
I did some experimentation with this sort of technique a few years ago. At first I was amazed at how naive bots actually are, but it makes sense: they're not going to spend the extra effort dealing with edge cases like this when they can just widen their net by hitting more sites. I don't know if things have changed but a few years ago this would have caught most bots.
By the same obscurity-based argument, it's probably easier to just roll your own Captcha than installing/integrating/depending on a third-party library.
Exactly. This isn't an end-all, be-all, but it does up the security on your site.
It's like when you lock your bike, a focused attacker can open any lock with the right tools, but that shouldn't stop you from locking and using secondary deterrents like anti-theft skewers, or my favorite, simply locking it next to a nicer bike with a worse lock.
That's what I thought, until somebody reverse engineered the registration mechanism on my personal site which runs software I've written. I ended up having to turn registration off as I didn't have time to fix it properly.
Are you sure that somebody reverse-engineered it, or is it possible that someone's just got a set of rules generic enough to work on your site? From personal experience (with getting spammed), I suspect it's the latter.
It's always possible. I no longer remember exactly how it worked and why that made me think it had to be reverse engineered. This was many years ago now.