
I was in the business of fighting web spam for over 5 years (Defensio) and while these techniques help, they're not the definitive answer.

Spam bots are now extremely sophisticated and have been able to execute JavaScript and "read" and understand web pages for many years. They'll also post bogus comments that are somewhat related to your article but sneak a fishy URL in there. We had many false-positive reports that were actually real spam. It's just really hard for a human to detect. Of course, JavaScript-based techniques will eliminate some easy-to-catch spam, but nothing a 3rd party service couldn't catch.

Another huge problem is that people are paid next to nothing in China and India to manually spam websites and break captchas. The number of human spammers keeps increasing. When I left last year, it was becoming a huge problem. Definitely the biggest headache for us in ~5 years.

In my experience, the best protection against web spam is still Akismet/Mollom/Defensio. And for the record, I know we didn't like it when people used other mechanisms to stop some spam before it got to us, because we didn't get to see the full corpus, which was invaluable to us in helping all our users fight spam.




I think the kind of defense you need to use depends on what kind of website you have.

Based on my experience, if you have a small/medium website you won't find bots that execute JavaScript or understand a web page, and you won't see human spammers.

Those are reserved for the big ones. For all the others it's mostly general-purpose bots that try every form they can find on the internet. Where speed is more important than accuracy, spammers won't use the "heavy" bots.


Actually, the sophisticated bots typically target platforms, not websites. So if your website runs WordPress, you're much more likely to be spammed hard than if you custom-built a comment form.


It depends. I can tell you from experience that even a medium-sized website that ranks well on (legitimate) pharmaceutical terms is a huge target for spammers.


I wonder if eventually people will just stop allowing hyperlinks in comments altogether. It would, at a stroke, eliminate the biggest incentive for spam.

Yes, it's nice (I guess) when someone's name is a link to their personal website or they can post the URL of a relevant article in the comments, but it's not like commenting ceases to be valuable without those features.
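
For what it's worth, here is a rough sketch of what "no hyperlinks in comments" could look like on the server side, assuming comments arrive as a plain string. The name `strip_links`, the placeholder text, and both regexes are made up for illustration; a real site would probably lean on a proper HTML sanitizer rather than regexes.

```python
import re

# Hypothetical patterns, for illustration only.
# Matches an HTML anchor element and captures its visible text.
_ANCHOR_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
# Matches bare URLs that start with http://, https://, or www.
_URL_RE = re.compile(r"\b(?:https?://|www\.)\S+", re.IGNORECASE)


def strip_links(comment: str, placeholder: str = "[link removed]") -> str:
    """Unwrap anchors to their visible text, then replace bare URLs."""
    without_anchors = _ANCHOR_RE.sub(r"\1", comment)
    return _URL_RE.sub(placeholder, without_anchors)


if __name__ == "__main__":
    print(strip_links('See <a href="http://spam.example/x">cheap pills</a> now'))
    print(strip_links("Visit http://spam.example/pills today"))
```

This only removes the clickable link, which is the incentive being discussed; it does nothing about the spam text itself.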


It actually came pretty close in ~2007. We were working on something else spam-related when we noticed that big bloggers were fed up with existing anti-spam solutions (false negatives/positives) and were about to just remove commenting altogether. We realized that it was a huge problem without a good solution, so we knew we had to do something about it.


I thought that would happen as more people added rel=nofollow to links in comments... hasn't happened yet though.


A real spammer will take any link. It doesn't matter that search engines won't treat the link as some form of endorsement because of the rel=nofollow attribute. A spammer will happily post a million links, and there will be some poor souls out there who click on some of them. Quantity over quality has always been a characteristic trait of spam.

Spam and link spam were already there before Google existed and PageRank was invented. The index of the AltaVista search engine was huge and full of spam.

When the nofollow value for the rel attribute was introduced, there were many claims that this would reduce the amount of link and comment spam. Critical remarks often came from people who were offering link building and SEO as a service.
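
For anyone who hasn't seen it, a minimal sketch of what adding nofollow to a commenter's link amounts to when the comment is rendered. The helper name `render_comment_link` is hypothetical, and this assumes the URL has already been validated elsewhere; it only shows the attribute, not a full sanitizer.

```python
from html import escape


def render_comment_link(url: str, text: str) -> str:
    """Render a user-supplied link with rel="nofollow" (hypothetical helper)."""
    # Escape both the attribute value and the link text so user input
    # cannot break out of the tag.
    safe_url = escape(url, quote=True)
    safe_text = escape(text)
    return f'<a href="{safe_url}" rel="nofollow">{safe_text}</a>'


if __name__ == "__main__":
    print(render_comment_link("http://example.com/?a=1&b=2", "my site"))
```

As the comment above says, the link is still there and still clickable, so this removes the search-ranking incentive only.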



