The earthquake that killed Twitter? Spam makes the utility useless (seaicethoughts.blogspot.com)
66 points by seaicethoughts on June 13, 2011 | 43 comments



Twitter's inability to deal with spam accounts is quite bewildering. Tweeting the same link over and over again? Spam. Follow hundreds and no one follows back? Spam. A high percentage of blocks/spam reports? Spam. Every tweet contains a link? Spam. Does anyone I follow, follow this person? If not, perhaps not spam, but not a good start either.
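
A rough sketch of how the checks listed above might be collected for one account. The `Account` fields and every threshold here are assumptions made up for illustration; none of them come from Twitter's API, and each signal only raises suspicion rather than proving anything.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Account:
    # Hypothetical fields, not Twitter's actual data model.
    tweets: List[str]
    following: int
    followers: int
    blocks_or_reports: int
    mutual_follows_with_viewer: int


def spam_signals(acct: Account) -> List[str]:
    """Return the names of the heuristics this account trips."""
    signals = []
    links = [w for t in acct.tweets for w in t.split()
             if w.startswith(("http://", "https://"))]
    # Same link tweeted over and over (assumed cutoff: 3 times).
    if links and max(links.count(link) for link in set(links)) >= 3:
        signals.append("repeated_link")
    # Follows hundreds, almost nobody follows back (assumed cutoffs).
    if acct.following >= 200 and acct.followers * 10 < acct.following:
        signals.append("nobody_follows_back")
    # More blocks/reports than followers (assumed definition of "high").
    if acct.blocks_or_reports > acct.followers:
        signals.append("heavily_blocked")
    # Every tweet contains a link.
    if acct.tweets and all("http://" in t or "https://" in t for t in acct.tweets):
        signals.append("link_in_every_tweet")
    # Nobody the viewer follows also follows this account.
    if acct.mutual_follows_with_viewer == 0:
        signals.append("no_mutual_follows")
    return signals
```

Counting how many signals fire, instead of acting on any single one, is one way to keep a links-only but otherwise normal account from being flagged.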


I'm not sure the heuristics are that simple.

The signal-to-noise ratio in Twitter is already low as it is. Most tweets contain links, lots of users are followers but are not followed.

Sure, combining several clever heuristics can reduce spam levels, and no doubt Twitter must attack this aspect if it wants to stay relevant.

But can you put your finger on what exactly constitutes spam in a platform that is so noisy?


It would require more than 30 seconds' thought, but I'm sure there are some heuristics which will work. I'm not even convinced the "report spam" button is connected to any action on twitter's server. And why not use blocking as an indicator? Surely if Person A @-replies several people, and they block A, then just disable @-replies from A.

Has anyone tried using a Bayesian filter for twitter spam? These have been very successful for email. In fact, I consider email spam a solved problem now (thanks Gmail!)
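
For anyone who hasn't seen one, here is a minimal sketch of a Bayesian (naive Bayes) text filter with add-one smoothing. It is a toy under stated assumptions: whitespace tokenisation, two labels, and training examples invented for illustration. It is not how Gmail or Twitter actually filter.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny two-class naive Bayes classifier over whitespace-split words."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed class prior.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in text.lower().split():
                # Add-one smoothing; the extra +1 reserves mass for unseen words.
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability without underflow.
        top = max(log_score.values())
        weights = {k: math.exp(v - top) for k, v in log_score.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

A tweet is much shorter than an email, so a real filter would probably need features beyond the words themselves (sender history, link targets), which this sketch ignores.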

It's for things like this that I wish I could insert a proxy between my twitter clients and twitter itself and build my own rules/spam filter.


Sounds like you would prefer a communications medium not running a proprietary protocol controlled by a single company whose business model relies on their ability to make sure you cannot block unwanted content.


I cannot think of a single non-spam case where somebody would @-tweet the same link to a hundred people. That won't capture all spam, but it's a pretty easy low-pass filter.
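
A small sketch of that filter, assuming tweets arrive as `(sender, text)` pairs. The input shape, the regexes, and the function name are illustrative assumptions; only the threshold of a hundred comes from the comment above.

```python
import re
from collections import defaultdict

MENTION = re.compile(r"@(\w+)")
LINK = re.compile(r"https?://\S+")


def mass_mention_links(tweets, threshold=100):
    """Return (sender, link) pairs where one sender pushed the same link
    at `threshold` or more distinct @-mentioned users."""
    recipients = defaultdict(set)
    for sender, text in tweets:
        mentions = {m.lower() for m in MENTION.findall(text)}
        for link in LINK.findall(text):
            recipients[(sender, link)].update(mentions)
    return {key for key, users in recipients.items() if len(users) >= threshold}
```

This matches links as literal strings, so a spammer who varies the URL (for example through a shortener) would slip past it.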


True. @-tweeting the same message is indeed low hanging fruit. But spam gets sophisticated as the arms race continues.

My assertion is that after a certain point (which we are not far from), Twitter as a platform will have a problem making a distinction between "spam" and "legitimate content".


I think there's definitely potential for an arms-race here (just like there was with email spam) but I don't think it's a reason for twitter to not enter the battle at all.


Also, the @-tweeting is easy to ignore. Yes, you have to check who is replying to you a few more times than you would have, but it's not the end of Twitter.

Where you can see the real problem is when there are trending hashtags that spammers get a hold of... spammers grab onto a hashtag and keep it artificially trending long past its relevance.


If not a permanent ban, then a ban for some time, let's say 5 hours, would be more helpful IMO.


It won't do a perfect job, and there will be false positives, but the point is, unlike with spam in email - because of their asymmetrical following model, false positives are not nearly as much of a problem. Twitter should absolutely do this.

Twitter could easily detect spammers - by their follow history, reputation score (eg a Google-rank-like SVD of the follow graph), or simple ratios like follows vs followers. It will inevitably false flag some people but that's not nearly so much of a problem.

That is, they don't have to outright ban these spammers, all they have to do is block these people from search results available for #hashtags or @replies. In the case of genuine people false flagged as spammers, they can still get their message out to genuine followers. It's not too much of a loss to be banned from search results and @reply messaging until you build your karma level a bit.
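
To make the idea concrete, a minimal sketch: suspect accounts are dropped from hashtag search, but their tweets still reach anyone who follows them. The dict shape for a tweet and both function names are assumptions for illustration only.

```python
def search_results(tweets, hashtag, suspects):
    """Hashtag search that silently omits tweets from suspect accounts."""
    tag = hashtag.lower()
    return [t for t in tweets
            if tag in t["text"].lower().split() and t["author"] not in suspects]


def home_timeline(tweets, following):
    """Followers still see everything from the accounts they chose to follow."""
    return [t for t in tweets if t["author"] in following]
```

A falsely flagged account loses reach to strangers but keeps its existing audience, which is why a false positive costs less here than in email filtering.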

You're absolutely right twitter should fix this and do it fast. It's a goddamn shame seeing a potentially very useful public resource - like adhoc organising around #hashtags - being destroyed by the actions of a few.

tl;dr Twitter can starve out the spammers just by blocking search results.


You're totally on the right track.

It's not an all-or-nothing thing. Excluding suspect accounts from @-replies and hashtag search results is a nice low-key way to keep the majority of conversations going. It's also subtle enough that the spammer might not realise that no one is listening.


So why isn't Twitter doing this? Don't they care?


If they do, they're doing an admirable job of keeping it a secret.


There are services that rank twitter users based on interaction. Klout comes to mind. If 3rd party services can measure quality, surely twitter could as well. Granted, it's only one metric, but it's a good place to start.

--edited to remove quotes around the word rank. Not sure why I put them there.


The other issue is gaming Twitter. Anyone who has done lots of Twitter searches can see that several publications have lots of phantom accounts that simultaneously tweet (and that's tweet, not RT) their links to ensure the widest exposure. Porn spam is just one of their worries. The entire system is set up for such corruption.


To be fair, I've seen plenty of spammer accounts that have thousands of followers.

That said: I agree, banning accounts that tweet nothing but the same link would be a really easy step towards reducing the amount of spam. At the same time - maybe Twitter don't want to auto-ban, as this is a quick and easy way of finding spammers. Tricky :-/


>>>To be fair, I've seen plenty of spammer accounts that have thousands of followers.

Lots of accounts are set up just to Follow back anyone who Follows them. I've understood for a long time that Follower counts on Twitter are meaningless.


Not entirely meaningless - I think the important number is the ratio of followers to followees. If somebody's got 1000 followers, but they also follow 1000 people, most likely those are all followbacks, and not indicative of genuine interest. But if somebody's got 1000 followers and only follows 100 people, they're more likely to be worth your time.
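
The two cases above, as a quick calculation. The function name is made up, and treating an account that follows nobody as following one person is an assumption to avoid dividing by zero.

```python
def follower_ratio(followers, following):
    """Followers per account followed; higher suggests genuine interest."""
    return followers / max(following, 1)


print(follower_ratio(1000, 1000))  # 1.0, looks like pure followbacks
print(follower_ratio(1000, 100))   # 10.0, more likely worth your time
```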


Follower counts as in, "Look how big my dick is."


There’s an alternative to banning outright which is tar-pitting.


At least one of those isn't true: my "work" Twitter account is mainly links and a short comment from me, for example. Each of your criteria only affects the probability that an account is a spammer. E.g. you could easily organize a campaign of "spam report" on a hated celebrity, politician, whoever and get them booted.


Your work twitter account tweets the same link over and over again? That's pretty odd and frankly doesn't sound like a good use of twitter.

But @-replying or tweeting the same link to different hashtags? Spam, spam, spam!


No, of course not! But one of your criteria was a link in every tweet.


I want to bias my Google SERPs, Twitter search etc. with my social graphs - linkedin, twitter, facebook - like I do offline.

Sometimes I may still want to allow anonymous (to me) signals to influence what I see, but generally I only want to see content/recommendations that people 2 or 3 degrees of separation from me have given some positive signal for.

Too much of my day is filtering noise.


We just need a knob. :) -I'm going to dial the internet back to 4 for awhile.

Our online personas treat us like those stupid parties where you have to wear a "Hello, my name is:" sticker. IMHO, every good service is a fragment of your life, and doesn't try to be the entirety of it. That's how the offline world works, anyway.



Until twitter sorts this out themselves, there is a need for something like a browser add-on that hides the following from hashtag searches: new accounts, retweets, accounts with fewer than 10 followers, accounts that often tweet trending topics, and tweets with more than one trending hashtag.
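
A sketch of the filtering rule such an add-on might apply (written in Python for brevity; a real add-on would be JavaScript). The `Tweet` fields, the 7-day cutoff for "new", and the 50% cutoff for "often" are all assumptions; only the 10-follower limit comes from the list above.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical fields an add-on would have to scrape or look up.
    text: str
    is_retweet: bool
    account_age_days: int
    followers: int
    trending_share: float  # fraction of the author's recent tweets using trending tags


def hide_from_hashtag_search(tweet, trending, min_age_days=7, max_trending_share=0.5):
    """True if the tweet matches any of the five hiding rules."""
    tags = {w.lower() for w in tweet.text.split() if w.startswith("#")}
    return (
        tweet.account_age_days < min_age_days          # new account
        or tweet.is_retweet                            # retweet
        or tweet.followers < 10                        # fewer than 10 followers
        or tweet.trending_share > max_trending_share   # often tweets trending topics
        or len(tags & trending) > 1                    # more than one trending hashtag
    )
```

Here `trending` is expected to be a set of lowercase hashtags including the `#`, for example `{"#eqnz"}`.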


I noticed the same thing last year when I was stranded in Barcelona during the Icelandic volcanic eruption. Then, people were looking for ridesharing, free rooms, news, etc. under the hashtag #ashtag. Not as severe as an earthquake, but a similar mechanism blocking twitter's utility.

Then the problem wasn't spam, but #ashtag retweet avalanches from some well-followed celebrities. The most egregious example I remember: this guy Paulo Coelho in Brazil, retweeted through pages and pages of search results (the source: https://twitter.com/#!/paulocoelho/status/12399786645). I'm pretty sure Justin Bieber said something too, so that was it for #ashtag.

This would probably have been easier to deal with than spam, because people weren't actively trying to game them. Twitter just needed to aggregate some information and make it blockable, e.g. "don't show me (re-)tweets with this text anymore". As a sometimes-user, I still don't see a straightforward way to do something like this in the clients I use. So now the spam angle is not a surprise at all.


I just add "-RT" to my twitter searches.


Wouldn't the spammers be mostly new accounts, as the old ones would be blocked already?

So I propose the heuristic: new account+trending topic => spam


Professional spammers will just keep a 'stock' of created-but-unused accounts and let them 'mature' before using them for spam.


True - I guess no side can win this game in the long run :-(


Things have gotten much worse on this. Years ago, I never got spam @ replies on Twitter. Now just about anything I say triggers some shit about a free iPad or ebook being messaged to me. I guess the target has gotten large enough that the spammers have really gotten onboard heavily.

I wasn't in the very first batch of people using Twitter (sxsw 2006), but I was on there shortly after, and now I barely care about it anymore. It's an annoyance. A tool. Worse than email with few real benefits. At this point it's just back to texting the 100 or so people who I really want to stay in touch with instead of tweeting at them like I was doing for a few years.


The problem is that Twitter has a little "list of topics that will be most profitable to spam" list on the side. It's not surprising, then, that those topics get a lot of spam.


I'm hearing a lot of simple, interesting solutions for the spam problem. But Twitter has to be doing things like this, don't they?

The main issue is that any of these methods will hurt marginal users. If you fall into the false-positive pool, you likely don't use Twitter much, and may drop off entirely if your account gets flagged. Twitter can't afford to lose those users, or it will seriously hinder their reach. How valuable is Twitter if it just contains a bunch of super users Tweeting at each other?

(this all beside the point that any of these methods will kill their "Tweets per Day" and "Total Users" metrics)


@wordsontheweb: "Credit where credit's due, @twitter appears to have cleaned out the #eqnz spam" http://twitter.com/#!/Wordsontheweb/status/80183699127275521

I think she's right.


Why not have a PageRank for twitter accounts? Essentially if I retweet something you say, that's the same as a link from one website to another (other heuristics: person A replies/@mentions B, or person A follows B, or person A follows B and vice-versa).
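
A minimal sketch of that idea: plain PageRank by power iteration over a graph where an edge `(a, b)` means "a retweeted b". The damping factor of 0.85 and the iteration count are conventional defaults assumed here, and the account names in the usage note are invented.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """edges: list of (retweeter, original_author) pairs. Returns {account: rank}."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # Accounts that retweet nobody spread their rank evenly over everyone.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank
```

With `pagerank([("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")])`, account `c` ends up with the highest rank because three accounts retweeted it. Replies, @mentions, and follows could be added as extra edges in the same way.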


It might work for a while but it would need to be more advanced to keep working. If not, the perps will just set up accounts that retweet each other. It would also have to measure if the retweeting account has good karma as well; if not, don't count it. But hey, let the arms race begin!


Google has the exact same problem. Surely one could borrow some of their insights, e.g. to detect 'retweet farms'. If all the spammers retweet each other, they'll still be on a PageRank of 1. No matter what you do, there'll be people who try to hack it. Twitter (and all organisations in this problem) can only keep trying to make things better.


Twitter could probably count a HN-like karma for people.


I think that's kind of like what Klout.com is trying to do.


Cross posting here in case it can help connect with someone that might be able to help: The problem with the geographical filtering is that very, very few tweets sent about the subject of the earthquake are geotagged. As far as I can tell that is what twitter uses for the 'near:' filter.

For what it's worth I set up this site http://chchneeds.org.nz immediately after the last big earthquake, and it is still operating today. As a first step to finding tweets relevant to chch earthquake and filtering out spam it does also allow for filtering by address when people mention an actual address in the tweet.

For example if they say 'at 23 Maidavale lane' or 'in Sydenham' it tags those tweets as being in a particular suburb and in the region of canterbury. For example: http://chchneeds.org.nz/#!/loc/canterbury
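
A rough sketch of that kind of tagging, not the actual chchneeds code. The suburb lookup table, the street-type list, and the function name are assumptions for illustration. Mapping a street address to its suburb would need a geocoder, which is left out here.

```python
import re

# Hypothetical lookup; a real one would list every suburb in the region.
SUBURB_REGION = {"sydenham": "canterbury", "riccarton": "canterbury"}

ADDRESS = re.compile(r"\bat\s+(\d+\s+\w+\s+(?:lane|street|road|avenue))\b",
                     re.IGNORECASE)


def tag_location(text):
    """Pull a street address and/or a known suburb out of tweet text."""
    tags = {}
    match = ADDRESS.search(text)
    if match:
        tags["address"] = match.group(1)
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in SUBURB_REGION:
            tags["suburb"] = word
            tags["region"] = SUBURB_REGION[word]
            break
    return tags
```

For instance, `tag_location("Need water in Sydenham #need")` returns `{"suburb": "sydenham", "region": "canterbury"}`.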

I had grander ideas for this but it was all about people in canterbury choosing to actively use the tags #offer and #need. When I realised that eq.org.nz and their volunteers were doing a better job of filtering and sorting (and getting their message out), I put my time into helping them.

At this time eq.org.nz has been shut down as it does use quite a lot of volunteer hours to keep going, but everyone and anyone is welcome to try and use the data from chchneeds in any way that they think may help. It's all open data made publicly available of course. I'm @chchneeds on twitter if you want to get in touch with me.

Also, for what it's worth, I set up a similar service for the Japanese earthquake, but was overwhelmed by the amount of data, and I wasn't keeping up so I had to shut it down. The new rate limiting rules from twitter don't help much with this.


Just want to add it's really such a goddamn shame that spammers are doing this - basically polluting the public resource of ad-hoc hash tags with their probably automated choosing of popular hash tags. I do hope twitter finds a way to deal with this.


Twitter could add an option to fill reCAPTCHA with tweets. Updates posted with some human verification could have higher weight.



