Reddit's Favorite Scientist Just Got Banned for Cheating the Site (vice.com)
73 points by antr on Aug 2, 2014 | 64 comments



Why this matters:

Visibility on Reddit is essentially predicated on early votes. Get a few upvotes and even mediocre comments will often remain at the top due to momentum. Drop below -4 and a comment is doomed to oblivion (i.e., downvote the competition).

One of the most popular linked sites on reddit (quickmeme) got to where it was with only a few strategically placed votes for each link, and it took ages for them to be found out and similarly banned: http://www.dailydot.com/business/reddit-quickmeme-banned-mil...

More broadly, the algorithm reddit uses is not only known to be wrong/buggy:

http://technotes.iangreenleaf.com/posts/2013-12-09-reddits-e...

but also generally defective:

http://www.reddit.com/r/circlebroke/comments/vqy9y/dear_circ...

It's possible HN suffers from the same flaw (initial sort and feedback loop from such) since it's the natural (naive) way to produce such algorithms. The machine can't tell if it's quality content or just easy to upvote, and the latter is more common.

TL;DR: Visibility == early votes from people who have no interest in depth and don't read the articles anyway.

edit: fixed links


I feel a little gratified, because the /r/circlebroke post you linked to has a link at the bottom to my /r/ideasfortheadmins post.

In general, the problem of path dependency based on highly-weighted initial changes can be modeled by the Polya Urn: https://en.wikipedia.org/wiki/Polya_urn_model
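To make the path dependency concrete, here is a minimal sketch of a Polya urn in Python. It assumes the simplest variant (start with one ball of each color, add one ball of the drawn color each step); the function name and parameters are made up for illustration and are not anything reddit uses.

```python
import random


def polya_urn(steps, seed, red=1, blue=1):
    """Simulate a simple Polya urn and return the final share of red balls.

    Each step draws a ball at random and puts it back together with one
    extra ball of the same color, so early draws shift all later odds.
    """
    rng = random.Random(seed)  # seeded so that a run can be repeated
    for _ in range(steps):
        if rng.random() < red / (red + blue):
            red += 1
        else:
            blue += 1
    return red / (red + blue)
```

Calling `polya_urn(10_000, seed)` with a handful of different seeds should give noticeably different final shares even though every run starts from the same urn, which is the "early votes decide the outcome" effect in miniature.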

Over at kuro5hin (which in general should not be used as a model for successfully running an online community) rusty addressed the problem of initial votes being too influential by hiding the score of a comment until it had received 6 votes.
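As a rough sketch of that idea (not kuro5hin's actual code, which I haven't seen), hiding the score until a minimum number of votes could look like this in Python. The function name is hypothetical; only the threshold of 6 comes from the description above.

```python
from typing import Optional

MIN_VOTES_TO_SHOW = 6  # threshold taken from the description above


def visible_score(upvotes: int, downvotes: int,
                  min_votes: int = MIN_VOTES_TO_SHOW) -> Optional[int]:
    """Return the net score, or None while too few votes have been cast."""
    total = upvotes + downvotes
    if total < min_votes:
        return None  # score stays hidden, so early voters cannot anchor on it
    return upvotes - downvotes
```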

reddit also allows subreddit mods to hide comment scores for a predetermined period of time, if they choose to: http://www.reddit.com/r/modnews/comments/1dd0xw/moderators_n...

These solutions are good in general, but are still problematic when faced with sybil attacks / voting circles.


Of course reddit is flawed. Anyone that has spent time on Slashdot or Advogato could have told you the whole premise of up/down voting is beyond silly.

No, the utility of up/down votes is in retention. It's nothing more than a Skinner box that people check in on to see if they are winning or losing. Which is how Digg and Reddit both became huge in a short amount of time.


The up/down system replaced two worse systems: "they who post most get seen the most" and "moderators do and decide everything".

Whatever system replaces up/down voting has to do better than the one it replaces.


> the whole premise of up/down voting is beyond silly

It's not silly when you have a community this size. It might be flawed but it's one of the less flawed systems.


Do you know Slashdot's system? jhmarten wasn't criticizing community-based moderation, only the simplistic model of up/down voting, when compared to ones which get users to specify why they're voting.


I'm just a lurker so I didn't know their system. They have an extremely small community compared to reddit though, that was my point.


Compared to Reddit as a whole, yes, but that's because Reddit is a collection of disparate subreddits, not a single common area. Slashdot had 5.5 million users in its heyday, which is on the same order of magnitude as the biggest subreddits.


> More broadly, the algorithm reddit uses is not only known wrong/buggy:

> http://technotes.iangreenleaf.com/posts/2013-12-09-reddits-e....

Reading that article, I get sent to https://github.com/reddit/reddit/pull/583 which contains https://github.com/reddit/reddit/pull/583#issuecomment-35440... which notes that the issue discussed has actually been fixed.


> It's possible HN suffers from the same flaw (initial sort and feedback loop from such) since it's the natural (naive) way to produce such algorithms.

I always assumed that's why blatant blogspam sometimes makes it to the front page of HN.


HN does suffer from the same flaw; that's why voting ring detectors and flagging are necessary as a stronger countermeasure than simply upvoting.


I wonder if the HN countermeasures are stronger than Reddit's. For HN submissions to have much of a chance of sticking around you really need a few quick upvotes.


I was under the impression that reddit's algorithm randomized top comments, while also adding weight to other parameters. Adding randomness seems like a pretty obvious way to reduce the impact of the "first mover" advantage. HN should do it too.
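I don't know what reddit actually does, but one way to add that kind of randomness is to jitter each score before sorting, with less jitter as a comment collects more votes. This is only a sketch in Python; the function, the tuple layout and the `noise_scale` knob are all assumptions for illustration.

```python
import random


def jittered_ranking(comments, noise_scale=2.0, seed=None):
    """Rank comments by net score plus random noise.

    comments: list of (comment_id, net_score, total_votes) tuples.
    The noise shrinks as total_votes grows, so well-sampled comments
    settle down while new ones get shuffled into view now and then.
    """
    rng = random.Random(seed)

    def noisy_score(comment):
        _comment_id, net_score, total_votes = comment
        sigma = noise_scale / (total_votes + 1) ** 0.5
        return net_score + rng.gauss(0.0, sigma)

    ranked = sorted(comments, key=noisy_score, reverse=True)
    return [comment_id for comment_id, _score, _votes in ranked]
```

With `noise_scale=0.0` this falls back to a plain sort by score, so the amount of shuffling can be tuned.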


[deleted]


[deleted]


Thanks, fixed. Have no idea why they got broken.


> Why this matters:

This doesn't matter. Reddit is the bottom of the internet. Nobody cares about 4chan or reddit.


/r/subredditdrama has a more honest albeit less neutral recap of the events: http://www.reddit.com/r/SubredditDrama/comments/2c9ida/recap...

Vote cheating was only the final straw: part of it was a silly argument between Unidan and another user.


The argument wasn't the reason he got banned, though. The admins don't care at all if users argue with one another like idiots. That's the subreddit moderators' problem.

It was the blatant vote manipulation that did him in. The admins only ever ban users who break one of the five rules or make the site look bad. They should probably do a lot more...


It seems the real issue is the cult of personality that develops around these people. It happens here on HN as well. There exists a large cadre of people who rally around certain well-known community figures and eventually morph into sycophantic zealots.

The positive-feedback cycle stemming from early votes serves only to create an illusion of consensus, and many impressionable users begin to form thoughts like "that many upvotes can't be wrong" etc.

That, IMO, is the real problem with up/downvote systems.


A slight but important difference is that Reddit idolizes semianonymous users, while the most-respected users on Hacker News have fully-transparent identities.

The identity of the idol, and the lack thereof, can change the perception of this funny super-commenter completely.

Source: I gained internet notoriety through my comments on TechCrunch under my real name. As far as I know, I don't have any sycophantic zealots.


That's true - and as for Reddit, it seems to make people focus more on the content. Accounts like shitty_watercolour and awildsketchappeared aren't really popular because of who they are, but rather because of what they produce.


I've also noticed that when someone disagrees with one of the types of people you're speaking about, they tend to get downvoted very quickly - even if they present a well thought out critique that isn't rude or negative whatsoever.

Not all the time - I'm sure there are plenty of counterexamples - but enough I thought it was worth pointing out.


In this case with Unidan, the person who was arguing with him had all her posts systematically downvoted and was harassed.

It's worth noting that Hacker News disables downvotes on submissions older than 2 weeks.


nmjohn's point is valid. Disagree with pg in a comments thread here and watch the downvotes fly, even if your argument is rational and your tone polite.


sigh

Here's the thing. You said "vote manipulation is cheating the site."

Is it in the same family? Yes. No one's arguing that.

As someone who is a scientist who studies web forums, I am telling you, specifically, in science, no one calls multi-voting cheating. If you want to be "specific" like you said, then you shouldn't either. They're not the same thing.

If you're saying "vote manipulation family" you're referring to the taxonomic grouping of votidae, which includes things from posting quality original content to asking strangers for upvotes to running vote-bots through proxies. So your reasoning for calling multi-voting cheating is because random people "call the unusual votes cheating?" Let's get copypasta and image macros in there, then, too. Also, calling someone a human or an ape? It's not one or the other, that's not how taxonomy works. They're both. A bot-vote is a bot-vote and a type of vote manipulation. But that's not what you said. You said vote manipulation is cheating, which is not true unless you're okay with calling all members of the vote manipulation family cheating, which means you'd call good posting, bad posting, and copypasta cheating, too. Which you said you don't. It's okay to just admit you're wrong, you know?


I...

Is there some context I'm missing here?

It's an article talking about Unidan having a bunch of alts to vote himself up. Referring to that as vote manipulation is a completely acceptable practice.

Why are you being so pedantic? You must have a reason. I refuse to believe you see this as a "Terrible trend that must be stopped". Because that would be hopelessly insane.

edit:

>Is there some context I'm missing here?

answer: YES

give me a link or something, jesus -.-



I'm pretty surprised reddit's backend doesn't just automatically devalue this kind of vote cheating. In the general case, it's really easy to detect, and even if your cheater is a bit more sophisticated, it's still not that hard. It takes a really serious person to have five accounts on five different browsers/vpcs that each consistently come from different IP addresses. Even then, when you see them colluding, you can still devalue those votes, simply because it's an obvious clique consistently taking early action.

I don't see a reason to ban someone over this behavior as long as you can make sure that the fraudulent votes don't actually add any value to the content, which they shouldn't.
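A rough Python sketch of the "clique taking early action" check: count how often each pair of accounts votes on the same post shortly after it was submitted. The input layout, the function name and the 300 second window are assumptions for illustration, not anything reddit is known to do.

```python
from collections import Counter
from itertools import combinations


def early_covote_counts(votes, window_seconds=300):
    """Count, per pair of accounts, the posts both voted on early.

    votes: iterable of (post_id, account, seconds_after_submission).
    Returns a Counter keyed by sorted (account_a, account_b) pairs.
    """
    early_voters = {}
    for post_id, account, age in votes:
        if age <= window_seconds:
            early_voters.setdefault(post_id, set()).add(account)

    pair_counts = Counter()
    for accounts in early_voters.values():
        for pair in combinations(sorted(accounts), 2):
            pair_counts[pair] += 1
    return pair_counts
```

Pairs with unusually high counts would only be candidates for a closer look; friends who browse the new queue together would show up here too.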


> I don't see a reason to ban someone over this behavior as long as you can make sure that the fraudulent votes don't actually add any value to the content, which they shouldn't.

Visibility of the content is essentially predicated on early votes.

The problem is that the initial sort is built on limited data (quality content and merely easy-to-upvote content produce the same results, and the latter is more common), and the subsequent feedback loop is built on those initial visibility decisions.


> In the general case, it's really easy to detect

Why do you say that? I've worked hard on this problem and would not call it easy.


When I say general case, I mean random person creates an account or two to upvote their own stuff. This is usually trivial to detect if you save a single cookie that's linked to each account, even easier if you are also using more tenacious tracking like flash cookies or localstorage or the dreaded evercookie.
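For the single-cookie case, a minimal Python sketch would be to group accounts by the tracking cookie they were seen with. The `(account, cookie_id)` input and the function name are hypothetical; a real system would have to decide how that cookie is set and stored.

```python
from collections import defaultdict


def accounts_sharing_cookie(logins):
    """Group accounts that were seen with the same tracking cookie.

    logins: iterable of (account, cookie_id) pairs.
    Returns a list of sorted account lists, one per shared cookie.
    """
    accounts_by_cookie = defaultdict(set)
    for account, cookie_id in logins:
        accounts_by_cookie[cookie_id].add(account)
    return [sorted(accounts) for accounts in accounts_by_cookie.values()
            if len(accounts) > 1]
```

Shared computers (families, libraries, offices) would also land in these groups, so this is a hint rather than proof.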

In the less general case, you can still determine that a particular group of accounts never take action in the same small time windows.

The way I've approached this problem in the past relies heavily on meta data about the user. It's easy to determine if an IP is a proxy/VPN, which should immediately make a client more suspicious (when you're already suspicious about fraudulent voting).

Initially, account ages are important but if someone is able to get away with it for a long time, they can get complacent and keep using the same accounts, and at that point you can look at things like upvote/downvote ratios (or generally the lack of diversity in voting activity) compared to some expected average activity.
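As a sketch of the ratio check in Python, one could compare an account's upvote fraction with whatever the site-wide expectation is. The function name, `tolerance` and `min_votes` are invented placeholders; the expected fraction would have to come from real data.

```python
def looks_unusual(upvotes, downvotes, expected_fraction,
                  tolerance=0.3, min_votes=50):
    """Flag an account whose upvote fraction is far from the expected one.

    Accounts with fewer than min_votes total votes are never flagged,
    since a small sample says very little.
    """
    total = upvotes + downvotes
    if total < min_votes:
        return False
    return abs(upvotes / total - expected_fraction) > tolerance
```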

There's a lot of other tricks, but I don't know what your approach has been when working on this, and I know HN is a much different platform from what I worked with :P


>> In the general case, it's really easy to detect

> Why do you say that? I've worked hard on this problem and would not call it easy.

Are there reference data sets for this problem? If not, you or reddit should publish data sets.


The issue with giving information as to how a voting detector works is that it would make it easier to cheat.


Not the algorithm, the data. Anonymized user ids and their votes, together with relevant meta-data.

I suppose the fact that there's already a detection algorithm biases the data, but it'd still be better than nothing.


Imagine a group of 30 people who use the site regularly. When a URL is posted, they each roll a die. If it comes up 1 then they upvote. If it comes up 6 then they downvote. If it comes up 2, 3, 4, or 5 then they do nothing.

That takes 30 people and it requires discipline to obey the die, but it'd be pretty hard to detect. I think, maybe I'm wrong?
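The scheme is easy to simulate. A small Python sketch, with a made-up function name and a seeded generator so a run can be repeated:

```python
import random


def die_roll_votes(group_size=30, seed=None):
    """Simulate one URL: each member rolls a die and votes on 1 or 6.

    Returns (upvotes, downvotes) cast by the group for that URL.
    """
    rng = random.Random(seed)
    upvotes = downvotes = 0
    for _ in range(group_size):
        roll = rng.randint(1, 6)
        if roll == 1:
            upvotes += 1
        elif roll == 6:
            downvotes += 1
        # rolls of 2 to 5 mean the member does nothing
    return upvotes, downvotes
```

With 30 members and a 1-in-6 chance each way, that averages 5 upvotes and 5 downvotes per URL, and which members vote changes every time.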


Yeah this would be hard to detect, but voting fraud is usually targeted, so you can look at the specific things they have historically voted on and determine if they only vote on X type of thing or posts from Y, or only vote at times when Z just posted a URL, etc...
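One simple version of "only votes on posts from Y" is to measure how concentrated an account's votes are on a single author. A Python sketch, where the input list and the function name are assumptions for illustration:

```python
from collections import Counter


def top_target_share(voted_authors):
    """Return (author, share) for the author an account votes on most.

    voted_authors: list with one author name per vote the account cast.
    Returns (None, 0.0) for an account that has not voted at all.
    """
    if not voted_authors:
        return None, 0.0
    author, count = Counter(voted_authors).most_common(1)[0]
    return author, count / len(voted_authors)
```

A share close to 1.0 over many votes would suggest the account exists mainly to vote on one person.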


I think the real issue is the relative weight between the first votes and the rest


> I'm pretty surprised reddit's backend doesn't just automatically devalue this kind of vote cheating.

What makes you so sure it doesn't?


That they have to ban people who do this is a pretty good sign that it's not working well. Subreddits have pretty intense human moderation which also tends to negate the need for sophisticated detection since people are pretty good at spotting this sort of thing on their own.

Also, isn't the reddit source public? I presume that if it had this, it'd be easy to find it there, but I haven't looked, maybe I should :)


It's public except for the large, sophisticated body of anti-cheating code.


Are there any write ups about this anywhere? Sounds interesting. Also, is any information from that system provided to moderators?



Thanks!


No detection mechanism is 100% reliable so better to ban 100% of the ones you know aren't false positives than risk them trying again and being a false negative.


Not really. Devaluing a single vote in situ once in a while shouldn't be that negative for the system as a whole, especially if there is suspected fraud.

The problem with banning fraudulent users is that they are tenacious. That's a big factor in their committing fraud in the first place; they are usually on a "mission," see: https://en.wikipedia.org/wiki/Wikipedia:Single-purpose_accou... (not really the same as a fraudster, but this is a common archetype)

They will just make another account or increase their efforts, better to keep them in their container within the system.


You seem to believe they won't notice the effect on their "container within the system". They will eventually, just like a shadowban.

At which point, they create a new account.

> while shouldn't be that negative on the system as a whole, especially if there is suspected fraud.

Assuming it isn't computationally expensive to do this on a per-vote basis. If you have to do any sophisticated analysis, that may not be true.


When it first launched, Reddit's founders used software to fake a large number of users to make it look like the site was more populated than it really was. I really don't see how someone doing this with their own posts is any better or worse. Does it really matter if the content is good?


I think there is a very large difference.

Reddit doing it, although some would argue unethical, was meant to make reddit even usable. It was one of those sites where it takes a critical mass to become usable, and without that critical mass it would be very challenging to actually get it. So they faked it until they had it - and it worked very well.

Someone doing it on their own only serves to benefit them - and their karma count. The goal isn't to benefit the larger community or make the site more usable - it only is to boost their ego.

> Does it really matter if the content is good?

Unidan had a lot of very good content, I cannot argue with that. However he also had a lot of really shitty content that under anyone else would have been downvoted to hell. Unfortunately his user page was taken down so I can't provide examples. But the problem is, that this bad content pushed other people's content out of view.


I'm pretty sure "do as I say, not as I do" is the house rule for, well... everywhere that humans exist.


> Despite being banned, Unidan has already made a new account, called UnidanX, and has been posting for most of the day under that username. It's looking like he hasn't been banned with that username yet.

What's the point of trying again? He's already lost the trust of the community and his name is a red flag. Is that subreddit super-forgiving? Or maybe the turnover is so high people will just forget?


All of his posts in his new account[1] have been massively downvoted. So they are not forgiving right now. His account shows 6k+ comment karma though. That's strange unless gold gets you some karma?

[1] - http://www.reddit.com/user/unidanx


I'm not 100% sure how the total karma is calculated, but IIRC if you downvote someone from their user page, it does not really count. The idea is that if you're hunting to downvote/upvote all posts from the same user, you're probably up to no good.


It's because for the most part he posts some very educational stuff. So hopefully he just learned his lesson.


Not to be cynical, but I don't think he did. He kept doing what he was doing until he was caught. It's not like he told the community himself.

He is just trying to play the nice guy game to save his image. His doing this is dishonest -- plain and simple. Don't get me wrong. He is really good in his field and knows a lot. I learned a lot from him and if he posts stuff, I will read it again. But this behaviour is dishonesty.


> He is just trying to play the nice guy game to save his image. His doing this is dishonest -- plain and simple.

If he was being honest, would it look any different? Or is the pitchfork mob just demanding blood?

On the one hand a reputation takes years to create and only a second to destroy. On the other hand, holding onto anger is like drinking poison and expecting the other person to die.

Personally I think if you want to be angry at something there's much more insidious manipulation going on. Unidan makes great content and gets people excited about his biology research... remember him for that, not for his amateur inability to execute a proper astroturfing botnet campaign.

At the very least, let him be reborn with a new username and see what he can build. Isn't the ability to bounce back from failure and mistakes what we celebrate here on hn?


He actually lied, straight-up. He said he only downvoted misinformation, but it turns out he downvoted others' submissions (to make his own submissions look better).


> I had five 'vote alts' when things were in the new list, or to vote on stuff when I guess I got too hot-headed.

I can totally understand that. I used to spend a huge amount of time on forums, communities, etc... And I also used to have a bad temper. Of course I created fake accounts to upvote myself and downvote others. I'm pretty sure anyone who is a bit clever and also a sore loser/has a bad temper has done that before.

And only 5 accounts? Maybe there are others, but that's really minimal. That's "unprofessional", which is a good thing, and I really think things are getting blown out of proportion. Reddit wouldn't be the same without this guy, whether he acted like a kid or not.

PS: What is actually sad here, is that he wasn't doing it to give to the community, but just for internet points.


That sucks, I always enjoyed (and learned from) Unidan's posts. What a weird thing for him to do with his hard-earned reputation.


Oh, sorry, I meant, "Wow, Unidan sucks. I always knew he'd turn out to be a bad egg. Good riddance."

Better?


If he honestly cared about sharing his knowledge with the world and not his ego, he would have come back under a different, anonymous name; not UnidanX.


> Reddit's Favorite Scientist

That's being too generous. I would vote for RobotRollCall as Reddit's Favorite Scientist.


I thought they were referring to Neil deGrasse Tyson, which made the title highly implausible.


Then you'd be wrong - if we're talking about popular opinion.


She hasn't posted in two years.


Aww, I remember when Chris Slowe was reddit's favorite scientist.



