Hacker News metrics (first rough approach) (mjg59.dreamwidth.org)
88 points by edwintorok on Oct 30, 2014 | 68 comments



The Hacker News ranking algorithm tends to penalize stories that address social issues.

The Hacker News ranking algorithm does not penalize stories that address social issues. It doesn't consider social issues at all, apart from a couple of special cases that we turn off once a major wave subsides [1]. The effects the OP is observing are mostly caused by user flags and the voting ring detector.

1. The intention there is to prevent copycat and follow-up stories from clogging up the front page. For example, NSA stories used to be weakly penalized, but no longer are, and I think there's still a similar penalty on ebola stories.


I seem to remember reading that in an attempt to avoid "controversial" subjects, posts are penalized if they generate comments too quickly. Social issues are often hotly debated and occasionally some of the most actively commented on posts on the site until they drop off the front page. So while HN might not actively be penalizing these posts, they might be penalized in indirect but still very real ways.


I believe a thread with more comments than votes is penalized as well, suggesting that commenting on a thread without upvoting it first is implicitly flagging it.

And of course there may be other arcane metrics involving thread depth, average comment rate, comment size, thread shape, etc. Who knows?


Whoa, really?

...I've been flagging a lot of articles.


Only a factor if it's got more comments than upvotes.


I would hope that HN's "flamewar" detector is somewhat more sophisticated than just "generate comments too quickly".


There's an aspect of it that's a ratio of comments to upvotes, which seems to work pretty well from what I've observed.


Now that makes more sense. A story that generates a large number of comments and a large number of upvotes very quickly seems likely to be a good story. A story that generates many quick comments without upvotes seems likely to have produced comments negative about the article; if you post a comment but don't want to upvote the article, that does seem like a good sign of controversy and possible flames.


But it doesn't just penalize flamewars, it penalizes all conversations. For example, I have now posted two comments in this discussion and only upvoted the article once. Does that mean this is a worse article than one that I only upvoted and never commented on or a better article than one I upvoted and commented on three times?


Keep in mind that HN also has a "deep thread" mechanism, which hides the reply button for a while on sufficiently deep comment threads to discourage rapid-fire posting without thought. (Slashdot used to have the same thing; if you posted comments too rapidly, you'd get a "whoa, slow down there".)

I'd also hope that more weight goes to "more commenters than upvotes" rather than "more comments than upvotes", though as vague heuristics neither seems too far off.
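The difference between the two heuristics can be shown with a minimal sketch. Nothing here is HN's actual code: the `Comment` type and both function names are hypothetical, and the "more than upvotes" rule is only what commenters in this thread have speculated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    author: str  # hypothetical field: the commenter's username


def comments_exceed_upvotes(comments: list[Comment], upvotes: int) -> bool:
    """Speculated rule: a thread has more comments than upvotes."""
    return len(comments) > upvotes


def commenters_exceed_upvotes(comments: list[Comment], upvotes: int) -> bool:
    """Suggested variant: count each distinct commenter only once."""
    distinct_authors = {c.author for c in comments}
    return len(distinct_authors) > upvotes


# Two users trade six replies under a story with four upvotes.
thread = [Comment("alice"), Comment("bob")] * 3
```

With this thread and four upvotes, the first rule fires and the second does not, so a long back-and-forth between a few people would only count against the story under the comment-based rule.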


I think it's mistaken to assume that not upvoting and downvoting are each implicitly negative, though. There is a neutral dimension to subthreads as well - conversations which are neither praiseworthy nor trolling.


It would be more fair to address the broader point of the post rather than just one sentence. It's good that the moderator(s) specify why something is penalized[0], but it doesn't really change the overall result.

[0] As defined by the post in question.


Stories about hot social and political controversies reliably attract both a lot of upvotes and a lot of user flags. This, together with how the software interprets votes and flags, explains the effects the OP is describing.

On hot social and political issues, many people find it irresistible to conclude that HN is biased against the side they favor and favors the side they oppose. They perceive this bias in all things HN: the community, the software, the mods, etc. But the truth, as far as I can tell, is that on most of these issues the community is divided. That's why the threads on them turn into flamewars.

Users reliably flag threads that turn into flamewars on HN. (By "flamewar" I mean angry arguments over predictable positions.) Moderators do so as well, though not to the same degree. But this isn't a bias against social issues per se, let alone particular social issues—the same pattern holds on threads about, say, basic income, or high-frequency trading, or even something like static vs. dynamic typing in programming. It's a matter of how divided the community is and how strongly people feel. It's more noticeable on any issue where one feels strongly.

What I do think it makes sense to call the bias of the site is a bias against the low-signal/noise discussions that inevitably result when people angrily disagree, especially when they do so through the filter of a pre-existing ideological battle. This is biased in the sense that HN's guidelines are biased: they call for certain types of stories and discussions rather than others.

Sometimes people argue that HN should abandon its anti-flamewar bias, because after all these are important issues and they need to be debated. But this fails to consider systemic effects on the site. The more ragefires burn, the more thoughtful users leave. That selective process, repelling the users HN most wants and attracting those attracted to flames, is a vicious circle that (a) is hard to get out of and (b) would turn the site into scorched earth. For all HN's flaws, it would be nihilistic to just let it burn.

That doesn't mean we aren't open to ways of balancing these things better: they're hard to balance, and if we're missing something we want to know.


> Users reliably flag threads that turn into flamewars on HN.

Then why not simply disable comments on sensitive stories?

> But this isn't a bias against social issues per se, let alone particular social issues—the same pattern holds on threads about, say, basic income, or high-frequency trading, or even something like static vs. dynamic typing in programming. It's a matter of how divided the community is and how strongly people feel. It's far more noticeable, of course, on any issue where one personally feels strongly.

Sure, but of all of your examples, only social issues are really important. It doesn't matter what people's intentions are (this is something that we seem to always get back to). If the result is that really important stories -- stories that touch on the very core of technology, start up culture and innovation -- get "silenced" (and, as a result, cause observers to view HN's community in a negative light) then it's your duty to make sure that this doesn't happen.

> But this fails to consider the systemic effects such a change would have on the site. The more the ragefires burn, the more thoughtful users leave.

So disable comments on those stories.

> But this fails to consider the systemic effects such a change would have on the site.

Except that there's an opposite systemic effect: HN readers get the sense that being thoughtful means shying away from those issues (but not from vapid stories about some startup raising money). Also, thoughtful users might abandon the site because anything that's truly interesting (other than, say, objective discoveries) -- and therefore controversial -- is not featured on HN. Your policy appeals to those attracted to the vanities of SV culture (and then YC people write blogs trying to educate people against them) while repelling those who are interested in reading criticism of it.

Not saying something sometimes sends as clear a message as saying it.


Disabling comments on sensitive stories has the effect of privileging the viewpoints of people who happen to win the front-page contest, and endorsing whatever framing happens to accompany whatever happens to be the most popular telling of that story. That seems like an extremely unfortunate dynamic to adopt.

Please, HN, do not allow people to post stories on the site that can't be contextualized or even rebutted in comment threads. Lots of popular stories are really, really bad.


I actually think this would encourage more thoughtful rebuttals in the form of long blog posts. One great thing about this community is that usually there are enough people to promote stories on opposite sides of the debate to the front page.


The assumption here is that blog post followups to popular front-page stories will themselves naturally find the top of the front page. But of course they won't. For one thing: simple causality guarantees that they won't be available until hours or (depending on the quality of the post) days after the original story first hits the front page, when it's at the zenith of its gravitational pull for HN votes.


You may want to consider writing a blog.ycombinator.com post about this. I think there are lots of HNers who would find these algorithms interesting :)


This is insanely disingenuous. User flags and the voting ring detector are part of HN's ranking algorithm, whether they're labeled as such in the code or not.


We must have a crossed signal here. Of course those are part of the ranking algorithm! My point is that except for the exceptions I mentioned, none of the code considers social issues. What part is disingenuous?


If social issues are more likely than the average story to trigger aspects of the ranking system that result in them disappearing, the ranking system is biased against social issues. The fact that it's also biased against various other topics is irrelevant to the point I was making - you've constructed a ranking system that makes it almost impossible to maintain visibility of stories about diversity. In an era where the lack of diversity in our industry is so extreme and stories about harassment so visible that the mainstream media have picked up on it, that seems like a problem.


My guess is that I fall right alongside you in terms of the valence of my opinions about diversity, "meritocracy", and the working culture of the San Francisco Bay Area. I'd even hazard a guess that my level of militancy on those topics is similar to yours.

I am a reliable flagger of diversity stories that I perceive as inviting flame wars --- even though I'm also a willing participant in those flamewars, as you can quickly find out from the search bar below.

What I'd like more people to consider is that HN can't be all things to all people. I find the site personally interesting, have met several friends here, and have extracted considerable professional value from it. Those are all very important things to me.

Meanwhile: there's a whole big wide Internet out there that can host discussions and debates on social issues of all stripes. If those debates, hosted here, are going to rip the site up (and evidence suggests they will): not worth it.

There are people whose opinions I respect a great deal --- Maciej Ceglowski being a good example --- who strongly disagree with me on this point. Of course I might not be right about this. I'd just say, be careful about assuming that the bias is in favor of a status quo on the substantive issues. For a lot of us, this is a process issue.


Given how often diversity stories are on the front page, there doesn't seem to be much of a problem getting such stories visibility.

In fact, the top story all day long today has been a diversity story, with a massive number of comments on just about every possible aspect of the story.


You said:

"The Hacker News ranking algorithm does not penalize stories that address social issues."

And then explained the observed effects as being from flags and vote ring detection.

People flag stories that address social issues, thus, the ranking algorithm is penalizing stories that address social issues. That (the collective) you did not specifically intend this makes a nice case study in unintended consequences and/or the collision of technology and humanity, but doesn't change the point being made.


By your logic, the software must also be biased against the things users downvote, since after all the code records the vote.

Of course the algorithm penalizes (most) stories that users flag. But it has no knowledge of what those stories are about. If some topics get flagged more than others, that is a function not of the algorithm but of the inputs users give it. This seems so obvious that I fear this dispute is purely definitional.
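To make the distinction concrete, here is a toy sketch. It is not HN's code: the formula, the `rank_score` name, the `flag_weight` parameter, and the numbers are all invented for illustration.

```python
def rank_score(points: int, flags: int, flag_weight: float = 0.5) -> float:
    """Hypothetical topic-blind score: points and flags are the only inputs."""
    return points / (1.0 + flag_weight * flags)


# Two stories with the same points but different flag counts from users.
lightly_flagged = rank_score(points=100, flags=1)
heavily_flagged = rank_score(points=100, flags=10)
```

The function never sees a topic, yet the story that users flag more ends up with the lower score. Whether that outcome should be called a bias of the algorithm or of its inputs is the definitional question in dispute here.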


... Wow.

Does it help if we change the word "algorithm" to "system"?

Stories entering the HN system on certain subjects are more likely to be penalized by that system. Humans are an integral part of HN. HN would neither function nor exist without them. You cannot simply ignore them and pronounce the system unbiased.

Does that make more sense to you?

This is really surprising to me. I usually disagree with you on, well, everything, but you have not previously seemed like someone who suffers from this particular technologist disease of utterly missing the big picture because someone used a word that you define narrowly in a broader way.


Those two words are not interchangeable in this context. The algorithm is the code. The system, in your example, is the whole thing.

There's nothing wrong in making it clear that the code itself does not target social issues.


> There's nothing wrong in making it clear that the code itself does not target social issues.

I agree, and I suppose if you choose to read my comment completely free of the context of the rest of the thread, my attitude might not make sense. But when you realize he had also said to 'somehow' that he "thought that was the main point of the article?", it's obvious he was missing the forest for the trees.


Because clearly this dispute is caused by his inability to understand you, not by your inability to understand him.


I seriously have no idea what you're talking about. If you'd like to point out something specific you think I've misunderstood, I'm listening. If you just want to engage in content-free sarcasm, I'm uninterested.


... Wow.

You're here calling @"dang" disingenuous when he has posted nothing but straightforward explanations of what inputs the HN ranking algorithm uses. Then you make a reply laced with the pretense that he doesn't get what you are saying, or are trying to say.

Really he's (I'm just inferring here, because it's soooo obvious) 100% clued in to your conversational gambits.


Yeah, not interested in engaging with someone like you.


As far as I can tell, a fair number of users abuse the flag ability to squelch discussion on topics they don't want to see discussed or a viewpoint they don't agree with as an alternative to downvoting (since one cannot downvote a submission).


> This clearly isn't an especially rigorous analysis

Not especially, no.

Data samples consisting of two or five articles per keyword??

And 2 out of all 5 penalized "female" entries have nothing to do with social issues etc. (one is about female mice, another about mixed-sex animals).

Comparing stories on social justice against ARM and Intel is also cherrypicking, because we don't know whether this is supposed to be a bias against social issues, or just non-tech stories in general, or something else.

And what's your methodology of choosing keywords?

"Female" - which made it onto your list - gets penalized 5 to 1, all right.

But then in case of "girls" (omitted from your list) it's actually 0 - 3, telling us a different story.

And unlike with the "female" bunch, none of these articles is about mice. They're about how girls get better grades, how to get them into engineering and why it's easy to teach them code.

"Women" (plural) lose 13 - 4 and so the keyword is featured on your list, but "woman" (singular) wins 7 - 2, and surprise surprise, it's absent.

"equal", as a word stem (counting "inequality", "unequal" etc.) - penalized 2 times, not penalized 3 times. Holy smokes Batman, it rates better than "x86"!

Statistics, huh ;)

If this submission gets penalized, do we count it as bias against social justice, or junk science?
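A rough way to see how little these counts can support is to put an interval around each penalized fraction. The sketch below uses the Wilson score interval, which is my choice and not something the article used. The counts are the ones quoted above, read as (penalized, not penalized); treating "woman wins 7 - 2" as 2 penalized and 7 not penalized is an assumption.

```python
import math


def wilson_interval(penalized: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a penalized fraction."""
    p = penalized / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# (penalized, not penalized) per keyword, as read from the comment above.
counts = {
    "female": (5, 1),
    "girls": (0, 3),
    "women": (13, 4),
    "woman": (2, 7),  # assumed reading of "wins 7 - 2"
    "equal": (2, 3),
}

intervals = {
    keyword: wilson_interval(penalized, penalized + clean)
    for keyword, (penalized, clean) in counts.items()
}

for keyword, (low, high) in intervals.items():
    print(f"{keyword:>6}: penalized fraction between {low:.2f} and {high:.2f}")
```

With samples this small the intervals are wide enough that, for example, the "female" range still includes one half, which is the point about two to five articles per keyword.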


This article is humorous considering that it was submitted on the same day as Tim Cook's op-ed which has been #1 on HN all day and now has over 2000 upvotes.


Interesting. I see that four stories critical of AirBnB were penalized despite many comments (113, 70, 63, 54) and upvotes. The ones touting its value weren't.

I am not surprised. At least 5 times now, I've seen negative stories on AirBnB disappear from the home page fast, whilst silly links remained. Is someone protecting their investment?


Thanks for doing the work of trying to apply data to investigate this. It's too often lacking in discussions like these.


Not sure about penalising algorithms but I make sure to flag pointless social war mongering posts.


Right, because it's the posts that incite social wars rather than the real events they discuss. This is how it happens:

1. Something bad happens due to what we can call "SV culture", at least bad for some people.

2. A brave soul writes about the injustice, and posts the story (or someone else does) to HN.

3. The story is promptly flagged because the discussion of the injustice is "pointless social war mongering".

4. Order is restored.

5. Another brave soul writes about how HN reflects the SV social hegemony by ignoring important issues.

6. That story is promptly flagged.


So much butthurt.

This is Hacker News, not Tumblr. Social justice idiots try to inject their nonsense into every culture, and I would prefer this community to stay on the technology.


What's technology for? Technology isn't neutral (like science can be). It is either to advance social justice or to obtain or maintain power. I think you're not particularly powerful, so I can only assume you're so laser-focused on technology, so religiously fanatical about it, because, how did you put it? Well, because you're a "social justice idiot".


You seem to have a mental disorder when everything seems to be revolving around the issues you like, and so you are unhappy when they aren't covered by site X you visit.

Most of us like technology because it's interesting, not because it gives us some magical power. It's infinitely more interesting than whining about the abuse of virtual characters in games and other such idiocy.


> You seem to have a mental disorder

Personal attacks are not allowed on Hacker News.


Aren't you a mod here? You should ban me, or whatever the proper punishment is.


Rest assured that you will be banned eventually if you continue like that, but the proper first step is to tell someone what they are doing wrong, rather than ban them outright.


See also this thread about manual elevation, https://news.ycombinator.com/item?id=8313505


> But for now the evidence appears consistent with my innate prejudice - the Hacker News ranking algorithm tends to penalise stories that address social issues.

My innate prejudice says something worse: that many stories that have to do with social justice but are oblivious to their potentially adverse effect (intentionally or not) are not penalized and are rather handsomely rewarded. These are stories that seem to be about "innovation", but are really about a power struggle (e.g. stories about companies providing marketplace services in the "sharing economy"), with the sole exception of privacy (which fits with the techno-libertarian bias).

It is often said that discussion here shouldn't be about politics, but the worst kind of politics is that which you don't even notice (or choose not to notice).

And once something of substance does come up, it's so frustrating to see it quickly drop in rank because it's automatically tagged as controversial. So a story about a company offering cheap food delivery can stay at the top for a while, but a discussion about a workers' strike against Uber quickly disappears from the front page.


You may be including it in "automatically tagged as controversial", but I have the feeling that there are lots of people that are flagging those topics (which fits under "automatic" if you include reflexive flagging of social issues topics, but doesn't if you limit "automatic" to the software running the site).

I pretty much don't flag anything, but I can see how that flagging could be driven more out of a feeling that those discussions don't accomplish anything than out of a desire to penalize those stories.


Right, flagging is another issue. But as to "those discussions don't accomplish anything", what does any discussion on HN accomplish? On the more technical stories, the discussions often educate. Thing is, HN isn't /r/programming, and there are many stories about company culture, startups raising money, SV superstars etc. What do the discussions about those kinds of stories accomplish? It's just people schmoozing for entertainment. What's wrong with that?

But I have a practical suggestion: Instead of having unpleasant discussions about important stories related to technology, innovation and startups (like women in startups, which is a far more important topic than, say, Uber raising another trillion dollars), simply block comments on these stories, and at least let people read them without having them drop due to too many comments. Sure, people will still flag the stories, but AFAIK, the flagging privilege can be revoked if a user abuses it.


I don't really have an answer to the first paragraph, like I said, I don't really flag things (exceptions are things like dupes of recent, decent conversations or if I happen to look at /new, the really dumb, offtopic stuff (like celebrity news or cat pictures, the stuff that is waay out there)).

I think we also probably see the importance of visibility on hacker news differently; you're arguing that it should be a certain way, apparently because you think it will do some good (the soft phrasing there is because I'm guessing at what you are thinking, not sarcasm), I don't think it is going to have much impact (for example, many of the people that make threads awful see it as an opportunity to defend, with guile and shenanigans, their pet point of view; that's pretty clear evidence that the story isn't reaching them). I also think the media spends a lot more time pandering than it does manipulating (just throwing that in there as a similar situation where I apply a similar world view).


I think there's plenty of evidence that humans by nature are uninterested in "politics", except when it might affect them. Then it becomes an object of great importance.

As for the role of the algorithm, I think it's just an extension of "garbage in, garbage out".

Stories like this are a nice reminder that every community is also a bubble.


Can someone explain how flags are applied? For example, today a post where a person was asking for a job got killed, but I have seen similar posts (people looking for a job) before that were left on the front page. Why was this one killed?


wow - the money quote "I scraped Hacker Slide to get just over two months of data in the form of hourly snapshots of stories, their age, their score and their position" -- how did he do this? I can barely get to the homepage say 14 times out of 20, without hitting the error page from their CDN/DDOS Protection pages (Cloudflare?). And that's not to scrape content, just to read the headlines from time to time.


An interesting read. Of course, it criticises HN for actively penalising talk on social issues, which might well mean that this story itself will be penalised. And so the great cycle continues.


Actually, I believe "Hacker News" is itself among the penalty words. (That discourages meta discussion, which can be toxic.)



According to HN's software, this story has been heavily ring-voted. That's what has affected its rank.

Call that "penalized" if you want, but it has nothing to do with the content, so that tweet is misleading—and ironic, too, given what provoked the "penalizing".

(Also, in case anyone's wondering, moderators haven't touched the post.)


While I realize you can't go into too much detail about ring-voting detection to prevent people from finding ways around it, can you elaborate a bit on what ring-voting means to HN, beyond the obvious?

The obvious definition would be a clique of accounts that all upvote each other's stories, but that seems rather unlikely in this case; are there secondary criteria? Are you looking to prohibit what Wikipedia calls "meatpuppetry"? Are you also looking to prevent upvotes from people who reach an HN discussion via Twitter or other sites?


I'm happy to elaborate on nearly any aspect of Hacker News, but this is one area (maybe the only area?) where I resist the impulse to explain. We're in something of an arms race against a lot of people who try to get their content on the front page through non-organic means. Accidentally helping them do so would be a genuine shot in the foot.

But let's say what good voting means to HN. It means voting for something that you personally find intellectually interesting, regardless of who posted it and regardless of what you or your friends are hoping to promote.


> it has precisely nothing to do with the content

Do certain types of content set off the ring-voting detection more frequently? The type of content likely influences how it's shared and how people respond to it, which would affect how it's voted (and/or flagged). Algorithms are not unbiased, particularly when no one's bothered to study their confounding factors.


> Do certain types of content set off the ring-voting detection more frequently?

Sure—people posting their startups, for example, set off the ring detector more frequently than the typical news article. Friends voting for friends' blog posts, too. But is that a bias of the algorithm? That strikes me as a stretch.


Interesting. How do you guard against coincidental rings?

(i.e. a subgroup of people who all love, say, Lisp, and therefore always vote up Lisp stories. They're not an organised group - but they do tend to vote in concert.)


I used to worry about that too, but it turns out not to be a big problem in practice. (I realize that's not a satisfying explanation, but this is one area where we can't give out details without enabling people to game the site. Having put a ton of effort into HN's anti-voting-ring measures, I dread the thought of having to climb all the way down the mountain and push that boulder up again.)


I'm wondering that as well. In particular, it seems like content that's more likely to be shared via various social networks would be more likely to set off the ring-voting detector.


You're confusing correlation with causation.

Sure, many articles at the top of HN are often posted elsewhere (e.g. today) because they are incredibly newsworthy.

However, a HN post that people share on a social media network for the purpose of upvotes is a different story.


I'm not assuming either correlation or causation; I'm suggesting a hypothesis that seems worth testing and evaluating.

There's a difference in motivation between "check this interesting article out" and "please upvote me", but the ring-voting detector can't necessarily tell the difference.


> There's a difference in motivation between "check this interesting article out" and "please upvote me", but the ring-voting detector can't necessarily tell the difference.

In the former case, why would they be linking to HN in the first place, instead of the article itself?

HN has a basic voting ring heuristic to ignore votes from direct links to HN submissions from a non-HN referrer for this reason.
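A heuristic of that kind could look roughly like the sketch below. This is only an illustration of the idea as described in this comment; the function name, the handling of a missing referrer, and the exact host comparison are assumptions, and nothing in this thread confirms how HN implements it.

```python
from typing import Optional
from urllib.parse import urlparse

HN_HOST = "news.ycombinator.com"


def vote_is_counted(referrer: Optional[str]) -> bool:
    """Toy rule: count a vote unless the visitor arrived from another site."""
    if not referrer:
        # Assumption: no referrer means the user navigated to the page directly.
        return True
    host = urlparse(referrer).hostname
    return host == HN_HOST
```

Under this toy rule a vote from someone who clicked through from the HN front page counts, while one from someone who followed a link on Twitter straight to the submission does not, regardless of why the link was shared.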


I can't speak for other people, but personally, when I find a good article, I link to the source I found it through, rather than bypassing the source and linking directly to the article. (I don't do that when submitting stories to HN since HN considers that "blogspam", but personally I like finding new sources of good content, as well as sources of good discussion, rather than just the content itself. And in any case that's a heuristic for what to submit to HN rather than what to link to from elsewhere.)

At a minimum, I would tend to put something like this: interesting article (link to article) via HN (link to HN).

Though I'm certainly less likely to do that now that I know that HN does not like such links.

I don't think it's appropriate to post something like that about one's own story, but for a story you found interesting and want to share, I think it makes sense.



