Hacker News

It's not that it's random; it's that the signal is too abstract for such simple methods to grasp. A machine learning model that can predict Stack Overflow question scores is going to have to understand what's being asked, and how useful the responses will be to people interested in the question.



I’m not convinced. I tried manually predicting high score vs low score as well and couldn’t do so reliably.

The language model (viewable at https://stackroboflow.com ) achieves much higher accuracy at predicting the next word on Stack Overflow than on other datasets like IMDb.

But on the IMDb dataset this approach led to state-of-the-art sentiment analysis results by pulling out the encoder and adding a custom head, whereas on Stack Overflow it didn't grok anything about the score.
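The "pull out the encoder, add a custom head" recipe can be sketched as follows. This is a toy illustration only, not the actual stackroboflow code: the "encoder" is just an embedding plus mean pooling standing in for a real LSTM, and all names and sizes are made up.

```python
# Toy sketch of the transfer-learning recipe: pretrain a language
# model, keep its encoder, and swap the LM head for a task head.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, CLASSES = 1000, 64, 2

# "Encoder": embedding lookup + mean pooling (stand-in for an LSTM).
embeddings = rng.normal(size=(VOCAB, HIDDEN))

def encode(token_ids):
    # Returns a fixed-size document vector of shape (HIDDEN,).
    return embeddings[token_ids].mean(axis=0)

# Pretraining head: maps the encoding to next-word logits.
lm_head = rng.normal(size=(HIDDEN, VOCAB))

def next_word_logits(token_ids):
    return encode(token_ids) @ lm_head  # shape (VOCAB,)

# Fine-tuning: reuse the SAME encoder, bolt on a small classifier
# head (e.g. positive/negative sentiment, or high/low score).
clf_head = rng.normal(size=(HIDDEN, CLASSES))

def class_logits(token_ids):
    return encode(token_ids) @ clf_head  # shape (CLASSES,)

tokens = np.array([1, 42, 7])
print(next_word_logits(tokens).shape)  # (1000,)
print(class_logits(tokens).shape)      # (2,)
```

The point of the recipe is that the encoder's weights, learned from next-word prediction, are kept; only the small head is replaced and trained on the downstream task.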


It may seem obvious with the benefit of hindsight, but I would have guessed that the better answers might be different in terms of word choice or style.


I'm not saying it was a silly thing to try, just that it's wrong to conclude from the result that the scores are random. I.e., I was mostly responding to "It’s so crazy to me that which posts get popular might just be random."


I’m not convinced they are randomly distributed, but before I started playing with it I was fairly certain they weren’t.

Now all I know is I haven’t seen evidence that they aren’t.

I could see how there could be some element of luck in whether a post gets attention before getting buried. It depends on who’s online when it’s posted and whether it piques their interest, plus many other random factors that have nothing to do with the content of the post, like how quickly other posts come in afterwards to bury yours and how any automated algorithms decide to surface it.
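That luck-of-timing intuition is easy to simulate: give many posts identical intrinsic quality and let the score depend only on how many people happen to be around at posting time. This is a purely made-up model (the parameters and upvote probability are invented, and it bears no relation to any site's actual ranking), but it shows how wide a score spread pure timing luck can produce.

```python
# Toy simulation: every post has IDENTICAL quality; the only
# varying input is audience size at posting time (the luck factor).
import random

random.seed(1)

def simulate_post(quality=0.5):
    # Random number of viewers who see the post before it's buried.
    viewers_online = random.randint(5, 500)
    # Each viewer independently upvotes with a small probability
    # proportional to quality (scale factor is arbitrary).
    return sum(random.random() < quality * 0.05
               for _ in range(viewers_online))

scores = [simulate_post() for _ in range(1000)]
print(min(scores), max(scores))  # wide spread despite equal content
```

Even with content held constant, the resulting scores range from zero to double digits, which is consistent with score being hard to predict from text alone.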



