I can't evaluate how serious or useful this is, but I really like the idea. Anything that might encourage more constructive discussion is a perfectly valid thing to explore. I love the idea of trying to characterize the contributions of different commenters based on their history, including my own. Not just for things like "hatefulness" or "incoherence", which is what this seems to be detecting, but for other things as well, like "helpfulness", "erudition", or apparent areas of interest.
Funnily enough, this comment now has my second-highest hate score (the highest is a fairly level-headed comment on drug policy). Apparently my cunning plan to fake being a nice person on HN is working.
I got hater scores of around 2.5% on both my accounts. On each one, the most hated comment was the one where I took the most pains to explain that I liked the user's main point and was just trying to clarify/disagree with a small part, i.e., to be sure I was not dodging the central point.
Reading them, they come off as passive-aggressive. I think that's a useful discovery for me. Thanks.
Small bug: the specific comments you link to seem to include submissions as well. Once it encounters a submission, every subsequent 2-character link is paired with the wrong comment, off by one. I.e.
Yes and yes. :) The problem with the first one is that I currently have to make an individual API call for each comment. Once you go past 50 comments it starts to get slowwww. For example, I tried to pull back all of @pg's comments... Bad idea. There are something like 13,000 of them. Check it:
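One way to avoid this kind of off-by-one is to key comments by their item ID instead of by their position in a list. The sketch below assumes items shaped like the public HN Firebase API's item JSON, which carries `id` and `type` fields, plus `deleted` for removed items. The function names `comments_by_id` and `comment_link` are hypothetical and not taken from the app's code.

```python
from typing import Any


def comments_by_id(items: list[dict[str, Any]]) -> dict[int, dict[str, Any]]:
    """Keep only comment items and key them by their HN item id.

    Pairing links to comments by list position breaks as soon as a story
    (submission) appears in the list, because every later link shifts by one.
    Keying on each item's own "id" avoids that off-by-one entirely.
    """
    return {
        item["id"]: item
        for item in items
        # HN API items have a "type" such as "story", "comment", or "job";
        # removed items are flagged with "deleted": true, so skip those too.
        if item.get("type") == "comment" and not item.get("deleted")
    }


def comment_link(item_id: int) -> str:
    # Standard HN permalink format for any item.
    return f"https://news.ycombinator.com/item?id={item_id}"
```

Because each link is built from the comment's own `id`, a story in the middle of the list is simply dropped instead of shifting every later link.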
https://hacker-news.firebaseio.com/v0/user/pg.json?print=pre...
I think I've seen a few archives of HN comments. Maybe grab some of those and only make API calls for newer comments? Also preserve the comments from the API calls?
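Here is a rough sketch of both ideas. The per-item requests can run concurrently, and a cache (pre-filled from an archive or persisted between runs) means old comments only need to be fetched once. It assumes the public endpoint `https://hacker-news.firebaseio.com/v0/item/<id>.json`. The `fetch_items` name, the plain-dict cache, and the worker count of 16 are illustrative choices, not the app's actual implementation.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional

HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"


def fetch_item(item_id: int) -> Optional[dict[str, Any]]:
    """Fetch one item from the public HN Firebase API (one HTTP request)."""
    with urllib.request.urlopen(HN_ITEM_URL.format(item_id), timeout=10) as resp:
        return json.load(resp)


def fetch_items(
    item_ids: list[int],
    cache: dict[int, Any],
    fetch: Callable[[int], Any] = fetch_item,
    max_workers: int = 16,  # arbitrary; tune to taste
) -> list[Any]:
    """Return items in the order of item_ids, calling fetch only on cache misses.

    `cache` can be pre-filled from an archive dump, and it keeps everything
    fetched here, so repeat lookups cost no network calls.
    """
    # De-duplicate while preserving order so each missing id is fetched once.
    missing = list(dict.fromkeys(i for i in item_ids if i not in cache))
    if missing:
        # Still one request per item, but the thread pool overlaps the
        # network latency instead of paying it serially.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for item_id, item in zip(missing, pool.map(fetch, missing)):
                cache[item_id] = item
    return [cache[i] for i in item_ids]
```

The item IDs themselves can come from the `submitted` list in the user JSON linked above. Passing `fetch` as a parameter also makes it easy to swap in an archive lookup, or a fake in tests.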
Your website sucks. All the content is smashed into a narrow column, so I have to scroll left and right to read each paragraph, despite the gigantic margins taking up 30% of the screen on each side. Try to build something even half-worthwhile next time. I could build a better website easily; you should go back to school.
Intentionally hateful comment, by the way ;) I'm just curious to see how much haternews picks up on it. The problem I describe with the content column is real, but the insults and drama aren't at all! For reference: http://imgur.com/St9hAqR
Edit: My intentionally hateful comment shows up as being somewhat hateful, but not my most hateful comment. So not too bad!
And just a fun side experiment idea: I'm curious how the IBM Watson User-Modeling service would stack up out-of-the-box against how you currently do things:
1. Thanks for posting my blog post, @chippy. :) The actual app (haternews.co) kept getting booted off HN...
2. There have been a lot of interesting comments on the three threads on here. People pointed out some bugs and overall issues, which I will be fixing (also, the site should not crash nearly as much now). This is just a fun side project I have been messing around with so I can get better at using data science in various applications. If you'd like to help build it out further for fun, let me know! Also, feel free to submit a bug or a suggestion for an improvement if you really want to (https://github.com/kevinmcalear/hater_news/issues).
3. I wanted to build the "hater score" for two reasons: first, to see how accurately I could build a model to measure insulting comments in the wild, and second (if it's accurate), to see how people would react to seeing how positive or negative they usually are on Hacker News (or other social networks).
4. I want to make sure everyone knows that just because something is your "Worst Comment" doesn't mean it is negative. Most people have very low scores, and most of your comments are not identified as insulting (a comment's score would be over 50% if it were actually classified as insulting). So most people on HN are not actually haters; I just went with a more "hater"-focused design for fun. There are in fact actual haters, though, if you look hard enough.
5. Something I found interesting: try checking the "Back In The Day" checkbox. It takes your 50 oldest comments and analyses them instead of your 50 most recent.
6. Finally, if you're not sure why some comments are ranked higher than others, feel free to look at the training data I used (it's from a Kaggle competition from a while back) and read my blog post. If you don't want to, here are the additional features I used on top of a standard bag-of-words model (CountVectorizer):
* badwords_count – A count of bad words used in each comment.
* n_words – A count of words used in each comment.
* allcaps – A count of capital letters in each comment.
* allcaps_ratio – The count of capital letters in each comment divided by the total number of words in it.
* bad_ratio – The count of bad words in each comment divided by the total number of words in it.
* exclamation – A count of "!" used in each comment.
* addressing – A count of "@" symbols used in each comment.
* spaces – A count of spaces used in each comment.
If you have suggestions for other features I could collect, let me know! I'll also be building a way to get actual training data from HN itself, letting HN users decide whether a comment is actually insulting, so that the predictions constantly improve.
Have you ever considered that machine learning isn't fairly dust, and that you can't sprinkle algorithms on a poorly formulated criterion and get an objective criterion for evaluation? I mean, what is "hate"? Insults? Expressions of frustration? Sly insults? Sarcasm?
Also, the "most hateful" comment was me quoting someone else's rather unpleasant comment, whereas I'd prefer my distaste for lousy ideas to show through more directly.
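For readers who want to see those extra features concretely, here is a minimal Python sketch that computes them for a single comment, following the definitions in the list above. The word tokenizer (a simple regex) and the `BAD_WORDS` set are placeholder assumptions, because the project's actual bad-word list and tokenization aren't shown here.

```python
import re
from dataclasses import dataclass

# Placeholder list for illustration only; not the project's real bad-word list.
BAD_WORDS = {"idiot", "stupid", "moron"}

# Assumed tokenizer: runs of letters and apostrophes count as words.
WORD_RE = re.compile(r"[A-Za-z']+")


@dataclass
class CommentFeatures:
    badwords_count: int
    n_words: int
    allcaps: int
    allcaps_ratio: float
    bad_ratio: float
    exclamation: int
    addressing: int
    spaces: int


def extract_features(text: str) -> CommentFeatures:
    words = WORD_RE.findall(text)
    n_words = len(words)
    badwords_count = sum(1 for w in words if w.lower() in BAD_WORDS)
    allcaps = sum(1 for ch in text if ch.isupper())  # capital letters
    denom = n_words or 1  # avoid dividing by zero on empty comments
    return CommentFeatures(
        badwords_count=badwords_count,
        n_words=n_words,
        allcaps=allcaps,
        allcaps_ratio=allcaps / denom,
        bad_ratio=badwords_count / denom,
        exclamation=text.count("!"),
        addressing=text.count("@"),
        spaces=text.count(" "),
    )
```

`dataclasses.astuple(extract_features(text))` turns the result into a flat row that could be stacked next to the CountVectorizer columns.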
1. I wish everything was made from *fairy dust. How awesome would that be? :)
2. "Hate" is definitely hard to quantify. It's in fact quite difficult to map words to their intentions and get it right consistently (especially within a proper context). So difficult that people set up Kaggle competitions on exactly this. I actually got my "magical" training data from a competition that paid out $10k, which I explained in the article, but here it is again:
They did a great job building a baseline training data set to evaluate several different models on; those models are all briefly explained, or at least shown in code, in the article. What "hate" actually means here is the probability that a comment is considered insulting. The "hater score" is just the average of those probabilities across your most recent (or oldest, depending on your settings) comments.
3. I read through several different attempts to build something similar by various data scientists who were kind enough to share their findings, including a major contributor to scikit-learn (https://github.com/amueller).
4. Taking out quoted text would be a great feature to add. I have about 5 or 6 new features I will probably add to see whether the model works any better with them. Thanks for the suggestion (another person suggested the same thing). :)
5. This was just to see how well "sprinkled algorithms" and magical coding work in the wild world of actual comments. I love learning and improving my knowledge through actual experience, so I figured, why not build something and see what happens. :)
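Going by that description, the score can be sketched in a few lines. The sketch assumes each comment already has a predicted insult probability (for example from a classifier's `predict_proba`) and a Unix timestamp. `ScoredComment`, `hater_score`, `is_insulting`, and the default window of 50 (mirroring the 50-comment limit mentioned earlier in the thread) are hypothetical names and choices. The 0.5 cutoff follows the "over 50%" rule the author gave above.

```python
from dataclasses import dataclass


@dataclass
class ScoredComment:
    time: int        # Unix timestamp, like the HN API's "time" field
    p_insult: float  # predicted probability that the comment is insulting


def hater_score(comments: list[ScoredComment], n: int = 50, oldest: bool = False) -> float:
    """Average insult probability over the n newest (or n oldest) comments."""
    # Newest first by default; oldest first for the "Back In The Day" view.
    ordered = sorted(comments, key=lambda c: c.time, reverse=not oldest)
    window = ordered[:n]
    if not window:
        return 0.0
    return sum(c.p_insult for c in window) / len(window)


def is_insulting(c: ScoredComment, threshold: float = 0.5) -> bool:
    # A single comment only counts as insulting above the 50% mark.
    return c.p_insult > threshold
```

Averaging probabilities rather than counting comments over the threshold is why a user with no individually "insulting" comments can still have a small non-zero score.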
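Here is a minimal sketch of what stripping quoted text before scoring might look like. It makes three assumptions. Quotes are written HN-style, as a line starting with `>`, or as a line wrapped entirely in double quotes. The API returns comment text as HTML, with `<p>` between paragraphs and entities like `&gt;`. The function names are hypothetical.

```python
import html
import re

# Assumption: a quoted line starts with ">" or is wrapped entirely in double quotes.
_QUOTE_LINE = re.compile(r'^\s*(>|".*"\s*$)')


def strip_quotes(text: str) -> str:
    """Drop quoted lines so the model scores only the commenter's own words."""
    kept = [line for line in text.splitlines() if not _QUOTE_LINE.match(line)]
    return "\n".join(kept).strip()


def strip_quotes_html(html_text: str) -> str:
    # Assumption: "<p>" separates paragraphs and entities such as "&gt;"
    # encode ">"; other inline tags are left untouched in this rough sketch.
    return strip_quotes(html.unescape(html_text.replace("<p>", "\n")))
```

Lines that only partly quote someone are kept, so this errs on the side of scoring too much text rather than too little.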
My favorite part about your algorithm is that it even detects SELF-HATRED! That puts an interesting, and depressing!, spin on the project. My pseudo-unconscious self-loathing was uncovered through my meager number of HN posts. Eerie.
The worst comment detected by your project is an expression of relief in learning that onions will not, in fact, brown in fewer than 30 minutes.
"All your life you're just thinking, "I'M AN IDIOT, WHY WON'T THESE BROWN!?" and then, one September's day, you find out it was all lies... the entire time. Lies all the way down."
'Hater' isn't about hate. It comes from 'player-hater', where 'player' is slang for a pimp. Many people disapprove of the lifestyles of pimps, mainly because of their use of women's bodies as an income stream, and the emotional manipulation and physical abuse it takes to keep that up.
Pimps characterized this disapproval as a hatred born of jealousy of that lifestyle, masking dissatisfaction with one's own square, lame lifestyle: an expression of one's own inner turmoil, cowardice, and weakness.
Assholes at all levels of society picked up on this sentiment (through cultural colonialism). 'Hater' now means anyone who criticizes the speaker for anything.
Yes. But it's less fair for everyone else to have to deal with your lack of maturity. The world would be a better and more civil place if people would move beyond the frankly medieval concept of justice as retributive parity, and certainly the internet would as well.
It's just words in a box. Unless they're threatening you personally, or doxxing you or something, just take a deep breath, recognize when you're dealing with someone who has neither the capacity nor the willingness to respect you, and move on with life.
I could not move very far away from Microsoft when they were at the height of their power. Sure, I could use Linux, but the outside world would constantly try to infect me with something MS-proprietary.
My "most hateful" comment was this:
"Thanks, would love to see your app when it's ready!"
Not sure if it's ready for prime-time.