> “Why does the day of Osama Bin Laden’s death have such a low happiness score?”
> Many people presume this day will be one of clear positivity. While we do see positive words such as “celebration” appearing, the overall language of the day on Twitter reflected that a very negatively viewed character met a very negative end. It was a day of complex emotion which is best explored in the word shift for the day, rather than the single number of its average happiness.
That's... a pretty dodgy statement. A more straightforward and honest explanation would have been: "We use a bag-of-words approach, which relies on the assumption that people use negative words when they feel down and positive words when they feel good. This sort of model gives good results in most simple cases, but it doesn't handle complex cases, and it can't take into account that people sometimes use the words 'death' and 'dead' in a positive way."
There's nothing wrong with using these sorts of basic models. That's what pretty much everyone who provides sentiment analysis does, and it's good enough for most cases. But there's no need to hide the limits of the system either.
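To make the limitation concrete, here is a minimal sketch of bag-of-words scoring. It is not the site's actual code: the word list, the scores, and the 1 to 9 scale are hypothetical values chosen for illustration, and `average_happiness` is a made-up helper name.

```python
import re

# Hypothetical word scores on a 1 (sad) to 9 (happy) scale; illustrative only.
WORD_SCORES = {
    "celebration": 7.8,
    "happy": 8.3,
    "good": 7.9,
    "dead": 2.0,
    "death": 1.5,
    "killed": 1.6,
}

def average_happiness(text, scores=WORD_SCORES):
    """Mean score of the scored words in `text`; None if no word is scored."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [scores[w] for w in words if w in scores]
    return sum(hits) / len(hits) if hits else None

# "dead" drags the average down even though the tweet is celebratory.
print(average_happiness("Bin Laden is dead, what a celebration"))
```

With these made-up scores the tweet lands just below the midpoint of the scale, because the model only sees that "dead" is a low-scoring word, not how it is being used.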
"Tweets represent a non-uniform subsampling of all utterances made by a non-representative subpopulation of all people. However, there are hundreds of millions of people presently using the website to express their activities and interests, and as such it is an important social signal."
> "Tweets represent a non-uniform subsampling of all utterances made by a non-representative subpopulation of all people. However, there are hundreds of millions of people presently using the website to express their activities and interests, and as such it is an important social signal."
Not many of those people actively tweet themselves, though.
"I suppose I should be expressing some ambivalence about the targeted killing of another human being. And yet, uh, no. [...] Last night was a good night for me" - Jon Stewart
I recently had a similar idea (i.e. plotting contentment) related to keeping employees engaged / detecting how events at work affected the general mood: https://news.ycombinator.com/item?id=5646466
This post's great as it shows a really good way to model that data (e.g. their use of different colours for days of the week helps to determine if you had a good day for a special reason or if that day people are generally in a better mood).
So now Twitter really is out there as the biggest social science database there is. And much of what we get is descriptive statistics (and stock market analysis). What are some interesting (let's say non-financial) research questions that need tackling and could use a dose of massive sentiment analysis?
I would like to know how you can use global and local sentiments for all kinds of analysis of social welfare.
I wonder how much happiness stems from aggregate "good" weather across the locations from which people are tweeting. I would wager that it correlates quite nicely.
Then I would, once again, question why the hell people live in gloomy places when most of them enjoy sunshine (and lollipops?)
Edit: At first it seems we are trending down in happiness, but surely there is some bias here. One thought is that the number of new or fresh Twitter users has declined, and so the overall eagerness and excitement around tweeting has declined as well.
Very good, I was considering doing a similar experiment, but possibly grouping happiness by country.
I was also thinking about some way of determining what triggered the expression of happiness, e.g. "I am happy now, I just did a skydive".
> “How will you deal with context?”
> We are currently developing a principled method to identify relevant phrases, for example to deal with the multitude of both positive and negative uses of profanity. We expect to be scoring phrases instead of words, where appropriate, in the near future.
Also, they mention elsewhere that they use a bag of words classifier, so they can't really account for things like this easily.
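For what "scoring phrases instead of words" could look like, here is a rough sketch. This is a guess at the general idea, not the authors' method: the phrase table, the word table, all scores, and the `score_tokens` helper are hypothetical.

```python
# Hypothetical scores on a 1 (sad) to 9 (happy) scale; illustrative only.
PHRASE_SCORES = {("heart", "attack"): 1.5, ("best", "friend"): 8.0}
WORD_SCORES = {"heart": 7.5, "attack": 2.5, "best": 7.5, "friend": 7.8, "killed": 1.6}

def score_tokens(tokens):
    """Score known two-word phrases first, then fall back to single words."""
    scores, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASE_SCORES:
            # A known phrase consumes both tokens, so "heart" alone is never scored.
            scores.append(PHRASE_SCORES[pair])
            i += 2
        else:
            if tokens[i] in WORD_SCORES:
                scores.append(WORD_SCORES[tokens[i]])
            i += 1
    return scores

tokens = "my best friend has been killed by a heart attack".split()
print(score_tokens(tokens))  # [8.0, 1.6, 1.5]
```

Matching phrases before words keeps "heart attack" from being read as a positive "heart" plus a mildly negative "attack", but it still knows nothing about who did what to whom.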
I'm very surprised how easy it is to bring something down by just posting a link on HN. Simple things like blogs, but also most weekend projects, don't hold up. I've never had anything go down due to HN load, and I've never spent a dime on protecting it. Good code, a fetish for saving CPU cycles, and anticipation are what it takes.
The chart is not displayed if using Firefox. Switched to Safari and it works perfectly.
It's a bit of a shame that most demos now are only focusing on webkit/blink browsers... Please guys, think of folks like me who use the best browser in terms of Freedom ;)
> It's a bit of a shame that most demos now are only focusing on webkit/blink browsers... Please guys, think of folks like me who use the best browser in terms of Freedom ;)
Chrome here, wasn't displaying.
Edit: At home now, tried with Firefox, that works.
I'm not that surprised by the low score of 5/2/11, because double negatives are hard. It's actually hard to separate sentiment when interactions (such as double negatives) come up:
"A bomb killed a terrorist." Good news. Three negative words.
Many psychologists believe that the quickest parts of the human brain don't process double negatives at all-- that's why thinking "this is not going to kill me" doesn't help during a panic attack, but "this will end" does-- which is probably a small part of why news-watching (even positive news like Osama's death) makes people unhappy.
"My best friend has been killed by a heart attack." Three positive words. One negative. One (attack) that is slightly negative but has energetic/positive connotations.
"My best friend defeated cancer" vs. "Cancer defeated my best friend." Similar tokens; opposite meanings.
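The last pair can be checked directly. A small sketch, assuming a plain whitespace tokenizer (the `bag_of_words` helper is made up for illustration): once word order is thrown away, the two sentences are the same bag.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercased token counts; word order is discarded."""
    return Counter(text.lower().replace(".", "").split())

a = bag_of_words("My best friend defeated cancer")
b = bag_of_words("Cancer defeated my best friend")
print(a == b)  # True
```

Any score computed purely from these counts has to be identical for both sentences, whatever word list is used.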
What really surprises me is that it seems contrary to economic trends: in late 2008, when the economy was going to hell, the sentiment average goes up in a major way. Across 2009-13, while the economy slowly recovers, the sentiment level declines. Day-of-week average differences are very slight, but the more people are working, the more unhappy they are. This could mean that structurally unemployed people are self-deceptive, or tweet happier things because they have more time per tweet, or it could genuinely mean something.
His research specifically mentions the classification of n-grams as a future area of work. (I have been working on implementing his work in Python, so I have been diving into this over the past few weeks.)