Call me crazy, but things going UP is generally good. I don't see how you can logically describe something as having an "increase in worsening satisfaction".
"AT&T is ranked 1st out of 244 brands"... they must be AWESOME...oh wait, no they're not.
It says it's a customer satisfaction index, when it's actually a customer DISsatisfaction index.
Also, I don't mean to sink the boots in but...
"that is awesome that you got SVU to discuss the boycott of the pedophile book on Amazon! I cannot wait to see how it goes!"
Is that a good comment or a bad comment?
This is not how sentiment analysis works (or should work). I worked on something similar, a Naive Bayes-based sentiment analyzer: https://github.com/mohitranka/TwitterSentiment.
I also work for a company that is in the same space as groubalcsi.com (brand/product opinion mining).
Sentiment analysis is not a classification problem (like spam detection) but an identification problem, because sentiments are always associated with an entity (and an attribute, if specified).
For example, a tweet saying "Dell is not as good as apple" requires identifying the entities (Dell and Apple) and associating sentiments with them (negative and positive, respectively). It is incorrect to try to attach a single sentiment (whatever it may be) to the tweet itself, as the sketch below illustrates.
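To make that concrete, here is a minimal sketch in Python of the difference (the entity list and the "not as good as" heuristic are invented for illustration; a real system would use proper entity recognition and comparison parsing):

    # Toy entity-level sentiment: assign a label per entity, not per tweet.
    # ENTITIES and the comparison heuristic below are made-up examples.
    ENTITIES = {"dell", "apple"}

    def entity_sentiments(tweet):
        tokens = tweet.lower().split()
        found = [t for t in tokens if t in ENTITIES]
        sentiments = {}
        # Toy pattern: "X is not as good as Y" -> X negative, Y positive.
        if "not as good as" in tweet.lower() and len(found) == 2:
            sentiments[found[0]] = "negative"
            sentiments[found[1]] = "positive"
        return sentiments

    print(entity_sentiments("Dell is not as good as apple"))
    # {'dell': 'negative', 'apple': 'positive'} -- one tweet, two opposite
    # labels; a single tweet-level classification would lose this.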
Interesting but possibly flawed exercise. It would be good to show the entire set of brands sorted from bottom (i.e., good) to top (i.e., bad).
I sorted the data and present here two groups:
1. This is a sample of supposedly the most satisfying, from the best on down (er, up): TGI Fridays, Best Western, Zenith Electronics, JVC, Chili's, Denny's, Hampton Inn, Olive Garden, Applebee's, Sams Club, Yahoo, AOL.
2. By contrast, here is a sample of some of the worst, listed from the top (high dissatisfaction) on down: Wikipedia, Apple, Nokia, Facebook, Volkswagen, YouTube, Amazon, Nike, Sony, Ikea, Range Rover, Rolex, Porsche, Google, Netflix, Louis Vuitton, CNN, American Express, Wall Street Journal, Intel.
Groups 1 and 2 do not overlap in their scores, meaning that Intel (the best of the worst), at 404, has a higher dissatisfaction rating than AOL (the worst of the best).
This grouping does not make sense to me, because if you showed me the two lists above and asked which set had better satisfaction scores, I would have picked Group 2 over Group 1.
What could explain this? Perhaps there is demographic skew, in that down-market brands (Denny's, Sams Club, Zenith) are not talked about as much by upscale social-media people, who would rather complain about Apple, Sony, and Porsche.
Or perhaps there is a mismatch of expectations: people expect premium brands to deliver more and complain loudly when they fall short in the slightest, while expecting only a mediocre experience from down-market brands.
What are the units of dissatisfaction used throughout the page? How do they map to the y-axis of the dissatisfaction graph? What sense of scale do I need to understand the units? Is a 945 bad? How bad? Is hate linear? Since AAPL scores roughly half as much as AT&T, does that mean the average Twitterer hates AAPL half as much? What happens if someone scores a perfect 1000? Can they be hated no further?
What time zone is the next update measured in? What makes your classifier 'Bayesian' besides just using something called a 'Naive Bayes Classifier'? What is the 90% accuracy determined from? Why should I care? Is a 24-hour improvement in customer satisfaction a significant thing? How quickly does hate fluctuate? What is your uncertainty in each of these measurements? Is there an overall brand hate level that I can compare these things to? How are they affected by overall sentiment toward companies?
------
It's an interesting complementary site to your primary interest in Groubal. I'm just skeptical of sentiment-analysis methods in general: analyzing data properly is very hard. Applying tools and observing what happens is still interesting, though.
But I'm not sure I learned a whole lot from seeing graphs proclaiming that Twitterers dislike AT&T, Time Warner, banks, internet providers, and Zynga. Tylenol and Enterprise were interesting finds, though I have no idea what it means for Tylenol to be 100 units less hated.
So perhaps what you should tune your ML stuff to seek out is not some hard-to-quantify measure of dissatisfaction, but cases like Tylenol and Enterprise, where people might not expect to have such trouble with the brand. Then it becomes automatic, insightful rabble-rousing instead of methodologically sparse hate-ranking.
If anyone's interested, we're using the Google Graph API for all the graphs (the sparklines and the big transparent ones at the top), and the Bayesian stuff is based on the PHP work I wrote up here: http://danzambonini.com/self-improving-bayesian-sentiment-an...
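For the curious, here's a minimal Naive Bayes sketch in Python showing the shape of the general technique (the training tweets are invented; this is the textbook algorithm, not the PHP implementation from the link):

    import math
    from collections import Counter

    # Toy Naive Bayes sentiment classifier with add-one smoothing.
    # Training data is made up for illustration.
    train = [
        ("i love this phone", "pos"),
        ("great service fast shipping", "pos"),
        ("terrible support never again", "neg"),
        ("my connection keeps dropping", "neg"),
    ]

    word_counts = {"pos": Counter(), "neg": Counter()}
    class_counts = Counter()
    for text, label in train:
        class_counts[label] += 1
        word_counts[label].update(text.split())

    vocab = set(w for c in word_counts.values() for w in c)

    def classify(text):
        scores = {}
        for label in class_counts:
            # log prior + sum of smoothed log likelihoods
            score = math.log(class_counts[label] / sum(class_counts.values()))
            total = sum(word_counts[label].values())
            for w in text.split():
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

    print(classify("love the fast shipping"))   # pos
    print(classify("support keeps dropping"))   # neg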
EDIT: Also, we're not really using it yet, but I thought it was interesting how easily you can calculate the 'agreement' on sentiment by using the MySQL STDDEV function (or similar) to work out the spread of sentiment per brand; there's a sketch of the idea below.
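Something like this, in Python for illustration (the brands and scores are invented; in MySQL it would amount to STDDEV(sentiment_score) with a GROUP BY brand on whatever table holds the per-tweet scores):

    import statistics
    from collections import defaultdict

    # 'Agreement' as the per-brand standard deviation of sentiment scores.
    # Low stddev = Twitterers mostly agree; high stddev = opinion is split.
    # Data below is invented for illustration.
    scores = [
        ("att", -0.9), ("att", -0.8), ("att", -0.85),     # consistent dislike
        ("apple", -0.9), ("apple", 0.8), ("apple", 0.7),  # split opinion
    ]

    by_brand = defaultdict(list)
    for brand, score in scores:
        by_brand[brand].append(score)

    for brand, vals in by_brand.items():
        # pstdev matches MySQL's STDDEV (population standard deviation)
        print(brand, round(statistics.pstdev(vals), 3))
    # att comes out low (agreement); apple much higher (disagreement).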
Thanks so much - I knew I could rely on HN'ers to find these things. I'm hoping the Rogers one is a one-off (we're just adding it this morning), but I'll double check all of this. Thanks again.
EDIT: just fixed the 'TED' (company doesn't exist) issue. Thanks!
EDIT2: just fixed the Rogers issue too. Thanks! (Plus, I love Coda for making my life easier/faster for versioning and uploading changes!)
Pretty cool. I would change high meaning bad and low meaning good, unless it's a rank out of the total; it's a bit counterintuitive. Why do you place the emphasis on dissatisfaction instead of giving the option to look at both?
Yeah, certainly the 'high = bad' thing is something we grappled with (and still do). The site is a sister-site to a consumer-complaint/petition website (http://www.groubal.com/), hence we're more interested in measuring/highlighting who is doing 'badly'. But yes, this could be done in a more intuitive way (showing the 'bottom' of a graph that had the axis in the traditional orientation, for example).
Amount of dissatisfaction is too mushy IMHO. Just title it "Crapometer" or "Hate-o-meter".
Better yet, just flip the Y-axis. I'd think that it would be easier to get a company to pay to improve upwards. Do you really have to match the sister site?
Something came to mind that will skew this heavily. People mention a company by name mainly for one of two reasons: either to complain or to tell people about some cool new thing. If someone mentions a ubiquitous company like Google, Verizon, etc., it's usually to complain; they're probably not telling the world about the wonders of Google search. On the other hand, if someone mentions a smaller company, it's probably the cool-new-thing factor.
Is the time span so short just because of an initial lack of data? If not, I think it would be useful to extend the graph's span beyond a week to show the long-term trend. For some of the lines I see high fluctuations, so the graph is not very meaningful.
I don't know if it's intentional, but I would expand your scope beyond complaints. If your math is good you could have a very nice reputation tracker and analytics package in general.
It picks up on any Twitter/FB updates that we can measure a discernible 'sentiment' for, so although it's limited to companies/brands at the moment, it could certainly be used for other things.
Just an example: some of the latest sentiment on Google, which would suggest why it hasn't got a great ranking (though it's not a terrible ranking either):
Ok, my phone or Google Voice is unable to pick up calls when I press '1' to accept, and I can't connect via Gizmo5. WTF?
Google gps sucks all of a sudden
WTF? I open up Google and the first thing i see is "Will Justin Bieber get naked for Love Magazine?" and im like WHAT???
"AT&T is ranked 1st out of 244 brands"... they must be AWESOME...oh wait, no they're not.
It says it's a customer satisfaction index, when it's actually a customer DISsatisfaction index.
Also, I don't mean to sink the boots in but... "that is awesome that you got SVU to discuss the boycott of the pedophile book on Amazon! I cannot wait to see how it goes!" Is that a good comment or a bad comment?