Well, correlation certainly isn't causation, but being a data nerd, I had to take a look. I grabbed Anne Hathaway's web presence from Google Trends and Berkshire Hathaway's performance from Google Finance. After a little massaging, I dumped the numbers into R, and there is indeed a reliable correlation between Anne's (per-week) web presence and B-H's weekly average close (or closest preceding close).
Pearson's product-moment correlation
data: hath$AnneTrend and hath$BerkClose
t = 4.6739, df = 373, p-value = 4.135e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.1372140 0.3286587
sample estimates:
cor
0.2352165
That's an r of about 0.24 (so an R^2 of only about 0.06), which isn't much, but it's not completely spurious; Scarlett Johansson doesn't show the same pattern (there's a weak correlation, but it's not reliable at p < 0.05).
Pearson's product-moment correlation
data: hath$ScarlettTrend and hath$BerkClose
t = 1.7687, df = 373, p-value = 0.07775
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.01016440 0.19071018
sample estimates:
cor
0.09120053
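As a sanity check on the two outputs above, the t statistic and the confidence interval can be recomputed from nothing but the reported correlation and degrees of freedom. The commenter used R; this is only a Python sketch of the same arithmetic, assuming the interval comes from the usual Fisher z-transformation with a normal critical value. The function name `t_and_ci` is made up for illustration.

```python
import math
from statistics import NormalDist

def t_and_ci(r, df, level=0.95):
    """Return (t statistic, Fisher-z confidence interval) for a Pearson r."""
    n = df + 2                                  # a correlation test has n - 2 df
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    z = math.atanh(r)                           # Fisher transformation of r
    se = 1 / math.sqrt(n - 3)                   # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (math.tanh(z - crit * se), math.tanh(z + crit * se))
    return t, ci

# Correlations and df copied from the outputs quoted above.
anne_t, anne_ci = t_and_ci(0.2352165, 373)
scarlett_t, scarlett_ci = t_and_ci(0.09120053, 373)

print("Anne:     t = %.4f, CI = (%.4f, %.4f)" % (anne_t, *anne_ci))
print("Scarlett: t = %.4f, CI = (%.4f, %.4f)" % (scarlett_t, *scarlett_ci))
```

The only difference between the two cases is whether the interval straddles zero: Anne's does not, Scarlett's does, which is the same thing the p-values are saying.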
However, Scarlett's career hasn't mirrored Anne's all that well. What about someone a little closer to Anne, say, her Oscar co-host, James Franco? Note that apart from that, the two haven't really done anything together (http://imdb.to/g6vwbp), although I'd say they've both come to fame recently.
Pearson's product-moment correlation
data: hath$JamesTrend and hath$BerkClose
t = 4.5991, df = 372, p-value = 5.826e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.1336854 0.3256934
sample estimates:
cor
0.2319475
And that might be the clincher. James's little ups and downs shouldn't match Anne's, and yet his recent rise to prominence seems to share the same slight relationship to B-H's stock value. Thanks for reading!
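One way to read the James Franco result is that any series that drifts upward over the same period will correlate with any other. A rough sketch of that effect, using purely synthetic data (none of these numbers come from Google Trends or Google Finance): two independent noisy series that share nothing but an upward trend correlate strongly in levels, while their week-to-week changes barely correlate at all. The slope, noise level, and seed are arbitrary assumptions.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def diffs(xs):
    """Week-to-week changes of a series."""
    return [b - a for a, b in zip(xs, xs[1:])]

rng = random.Random(0)   # fixed seed so the run is repeatable
weeks = 375              # roughly the sample size in the outputs above

# Two synthetic series: same upward drift, independent noise.
a = [0.1 * i + rng.gauss(0, 1) for i in range(weeks)]
b = [0.1 * i + rng.gauss(0, 1) for i in range(weeks)]

level_r = pearson(a, b)                  # dominated by the shared trend
change_r = pearson(diffs(a), diffs(b))   # trend removed by differencing

print("correlation of levels:  %.3f" % level_r)
print("correlation of changes: %.3f" % change_r)
```

Running the same comparison on the real weekly changes, rather than the levels, would be the natural next check.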
Is there life on Mars? I talked to an employee at NASA, and he said, "Probably not, but I guess it's not out of the question." Then I wrote a story about it.
Dang! Just from reading the headline, I was about to change my name to Goldman Sachs. Heck, I'll do it anyway. With a name like Goldman you can't go wrong.
Oct. 3, 2008 - Rachel Getting Married opens: BRK.A up .44%
Jan. 5, 2009 - Bride Wars opens: BRK.A up 2.61%
Feb. 8, 2010 - Valentine's Day opens: BRK.A up 1.01%
March 5, 2010 - Alice in Wonderland opens: BRK.A up .74%
Nov. 24, 2010 - Love and Other Drugs opens: BRK.A up 1.62%
Nov. 29, 2010 - Anne announced as co-host of the Oscars: BRK.A up .25%
I'm not a statistician, but 6 dates could hardly be considered a correlation. I would also think that no Anne Hathaway news should result in no positive price changes for a correlation to exist.
Seems like the article confuses correlation with "funny coincidence".
And it certainly doesn't "drive" the stock, as the headline states. That implies causation, which is an even stronger claim than correlation.
I would also think that no Anne Hathaway news should result in no positive price changes for a correlation to exist
That's not correct. Imagine the stock price can be derived from two factors A and B such that price = A + B. A is correlated with the price, but a positive price change could also be caused by B. So no change in A does not guarantee no change in price. And in the case of BRK.A there would be many such factors.
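The price = A + B argument can be made concrete with a tiny made-up example. The numbers below are invented purely for illustration: A still correlates clearly with the price, yet the price rises on several days when A did not move at all, because B moved instead.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented toy factors; price is their sum, as in the comment above.
A = [1, 1, 2, 2, 3, 3]
B = [0, 1, 0, 1, 0, 1]
price = [a + b for a, b in zip(A, B)]   # [1, 2, 2, 3, 3, 4]

r_a_price = pearson(A, price)

# Days on which the price went up although A stayed flat.
up_without_a = sum(
    1
    for i in range(1, len(price))
    if A[i] == A[i - 1] and price[i] > price[i - 1]
)

print("corr(A, price) = %.3f" % r_a_price)
print("price rises with A unchanged:", up_without_a)
```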
well, if you look at the chart, BRK.A has been climbing that whole time. someone less lazy than me could count up how many days it went up, and how many days it went down, and give you a good number. let's just say there are 100 down days and 200 up days.
what are the chances of selecting 6 up days without replacement? 200/300 * 199/299 * 198/298 * 197/297 * 196/296 * 195/295 ~ 0.085
(i'm also not a statistician, and i'm not real sure i did that right.)
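For what it's worth, the arithmetic above can be checked in a couple of lines. This sketch uses the commenter's guessed counts (200 up days, 100 down days), not real BRK.A data, and computes the chance that six days drawn without replacement are all up days; the with-replacement figure is included only for comparison.

```python
from math import comb

# Guessed counts from the comment above, not measured from the chart.
up_days, down_days, picks = 200, 100, 6
total = up_days + down_days

# Without replacement: ways to pick 6 up days / ways to pick any 6 days.
# This equals 200/300 * 199/299 * ... * 195/295.
p_without = comb(up_days, picks) / comb(total, picks)

# With replacement, for comparison: (2/3)^6.
p_with = (up_days / total) ** picks

print("without replacement: %.4f" % p_without)
print("with replacement:    %.4f" % p_with)
```

Either way it comes out a bit under 9%, so six up days in a row on hand-picked dates is not very surprising for a stock that rose on two days out of three.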
No it does not. The linked article relies on a naive assumption about NLP and is devoid of any content.
To substantiate: entity resolution [1] is a widely studied research area in NLP, and it's fair to say that researchers aren't as naive in their methods as the linked article would suggest.
But even if naive "NLP" methods were being used, it would be silly not to mine the link structure of news articles (or sources) to disambiguate Anne Hathaway and Berkshire Hathaway [2]: there aren't many news sources that mention both in the same article, and they are outweighed by the ones that mention only one or the other (e.g., Us Weekly vs. WSJ).
But even if there were some habitual false positives, no researcher or programmer worth their salt would unleash their mining algorithm on the whole web. More likely, you would target low-latency news sources like Bloomberg's financial feed.
So I would say, no -- Anne Hathaway news almost certainly does not drive Berkshire's stock if text mining and NLP are the methods, and shame on the Atlantic.
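To make the distinction concrete, here is a toy sketch of the difference between the naive matching the article imagines and even a slightly more careful filter. It is nowhere near real entity resolution; the headlines, the source labels, and both matcher functions are invented for illustration.

```python
import re

# Hypothetical headlines with a hypothetical source category each.
headlines = [
    ("Anne Hathaway to co-host the Oscars", "entertainment"),
    ("Berkshire Hathaway reports quarterly earnings", "finance"),
    ("Hathaway shines at film premiere", "entertainment"),
]

def naive_match(text):
    """Flag anything that mentions the surname at all."""
    return "hathaway" in text.lower()

# Require the full company name or a ticker-like token such as BRK.A / BRK.B.
COMPANY = re.compile(r"\bberkshire hathaway\b|\bBRK\.[AB]\b", re.IGNORECASE)

def stricter_match(text, source_kind):
    """Flag only finance-source items that name the company explicitly."""
    return source_kind == "finance" and COMPANY.search(text) is not None

naive_hits = [t for t, _ in headlines if naive_match(t)]
strict_hits = [t for t, kind in headlines if stricter_match(t, kind)]

print("naive matcher flags:   ", len(naive_hits))
print("stricter matcher flags:", len(strict_hits))
```

Even this crude filter drops the two actress headlines, which is the point being made above: the confusion only survives if nobody bothers to disambiguate.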
It doesn't have to be NLP gone wrong. It could be a bad algorithm based on web traffic, picking up accidental clicks from Anne Hathaway fans (or some similar shared metric).
It's also possible that this is a natural effect of the market -- that people seeing Anne Hathaway headlines are subconsciously reminded to do something about their stock holdings. In which case, bad NLP would actually work better than good NLP. :)
Most algo trading is going to use Bloomberg or Reuters data, since they are very "clean" data sources that tend to focus on the finance sector rather than a ton of Hollywood gossip. Generally the articles are tagged with the actual stocks they mention, so you would have to be doing it on purpose.
Sorry, I mean an algorithm based, for example, on the number of clicks on Berkshire Hathaway articles on a financial site being influenced by Anne H. fans clicking on the wrong search result.
Most, yes, but since everyone is doing this, to stay ahead, you have to do something else if you want to do better than them. Hence mining the "regular" internet instead.
However, it only takes one trader with a naive NLP approach to make "Anne Hathaway" news legitimate Berkshire Hathaway news. That is, one dumbass with a big enough bankroll buying on Anne Hathaway news makes Anne Hathaway news relevant to the market.
I'm not ruling out malice, but wouldn't you say that naivete would be a fairly unlikely reason for this at a place with a big enough bankroll? I've met some hedge fund types and they may be many things, but they're not idiots.
The headline case seems highly unlikely, but there is a fund in London that trades on Twitter sentiment, an idea which sounds like something out of a Bruce Sterling satire. You can always sell shovels to traders and the media, though - hire a few PhDs and a graphic designer and crank out some infographics and popular attention is assured. Emphasize risk and caution regularly and sooner or later events will prove you to be a genius, at which point you can do an IPO.
BRK has a market cap of over 200 billion. To make the price move significantly, there would need to be pretty big trades, no? If huge trades are made on the basis of random names mentioned online, that could explain a lot of our troubles with capital allocation...
It seems to me that if one algorithm did this, other algorithms would start to notice a correlation between Anne Hathaway news and BRK stock, so they would also want to buy when there is Anne Hathaway news. Something like this could easily "infect" large portions of NLP that way if there is insufficient damping in the system.
Coincidence strikes again. Someone should convince journalists that correlation is not causation; not even when some other random fact happens to suggest a connection to our overly enthusiastic pattern-seeking minds.