No, it does not. The linked article relies on a naive assumption about NLP and is devoid of any content.
To substantiate: entity resolution [1] is a widely studied research area in NLP, and it's fair to say that researchers aren't as naive in their methods as the linked article would suggest.
But even if naive "NLP" methods were being used, it would be silly not to mine the link structure of news articles (or sources) to disambiguate Anne Hathaway and Berkshire Hathaway [2]: few news sources mention both in the same article, and they are far outnumbered by the ones that mention only one or the other (e.g., Us Weekly vs. WSJ).
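To make that concrete, here is a minimal sketch of disambiguating the two Hathaways from the surrounding words. Note that this is a simpler stand-in (counting context cue words), not the link-structure method from [2], and the cue lists and the function name `resolve_hathaway` are made up for illustration.

```python
import re

# Hypothetical cue words, chosen by hand for illustration only.
# A real system would learn these from labeled articles or source metadata.
CONTEXT_CUES = {
    "Anne Hathaway": {"actress", "film", "movie", "oscar", "premiere", "hollywood"},
    "Berkshire Hathaway": {"berkshire", "buffett", "shares", "stock", "earnings", "insurance"},
}


def resolve_hathaway(text):
    """Return the entity whose cue words occur most often in the text.

    Returns None when no cue word appears or when the two entities tie,
    so an ambiguous mention is never silently assigned to either one.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {
        entity: sum(1 for token in tokens if token in cues)
        for entity, cues in CONTEXT_CUES.items()
    }
    best = max(scores.values())
    winners = [entity for entity, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]
```

Even something this crude refuses to guess on a bare "Hathaway" headline, which is the case the article worries about.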
But even if there were some habitual false positives, no researcher or programmer worth their salt would unleash their mining algorithm on the whole web. More likely, you would target low-latency news sources like Bloomberg's financial feed.
So I would say, no -- Anne Hathaway news almost certainly does not drive Berkshire's stock if text mining and NLP are the methods, and shame on the Atlantic.
It doesn't have to be NLP gone wrong. It could be a bad algorithm based on web traffic, picking up accidental clicks from Anne Hathaway fans (or some similar shared metric).
It's also possible that this is a natural effect of the market -- that people seeing Anne Hathaway headlines are subconsciously reminded to do something about their stock holdings. In which case, bad NLP would actually work better than good NLP. :)
Most algo trading is going to use Bloomberg or Reuters data, since they are very "clean" data sources that focus on the finance sector and carry very little Hollywood gossip. Generally the articles are tagged with the actual stocks they mention, so you would have to be picking up Anne Hathaway stories on purpose.
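As a rough illustration of why tagged feeds sidestep the name problem: if each item carries ticker tags, you never match on the word "Hathaway" at all. The item layout below (a dict with a `tickers` list) is an assumption for the sketch, not the actual Bloomberg or Reuters format.

```python
# Berkshire Hathaway's two share classes.
BERKSHIRE_TICKERS = {"BRK.A", "BRK.B"}


def berkshire_items(feed_items):
    """Keep only the feed items tagged with a Berkshire ticker.

    Assumes each item is a dict with an optional "tickers" list
    (a hypothetical schema); the headline text is never inspected.
    """
    return [
        item
        for item in feed_items
        if BERKSHIRE_TICKERS & set(item.get("tickers", []))
    ]
```

With this kind of filter, an untagged "Anne Hathaway hosts the Oscars" item simply never reaches the trading logic.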
Sorry, I mean an algorithm based, for example, on the number of clicks on Berkshire Hathaway articles on a financial site being influenced by Anne H. fans clicking on the wrong search result.
Most, yes, but since everyone is doing this, to stay ahead, you have to do something else if you want to do better than them. Hence mining the "regular" internet instead.
However, it only takes one trader with a naive NLP approach to make "Anne Hathaway" news legitimate Berkshire Hathaway news. That is, one dumbass with a big enough bankroll buying on Anne Hathaway news makes Anne Hathaway news relevant to the market.
I'm not ruling out malice, but wouldn't you say that naivete would be a fairly unlikely reason for this at a place with a big enough bankroll? I've met some hedge fund types and they may be many things, but they're not idiots.
[1] http://en.wikipedia.org/wiki/Name_resolution#Name_resolution...
[2] (PDF) http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66....