"Ask them what error rate they get on MNIST or ImageNet"

While I agree that Numenta probably doesn't have any sort of full-fledged AI, the human brain does terribly on MNIST and ImageNet compared to the state of the art. So we would fail that test.

Getting stuck on toy problems like ImageNet, and over-optimizing solutions that can't be applied more generally (except as dumb preprocessors), is unlikely to lead in the most interesting directions, even if it's incredibly useful and profitable in the meantime.


Humans appear to do quite well on ImageNet (anecdotally, one person got 5.1% top-5 error: http://karpathy.github.io/2014/09/02/what-i-learned-from-com...). Of course, recent deep models do better than that, but the author opines (and I agree) that an ensemble of trained human annotators would beat the best deep models.
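For context, that 5.1% figure is top-5 error: a prediction counts as correct if the true label appears among the five highest-ranked guesses. A minimal sketch of the metric in NumPy (the arrays below are illustrative placeholders, not Karpathy's actual data):

    import numpy as np

    def top5_error(scores, labels):
        # scores: (n_examples, n_classes) array of class scores
        # labels: (n_examples,) array of true class indices
        top5 = np.argsort(scores, axis=1)[:, -5:]    # five highest-scoring classes
        hit = (top5 == labels[:, None]).any(axis=1)  # true label among them?
        return 1.0 - hit.mean()

    # Toy usage: random scores over 1000 classes should give ~99.5% top-5 error.
    rng = np.random.default_rng(0)
    print(top5_error(rng.random((200, 1000)), rng.integers(0, 1000, 200)))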

MNIST is the true toy dataset; it doesn't really tell you much about your algorithm's performance. While there aren't any reported human evaluations on MNIST, LeCun estimates the human error rate at 0.2%, better than any deep model (admittedly without justification: http://yann.lecun.com/exdb/publis/pdf/lecun-95a.pdf).
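To see why MNIST says so little: even a plain linear classifier on raw pixels gets over 90% accuracy, leaving only a few points of error for algorithms to compete over. A rough sketch assuming scikit-learn (fetch_openml downloads the data over the network; the ~7-8% error is a typical ballpark, not a reported benchmark):

    from sklearn.datasets import fetch_openml
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # 70k 28x28 digit images, flattened to 784 features; scale pixels to [0, 1].
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    X_train, X_test, y_train, y_test = train_test_split(
        X / 255.0, y, test_size=10000, random_state=0)

    clf = LogisticRegression(max_iter=200).fit(X_train, y_train)
    print("test error:", 1.0 - clf.score(X_test, y_test))  # roughly 0.07-0.08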



