
"They’ve also found that the models tend to become more accurate the more data they consume. That may be the next big goal for Google: building AI models that are based on billions of data points, not just millions. "

I'm not versed in machine learning, but it looks to me that any model whose output quality is dependent on the quantity of data it ingests is deeply flawed. There's no doubt a bigger number of samples will make the predictions more accurate, but isn't the challenge to develop a system that is as accurate as possible regardless of the number of data points it's fed, like the human brain?




If you think the human brain is so sophisticated that it can perform its cognitive duties with little data, that is simply wrong. While it is definitely not a simple organic construct, it does get stimulated significantly all the time. See [1] for what happens when you cut out those stimuli.

Regarding artificial systems, I think more data is the only way to reach super-performing classifiers. The data you supply doesn't have to be big, but the features you extract from that raw data should be. For example, a method called Integral Channel Features [2] is designed to work exactly that way (a rough sketch follows the links below).

[1] http://en.wikipedia.org/wiki/Sensory_deprivation

[2] http://pages.ucsd.edu/~ztu/publication/dollarBMVC09ChnFtrs_0...
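
To make the "extract lots of features from the raw data" idea concrete, here's a toy sketch in Python/numpy. This is my own simplification, not the paper's code: the channels are just the grayscale image and a few gradient maps (the paper uses richer LUV + gradient-histogram channels), and each feature is a rectangular sum computed in constant time from an integral image.

    import numpy as np

    def channels(gray):
        # A few simple "channels" derived from a grayscale image: the image
        # itself, gradient magnitude, and the absolute x/y gradients.
        g = gray.astype(float)
        dy, dx = np.gradient(g)
        return [g, np.hypot(dx, dy), np.abs(dx), np.abs(dy)]

    def rect_sum(channel, r0, c0, r1, c1):
        # Sum of a channel over the rectangle [r0:r1, c0:c1), computed in O(1)
        # from its zero-padded integral image (cumulative sums over rows/cols).
        ii = np.pad(channel.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
        return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

    # Toy usage: pool rectangular sums over every channel into one feature vector.
    gray = np.random.rand(64, 64)
    features = [rect_sum(ch, 8, 8, 24, 24) for ch in channels(gray)]

So a small raw input fans out into a large number of cheap features, which is the "extract big data from small data" point above.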


Your first statement here is not true.

Humans are excellent at learning from very few or even 1 example. Show a toddler a single image of an elephant and the toddler will generalize perfectly on new examples; show a machine a few thousand images of elephants and it might generalize decently if your machine is really clever.

There are very few tasks where machine systems achieve anything resembling human level performance. But on all such tasks, the machine requires far more data and still underperforms.


That toddler has already processed lots of visual image data, examples of objects, nonliving and living, animals, mammals, etc. Don't you think that constitutes a large, important dataset for the problem of elephant recognition?


Yeah, I agree. When the child is shown the labeled example of an elephant and infers the traits that make an elephant an elephant, her previous visual experiences provide background knowledge that restricts the space of hypotheses she considers. After all, there's an infinite set of logically consistent hypotheses.

Nevertheless, if you provide your machine system with the video of all the child's visual input, it still won't generalize well from single examples, the way children do effortlessly.


This reminds me of the saying: It took me ten years to become an overnight success.

Humans generalize well from few examples because, well, they've already processed billions of examples. A toddler may have never seen an elephant before, but it may have seen cars, trucks, birds, dogs, people, trees, skies, buildings, etc., giving it concepts for bigness, smallness, aliveness, humanness and much else. With all these concepts in place, then yes, it becomes easy to see what makes an elephant distinct from a dog or a person. And so it would be for an artificial neural network.

An interesting fact is that newborns have very few concepts to begin with. It takes them some months, for instance, to learn to differentiate between animate and inanimate things (the family cat vs. a teddy bear).


It does, but it's unsorted and then post-processed.

If you show a child 1,000 images of animals, then show different photographs of animals and tell the child what animal each photograph shows, you can go back to the original 1,000 and the explained ones will likely be recognized, despite never being initially sorted or modeled as such.

Going from 5-10 examples to 1,000,000 is what computers have a problem with. They go from 1,000 to 1,000,000 easily, or even from millions to billions.


You're describing unsupervised training. It works with computers too, as in this article on using Google Brain to build an unsupervised image classifier. http://www.wired.com/2012/06/google-x-neural-network


Unsupervised training is the deep-learning equivalent of context clues. It sees people talking about cats, sees an image, and guesses cat. It sees similar images plus similar words and eventually builds a prototype of the "cat" type.

What I'm talking about is that a child can see a photo of an unknown animal; I can then show that child a cartoon elephant (the same animal as in the original photo), ask what the original animal was, and the child will likely respond correctly.

Reprocessing of already-learned data as the schema of the world changes based on new information.


This is the ability to abstract concepts and then recognize them in different settings (for instance, the idea of a child being a miniature version of a given animal, with less pronounced traits). In order to understand the clues you are talking about, an AI first has to be familiar with the terms used in the discussed topic, so that it can construct a definition by itself (what is "miniature", "traits", "pronounced"?). These terms' definitions must be synthesized somehow beforehand, or perhaps as the discussion goes on, but then the amount of information in that discussion must be much larger for the AI to untangle them properly.


Using unsupervised learning (the same as humans) there are machines that can learn from a picture of a single elephant. You first learn a compressed "representation" of images with fewer dimensions than an entire bitmap. Then you can compare that small vector to others.
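
A minimal sketch of that idea in Python, with PCA standing in for whatever unsupervised representation learner you prefer, and random arrays as placeholders for real images:

    import numpy as np

    rng = np.random.default_rng(0)
    unlabeled = rng.random((1000, 32 * 32))   # lots of unlabeled images, flattened
    elephant = rng.random(32 * 32)            # the single labeled example
    query = rng.random(32 * 32)               # a new image to recognize

    # Learn a compressed representation from the unlabeled pile.
    mean = unlabeled.mean(axis=0)
    _, _, Vt = np.linalg.svd(unlabeled - mean, full_matrices=False)
    encode = lambda img: (img - mean) @ Vt[:32].T     # 32-dim code

    # "One example" comparison: cosine similarity between the small codes.
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    print(cosine(encode(query), encode(elephant)))

The heavy lifting (learning the representation) still needs lots of unlabeled data; only the final labeled comparison is "one-shot".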


I can only speak for my situation, but my daughter saw easily hundreds and maybe thousands of examples of elephants, including real live ones at the zoo, before she could ever seem to use the abstract concept to identify new examples.


You're sort of describing the problem of "overfitting", which is now very well understood in machine learning circles. That's when you get a model that describes its training data very well but doesn't generalise well.

The thing about using lots of data is that prior to publication of "The Unreasonable Effectiveness of Data" in 2009, most people did think that good algorithms were the most important thing. What that research showed was that a bad algorithm given more data will eventually outperform "better" algorithms, at least when those algorithms are initially judged based on their performance on smaller datasets.

So what happened with neural nets was that after some initial excitement about how they were more like the human brain, etc., it was found that "stupider" algorithms actually performed better and ANNs were written off for a while. It turns out that the reason NNs were performing badly was that they weren't being fed enough data.

Nowadays it's pretty easy to saturate a feed-forward neural network with data, to the point where its performance will never get much better. Deep learning techniques allow you to train bigger and more complex models with more data, but these more complex neural nets won't perform very well unless you feed them tons of data.

So in reference to your point about the brain, the thing about brains is that they actually learn based on massive amounts of data too. Think about how much data you have from continually streaming video ~16 hours/day, plus sound, plus touch, proprioception, and other inputs, over the course of many years.

Deep learning tries to emulate this to some degree with "pre-training", which is where you feed lots of data into a deep network and have it learn "something" (it learns by itself at this stage). Then you start teaching it more complicated, high-level concepts. This pre-training allows it to do things like recognise common patterns in images, which the later training allows it to then associate with semantic ideas like "this is an apple", "this is a person", etc.
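
A toy sketch of the two stages in Python (scikit-learn). PCA here is only a stand-in for unsupervised pre-training of a deep net, and the random arrays stand in for real images and labels:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    unlabeled = rng.random((10000, 784))       # plenty of cheap unlabeled data
    labeled_X = rng.random((100, 784))         # far fewer labeled examples
    labeled_y = rng.integers(0, 2, size=100)

    # Stage 1, "pre-training": learn something about the data by itself.
    pretrained = PCA(n_components=50).fit(unlabeled)

    # Stage 2: attach semantic labels using the learned representation.
    clf = LogisticRegression(max_iter=1000).fit(
        pretrained.transform(labeled_X), labeled_y)
    print(clf.predict(pretrained.transform(labeled_X[:5])))
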

TL;DR: What seems to work best is fairly "dumb" algorithms, scaled up to be able to handle vast amounts of information and fed a ton of data to learn from. This is also how the human brain works.


No. The outcome is the goal.

It's rapidly becoming apparent that some algorithms (e.g. deep-learning-related models) work much better at scale than on small amounts of data. It doesn't make sense to discount these better algorithms just because they don't work as well as other models when tested against less data.

It is also apparent that these models require significantly more computing power to perform well than other models. That doesn't make them less worthy, just a cost people must consider.

It turns out that intelligence is hard.


That's another good point. The required computing power that is now cheap and widely available has changed our ability to even try these methods.


Every model's output quality is dependent on the quantity of data it ingests.

Statistics developed as a science largely out of the need to draw sound conclusions when large samples were too expensive to collect. Machine learning has taken off as a direct result of the field's ability to take advantage of, and get serious performance gains from, the massive amounts of data being generated and leveraged recently.

Here is the best summation I can reference, and I can tell you from personal experience it is very true:

"The accuracy & nature of answers you get on large data sets can be completely different from what you see on small samples. Big data provides a competitive advantage. For the web data sets you describe, it turns out that having 10x the amount of data allows you to automatically discover patterns that would be impossible with smaller samples (think Signal to Noise). The deeper into demographic slices you want to dive, the more data you will need to get the same accuracy."

http://www.quora.com/Big-Data/Why-the-current-obsession-with...
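
A quick simulation of the signal-to-noise point in Python (all numbers made up): estimating a rate inside a 1% demographic slice gets dramatically tighter with 100x the data, simply because the slice itself finally contains enough samples.

    import numpy as np

    rng = np.random.default_rng(0)
    true_rate, slice_share = 0.03, 0.01    # hypothetical click rate and slice size

    for n in (10_000, 1_000_000):
        in_slice = rng.random(n) < slice_share
        clicks = rng.random(n) < true_rate
        k = in_slice.sum()
        est = clicks[in_slice].mean()
        stderr = np.sqrt(est * (1 - est) / k)
        print(f"n={n:>9}  slice size={k:>6}  estimate={est:.3f} +/- {stderr:.3f}")
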


I completely agree with this and, as I said, it's obvious that more data produces better predictions, even with simple models. My point is that it seems backwards to me to put effort into finding more and better data (creating a corpus for a given subject is a challenge in itself) instead of trying to come up with a model that infers more and produces better predictions from less data. Once you have such a model, you can surely collect and feed it a lot of data to improve the output, but until then, why even bother?


It's not like they aren't trying to improve the model as well, all the time. It's just saying that right now the benefit of getting more data for existing (already very sophisticated) models is greater than the incremental benefits of model improvements given existing data.


Bingo. More data beats better algorithms, see

http://anand.typepad.com/datawocky/2008/03/more-data-usual.h...


It may seem counterintuitive, but that's a relatively common result in machine learning. Often the issue isn't so much the quantity of data as how representative the training data used to create the model is. All other things being equal, a larger training set is likely to be more representative of the true distribution. Caveats abound, but that's the general idea.

One way to think about it is to look at problems with human perception like forced perspective. It's relatively easy to create a situation where the only available information results in mental models that describe the size of an object incorrectly. Given a different point of view (i.e. more information) the faults in the model become obvious.


How quickly does a model's accuracy and precision improve with increasing sample size? There are even counterexamples (like the periodogram estimator for the power spectral density) where the variance does not decrease with window length.
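
For the curious, that periodogram point is easy to check numerically. A quick numpy sketch (not a rigorous treatment): for unit-variance white noise, the spread of the raw periodogram at a fixed relative frequency stays roughly the same no matter how long the window gets.

    import numpy as np

    rng = np.random.default_rng(0)
    for n in (256, 4096, 65536):
        vals = []
        for _ in range(200):
            x = rng.standard_normal(n)                 # unit-variance white noise
            pxx = np.abs(np.fft.rfft(x)) ** 2 / n      # raw periodogram
            vals.append(pxx[n // 8])                   # same relative frequency each time
        print(f"n={n:>6}  periodogram variance across trials ~ {np.var(vals):.2f}")

The printed variance hovers around 1 for every window length, i.e. more data alone doesn't help until you average or smooth the estimator.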

We often don't know which model to use. Occam's Razor [1] can be effective in favouring simpler models, but I tend toward the view that a good data scientist is invariably needed to build good models. Hence I view Big Data more as a consulting business than SaaS.

[1] For an excellent Bayesian discussion on why Occam's Razor actually works, see Chapter 28 of David J.C. MacKay's book 'Information Theory, Inference and Learning Algorithms'.


Until you have reached a very large subset of all available information, more data allows you to make better predictions. Period. That is as true for machine learning as it is of the human brain.

You often want your models to also perform well when you have fewer data points. Those are two separate - if in effect related - design goals.


This is true to the extent that you are not overfitting your dataset. Neural networks and random forests are quite good at fitting anything! And they can still perform poorly on your validation set.
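
A tiny illustration in Python, with 1-nearest-neighbour regression standing in for an over-flexible model (it's just a cartoon, not a claim about any particular network):

    import numpy as np

    rng = np.random.default_rng(0)
    def target(x):                          # underlying signal: sine wave plus noise
        return np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=x.shape)

    x_train, x_val = rng.random(30), rng.random(30)
    y_train, y_val = target(x_train), target(x_val)

    # 1-nearest-neighbour regression memorizes the training set exactly
    # (zero training error) but chases the noise.
    def predict(x):
        return y_train[np.argmin(np.abs(x[:, None] - x_train[None, :]), axis=1)]

    print("train MSE:", np.mean((predict(x_train) - y_train) ** 2))   # exactly 0
    print("val   MSE:", np.mean((predict(x_val) - y_val) ** 2))       # noticeably worse
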


Overfitting will only occur when the dataset is too small for the model.


Not only then. Your example is too particular. I would say that overfitting tends to occur when you do not understand the underlying dynamics of the system you are trying to model. Any model with enough degrees of freedom can fit anything and still explain nothing.


Possibly when a brain starts getting redundant information, its predictions start to peak in accuracy.



Doesn't this further prove my point? If you're saying that some tasks, like NLP, are too complex to tame and you should just throw more data at it, you're basically capitulating to complexity and taking the easier route. Isn't that the opposite of what researchers should be doing?


There's no shame in taking a non-optimal path, and then working out how to get better at it later. Research is about solving problems, answering questions. Why should that have to be the hard way?


Have you ever taught a kid anything? I don't think there is a lot of evidence suggesting the human brain reliably learns without a large number of data points...

Seriously, Norvig has been big on this since forever: the reality is that consuming large amounts of data with relatively subtle features tends to be one of the few areas where computers can easily outclass the human brain.


It would be great to handle problems with only a little data, but "more is better" generally holds true. Also, the goal can be to calibrate the model with a lot of data and then have it operate on small amounts of data.



