(Note that this is quite a one-sided view of things, but it does convey the excitement of deep-learning researchers and the potential of what might be possible.)
Although I'm not in deep learning myself (I'm a computer vision researcher), here's a TL;DR as I understand it: rather than having people in specific domains such as computer vision or speech processing hand-design their own features, the idea is to take raw inputs (pixels, in the case of images) and train multi-layer neural net architectures that "learn" the relevant higher-level features in an unsupervised way (i.e., without labeled training data). Some of these architectures do seem to pull out interesting features and perform competitively on a few benchmarks in vision and other fields.
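To make the "learn features from raw inputs" idea concrete, here's a toy single-layer autoencoder in NumPy: it compresses raw pixel vectors into a small hidden code and is trained purely on reconstruction error, with no labels anywhere. Everything here (the data, sizes, learning rate, tied weights) is made up for illustration; real deep-learning systems stack several such layers and use far more sophisticated training.

```python
import numpy as np

# Illustrative only: random "pixel" data standing in for real images.
rng = np.random.default_rng(42)
X = rng.random((500, 64))            # 500 fake "images" of 8x8 raw pixels
n_hidden = 16                        # size of the learned feature code

W = rng.normal(0, 0.1, (64, n_hidden))
b, c = np.zeros(n_hidden), np.zeros(64)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for _ in range(200):
    H = sigmoid(X @ W + b)           # encode: hidden "features"
    Xhat = H @ W.T + c               # decode with tied weights
    err = Xhat - X                   # reconstruction error -- no labels used
    # Gradients of the squared-error loss w.r.t. the tied weights
    dH = (err @ W) * H * (1 - H)
    gW = X.T @ dH + err.T @ H
    W -= lr * gW / len(X)
    b -= lr * dH.mean(axis=0)
    c -= lr * err.mean(axis=0)

# After training, each row of H is an unsupervised feature
# representation of the corresponding input.
H = sigmoid(X @ W + b)
Xhat = H @ W.T + c
```

The point is just that the hidden code is learned from raw inputs alone; whether those learned features are actually useful for a downstream task is exactly what the benchmarks mentioned above try to measure.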
I'm not sold on this yet, because it seems like the complexity of designing features has merely been traded for the complexity of designing learning architectures, but it's certainly becoming quite popular these days (mostly led by Geoff Hinton of Toronto, Yoshua Bengio of Montreal, and Yann LeCun of NYU).
No, non-parametric learning is an unrelated topic, which seeks to estimate probability distributions without assuming a "parametric" model (i.e., one with a fixed functional form and a fixed number of parameters). The canonical example of a non-parametric approach is the histogram, or its generalization, the Parzen window (also known as kernel density estimation).
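For instance, here's a quick NumPy sketch of both estimators on synthetic 1-D data (the mixture data, bin count, and bandwidth are all made up for illustration):

```python
import numpy as np

# Hypothetical 1-D data drawn from a mixture of two Gaussians.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(1, 1.0, 300)])

def histogram_density(x, data, bins=20):
    """Non-parametric estimate via a histogram: no assumed functional form."""
    counts, edges = np.histogram(data, bins=bins, density=True)
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
    return counts[idx]

def parzen_density(x, data, h=0.3):
    """Parzen-window (kernel density) estimate with a Gaussian kernel:
    the smooth generalization of the histogram."""
    diffs = (x[:, None] - data[None, :]) / h
    kernel = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return kernel.mean(axis=1) / h

xs = np.linspace(-4, 4, 200)
p_hist = histogram_density(xs, data)
p_parzen = parzen_density(xs, data)
```

Neither estimator assumes the data came from any particular family of distributions; the model's "size" (bins, or one kernel per data point) grows with the data, which is what makes it non-parametric.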
That's an orthogonal distinction. The methods thus far have typically been parametric, in that there's a fixed network topology and the learning algorithm adjusts the (fixed set of) weights on the edges. There's no reason, though, why you couldn't have a nonparametric version that adaptively chose the number of hidden nodes in the networks and the connectivity structure.
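To make that distinction concrete, here's a toy NumPy sketch (the doubling growth rule is an illustrative heuristic, not a published algorithm): the parametric net trains a fixed set of weights on a fixed topology, while a "nonparametric" variant lets the number of hidden units grow until extra capacity stops helping.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)   # made-up regression task

def fit_fixed(X, y, n_hidden, iters=500, lr=0.05):
    """Parametric: topology (and hence the weight count) chosen in advance."""
    W1 = rng.normal(0, 0.5, (X.shape[1], n_hidden))
    w2 = rng.normal(0, 0.5, n_hidden)
    for _ in range(iters):
        H = np.tanh(X @ W1)
        err = H @ w2 - y
        # Gradient descent adjusts only the fixed set of weights.
        w2 -= lr * H.T @ err / len(X)
        W1 -= lr * X.T @ ((err[:, None] * w2) * (1 - H**2)) / len(X)
    return np.mean((np.tanh(X @ W1) @ w2 - y) ** 2)

def fit_growing(X, y, tol=1e-3, max_hidden=32):
    """'Nonparametric' flavor: model size adapts to the data."""
    n, best = 1, np.inf
    while n <= max_hidden:
        mse = fit_fixed(X, y, n)
        if best - mse < tol:      # stop when an extra unit stops helping
            break
        best, n = mse, n * 2
    return best

final_mse = fit_growing(X, y)
```

A real adaptive-architecture method would of course need a more principled growth/stopping criterion (e.g., held-out validation error), but the structural point stands: nothing about neural nets forces the topology to be fixed in advance.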