Neural Networks making a come-back? (yaroslavvb.blogspot.com)
64 points by mariorz on May 1, 2011 | 34 comments



The OP is including a lot of different concepts under the neural network umbrella. Things like restricted Boltzmann machines and hierarchical temporal memory are technically neural networks, but many computer scientists would consider them different enough in approach to think of them separately. That is, you wouldn't say "let's use a type of neural network to solve this problem"; you would probably say "let's use a restricted Boltzmann machine".

It is true that these things are becoming more popular. I've found in practice that a modern computer scientist is still more likely to solve a simple learning problem with some form of regression, if only because it's faster than training a NN.


Restricted Boltzmann machines are as bona fide a neural network (NN) as you can get, and they have been around since the golden age of neural networks. You have the same layered structure, the same feed-forward connections, the same "squashing function". The only "restriction" is that the unknown and the known nodes must live on different layers, so that connections run only between a known and an unknown node. It has been called different names, and the theory explaining it has had different names too, for example Smolensky's "Harmony Theory".
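
To make the "restriction" concrete, here is a minimal NumPy sketch (the layer sizes, weights, and data are made-up placeholders, not anything from a real model): given the visible units, each hidden unit computes a squashed weighted sum, and there are no hidden-hidden or visible-visible connections.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Placeholder sizes: 6 visible units, 3 hidden units.
    rng = np.random.RandomState(0)
    W = 0.1 * rng.randn(6, 3)       # weights only *between* the two layers
    b_hidden = np.zeros(3)

    v = rng.randint(0, 2, size=6).astype(float)  # a binary visible vector

    # P(h_j = 1 | v): each hidden unit sees only visible units,
    # never other hidden units -- that is the bipartite restriction.
    p_hidden = sigmoid(v.dot(W) + b_hidden)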

I think NN is a broad enough category that no matter what you want to use or describe, you will have to qualify your "let's use blah" statements with a particular kind of neural network. Similar in spirit to statements like "let's use a parser" vs. "let's use an LALR parser".

But back to the topic of newfound interest in NNs: part of the reason is that there have been new developments in training algorithms which work significantly better than those used traditionally. With these methods NNs require far less babysitting, and traditionally they required a huge amount of it.

The other reason is the sheer scale of the data sets that are available now, which has forced machine learners to move from powerful but batch optimization algorithms (quadratic programming, for instance) to the simple, online, gradient-based algorithms that have been the forte of the NN community all along.
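
As a toy illustration of that shift, here is a sketch of a plain online (stochastic) gradient update for least-squares regression, the kind of cheap per-example step meant above; the data, true weights, and learning rate are all invented for the example.

    import numpy as np

    rng = np.random.RandomState(1)
    X = rng.randn(1000, 5)                                   # made-up inputs
    y = X.dot(np.array([1.0, -2.0, 0.0, 3.0, 0.5])) + 0.1 * rng.randn(1000)

    w = np.zeros(5)
    lr = 0.01
    for x_i, y_i in zip(X, y):            # one example at a time, no QP solver
        grad = (x_i.dot(w) - y_i) * x_i   # gradient of 0.5 * (x.w - y)^2
        w -= lr * grad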

Training an NN is no different from regression. It is another name/technique for (somewhat systematically) creating a tower of increasingly complex regression functions. If the simplest (linear) one works, it's imperative that one use the simplest one, in the interest of good predictive accuracy on unseen data. Bundled together with the low training time that the parent mentioned, it's a win-win.


In my experience, the reason is less about speed and more about a perception that training algorithms for things like RBMs still involve a certain amount of "black magic" in tuning parameters, deciding when training has converged, etc., in contrast to linear/logistic regression or support vector machines, where you can basically turn a crank and get an answer out.


Rather than using Google Scholar, I would suggest looking at papers in ICML, NIPS and the Journal of Machine Learning Research.

For vision-based research I would suggest the CVPR and ICCV conferences and the journal IEEE Transactions on Pattern Analysis and Machine Intelligence.


Make these stats and I'll link them from the page :)


I have the impression that there are many more startups popping up that are generally operating under the AI umbrella. Companies are doing things related to medicine, finance, etc. There are even a couple in the May Who's Hiring post here on HN. Quite an exciting time. It seems like the AI Spring is in progress and we're coming up on Summer.


I don't think this is significant at all, because of the information boom. As the Internet and storage have boomed and become cheaper and more widely available, more content and data is hosted on the web, and more and more of it is indexed by Google. Maybe we need to adjust stats graphs based on Google Scholar for the inflation of data over the last decade.


Maybe I missed something, but I don't understand this criticism. The author states that 'The number of hits for each year was divided by the number of hits for "machine learning".' Wouldn't that control for the inflation of data in exactly the way you are proposing?
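
For what it's worth, the normalization the author describes amounts to something like the following (the hit counts here are invented, purely to show the arithmetic):

    # Hypothetical Google Scholar hit counts per year.
    nn_hits = {2000: 5000, 2005: 7000, 2010: 12000}
    ml_hits = {2000: 20000, 2005: 40000, 2010: 120000}

    # P("neural network" | "machine learning") per year: dividing by the
    # "machine learning" hits cancels overall growth in indexed papers.
    trend = {year: nn_hits[year] / ml_hits[year] for year in nn_hits}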


Is there much commercialization of neural networks going on? I own NeuralNetwork.com but am not suited to turn it into a business. If anyone is in the space and is interested in using the category killer name, I'd love to talk.


I work with NNs in a research context and am working on a startup commercializing some research-related NN technology. What price range were you thinking?


You don't have any contact info in your profile. Could you contact me via mine?


My email is jlehman at eecs dot cs dot ucf dot edu. The captcha on your blog wouldn't work for me -- I see a reCAPTCHA in the HTML, but it wasn't displaying in either Firefox or Chrome.


Your email bounced for me. Try my username at gmail.


Seeing as it's without the trailing 's', I think a lot of people will end up going elsewhere...


Interesting that the trend isn't replicated on the web... yet: http://www.google.com/trends?q=neural+networks

I wonder if there is a lag between academia and the web?


I also find it surprising to see in which countries the term is most popular according to Google Trends. None of the top 10 is part of the G8, and English is only the third most frequent language. I suppose those countries are lagging behind, but I might be wrong. At my uni, the only course that had some focus on NNs felt pretty outdated.


Might this be due to the explosion in neuroscience publications that are not related to artificial neural networks? There haven't been major breakthroughs in ANNs lately.


In machine learning, the 'big' thing about neural networks at the moment is deep learning via some sort of unsupervised pre-training. There's been a lot of work published on this recently.

http://deeplearningworkshopnips2010.wordpress.com/


Hmm, could you give us a TL;DR on how this differs from model selection in statistics, e.g., choosing the structure of an HMM?


A good intro page with lots of links: http://deeplearning.net/tutorial/

A lot of the newest work is under a few different names: "deep learning", "convolutional deep networks", "unsupervised feature learning", etc.

Here's a great talk by Andrew Ng of Stanford, who's a recent convert to this area: http://www.youtube.com/watch?v=ZmNOAtZIgIk

(Note that this is quite a one-sided view of things, but it does convey the excitement of deep-learning researchers and the potential of what might be possible.)

Although I'm not in deep learning myself (I'm a computer vision researcher), here's a TL;DR as I understand it: rather than having people in specific domains such as computer vision or speech processing create their own features, the idea is to take raw inputs (pixels in the case of images) and train multi-layer neural net architectures that "learn" the relevant higher-level features in an unsupervised way (i.e., without labeled training data). Some of these seem to be pulling out interesting features and perform competitively on a few benchmarks in vision and other fields.
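
A minimal sketch of the greedy layer-wise idea, assuming a stack of one-hidden-layer autoencoders (sigmoid encoder, linear decoder); the layer sizes, data, and training loop are placeholders rather than any published recipe:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_autoencoder(data, n_hidden, lr=0.01, epochs=5, seed=0):
        # One hidden layer trained to reconstruct its own input by plain SGD.
        n_in = data.shape[1]
        rng = np.random.RandomState(seed)
        W1 = 0.01 * rng.randn(n_in, n_hidden)    # encoder weights
        W2 = 0.01 * rng.randn(n_hidden, n_in)    # decoder weights
        for _ in range(epochs):
            for x in data:
                h = sigmoid(x.dot(W1))
                err = h.dot(W2) - x                          # reconstruction error
                g2 = np.outer(h, err)
                g1 = np.outer(x, W2.dot(err) * h * (1 - h))  # chain rule
                W1 -= lr * g1
                W2 -= lr * g2
        return W1

    # Greedy layer-wise stacking: each layer learns features of the
    # previous layer's features, with no labels involved anywhere.
    X = np.random.RandomState(1).rand(200, 64)   # stand-in for raw pixels
    features, layers = X, []
    for n_hidden in (32, 16):                    # made-up layer sizes
        W = train_autoencoder(features, n_hidden)
        layers.append(W)
        features = sigmoid(features.dot(W))      # feed the features upward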

I'm not sold on this yet, because it seems like the complexity of designing features has merely been traded in for the complexity of designing different learning architectures, but it's certainly becoming quite popular these days (mostly led by Geoff Hinton of Toronto, Yoshua Bengio of Montreal, and Yann LeCun of NYU).


Thanks for that, definitely some prominent folks involved.


Is this the same as non-parametric learning?


No, non-parametric learning is an unrelated topic, which seeks to estimate probability distributions without using "parametric" models (i.e., models with a known structure). The canonical example of a non-parametric approach is the histogram, or its generalization, the Parzen window.

http://en.wikipedia.org/wiki/Parzen_Windows
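
As a quick illustration, a one-dimensional Parzen window estimate just averages a Gaussian bump centered on every observed sample; the only thing you choose is the window width, not a model family (the data here is made up):

    import numpy as np

    def parzen_density(x, samples, h=0.5):
        # Mean of Gaussian kernels centered on each sample; h is the window width.
        k = np.exp(-0.5 * ((x - samples) / h) ** 2) / (h * np.sqrt(2 * np.pi))
        return k.mean()

    samples = np.random.RandomState(0).randn(100)   # made-up 1-D observations
    print(parzen_density(0.0, samples))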


That's an orthogonal distinction. The methods thus far have typically been parametric, in that there's a fixed network topology and the learning algorithm adjusts the (fixed set of) weights on the edges. There's no reason, though, why you couldn't have a non-parametric version that adaptively chose the number of hidden nodes in the network and the connectivity structure.


Also, take a look at some of the work by J. Schmidhuber (http://www.idsia.ch/~juergen/). For example, the LSTM architecture is quite powerful, and he has very successfully applied those RNN models to a wide range of areas (handwriting recognition, robotics, ...).


Are you sure about this? I'm not an expert by any means, but when I participated in the Netflix Prize a few years ago, there seemed to be lots of investigation into different variations of neural networks.

For example, the Restricted Boltzmann Machine (http://en.wikipedia.org/wiki/Boltzmann_machine#Restricted_Bo...), as far as I understand it, seems to be a variation of neural networks.

If you can post a link to an article that covers recent work in the area and explains why none of it amounts to a breakthrough, I'd love to read it.


The backpropagation neural network is itself a variation on many nonlinear, additive statistical models. The field emerged as an independent entity largely because it had "cool" neurological connotations and was mostly ignored by statisticians. The Netflix Prize was ultimately won using linear algebra with sensible starting values (the KDD paper that describes the winning method is remarkably easy to read).
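
For flavor, the linear-algebra approach in question centers on factorizing the ratings matrix and fitting the factors by stochastic gradient descent. The sketch below is a toy version of that idea, not the actual prize-winning code; the dimensions, hyperparameters, and ratings are all invented.

    import numpy as np

    rng = np.random.RandomState(0)
    n_users, n_items, k = 50, 40, 5
    P = 0.1 * rng.randn(n_users, k)     # user factors ("sensible" small start)
    Q = 0.1 * rng.randn(n_items, k)     # item factors

    # (user, item, rating) triples -- invented stand-ins for real data.
    ratings = [(rng.randint(n_users), rng.randint(n_items), rng.randint(1, 6))
               for _ in range(500)]

    lr, reg = 0.01, 0.05
    for _ in range(20):
        for u, i, r in ratings:
            e = r - P[u].dot(Q[i])                # prediction error
            pu = P[u].copy()                      # keep the pre-update value
            P[u] += lr * (e * Q[i] - reg * P[u])
            Q[i] += lr * (e * pu - reg * Q[i])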


Backpropagation is just the chain rule of differentiation. Each layer of an ANN is logistic regression, which statisticians have been, and continue to be, interested in.
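
To make that concrete, here is a tiny sketch of a single logistic "layer" and its gradient, which is literally the chain rule applied to the log-loss (the data point, weights, and label are placeholders):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.0, 2.0])   # made-up input
    w = np.array([0.1, 0.2, -0.3])   # made-up weights
    y = 1.0                          # made-up label

    p = sigmoid(w.dot(x))            # forward pass: logistic regression
    # Log-loss L = -(y*log(p) + (1-y)*log(1-p)); by the chain rule,
    # dL/dw = (p - y) * x. Backprop applies this layer by layer.
    grad = (p - y) * x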

I'm not sure your caricature of the Netflix-winning solution is correct: I believe it was a blend of around 25 different models (including, I think, the RBM someone pointed out above), each quite different from the others. This is typically how these challenges are won.


A lot of the top teams had blended submissions, but my characterization was not incorrect: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5197...


I work on the Cinematch team at Netflix... I've reviewed all of the winning code... your characterization is incorrect.

The Netflix Prize would not have been won when it was without the RBM.


Is this the paper you are talking about? http://www.commendo.at/references/files/kdd08.pdf


Nope, it was by Bell and Koren.

This one is related, but not the one I originally mentioned: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5197...

I was at the talk but can't for the life of me remember the title, sorry.


Not an expert either, and indeed, the Hinton papers on RBMs are the last thing I remember reading about. But their invention dates back to the '80s. I'm just noticing that if you search "neural network" on Google Scholar for recent years, you will find many neuroscience results. Of course, I don't have the data to back this up.


Well, but he's conditioning on "machine learning" already (the little caption on the histogram says "P(neural network | machine learning)"), so that most likely filters out the straight neuroscience papers.



