Deep Learning - The Biggest Data Science Breakthrough of the Decade

stiff · on March 14, 2013

I am sorry, but does anyone else have the impression that a lot of people is commenting here with so much confidence while clearly not knowing anything about the topic? It takes almost an equivalent of an undergraduate mathematics degree and than a lot of experience in ML itself to get a decent understanding how things like Deep Belief Networks work, so I don't wonder none of the comments so far hinted at any understanding of anything particular about deep learning, just general derogatory comments "not used in industry", "overhyped", pointers to whatever someone heard in an undergraduate ML class on older types of networks etc.

Maybe if you don't have anything on topic to say, just do not comment? You really are not obliged to have an opinion on everything.

(Waiting for the downvotes)

rm999 · on March 15, 2013

>not used in industry

Well, that ones about me. Yes, I have a plenty of experience in machine learning, including undergraduate research in neural networks, a graduate degree in machine learning, and more than five years of industry experience (including several years building some of the most utilized neural network models in industry). I have read many of the deep network papers in detail, and have played around with them on actual data.

And yes, I think your comment deserves to be downvoted; unlike those of us with insight into the issue you added nothing to the discussion other than derision. It bothers me that comments like yours end up at the top of so many threads like this.

edit: and I'd like to point out as someone in the industry I have a good reason to temper expectations. Undeserved hype leads to bubbles, and bubbles create collateral damage when they pop. The AI industry has dealt with this at least twice already, and I don't want to see it happen again. The results so far are extremely exciting, but deep networks still need to prove they deserve the hype.

stiff · on March 15, 2013

What kind of insight into whether deep networks are interesting or not does the fact that they are (supposedly) not used in the industry bring to the table? To me, insight would be something like "they do not generalize well unless you have really a lot of data" or anything concretely related to the research done on them.

By the way, I am not asking anyone what their degree is, I am just asking to bring arguments and experiences or stay neutral.

rm999 · on March 15, 2013

You missed the point of my comment. I was speaking about the hype, I wasn't putting down deep networks or making a technical analysis of the algorithms. Anyway, here are my major reservations, including my technical analysis. You're oddly demanding in a casual non-technical online discussion about a topic you don't seem qualified in, so if this isn't enough for you we really aren't going to find common ground.

There isn't great library support for deep networks. This is a big deal, I don't want to spend tons of time building my own library or working with buggy/poorly supported/infant libraries. In production systems we prefer extremely well-established libraries that work in our language/environment of choice. Also deep belief networks are a couple orders of magnitude slower than linear models (probably the most commonly used type of model in industry). They require more parameter selection. They're not even useful for a lot of tasks - if I'm already spending tons of time building useful features (often a requirement in industry for non-technical reasons, like reporting or legal constraints) deep networks aren't going to be very useful. Much of their utility is taking raw, unstructured data and creating useful features for a supervised model. You can't easily interpret them as models unless you are working with visual data, they are a black box.

cscurmudgeon · on March 14, 2013

Instead of doing an obvious ad-hominem (Yes :D, this is one of the times it is genuine!), can you address what the other commenters are pointing out?

I have degrees in Physics, EE and a PhD in CS. I munched axiomatic set theory and infinite ordinals during the course of my PhD. I dabbled in theoretical machine learning for 3 years. See, I can play the credentials game too.

But does that address the fact that they are not used in the industry? AI is full of charlatans and broken promises. Sadly, by listing "deep learning" alongside Deep Blue and Watson, it seems more charlatany.

stiff · on March 14, 2013

I am not playing the credentials game, quite the contrary, I remember trying to understand DBFs while at the university and failing miserably due to the complexity of the subject. I am also not defending deep learning in any way or having any stance about the subject myself. I just think in a place like HN you should not criticize technology without having practical experience or technical arguments, and of course "not used in industry" is not a technical argument. With Google just yesterday hiring George Hinton who lead most of the deep learning research and with Jeff Dean working on it there already[1] it is also overall a rather weak one.

[1] http://research.google.com/archive/large_deep_networks_nips2...

rm999 · on March 15, 2013

>of course "not used in industry" is not a technical argument. With Google just yesterday hiring George Hinton

Again, you're quoting me on that. Yes, 'not used much in data science' is a valid argument that it's not one of the biggest breakthroughs in data science.

And if you want to discuss the topic (while blanket criticizing people for not knowing what they're talking about) at least get the father of deep network's name right: it's Geoff Hinton, not George.

cscurmudgeon · on March 15, 2013

1. Google hiring X is not the same as X's model being scientifically valid.

2. Model X for intelligence being highly technical or cool in some mathematical way is not scientific validation.

3. Google hiring X is not the same as X's model being successful in the industry. I have long switched over to DuckDuckGo for technical queries.

Anyways, what AI people should first address always is point number 2.

AI has always jumped from one cool thing to the next without answering whether that cool thing has any scientific basis.

Don't bring another AI winter ;)

It is always cool to see excitement over research in AI! (As long as it does not drown out other competitive approaches which might bear fruit in the long run.)

stiff · on March 15, 2013

1. Google hiring X is not the same as X's model being scientifically valid.

That's exactly what I am complaining about.

2. Model X for intelligence being highly technical or cool in some mathematical way is not scientific validation.

I never said that, it just narrows the amount of people that can comment on it with merit.

3. Google hiring X is not the same as X's model being successful in the industry.

It was just a side-comment.

cscurmudgeon · on March 15, 2013

The downvote is amusing. Care to engage in an interesting conversation rather than resorting to drive-by-downvoting? I guess I touched an sensitive point with the Google fan boys here. Sigh.

tensor · on March 15, 2013

Nobody has ever convinced me humans are intelligent. Every specific example of human intelligence put forward, it eventually ends up that machines do it better.

How is one to scientifically validate against something that can't even be defined?

cscurmudgeon · on March 15, 2013

>Nobody has ever convinced me humans are intelligent. Every specific example of human intelligence put forward, it eventually ends up that machines do it better.

First of all that is completely wrong even for simple things like image recognition (try building a face recognizer which works under all possible conditions).

But more starkly consider the following question:

Is Geoff Hinton a machine?

joe_the_user · on March 14, 2013

The OP title include "The Biggest Data Science Breakthrough of the Decade", the topic has already stopped being "is this interesting" but rather "is there something to justify the hype" or, surprise, "is this 'The Biggest Data Science Breakthrough of the Decade'"

Sure, I only have a graduate math degree and only follow these latest developments casually and perhaps I just miss exact way this newest artificial neural network stuff is really that different than the older stuff. But the only thing that's being touted is a NYTimes article. As another poster said, if you'd like to add to the conversation, give us some "meat" here.

My small exposure to ML also left me feeling the whole train, test, operate cycle is a pain in the neck.

badmofo666 · on March 15, 2013

All (good) statisticians validate their models. Nothing new there.

The main difference between these newer networks (besides much improved performance) is that the algorithms can handle "deeper" networks better (more hidden units). If we're talking about Deep Belief Networks, they're not much like the old ANNs. DBNs are generative probabilistic graphical models using Bayesian inference.

Conceptually, going deeper (LOL) allows the networks to learn higher level concepts. For example, a 1 layer ANN (perceptron) can only learn linear functions, while a deep network is able to internally form a belief of what, say, a cat is.

More technically: Much of the work in ML is deciding what your inputs (features) should be. When classifying text documents, should you use word counts, bag of words, word stemming, character counts, etc. Should the model be linear, polynomial, gaussian, trigometric, etc. Deep learners try to automatically do feature selection and control the degrees of freedom in the model for you.

Also, deep learning is catching on in some industries. It has recently had huge successes in speech recognition, and all major companies developing this technology have started using it (e.g. Siri for one).

ergodic · on March 15, 2013

At the risk of being a bit annoying... regarding speech recognition, as said here

https://news.ycombinator.com/item?id=5376319

they use DNNs not DBNs (DBNs only used for pre-training, sometimes). Also if you read Microsoft's paper Table2, 7 hidden layer networks, which clearly qualifies as deep, work just fine with Back-propagation. Just a bigfat-MLP, no preprocessing! but 17.4 Word Error Rate (WER) vs 17.0 WER for DBN pre-training.

>they're not much like the old ANNs. DBNs are generative probabilistic graphical models using Bayesian inference.

MLPs (DNNs) can also be interpreted probabilistically. Just a directed model where inference is attained by marginalization of the hidden binary nodes in a layer-wise manner and by using a naive mean field approximation. All that to say the classic "forward-pass" ;).

Also could you indicate me a source confirming that Siri (Nuance) also switched to DNNs?. I am interested in that.

>Conceptually, going deeper (LOL) allows the networks to learn higher level concepts.

That is the really interesting part!. Now, I have not seen a proof for that. Wondering at individual neurons modeling individual features of e.g. a face or so is also a trend of the 90s and does not count as proof. I said this because it is what I usually hear.

Until now the justifications I saw for multiple layers of perceptrons being suitable for modeling arbitrary high level abstractions are reduced to

1) MLPs are universal approximators. This in my opinion is a superficial argument. GMMs also allow modeling "any" distribution and Taylor series any linear function, but in reality there are physical limitations to this argument. Maybe is true if you had a billion layer net, but will you get there?. If you had that computing power maybe a more realistic modeling of the brain might work better

2) They resemble how brain architecture works and similar arguments. Which I am fairly sure is not true. There are more human-brain based approaches to AI like e.g. cortical learning algorithms and those just seem to model that stuff to a certain extent.

spikels · on March 14, 2013

It is purely a matter of opinion whether DBNs are "overhyped" but I hope you would agree that they are currently being "hyped". And I hope you understand how this can actually damage the potential of what is likely some very good technology like say like what happened to neural nets. :)

It is also a matter of opinion how widely they are being used in industry. Certainly they are being studied in many companies but they do not a appear to used much in production because of their complexity and high training cost. This is still cutting edge technology.

In my experience most professionally trained mathematicians and statisticians are still pretty skeptical of these claims. Wouldn't you agree?

stiff · on March 14, 2013

It is OK to be skeptical, all I am trying to say is that most of the comments leave the impression of the poster trying to lean over backwards to say something at all related to the topic, often ending up with generic truisms. For example how is model building being a small part of practical ML criticism of deep learning? How is it on-topic at all?

spikels · on March 14, 2013

Not my comment but I think they mean that it is a relatively small part of the overall process and thus not "The Biggest Data Science Breakthrough". Maybe not a fair criticism but they have a point.

I would love to see a breakthrough in data cleansing or how about just standardized coding, labeling and formatting. Unfortunately I've used lots of 3rd party data sources and wasted more time on these brainless activities than I want to think about. Consider yourself lucky if you only work on web logs where you control what they look like.

ergodic · on March 14, 2013

rather than being so aggressive and waiting for downvotes you could choose to be constructive ;) and answer a concise technical question to a sub-thread that you yourself started

https://news.ycombinator.com/item?id=5377101

muyuu · on March 15, 2013

I haven't commented although my field of study is ML, but I have to say this is one of the shallowest infomercials I've seen hitting the front page here at HN.

They say absolutely nothing of relevance other than how awesome it's supposed to be, and oh by the way this is Cloudera and it's great, and I happen to work in Kaggle and it's magnificient. After preying your personal data to let you listen to the infomercials.

warmfuzzykitten · on March 14, 2013

After 10 minutes, not a single slide apart from the cloudera ad has been shown. This is a podcast.

jph00 · on March 15, 2013

Oh that's annoying. When I did the webinar I was sharing my screen and showing both slides and scribbles in onenote. Is there no button to view that in the recording? Sorry I can't check myself now, I'm on my phone.

ergodic · on March 14, 2013

Well, it is definitely something but it being the "Breakthrough of the Decade" seems pretty unlikely to me (given my available evidence).

I do not know well other examples beyond case of Automatic Speech Recognition, but since this case caused a lot of noise, I bet it is responsible for a reasonable chunk of the Deep learning "buzz". Here is my take about this.

If you look at papers from Microsoft like Seide et al 2011 and similar papers the reported improvement against state of the art (up to 30%) is really impressive and seems solid. Now, the technique is more or less using a very big multi-layer perceptron (MLP), a technique already established two decades ago (or more). There is some fancy stuff like the deep belief network based initialization, but it does not make big differences. The core of the recipe itself is not very new. What has changed is the scale of data we have available and the size of the models that we can handle.

With this I am not implying that this is not a very interesting discovery. But it is important to bear in mind that the change in the amount of data could also make other 20 year old techniques interesting again. On the other hand, neural networks had a bad name in the last years for understandable reasons. They are a blackbox, or at least less transparent than the statistical methods. This makes them prone to cause the "black box delusion" effect. You hear a new algorithm is in town, it has fancy stuff like remotely resembling human thinking architectures or cool math but you can not completely grasp it guts, then "voila!" suddenly you are overestimating its relevance and scope of applicably. MLPs were hailed as "the" tool for machine learning already once, I think for these same reasons. For me the right position here is a prudent skepticism.

On the other hand, this should also push people to try new/old radical stuff since the rules of the game seem to be changing, it is not a moment to be conservative in ML research :).

lrei · on March 14, 2013

I've heard this argument ever since Norvig's Unreasonable Effectiveness of Data. While having a ton of data available is great, it has its limits. I believe you are overestimating the effectiveness of data (as, imo, Norvig did). And here specifically, it's not the case for the hype:

from the NYT article [1]: "The achievement was particularly impressive because the team decided to enter the contest at the last minute and designed its software with no specific knowledge about how the molecules bind to their targets. The students were also working with a relatively small set of data; neural nets typically perform well only with very large ones."

NNs in general have enjoyed lots of successful practical (commercial) applications in pattern recognition though they were sort of replaced in the "state-of-the-art" by SVMs in many cases until RBMs and DBNs came along. I agree with your caution for skepticism though, only time will tell how good DBNs are.

I think the black box criticism is BS for the most part. In some cases (google's search being a famous example) it might be great to have a human readable and tweakable solution (assuming you have the resources) but for something like recognising handwritten digits from images, not so much.

[1] http://www.nytimes.com/2012/11/24/science/scientists-see-adv...

jre · on March 14, 2013

Regarding the black box criticism, it seems to me that most popular algorithms (SVM, Random forest, ...) become black boxes once you go past the simple 2D example and apply them to real problems. Real-world decisions trees are pretty unreadable and include some rules that really don't make more sense than the weights in a neural network.

ergodic · on March 14, 2013

> it might be great to have a human readable and tweakable solution (assuming you have the resources) but for something like recognising handwritten digits from images, not so much.

Agree, but with black-box I meant not something that is opaque to my grand-mother but partially opaque to engineers that implement MLP machine learning applications and the tech-lead that takes the decisions. The thing is that even research people (or maybe specially them) tend to positively bias things they do not completely understand (so I think, maybe its just me ;)). That is what I meant with black-box delusion. As you say only time will tell.

Regarding DBNs, again, the case of ASR uses DNNs which is to say big-fat MLPs. The model is handled as a DBN only for pre-training, and layer-wise pre-training does a similar job anyway.

SiVal · on March 14, 2013

Regarding the "black-box delusion", it's not just you. You see a magician do a trick, and it's amazing. Then he explains how it is done, and the excitement vanishes. Oh, that's all it is, no big deal.

Any sufficiently advanced technology is indistinguishable from magic, and who knows what wonders magic might accomplish? But once you understand the "trick", it's obvious that it can't do much more than what it's doing. Oh, well. The magic is gone.

stiff · on March 14, 2013

I have no idea if it is the breakthrough of the decade, but I think deep learning isn't just taking a perceptron with many hidden layers and applying backpropagation to it, as you seem to say, all the interesting things about it you summarized as "fancy stuff" and "not making a big difference", without any context, references or arguments. I do not feel competent to discuss it as I have very little experience in this field, but it doesn't feel too informed even given whatever little knowledge I have. Certainly faster computers and more data have helped, but just like in traditional algorithms research, they cannot completely make up for having exponential growth functions with respect to computational needs of the amount of data required. There have been large improvements in both respects in the deep learning community, in fact rarely does the term "deep learning" refer in practice to traditional completely supervised learning that you are talking about.

There are nice and more balanced overviews here:

http://ufldl.stanford.edu/wiki/index.php/Deep_Networks:_Over...

http://en.wikipedia.org/wiki/Deep_learning

ergodic · on March 14, 2013

If it was not clear enough, "fancy stuff" and "not making a big difference" refers to Seide et al 2011 mentioned in the same paragraph. Table 2 is particularly revealing to this regard.

http://research.microsoft.com/apps/pubs/default.aspx?id=1531...

As I said I can only speak with more or less certainty regarding ASR. I am fairly sure that the success in ASR (with Google and MS embracing DNNs for ASR) contribute significantly to the mainstream impact of deep learning.

stiff · on March 14, 2013

There is a second paper where they specifically point out the differences between their approach and previous approaches using neural networks and it isn't only the number of layers that has changed but also the internal architecture of the network, the "responsibilities" of the layers, so again, it isn't just a traditionally trained MLP with a lot of layers:

http://research.microsoft.com/pubs/157341/FeatureEngineering...

ergodic · on March 14, 2013

I read it in diagonal but the paper seems to use the same DNN architecture as before. They seem to tweak the pretraining with layer-wise back-propagation (instead of full MLP-as-DBN pre-training). This does not imply anything new with respect to what I commented and the cited paper.

The only reference to differences I found is about differences between a DNN and a MaxEnt models, which is again not an argument for differences between DNNs and MLPs.

Could you point me to a concrete paragraph?, I would be happy to be mistaken in this regard.

jph00 · on March 14, 2013

DNNs can be thought of a stacked Restricted Boltzmann Machines. Their structure and training is very different to traditional MLPs. They derive in some ways from convolutional neural nets.

I describe some of the key differences between DNNs and MLPs in the webinar. Also, the webinar explains how recent advances go far beyond just applications to speech recognition - in particular I focus on a case study in chemoinformatics.

ergodic · on March 15, 2013

>DNNs can be thought of a stacked Restricted Boltzmann Machines

Agree, as explained in Hinton et al 2006.

http://www.cs.toronto.edu/~hinton/absps/ncfast.pdf

But this is just for pre-training, as I said. If you look at Seides paper, they pre-train treating the MLP as a DBN and then they train it as a classic MLP with BP. Also using layer-wise BP pre-training does bring performance close to DBN pre-training, with no use of DBNs paradigms at all.

>Their structure and training is very different to traditional MLPs

I insist if we are talking of the same DNNs explained in Microsofts paper, this is not true. If we were to be talking about different DNNs please elaborate I would love to hear about that (seriously, no irony here).

jph00 · on March 15, 2013

There's also the random knockout of neurons, as mentioned in the webinar.

ergodic · on March 15, 2013

I did not find that on the paper, are you referring to randomly switching off neurons?. I would be surprised if this would not be a technique of the original neural networks wave.

stiff · on March 14, 2013

In comparison to older MLP research, besides the new training algorithm, there is this new insight that the deep structure of the network might be efficient for generating very good encodings of the input variables, like described here:

http://en.wikipedia.org/wiki/Autoencoder

I am not very familiar with speech recognition, but I think what they talk about here:

Instead of factorizing the networks, e.g., into a monophone and a context-dependent part [5], or decomposing them hierarchically [6], CD-DNN-HMMs directly model tied context-dependent states (senones). This had long been considered ineffective, until [1] showed that it works and yields large error reductions for deep networks.

might be related to this fact. 20 years ago it wasn't known why would you pick a deep network instead of a shallow one, there was even this famous theorem of Kolmogorow that a lot of people in ML misunderstood, that a network with just one hidden layer can in theory learn any function with arbitrary precision.

ergodic · on March 15, 2013

Again, the use of senones instead of monophones or diphones is just changing the output targets is not a novelty per sé.

badfortrains · on March 14, 2013

The thing that NNs have in their favor that other "20 year old techniques" lack is their ability to model any mathematical equation. There is no fundamental limit to the complexity of systems NNs can model (as there is with other AI techniques).

The problem with NNs is the difficulty of training them. Back propagation with random initial weights is simple, but it can easily converge on suboptimal local maximum if the learning rate is too aggressive. On the other hand, a slow learning rate requires an exponential increase in training time and data. Back propagation as a method was never really broken, it simply wasn't efficient enough to be effective in most situations. Deep belief techniques seem to remedy these inefficiencies in a significant way, while remaining a generalized solution.

Essentially deep belief networks seem to optimize NNs to the point where new problems are now approachable, and greatly improve the performance of current NN solvable problems. The complaint that "the core of the recipe itself is not very new", seems irrelevant in light of the results.

iskander · on March 15, 2013

>The thing that NNs have in their favor that other "20 year old techniques" lack is their ability to model any mathematical equation. There is no fundamental limit to the complexity of systems NNs can model (as there is with other AI techniques).

I'm sure that a decision tree can also be viewed as a [universal approximator](http://en.wikipedia.org/wiki/Universal_approximation_theorem) if you let tree height go to infinity (just as you need to let layer size grow unbounded with a NN). In practice, this power is at best irrelevant and often actually a liability (you have to control model complexity to prevent overfitting/memorization).

And, importantly, being able to theoretically encode any function within your model is not the same as having a robust learning algorithm that will actually infer those particular weights from a sample of input/output data.

ergodic · on March 14, 2013

Again, please, have a look at Seide et al 2011 before commenting. Besides that I am not complaining, just saying, wait a little more before you claim the breakthrough of the decade.

jackpirate · on March 14, 2013

>There is no fundamental limit to the complexity of systems NNs can model (as there is with other AI techniques).

Sure there is. For example, they will never solve the halting problem. They will also (probably) never solve NP-complete problems for very large instances.

spikels · on March 14, 2013

While deep learning is a very cool technique and is currently getting the best results in a few domains I think all the hype may become a problem. I was around for the prior round of neural network excitement and much time, effort and money was wasted. In that case it turned out that other techniques were more tractable and thus easier to use and improve upon.

It must be the association with the human brain that just makes neural networks more exciting than other techniques. But dispite the appeal of imitating nature has this usually been the easiest way to make progress in the past? Seems like it would be harder to achieve both goals at the same time.

So far the results are looking pretty good but it is probably best to keep the hype at a reasonable level unless it is crucial of your business model. ;)

jhartmann · on March 15, 2013

I have a machine learning startup that is using deep learning neural networks, so I'm probably biased here. I really think there is something that is worth the hype here, this is the first time we can solve significant problems without lots of feature engineering to make the neural network be able to solve the problem. While I'm sure there are going to be tons of things that deep belief neural networks can not do well even with these new capabilities and breakthroughs, there is a crapload of data out there that is begging to be analysed. Being able to get reasonable performance without a ton of feature engineering and years for a black arts team to build something that can get the data into a state where problems can be answered is SUPER exciting. The Neural Networks we are using are more specialized and more like Yann Lecun's, and we aren't using dropout like Hinton but we already have something that gets very good accuracy in our problem domain. There are some new techniques that are just coming out of Montreal, one in particular I'm very excited about called Maxout that looks like it will be another significant advance. One of the problems networks like this usually have is that the activation functions saturate above a certain level and once a neuron is in the saturated state the gradient training process will not move it anymore. Maxout is different in that it doesn't have this property, and it seems to maximize the benefit of the random selection process of dropout.

While I don't have the math credentials to match Hinton I think as more 'normal' folks like me get into the game there will also be some interesting things going on. We are trying some interesting things that seem very promising, and I'm sure there are lots of other folks beginning to play with these things that will have some interesting ideas and approaches as well.

So I personally think this is super exciting, and while it might not be applicable for every problem Deep Learning will definitely have a big impact.

rm999 · on March 14, 2013

>I was around for the prior round of neural network excitement and much time, effort and money was wasted. In that case it turned out that other techniques were more tractable and thus easier to use and improve upon.

And before 1980s style neural networks there were 1950s perceptrons. That was a much bigger mess, it took more than ten years for someone to point out how 'dumb' perceptrons were (they couldn't even model an XOR), which led to a collapse in AI funding that lasted more than 25 years.

spikels · on March 14, 2013

Can we be a little more thoughtful this time and avoid the boom and bust cycle that so often leads to problems?

You would think that since it already happened with neural networks before it would be less likely to happen again. However it may be that the same factors that lead to the last cycle are still in operation and it is actually more like to happen again. Something like the reasons for the seemingly endless series of real estate bubbles.

jph00 · on March 14, 2013

I created this talk for the Enterprise Big Data track of O'Reilly's Strata conference - so it's not a technical description of how deep learning works. Rather, it's an attempt to show why it's important, and how it fits into current data science trends.

The "Biggest Data Science Breakthrough of the Decade" in the title is a rather bold claim, I know... But I think it might be justified. If there are are bigger breakthroughs, I'd be interested in people's thoughts about what they might be.

spikels · on March 14, 2013

Not sure which decade you are talking about. If you mean the 2010s or the next 10 years we'll just have to see what the next 7 or 10 years bring.

But if you mean the past 10 years I would have to say that the "distributed storage and processing" revolution (Hadoop and others) has had a much bigger impact on data science than all of neural networks including deep networks.

Why the need to hype what is already a well publicized development? I'm starting to cringe whenever I hear "data science" or "big data" and I love this stuff.

pyre · on March 14, 2013

If by "of the Decade," you mean 2010-2020, then you've still got another 7 years to be proven right/wrong.

espeed · on March 14, 2013

Quoc V. Le (http://ai.stanford.edu/~quocle/) of Stanford and Google did a talk today at Univ. of Washington:

  "Scaling deep learning to 10,000 cores and beyond":
  Presentation Univ. of Washington (March 14, 2013)
  https://www.cs.washington.edu/htbin-post/mvis/mvis?ID=1338

You can see one of Quoc's previous talks online:

"Tera-scale deep learning: - Quoc V. Le from ML Lunch @ CMU http://vimeo.com/52332329

You may remember Jeff Dean's (http://research.google.com/pubs/jeff.html) post on this: https://plus.google.com/118227548810368513262/posts/PozFb134...

The corresponding research at Google...

"Building high-level features using large scale unsupervised learning"

http://research.google.com/pubs/pub38115.html

http://research.google.com/archive/unsupervised_icml2012.htm...

Previous HN discussion: https://news.ycombinator.com/item?id=4145558

--

How Many Computers to Identify a Cat? 16,000 http://www.nytimes.com/2012/06/26/technology/in-a-big-networ...;

rm999 · on March 14, 2013

I'm concerned deep networks are being overhyped. They're certainly exciting, but they haven't seen much use in industry yet; it's too early to make claims about how they have impacted data science.

Also, data science involves a lot more than building predictive models. In my experience >95% of effort goes into something other than building a model. In kaggle contests you usually concentrate on that <5%, which IMO is the fun part but it's not the reality of industry. There are many big breakthroughs in data science that don't involve model building.

edit: I haven't listened to the podcast yet (at work), my comment is more about the title.

mtrimpe · on March 14, 2013

I think this new speech recognition improvement by Microsoft depends on deep networks and it's results are very impressive.

http://research.microsoft.com/en-us/news/features/speechreco... http://research.microsoft.com/en-us/projects/mavis/

That might just be a single application but if it extends into other domains it might end up being very valuable indeed.

It will eventually become just another tool of course just like anything else but if it brings 10-20% improvements in even a few other long-stagnant areas I would agree with saying that it is a big deal.

lrei · on March 14, 2013

I'm genuinely curious (not being snarky or wtv): what do you put 95% of your effort into?

mshron · on March 14, 2013

Not the OP but:

* Problem definition

* Infrastructure

* Data transformation

* Exploratory analysis (arguably part of model work)

* Results presentation

Then again, this is an ongoing disagreement I have with the Kaggle folks over what constitutes "data science," where I'm pretty confident that "applied machine learning" is a better explanation of what their contests are about.

lrei · on March 14, 2013

I see. Thanks.

I'd say data transformation is a part of feature engineering (commonly the bulk of the effort in a ML application). And exploratory analysis is part of model work. W/o those 2 one would be building a model out of dreams and wishes.

Data Science is probably a poorly chosen description. I'd say common use includes infrastructure work which for most of us consists in engineering work.

rm999 · on March 14, 2013

I kind of got them to say it here:

https://news.ycombinator.com/item?id=4655927

BTW, I'm a big fan of the data analysis that came out of okcupid, is that all your work?

hsshah · on March 14, 2013

We should discourage submissions like this here on HN that require registration to view the content. So even though this topic is of great interest of mine, I will not be upvoting it. Sorry.

drucken · on March 14, 2013

While I agree with you, note there is already flagrant "abuse" of this on HN by the posting of newspaper links with paywalls, in particular the New York Times.

Aloisius · on March 14, 2013

I would love to watch this, but O'Reilly's presentation streamer is awful. I tried jumping ahead, but the video stream doesn't actually jump with me so I end up listening to one part and watching another (tried under FireFox, Safari and Chrome on Mac).

I don't suppose someone has an alternative version somewhere?

zby · on March 14, 2013

I registered and listened for maybe 9 minutes - and nothing interesting, just talk about the talk. How I hate webcasts!

anigbrowl · on March 14, 2013

Agreed. Don't care much for podcasts either, although I can see the value for people who drive frequently.

jph00 · on March 14, 2013

I believe the talk about the talk only covers 3 min 20 s.

daniel-cussen · on March 14, 2013

Does anyone know how RAM-intensive deep learning is? If the answer is "not very," I think the GA144 might be a good candidate because it's a lot of CPU-capable (independently-branching) cores.

lrei · on March 14, 2013

dont have any numbers for you but it's typically CPU-bound and not memory-bound. deep learning is often done using GPUs because of massive-parallelism.

wladimir · on March 14, 2013

It may be computation-bound (I'm not sure) but training deep networks generally does use a lot of memory, because of the giant training sets. You're right that GPUs are a good fit, for example libraries such a theano exploit this.

patrickk · on March 14, 2013

If you're watching the webcast, skip to 3:20 to avoid all the promo and buildup fluff at the start.

pilooch · on March 15, 2013

Deep learning is so attractive to any AI and machine learning practitioner! The results are beautiful to witness or read about. This is clearly another step in a direction that many of us have been waiting (or working on) for a long time!

That said, AI, like every other sciences, experiences trends and bubbles. If you give a decent look to usages and problem solving with machine learning, deep learning techniques are not exactly the final answer. Typically, they're slow to train, to my knowledge there is no good 'online' algorithm yet to train them (i.e. for autoencoders, recursive autoencoders, Boltzman machines). Many applications, and a trend toward 'lifelong learning'[1], require fast incremental learning that yields results in near real-time, or at least in minutes rather than days.

I've compared a couple of unsupervised machine learning algorithms with recursive autoencoders: the latter can learn deeper, very often, but at a computational costs (days vs seconds). Deep learning computation will improve, for sure though.

[1] http://cs.brynmawr.edu/~eeaton/AAAI-SSS13-LML/

softbuilder · on March 14, 2013

This is a new topic to me. Are there any whitepapers or open source projects touching on this?

amit_m · on March 15, 2013

Argh! Worst streaming experience ever. Also, no contents in the first 20 minutes.

It took me a while to understand that the slides were in a popup was blocked. After reloading, the slides don't match the audio.

wmat · on March 14, 2013

Can this be viewed without registering?

jph00 · on March 14, 2013

I don't think so. I even had to register to view - and it's my talk!

I haven't received any marketing stuff from Cloudera or O'Reilly however. Honestly, I doubt those companies would do anything questionable with registrations.

learningram · on March 15, 2013

I wish there was a download option

wfunction · on March 15, 2013

Why are they are they tricking us into giving away personal information?

pbharrin · on March 14, 2013

You need Flash to watch this even on Chrome.