Obstacles on the Path to AI (drive.google.com)
129 points by evc123 on Oct 22, 2015 | 29 comments



"There is no way in hell that you can learn billions of parameters with RL." I really love LeCun's provocative stances, but I get suspicious when people talk about impossibilities. RL is making huge strides. People adopted the same tone with neural nets years ago, and LeCun proved them wrong...


He's saying you won't learn billions of parameters with RL /alone/ because "one scalar reward per trial isn't going to cut it".

I find that convincing and a key insight. RL is going to be fundamental to AGI, but he's saying curiosity / unsupervised learning will be necessary. And I say this as a big believer in the need for more work on RL.


Yes, I should have added the context about the scalar reward, but I'm not sure why that changes anything.

This may seem like a naive question, but it's sincere: What makes a scalar reward less effective at modifying a Q function than a scalar error that's used in backprop and assigned to a neural network's coefficients?


Supervised learning: (after the millions of operations you just performed) out of 1000 possible labels you predicted label #136, but the true label for this input was actually #25.

Reinforcement learning: (after the millions of operations you just performed) out of 1000 possible labels you predicted label #136. That's not right, but I won't tell you what you should have done. Also, it could have been right, but maybe you screwed something up in one of your last 100 predictions, and I won't tell you which one. Good luck.
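
To make the contrast concrete, here's a minimal sketch (a hypothetical 1000-class toy, not anything from the slides) of how much information each signal carries for a single prediction:

    import numpy as np

    rng = np.random.default_rng(0)
    num_classes = 1000
    logits = rng.normal(size=num_classes)          # the network's raw output for one input
    probs = np.exp(logits) / np.exp(logits).sum()
    true_label, predicted = 25, int(np.argmax(probs))

    # Supervised signal: a full 1000-entry gradient pointing toward the right answer.
    grad_supervised = probs.copy()
    grad_supervised[true_label] -= 1.0             # d(cross-entropy)/d(logits)

    # RL signal: one scalar, with no hint about which label was correct.
    reward = 1.0 if predicted == true_label else 0.0

The supervised learner gets a 1000-entry correction telling it exactly how to adjust; the RL learner gets one number.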


To extend the analogy to general intelligence, start with the example of Elon Musk (a top-notch example of GI). What drives Elon Musk? In interviews he's said he wants to be useful. He's achieving that goal in a big way.

Take the moment he decided to build a rocket. It was after a difficult meeting about buying a Russian missile. He quickly roughed out, from first principles, that he should be able to build one himself. The /drive/ to make that analysis was in pursuit of his goal (and required reinforcement learning).

But what lifelong process filled his brain with all the complex and nuanced information needed to rapidly draw that conclusion? His curiosity, aka unsupervised learning, which led him to learn so many things over the years that culminated in that moment.

For this reason, I might make the analogy that RL is the "conductor of the symphony", rather than calling it the "cherry on top" as LeCun does here.


This is an overly restrictive view of RL. Yann's claim about the potential utility of RL, taken at face value, is clearly false. Backprop on a deterministic computation graph is equivalent to deterministic policy gradient in the same graph, where the reward is given by the value of the optimization objective.
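
As a tiny numerical sketch of that equivalence (a hypothetical one-parameter example, not anything from the slides or the thread): take a deterministic "policy" a = theta * x and define the reward as the negative loss; the deterministic-policy-gradient update is then exactly the backprop update with the sign flipped.

    theta, x, target = 0.5, 2.0, 3.0

    a = theta * x                                  # deterministic action a = pi_theta(x)

    # Backprop view: gradient of the loss (a - target)^2 with respect to theta.
    dloss_dtheta = 2.0 * (a - target) * x

    # Policy-gradient view: reward R = -(a - target)^2, so dR/dtheta = dR/da * da/dtheta.
    dR_dtheta = -2.0 * (a - target) * x

    assert abs(dloss_dtheta + dR_dtheta) < 1e-12   # ascending R == descending the loss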


that makes sense. thanks.

EDIT: follow-up question: to what extent does DeepMind's deep Q-learning address the problem of associating remote causes with final outcomes?


The point is that supervised training on large networks requires millions or billions of examples. In RL, that means potentially billions of execution trajectories, which can be really expensive if you're executing on an actual, physical robot. And even then, supervised classification objectives are more informative than just a scalar reward: when a supervised system makes a wrong prediction you don't just tell it "sorry, that's wrong", you also tell it what the right prediction would have been. That extra information is really valuable for learning.

In the black-box RL setting, you only see whether what you did was right or wrong, not what the right thing to do would have been. And unlike a classification system, where the output space is relatively small (ImageNet has 1000 classes), an RL agent is searching over an exponentially large space of possible trajectories. Which means that without some additional source of supervision you can spend a long time wandering in the wilderness with no idea of whether what you're doing is reasonable or how to get any reward at all. And when you do get some reward, you have no idea which of the possibly hundreds or thousands of actions you took deserves the credit.
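
A toy sketch of that credit-assignment problem (hypothetical numbers, not from any of the linked papers): a sparse terminal reward gets spread identically over every action in the episode, so the update says nothing about which of them mattered.

    import numpy as np

    rng = np.random.default_rng(0)
    T, num_actions = 500, 4
    logits = np.zeros(num_actions)

    def policy(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    actions = [rng.choice(num_actions, p=policy(logits)) for _ in range(T)]
    reward = 1.0 if rng.random() < 0.01 else 0.0   # sparse terminal reward, usually zero

    # REINFORCE-style update: every one of the T actions gets the same scalar credit.
    grad = np.zeros(num_actions)
    for a in actions:
        onehot = np.zeros(num_actions); onehot[a] = 1.0
        grad += (onehot - policy(logits)) * reward  # zero whenever the reward is zero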

A lot of recent RL research is about finding additional sources of supervision, such as training an agent to mimic the policy of an "expert" (e.g., a search algorithm that runs in a simulator to find an optimal solution, but which requires too much computation to actually apply directly at test time, as in http://arxiv.org/abs/1504.00702), or coming up with proxy objectives like "empowerment" or "curiosity" (which you can define in terms of information-theoretic quantities, such as mutual information between the agent's action sequence and future state, e.g. http://arxiv.org/abs/1509.08731) to supplement the actual reward signal. This latter path, the notion of "intrinsic reward" that paulsutter alluded to, is in some sense the merger of RL with unsupervised learning, and a lot of the power is that you're getting new reward signals constantly, not just when you finally manage to achieve some arbitrarily difficult task.
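
A rough sketch of the intrinsic-reward idea (the linear forward model and the weighting beta here are illustrative assumptions, not the method of either linked paper): supplement the environment's reward with the prediction error of a learned model of the world, so the agent gets a learning signal at every step.

    import numpy as np

    class ForwardModel:
        """Linear model predicting the next state from (state, action)."""
        def __init__(self, state_dim, action_dim, lr=0.01, seed=0):
            rng = np.random.default_rng(seed)
            self.W = rng.normal(scale=0.1, size=(state_dim, state_dim + action_dim))
            self.lr = lr

        def surprise(self, state, action, next_state):
            x = np.concatenate([state, action])
            err = self.W @ x - next_state
            self.W -= self.lr * np.outer(err, x)   # improve the model as we go
            return float(err @ err)                # squared prediction error

    def shaped_reward(env_reward, surprise, beta=0.1):
        # Curiosity bonus: extra reward for states the model can't yet predict.
        return env_reward + beta * surprise

    # e.g. at each step: r = shaped_reward(env_r, model.surprise(s, a, s_next))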


It seems to me to be all about having the right prior and planning for exploration. Policy search methods (http://arxiv.org/abs/1504.00702) assume that there aren't many trajectories that make sense (based on prior knowledge/testing in a simulator) and search for the best ones among those that make sense using real-world data. Even within policy search you need some kind of exploration, such as injecting Gaussian noise into trajectories. The hard part is to come up with a model for exploration.
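
As a rough illustration of that kind of exploration (a simple random-search sketch with an assumed evaluate() function, not the method of the linked paper): perturb the current nominal trajectory with Gaussian noise and keep whatever scores better.

    import numpy as np

    def evaluate(trajectory):
        # Hypothetical return: prefer trajectories whose endpoint lands near 1.0.
        return -abs(trajectory[-1] - 1.0)

    rng = np.random.default_rng(0)
    best = np.zeros(10)                            # current nominal trajectory
    for _ in range(200):
        candidate = best + rng.normal(scale=0.1, size=best.shape)  # Gaussian exploration noise
        if evaluate(candidate) > evaluate(best):
            best = candidate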


The error signal in backprop is a vector quantity, not a single scalar per time step. In RL the goal is to optimize the total sum of reward over all time steps. Backprop minimizes a given loss function by moving in the negative direction of its gradient; it moves a lot more variables toward the desired outcome at once, which is what LeCun is saying. A single scalar 'score' just doesn't carry enough information to optimize such a large n-dimensional function efficiently.
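
One way to see the difference (a small sketch with made-up numbers): a differentiable loss gives you an exact gradient over all n parameters in one pass, whereas a black-box scalar score has to be probed, e.g. one finite-difference evaluation per dimension.

    import numpy as np

    n = 1000
    rng = np.random.default_rng(0)
    w, x, y = rng.normal(size=n), rng.normal(size=n), 0.7

    # Backprop-style: analytic gradient of the squared error, all n entries at once.
    grad = 2.0 * (w @ x - y) * x                   # shape (n,)

    # Scalar-score-style: probe a black box that only returns one number per query.
    def score(w):
        return -(w @ x - y) ** 2                   # reward = negative loss

    def basis(i):
        e = np.zeros(n); e[i] = 1.0
        return e

    eps = 1e-5
    grad_est = np.array([(score(w + eps * basis(i)) - score(w)) / eps for i in range(n)])
    # grad_est approximates -grad (the score is the negated loss), but costs n+1 evaluations.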


Maybe I'm thinking about things wrong, but can't you have a scalar quantity to represent error for backprop at the end of a neural network (in a supervised regression problem for example)? That scalar error becomes a vector as it is assigned to various weights. I'm not trying to be obtuse, but it seems like the scalar reward in RL is also modifying countless variables in the Q functions of the state-action pairs that led to the final outcome/reward... Are those, by definition, smaller in number than the parameters of a neural network?


In biology, it's a combination of reinforcement learning and unsupervised learning. But he probably means RL on its own, although I'm not sure that statement would be correct.


Every time I read something like this, I get sad that I don't understand most of it.

But then I get happy because at least I understand a little bit :)


Reading some random slides from a specialist talk is not exactly the easiest way to learn this stuff.


Most often the slides alone are useless; they are only meant to support the talk. Talk recordings (or transcripts) are more useful: from slides you can only get a high-level idea of what the talk is about.

It would be awesome to have a platform where you get recommendations of what to learn/read, or which online courses to take, in order to understand a given talk.


Yes, absolutely! A platform where not only the answers are praised, but also the curious questions.


recs?



Thanks for the link, but goodness gracious, 6 GB for one hour of low quality video? What are these people thinking, that internet bandwidth is bestowed on all of us from on high?


This is not the same talk. The "What's wrong with deep learning?" talk was given at CVPR 2015. The slides linked in this HN post were presented at BayLearn, the Bay Area Machine Learning Symposium.


That's only the slides. Is there a video of the talk? (Assuming there is a talk, that is.)


I'd love to see the video too... LeCun is probably the most interesting person to hear on this topic...


In case you didn't see the comment originally (I didn't either), someone else posted a link.

http://techtalks.tv/talks/whats-wrong-with-deep-learning/616...


Although I'm working on deep neural nets, this material is too advanced for me. Looks like deep nets + Bayesian reasoning is the next big thing.



The title sounds familiar; it's also a course on Coursera:

https://www.coursera.org/course/pgm

Last session was in 2013 though.


Thanks. That's the book I'm reading :)


The real obstacle on the path to AI is that we don't even have the right questions to ask; people are looking for answers to the wrong questions and announcing the next big thing in AI.


I like the focus on vector space.



