Neural Architecture Search with Reinforcement Learning (arxiv.org)
116 points by saycheese on Jan 19, 2017 | hide | past | favorite | 38 comments



The primary author is a Google Brain Resident (https://research.google.com/teams/brain/residency/). The Brain Residency is a really great program for starting a career in Deep Learning and ML research, and I'm really impressed by how quickly these new researchers churn out great work like this.

disclosure: I work at Google Brain


I really would like to, someday, do the residency, but at the same time, I don't know if I can.

Right now, I'm doing the Udacity Self-Driving Car Engineer Nanodegree. I won't rehash that or my other knowledge here...

My issue, though, is my age, coupled with the fact that I don't have a real degree (I've got an old associate's degree from a tech trade school that is almost worthless). After I finish the Udacity thing, I have this idea of pursuing an online BA, then an MA - likely in CompSci. But we are likely looking at 4 years or more (probably more) to do both things - and not a small amount of money. After that point, I'll be close to 50 years old.

There's a very good chance that the residency won't be around at that point - or even that technology around ML will have radically changed the world to the point where it would be difficult (or even meaningless) to try to "catch up"...

Still - I'm not going to let that possibility stop or restrain me; that said, is it viable for me to even think about this kind of thing - doing such a residency? Would I even have a chance of being considered, vs someone younger? Furthermore - how does one do a residency of this nature, while still paying bills (I live in Phoenix, AZ - I own a house, plus have other bills)?

These are just questions I have - not really concerns, as I am nowhere near the point I need to be to do the residency - and there isn't any guarantee that I will be. I'll just continue to enjoy the journey, rather than worrying about a specific end-goal.


> is it viable for me to even think about this kind of thing - doing such a residency? Would I even have a chance of being considered, vs someone younger? Furthermore - how does one do a residency of this nature, while still paying bills (I live in Phoenix, AZ - I own a house, plus have other bills)?

Good on you for pursuing continued education.

Yes it's viable - never lose hope in yourself for anything. The Brain Residency has no age requirement; in fact, the program looks for diverse academic/career experiences. You'd be surprised how close you are to your dream job if you are passionate, hardworking, and lucky.

I too, was intimidated by the math & magic of ML when I first took Andrew Ng's Intro to Machine Learning Course 4 years ago (my first exposure). I took regular CS courses in college and started focusing on ML 1.5 years ago. Beyond taking courses, I highly recommend building your own projects - it exercises your independent research ability and creativity.

The Brain residency pays a good wage for the SF Bay Area (comparable to new grad salary at average tech firms), so you should be able to pay your bills. I've heard rumors that many other companies highly invested in ML are following in Google Brain's example and starting similar residency programs. Your ML career path need not be at Google, though we're a pretty solid choice ;)


> I too, was intimidated by the math & magic of ML when I first took Andrew Ng's Intro to Machine Learning Course 4 years ago (my first exposure).

Thank you for your kind words and encouragement. I will take them to heart.

I got my first taste of modern machine learning when I took Andrew Ng's ML Class (sponsored by Stanford), in the fall/winter of 2011; I completed and passed it. At the same time, I was also taking Thrun and Norvig's AI Class; I had to drop out due to personal reasons.

In 2012, after Thrun founded Udacity, they released the CS373 "Build Your Own Self-Driving Vehicle" course (I probably have that title wrong for the time; they changed the name of the course later) - it was meant as a stand-in for the original AI Class (I think there were licensing issues or something that prevented them from presenting it - they later incorporated it as a part of their offerings). I jumped at the chance, and completed that course as well.

When they announced this Nanodegree course, I knew I had to apply. So far, things are going well with the course. I'm in the November (2016) cohort, and currently working on the behavioral cloning project.

I do have in mind several personal projects to pursue, once I can come up for air from this course (most of my free time has been consumed by it). I do need more education, though, which is why I want to pursue the BA and MA. Two of my weak areas are calculus and stats/probabilities - which ML is heavily reliant on (plus, I really want to understand what is going on "under the covers" as well).

Onward and upward!


>> "disclosure: I work at Google Brain"

+

>> "I too, was intimidated by the math & magic of ML when I first took Andrew Ng's Intro to Machine Learning Course 4 years ago (my first exposure). I took regular CS courses in college and started focusing on ML 1.5 years ago."

------

That's awesome, congrats!

Appears you have written about it too:

https://news.ycombinator.com/item?id=13440813


I believe Chris Olah[1] doesn't have a degree. I suspect he doesn't need one now though...

[1] http://colah.github.io/cv.pdf


Based on reading his CV and his github page, I suspect he never needed one in the first place. He seems like a very intelligent and driven (by curiosity) individual, who will likely go very far. Thank you for posting the link; I think I may be able to learn from him, and his example.


Are any of you trying to apply some of these ideas to general intelligence, or is that only DeepMind?

If, for example, Google Assistant is ever going to be able to _really_ understand what people are talking about, it will most likely need to be based on a virtually embodied agent gradually trained somewhat like a human child. If you or any of the other Google Brain team members are interested, you should study the existing field of AGI. See 'AGI-16 videos intelligence' on youtube for example. I mention this because papers such as the one linked here frequently reference general or animal-like intelligence, yet almost all of the best-funded AI efforts seem ignorant of, or dismissive of, the existing AGI research.


You understand that AGI-16 is a futurism conference and has little to do with actually building anything that works?

Whilst theories like a 'virtually embodied agent trained like a child' sound like they make sense, there is nothing to say they will work, except that humans are like that. But we don't understand how brains work well enough to copy that anyway.


> AGI-16 is a futurism conference

I would like to point out that the AGI Conference series [1] is an academic conference, with refereed proceedings published in the Springer Lecture Notes on Computer Science [2].

[1] http://agi-conf.org/

[2] http://link.springer.com/book/10.1007%2F978-3-319-41649-6


It is a legitimate conference. That doesn't mean it's any good. It doesn't have any standing in the AI or machine learning research community (unlike: NIPS, ICML, AAAI, KDD, CVPR, ACL, etc...). The Springer LNCS series is just an oddball mishmash of papers from all sorts of sources. A legitimate publication, but not one with any standing in the community.

It's unfortunate that which conferences/journals are "good" (i.e. worth reading) is one of those things that's 100% obvious to anyone in the research community, and totally esoteric to anyone outside it.


Major thread bump here, but a number of the (second) authors on those papers are recognizable and credible ML/AI researchers. In particular, I noticed Marcus Hutter and Jürgen Schmidhuber.

But yeah..


It's weird. Some of those papers are pretty good, but most of the videos are bad.


Wow, this feels tailor-made for people like me. I want to do further studies in deep learning after I graduate, but I'm not really keen on spending 4-5 years of my life on it. I'll be keeping an eye on this....


I think this is the way that Neural Networks achieve some modicum of generality - chaining them together.

Let's say you have a robot that you want to grab a can of beer off the counter. You say "grab that beer" and point to it. The first neural network interprets the speech and visual input. A second neural network chooses the proper neural nets to continue the task based on the information interpreted from the first net - it picks one for walking and one for grabbing.
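A toy sketch of what I mean, in case it helps (all names, shapes, and weights here are made up - each random MLP is just a stand-in for a separately trained model):

    import numpy as np

    # A "router" net picks which specialist net should handle each sub-task.
    # Every net here is a tiny random MLP standing in for a trained model.
    def mlp(in_dim, out_dim, rng):
        W = rng.standard_normal((in_dim, out_dim)) * 0.1
        return lambda x: np.tanh(x @ W)

    rng = np.random.default_rng(0)
    perception = mlp(64, 16, rng)            # first net: interpret speech + vision features
    specialists = {
        "walk":  mlp(16, 4, rng),            # outputs locomotion commands
        "grasp": mlp(16, 4, rng),            # outputs gripper commands
    }
    router = mlp(16, len(specialists), rng)  # second net: score each specialist

    def act(raw_input):
        state = perception(raw_input)        # interpret the raw input
        scores = router(state)               # decide which specialists to run, and in what order
        order = np.argsort(scores)[::-1]
        names = list(specialists)
        return [(names[i], specialists[names[i]](state)) for i in order]

    print(act(rng.standard_normal(64)))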


Of course compound task federation like this is not exactly new. Blackboard systems did the very same thing using expert systems back in the 1970s-80s.

The problems from then remain today: complex and compound tasks don't decompose neatly into well-defined constituent subtasks. Sub-task recombination rapidly devolves into a rat's nest of selecting among competing subtask components that don't have clearly distinct semantics, aren't free of contextual dependencies, and don't plug and play independently.


Do you have any interesting material on this? Where's a good in-depth analysis of blackboard systems' pros and cons?


I'd start here: https://en.wikipedia.org/wiki/Blackboard_system

I'm not up-to-date, but I haven't heard of new work on blackboards or federated rule-based systems for maybe 20 years now, after expert systems grew increasingly probabilistic (and procrustean), and BBSes based on binary rules showed little sign of escaping the classic brittleness of RBSes.

The wiki article mentions 'Bayesian Blackboard' systems. Maybe they had greater success?


BBS systems also interest me quite a bit. Check out https://en.wikipedia.org/wiki/LIDA_(cognitive_architecture). It's based on a Global Workspace model (conceptually similar to a Black Board System). You can find papers on LIDA online (including one or two from the references on the wiki page). If you find anything particularly interesting, feel free to share!


So maybe now we need an NN that figures out the best way of wiring up various pre-existing NNs?


Isn't that exactly what the paper is about?


It's sort of related. In the paper they are still learning parameters for a single new network, as opposed to training a new network to be good at linking up other networks.


Extreme paper tldr - Humans usually construct neural network components and the graph of how they fit together by hand. This work sets up a "controller" neural network that constructs two core components in many neural networks, an RNN and a CNN, through reinforcement learning. This is an intensive and slow process, requiring 400 CPUs and 800 GPUs for the RNN and CNN respectively, but achieves better than or near state of the art results for language modeling and vision classification respectively.
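If it helps to have a mental model of that loop, here's a rough sketch (to be clear, this is not their code: the names are hypothetical, the child training is stubbed out with a toy scoring function, and it's nothing like the 800-GPU scale of the real thing):

    import numpy as np

    rng = np.random.default_rng(0)
    filter_choices = [1, 3, 5, 7]                         # per-layer options the controller picks from
    n_layers = 3
    logits = np.zeros((n_layers, len(filter_choices)))    # the controller's "policy"

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def sample_architecture():
        return [rng.choice(len(filter_choices), p=p) for p in softmax(logits)]

    def train_child(arch):
        # Stand-in for training the sampled child network and measuring
        # validation accuracy; here we just pretend 3x3 filters are best.
        return np.mean([1.0 - abs(filter_choices[a] - 3) / 7 for a in arch])

    baseline, lr = 0.0, 0.5
    for step in range(200):
        arch = sample_architecture()
        reward = train_child(arch)                 # validation accuracy is the reward
        baseline = 0.9 * baseline + 0.1 * reward   # moving-average baseline
        probs = softmax(logits)
        for layer, choice in enumerate(arch):
            # REINFORCE: grad of the log-prob of the sampled choice is onehot - probs
            grad = -probs[layer]
            grad[choice] += 1.0
            logits[layer] += lr * (reward - baseline) * grad

    print([filter_choices[a] for a in sample_architecture()])   # should now mostly pick 3s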

This paper is currently under review for ICLR 2017 and is one of the papers I was most excited about. I previously wrote an article, "In deep learning, architecture engineering is the new feature engineering"[1], which discusses the (often ignored) fact that much of the work in modern deep learning papers is just assembling components in different combinations. This is one of the first works that I feel provides a viable answer to my complaint/concern.

The paper itself tackles two problems - first, that optimizing architecture is usually black magic directed poorly by humans, and second, that humans rarely spend their time tailoring towards a specific task, instead seeking generality. Zoph and Le do this by having one neural network generate the architecture for a later one through a large series of experiments. They perform experiments in both vision (classification) and text (language modeling), replacing the convolutional neural network component and the recurrent neural network component respectively.

The first is that many of the choices involved in constructing a neural network architecture are somewhat arbitrary and only hit upon experimentally by the practitioners themselves. Andrej Karpathy noted in one of his lectures (paraphrased) "Start with an architecture that works, then modify from there" - mainly because there's a lot of "black magic" in these architectures that has only been discovered by spilling blood to the experimental god of a hundred GPUs and/or by "graduate student descent" (i.e. locking a poor grad student in a room for an indeterminate period of time and telling them to do better on task X). Being able to turn to a neural network to run this painful search for you instead is a good idea - assuming you have the large number of GPUs or CPUs necessary. In the paper they use 400 CPUs for the language modeling search and 800 GPUs for the CNN classification search!

The other is whether we should generalize or specialize these architectures. There are many variants of architectures that are not built for or tested against each possible new task. For example, within recurrent neural networks (RNNs) we have the RNN/GRU/LSTM/QRNN/RHN/... and a million minor variants between them, each of which performs slightly differently depending on the task. While we would like to imagine that the architectures humans make get progressively closer to "the perfect generic RNN cell" over time, it makes sense that certain cells could or should be optimized for a specific task. Seeking generality isn't always the correct answer. Humans seek generality because we don't have the time to tailor to each specific task - but what if we could? Maybe in that situation Occam's razor is actually an impediment to our thinking.

While these are early days, and the approach is hugely resource intensive, it is likely to get more feasible over time, either as we get more computing power or as we become smarter about how we use it. As a researcher in neural networks, I don't consider this a threat but a useful tool, in much the same way that compilers ultimately only helped assembly programmers.

If people are interested, I can write an article covering many of the details of this paper like I did for Google's Neural Machine Translation architecture[2]. In that article I try to step through how these systems work from the ground up, and the reasoning behind many of the decisions in the paper, hopefully in an understandable manner for a general audience.

P.S. Merity et al. is one of the numbers they beat in the language modeling section, so you may read this entire post in a bitter tone if you'd like ;)

P.P.S. This paper has been out since November 2016 or earlier - I think it was a recent MIT Tech Review article that might have resurfaced it? (oops: wrote Wired initially, meant MIT Tech Review - thanks @saycheese)

[1]: http://smerity.com/articles/2016/architectures_are_the_new_f...

[2]: http://smerity.com/articles/2016/google_nmt_arch.html


Found it on MIT Technology Review:

https://news.ycombinator.com/item?id=13439691

Do you have a link to the Wired article you're thinking of?


I'm interested! Your blog posts are great! :)


If you had RSS, that would be nice.


I would love to see, as a future research project, a Neural Architecture Search that creates Neural Architecture Search systems. Meta-meta-learning. I like the idea of improving the network which creates other networks.

Also, the size of the network could be used as part of the evaluation, in order to minimize the networks while maximizing accuracy.
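Something as simple as folding the parameter count into the search's reward would do it - a toy sketch (the 0.01 weight is made up and would need tuning against the accuracy scale):

    # Hypothetical reward that trades off validation accuracy against model size,
    # so the search prefers small-but-accurate architectures.
    def reward(val_accuracy, n_params, size_weight=0.01):
        return val_accuracy - size_weight * (n_params / 1e6)

    print(reward(0.92, 5_000_000))    # 0.87
    print(reward(0.93, 40_000_000))   # 0.53 - slightly better accuracy, much worse reward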


I would guess the computational complexity involved in training the top level network would make the project infeasible.


You can relax the problem in order to reduce complexity and computation. For example: cut off the training of slow-learning networks, keep a database of already-trained networks, decrease the number of epochs and examples, etc. Or even create another network which predicts the convergence of a network and use it as a heuristic. If you also take the size of the network into account and push the learning to minimize it, then you can train the networks faster.

We cannot do it at home, but for sure Google can try it in their servers.


Ran across this research reading this article, "AI Software Learns to Make AI Software" - which is already posted here:

https://news.ycombinator.com/item?id=13436195


This is pretty old, and neural nets can train neural nets too (better than humans, as usual). Check out "Learning to Learn by Gradient Descent by Gradient Descent".


Err... this paper cites the one you're referring to, so I don't think it's older (the publication dates also bear this out).


(Learning to learn is older, I just thought it would be cool)


Wait... if the neural net can design other neural nets, can it be taught to design itself?


Ironically, this is one of the oldest ideas in computer science https://en.wikipedia.org/wiki/Von_Neumann_universal_construc...


Von Neumann's constructor is about self-replication. Not quite an AI that can self-improve.

I think the first person to put the idea forward was I. J. Good in 1962. He speculated that someday AIs would be good enough to do AI research better than their human masters. Then they would start making even better AIs. Which would make even better AIs, and so on. Leading to what he called an "intelligence explosion". He thought it would be "the last invention that man need ever make."

http://web.archive.org/web/20160428183531/http://le-cretin-t...


Parent's comment was about self-replication, I was addressing that.


Can it be taught to design a better version of itself?



