The Lottery Ticket Hypothesis: Finding Small, Trainable Neural Networks (arxiv.org)
137 points by jonbaer on July 5, 2018 | 14 comments



Working on DNNs is more similar to gambling on slot machines than it is to traditional programming. I gather training data, write a Python script, and then pay around $1 for GPU time to see if it's a winner. Instead of watching spinning reels I watch loss function graphs. 95% of the time it's a bust, but every once in a while you get a big payoff. I feel very smart when I win, but deeper investigation inevitably reveals I have no idea why A works and B does not.


That's the difference between you and evolution, though, right? You care about the outcome and the reasons behind it. Evolution doesn't care about either.


Evolution does have a strong bias towards offspring survival, though in some cases it is content to rely on the law of large numbers and statistical probability.


I'd say that's less something evolution has and more something it is. Although "overall surviving fecundity" or something probably describes what it's optimising for pretty well.


Evolution doesn't optimize for anything, but the results are such that, to quote:

> The prevalent genes in a sexual population must be those that, as a mean condition, through a large number of genotypes in a large number of situations, have had the most favourable phenotypic effects for their own replication.


It optimizes for something, not in the sense of intent but in the sense that there's something that increases as evolution takes place. It's just pretty hard to describe exactly what. Darwin called it by the delightfully circular name "fitness", where evolution is the survival of the fittest and the fittest are those that survive.

(Of course, fitness is relative to an environment and the environment changes as things evolve, so it's not like there's some global 'fitness' property that increases. Just that, all else held constant, the thing which evolves tends to become better suited to reproducing in its environment.)


Doesn't this make the profession rather stressful?

I mean, how can you sell your work to anyone if you fail most of the time?


What sort of deeper investigation have you done that turned out to be unhelpful in that way? Curious.


A very cynical and provocative follow-up to this hypothesis (if it turns out to be true) would be:

DNNs are nothing more than a huge set of NNs where at least one happens to solve your problem and the DNN finds out which one.

The approach is then essentially the same as a Random Forest with bagging, but with the decision trees substituted by neural networks (see the rough sketch after the two points below).

This in turn would mean that:

1. The "Deep <Your Algorithm here>" revolution is in fact not a revolution but just throwing many models (and thereby resources) at the same problem while obscuring that you don't have any idea which one will work because the DNN will do it for you.

2. The age-old problem of initializing the neural network is not at all solved, and addressing it properly could lead to drastically better results.
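
To make the ensemble reading concrete, here is a rough, hypothetical sketch (toy data, made-up sizes, not from the paper): train many small, independently initialized nets and keep the best one, versus training one wide net that contains many candidate subnetworks.

    import torch
    import torch.nn as nn

    def train(model, X, y, epochs=200, lr=0.1):
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            opt.step()
        return loss.item()

    torch.manual_seed(0)
    X = torch.randn(256, 10)
    y = torch.sin(X.sum(dim=1, keepdim=True))  # toy regression target

    # "Ensemble" reading: many small tickets, keep the winner.
    tickets = [nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 1))
               for _ in range(20)]
    best = min(train(m, X, y) for m in tickets)

    # Conventional reading: one wide net containing many candidate subnetworks.
    wide = nn.Sequential(nn.Linear(10, 160), nn.ReLU(), nn.Linear(160, 1))
    print(f"best small net loss: {best:.4f}, wide net loss: {train(wide, X, y):.4f}")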


"DNNs are nothing more than a huge set of NNs"

As in deep neural networks are nothing more than a huge set of shallow neural networks? That's directly contradicted by tons of evidence. See for example the visualizations in https://distill.pub/2017/feature-visualization/

Your claim 2 is of course correct, supported by the phenomenon of transfer learning.
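
For what it's worth, a minimal illustration of that point, assuming torchvision's resnet18 as the backbone (the class count is a placeholder): transfer learning is essentially a better initialization of the same architecture.

    import torch.nn as nn
    import torchvision.models as models

    num_classes = 10  # placeholder

    # Same architecture; only the starting weights differ.
    scratch = models.resnet18(pretrained=False)   # random init
    warm = models.resnet18(pretrained=True)       # init inherited from ImageNet training

    for m in (scratch, warm):
        m.fc = nn.Linear(m.fc.in_features, num_classes)

    # Fine-tuning `warm` typically converges far faster than training `scratch`,
    # which is evidence that the choice of initialization matters a great deal.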


(Author of paper here) This is approximately my growing suspicion.



I think the authors are confused about what’s happening here. It’s not revealing a subnetwork; you’re removing spurious interactions in the fully connected ANN, thereby visually revealing the functional circuit topology that’s driving the behavior.

It’s helpful to understand the effect of pruning on artificial gene regulatory networks, another class of ANN that mathematically follows the same rules:

https://www.researchgate.net/publication/23151688_Survival_o...
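
As a rough illustration of what pruning does (made-up threshold, not the method from either paper): zeroing the small-magnitude weights leaves behind the sparse circuit that was doing the work.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    layer = nn.Linear(16, 16)

    with torch.no_grad():
        threshold = layer.weight.abs().quantile(0.8)   # keep the top 20% by magnitude
        mask = (layer.weight.abs() >= threshold).float()
        layer.weight.mul_(mask)                        # spurious connections -> 0

    print(f"surviving connections: {int(mask.sum())} of {mask.numel()}")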


I should say it's very easy to construct synthetic datasets on which it is very clear that the only role of large layers is to supply more opportunities for the initialization to get it right.

So, not much surprise there. But they claim they can extract the smaller subnet, which could be useful. Except they only provide experiments with very small nets so far, as the comments on reddit point out.
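
For readers skimming: the extraction procedure is roughly train, prune by magnitude, reset the surviving weights to their original initialization, and retrain with the pruned weights held at zero. A toy sketch of that loop (my own simplification, not the authors' code; the sizes and the toy task are made up):

    import copy
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    X = torch.randn(512, 20)
    y = (X[:, 0] * X[:, 1] > 0).float().unsqueeze(1)   # toy binary target

    net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
    init_state = copy.deepcopy(net.state_dict())        # remember the original init

    def train(model, steps=500, masks=None):
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.BCEWithLogitsLoss()
        for _ in range(steps):
            opt.zero_grad()
            loss_fn(model(X), y).backward()
            opt.step()
            if masks:                                    # keep pruned weights at zero
                with torch.no_grad():
                    for name, p in model.named_parameters():
                        if name in masks:
                            p.mul_(masks[name])
        return loss_fn(model(X), y).item()

    train(net)

    # Prune: keep the largest-magnitude 20% of each weight matrix.
    masks = {name: (p.abs() >= p.abs().quantile(0.8)).float()
             for name, p in net.named_parameters() if p.dim() == 2}

    # Reset survivors to their original initial values -- the "winning ticket".
    net.load_state_dict(init_state)
    with torch.no_grad():
        for name, p in net.named_parameters():
            if name in masks:
                p.mul_(masks[name])

    print("retrained ticket loss:", train(net, masks=masks))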



