Uncertainty in Deep Learning (2016) (cam.ac.uk)
260 points by mannigfaltig on March 5, 2017 | 14 comments



If you use Keras, you might have noticed the dropout_W and dropout_U arguments on RNN layers. These apply dropout to the input and recurrent connections following Gal's recommended scheme, "variational dropout", in which the same dropout mask is reused at every timestep.
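
A minimal sketch of what that looks like with the 1.x-era Keras API (the layer sizes and task here are made up, not from the thesis):

    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, Dense

    model = Sequential()
    model.add(Embedding(10000, 128))
    # dropout_W drops input connections, dropout_U drops recurrent
    # connections; per Gal, the same mask is reused at every timestep
    model.add(LSTM(128, dropout_W=0.25, dropout_U=0.25))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam')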

With other ways of applying dropout, LSTMs typically fail to converge --- and with no dropout, they often over-fit. Gal's variational dropout therefore brings a significant improvement to many leading models.

There are several other nice contributions in the thesis as well, including a recommendation for applying dropout to word embedding matrices that I don't think has been well explored yet.


Yarin Gal also wrote the excellent "What My Deep Model Doesn't Know..." [0] in 2015.

If these ideas look interesting, you might also want to check out Thomas Wiecki's blog [1] with a practical application of ADVI (a form of the variational inference Yarin discusses) to get uncertainty out of a network.

[0] http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html

[1] http://twiecki.github.io/blog/2016/06/01/bayesian-deep-learn...


I don't understand the math completely, but it looks like dropout can be derived in a Bayesian context from a Gaussian prior, with an approximating distribution that mimics the Bernoulli dropout masks.

One useful tidbit is that you can get prediction intervals from a deep learning model by running it forward N times with dropout enabled and taking the mean and variance of the resulting distribution (plus another precision term).
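
In Keras terms, a rough sketch (assuming a compiled model named "model"; the helper names are mine):

    import numpy as np
    from keras import backend as K

    # build a predict function with the learning phase as an input,
    # so we can force dropout to stay on at test time
    stochastic_predict = K.function(
        [model.input, K.learning_phase()], [model.output])

    def predict_with_uncertainty(x, n_iter=100, tau=1.0):
        # N stochastic forward passes with dropout enabled (phase=1)
        preds = np.array([stochastic_predict([x, 1])[0]
                          for _ in range(n_iter)])
        mean = preds.mean(axis=0)
        # predictive variance = sample variance + 1/tau, where tau is
        # the model precision (the extra term mentioned above)
        variance = preds.var(axis=0) + 1.0 / tau
        return mean, variance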


Can anyone explain like I'm 5? Or, since this isn't Reddit, like I'm 21?


Suppose you train a neural net on cat pictures to classify the breed of cat. We desire the property that if we were to feed in a picture of a horse instead of a cat, we could somehow measure how good the network's parameters are for classifying this particular image. This is uncertainty estimation, and Yarin's blog post + thesis provide an elegant way to compute it, which you get nearly for free from the existing model.

Concretely, if you are trying to train a neural net to forecast stock prices or drive a car safely, you want not only the predictions but also some measure of how confident your model is in each prediction. This is eminently useful for models that lean towards the "black-box" end of the spectrum, such as deep neural nets.
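
As a sketch of how you might turn those stochastic forward passes into a confidence score for a classifier (predictive entropy is one common choice; the function and names here are illustrative):

    import numpy as np

    def predictive_entropy(mc_probs):
        # mc_probs: (n_samples, n_classes) softmax outputs from
        # repeated forward passes with dropout left on
        mean_probs = mc_probs.mean(axis=0)
        return -np.sum(mean_probs * np.log(mean_probs + 1e-12))

A horse photo fed to the cat-breed classifier should scatter its dropout samples across breeds and score high entropy; a clear cat photo should score low.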

Note that parameter uncertainty and risk estimation are quite different; the distinction is addressed in this preliminary work: http://bayesiandeeplearning.org/papers/BDL_4.pdf


I just started trying to learn machine learning from the ground up in my free time. I'm still trying to work out the Chernoff bound, so you can see how much of a noob I am.

But does this basically mean that I can have a model trained on only cat pictures and it can still tell me, with some measure of certainty, that a picture of a horse is not a cat, all without training the model to answer specifically "is this a cat?"


What's the verdict on this? Does dropout give you parameter uncertainty or risk estimation? Gal seems to be claiming the former, while the paper you linked claims the latter.


This seems to be a novel application of dropout for uncertainty. The author's 2015 post linked by matheweis [0] gives an approachable walkthrough:

> I think that's why I was so surprised that dropout – a ubiquitous technique that's been in use in deep learning for several years now – can give us principled uncertainty estimates. Principled in the sense that the uncertainty estimates basically approximate those of our Gaussian process. Take your deep learning model in which you used dropout to avoid over-fitting – and you can extract model uncertainty without changing a single thing. Intuitively, you can think about your finite model as an approximation to a Gaussian process. When you optimise your objective, you minimise some "distance" (KL divergence to be more exact) between your model and the Gaussian process. I'll explain this in more detail below. But before this, let's recall what dropout is and introduce the Gaussian process quickly, and look at some examples of what this uncertainty obtained from dropout networks looks like.

[0] http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html


This sort of approach is also used in reinforcement learning: https://arxiv.org/abs/1702.01182


Awesome!

Gal's variational dropout is one of the paths forward to Bayesian deep learning.


This got me excited, since I was expecting a critique of the shortcomings of current AI methods in the spirit of Dreyfus [0]. Instead it seems to me to be another analytical approach reinventing the wheel of phenomenology. Is the divide (and the perceived hostility) between the continental and analytical schools so big that the two don't even share ideas anymore to improve these AI systems?

[0] https://en.wikipedia.org/wiki/Hubert_Dreyfus%27s_views_on_ar...


This is a (significant) technical advance in unifying two approaches (Bayesian probabilistic modeling and deep learning) for well-defined machine learning problems. It makes no claims about the philosophy and design of artificial general intelligence.


Resolving uncertainty is a deeply human trait, though usually it just gives us a false sense of confidence in what we think we know.


Why is resolving uncertainty a human trait? Human, as in non-humans don't do it?

Why would resolving uncertainty lead to a false sense of confidence?

Without some explanation, it's impossible to understand what you want to say.



