This reminds me of something that came up in Andrew Ng's online ML class. He said that it is important to check the correctness of your gradient calculation in backprop (by comparing it to a finite difference of the loss) because if you have a bug there, your algorithm might more or less work anyway, making it hard to tell that there was a bug. Apparently you can still get sensible output even with an incorrect gradient.
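For anyone who hasn't taken the class, the check is roughly this (a minimal sketch with a toy linear model and made-up names, not the actual course code): compute the gradient analytically, approximate it with a central finite difference of the loss, and make sure the two agree.

    import numpy as np

    def loss(w, x, y):
        # toy squared-error loss for a linear model
        return 0.5 * np.sum((x @ w - y) ** 2)

    def analytic_grad(w, x, y):
        # hand-derived gradient; this is the thing you want to verify
        return x.T @ (x @ w - y)

    def numeric_grad(w, x, y, eps=1e-5):
        # central finite difference of the loss, one coordinate at a time
        g = np.zeros_like(w)
        for i in range(w.size):
            d = np.zeros_like(w)
            d[i] = eps
            g[i] = (loss(w + d, x, y) - loss(w - d, x, y)) / (2 * eps)
        return g

    rng = np.random.default_rng(0)
    x, y, w = rng.normal(size=(10, 3)), rng.normal(size=10), rng.normal(size=3)
    print(np.max(np.abs(analytic_grad(w, x, y) - numeric_grad(w, x, y))))  # should be tiny, ~1e-9

If the printed difference is large, the analytic gradient has a bug, even if training with it still "sort of" works.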
On reading this, my first question is about the properties of the "random" feedback matrix. They illustrate what is happening using a tiny 1-width machine and a "random" matrix of "1". It seems like some analysis needs to be done on what kind of "random" is most appropriate to replace the gradient update for larger machines. There could be something really interesting going on such that you could generate some optimal non-random B according to whatever the network topology is.
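To make the "random B" idea concrete, here is a rough one-hidden-layer sketch (my own toy code based on my reading of the abstract, not taken from the paper): the forward pass and the weight updates are exactly the ones backprop would use, except the output error is sent backwards through a fixed random matrix B instead of through the transpose of the forward weights.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out, lr = 4, 16, 2, 0.01
    W1 = rng.normal(scale=0.1, size=(n_hid, n_in))
    W2 = rng.normal(scale=0.1, size=(n_out, n_hid))
    B = rng.normal(scale=0.1, size=(n_hid, n_out))  # fixed random feedback matrix, never learned

    for _ in range(10000):
        x = rng.normal(size=n_in)
        y = np.array([x[0] + x[1], x[2] - x[3]])  # arbitrary target function for the demo
        h = np.tanh(W1 @ x)                       # forward pass
        e = W2 @ h - y                            # output error
        dh = (B @ e) * (1 - h ** 2)               # backprop would use W2.T @ e here
        W2 -= lr * np.outer(e, h)                 # same local update rule as backprop
        W1 -= lr * np.outer(dh, x)

    print(0.5 * np.sum(e ** 2))  # error on the last sample; should have shrunk

As far as I understand the claim, this works because the forward weights gradually come into rough alignment with B, so B @ e ends up pointing in a useful descent direction; which distributions of B make that happen fastest seems like exactly the open question.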
The implications of this are huge; it should drastically reduce processing time for neural nets. I wonder whether, given this, networks could be updated asynchronously/continuously.
I don't really understand how it would reduce processing time; could you elaborate?
The main implications seem to be for neuroscience, as far as I can tell. Backprop is considered biologically implausible because it requires either bidirectional communication over synapses (which doesn't happen) or weight sharing between neurons. But this allows the forward and backward connections to be decoupled (i.e. they are different synapses).
This is really interesting stuff; my first reaction was "why does this even work?" I think I still don't fully understand what's going on.
This is not true. See Neural backpropagation [1]. There are known mechanisms for backward feedback across neural connections, for example Spike-Timing-Dependent Plasticity, where neural inputs that are well correlated in time and potential with output firings are strengthened over time. These phenomena are vital to learning and neural development.
From reading the abstract, it seems they are claiming that introducing some randomness into the gradient used for weight updates allows quicker convergence to a solution - I did not read the paper. I also don't exactly understand why it works; it sounds like they are claiming traditional backprop has room for improvement.
That's a very old strategy called jittering (also see stochastic gradient descent).
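In case the term is unfamiliar, jittering just means perturbing each training input with a small amount of noise every time it is presented, which acts as a regularizer. A throwaway sketch (my own, purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def jitter(x, sigma=0.05):
        # add small Gaussian noise to the input each time it is shown to the model
        return x + rng.normal(scale=sigma, size=x.shape)

    # inside whatever training loop you already have, train on jitter(x) instead of x:
    # for x, y in dataset:
    #     w -= lr * grad_loss(w, jitter(x), y)

That is noise on the data, though, not on the feedback path, which I think is the point of the reply below.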
This is something entirely different. They are not doing regular backpropagation at all, but somehow using neurons to learn how to backpropagate values. I haven't read the paper yet, just read their slides earlier, so that might not be correct.
Revolutionary paper indeed, if this idea generalizes to bigger, deeper networks! If so, why hasn't it been discovered before? It's like the Cambrian explosion of neural networks; very exciting times.
Okay, but how does this save computing time?
Multiplying by a fixed random matrix is not faster than multiplying by the (implicitly) transposed weight matrix.
So it only has philosophical implications, right?
Ok, got it: it will simplify building hardware-based neural networks! No more complicated look-ups of the transposed weight matrix needed.