Deep Learning: Our Miraculous Year 1990-1991 (idsia.ch)
156 points by eugenhotaj on Oct 5, 2019 | 34 comments



I encourage reading this, not as self-promotion, but as a first-person history of what it feels like to be too early with a technology.

Someone out there is probably experimenting with something world-changing, and has all the ingredients except for a few more iterations of Moore's Law. It would feel a lot like working on deep learning in 1990. If you think you might be on this path, it's worth studying the history.


Definitely don't read it as a history. It's just a lie. Schmidhuber is laying claim to a lot of things he didn't do, and he takes anything that vaguely resembles a modern technique in wording and claims he invented that technique, even though his papers have practically nothing to do with what the words mean today and had no influence on the field.

These are basically the only outliers who claim that automatic differentiation was invented by Linnainmaa alone. Many people invented AD at the same time, and Linnainmaa was not the first. Simply naming one person is a huge disservice to the community and shows that this is just propaganda, as much of Schmidhuber's stuff is.

1. First Very Deep NNs -> This is false. Schmidhuber did not create the first deep network in 1991. This dates back at least to Fukushima for the theory and LeCun in 1989 practically.

2. Compressing / Distilling one NN into Another. Lots of people did this before 1991.

3. The Fundamental Deep Learning Problem: Vanishing / Exploding Gradients. They did publish an analysis of this, that's true.

4. Long Short-Term Memory (LSTM) Recurrent Networks. No, this was 1997.

5. Artificial Curiosity Through Adversarial Generative NNs. Absolutely not. Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson, "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems," IEEE Trans. on Systems, Man, and Cybernetics, (5):834–846, 1983.

6. Artificial Curiosity Through NNs That Maximize Learning Progress (1991) I have nothing to say to this. This isn't something that worked back in 1991 and it's not something that works today.

7. Adversarial Networks for Unsupervised Data Modeling (1991) This isn't the same idea as GANs. The idea as presented in the paper doesn't work.

8. End-To-End-Differentiable Fast Weights: NNs Learn to Program NNs (1991). Already existed and the idea as presented in the original paper doesn't work.

9. Learning Sequential Attention with NNs (1990). He uses the word attention, but it's not the same mechanism as the one we use today, which dates to 2010. This did not invent attention in any way.

10. Hierarchical Reinforcement Learning (1990). Their 1990 paper does not do hierarchical RL; their 1991 paper does something like it. This is at least contemporary with "Learning to Select a Model in a Changing World" by Mieczyslaw M. Kokar and Spiridon A. Reveliotis.

Please stop posting the ravings of a person who is trying to steal other people's work.


> 1. First Very Deep NNs -> This is false. Schmidhuber did not create the first deep network in 1991. This dates back at least to Fukushima for the theory and LeCun in 1989 practically.

Quoting the blog post:

"Of course, Deep Learning in feedforward NNs started much earlier, with Ivakhnenko & Lapa, who published the first general, working learning algorithms for deep multilayer perceptrons with arbitrarily many layers back in 1965 [DEEP1]. For example, Ivakhnenko's paper from 1971 [DEEP2] already described a Deep Learning net with 8 layers, trained by a highly cited method still popular in the new millennium [DL2]."

Let's try to be fair and objective here. You may have an axe to grind with Schmidhuber, but that does not give you the right to take things out of context.


"First Very Deep NNs -> This is false. Schmidhuber did not create the first deep network in 1991."

Well, now it seems that you are the one lying here, since you ignore the well-known fact that Schmidhuber attributes the first deep NNs to the '60s and '70s. The same goes for many of your other points.


Do we have a few more iterations of Moore's law?


Even if we don't, progress is not going to stop, for example on:

- lowering the price of each chip - you can get that through more automation.

- lowering the cost of energy used by a chip - you can get that through the rise of renewable energy generation and its decentralisation (and, again, more automation).

The point is that automation caused by AI will start a reinforcing feedback loop where more and more work can be done more cheaply, speeding up automation itself too.


>The point is that automation caused by AI will start a reinforcing feedback loop where more and more work can be done more cheaply, speeding up automation itself too.

There isn't much evidence that AI has accelerated the rate of automation, and people have been saying this about information technology for the last four decades already. By most accounts, the automation and growth contributions of these technologies are low by historical standards.

The primary mechanism that has kept Moore's law alive up until now is miniaturization of transistors and we're going to run into a wall on that front pretty soon.


I will not counterargue your main point because this is indeed a matter of debate from a 'technical' standpoint.

However, in broader economic terms, I think the way AI may 'accelerate' the world in general is largely indirect: for instance, by saving time and money in other areas of life (better tools, cheaper means, infrastructure, etc.), people become more able to perform their jobs. There are obviously diminishing returns to such optimization, as to any natural/economic process.


Automation also often means that useful jobs get turned into bullshit jobs that stick around, e.g. for political reasons, sometimes even leading to decreased efficiency.


Yeah. Who knows how fast the growth is gonna be or how it's gonna look, but people are already working on, e.g., communication-avoiding algorithms for matrix and tensor operations that work best in the new regime. I'm not an expert in this area, but if you allow me to paraphrase someone who is, one of the reasons algorithms people have employment is that all of these things get redone over and over to exploit advances in hardware.


Not for clock speed, but yes for parallelism. It might look like the Cerebras [0] wafer-scale monster becoming a commodity you could fire up 1000 of in the cloud.

[0] https://www.cerebras.net/


We've got a few more iterations of Moore's law for sure. After that, progress will likely happen in jumps and address non-transistor bottlenecks like memory access, e.g. wafer-scale integration, 3D systems, photonics, etc.


For what it's worth I interpret the GP's statement as referring to general foundational progress in whatever field, not Moore's law specifically.


I guess many institutions and research groups could write similar accounts. Even the late '80s were fairly productive concerning NNs and what today we call ML, as you can see just by searching publications of that era.

We also had some relatively sophisticated tools, and looking back in time one could say they were deep-learning-ish. In my personal case I did some research on weather forecasting using BPN/TDNN, Kohonen networks and RNNs with the Stuttgart Neural Network Simulator [0]. It allowed some flexibility in creating and stacking models.

[0] http://www.ra.cs.uni-tuebingen.de/SNNS/welcome.html


God, Schmidhuber is insufferable.

This whole account has virtually zero mention of how later techniques improved upon or innovated on his, and very little account of how his contributions were (like everyone else's) evolutions of existing methods. It reads almost like Schmidhuber or his students invented and solved everything from scratch, and nobody else has done shit since.

The guy clearly wants to be more included in the standard narrative, but being so self-aggrandizing is doing him zero favors. If he were capable of writing an honest, charitable account of how his work fits into a much larger field, it would be much easier to take him more seriously.


I mean it's self-promotional yes, but I read this as more of a blog post about advances specifically in his own group. For the Schmidhuberian take on the broader history of deep learning, this other one's the go-to article (though it's much longer): https://arxiv.org/abs/1404.7828

Not everyone likes that article either, but it does at least extensively cite prior work, i.e. accounts for "how his contributions were (like everyone else's) evolutions of existing methods". In particular, sections 5.1–5.4 credit a large amount of work from the 1960s-80s that he considers foundational.


Really? The title itself says, "Deep Learning: Our Miraculous Year 1990-1991". It's an account of their work during that period. And he cites something like 100 articles there.

This kind of rhetoric was partly responsible for why he didn't receive the Turing Award last year, which he thoroughly deserved. We seem incapable of appreciating the achievements of people who don't match our ideal personality type.


There is a subset of people who do not like Schmidhuber. According to my personal observations, this subset overlaps quite a lot with people who tend to underestimate the importance of proper credit assignment.


One of the early applications was pattern matching for the LHC. I was in one of the groups in which some people (not myself) worked on this and put the neural networks, using the then just-developed theory, in hardware with FPGAs.

After a few years the three (post-docs) left and founded a startup. I lost contact with them. I think they were too early for broader applications, and they had left the field completely by the early 2000s, when it really took off.

Here is a book that the author of the referenced article, and the people from my group (Utrecht University), contributed to: https://link.springer.com/book/10.1007%2F978-1-4471-0877-1


1989/1990 was also when convolutional networks first started working with LeCun’s breakthrough paper on digit recognition.

Incredible to think how much amazing research was happening back then, and to wonder what research being done now will change our lives in the next 30 years.


> In surveys from the Anglosphere it does not always become clear [DLC] that Deep Learning was invented where English is not an official language.

Even if you disagree with Schmidhuber's assessment of his own importance, I think this is clearly true.

There is a certain arrogance (or not-invented-here syndrome) in the Anglosphere (or North America) towards research done elsewhere.


It was a travesty Schmidhuber didn't receive the Turing award along with Hinton, Lecun, and Bengio last year.


It does seem to me that there could be some bias in this award's history.

The Turing Award has been awarded every year (sometimes to multiple people) since 1966.

Look it up on Wikipedia. How many of the roughly 70 laureates can you find who performed their research outside of the Anglosphere? I didn't look in detail, but after a quick glance it seems about 5 out of the 70 (Dahl, Nygaard, Shamir, Naur, Sifakis)? (Or how many grew up outside the Anglosphere?)

Maybe that reflects the true state of things and almost all of CS was developed in the Anglosphere. Even if that's so historically, I think it may induce some bias when evaluating the contributions of people outside the Anglo community and network.


No it wasn't. Stop forming opinions from uninformed internet memes.


It seems that Schmidhuber is claiming credit for deep learning and is implicitly comparing himself to Albert Einstein. How accurate is his assessment?


My goal is to one day have Schmidhuber angrily claim that my research was done by him in the 90s like what happened to Ian Goodfellow [0].

[0] https://www.reddit.com/r/MachineLearning/comments/5go4sa/n_w...


Schmidhuber was more right than wrong


Interesting that you're a new account that has only ever posted in this thread in defense of Schmidhuber.


Must be Schmidhuber then. I am not, but I have thousands of DL citations to my name.


That is Schmidhuber in a nutshell.


As an old-timer in neural networks, I found this interesting. However, I should note that we did not call it "deep learning" back then. It was simply "neural networks".

As I write this, I am looking at the book "Parallel Distributed Processing" (the one with the blue cover), an edited compilation of papers on neural networks published by MIT Press in 1987. I myself spent the summer of 1990 implementing the back-propagation algorithm as described in chapter 8 of this book, which is entitled "Learning Internal Representations", by Rumelhart, Hinton and Williams.

I myself got my PhD in 1992 for coming up with an algorithm for speeding up back-propagation when the training set is imbalanced.

An Improved Algorithm for Neural Network Classification of Imbalanced Training Sets. IEEE Transactions on Neural Networks, 4(6):962–969, November 1993.
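
(For younger readers: the core of that chapter-8 back-propagation procedure is compact enough to sketch in a few lines of modern NumPy. This is just a rough illustration, not my 1990 code; the toy XOR data, layer sizes, and learning rate are made up for the example.)

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Toy problem: learn XOR with one hidden layer of 4 sigmoid units.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)
    W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)
    lr = 1.0

    for epoch in range(10000):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)        # hidden activations
        out = sigmoid(h @ W2 + b2)      # network outputs

        # Backward pass: propagate error derivatives layer by layer.
        d_out = (out - y) * out * (1 - out)    # deltas at the output units
        d_h = (d_out @ W2.T) * h * (1 - h)     # deltas at the hidden units

        # Gradient-descent weight updates.
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

    print(out.round(3))  # close to [[0], [1], [1], [0]] for most seeds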


The prominent developers of deep learning techniques within Google were quite upfront that they were applying old techniques that had not been practical until massive datacenters expanded the feasible parameter space and training power.


Have they been equally upfront in their patent applications?


This is pretty cool. Always interesting to see how things eventually become mainstream whereas origins go back decades, sometimes more.



