Neural Turing Machines (2014) (arxiv.org)
55 points by rfreytag on May 21, 2016 | 27 comments



Interesting. Reminds me of old attempts at using NNs or ML techniques to derive programs from data, sort of merged with fuzzy-logic controllers. Here are two links on the analog, Turing-complete NNs they cited in the paper:

http://binds.cs.umass.edu/anna_cp.html

http://research.cs.queensu.ca/home/akl/cisc879/papers/PAPERS...

It would be interesting to see designs like theirs and others we see posted trained until they got really good at an application area, then synthesized into digital or analog NN implementations with plenty of resiliency. The components are so simple they can run with few gates or other components. Does anyone studying NNs know what the current state of the art is for synthesizing efficient HW from a given NN model? With or without the ability to continue to learn/improve.


There was work done by Doug Burger at Microsoft on splitting an algorithm between a digital CPU and a neural compute unit (simulated in analog). In algorithms that were not bottlenecked by the digital CPU, they saw gains of up to 50x in perf/W.

But most algorithms they tested were bottlenecked by the digital part, so the average benefit was ~3x.

But maybe if you start the design using this "Neural Turing Machine" approach, you could see consistently large gains.


Is it just me, or is anyone else bothered by the lack of theory here? The results are great, but they seem so heuristic and ad-hoc, like somebody tried lots of things and one seemed to work, but there's no clear idea as to why, and certainly no general theory tying things together.


I can't speak to the RL methods used here, but with respect to the architecture they designed, I think it might seem less ad-hoc if you know that, generally speaking, new neural net architectures work well when they:

1) can do some useful computation

2) with a good loss function

3) with a small number of parameters

4) in a differentiable way

5) where the gradients can flow

So for instance LSTMs are better than RNNs due to (5), convnets are good due to (3), etc.
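
To make (4) and (5) a bit more concrete, here's a toy sketch of my own (not from any paper): a plain chain of sigmoids multiplies together derivatives that are all at most 0.25, so the gradient dies with depth, while an identity skip connection (the same basic trick behind LSTM gating and residual nets) puts a "1 + something" factor in every layer, so the product can't collapse to zero.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    depth = 50

    # Plain chain: h -> sigmoid(h) at every layer. The gradient w.r.t. the input
    # is a product of sigmoid derivatives, each at most 0.25, so it vanishes.
    h, grad = 0.5, 1.0
    for _ in range(depth):
        h = sigmoid(h)
        grad *= h * (1.0 - h)        # d sigmoid(x)/dx evaluated at this layer
    print("plain chain:", grad)      # astronomically small

    # Same depth, but with an identity skip: h -> h + sigmoid(h). Each layer's
    # local derivative is 1 + sigmoid'(h), so the product stays away from zero.
    h, grad = 0.5, 1.0
    for _ in range(depth):
        f = sigmoid(h)
        grad *= 1.0 + f * (1.0 - f)
        h = h + f
    print("with skip:", grad)        # stays on the order of 1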


You've just numbered 5 different rules of thumb... That hardly supports your idea that things are not ad-hoc.


Firstly, (1) and (2) are not rules of thumb for building nice architectures, but basic properties of supervised learning and what kinds of problems it can solve.

Even if it were 5 rules of thumb instead of 3, "design an architecture to satisfy 5 rules" is a lot less ad-hoc than "somebody tried lots of things and one seemed to work", which was what I was replying to. We also know why each of these rules is important.


I'll concede that 1) and 2) are basic properties that an educated layperson would understand after reading a few ML blog posts. Hardly a working theory of neural nets.

They're still rules of thumb. You've argued that they are less ad-hoc than if they didn't exist. That misses the point: it is still enormously ad-hoc.


Rigour should be a signal to the historian that the maps have been made, and the real explorers have gone elsewhere.


I don't think "real mathematicians" do anything less worthy, novel or even revolutionary than "real explorers". Newton's contribution wasn't in collecting data about the motion of the planets or even in discovering patterns in that motion, but in uncovering laws. At this point in time (more precisely, in the past three decades), the study of neural networks feels like ancient astronomy (aside from some early theoretical results); we're in the data-collection and pattern-search phase. The scientific revolution is yet to happen.


>we're in the data-collection and pattern-search phase. The scientific revolution is yet to happen.

Sure, and I would call that an age of exploration.

I just take issue with criticizing work in a field with no great explanatory theory, for a lack of theory. There are only two routes a researcher can take to address the criticism: abandon work in the field, or come up with the missing breakthrough theory.

It's sort of like saying no one should have bothered with biology until after Darwin figured out what was going on.


You're right. I didn't mean it as criticism, although it may have come out that way. It's just that I decided to read a modern NN paper after not having touched NNs for almost 20 years, and was surprised to learn that there's little if any new theory.


Sure, but that's engineering (inspired by what theory of basic NN we do have), not theory.


Deep learning is very ad-hoc and experimental right now. Anyone reading through the literature would see that. That's why I'm pessimistic about AGI, but very optimistic about short- to mid-term industry applications.


It's very experimental. The great thing about them is that they open the possibility of building new types of interfaces, like a memory, stack, tape, database or search engine, that work with neural networks at inference time (rough sketch of one such interface below).

See https://arxiv.org/pdf/1505.00521v3.pdf
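
For a flavour of what a "differentiable stack" interface can look like, here's a rough NumPy sketch in the spirit of that line of work (it is not the construction from the linked paper): push and pop become continuous strengths rather than discrete operations, so the whole structure stays differentiable almost everywhere (min behaves like ReLU for gradients) and the strengths can come straight out of a controller network.

    import numpy as np

    def soft_pop_push(values, strengths, v_new, push_d, pop_u):
        """values: list of vectors (bottom to top); strengths: how much of each is 'still there'."""
        strengths = list(strengths)
        # soft pop: peel off up to pop_u units of strength, starting from the top
        remaining = pop_u
        for i in reversed(range(len(strengths))):
            removed = min(strengths[i], remaining)
            strengths[i] -= removed
            remaining -= removed
        # soft push: append the new vector with strength push_d
        return values + [v_new], strengths + [push_d]

    def soft_read(values, strengths):
        """Read the top 1.0 unit of the stack as a weighted sum (a 'blurry' peek)."""
        out = np.zeros_like(values[-1])
        budget = 1.0
        for i in reversed(range(len(strengths))):
            w = min(strengths[i], budget)
            out += w * values[i]
            budget -= w
        return out

    # toy usage: push two vectors, then do a "half pop" and read
    vals, strs = [], []
    vals, strs = soft_pop_push(vals, strs, np.array([1.0, 0.0]), push_d=1.0, pop_u=0.0)
    vals, strs = soft_pop_push(vals, strs, np.array([0.0, 1.0]), push_d=1.0, pop_u=0.0)
    vals, strs = soft_pop_push(vals, strs, np.array([0.0, 0.0]), push_d=0.0, pop_u=0.5)
    print(soft_read(vals, strs))   # [0.5 0.5]: half of the top item, topped up from below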


It's not really ad hoc. Neural nets are well studied, but they have very limited memory. They attempted to solve this problem by giving a neural net an unlimited external tape memory, inspired by Turing machines.

In order to train a neural net with backpropagation, every step has to be completely continuous and differentiable. So they made a continuous-valued tape.
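
Concretely, "reading" and "writing" stop being indexed lookups: the controller emits a normalized weighting over every slot, reads a weighted average, and writes by partially erasing and adding everywhere in proportion to those weights. A stripped-down NumPy sketch of that read/write step (the content/location addressing that actually produces the weights is omitted here, and the random weights are just a stand-in):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    N, M = 8, 4                       # 8 memory slots, each a vector of width 4
    memory = np.zeros((N, M))

    # Instead of a discrete address, the controller emits a weighting over slots.
    w = softmax(np.random.randn(N))   # non-negative, sums to 1

    # "Blurry" read: weighted average of all slots -- differentiable in w and memory.
    read_vector = w @ memory          # shape (M,)

    # "Blurry" write: every slot is erased and added to a little, scaled by its weight.
    erase = np.full(M, 0.5)           # erase vector, entries in [0, 1]
    add = np.random.randn(M)          # add vector
    memory = memory * (1.0 - np.outer(w, erase)) + np.outer(w, add)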


> inspired by Turing machines

This is the very spirit of ad-hoc engineering and trial and error: trying some architecture out because it resembles / has high-level connections to some other unrelated concept.


You deliberately ignored the other part of his explanation. They didn't try out using an external tape just because they were inspired by Turing machines, but because previous RNN neural net architectures had a problem in that memory could only be stored in the hidden states, which is expensive. "Finding a limitation in existing architectures and designing a new architecture that avoids those limitations" is not ad-hoc.

Additionally, the concept of a Turing machine is hardly "unrelated" to the goal of creating architectures that can learn efficiently, since the registers + external memory architecture of computers has proven to be useful for cheaply and quickly doing real-life tasks using short programs, something we would of course want our trained ML models to do.


Completely agreed with pron's comment. Also, the process by which they designed the new architecture is ad-hoc. They saw a problem, tried a bunch of solutions, and got something to work. Sure, their effort was guided by intuition, but that is ad-hoc. My point is, there was no theory they worked out saying "you should do this," or "these are the non-empirical reasons this method is superior."

I'm not negative on neural nets. I actually like that the field is relatively shallow and fast-moving. I just dislike it when people on HN valorize neural nets to be more than they are.


You can't just say that things are either ad-hoc or not ad-hoc - it's a spectrum. On one end we have engineers coming up with architectures by consulting an RNG and then testing them out. On the other end we have a theory that makes perfectly accurate predictions about which designs will perform well, so we can just send those designs to a factory to be fabricated into ASICs, no testing needed.

The ad-hoc-ness of how we design neural net architectures, CPUs, GPUs, high-temperature superconductors, etc. all falls on this spectrum, just at different places.

pron's original comment made it sound like designing NNs was at the extreme ad-hoc end of the spectrum; I was just refuting that.


pron makes no claim that neural nets are on the extreme end of ad hoc.


> like somebody tried lots of things and one seemed to work


Which is a perfectly accurate description of many architectural improvements in deep learning (e.g. residual networks).


Yeah, but it sounds like the extreme end of ad-hoc.


This isn't shooting completely in the dark, and obviously there's clear motivation, but motivation, inspiration and engineering are still not theory.

Let me put it another way: that paper reads like a technical report or a paper in an experimental science. It does not at all resemble a "normal" academic CS or math paper. That's fine, maybe that's the stage this discipline is at, but I was surprised to see that there's still no new theory since I worked with NNs in the 90s.


I agree with this.

I think it will be a while before we get a decent theory, though, and it's not because no one is trying, but it seems too hard to me. For instance, the way I see it, we don't even know very well how the low-dimensional space of (for example) "naturally occurring pictures" is embedded into the high-dimensional space of "all possible pictures". So we can't even precisely state what we want our NNs to do, much less why they work well. Put another way, if we ask a mathematician "so why is it that when I use gradient descent on this architecture, I perform reasonably well on unseen examples?", she would reply, "ok, what distribution did you draw your training and unseen examples from?", and we can't answer that.


Funny, I was just reading this article on Medium about NTMs: https://medium.com/snips-ai/ntm-lasagne-a-library-for-neural...


Thanks for sharing, very nice article indeed. It looks like the author works here --> https://snips.ai/. Their product looks promising and is branded as an "intelligent memory"; it would be interesting to know whether they're using an NTM implementation in their product.



