One thing you don't mention in your paper reading method is a stopping criterion. I'm opening and doing some level of "looking at" --- not necessarily reading! --- probably 10-20 papers a week. More if you count revisits to papers that I've "looked at" before.
The question is, how do you decide which papers to promote to further attention? It's always a ranking problem, because there's always other work you could be looking at. You start your reading method at the point at which you've already decided to read the paper. But how do you get to that point?
(I can describe how I decide, but it's not exactly easy to replicate. It's a pretty insidery perspective. Here's how it played out for this paper.
I looked at this paper soon after it was published because I follow the first author on Twitter. When I opened the paper I looked at who was on it. I know Phil, and I know the sort of work Ed has been doing. I glanced over the paper and looked at the experiments, and saw that it was similar to the Neural Turing Machine and algorithm-learning line of work, which is still being evaluated on toy problems. This work is interesting, but I'm not doing research on these things, so I won't commit to learning it while it's moved <1 mountain empirically. I did read enough of the idea to contrast it with Chris Dyer's stack LSTM parser, which is a model that performs very well empirically. I checked the related work for criticism/comment on that work, and the paper said it was inspired by it. Cool. I'll watch for future empirical evaluations.
In total I probably spent about 10-15 minutes looking at this paper. This is enough for me to remember I'd seen the work, and help me understand future related ideas a little bit better.)
It was unsaid, but reading between the lines: after the 'First Pass', if you don't find the paper useful, stop. Based on the goals of the first pass, you should know enough by then, and you won't have spent much time on it.
Great post! Thank you for taking time out of your busy life to write this. I think it's small gestures of kindness like this that advance the field of artificial intelligence, just as much as the large discoveries. Bookmarked!
This is awesome. It also touches on a real problem: a scientific paper about an algorithm, without accompanying code that implements it, is hard to use or reproduce.
This website crashed my browser, and I lost a few tabs I had been saving and am unable to recover. It does this even with JavaScript off.
Stack machines are really cool. Are they computationally efficient, though? As the stacks grow bigger, the number of possible stacks to keep track of grows exponentially, doesn't it? Or do they only keep track of some of them?
Neural Stack Machines have a somewhat more limited memory. Whereas in theory Neural Turing Machines can learn anything, Neural Stack Machines focus on algorithms that are conducive to stacks. Phil Blunsom addresses this somewhat in his recent talk in Russia: https://www.youtube.com/watch?v=-WPP9f1P-Xc
They are similar in that way. Typically an LSTM is used to control the neural stack, so the stack sort of "sits on top of" the LSTM's memory, allowing it to be (in theory) infinite. Rather than tracking every possible discrete stack, the model keeps a single continuous stack: each entry has a scalar strength, and pushes and pops adjust those strengths by fractional amounts, so the cost per step is linear in the stack size, not exponential.
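To make that concrete, here is a minimal sketch of how I understand the continuous stack in this paper to work (forward pass only, no learning); the class and variable names are mine, and this is an illustration under that reading, not the authors' code:

```python
import numpy as np

class NeuralStack:
    """Toy continuous stack: values are vectors, each with a scalar
    strength in [0, 1]. A pop with strength u erodes strengths from
    the top down; a push appends a new value with strength d. The
    read vector is a weighted sum of the topmost values whose
    strengths add up to 1."""

    def __init__(self):
        self.values = []     # value vectors, bottom to top
        self.strengths = []  # matching scalar strengths

    def step(self, v, d, u):
        # Pop: remove total strength u, starting from the top.
        remaining = u
        for i in reversed(range(len(self.strengths))):
            removed = min(self.strengths[i], remaining)
            self.strengths[i] -= removed
            remaining -= removed
        # Push the new value with strength d.
        self.values.append(v)
        self.strengths.append(d)
        # Read: blend the top of the stack up to total weight 1.
        r = np.zeros_like(v)
        budget = 1.0
        for i in reversed(range(len(self.strengths))):
            w = min(self.strengths[i], budget)
            r += w * self.values[i]
            budget -= w
            if budget <= 0.0:
                break
        return r

stack = NeuralStack()
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
stack.step(a, d=1.0, u=0.0)              # push a
r1 = stack.step(b, d=1.0, u=0.0)         # push b; read returns b
r2 = stack.step(np.zeros(2), d=0.0, u=1.0)  # pop b; read returns a
```

In the real model the controller (the LSTM) emits v, d, and u at each timestep, and because d and u are continuous, the whole structure stays differentiable and trainable end to end. Note the memory only ever grows by one slot per step, which is where the "linear, not exponential" cost comes from.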