One thing you don't mention in your paper reading method is a stopping criterion. I'm opening and doing some level of "looking at" --- not necessarily reading! --- probably 10-20 papers a week. More if you count revisits to papers that I've "looked at" before.
The question is, how do you decide which papers to promote to further attention? It's always a ranking problem, because there's always other work you could be looking at. You start your reading method at the point at which you've already decided to read the paper. But how do you get to that point?
(I can describe how I decide, but it's not exactly easy to replicate. It's a pretty insidery perspective. Here's how it played out for this paper.
I looked at this paper soon after it was published because I follow the first author on Twitter. When I opened the paper I looked at who was on it. I know Phil, and I know the sort of work Ed has been doing. I glanced over the paper and looked at the experiments, and saw that it was similar to the Neural Turing Machine and algorithm-learning line of work, which is still being evaluated on toy problems. This work is interesting, but I'm not doing research on these things, so I won't commit to learning it while it's moved <1 mountain empirically. I did read enough of the idea to contrast it with Chris Dyer's stack LSTM parser, which is a model that performs very well empirically. I checked the related work for criticism/comment on that work, and the paper said it was inspired by it. Cool. I'll watch for future empirical evaluations.
In total I probably spent about 10-15 minutes looking at this paper. This is enough for me to remember I'd seen the work, and help me understand future related ideas a little bit better.)
It was unsaid, but reading between the lines: after the 'First Pass', if you don't find the paper useful, stop. Based on the goals of the first pass, you should know enough by then, and you won't have spent much time on it.
Great post! Thank you for taking time out of your busy life to write this. I think it's small gestures of kindness like this that advance the field of artificial intelligence, just as much as the large discoveries. Bookmarked!
This is awesome. It also touches on a real problem: a scientific paper about an algorithm, without accompanying code that implements it, is hard to use or reproduce.
This website crashed my browser, and I lost a few tabs I had been saving and am unable to recover. It does this even with JavaScript off.
Stack machines are really cool. Are they computationally efficient, though? As the stacks grow bigger, the number of possible stacks to keep track of grows exponentially, doesn't it? Or do they only keep track of some of them?
Neural Stack Machines have a somewhat more limited memory. Whereas in theory Neural Turing Machines can learn anything, Neural Stack Machines focus on algorithms that are conducive to stacks. Phil Blunsom addresses this somewhat in his recent talk in Russia: https://www.youtube.com/watch?v=-WPP9f1P-Xc
They are similar in that way. Typically an LSTM is used to control the neural stack, so the stack sort of "sits on top of" the LSTM's memory, allowing it to be (in theory) infinite. Rather than tracking every possible discrete stack, the model keeps a single continuous stack: each entry has a scalar strength, and pushes and pops adjust those strengths by fractional amounts, so the cost per step is linear in the stack size, not exponential.
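To make that concrete, here is a minimal sketch of how I understand the continuous stack in this paper to work (forward pass only, no learning); the class and variable names are mine, and this is an illustration under that reading, not the authors' code:

```python
import numpy as np

class NeuralStack:
    """Toy continuous stack: values are vectors, each with a scalar
    strength in [0, 1]. A pop with strength u erodes strengths from
    the top down; a push appends a new value with strength d. The
    read vector is a weighted sum of the topmost values whose
    strengths add up to 1."""

    def __init__(self):
        self.values = []     # value vectors, bottom to top
        self.strengths = []  # matching scalar strengths

    def step(self, v, d, u):
        # Pop: remove total strength u, starting from the top.
        remaining = u
        for i in reversed(range(len(self.strengths))):
            removed = min(self.strengths[i], remaining)
            self.strengths[i] -= removed
            remaining -= removed
        # Push the new value with strength d.
        self.values.append(v)
        self.strengths.append(d)
        # Read: blend the top of the stack up to total weight 1.
        r = np.zeros_like(v)
        budget = 1.0
        for i in reversed(range(len(self.strengths))):
            w = min(self.strengths[i], budget)
            r += w * self.values[i]
            budget -= w
            if budget <= 0.0:
                break
        return r

stack = NeuralStack()
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
stack.step(a, d=1.0, u=0.0)              # push a
r1 = stack.step(b, d=1.0, u=0.0)         # push b; read returns b
r2 = stack.step(np.zeros(2), d=0.0, u=1.0)  # pop b; read returns a
```

In the real model the controller (the LSTM) emits v, d, and u at each timestep, and because d and u are continuous, the whole structure stays differentiable and trainable end to end. Note the memory only ever grows by one slot per step, which is where the "linear, not exponential" cost comes from.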