Thanks for pointing out this paper. We are not aware of this paper, as it was published only a few weeks before our publication date. Upon going through this paper, it seems like it is an extension of "Imagination-Augmented Agents for Deep Reinforcement Learning" (from Weber et al 2017, which btw, is an _amazing_ paper I can highly recommend, or even to just watch Theo's recorded talk at NIPS2017). Going through and preparing for the publication process for ML papers take time, and in some cases even months. In our case, it certainly took months to build the interactive article and go through many rounds of editing and revisions, and also test that the interactive demos are working well for all sorts of test cases, tablets, smartphones, browsers, in addition to just the arxiv pdf.
That being said, here are a few differences I noticed:
- We minimize the parameters needed for the controller module, and solve for the parameters using Evolution Strategies.
- We try to replace the actual environment entirely with the generated environment, discuss when this approach will fail, and also suggest practical methods to make this work better. (This part of our work is not really discussed in detail in this particular blog post here.)
- Rather than create new architectures, we take on a minimalist design approach. We tried to keep the building blocks as simple as possible, sticking to plain vanilla VAEs and MDN-RNNs, tiny linear layers for controllers, to reinforce key concepts clearly. For instance, when we were training the VAE, we didn't even use batchnorm, and just used L2 loss, so that someone implementing the method for similar problems would have less issues getting it to work, and didn't have to spend too much time tweaking it or tuning hyperparameters. This might come at the expense of performance, but we feel it is the right tradeoff.
- We wrote the article with clarity in mind, and invested considerable effort to communicate the ideas as clearly as possible, with the hope that readers with some ML background can understand, and even reproduce and extend some of the experiments from first principles.
I'm also curious what your thoughts are on this paper: https://arxiv.org/abs/1803.10760
As a hobbyist/independent researcher I think it's really interesting to compare the two in terms of the way you model the environment and the parallels with neuroscience. It seems like their use of a DNC could address some of the points you mention about the limited historical capacity of LSTMs and catastrophic forgetting.
I was very glad when I saw on github where you said the whole system could be trained in a reasonably short amount of time, because it makes it so much more feasible to try out and experiment with it as an individual. Awesome paper, and I thought the way the material was presented was excellent and made for a great read. I hope this kind of interactive presentation become more common in the future!
That being said, here are a few differences I noticed:
- We minimize the parameters needed for the controller module, and solve for the parameters using Evolution Strategies.
- We try to replace the actual environment entirely with the generated environment, discuss when this approach will fail, and also suggest practical methods to make this work better. (This part of our work is not really discussed in detail in this particular blog post here.)
- Rather than create new architectures, we take on a minimalist design approach. We tried to keep the building blocks as simple as possible, sticking to plain vanilla VAEs and MDN-RNNs, tiny linear layers for controllers, to reinforce key concepts clearly. For instance, when we were training the VAE, we didn't even use batchnorm, and just used L2 loss, so that someone implementing the method for similar problems would have less issues getting it to work, and didn't have to spend too much time tweaking it or tuning hyperparameters. This might come at the expense of performance, but we feel it is the right tradeoff.
- We wrote the article with clarity in mind, and invested considerable effort to communicate the ideas as clearly as possible, with the hope that readers with some ML background can understand, and even reproduce and extend some of the experiments from first principles.