Do you not feel like ML researchers have a duty not to inflate their research with loaded, anthropomorphic terms that could be misinterpreted by the public? I'm talking particularly about using words like 'dreaming' and 'imagination' for what is essentially prediction (albeit in complex sensory modalities like vision).
Similarly, your title 'World Models' is equally ambitious and deceptive. It only hints at its relation to model-based reinforcement learning, and using 'world' to mean 'rendering of a gym environment' is definitely an exaggeration.
This is not to say I don't like your work, but I am becoming increasingly frustrated by the language and habits of the newer breed of ML / robotics researchers.
How did you get into contact with Schmidhuber for co-authoring? What stage was the research at when he joined?
Were you expecting the net to generalize from dream to reality, before you wrote the paper, or did this materialize during experimentation?
Do you expect this approach is also feasible for more difficult games: higher dimensionality, longer delayed rewards?
Both congrats and thanks for writing this very accessible paper. Really found this a creative paper with a lot of inspiration, and the presentation of the results was marvelous.
(BTW: I remember you from the RNN-volleyball game. Back then you had quite some jealous detractors, telling you DeepMind would be too difficult/academic for you. You sure shut those people up!)
> How did you get into contact with Schmidhuber for co-authoring? What stage was the research at when he joined?
The first time I discussed this topic with Jürgen Schmidhuber was at NIPS 2016, during a break at one of the sessions after he gave a talk about "Learning to Think" [1], and we kept in contact afterwards.
> Were you expecting the net to generalize from dream to reality, before you wrote the paper, or did this materialize during experimentation?
When I tried this, I honestly didn't expect it to work at all! And in fact, as discussed in the paper, it didn't work at the beginning (the agent would just cheat the world model). That's why I adjusted the temperature parameter to control the stochasticity of the generated environment and trained the agent inside a more difficult dream.
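Roughly, this is what the temperature knob does when sampling from the MDN output (a simplified sketch, not our actual code; names and sizes are placeholders): a higher temperature flattens the mixture weights and widens the Gaussians, so the generated environment becomes noisier and harder to exploit.

```python
import numpy as np

def sample_mdn(logit_pi, mu, log_sigma, tau=1.0, rng=None):
    """Sample one value from a 1-D Gaussian mixture, with temperature tau."""
    if rng is None:
        rng = np.random.default_rng()
    # Divide the mixture logits by tau before the softmax:
    # a higher tau flattens the distribution over mixture components.
    scaled = logit_pi / tau
    pi = np.exp(scaled - scaled.max())
    pi /= pi.sum()
    k = rng.choice(len(pi), p=pi)
    # Widen the chosen Gaussian as tau grows, so samples get noisier.
    sigma = np.exp(log_sigma[k]) * np.sqrt(tau)
    return rng.normal(mu[k], sigma)
```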
> Do you expect this approach is also feasible for more difficult games: higher dimensionality, longer delayed rewards?
I expect the iterative training approach to be promising for more difficult games with higher dimensionality, where we will need better V and M models with more capability and capacity (we can already find many candidates for V/M in the deep learning literature), while still training these models efficiently with backprop on GPUs/TPUs. Using policy search methods such as evolution (or even augmented random search) allows us to work only with the cumulative reward we see at the end, rather than demanding a dense reward signal at every single time step, and I think this will help cope with environments with sparse, delayed rewards. Even in the experiments in this paper, we only work with the cumulative reward at the end of each rollout, and we don't care about intermediate rewards.
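To make the "cumulative reward only" point concrete, here is a rough sketch of the kind of evolution-strategies loop I mean (toy code with a placeholder environment, not our implementation): the fitness of each perturbed set of controller parameters is just the total reward of a full rollout, and no per-step reward signal is ever used.

```python
import numpy as np

OBS_DIM, ACT_DIM = 32 + 256, 3          # e.g. latent vector + RNN hidden state
N_PARAMS = (OBS_DIM + 1) * ACT_DIM      # weights + bias of one linear layer

def rollout(params, seed=0):
    """Run one episode and return only the total (cumulative) reward."""
    rng = np.random.default_rng(seed)
    W = params[:OBS_DIM * ACT_DIM].reshape(ACT_DIM, OBS_DIM)
    b = params[OBS_DIM * ACT_DIM:]
    total_reward, obs = 0.0, rng.standard_normal(OBS_DIM)   # placeholder env
    for _ in range(1000):
        action = np.tanh(W @ obs + b)                        # linear controller
        obs = rng.standard_normal(OBS_DIM)                   # placeholder dynamics
        total_reward += float(action[0])                     # placeholder reward
    return total_reward

def simple_es(iterations=100, pop_size=64, sigma=0.1, lr=0.02):
    """Basic ES: nudge theta toward perturbations with higher rollout returns."""
    theta = np.zeros(N_PARAMS)
    rng = np.random.default_rng(42)
    for i in range(iterations):
        noise = rng.standard_normal((pop_size, N_PARAMS))
        returns = np.array([rollout(theta + sigma * n, seed=i) for n in noise])
        adv = (returns - returns.mean()) / (returns.std() + 1e-8)
        theta = theta + lr / (pop_size * sigma) * noise.T @ adv
    return theta
```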
> Both congrats and thanks for writing this very accessible paper. Really found this a creative paper with a lot of inspiration, and the presentation of the results was marvelous. (BTW: I remember you from the RNN-volleyball game. Back then you had quite some jealous detractors, telling you DeepMind would be too difficult/academic for you. You sure shut those people up!)
Thanks! The RNN-volleyball game from 2015 was a lot of fun to make. Back then, I trained the agents using self-play with evolution, and I remember people telling me I should really be using DQN or something instead. Fast forward a few years: self-play is now a really popular area of research (for instance, many nice works from OpenAI and DeepMind last year), and evolution methods are really making a comeback. I think it is best to work with something you believe in, and sometimes it is okay to not pursue what everyone else is doing.
[1] On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models https://arxiv.org/abs/1511.09249
Are you familiar with the current debate about predictive processing / free energy minimisation / active inference, driven by philosophers such as Clark & Hohwy on the one side and the neuroscience tribe around Friston along side them?
I'm familiar with the work of Andy Clark. In particular, I found parts (but not all) of "Being There" and "Supersizing the Mind" interesting to read, although he does ramble on sometimes. There's an article called "The Mind-Expanding Ideas of Andy Clark" that I can recommend:
There's an older work from Hod Lipson, that is often referenced in Clark's writing, that I also found inspirational. An old TED talk (2007) from Lipson about "Building 'self-aware' robots":
Ha, I did not know that there had been a New Yorker profile of him just a few days ago. They are really jumping on the predictive processing train hard: in the same issue, there was a profile of Metzinger [0], who plays an important role in bridging the divide between Friston's work on perception and theories of the self.
I'm very curious how that line of research will turn out. My interest comes from the behavioural economics perspective on decision making. Big names (Akerlof, Kahneman, Tirole) have approached narratives as a way to cope with multiple selves [1], but I believe that the free energy principle, when integrated with the work by Metzinger, may be able to introduce a naturalistic way to ground both preferences and the development of preferences in empirical findings from neuroscience.
I'm curious to know how your paper differs from Learning and Querying Fast Generative Models for Reinforcement Learning. It seems relevant, but you don't mention it iirc.
Thanks for pointing out this paper. We were not aware of it, as it was published only a few weeks before our publication date. Going through it, it seems to be an extension of "Imagination-Augmented Agents for Deep Reinforcement Learning" (Weber et al. 2017, which, btw, is an _amazing_ paper I can highly recommend; you could even just watch Theo's recorded talk at NIPS 2017). Going through the publication process for ML papers takes time, in some cases months. In our case, it certainly took months to build the interactive article, go through many rounds of editing and revisions, and test that the interactive demos work well across all sorts of tablets, smartphones, and browsers, in addition to just the arxiv pdf.
That being said, here are a few differences I noticed:
- We minimize the number of parameters needed for the controller module, and solve for these parameters using Evolution Strategies.
- We try to replace the actual environment entirely with the generated environment, discuss when this approach will fail, and also suggest practical methods to make this work better. (This part of our work is not really discussed in detail in this particular blog post here.)
- Rather than create new architectures, we take a minimalist design approach. We tried to keep the building blocks as simple as possible, sticking to plain vanilla VAEs and MDN-RNNs and tiny linear layers for controllers, to reinforce the key concepts clearly. For instance, when we were training the VAE, we didn't even use batchnorm and just used an L2 loss, so that someone implementing the method for similar problems would have fewer issues getting it to work and wouldn't have to spend too much time tweaking it or tuning hyperparameters (see the rough sketch after this list). This might come at the expense of performance, but we feel it is the right tradeoff.
- We wrote the article with clarity in mind, and invested considerable effort to communicate the ideas as clearly as possible, with the hope that readers with some ML background can understand, and even reproduce and extend some of the experiments from first principles.
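As an illustration of what I mean by "plain vanilla" (a rough sketch, not our code; the layer sizes here are just placeholders): a VAE with no batchnorm, trained with a plain L2 reconstruction term plus the usual KL term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlainVAE(nn.Module):
    def __init__(self, z_dim=32):
        super().__init__()
        # Deliberately simple: plain linear layers, ReLU, and no batchnorm.
        self.enc = nn.Sequential(nn.Linear(64 * 64 * 3, 512), nn.ReLU())
        self.fc_mu = nn.Linear(512, z_dim)
        self.fc_logvar = nn.Linear(512, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim, 512), nn.ReLU(), nn.Linear(512, 64 * 64 * 3)
        )

    def forward(self, x):                                   # x: (batch, 3, 64, 64)
        h = self.enc(x.flatten(1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    l2 = F.mse_loss(recon, x.flatten(1), reduction="sum")          # plain L2 reconstruction
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL to N(0, I)
    return l2 + kl
```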
I'm also curious what your thoughts are on this paper: https://arxiv.org/abs/1803.10760
As a hobbyist/independent researcher I think it's really interesting to compare the two in terms of the way you model the environment and the parallels with neuroscience. It seems like their use of a DNC could address some of the points you mention about the limited historical capacity of LSTMs and catastrophic forgetting.
I was very glad when I saw on github that the whole system can be trained in a reasonably short amount of time, because it makes it so much more feasible to try out and experiment with as an individual. Awesome paper, and I thought the way the material was presented was excellent and made for a great read. I hope this kind of interactive presentation becomes more common in the future!
Fantastic paper, glad to see it is so powerful! I'm just graduating as a computer scientist and independently came up with a very similar idea, so it's nice to see it validated, especially with it solving previously unsolved problems. I called the latent space of the VAE "Mental Space", serving a similar purpose to the Vision Model.
Have you done any experiments feeding the cell states into the Controller in addition to the latent vector and hidden states? If so, how did it perform?
Thanks! We did try feeding the LSTM's "cell" state, in addition to its hidden state and the latent vector, into the controller, and this works better. We discuss this in the Appendix section.
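Schematically, the change is just a wider input to the same linear controller (toy code, not our implementation; dimensions are placeholders):

```python
import numpy as np

z_dim, h_dim = 32, 256                       # illustrative sizes
rng = np.random.default_rng(0)
z = rng.standard_normal(z_dim)               # latent vector from V
h = rng.standard_normal(h_dim)               # LSTM hidden state from M
c = rng.standard_normal(h_dim)               # LSTM cell state from M

# Base setup uses [z, h]; the variant simply adds the cell state c.
controller_input = np.concatenate([z, h, c])
W = rng.standard_normal((3, controller_input.size)) * 0.01   # 3 actions
b = np.zeros(3)
action = np.tanh(W @ controller_input + b)
```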
I would still encourage you to pursue your idea, since there are still lots of limitations in this model (discussed in the paper), and a lot of work remains to be done to solve more difficult problems.
Happy to answer any questions you may have.