Model-Based Reinforcement Learning with Neural Network Dynamics (bair.berkeley.edu)
96 points by jonbaer on Dec 1, 2017 | 6 comments



If I understand correctly, these are the novel features of this paper:

- Use a neural network to learn the dynamics model (state, action) -> next state, separately from the policy.

- Use this neural network to predict states over a 100-step horizon and use those predictions to choose actions in the RL setting (rough sketch below).

This method reduces the sample complexity of RL, though it still doesn't beat model-free methods given lots of data. I think the interesting part was the use of micro insect robots, which unfortunately aren't commercially available, so the results may not be reproducible by others.
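
A rough sketch of what those two bullets amount to (my own toy code, not the authors'): a placeholder stands in for the learned NN dynamics, random action sequences get rolled out through it, and the first action of the best-scoring sequence is executed (random-shooting MPC).

    import numpy as np

    def dynamics_model(state, action):
        # Placeholder for the learned NN dynamics; the real model is trained on
        # (state, action, next_state) tuples collected from the robot.
        return state + 0.1 * action  # toy linear dynamics for illustration

    def reward_fn(state, action):
        # Toy reward: move the first state dimension forward, penalize effort.
        return state[0] - 0.01 * np.sum(action ** 2)

    def plan_action(state, horizon=20, n_candidates=1000, action_dim=2, rng=None):
        # Sample random action sequences, roll them out through the model,
        # and return the first action of the highest-return sequence.
        rng = rng or np.random.default_rng()
        candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
        returns = np.zeros(n_candidates)
        for k in range(n_candidates):
            s = state.copy()
            for t in range(horizon):
                a = candidates[k, t]
                returns[k] += reward_fn(s, a)
                s = dynamics_model(s, a)
        best = int(np.argmax(returns))
        return candidates[best, 0]  # execute only the first action, then replan

    state = np.zeros(2)
    print("chosen action:", plan_action(state))

Random shooting is the simplest possible planner; the point is just that the only learned component is the dynamics model.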


If I'm reading the picture properly, the microinsect is just a small plastic part (injection molded? it doesn't look printed), some servos, and a microcontroller. Any lab that's serious about this area could recreate it. https://robotics.eecs.berkeley.edu/~ronf/PAPERS/dhaldane-icr...

Ah, OK. It's a bit more than that. They had to upgrade the bot significantly with next-gen materials to achieve high speeds: https://spectrum.ieee.org/automaton/robotics/robotics-hardwa...


Not to sound negative – this is a cool paper – but isn't this in the grey zone of being actual research? It feels like a fairly trivial application of techniques that have long been known to work in simulation.


Citation needed?

I'm not aware of any work doing this combination - but I don't follow the simulation space closely. My impression is that NNs weren't used broadly and successfully in simulation until a couple of years ago.


Yes, model-based RL (i.e., learning the physics, predicting future states, and planning accordingly, rather than just choosing actions that you've learned will give rewards) has been in the literature for many years; see Sutton's Dyna, for example.
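
For reference, a tabular Dyna-Q update looks roughly like this (textbook version, nothing from the linked paper): real experience updates both the Q-values and a learned model, and the model is then replayed for extra planning updates.

    import random
    from collections import defaultdict

    def dyna_q_update(Q, model, s, a, r, s_next, actions,
                      alpha=0.1, gamma=0.95, n_planning=10):
        # Direct RL update from the real transition.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])
        # Model learning: remember what this (state, action) pair did.
        model[(s, a)] = (r, s_next)
        # Planning: replay previously seen transitions sampled from the model.
        for _ in range(n_planning):
            (ps, pa), (pr, pn) = random.choice(list(model.items()))
            Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(pn, b)] for b in actions) - Q[(ps, pa)])

    Q, model, actions = defaultdict(float), {}, [0, 1]
    dyna_q_update(Q, model, s=0, a=1, r=1.0, s_next=2, actions=actions)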

But I think the model-based part is still considered relatively hard in real-world environments due to the large search space of possible actions/consequences.

And note that here they didn't actually use the model at runtime; they just used it as an 'expert' (rather than getting a human to provide guidance) to give a model-free policy a head start (see the DAgger algorithm) and reduce the number of training samples.
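
Roughly, that initialization looks like this (a toy sketch with made-up env/expert/learner interfaces, not the paper's code): the MPC controller labels every state the rollouts visit, and the model-free policy is fit to those labels by supervised learning.

    import numpy as np

    def dagger_init(env_reset, env_step, expert_action, learner,
                    n_iters=5, rollout_len=200):
        # env_reset/env_step, expert_action, and learner.act/learner.fit are
        # assumed interfaces for this sketch, not real library calls.
        states, actions = [], []
        for it in range(n_iters):
            s = env_reset()
            for t in range(rollout_len):
                expert_a = expert_action(s)  # MPC with the learned model
                states.append(s)
                actions.append(expert_a)
                # First iteration rolls out the expert; later ones roll out the
                # learner, so the dataset covers states the learner itself visits.
                a = expert_a if it == 0 else learner.act(s)
                s = env_step(a)
            # Supervised regression of expert actions onto visited states.
            learner.fit(np.array(states), np.array(actions))
        return learner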


If they figure out how to turn these models into, say, an NFA representation, we might have something very interesting...

Best of both worlds?



