
> RL/DRL assumes the world is Markovian, i.e. the past doesn't matter once you know the current state, which is way too simple.

There are plenty of RL papers using RNNs and various kinds of memory networks.




Likely as value function approximators for one piece of the whole algorithm (as is the case with DQN/DDQN). However, the main algorithm is likely using a variation of the Bellman equation, which assumes the Markov property and gives strong guarantees about convergence.
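
A rough sketch of where that assumption enters: the one-step TD target used in DQN-style updates bootstraps only from the next state, so whatever history matters has to already be encoded in that state. The function name and toy numbers below are mine, not from any particular paper:

  import numpy as np

  def td_target(reward, next_q_values, done, gamma=0.99):
      # next_q_values: estimates of Q(s', a') for every action a' (the target network in DQN)
      # The target conditions only on s', nothing earlier -- the Markov assumption at work.
      return reward + gamma * (1.0 - done) * np.max(next_q_values)

  # toy usage
  print(td_target(reward=1.0, next_q_values=np.array([0.2, 0.7, 0.1]), done=0.0))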


If you're using DQN or pretty much anything in DRL, you don't have any guarantees about convergence in the first place, and using an RNN does give you the history summary you need (at least up to the minimum error achievable with a fixed-length summary; not that that is any more likely to converge than the overall DRL algorithm is).
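
For concreteness, a minimal sketch of that kind of recurrent Q-network, roughly in the spirit of DRQN; the class name, layer sizes, and GRU choice are mine, not from any specific paper:

  import torch
  import torch.nn as nn

  class RecurrentQNet(nn.Module):
      def __init__(self, obs_dim, n_actions, hidden=64):
          super().__init__()
          # The GRU compresses the observation history into a fixed-length hidden state.
          self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
          self.head = nn.Linear(hidden, n_actions)

      def forward(self, obs_seq, h0=None):
          # obs_seq: (batch, time, obs_dim); h is the running summary of the history
          out, h = self.gru(obs_seq, h0)
          return self.head(out), h  # Q-values per timestep, plus the updated summary

  # toy usage: 2 trajectories, 5 timesteps, 8-dim observations, 4 actions
  q, h = RecurrentQNet(8, 4)(torch.randn(2, 5, 8))
  print(q.shape)  # torch.Size([2, 5, 4])

The Bellman backup then treats that hidden state as "the state", which is exactly why the fixed-length-summary caveat above matters.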


I meant that under the Markov assumption, the value iteration used to solve the Bellman equation is guaranteed to converge. So it makes the math people happy, even if that property holds neither in the real world nor in the problem they're trying to solve, and the "deep" in DRL is just heuristics, though it works surprisingly well in many cases.
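
As a toy illustration of that guarantee, here's tabular value iteration on a made-up 2-state, 2-action MDP; under the Markov assumption the Bellman backup is a gamma-contraction in the sup norm, so this converges no matter how V is initialized (all numbers are arbitrary):

  import numpy as np

  # P[s, a, s'] = transition probabilities, R[s, a] = expected rewards (toy values)
  P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                [[0.5, 0.5], [0.1, 0.9]]])
  R = np.array([[1.0, 0.0],
                [0.5, 2.0]])
  gamma = 0.9

  V = np.zeros(2)
  for _ in range(1000):
      Q = R + gamma * P @ V        # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
      V_new = Q.max(axis=1)        # greedy Bellman backup
      if np.max(np.abs(V_new - V)) < 1e-10:
          break
      V = V_new

  print(V)  # the fixed point V*, independent of the starting V

Swap the table for a neural network and that contraction argument goes away, which is the "just heuristics" part.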



