
> RL is supposed to be the way to AGI

Could you expand on that? From what I read, folks like LeCun & Chollet seem to disagree strongly. Just this week Yann posted about unsupervised modeling (with or without DL) being the next path forward, and described RL as essentially a roundabout way of doing supervised learning.




RL/DRL assumes the world is Markovian, i.e. the past doesn't matter once you know the current state, which is way too simple. It requires a huge number of tries/episodes and a properly tuned exploration-exploitation ratio. It is loosely based on biological reinforcement learning, so there might be some basis in reality, as there is with convolutional neural networks and the visual field maps in the visual cortex (even if it's a very rough approximation). DRL is the technique that allows modeling decisions; so for predictions you have CNN/RNN/FCN, for generation GANs, and for decisions DRL; together they are the closest thing to AGI we have right now.
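To make the Markov point concrete, here's a minimal tabular Q-learning sketch (the toy action set and hyperparameters are made up for illustration): the update looks only at the current transition (s, a, r, s'), never at how the agent got to s, and epsilon is exactly the exploration-exploitation knob that has to be tuned.

    import random
    from collections import defaultdict

    # Minimal tabular Q-learning sketch. The update below uses only the current
    # transition (s, a, r, s_next); the history that led to s never enters the
    # formula -- that is the Markov assumption in action.
    alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate
    actions = [0, 1]
    Q = defaultdict(float)                   # Q[(state, action)] -> estimated value

    def act(state):
        # epsilon-greedy: the exploration-exploitation ratio that needs tuning
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def update(s, a, r, s_next):
        # one-step Bellman backup; nothing observed before s is consulted
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])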


> RL/DRL assumes the world is Markovian, i.e. the past doesn't matter once you know the current state, which is way too simple.

There are plenty of RL papers using RNNs and various kinds of memory networks.


Likely as value function approximators for one piece of the whole algorithm (as is the case with DQN/DDQN). However, the main algorithm is likely using a variation of the Bellman equation, which assumes the Markov property and gives strong guarantees about convergence.
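Roughly what that split looks like in a DQN-style setup (a sketch only; the tiny MLP and hyperparameters are invented for illustration): the network, whether it's an MLP, CNN or RNN, is just the function approximator, while the training target is still the one-step Bellman backup r + gamma * max_a' Q(s', a').

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Sketch of where the Bellman equation sits in DQN-style training. q_net and
    # target_net are just function approximators (they could be CNNs or RNNs);
    # the target below is still the one-step Bellman backup.
    gamma = 0.99
    q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
    target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
    target_net.load_state_dict(q_net.state_dict())

    def td_loss(s, a, r, s_next, done):
        # s: (B, 4) states, a: (B,) actions, r: (B,) rewards, done: (B,) 0/1 flags
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # Bellman target: r + gamma * max_a' Q_target(s', a')
            target = r + gamma * target_net(s_next).max(dim=1).values * (1 - done)
        return F.mse_loss(q_sa, target)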


If you're using DQN or pretty much anything in DRL, you don't have any guarantees about convergence in the first place, and using an RNN does give you the history summary you need (at least up to the minimum error achievable with a fixed-length summary; not that that is any more likely to converge than the overall DRL algorithm is).
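For what it's worth, this is roughly what those papers do (a DRQN-style sketch; the GRU size and observation dimensions are made up): a recurrent layer folds the observation history into a fixed-length hidden state, and the Q-values are read off that summary instead of a single observation.

    import torch
    import torch.nn as nn

    # DRQN-style sketch: a GRU compresses the observation history into a fixed-size
    # hidden state, and Q-values are computed from that summary rather than from
    # a single (assumed-Markovian) observation.
    class RecurrentQNet(nn.Module):
        def __init__(self, obs_dim=4, hidden=64, n_actions=2):
            super().__init__()
            self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_actions)

        def forward(self, obs_seq, h0=None):
            # obs_seq: (batch, time, obs_dim) -- the observed history so far
            summary, h_n = self.gru(obs_seq, h0)
            return self.head(summary[:, -1]), h_n   # Q-values from the latest summary

    net = RecurrentQNet()
    q_values, h = net(torch.randn(1, 10, 4))        # a 10-step history, one episode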


I meant that under the Markov assumption, the value iteration used to solve the Bellman equation is guaranteed to converge. So it makes the math people happy, even if that property holds neither in the real world nor in the problem they're trying to solve, and the "deep" in DRL is just a heuristic, though one that works surprisingly well in many cases.
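As a concrete example of that guarantee (a toy, hand-made 2-state MDP, so the numbers are only illustrative): the Bellman optimality operator is a gamma-contraction, so the sweep below converges to the unique fixed point V* no matter how V is initialized.

    # Toy value iteration on a made-up 2-state, 2-action MDP. Because the Bellman
    # optimality operator is a gamma-contraction, this loop converges to the unique
    # fixed point V* from any starting V -- the guarantee the Markov assumption buys.
    gamma = 0.9
    # P[s][a] = list of (probability, next_state, reward)
    P = {
        0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
        1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
    }

    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < 1e-8:   # contraction => delta shrinks geometrically, so this terminates
            break
    print(V)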


that is true: popular rl techniques (eg policy gradients) are very similar to "vanilla" supervised learning techniques and architectures, but they are unsupervised in the sense that they require zero human input.

alphago zero is the canonical example of tabula rasa machine learning.
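to make the resemblance concrete, here's a REINFORCE-style sketch (the tiny policy net and shapes are made up): the loss is just a return-weighted cross-entropy against the actions the agent itself took, so the "labels" come from the environment rather than from humans.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # REINFORCE sketch: the loss is a return-weighted cross-entropy against the
    # agent's own actions -- structurally the same as supervised classification,
    # except the "labels" are produced by the agent/environment, not by humans.
    policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

    def reinforce_loss(states, actions, returns):
        # states: (T, 4), actions: (T,), returns: (T,) discounted returns
        log_probs = F.log_softmax(policy(states), dim=1)
        chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
        return -(chosen * returns).mean()    # weighted negative log-likelihood

    # dummy episode just to show the shapes line up
    loss = reinforce_loss(torch.randn(5, 4), torch.randint(0, 2, (5,)), torch.ones(5))
    loss.backward()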



