
Yes, Hal Daume is referring to the issue I brought up. I don't read his comments as being about problems with training the model jointly; he's referring to exactly what I'm describing: the learner never having even seen the expert make a mistake (and so never seeing how to recover from one). The only solution is to generate trajectories more intelligently (which is in line with Daume's comments).
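To make that concrete, here is a minimal, self-contained toy of the "generate trajectories more intelligently" fix (DAgger-style aggregation: roll out the learner, have the expert label the states the learner actually reaches, retrain). The 1-D corridor, the table-lookup "classifier", and the noise standing in for imperfect generalization are all illustrative assumptions of mine, not anything from Daume's papers:

    import random

    GOAL = 10  # toy 1-D corridor: keep moving right toward the goal

    def expert(state):
        # The expert always acts correctly, so its demonstrations never
        # contain a mistake or a recovery from one (the problem above).
        return +1

    def rollout(policy, steps=30):
        state, visited = 0, []
        for _ in range(steps):
            visited.append(state)
            state = max(-GOAL, min(GOAL, state + policy(state)))
        return visited

    def fit(dataset):
        # Stand-in for a real classifier: majority vote on seen states,
        # a random guess on unseen ones, plus a little noise to mimic
        # imperfect generalization.
        table = {}
        for s, a in dataset:
            table.setdefault(s, []).append(a)
        def policy(s):
            if s in table and random.random() > 0.1:
                return max(set(table[s]), key=table[s].count)
            return random.choice([-1, +1])
        return policy

    # Plain behavioral cloning: only expert-visited states ever get labels.
    data = [(s, expert(s)) for s in rollout(expert)]
    policy = fit(data)

    # Smarter trajectory generation: roll out the *learner*, ask the expert
    # to label whatever states the learner drifts into, aggregate, retrain.
    for _ in range(5):
        data += [(s, expert(s)) for s in rollout(policy)]
        policy = fit(data)

The point is the last loop: the labels now cover states the learner actually reaches after its own mistakes, which pure expert demonstrations never provide.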



Yes, it is true. In the Super Mario case he does the learning by simulating a level-K BFS from positions that resulted in errors (unseen states), which minimizes the regret over the next K moves.
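For reference, a rough sketch of what a level-K lookahead from an error state could look like; this is the general idea only, with simulate and cost as assumed placeholders, not the actual implementation from his papers:

    from itertools import product

    def best_first_action(state, actions, simulate, cost, k):
        # Expand the search tree to depth K (every K-step action sequence
        # from the error state), score each sequence with the cost model,
        # and return the first action of the cheapest one.
        best_action, best_total = None, float("inf")
        for seq in product(actions, repeat=k):
            s, total = state, 0.0
            for a in seq:
                s = simulate(s, a)   # assumed deterministic forward model
                total += cost(s)     # assumed per-state cost, a regret proxy
            if total < best_total:
                best_action, best_total = seq[0], total
        return best_action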

Although, if you check out his papers, the problems I've talked about still show up: even when you have more than enough data and you know you should be able to generalize well, you can still get subpar performance if you don't optimize jointly. The AlphaGo model isn't optimized jointly, but its power mostly lies in the extreme representational ability of deep neural networks.



