
It seems my original comment was unclear; my apologies for the ambiguity. I did not mean that they need no expert data at all, only that part of the training did not require a training data set.

They definitely need training data to learn the value function, but the policy network is trained through self-play. MCTS itself is not new, but I believe bootstrapping a policy network through self-play reinforcement learning and then using it to guide the MCTS is novel.
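
To make "a policy network that guides the MCTS" a bit more concrete, here is a rough sketch of one way it can work: a PUCT-style selection step in which the policy's prior probabilities bias which move the search explores next. The function name `puct_select`, the constant `c_puct`, and the exact form of the exploration term are my own assumptions for illustration, not taken from their implementation.

```python
import math

def puct_select(priors, visit_counts, value_sums, c_puct=1.5):
    """Pick the child move with the highest PUCT score.

    priors       -- policy network probabilities, one per move (assumed given)
    visit_counts -- how often each move was tried from this node
    value_sums   -- sum of simulation results backed up through each move
    c_puct       -- exploration constant (hypothetical default)
    """
    total_visits = sum(visit_counts)
    best_move, best_score = None, -math.inf
    for move, prior in enumerate(priors):
        n = visit_counts[move]
        # Mean value so far; unvisited moves start at 0.
        q = value_sums[move] / n if n > 0 else 0.0
        # Exploration bonus: large for high-prior, rarely visited moves.
        u = c_puct * prior * math.sqrt(total_visits + 1) / (1 + n)
        score = q + u
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```

With no visits yet, the move the policy likes best gets searched first; as its visit count grows, the bonus shrinks and other moves get a turn, so the policy steers the search without fully dictating it.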



