It seems that my original comment was unclear; my apologies for the ambiguity. I did not mean that they need no expert data at all, only that part of the training did not require a training data set.
They definitely need training data to learn the value function, but training the policy network is based on self-play. While MCTS is not new, I believe bootstrapping reinforcement learning with self-play to train a policy network that guides the MCTS is novel.