I'm having trouble identifying what algorithmic innovation AlphaGo represents. It looks like a case of more_data + more_hardware. Some are making a big deal of the separate value and policy networks. So, OK, you have an ensemble classifier.
The most theoretically interesting thing to me is the use of stochastic sampling to reduce the search space. Is there any discussion of how well pure Monte Carlo tree search performs here compared to the system incorporating extensive domain knowledge?
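For anyone unsure what "pure" Monte Carlo tree search means in that question, here is a minimal sketch: UCT selection plus uniformly random rollouts, with no learned networks and no hand-written heuristics. This is not AlphaGo's code. The toy game (single-pile Nim, take 1 to 3 stones, last stone wins) and every name in it (`Node`, `uct_child`, `rollout`, `mcts_best_move`, the exploration constant 1.4) are my own assumptions, picked only to keep the example small.

```python
import math
import random

MAX_TAKE = 3  # assumed toy rule: remove 1 to 3 stones; taking the last stone wins


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones    # stones left for the player about to move
        self.parent = parent
        self.move = move        # move that led from the parent to this node
        self.children = []
        self.untried = list(range(1, min(MAX_TAKE, stones) + 1))
        self.visits = 0
        self.wins = 0.0         # wins for the player who moved INTO this node


def uct_child(node, c=1.4):
    """Pick the child with the best win rate plus exploration bonus (UCB1)."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(stones, rng):
    """Play uniformly random moves; True if the player to move at the start wins."""
    starter_to_move = True
    while stones > 0:
        stones -= rng.randint(1, min(MAX_TAKE, stones))
        if stones == 0:
            return starter_to_move
        starter_to_move = not starter_to_move
    return False  # no stones at the start: the player to move has already lost


def mcts_best_move(stones, iterations=5000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCT.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout, no domain knowledge at all.
        mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation: flip the winner's perspective at each level.
        while node is not None:
            node.visits += 1
            if mover_won:
                node.wins += 1
            mover_won = not mover_won
            node = node.parent
    # Recommend the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print(mcts_best_move(5))
```

In this toy game the winning strategy is to leave the opponent a multiple of 4, so the search should settle on that move given enough iterations. The relevant point for the question is that steps 1 and 3 are exactly where a system with domain knowledge would differ: a policy network would bias which moves get expanded and selected, and a value network would replace or supplement the random rollout.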