AlphaGo was actually only trained on publicly available amateur (that is, strong amateur) games. After that, AlphaGo was trained by running a huge number of games against itself (reinforcement learning).
A priori, this makes sense: you don't need to train on humans to get a better understanding of the game tree. (See any number of other AIs that have learned to play games from scratch, given nothing but an optimization function.)
Yes, but is it known whether there's some limit to what you can reach doing this? I mean, if they had trained it on games of bad amateur players instead of good ones, and then had it play itself, would it keep improving all the way up to its current level, or hit some barrier?
That's why they trained it on human players only initially, and afterwards trained it on itself. I would guess (strong emphasis on guess) that they trained it on humans just to set initial parameters and to give it an overview of the game's structure and common techniques. It would probably have been possible to train AlphaGo on itself from scratch, but it would have taken much longer -- amateur play provides a useful shortcut.
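To make that guess concrete, here's a toy sketch (PyTorch, placeholder random data, a made-up tiny network; nothing like the real system) of what "initialize from human games, then improve by self-play" could look like:

```python
# Toy sketch of the two-stage idea, not DeepMind's actual pipeline:
# 1) supervised pretraining on human (state, move) pairs,
# 2) self-play reinforcement learning starting from those weights.
import torch
import torch.nn as nn

BOARD_CELLS = 19 * 19  # one logit per board point; passes etc. ignored for brevity

policy = nn.Sequential(                       # stand-in for the real convolutional policy net
    nn.Linear(BOARD_CELLS, 256), nn.ReLU(),
    nn.Linear(256, BOARD_CELLS),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# --- Stage 1: supervised learning on human games (placeholder data) ---
states = torch.randn(64, BOARD_CELLS)               # would be encoded board positions
human_moves = torch.randint(0, BOARD_CELLS, (64,))  # would be the moves humans actually played
loss = nn.functional.cross_entropy(policy(states), human_moves)
opt.zero_grad(); loss.backward(); opt.step()        # nudges the policy toward human play

# --- Stage 2: self-play reinforcement learning (REINFORCE-style, heavily simplified) ---
state = torch.randn(1, BOARD_CELLS)                 # placeholder for a position from a self-play game
dist = torch.distributions.Categorical(logits=policy(state))
move = dist.sample()                                # sample a move from the current policy
reward = 1.0                                        # +1 if that self-play game was eventually won, -1 if lost
rl_loss = -(dist.log_prob(move) * reward).mean()    # raise the probability of winning moves
opt.zero_grad(); rl_loss.backward(); opt.step()
```

Stage 1 only determines where the weights start; stage 2 keeps updating them from games the system plays against itself.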
I don't think there is a theoretical upper limit on this kind of learning. If you do it sufficiently broadly, you will keep improving your model over time. I suppose it depends on how far you're willing to explicitly explore the game tree itself.
There is always a risk of getting stuck in a local maximum, thinking you've found an optimal way of playing, so you'd need more data that presents different strategies, I'd think.
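Hand-wavy illustration of that exploration point (not claiming this is what AlphaGo does; temperature sampling is just one common trick):

```python
# If both sides of a self-play game always play argmax of the current policy,
# every game repeats the same line and the data stops presenting new strategies.
# Sampling with a temperature keeps the self-play games diverse.
import torch

def pick_move(logits: torch.Tensor, temperature: float = 1.0) -> int:
    """logits: 1-D tensor of scores over legal moves (illustrative)."""
    if temperature == 0.0:
        return int(logits.argmax())                 # greedy: prone to locking in one strategy
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, 1))         # stochastic: keeps exploring alternative lines
```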
> To do this, AlphaGo learned to discover new strategies for itself, by playing thousands of games between its neural networks, and adjusting the connections using a trial-and-error process known as reinforcement learning.
It's still based on human games. It plays itself, but the way it plays was inherited from humans. I wonder if there is some fundamental barrier to what you can reach with reinforcement learning, depending on your starting base.
Having it learn on human games was just a way of speeding up the initialization before running reinforcement learning; it didn't limit the state tree that was searched later on.
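Abstract sketch of why the human data doesn't bound the later search (all of `policy`, `legal_moves`, `apply_move`, `game_over` are hypothetical callables, just to show the shape of the loop):

```python
# The self-play loop generates its own positions from scratch, so the states it
# visits depend only on the current (ever-changing) policy, not on the
# supervised dataset that was used to initialize that policy.
def self_play_game(policy, initial_state, legal_moves, apply_move, game_over):
    state, trajectory = initial_state, []
    while not game_over(state):
        move = policy(state, legal_moves(state))   # current policy, however far it has drifted
        trajectory.append((state, move))
        state = apply_move(state, move)            # positions here were never in the human data
    return trajectory                              # fed back in to update the policy again
```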
It already went beyond human level: look for Go players commenting on the game, saying they would never have thought of some of the moves the AI made. In a sense it brought new strategies to the table that humans can learn and apply in human-vs-human games.
Yes, but how far can it go beyond human level? Will it be a slight margin, so it can win 4-1, or will it soon be able to beat top players with a 1, 2, or 10 stone handicap?
Some high-level pros have stated that they would need a 4 stone handicap to beat the "perfect player", i.e. the "God of Go", so that would probably put a ceiling on how far the skill gap can grow.
Will we though? AlphaGo trains on human games, so can it go well beyond that level? Will it train on its own games?