
> There's no doubt that with further refinement, we'll soon see AI play Go at a level well beyond human

Will we, though? AlphaGo trains on human games, so can it really go far beyond that level? Will it train on its own games?




AlphaGo was actually trained only on publicly available amateur games (strong amateur, that is). After that, it was trained by playing a huge number of games against itself (reinforcement learning).

A priori, this makes sense: you don't need to train on humans to get a better understanding of the game tree. (See any number of other AIs that have learned to play games from scratch, given nothing but an optimization function.)
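
For a concrete sense of what "learning from nothing but an optimization function" can mean, here's a minimal self-play sketch. It's a toy stand-in (Nim with tabular Q-learning rather than Go with a neural network), and none of it is DeepMind's actual code, but the principle is the same: no human data, just a win/loss signal.

    import random
    from collections import defaultdict

    # Nim: take 1-3 stones from a pile; whoever takes the last stone wins.
    ACTIONS = (1, 2, 3)
    Q = defaultdict(float)          # Q[(stones_left, action)] -> value estimate
    EPS, ALPHA = 0.1, 0.5           # exploration rate, learning rate

    def choose(stones, greedy=False):
        legal = [a for a in ACTIONS if a <= stones]
        if not greedy and random.random() < EPS:
            return random.choice(legal)                   # explore
        return max(legal, key=lambda a: Q[(stones, a)])   # exploit

    for episode in range(50000):
        stones, moves = 15, []
        while stones > 0:
            a = choose(stones)
            moves.append((stones, a))
            stones -= a
        # The player who took the last stone won; players alternate,
        # so walking the game backwards, outcomes alternate +1/-1.
        for i, (s, a) in enumerate(reversed(moves)):
            outcome = 1.0 if i % 2 == 0 else -1.0
            Q[(s, a)] += ALPHA * (outcome - Q[(s, a)])

    # Optimal play leaves the opponent a multiple of 4 stones.
    print([choose(s, greedy=True) for s in (5, 6, 7)])    # typically [1, 2, 3]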


Yes, but is it known whether there's some limit to what you can reach this way? I mean, if they had trained it on games of bad amateur players instead of good ones, and then had it play itself, would it keep improving all the way to its current level, or would it hit some barrier?


That's why they trained it on human players only initially, and afterwards trained it against itself. I would guess (strong emphasis on "guess") that they trained it on humans just to set the initial parameters and give it an overview of the game's structure and common techniques. It would probably have been possible to train AlphaGo on itself from scratch, but it would have taken much longer -- amateur play provides a useful shortcut.

I don't think there is a theoretical upper limit on this kind of learning. If you do it sufficiently broadly, you will keep improving your model over time. I suppose it depends on how much of the game tree you're willing to explore explicitly.
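
A toy sketch of that distinction, with all names and outcomes made up for illustration: the human data only writes the initial parameters, and self-play is then free to overwrite them, so human-level play is a head start rather than a ceiling.

    import random

    # Phase 1: supervised initialization -- imitate recorded human moves.
    human_games = [("corner_opening", "approach"), ("ladder", "extend")]
    policy = {}                       # position -> currently preferred move
    for position, move in human_games:
        policy[position] = move       # behaviour cloning, verbatim

    # Phase 2: self-play refinement -- updates now come from game
    # outcomes, not human labels, so human choices can be overwritten.
    def self_play_episode(policy):
        position = "corner_opening"
        candidate = random.choice(["approach", "pincer"])   # explore
        if candidate == "pincer":     # pretend the novel move wins more
            policy[position] = candidate

    for _ in range(100):
        self_play_episode(policy)

    print(policy["corner_opening"])   # almost certainly "pincer" now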


There is always a risk of getting stuck in a local maximum, thinking you've found an optimal way of playing, so you'd need more data that presents different strategies, I'd think.
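
The classic illustration of that risk is a two-armed bandit: a purely greedy learner can lock onto the worse option forever, while even a little exploration escapes. A minimal sketch (arbitrary numbers, and not how AlphaGo itself explores):

    import random

    def run(epsilon, pulls=10000):
        means = [0.4, 0.6]                  # arm 1 is genuinely better
        est, counts = [0.5, 0.0], [0, 0]    # initial belief favours arm 0
        for _ in range(pulls):
            if random.random() < epsilon:
                a = random.randrange(2)               # explore
            else:
                a = 0 if est[0] >= est[1] else 1      # exploit current belief
            reward = 1.0 if random.random() < means[a] else 0.0
            counts[a] += 1
            est[a] += (reward - est[a]) / counts[a]   # running average
        return counts                       # pulls per arm

    print("greedy:    ", run(0.0))   # stuck on arm 0: a local maximum
    print("eps-greedy:", run(0.1))   # finds and mostly plays arm 1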


It is already mainly training by playing against itself:

https://googleblog.blogspot.se/2016/01/alphago-machine-learn...

> To do this, AlphaGo learned to discover new strategies for itself, by playing thousands of games between its neural networks, and adjusting the connections using a trial-and-error process known as reinforcement learning.


It's still based on human games. It plays itself, but the way it plays was inherited from humans. I wonder if there is some fundamental barrier to what you can reach with reinforcement learning, depending on your starting base.


Having it learn from human games was just a way of speeding up the initialization process before running reinforcement learning; it didn't limit the state tree that was searched later on.


It is based on human games only until it can explore well enough to break away from local optima.


It has already gone beyond human level; look at Go players commenting on the games, saying they would never have thought of some of the moves the AI made. In a sense it brought new strategies to the table that humans can learn and apply in human-vs-human games.


Yes, but how far beyond human level can it go? Will it be a slight margin, so it can win 4-1, or will it soon become able to beat top players with a 1, 2, or 10 stone handicap?


Some high-level pros have stated that they would need a 4-stone handicap to beat the "perfect player", i.e. the "God of Go", so that would probably put a ceiling on how far beyond human skill it can go.
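
For a rough feel of what those margins mean, you can run the numbers under an Elo-style model. The Elo expectancy formula itself is standard, but the conversion of handicap stones to rating points varies by server and skill level -- the 100-points-per-stone figure below is a loose assumption for illustration only:

    # Elo expectancy: the stronger player's win probability in an even game.
    def win_probability(elo_gap):
        return 1 / (1 + 10 ** (-elo_gap / 400))

    for stones in (1, 2, 4, 10):
        gap = 100 * stones        # assumed Elo points per handicap stone
        print(f"{stones} stone(s) ~ {gap} Elo -> "
              f"even-game win rate {win_probability(gap):.1%}")

    # ~64% at 1 stone, ~76% at 2, ~91% at 4, ~99.7% at 10. For
    # comparison, a 4-1 match result suggests a win rate around 80%.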



