> Won't the winner be the one who just takes AlphaGo's recommended move every time without changing anything?
No. Consider this scheme.
I take my copy of AlphaGo and, for a full year, I build a database of all opening positions AlphaGo is willing to explore. I'll rank these opening positions from "best" (Black wins most consistently) to "worst" (White wins most consistently).
I'll put all of this information into a 16TB database on a single $400 16TB hard drive, and load it into my computer during the contest. https://www.newegg.com/p/1Z4-002P-015K6
If you dare to pick AlphaGo's best move, you will lose, because I already know which moves AlphaGo will take, and I have already checked to find all of the positions AlphaGo wants to play (but loses anyway).
The only way you can equalize the field is if you yourself ALSO build a 16TB database to consult and override AlphaGo's instincts during the game. If you see AlphaGo wanting to play "losing position #6234115", you'll tell AlphaGo to "search harder" and find another move instead.
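A minimal sketch of what that override layer could look like, assuming a hypothetical engine object with a ranked_moves() method and an on-disk table mapping position hashes to measured self-play win rates (every name below is invented for illustration, not AlphaGo's actual API):

    import shelve  # simple on-disk key-value store, standing in for the 16TB database

    LOSING_THRESHOLD = 0.45  # illustrative cutoff for "AlphaGo likes it, but loses anyway"

    def choose_move(engine, position, book):
        """Play the engine's preferred move unless the book says the line is lost."""
        candidates = engine.ranked_moves(position)   # hypothetical API: best move first
        for move in candidates:
            key = str(position.play(move).hash())    # hash of the resulting position
            win_rate = book.get(key)                 # measured win rate, if we mapped it
            if win_rate is None or win_rate >= LOSING_THRESHOLD:
                return move                          # unknown or healthy line: play it
        return candidates[0]                         # every known line is bad; take the least-bad

    # book = shelve.open("alphago_openings.db")      # the year-long precomputed database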
This comment sounds reasonable, but shows that you don't understand the challenge of Go on computers very well.
Go does not have openings. It has reasonable choices to make, with an insane branching factor, and few moves that make much of a difference by themselves. Therefore your database will only extend a few moves deep, and all of the positions it winds up in will still be very close to even. So your database confers very little advantage.
If you have AlphaGo play itself a thousand times, it is unlikely that by move 10 you will wind up with the same board position twice.
This is exactly the problem that made Go so hard for computers in the first place. Alpha-beta search is useless within practical limitations of computer hardware.
That said, in a different game, such as chess, your strategy would work very well indeed. (Which is why all decent computer programs have an opening book.)
There are 381 opening moves in Go, but really only 96 because of symmetry. 96 (opening moves) x 380 responses x 379 x 378 x 377 ≈ 2 trillion positions after 5 ply.
These 2 trillion positions will easily fit on a $400 16TB hard drive. That's 8 bytes per position, so you can probably get there with more symmetries and some compression applied.
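For anyone who wants to check the arithmetic (using the figures above; a reply further down corrects the board counts themselves), a few lines of Python:

    positions = 96 * 380 * 379 * 378 * 377   # the 5-ply opening tree described above
    print(f"{positions:,}")                  # 1,970,276,555,520 -- about 2 trillion
    print(16e12 / positions)                 # ~8.1 bytes per position on a 16TB drive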
----------
You're thinking too much like a human. There are no Go openings in the age of human Go. But in the age of cyborg Go, where 16TB hard drives are allowed, we can begin to exhaustively build openings.
We even have a super-human AI that can automatically and algorithmically explore this opening book. We can spin up AWS instances with V100 tensor cores and use neural nets to explore all of these positions now.
-------------
> If you have AlphaGo play itself a thousand times, it is unlikely that by move 10 you will wind up with the same board position twice.
AlphaGo doesn't seem to introduce much randomness into the moves it plays. The source of randomness is time controls (AlphaGo may choose move X before 30 seconds of analysis, or move Y after 30 seconds of analysis), but this is a fairly constrained set of moves.
Play AlphaGo against itself a thousand times, with precisely the same MCTS controls (say, 1 million nodes visited in the MCTS tree), and it will probably play the same game 1000 times in a row.
This makes AlphaGo extremely prone to opening-database "attacks", which is why I am using opening books as an easy example of how to beat a particular AlphaGo network. At least until AlphaGo updates its algorithm for more random play.
If the goal is to beat AlphaGo in a game, then the opening-book construction is far, far simpler. Even with random elements (e.g., AlphaGo picks randomly from the 10 best positions it generates), that is far more constrained than a full 381 x 380 x 379 x ... style opening book.
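To make the determinism point concrete, here is a sketch of a fixed-budget MCTS driver; the node and tree objects are hypothetical, not AlphaGo's actual internals:

    def select_move(root, run_simulation, node_budget=1_000_000):
        """With a pinned node budget (rather than a wall-clock limit) and an
        argmax over visit counts, the search is fully deterministic: the same
        position produces the same move, game after game."""
        for _ in range(node_budget):
            run_simulation(root)  # one MCTS pass: selection, expansion, evaluation, backup
        return max(root.children, key=lambda child: child.visits).move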
No, you're thinking too much like someone who understands chess but not Go.
Suppose that we built an opening book with a trillion reasonable courses of action in it, each one analyzed well. As you have discovered, it will only go a few ply into the game. And all of the positions that you will be directed towards will have only a small edge.
Instead, put a tiny fraction of the computing power necessary to build this book into self-training. You will get a better internal model and therefore a significantly stronger computer player. (That is how AlphaGo was built in the first place.) This option will produce much better results for far less effort, and again leaves no role for a human to do anything useful.
The fact that a memorized opening book is useless has nothing to do with human vs computer vs cyborg. It has to do with the characteristics of the game. In chess, it is useful to memorize openings and both computers and humans do it. In Go it is a waste of energy, and it is a waste for both computers and humans.
I want to add that opening books are not THAT effective in computer chess either. Yes, they are significantly more effective than in computer go (because of the smaller branching factor and greater role of tactics). However, exponential increase in game tree size is crazy, even in chess. Thus, opening books really can't take you THAT far into a chess game (10 moves). An engine with a worse book but better search/eval will usually win.
The main purpose of opening books is to beat any player who is strictly relying upon Stockfish's (or LeelaZero's) output to play the game. Because these engines play very deterministically, you can beat people who just play Stockfish's #1 (or LeelaZero's #1) move over and over.
I think the main issue with your initial statement was that you said you could beat someone just taking AlphaGo's advice on every move. Usually when people talk about opening books, they think of the book as part of the engine's decision-making process, not an addition to it. Usually this is literally the case, as in chess, where I can add an opening book to Stockfish's runtime options.
In my opinion, it's really strange to describe this as an expert "beating" AlphaGo, when really it's just a technique for making AlphaGo stronger than it is without a huge pre-calculated cache of moves.
fwiw, I'm with you. Excuse the gratuitous war analogy, but I suspect the OP's approach is akin to talking about which first 8 steps to take (north-east, SW, S) when going into battle -- it's such a small part of the whole event that it's pointless, and talking about those first steps makes one seem naive about the actual holistic task.
If the OP spent a month learning Go, I am sure that it would make sense to him as well. Work through a series like https://senseis.xmp.net/?LearnToPlayGoSeries while playing Go regularly against a variety of opponents. Before book 3 it should be obvious.
I don't mean to offend you, but it seems you've been doing something wrong -- in my opinion, in more than a month you could have advanced much further than 15k.
If you like, I could have a look at several of your lost games and maybe suggest how to improve. I'm just a 3k on KGS, but still.
For the near term future, LeelaZero is the best public engine at computer Go.
This means that if you enter a hypothetical "Cyborg" Go competition (computer assistance allowed), the majority of newbies will simply be playing LeelaZero's #1 move over and over again.
You don't need to build an exhaustive opening book covering all possible moves. You only need to cover, say, the top 5 moves LeelaZero ever considers. If you spend ~16 bytes per position and store 1 trillion positions on a 16TB hard drive, you'll be able to exhaustively map the top 5 moves LeelaZero considers out to 17 ply.
From there, you pick the positions that LeelaZero thinks it is winning, but is actually losing. You have a map of 1 trillion positions to choose from, and your opponent (if they only pick from LeelaZero's top 5 moves) will walk into your trap.
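That 17-ply figure is easy to verify; a small sketch, taking the 16 bytes per position and the top-5 cutoff above as given:

    budget = 16e12 / 16                # a 16TB drive at 16 bytes each: 1 trillion positions
    stored, depth = 0, 0
    while stored + 5 ** (depth + 1) <= budget:
        depth += 1
        stored += 5 ** depth           # 5 candidate moves per ply
    print(depth, f"{stored:,}")        # 17 plies, 953,674,316,405 positions stored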
---------
As long as your opponent picks from LeelaZero's top 5 moves, you'll have the map to victory. I think you're severely underestimating the abilities of a simple, dumb opening database.
> Instead put a tiny fraction of the computing power necessary to build this book into self-training.
Do you believe that AlphaZero could continue to improve dramatically with another 6 months of training? Or with another 6 months after that? At some point, the network will reach a local maximum, and it will be unable to improve beyond that.
Characterizing AlphaZero's moves through big-data analysis is inevitably going to become useful as self-training plateaus. Even Google wasn't able to get more than a few months of training in before the plateau.
At that point, it will be more reasonable to characterize the weaknesses of the network and build an opening book. Avoid the positions that the network was unable to learn about. Etc., etc.
Opening books, at a minimum, would grossly improve AlphaZero's play at competitive levels. Anyone with an opening book of AlphaZero's mistakes will be able to push AlphaZero into a position it misjudges.
AlphaGo spent months training and continued to improve for the whole time. Its improvement slowed, just as it takes more work for humans to get from master to international master than it does from D-level player to C-level player. But it did not stop improving.
And then AlphaZero was better than AlphaGo after around a day of self-training.
Furthermore, you are arguing for an opening book without considering how small an advantage it would be. As I have said repeatedly, an opening book takes a tremendous amount of work to generate, will only go a few ply in, and the positions it directs you to will be only a tiny bit better.
Therefore for the foreseeable future, more training and better algorithms will produce better results than trying to create an opening book. Theoretically this could change. But that day will not be today or this year. I would be astonished if it happened during this decade. I would be surprised if it happened in a lifetime.
Your proposed approach is an excellent one for many games. But not for Go.
The plateau is real. I'm not sure continuous self-play will lead to continuous progress for all of eternity. The system is clearly slowing down in self-learning.
I'm not trying to cast doubt upon reinforcement learning / MCTS / neural nets in the game of Go. It is clearly the best methodology we have today.
But anyone who has any experience with neural nets knows about the local-maximum problem. ALL neural nets reach a local maximum eventually. Once this point is reached, you have to rely upon other methodologies to improve playing strength.
Assuming Elo growth for all time from a single methodology is naive. We will go very, very far with deep learning, but are you satisfied with that limit? Other open questions remain: Go is very far from being a solved game, even with a magical machine that plays 2000+ Elo stronger than humans.
Gosh, it's sure clear that you don't know what you're talking about.
At 5 ply, the complexity of the game hasn't started in any meaningful way. In a typical game, that's 4 corner moves, and then one of: a) an approach to a corner (kakari), b) an enclosure (shimari), c) a wedge (wariuchi), or d) creating a side framework (such as the Chinese fuseki or sanrensei). There are some odd openings, such as tengen, or corner-corner-corner-kakari, which typically turns into a sente fight, but 99% of games will fall into the aforementioned pattern. The database you describe is about as useful as a database of amateur games, since most games, including AlphaGo's games, follow just a few basic openings that early, and even amateurs can play these first few moves "correctly".
Even if you get out to 10 ply, you're still only getting partway into a single joseki sequence, often leaving three whole corners of the board which haven't even been approached, so this database still isn't very useful.
Incidentally, your numbers are also wrong. Symmetry reduces the first move to 55 possibilities, not 96, and there are 361 points on the board, not 381.
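The 55 is the number of distinct points on a 19x19 board once you quotient out the eight board symmetries (four rotations and four reflections); a quick sketch to verify it:

    N = 19

    def orbit(x, y):
        """All images of (x, y) under the board's rotations and reflections."""
        points = set()
        for _ in range(4):
            x, y = y, N - 1 - x    # rotate 90 degrees
            points.add((x, y))
            points.add((y, x))     # plus the reflection across the main diagonal
        return points

    unique_first_moves = {min(orbit(x, y)) for x in range(N) for y in range(N)}
    print(len(unique_first_moves))  # 55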
> AlphaGo doesn't seem to introduce much randomness into the moves it plays. The source of randomness is time controls (AlphaGo may choose move X before 30 seconds of analysis, or move Y after 30 seconds of analysis), but this is a fairly constrained set of moves.
The basis of reinforcement-learning algorithms is the exploratory nature of learning, driven by the initial application of largely random moves.
Only after some time is the agent given confidence in its learned ability and gradually moved into a more deterministic behaviour mode.
This is the exact opposite of your statement. StarCraft players have noted that the fleet of different AlphaStar instances training as an ensemble exhibited very different behaviour due to this property of RL.
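A sketch of that schedule in AlphaZero-style self-play; the 30-move exploration window matches what the AlphaGo Zero paper describes, but treat the details as illustrative rather than DeepMind's code:

    import random

    def pick_training_move(visit_counts, move_number, exploration_moves=30):
        """Self-play selection: sample early (exploration), argmax later (exploitation)."""
        moves = list(visit_counts)
        visits = [visit_counts[m] for m in moves]
        if move_number < exploration_moves:
            # early moves: sample in proportion to MCTS visit counts (temperature ~1)
            return random.choices(moves, weights=visits)[0]
        # later moves: near-deterministic argmax (temperature -> 0)
        return max(moves, key=visit_counts.get)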
So you use perfect analysis to know, 5 moves deep, how to get the biggest advantage possible against AlphaGo. This is already pretty hard to analyze so deeply.
Then you wring out a tiny advantage -- a fraction of a stone. It's a small benefit compared to building other parts of your understanding of the game.
So your tool of choice will be AlphaGo plus an algorithmically generated opening book? That's not meaningfully different from "everyone will bring the best available computer program" and again, it's a task for computer programmers, not Go grandmasters.
Human + computer might still beat computer in Go -- this was true for a few years in chess, and it still is to some extent in correspondence chess -- but what you describe isn't really that.
Oh yeah? Well I'll just devote multiple computers and build an even bigger database with even more hardware and you'll never beat me, nyah nyah.
All you have done is establish a computational arms race to see whose computing rig wins when you press the 'pick best move' button. You're not playing Go any more; you're playing Database Administrator.