James put together a really nice summary of the ideas and the projects!
It was almost a year ago that lc0 was launched; since then, the community (led by Alexander Lyashuk, author of the current engine) has taken it to a totally different level. Follow along at http://lczero.org!
Gcp has also done an amazing job with Leela Zero, with a very active community on the Go side. http://zero.sjeng.org
Of course, DeepMind really did something amazing with AlphaZero. It’s hard to overstate how dominant minimax search has been in chess. For another approach (MCTS/NN) to even be competitive with 50+ years of research is amazing. And all that without any human knowledge!
Still, Stockfish keeps on improving - Stockfish 10 is significantly stronger than the version AlphaZero played in the paper (no fault of DeepMind; SF just improves quickly). We need a public exhibition match to settle the score, ideally with some GM commentary :). To complete the links, you can watch Stockfish improve here: http://tests.stockfishchess.org.
MCTS is not a traditional depth first minimax framework. Key concepts like alpha-beta don’t apply. Although it is proven to converge to minimax in the limit, the game trees are so large this is not relevant. You could use the network in a minimax searcher, but it’s so much slower than a conventional evaluation function it’s unlikely to be competitive.
It is kind of the case, but it does not need to expand every child of a node to find the maximum. Instead it samples a subset of children, guided by the NN's policy priors (the Monte Carlo aspect).
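For the curious, the child-selection rule in AlphaZero-style MCTS (PUCT) looks roughly like this; a minimal sketch, where the node fields and the exploration constant are my assumptions, not the paper's exact values:

```python
import math

C_PUCT = 1.5  # exploration constant; in practice a tuned hyperparameter

class Node:
    def __init__(self, prior):
        self.prior = prior     # P(s, a) from the policy network
        self.visits = 0        # N(s, a)
        self.value_sum = 0.0   # W(s, a)
        self.children = {}     # move -> Node, filled in on expansion

    def q(self):
        # Mean value of this action; 0 for unvisited children.
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node):
    """Pick the child maximizing Q + U. High-prior, rarely visited moves
    get a large exploration bonus, so the search concentrates where the
    network points it rather than expanding every legal move equally."""
    total = sum(c.visits for c in node.children.values())
    def puct(child):
        u = C_PUCT * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=lambda mc: puct(mc[1]))
```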
I strongly suspect AlphaZero is easily beatable, once you have your hands on it. This is just from experience: most neural-network-style systems are weak against adversarial opponents who understand their internals.
Of course I can't be sure, because Google refuses to give anyone access to AlphaZero, or a network trained with it. Personally, that gives me more confidence that they know there are significant exploitable weaknesses.
No need to wait for AlphaZero, you can try Leela Chess Zero today. From my experience the network without search has some blind spots, but the tree search is pretty effective in fixing them.
Adversarial? If the model exclusively trains against itself, you can’t really insert anything there. Do you mean, play confusing moves at the beginning of the game?
The way adversarial attacks trick image recognition systems is that they vary pixels of the input image slightly to manipulate the output of the neural network.
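Concretely, the classic fast gradient sign method nudges every pixel in the direction that increases the classifier's loss; a minimal sketch, assuming a PyTorch classifier `model` and a labeled image tensor:

```python
import torch

def fgsm_attack(model, image, label, epsilon=0.01):
    """Fast gradient sign method: perturb each pixel by +/- epsilon in
    the direction that maximizes the loss, producing an image that looks
    unchanged to a human but can flip the classifier's output."""
    image = image.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()  # stay in valid pixel range
```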
For AlphaZero, the input is the board, which you can't manipulate arbitrarily. You can run an evaluation of a board based on a move and see if it's significantly different from the evaluation that AlphaZero comes up with, and maybe try to exploit that. But if you have a better evaluation of some state than AlphaZero's, you're likely a stronger player anyway, so this extra step is unnecessary. Most of the value of the bot comes from the evaluation function of a board, along with some hyper-parameters. But the evaluation is probably the most important part and the most difficult to replicate.
That doesn't follow. For you to confuse it, you need to change the inputs. For images, this is fine: we can smoothly change lots of little things. For chess or Go you don't have that freedom.
The current best weights are available. Not AlphaZero, but I would expect such issues to be general: if there are issues with Leela Zero they may transfer, and if you don't see issues with Leela Zero they're unlikely to exist in AlphaZero (at least, if they do, they may be very particular to subtle training differences).
Would be very interested to see what you find if you get the chance.
You can change the inputs: it depends on when (ply) and which move you play. Some moves are uncommon enough to make it possible for you to uncover something?
You absolutely can change the inputs, but the point I wanted to make is that, unlike images, where you can make changes a human won't even notice, you can't really do that with chess or Go.
If you want to construct a particular position on the board, you'd likely need to use multiple steps and require the AI to play very particular moves, and even then the outcome would only be a certain move from the AI. A single incorrect classification doesn't help all that much; you need your opponent to make repeated mistakes.
I think in reality, if you uncovered a type of move it wasn't expecting, you are likely to uncover a new strategy in general rather than a trick. Image classification, however, lets you play uninterrupted with tiny pixel-value changes, and you only need a single incorrect output to "win".
I suspect it's a bit harder for the network to be overfit like this, but it's possible it has some gaps in its evaluation. However, those gaps would have to persist beyond its search horizon and not concretely affect material or mobility, and it just seems vanishingly unlikely you'll find any systematic way to exploit anything.
God what a well written article! I don't have much to say on the subject, but this was pure joy to read, it's crazy good. Clear, engaging, to the point, making a difficult subject accessible without dumbing it down, no fluff or unnecessary side stories, just awesomeness.
>An expert human player is an expert precisely because her mind automatically identifies the essential parts of the tree and focusses its attention there.
Instead of using a gender-neutral pronoun like "they", the author used a feminine pronoun.
What's very interesting is that the Komodo developers have implemented a Monte Carlo Tree Search version of their engine without neural nets for evaluation / move selection. This brand new engine can actually compete at the top level (still much worse than Stockfish and slightly worse than Lc0) [1] [2]
The exact implementation details are probably kept secret, but the idea is to do a few steps of minimax / alpha-beta rather than completely random play in the playout phase of MCTS.
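Since the details are secret, this is only a guess at the shape of the idea: a minimal sketch where the random rollout of vanilla MCTS is replaced by a shallow alpha-beta probe. `evaluate`, `legal_moves`, and the `state` interface are all hypothetical placeholders:

```python
import math

def alphabeta(state, depth, alpha, beta):
    """Shallow negamax alpha-beta probe. `evaluate` stands in for a
    hand-crafted evaluation (material, mobility, ...) scored from the
    side to move -- no neural net involved."""
    if depth == 0 or state.is_terminal():
        return evaluate(state)
    value = -math.inf
    for move in legal_moves(state):
        value = max(value, -alphabeta(state.play(move), depth - 1, -beta, -alpha))
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # beta cutoff: the opponent already has a better option
    return value

def playout_value(leaf_state):
    # MCTS "playout" phase: instead of playing random moves to the end
    # of the game, score the leaf with a 3-ply alpha-beta search.
    return alphabeta(leaf_state, depth=3, alpha=-math.inf, beta=math.inf)
```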
This makes me think that the contribution of AlphaZero is not necessarily neural nets, but rather MCTS as a successful method to search the game tree efficiently.
You missed the point then. Alpha beta pruning requires knowledge of the game rules. Neural network pruning doesn't. The advantage is that it's a general purpose technique.
Yes, that's the main contribution of the experiment / paper. But prior to AlphaZero the chess community did not even consider investing in MCTS engines -- alpha-beta pruning was thought to be far superior. I'm thinking that we might see classical engines exploring this concept more, and maybe it's even a natural step to go from alpha-beta pruning + iterative deepening to 'best-first' search with MCTS.
>> In fact, less than two months later, DeepMind published a preprint of a third paper, showing that the algorithm behind AlphaGo Zero could be generalized to any two-person, zero-sum game of perfect information (that is, a game in which there are no hidden elements, such as face-down cards in poker).
I can't find this claim in the linked paper. What I can find is a statement that AlphaZero has demonstrated that 'a general-purpose reinforcement learning algorithm can achieve, tabula rasa, superhuman performance across many challenging domains'.
Personally, and I'm sorry to be so very negative about this, but I don't even see the "many" domains. AlphaZero plays three games that are very similar to each other. Indeed, shogi is a variant of chess. There are certainly two-person, zero-sum, perfect-information games with radically different boards and pieces to either Go, or chess and shogi - say, the Royal Game of Ur [1], or Mancala [2], etc, not to mention stochastic games of perfect information, like backgammon, or asymmetric games like the hnefatafl games [3], and so on.
Most likely, AlphaZero can be trained to play many such games very powerfully, or at a superhuman level. The point however is that, currently, it hasn't. So no "demonstration" of general game-playing has taken place, and of course there is no such thing as some sort of theoretical analysis that would serve as proof, or indication, of such ability in any of the DeepMind papers.
I was hoping for less ra-ra cheerleading from the New Yorker, to be honest.
This is academia, so being specific about exact claims is absolutely welcome. I would expect the problem to split between two types of games that you mention:
- asymmetric games, like hnefatafl, that probably can be covered — considering that AlphaZero can handle a late-stage situation with asymmetric options;
- what I understand to be stochastic games, i.e. dice-based games like the Royal Game of Ur, Mancala and backgammon. I would expect that you have to redefine success by representing a risk profile in the strategy; that could introduce complexity that the network can't handle (see the sketch after this list).
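For what it's worth, the standard way to handle dice in MCTS is to add chance nodes and sample them during each descent, so the value estimates average over outcomes; a minimal sketch, where the `state` interface and `select_move` (the usual UCT/PUCT selection) are hypothetical:

```python
import random

def simulate(state):
    """One MCTS descent through a game with dice. At a chance node we
    sample an outcome instead of choosing one, so repeated descents
    average the value over the dice distribution (expectimax-style)."""
    if state.is_terminal():
        return state.result()
    if state.is_chance_node():
        # Nature "moves": sample, don't maximize.
        return simulate(state.apply(random.choice(state.chance_outcomes())))
    return simulate(state.play(select_move(state)))  # player moves: UCT/PUCT
```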
I agree. There is no explanation why the neural nets work so well. There is just proof that they work well for a handful of specific games (chess, Go, shogi). Beyond that it's just inference to the claim that neural nets work for all two-player, zero-sum, perfect-information games.
However, with the framework they've built it's easy to verify the claim for new games (given sufficient computational power). Maybe that should have been the point made.
Any explanation as to why this should not be used for games without perfect information? As an example, why couldn't the face-down card in poker be modeled as part of the MCTS?
MCTS can be applied to imperfect-information games; in fact, it is quite robust against uncertainty.
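One common approach is "determinization" (Perfect Information Monte Carlo): sample the hidden information, e.g. the face-down cards, to get a perfect-information game, search each sample, and vote across samples. A minimal sketch, with the game interface and `run_mcts` as hypothetical placeholders:

```python
import random
from collections import Counter

def determinized_search(observed_state, hidden_candidates, n_samples=50):
    """Sample consistent assignments of the hidden cards, run an ordinary
    MCTS on each resulting perfect-information game, and pick the move
    that wins the vote. Known to be imperfect -- it can't value
    information-hiding or information-gathering plays -- but it's a
    simple, strong baseline."""
    votes = Counter()
    for _ in range(n_samples):
        hidden = random.choice(hidden_candidates)        # one determinization
        full_state = observed_state.with_hidden(hidden)  # hypothetical interface
        votes[run_mcts(full_state)] += 1                 # run_mcts: any MCTS impl
    return votes.most_common(1)[0][0]
```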
It's AlphaZero's deep neural net component (used to learn an evaluation function and move orderings) that will need a substantial redesign to take imperfect information into account. The difficulty of this redesign will vary considerably between games: in some games, information is gained throughout the game by observing another player's moves (e.g. Poker); in others, an initial state (e.g. a starting deal in card games) dominates the probability that a certain possible board state is the real board state (e.g. Bridge) [1].
On top of that, AlphaZero's deep net has the shape of the board and the legal movements of pieces on it hard-coded, as part of the net's structure. That would also need a substantial redesign to accommodate a card game, or any other kind of game without a board and without pieces that move on it. In fact, different card games will most likely require different architectures. It's very hard to see how, e.g., the same neural net structure could be used to encode both Bridge and Poker rules - and still allow learning chess, shogi and Go.
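To make the "hard-coded board shape" point concrete, here is roughly what a chess input encoding looks like (heavily simplified: the real AlphaZero input also stacks move history, castling rights, etc.); a minimal sketch using the python-chess library:

```python
import numpy as np
import chess

def board_to_planes(board):
    """Encode a position as a (12, 8, 8) tensor: one binary plane per
    (piece type, color) pair. The 8x8 geometry is baked into the tensor
    shape -- and hence into the convolutional net -- which is why the
    same architecture can't simply ingest a hand of cards."""
    planes = np.zeros((12, 8, 8), dtype=np.float32)
    for square, piece in board.piece_map().items():
        idx = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
        planes[idx, square // 8, square % 8] = 1.0
    return planes

print(board_to_planes(chess.Board()).sum())  # 32 pieces on the starting board
```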
Given the great variety of board games out there (and that's only classical games, I'm not even considering modern board games, like Settlers, etc) a lot of very hard work would be required to even train AlphaZero to play any game that's not very similar to chess, shogi and Go. Not to mention, training AlphaZero is very expensive (Wikipedia quotes a cost of $25 million for AlphaGo Zero, AlphaZero's predecessor, and that's just to buy the hardware [2]). So I don't see how or when they'll demonstrate the "general" game-playing power of their system.
Basically, I think all that stuff about "generalized" game playing is just so much pointless bragging. The way DeepMind designed AlphaZero is exactly how everyone else has designed their systems: hard-coded with structures appropriate to the targeted game (e.g. boards and pieces, etc). DeepMind were clever in that they chose three very similar games, and then threw an immense amount of money at the problem of solving them all in tandem. And still they had to train different models for each game. That's just no way to get to general game playing.
___________
[1] See Chapter 5, "Adversarial Search", in AI: A Modern Approach, 3rd ed., for a discussion of imperfect-information and stochastic games and the difficulties of designing evaluation functions for them.
Awesome article. Does anyone know how to begin applying the AlphaZero techniques to games where information is NOT perfect? I'm trying to apply it to Scrabble. There hasn't been much AI research in this game and right now the best AI just uses brute force Monte Carlo with a flawed evaluation function (which doesn't take into account the state of the board at all, just points and tiles remaining on the opponent's rack). It's still good enough to beat top human experts about half the time, but I want to make something better.
Is it impossible to apply to these types of games? Every time I read about AlphaZero the articles mention that the techniques are meant for games of perfect information.
If you search "UCT imperfect information" on Google, you'll turn up plenty of articles and slide decks, including one from David Silver that discusses reinforcement learning. The catch is that they're mostly dated before AlphaZero's emergence, so there's some original work involved to extend AlphaZero to this domain. This is likely something that DeepMind is working on themselves. It's possible that tweaking the search query might turn up more recent results. Good luck!
Thanks, I'll take a look. Yup, I'm aware of Maven. The one most commonly in use now is named Quackle, which uses the same techniques as Maven but has a slightly better initial evaluation function. But the evaluation function is very simple and only takes into account score and leave. That often doesn't suffice.
Is there a way to play against an AlphaGo or equivalent but with adaptive difficulty? I know next to nothing about go and think it would be interesting to learn it just by playing vs. a neural network. Maybe over time the strategies it uses would be "transferred" over to me!
The match between Stockfish and AlphaZero was played with certain unjustified parameters (time control, ponder off, different hardware, no opening book or endgame tablebase for Stockfish etc.). By "unjustified," I mean that the authors of the paper did not justify their choice of parameters in the paper as being designed to implement a fair match.
At a glance, the parameters of the match seem unfair to me -- and tilted heavily towards AlphaZero. If the code were open source, this would not matter; anyone could run a rematch. As it is, I haven't seen any convincing evidence that AlphaZero is stronger than Stockfish when Stockfish is allowed to use its full breadth of knowledge and run on equal hardware.
There has been a rematch recently vs Stockfish, with a couple of hundred games. AlphaZero won 155-6! [0] There are fascinating videos with grandmasters commentating on some of the games. They're played in an exciting, sacrificial, swashbuckling style, nothing like any other top computer engine, and it seems that may affect the play of top (human) players for the better. e.g. see
The Wikipedia link says this was against Stockfish 8. Could people please stop spreading FUD here?
If there's no public tournament, this might as well not have happened. I do not understand why Google is always special. Other engines are open, Google can test against Stockfish but not vice versa.
All these web companies take, take, take from Open Source and rarely give back.
I'll write a paper now that I beat Carlsen, but I'll refuse to do so in public.
Do you mean me? If so, why not say so. "Could people please stop X-ing here?" is very passive-aggressive. I looked up 'FUD' - Fear, uncertainty and doubt. Not sure how what I said counts as any of those. Your comment certainly seems to want to spread FUD, however. (And why is this your account's only comment on HN?)
I don't know the significance of your first sentence either. I know nothing about the versions of Stockfish used for this or anything else. Maybe you're taking too much for granted. i.e. that I know enough of the minutiae to understand your comments. Could you fill in the dots a bit? And who are 'all these web companies'?
I'm not super-interested in AlphaZero (or computer chess generally) - haven't read any of the papers, for example. But there's a lot of talk about the various Alphas in the online chess world, since before it was playing chess, and I found these videos very impressive. And it's ridiculous to say "This might as well not have happened". Maybe true for you, but not for the chess world, at all.
So, after the first match in which they played against an obviously crippled version of Stockfish, and published games that involved blundering full queens worth of material, I too was skeptical of the second paper. However the setup was entirely reasonable (they basically copied what had recently been used in a computer chess championship and it was plenty beefy) and no weird time controls were used.
You can still run the PGN of the published games back and find some places where Stockfish 8 will analyse its own moves and find them blunders, so it _would_ be good for all this to happen out in the open, but I don't think there's any large scale deception going on here. I think it's beyond reasonable doubt now that AlphaZero is easily as strong as they are claiming.
A lot of the moves most praised by GMs are seen as the only moves in the position by Stockfish 9/10. I think there's a huge amount of cognitive dissonance going on, so that people can label AlphaZero's play more 'human'.
Anyway, I wouldn't be surprised if AlphaZero lines have existed at the top of the game for some time. Would be a no brainer for someone to have made Google an offer after the first paper.
I watched almost all the games with commentaries, and I agree with the commentators: the style is definitely unique and not seen before. If the engine could see these lines, it would not get absolutely destroyed.
I think people are overly keen to imbue AlphaZero with characteristics they associate with human play. Stockfish plays lots of alien lines, often relying on perfect defense, that no human would probably attempt in a real game. It's painful and jarring to admit that perfect chess might make no sense to humans, even after the fact. So I think people have emotional reasons to want to call AZ's play more human-like and natural, because it redeems human chess intuition. And to an extent it's valid: AZ is strategically much more sound, as you'd expect from its brilliant evaluation of positions, but it is still powerful enough to play completely inscrutable lines and to accept positions that look plain ugly. And besides, Stockfish also plays some beautiful stuff, and newer versions play many of the nice moves we're seeing praised in YouTube videos. Just my cod psychology take, not really a deep point, and not something I'd really argue strongly about.
The second point, I'm saying it's likely some teams, possibly Caruana's have had access to AlphaZero already, and if the public had better access to its analysis we might be able to look at more games from 2018 and see its insights pop up. Certainly some of the lines Caruana played at the World Championship match were strongly backed by AlphaZero in the videos released, but weren't Stockfish's choice, for example.
You can still run the games past the exact commit of Stockfish they used and it finds blunders in its own play, so it still feels like there's a lack of transparency. But I don't think anyone strongly believes AlphaZero isn't the best at this point.
You can checkout the exact commit of Stockfish from the paper and perform the analysis of the published games yourself. I doubt the Stockfish developers have bothered because it's an old version.
Can I just state again, because of your aggressive tone in multiple comments now, that I do believe AlphaZero is stronger, I don't believe there are real shenanigans going on, but it's _still_ sad that we can't reliably, publicly verify this stuff.
You are the one making this claim:
"You can still run the games past the exact commit of Stockfish they used and it finds blunders in its own play, so it still feels like there's a lack of transparency."
This claim implies foul play, and I asked for a shred of evidence. Call me aggressive if you want, I just can't stand this kind of bullshit.
It's not bullshit, you can download the version of Stockfish and ask it to analyse the games! I did this with scid, you can too. Around move 27 in game one is one example. I don't intend to repeat the tedious process because my curiosity is satisfied, and you're just being obnoxious.
It's impossible for me to know how long the engines thought on each move, so I just analyzed the games at a variety of times and depths. Stockfish 8 at two minutes per move on a machine that is slower than that used in the match finds plenty of issues. If we know the amount of time AZ thought (or indeed... had access to AZ) it'd be possible to more closely reproduce the games.
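If anyone wants to repeat this, it's only a few lines with the python-chess library and a local Stockfish binary; a minimal sketch, where the engine path, PGN filename, time limit, and blunder threshold are all my own assumptions:

```python
import chess.pgn
import chess.engine

ENGINE_PATH = "./stockfish"   # path to the Stockfish build you want to test
BLUNDER_CP = 150              # eval swing (centipawns) we'll flag as suspect

engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
with open("alphazero_games.pgn") as f:   # hypothetical PGN of the match games
    game = chess.pgn.read_game(f)

board = game.board()
prev_score, prev_move = None, None
for move in game.mainline_moves():
    # Evaluate the position before each move; two minutes per move, as above.
    info = engine.analyse(board, chess.engine.Limit(time=120))
    score = info["score"].white().score(mate_score=10000)
    if prev_score is not None and abs(score - prev_score) > BLUNDER_CP:
        print(f"Eval swung {prev_score} -> {score} after {prev_move}")
    board.push(move)
    prev_move, prev_score = move, score
engine.quit()
```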
This is all either of us would have done if we were peer reviewing the paper, don't really understand the hostility about trying to reproduce a published paper.
(And I know in the past, before reading the paper, I've assumed they were cheating this time because they made so little effort the first time around, but I'm happy to admit I was wrong).
Yeah, I'm certainly not claiming AlphaZero isn't the stronger engine (probably by quite a substantial margin). The first match _was_ a pretty clear fix, but the second paper is an entirely reasonable setup, so I have no real suspicions about the process on an ongoing basis, just a shame nobody can prove it for themselves.