> Are those 180 years of games "seeded" by real games, or was it entirely self play?
The writeup implies that it's entirely self-play.
> Also, how does this system cope with gameplay changes that arise when the game is patched?
From the sound of it, they don't handle patches explicitly. But since it's a policy gradient method, which learns only from the latest batch of samples, they could hypothetically swap out the DoTA binary under the running system and let the policy update itself through continued training. (The difference between the optimal pre-patch and post-patch policies is much smaller than the difference between a random policy and an optimal policy...) See the sketch below.
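A minimal sketch (not OpenAI's actual training code) of why an on-policy policy-gradient learner can absorb a patch: each update is computed only from rollouts gathered under the *current* environment, so swapping the environment mid-training just changes what the next batch of rollouts rewards. The toy "environment" here is a hypothetical 3-armed bandit whose payoffs shift at the "patch"; all names (`play`, `patched`, etc.) are illustrative, not from the writeup.

```python
import numpy as np

rng = np.random.default_rng(0)

def play(action, patched):
    # Pre-patch, arm 0 pays best; post-patch, arm 2 does.
    means = [1.0, 0.2, 0.1] if not patched else [0.1, 0.2, 1.0]
    return rng.normal(means[action], 0.1)

logits = np.zeros(3)          # softmax policy parameters
lr, batch = 0.5, 256

for step in range(200):
    patched = step >= 100     # the "binary swap" halfway through training
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Collect a fresh on-policy batch; nothing older is reused.
    actions = rng.choice(3, size=batch, p=probs)
    rewards = np.array([play(a, patched) for a in actions])
    baseline = rewards.mean()

    # REINFORCE update: grad log pi(a) * advantage, averaged over the batch.
    grad = np.zeros(3)
    for a, r in zip(actions, rewards):
        grad += (np.eye(3)[a] - probs) * (r - baseline)
    logits += lr * grad / batch

    if step in (99, 199):
        print(f"step {step:3d} (patched={patched}): policy = {probs.round(2)}")
```

The printout shows the policy concentrated on arm 0 just before the "patch" and on arm 2 by the end: continued training re-optimizes against the new rules with no re-seeding, which is the point of the argument above.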
The fact that it's not seeded at all is very interesting. A lot of Dota expertise derives from knowing what the opponent is going to do at a particular time. I remember many comments from experienced Go players that AlphaGo made moves that no human player would make, so I wonder if that will appear in this case as well.
They do discuss some differences in playstyle toward the bottom of the writeup, like faster openings and heavier use of support heroes, which the self-play invented on its own (along with rediscovering standard tactics). So it does play at least a little differently.
Whether these are better is hard to say. It's not superhuman, after all, unlike AlphaGo, so it's not presumptively right, and you can't double-check by doing a very deep tree evaluation (because DoTA doesn't lend itself to tree exploration - the action space is far too large and the horizons far too long).