So the naive MCTS implementation in Python is ridiculously inefficient. Of course, you could reimplement it in C++, but that then requires you to use the C wrappers of TensorFlow/JAX to do the MCTS/neural-network interop.
MCTX comes up with an even niftier implementation in JAX that runs the entire MCTS algorithm on the TPU. This is quite a feat because tree search is typically a heavily pointer based algorithm. It uses the object pool pattern described in https://gameprogrammingpatterns.com/object-pool.html to serialize all of the nodes of the search tree into one flat array (which is how it manages to fit into JAX formalisms). I suspect it's not a particularly efficient use of the TPU, but it does cut out all of the CPU-TPU round trip latency, which I'm sure more than compensates.
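To make the flat-array idea concrete, here's a minimal sketch of the object-pool layout (my own illustration with made-up field names, not MCTX's actual schema): the whole tree is a handful of preallocated arrays, a "node" is just a row index, and expanding a node is an indexed write instead of an object allocation, which is what lets the search stay jit-compiled on the device.

    import jax.numpy as jnp

    # Illustrative object-pool layout (hypothetical, not MCTX's real schema):
    # the whole tree is a set of preallocated arrays; a "node" is a row index.
    MAX_NODES, NUM_ACTIONS = 1024, 64

    tree = {
        # child index reached by taking `action` from `node`; -1 = unexpanded
        "children":    jnp.full((MAX_NODES, NUM_ACTIONS), -1, dtype=jnp.int32),
        "parent":      jnp.full((MAX_NODES,), -1, dtype=jnp.int32),
        "visit_count": jnp.zeros((MAX_NODES,), dtype=jnp.int32),
        "value_sum":   jnp.zeros((MAX_NODES,), dtype=jnp.float32),
        "prior":       jnp.zeros((MAX_NODES, NUM_ACTIONS), dtype=jnp.float32),
    }

    def expand(tree, next_free, parent, action, child_prior):
        """Claim the next free row as the child of (parent, action)."""
        return {
            **tree,
            "children": tree["children"].at[parent, action].set(next_free),
            "parent":   tree["parent"].at[next_free].set(parent),
            "prior":    tree["prior"].at[next_free].set(child_prior),
        }, next_free + 1

Because nothing is ever freed and all shapes are fixed up front, none of this forces a round trip back to the host.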
Chasing pointers in the MCTS tree is definitely a slow approach, although typically there are ~900 "considerations" per move for AlphaZero. I've found that getting value/policy predictions from a neural network (or GBDT[1]) for the node expansions during those considerations is at least an order of magnitude slower than the MCTS tree-hopping logic.
> If you have the research paper, someone in the field could reimplement them in a few days.
Hi I did this while I was at Google Brain and it took our team of three more like a year. The "reimplementation" part took 3 months or so and the rest of the time was literally trying to debug and figure out all of the subtleties that were not quite mentioned in the paper. See https://openreview.net/forum?id=H1eerhIpLV
> The replication crisis (also called the replicability crisis and the reproducibility crisis) is an ongoing methodological crisis in which the results of many scientific studies are difficult or impossible to reproduce. Because the reproducibility of empirical results is an essential part of the scientific method,[2] such failures undermine the credibility of theories building on them and potentially call into question substantial parts of scientific knowledge.
People should publish automated tests. How does a performance optimizer know that they haven't changed the output if there are no known-good inputs and outputs documented as executable tests? pytest plus Hypothesis seems like a nice, compact way to specify such tests.
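For example, something like this (a sketch; the function names are hypothetical stand-ins, and the "optimized" version would be whatever the performance work actually touched):

    # Pin an optimized routine to a known-good reference implementation with a
    # property-based test (pytest + the `hypothesis` library).
    import numpy as np
    from hypothesis import given, strategies as st

    def softmax_reference(x):       # slow but obviously correct
        e = np.exp(np.asarray(x, dtype=float) - np.max(x))
        return e / e.sum()

    def softmax_optimized(x):       # stand-in for the tuned version under test
        return softmax_reference(x)

    @given(st.lists(st.floats(-50, 50), min_size=1, max_size=64))
    def test_optimized_matches_reference(x):
        np.testing.assert_allclose(softmax_optimized(x), softmax_reference(x),
                                   rtol=1e-6, atol=1e-9)

Hypothesis then searches for inputs where the two disagree, which is the "known-good inputs and outputs" contract, just generated rather than hand-written.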
I think I agree with everything you've said here, but I just want to note that while we absolutely should (where relevant) expect published code to include automated tests, we should not typically consider reproduction that reuses that code to be "replication" per se. As I understand it, replication isn't merely a test for fraud (which rerunning should typically detect) and mistakes (which rerunning might sometimes detect) but also a test that the paper successfully communicates the ideas such that other human minds can work with them.
Sources of variance: experimental design, hardware, software, irrelevant environmental conditions/state, data (sample(s)), analysis.
Can you run the notebook again with the exact same data sample (input) and get the same charts and summary statistics (output)? Is there a way to test the stability of those outputs over time?
Can you run the same experiment (the same 'experimental design'), ceteris paribus (everything else being equal), with a different sample (input) and get a very similar output? Is it stable, differentiable, independent, nonlinear, reversible? Does it converge?
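One minimal way to make the rerun question executable (a sketch, assuming the analysis can be wrapped in a function and all randomness is routed through an explicit seed; the names here are illustrative):

    import numpy as np

    def run_analysis(data, seed=0):
        rng = np.random.default_rng(seed)          # every random draw uses rng
        sample = rng.choice(data, size=100, replace=True)
        return {"mean": float(sample.mean()), "std": float(sample.std())}

    def test_rerun_reproduces_summary_stats():
        data = np.arange(1_000, dtype=float)       # stand-in for the real input
        # same input + same seed must give byte-identical summary statistics
        assert run_analysis(data, seed=0) == run_analysis(data, seed=0)

Running that test on a schedule is one cheap way to check the stability of the outputs over time as the environment drifts.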
Now I have to go look up the definitions for Replication, Repeatability, Reproducibility
> Measures of reproducibility and repeatability: In chemistry, the terms reproducibility and repeatability are used with a specific quantitative meaning. [7] In inter-laboratory experiments, a concentration or other quantity of a chemical substance is measured repeatedly in different laboratories to assess the variability of the measurements. Then, the standard deviation of the difference between two values obtained within the same laboratory is called repeatability. The standard deviation for the difference between two measurements from different laboratories is called reproducibility. [8] These measures are related to the more general concept of variance components in metrology.
> In engineering, science, and statistics, replication is the repetition of an experimental condition so that the variability associated with the phenomenon can be estimated. ASTM, in standard E1847, defines replication as "... the repetition of the set of all the treatment combinations to be compared in an experiment. Each of the repetitions is called a replicate."
> Replication is not the same as repeated measurements of the same item: they are dealt with differently in statistical experimental design and data analysis.
> For proper sampling, a process or batch of products should be in reasonable statistical control; inherent random variation is present but variation due to assignable (special) causes is not. Evaluation or testing of a single item does not allow for item-to-item variation and may not represent the batch or process. Replication is needed to account for this variation among items and treatments.
> In simpler terms, given a statistical sample or set of data points from repeated measurements of the same quantity, the sample or set can be said to be accurate if their average is close to the true value of the quantity being measured, while the set can be said to be precise if their standard deviation is relatively small.
All-time classic Hacker News moment: "heh, someone in the field could write this in a few days" / "Hi, it's me; three of us literally worked at Google Brain and it took us a year."
I don't know the subject matter well enough to make the call, but it's possible the OP is making a general statement that's generally true even if it's not in this specific context of it taking a year.
One thing that's not appreciated by many who haven't tried to implement an NN is how subtle the bugs can be. When you look at code for an NN, it's generally pretty simple. However, what happens when your code doesn't produce the output you were expecting? When that happens, it can be very difficult and time-consuming to find the subtle issue with your code.
How come you weren't able to just get it from DeepMind given that they are a subsidiary of Google? Is there a lot of red tape involved in exchanging IP like that?
There is Leela Zero (https://github.com/leela-zero/leela-zero) for Go and lc0/Leela Chess (https://github.com/orgs/LeelaChessZero/repositories) for Chess, and both provide trained weights. The Leela Chess project specifically has been working for a long time on training and refining the weights for Chess, as well as providing the code -- they let you see the history and performance over time of the various trained models.
Secret sauce is in the ML compiler and accelerator used, but all those improvements simply lower the cost of training a model. You could still do it on a regular GPU, it would just take you more time.
In the case of Google, they probably used TPU chips that you can't get direct 'bare metal' access to anyway, so none of that code would have helped.
The actual optimizer and its parameters (like the learning rate schedule) are normally published in the research paper.
You should pencil out on a napkin just how long "more time" is. Here, I'll get you started:
1600 inferences per move * 1 ms per inference * 250 moves/game * 30M games played = 12B seconds, i.e. roughly 140k days. MuZero with Gumbel brought the 1600 down to ~40, but either way, you need some more scale.
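Spelling that arithmetic out (same assumptions as above: 1 ms per inference on a single accelerator):

    # Napkin math for single-accelerator AlphaZero-style training time.
    inferences_per_move = 1600
    seconds_per_inference = 1e-3
    moves_per_game = 250
    games = 30_000_000

    total_seconds = (inferences_per_move * seconds_per_inference
                     * moves_per_game * games)
    print(total_seconds)            # 1.2e10 seconds
    print(total_seconds / 86_400)   # ~139,000 days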
It turns out a lot of the difficulties, judgment calls, and implementation details involve data pipelining. Some of those choices affect the final skill ceiling you reach. Which ones? How much? Are they path dependent? Well, you'll need to run it more than once...
Depends on game/environment and—since it's using a GBDT and not a NN—how good you are at feature extraction/selection for your problem.
High level, I'd say it's a good way to test a new environment w/out spending time/effort on GPUs until you understand the problem well, and then you can switch to the time/money costly GPU world.
Models can be massive, but this is also totally doable. Just to put things in perspective: ProcMaze solving using DeepMind's MCTX converges in <1M steps, whereas a physically based agent such as HalfCheetah may require >100M steps to learn to run. Q-learning Pac-Man on a Snapdragon ChromeOS device is ~1 hr for 1000 epochs ;)
"The heart of" can mean many things; is this runnable or not? Also, they should open-source the weights as well so that their claims can finally be verified independently.
I talked to the guy who led AlphaStar at BlizzCon. I asked if I could get the weights. He said that balance changes and map changes make keeping the model updated prohibitive.
I personally hope these types of AIs make it into future games, both as bots to play against in single-player and as agents to run some of the background stuff while you're focusing on other things.
For example, in StarCraft, when you A-move a large part of your army across the map, an AlphaStar-like AI could decide which units to prioritize attacking once it reaches a group of enemy units, or what formation to move in (automatically keeping colossi in back, observers ahead, balls of marines split apart into smaller clumps); or when your Terran army encounters a significant force along the way, the tanks in back could auto-siege / auto-unsiege.
That was one of the things I was hoping for: better AI. But there are also many unanswered questions in StarCraft, like which race is the "best" and what an optimal strategy is, and I thought these questions could be explored more.
I thought that they had an open API, and I also thought we as players and AI enthusiasts would benefit somehow from this venture, it was all rather exciting. But the research was all behind closed doors, and nothing much came of it except for some impressive demonstrations.
I feel like this headline is deceptive. This is a Jax reimplementation, and it was released a year ago. It is a cool library though. The basic operation of muzero is very simple, but training it efficiently is tricky.
Obligatory note that AlphaZero has long since been surpassed by its independently developed cousin Leela Chess Zero (which is also open source, btw), and Stockfish has also more than caught up and remains competitive with MCTS engines.
Stockfish has added neural networks for evaluation, much simpler than the large networks in MCTS engines, and efficiently CPU computable. The technique actually originated in a Shogi engine and was later ported to mainline Stockfish, so there's no direct inheritance from AlphaZero in that sense.
The search algorithms have been incrementally improved, but they still follow the same alpha-beta-plus-a-heap-of-heuristics approach.
Stockfish now uses a neural network in some situations. Last I checked, it uses the fast, classical search and evaluation for active, tactical positions, and the network for quieter, more strategic positions.
I can't help but wonder when this stuff happens that the devs (or the leads) are worried about being let go, so they open source their stuff to be able to take it elsewhere.
These particular programs were bleeding edge when they were announced, but have been recreated and improved on outside of DeepMind since then. For example, LCZero today is stronger at chess than any of the DeepMind chess programs that were shown to the public. Of course we don't know if they got further behind the scenes. Of course LCZero and everything else relied on the published papers of DeepMind. That is, the papers were more important than the code.
There's an interesting bias I've noticed on HN where a lot of people still believe AlphaZero is the state of the art in computer chess, when that hasn't been true except for a short while after the release. AZ still gets posted and upvoted today while newer improvements in other engines are discussed far less.
AZ revolutionized chess engine architecture and then moved on to other fields. Fast followers who incrementally improve upon the breakthroughs are very rarely recognized.
E.g., the inventor of the blue LED versus those who improve its efficiency by 0.1%.
It changed engine analysis of games because it doesn’t generate a series of “lines” that each has a centipawn value. It just gives you the move.
Subjectively, the Monte Carlo moves seem so human-like in comparison to minimax. Minimax can suggest a move that no human would play because the depth of calculation at which that move is good is just impossible for people.
The link you posted confirms what I already pointed out in another subthread. NNUE was invented by Yu Nasu for the game of Shogi. It bears very little resemblance to the AZ neural net: it's much shallower, and inspired by work from the computer Shogi community that predates AZ. The way it trains is different, inference is different. The only commonality is the fact that it's a neural net. But neural nets had been used for chess before in various capacities; DeepMind didn't invent this. All they did was throw a bunch of compute at it, do a big PR stunt, then walk away.
I'm not sure where you get the LCZero connection from.
Well, AZ could perform above Stockfish by using much simpler principles. A lot of tinkering was required to get Stockfish to where it was, and AZ pretty much rediscovered, and improved on, some of those refined, fine-tuned components, such as board evaluation, opening books, and endgames, from its own training. I think that's very impressive.
> I can't help but wonder when this stuff happens that the devs (or the leads) are worried about being let go, so they open source their stuff to be able to take it elsewhere.
Open-sourcing isn't a dev-level decision - it is a business leadership decision.
Note that JAX was created after the success of Alpha(Go|Zero), so the claim that this powered those papers is not accurate. It is a DM MCTS library though.
Must feel pretty bad being a rank-and-file dev at DeepMind and realizing that your leads had all the talent and funding in the world, which they could have used for building language model products, and instead they invested heavily in solving board games.
Then you realize they get paid >10x what you do, and they're fine, but the next layoff is likely gonna get you.
The human-in-the-loop reinforcement learning paper that powered ChatGPT's training arose from DeepMind's experiments with board games (and games generally). AGI is still an unsolved problem, and the deep RL that grew out of the success of DeepMind's board game and game experiments so far will likely play a huge part in it.
Search? Gmail completing your emails? They have lots of AI products; they just wait until they are more solid before releasing them. Now they will rush out a competitor to ChatGPT before it is ready, but given time they would have used it for something.
Board games have been used in AI since the beginning. They provide a good environment because we know the rules and control them. Also, since everybody uses them, it is easier to compare different algorithms. Most advances in AI were made in board games. ChatGPT most probably uses a lot of those things from board games that you think are irrelevant (reinforcement learning with human feedback for fine-tuning the responses, the same algorithms used in board games).
Deepmind did not pioneer reinforcement learning. In fact it wasn’t even the first place to use neural networks for feature representation in RL. That was achieved with Backgammon in the 90s.
They also do work with the UK National Health Service.
I have heard (high-level sources, unconfirmed publicly) that the YouTube algorithm that promotes open-mouth creators with $$$$-sign thumbnails is based on their groundbreaking research from this collaboration.
I came up with a nifty implementation in Python that outperforms the naive impl by 30x, allowing a pure Python MCTS/NN interop implementation. See https://www.moderndescartes.com/essays/deep_dive_mcts/
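The gist, as I understand it, is keeping each node's child statistics in NumPy arrays so PUCT selection becomes one vectorized argmax instead of a Python loop over child objects. A rough sketch of that idea (my paraphrase, not the essay's code):

    import numpy as np

    class Node:
        """Each node keeps its children's stats in NumPy arrays (illustrative)."""
        def __init__(self, num_actions, prior, parent=None):
            self.parent = parent
            self.children = {}                     # action -> Node, created lazily
            self.prior = prior                     # shape [num_actions]
            self.child_N = np.zeros(num_actions)   # per-action visit counts
            self.child_W = np.zeros(num_actions)   # per-action total value

        def best_action(self, c_puct=1.5):
            # PUCT selection as one vectorized expression over all actions.
            q = self.child_W / np.maximum(self.child_N, 1)
            u = (c_puct * self.prior
                 * np.sqrt(self.child_N.sum() + 1) / (1 + self.child_N))
            return int(np.argmax(q + u))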