Hacker News
DeepMind has open-sourced the heart of AlphaGo and AlphaZero (twitter.com/drjimfan)
328 points by mariuz on Feb 15, 2023 | 87 comments



So the naive MCTS implementation in Python is ridiculously inefficient. Of course, you could reimplement it in C++, but that then requires you to use the C wrappers of TensorFlow/JAX to do the MCTS/neural-network interop.

I came up with a nifty implementation in Python that outperforms the naive one by 30x, allowing a pure-Python MCTS/NN interop implementation. See https://www.moderndescartes.com/essays/deep_dive_mcts/
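
For the curious, here's a minimal sketch of the kind of vectorization trick involved (names and constants are illustrative, not the essay's actual code): keep each node's child statistics in numpy arrays, so PUCT selection is one vectorized expression instead of a Python loop over child objects.

    import numpy as np

    class Node:
        # Child statistics live in flat numpy arrays on the parent,
        # so selecting an action is a single vectorized computation.
        def __init__(self, num_actions, priors):
            self.child_N = np.zeros(num_actions, dtype=np.float32)  # visit counts
            self.child_W = np.zeros(num_actions, dtype=np.float32)  # total values
            self.child_P = priors                                   # policy priors
            self.children = {}                                      # action -> Node

        def best_action(self, c_puct=1.5):
            Q = self.child_W / np.maximum(self.child_N, 1)  # mean value, safe at N=0
            U = c_puct * self.child_P * np.sqrt(self.child_N.sum() + 1) / (1 + self.child_N)
            return int(np.argmax(Q + U))

    root = Node(362, np.full(362, 1 / 362, dtype=np.float32))  # 361 Go moves + pass
    print(root.best_action())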

MCTX comes up with an even niftier implementation in JAX that runs the entire MCTS algorithm on the TPU. This is quite a feat, because tree search is typically a heavily pointer-based algorithm. It uses the object pool pattern described in https://gameprogrammingpatterns.com/object-pool.html to serialize all of the nodes of the search tree into one flat array (which is how it manages to fit into JAX formalisms). I suspect it's not a particularly efficient use of the TPU, but it does cut out all of the CPU-TPU round-trip latency, which I'm sure more than compensates.
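
A toy illustration of that flattening (field names and sizes are mine, not MCTX's actual layout): preallocate fixed-shape arrays indexed by node id, with -1 marking unexpanded children, which is exactly what JAX's static-shape requirements want.

    import numpy as np

    MAX_NODES, NUM_ACTIONS = 2 ** 14, 362  # preallocated pool (illustrative sizes)

    # The entire search tree lives in flat arrays indexed by node id;
    # there are no Python objects and no pointers to chase.
    visit_count = np.zeros((MAX_NODES, NUM_ACTIONS), dtype=np.int32)
    value_sum = np.zeros((MAX_NODES, NUM_ACTIONS), dtype=np.float32)
    prior = np.zeros((MAX_NODES, NUM_ACTIONS), dtype=np.float32)
    children = np.full((MAX_NODES, NUM_ACTIONS), -1, dtype=np.int32)  # -1 = unexpanded
    next_free = 1  # node 0 is the root

    def expand(parent, action, action_priors):
        # "Allocate" a child by handing out the next free slot in the pool.
        global next_free
        child = next_free
        next_free += 1
        children[parent, action] = child
        prior[child] = action_priors
        return child

    expand(0, 17, np.full(NUM_ACTIONS, 1 / NUM_ACTIONS))

Because every array has a fixed shape, the whole simulation loop becomes index arithmetic that JAX can jit-compile, rather than object allocation.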


> I came up with a nifty implementation in Python that outperforms the naive one by 30x, allowing a pure-Python MCTS/NN interop implementation. See https://www.moderndescartes.com/essays/deep_dive_mcts/

Great post!

Chasing pointers in the MCTS tree is definitely a slow approach, although typically there are ~900 "considerations" per move for AlphaZero. I've found that getting value/policy predictions from a neural network (or GBDT [1]) for the node expansions during those considerations is at least an order of magnitude slower than the MCTS tree-hopping logic.

[1] https://github.com/cgreer/alpha-zero-boosted
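
A crude, runnable way to see that gap (the stand-in sizes are made up; real networks are far larger, which only widens it):

    import time
    import numpy as np

    W1, W2 = np.random.rand(512, 512), np.random.rand(512, 512)
    x = np.random.rand(1, 512)
    N = np.random.rand(362)  # fake per-child visit counts

    start = time.perf_counter()
    for _ in range(900):  # ~900 considerations per move
        _ = np.tanh(x @ W1) @ W2  # stand-in for one value/policy inference
    nn_time = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(900):
        _ = np.argmax(N / (1 + N))  # stand-in for the PUCT selection arithmetic
    tree_time = time.perf_counter() - start

    print(f"inference stand-in: {nn_time:.4f}s, tree logic stand-in: {tree_time:.4f}s")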


Worth noting that while AlphaGo and AlphaZero are incredible achievements, the amount of actual code to implement them isn't very much.

If you have the research paper, someone in the field could reimplement them in a few days.

Then there is the large compute cost for training them to produce the trained weights.

So, open-sourcing these bits of work without the weights isn't as major a thing as you might imagine.


> If you have the research paper, someone in the field could reimplement them in a few days.

Hi, I did this while I was at Google Brain, and it took our team of three more like a year. The "reimplementation" part took three months or so, and the rest of the time was spent trying to debug and figure out all of the subtleties that were not quite mentioned in the paper. See https://openreview.net/forum?id=H1eerhIpLV


Replication crisis: https://en.wikipedia.org/wiki/Replication_crisis :

> The replication crisis (also called the replicability crisis and the reproducibility crisis) is an ongoing methodological crisis in which the results of many scientific studies are difficult or impossible to reproduce. Because the reproducibility of empirical results is an essential part of the scientific method,[2] such failures undermine the credibility of theories building on them and potentially call into question substantial parts of scientific knowledge.

People should publish automated tests. How does a performance optimizer know that they haven't changed the output if there are no known-good inputs and outputs documented as executable tests? pytest + Hypothesis seems like a nice, compact way to specify tests.
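
For example, a property-based test with Hypothesis that pins down behavior an optimizer must preserve (softmax here is just a stand-in target function):

    import numpy as np
    from hypothesis import given, strategies as st

    def softmax(x):
        # Reference implementation whose behavior we want to pin down.
        z = np.exp(x - np.max(x))
        return z / z.sum()

    @given(st.lists(st.floats(-50, 50), min_size=1, max_size=64))
    def test_softmax_invariants(xs):
        p = softmax(np.array(xs))
        assert np.all(p >= 0)               # probabilities are non-negative
        assert np.isclose(p.sum(), 1.0)     # and sum to one
        assert p[np.argmax(xs)] == p.max()  # the largest input keeps the most mass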

AlphaZero: https://en.wikipedia.org/wiki/AlphaZero

GH topic "AlphaZero" https://github.com/topics/alphazero

I believe there are one or more JAX implementations of AlphaZero?

Though there's not yet a quantum-inference-based self-play (AlphaZero) algorithm?

TIL that the modified snow plow problem is a variation on TSP, and that there are already quantum algos capable of optimally solving TSP.


I think I agree with everything you've said here, but just want to note that while we absolutely should (where relevant) expect published code including automated tests, we should not typically consider reproduction that reuses that code to be "replication" per se. As I understand it, replication isn't merely a test for fraud (which rerunning should typically detect) and mistakes (which rerunning might sometimes detect) but also a test that the paper successfully communicates the ideas such that other human minds can work with them.


Sources of variance: Experimental Design, Hardware, Software, irrelevant environmental conditions/state, Data (Sample(s)), Analysis

Can you run the notebook again with the exact same data sample (input) and get the same charts and summary statistics (output)? Is there a way to test the stability of those outputs over time?

Can you run the same experiment (the same 'experimental design'), ceteris paribus (everything else being equal) and a different sample (input) and get a very similar output? Is it stable, differentiable, independent, nonlinear, reversible; Does it converge?

Now I have to go look up the definitions for Replication, Repeatability, Reproducibility


Replication: https://en.wikipedia.org/wiki/Replication (disambiguation)

Replication_(scientific_method) -> Reproducibility https://en.wikipedia.org/wiki/Reproducibility :

> Measures of reproducibility and repeatability: In chemistry, the terms reproducibility and repeatability are used with a specific quantitative meaning. [7] In inter-laboratory experiments, a concentration or other quantity of a chemical substance is measured repeatedly in different laboratories to assess the variability of the measurements. Then, the standard deviation of the difference between two values obtained within the same laboratory is called repeatability. The standard deviation for the difference between two measurements from different laboratories is called reproducibility. [8] These measures are related to the more general concept of variance components in metrology.
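
Restated with toy numbers (three labs, two repeat measurements each; the data is made up and this is only a naive reading of those definitions):

    import numpy as np

    # Rows = laboratories, columns = two repeat measurements of the same quantity.
    m = np.array([[10.1, 10.3],
                  [10.6, 10.4],
                  [ 9.9, 10.0]])

    # Repeatability: std dev of the difference between two values from the SAME lab.
    repeatability = np.std(m[:, 0] - m[:, 1], ddof=1)

    # Reproducibility: std dev of the difference between values from DIFFERENT labs.
    pairs = [m[i, 0] - m[j, 1] for i in range(3) for j in range(3) if i != j]
    reproducibility = np.std(pairs, ddof=1)

    print(f"repeatability sd: {repeatability:.3f}")
    print(f"reproducibility sd: {reproducibility:.3f}")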

Replication (statistics) https://en.wikipedia.org/wiki/Replication_(statistics) :

> In engineering, science, and statistics, replication is the repetition of an experimental condition so that the variability associated with the phenomenon can be estimated. ASTM, in standard E1847, defines replication as "... the repetition of the set of all the treatment combinations to be compared in an experiment. Each of the repetitions is called a replicate."

> Replication is not the same as repeated measurements of the same item: they are dealt with differently in statistical experimental design and data analysis.

> For proper sampling, a process or batch of products should be in reasonable statistical control; inherent random variation is present but variation due to assignable (special) causes is not. Evaluation or testing of a single item does not allow for item-to-item variation and may not represent the batch or process. Replication is needed to account for this variation among items and treatments.

Accuracy and precision: https://en.m.wikipedia.org/wiki/Accuracy_and_precision :

> In simpler terms, given a statistical sample or set of data points from repeated measurements of the same quantity, the sample or set can be said to be accurate if their average is close to the true value of the quantity being measured, while the set can be said to be precise if their standard deviation is relatively small.

Reproducible builds; to isolate and minimize software variance: https://en.wikipedia.org/wiki/Reproducible_builds

Re: reproducibility, containers, Jupyter books, REES, repo2docker: https://news.ycombinator.com/item?id=32965961 https://westurner.github.io/hnlog/#comment-32965961 (Ctrl-F #linkedreproducibility)


All-time classic Hacker News moment: "heh, someone in the field could write this in a few days" / "Hi, it's me, 3 of us literally worked at Google Brain and it took us a year"


I don't know the subject matter well enough to make the call, but it's possible the OP is making a general statement that's usually true, even if it doesn't hold in this specific context, where it took a year.


One thing that's not appreciated by many who haven't tried to implement a NN is how subtle the bugs can be. When you look at code for a NN, it's generally pretty simple. However, what happens when your code doesn't produce the output you were expecting? When that happens, it can be very difficult and time-consuming to find the subtle issue with your code.


How come you weren't able to just get it from DeepMind given that they are a subsidiary of Google? Is there a lot of red tape involved in exchanging IP like that?


The best-case explanation is that they were & are very protective of the AlphaGo "brand".


That defeats the purpose of reimplementing it?


There is Leela Zero (https://github.com/leela-zero/leela-zero) for Go and lc0/Leela Chess (https://github.com/orgs/LeelaChessZero/repositories) for Chess, where both provide trained weights. The Leela Chess project specifically has been working for a long time on training and refining the weights for Chess, as well as providing the code; they let you see the history and performance over time for the various trained models.


I'm moderately into chess but I have never heard of Leela

I'm shocked to discover it's been rated higher than AlphaZero & Komodo and just slightly below Stockfish


If I'm not mistaken, Stockfish has its own neural network implementation as well, correct?




> Then there is the large compute cost for training them to produce the trained weights.

And as far as I understand, the training code is where the secret sauce lies.


generally no...

Secret sauce is in the ML compiler and accelerator used, but all those improvements simply lower the cost of training a model. You could still do it on a regular GPU; it would just take you more time.

In the case of Google, they probably used TPU chips that you can't get direct 'bare metal' access to anyway, so none of that code would have helped.

The actual optimizer and parameters used (like the learning rate schedule) are normally published in the research paper.
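
For instance, a learning-rate schedule of the kind such papers report takes only a few lines to restate (the base rate and step milestones below are placeholders, not the published values):

    def learning_rate(step, base_lr=0.2, boundaries=(100_000, 300_000, 500_000)):
        # Step decay: drop the rate 10x at fixed training-step milestones.
        lr = base_lr
        for boundary in boundaries:
            if step >= boundary:
                lr /= 10
        return lr

    for step in (0, 150_000, 350_000, 600_000):
        print(step, learning_rate(step))  # 0.2, 0.02, 0.002, 0.0002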


You should pencil out on a napkin just how long "more time" is. Here, I'll get you started:

1600 inferences per move * 1 ms per inference * 250 moves/game * 30M games played = 12B seconds, i.e. ~140k days. MuZero with Gumbel brought the 1600 down to ~40, but either way, you need some more scale.
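
Spelled out (all figures are the rough ones above, not measurements):

    inferences_per_move = 1600   # AlphaZero-style simulations per move
    sec_per_inference = 1e-3     # ~1 ms on a single commodity GPU (assumed)
    moves_per_game = 250
    games = 30_000_000

    total_sec = inferences_per_move * sec_per_inference * moves_per_game * games
    print(f"{total_sec:.1e} seconds")           # 1.2e+10, i.e. ~12B seconds
    print(f"{total_sec / 86_400:,.0f} days")    # ~139,000 single-GPU days
    print(f"{total_sec * 40 / 1600 / 86_400:,.0f} days at ~40 sims (Gumbel)")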

It turns out a lot of the difficulties, judgment calls, and implementation details involve data pipelining. Some of those choices affect the final skill ceiling you reach. Which ones? How much? Are they path dependent? Well, you'll need to run it more than once...


Yep, there are many reimplementations. Here is one that swaps the neural net for a GBDT to address compute costs:

https://github.com/cgreer/alpha-zero-boosted


How does the performance of this version compare?


Depends on the game/environment and, since it's using a GBDT and not a NN, on how good you are at feature extraction/selection for your problem.

At a high level, I'd say it's a good way to test a new environment without spending time/effort on GPUs until you understand the problem well; then you can switch to the time/money-costly GPU world.
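
A sketch of what that swap looks like (feature names and data are invented here; the linked repo's actual interface differs): MCTS only needs something that maps a state to a value, so a GBDT over hand-crafted features can stand in for the net.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def extract_features(state):
        # The hand-crafted part a NN would otherwise learn (hypothetical features).
        return np.array([state["material"], state["mobility"], state["tempo"]])

    # Fit on (features, outcome) pairs from self-play; fake data for illustration.
    X = np.random.rand(1000, 3)
    y = np.random.choice([-1.0, 0.0, 1.0], size=1000)  # loss / draw / win
    value_fn = GradientBoostingRegressor(n_estimators=50).fit(X, y)

    def evaluate(state):
        # Drop-in replacement for the NN value head during node expansion.
        return float(value_fn.predict(extract_features(state).reshape(1, -1))[0])

    print(evaluate({"material": 0.5, "mobility": 0.2, "tempo": 0.1}))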


Imagine it's perfect for computer Backgammon, but overfits higher-dimensional spaces ;)


If only there existed a distributed way to incentivize the calculation of AI weights while also providing a currency to encourage scale…


As many people have pointed out before, the distributed currency part adds energy waste. BOINC accomplishes the same without the waste.


Of course there will be inefficiency trade-offs; that is always the case with distribution.

The countering question to that is whether it is valuable to have open-source AI systems…


The point is you can develop open source AI cheaper without blockchain.


not an entirely bad idea.


Models can be massive, but it's also totally doable. Just to put things in perspective: ProcMaze solving using DeepMind's MCTX converges in <1M steps, whereas a physically based agent such as HalfCheetah may require >100M steps to learn to run. Q-learning Pac-Man on a Snapdragon ChromeOS machine is ~1 hr for 1000 epochs ;)


“The heart of” can mean many things; is this runnable or not? Also, they should open-source the weights as well so that their claims can finally be verified independently.


Totally agree. I don't even know what benefit they'd get at this point from keeping some parts locked up.

Anyway, if you want something runnable, Leela is a nice reimplementation: https://github.com/leela-zero/leela-zero


I'd suggest KataGo, which is much stronger and more actively developed than Leela Zero https://github.com/lightvector/KataGo


And let us not forget those secret random seeds!


Is this a joke I don't get? Because it's a bit unlikely there's something interesting in random seeds.


Please do the same for AlphaStar, the StarCraft AI; that would be great.


I talked to the guy who led AlphaStar at BlizzCon. I asked if I could get the weights. He said that balance changes and map changes make keeping the model updated prohibitive.


Sure, but even frozen, outdated weights would be very instructive to study.


I personally hope these types of AIs make it into future games, as bots to play against in single-player and as agents to run some of the background stuff while you're focusing on other things.

For example, in StarCraft, when you A-move a large part of your army across the map, an AlphaStar-like AI could decide which units to prioritize attacking once it reaches a group of enemy units, or what formation to move in (automatically keeping colossi in back, observers ahead, balls of marines split into smaller clumps), or, when your Terran army encounters a significant force along the way, the tanks in back could auto-siege / auto-unsiege.


That was one of the things I was hoping for: better AI. But there are also many unanswered questions in StarCraft, like which race is the "best" and what an optimal strategy is, and I thought these questions could be explored more.


This would probably be against some Blizzard ToS


I thought that they had an open API, and I also thought we as players and AI enthusiasts would benefit somehow from this venture; it was all rather exciting. But the research was all behind closed doors, and nothing much came of it except for some impressive demonstrations.


In case you didn't know, SC2 does have a healthy AI bot community:

https://sc2ai.net/


I am aware of that and follow it, but it's way behind what AlphaStar was doing.


I feel like this headline is deceptive. This is a JAX reimplementation, and it was released a year ago. It is a cool library, though. The basic operation of MuZero is very simple, but training it efficiently is tricky.


Yeah. DeepMind has released various MCTS implementations (e.g. in OpenSpiel). Not the same as releasing AlphaZero.


Actually, it is not the original code of AlphaZero; the original code was TensorFlow/TPU (https://en.wikipedia.org/wiki/AlphaGo_Zero#Training), and this code is in JAX.


Obligatory note that AlphaZero has long since been surpassed by its independently developed cousin Leela Chess Zero (which is also open source, btw), and Stockfish has more than caught up and remains competitive with MCTS engines.


Did Stockfish use any tech from AlphaZero & co., or has it managed to catch up while keeping itself "pure"?


Stockfish has added neural networks for evaluation, much simpler than the large networks in MCTS engines and efficiently computable on CPU. The technique actually originated in a Shogi engine and was later ported to mainline Stockfish, so there's no direct inheritance from AlphaZero in that sense.

The search algorithms have been incrementally improved, but still follow the same alpha-beta-plus-a-heap-of-heuristics approach.
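
The trick that makes these nets CPU-friendly is incremental updating: a move changes only a few input features, so the first-layer accumulator is patched rather than recomputed. A toy sketch (dimensions and feature ids are made up, far smaller than the real thing):

    import numpy as np

    NUM_FEATURES, HIDDEN = 768, 256  # e.g. piece-type x square features (toy sizes)
    W = (np.random.randn(NUM_FEATURES, HIDDEN) * 0.01).astype(np.float32)

    def full_refresh(active):
        # Recompute the first-layer accumulator from scratch: O(pieces x HIDDEN).
        return W[active].sum(axis=0)

    def incremental_update(acc, removed, added):
        # A move flips a handful of features, so patch the accumulator instead:
        # O(changed x HIDDEN) -- cheap enough to call inside CPU alpha-beta search.
        return acc - W[removed].sum(axis=0) + W[added].sum(axis=0)

    acc = full_refresh([10, 75, 200])                        # initial position
    acc = incremental_update(acc, removed=[75], added=[90])  # one piece moved
    print(np.allclose(acc, full_refresh([10, 90, 200])))     # True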


They have used neural networks since Stockfish 12, but not in the same way that AlphaZero did:

https://www.chessprogramming.org/Stockfish_NNUE


Stockfish now uses a neural network in some situations. Last I checked, it uses the classical, fast, tactical search and evaluation for active, tactical positions, and uses the network for quieter, more strategic positions.


Interesting approach to private variables https://github.com/deepmind/mctx/blob/577fc77a3cda1b796e277e...


Is there a better way to do it in Python?
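
For reference, the two standard idioms (nothing here is from the MCTX source; these are just the usual conventions): a single leading underscore plus a read-only property, or double-underscore name mangling. Neither is enforced by the language.

    class Counter:
        def __init__(self, value):
            self._count = 0       # "private" by convention only
            self.__value = value  # name-mangled to _Counter__value

        @property
        def count(self):
            # Read-only public view over the underscored attribute.
            return self._count

        def bump(self):
            self._count += 1

    c = Counter(0.5)
    c.bump()
    print(c.count)            # 1
    print(c._Counter__value)  # 0.5 -- mangling deters, but does not prevent, access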


I can't help but wonder, when this stuff happens, whether the devs (or the leads) are worried about being let go, so they open-source their stuff to be able to take it elsewhere.

Anyone here from DeepMind?


These particular programs were bleeding edge when they were announced, but have been recreated and improved on outside of DeepMind since then. For example, LCZero today is stronger at chess than any of the DeepMind chess programs that were shown to the public. Of course, we don't know if they got further behind the scenes, and of course LCZero and everything else relied on DeepMind's published papers. That is, the papers were more important than the code.


There's an interesting bias I've noticed on HN where a lot of people still believe AlphaZero is the state of the art in computer chess, when that hasn't been true except for a short while after the release. AZ still gets posted and upvoted today while newer improvements in other engines are discussed far less.

I guess it's because AZ came out of Google?


AZ revolutionized chess engine architecture and then moved on to other fields. Fast followers who incrementally improve upon the breakthroughs are very rarely recognized.

E.g. the inventor of the blue LED versus those who improve its efficiency by 0.1%.


It changed engine analysis of games because it doesn’t generate a series of “lines” that each has a centipawn value. It just gives you the move.

Subjectively, the Monte Carlo moves seem so human-like in comparison to minimax. Minimax can suggest a move that no human would play because the depth of calculation at which that move is good is just impossible for people.


Stockfish's architecture is in no way revolutionised by AZ. It has gone its own way.


The current versions of Stockfish (designated NNUE) are definitely AZ-influenced. In fact, NNUE was implemented in collaboration with the LCZero devs.

https://stockfishchess.org/blog/2020/introducing-nnue-evalua...


The link you posted confirms what I already pointed out in another subthread. NNUE was invented by Yu Nasu for the game of Shogi. It bears very little resemblance to the AZ neural net: it's much shallower, and inspired by work from the computer Shogi community that pre-dates AZ. The way it trains is different, and inference is different. The only commonality is that it's a neural net. But neural nets have been used for chess before in various capacities; DeepMind didn't invent this. All they did was throw a bunch of compute at it, do a big PR stunt, then walk away.

I'm not sure where you get the LCZero connection from.


What is interesting is that the current state of the art comes from another place: https://en.m.wikipedia.org/wiki/Efficiently_updatable_neural...

Lc0 is based on AlphaZero's ideas and is significantly weaker than NNUE-based modern Stockfish.


Well, AZ could perform above Stockfish using much simpler principles. A lot of tinkering was required to get Stockfish to where it was, and AZ pretty much rediscovered and improved on some of these refined, fine-tuned principles, such as board evaluation, opening books, and endgames, from its own training. I think that's very impressive.


> when that hasn't been true except for a short while after the release

Even that was debatable given the restrictions placed on the version of Stockfish it played.

Not to take anything away from AlphaZero either; self-play to reach that level was quite the achievement.


Agreed; most "newly released", "open-sourced" products have typically been in the pipeline for at least half a decade.

That said many have been retooled for a more specific purpose before public release.


> I can't help but wonder, when this stuff happens, whether the devs (or the leads) are worried about being let go, so they open-source their stuff to be able to take it elsewhere.

Open-sourcing isn't a dev-level decision; it is a business leadership decision.


The people who left probably said that they would want to work on stuff usable by the public; this seems like an answer to that.


Note that JAX was created after the success of Alpha(Go|Zero), so the claim that this powered those papers is not accurate. It is a DM MCTS library though.


...and announce that they are changing name to OpenMind?


Must feel pretty bad being a rank-and-file dev at DeepMind and realizing that your leads had all the talent and funding in the world that they could have used for building language model products, and instead they heavily invested in solving board games.

Then you realize they get paid >10x what you do and they're fine, but the next layoff is likely gonna get you.


The human-in-the-loop reinforcement learning work that powered ChatGPT's training arose from DeepMind's experiments with board games (and games generally). AGI is still an unsolved problem, and the deep RL that grew out of the success of those experiments will likely play a huge part in it.


Like Bell Labs of old, Google has excellent AI researchers but they haven't built many (any?) AI products people can use.


Search? Gmail completing your emails? They have lots of AI products; they just wait until they are more solid before releasing stuff. Now they will rush out a competitor to ChatGPT before it is ready, but given time they would have used it for something.


Don't forget Translate!


Board games have been used in AI since the beginning. They provide a good environment because we know the rules and can control them, and since everybody uses them, it is easier to compare different algorithms. Many advances in AI were made in board games. Most probably ChatGPT uses a lot of those things you think are relevant only to board games (reinforcement learning for fine-tuning the responses with human feedback uses the same algorithms as in board games).


If chess was interesting enough for Ken Thompson to work on, that's good enough for me.


Google researchers invented Transformers, and DeepMind pioneered reinforcement learning.


DeepMind did not pioneer reinforcement learning. In fact, it wasn't even the first place to use neural networks for feature representation in RL; that was achieved with Backgammon in the 90s.


Games offer clear parameters and have lots of depth to them.

Humans are also trained with games when young, and even into adulthood; think of the war games held semi-annually between countries.

These are just board games with increased stakes and additional variables.


"board games"

i.e., Chess and Go. Go is a couple of thousand years old, and Chess in particular is a core element of AI research history.

Language model research is cool, but you should perhaps consider expanding your horizons beyond the latest headlines in AI.


They also do work with the UK National Health Service.

I have heard (high-level sources, unconfirmed publicly) that the YouTube algorithm that promotes open-mouth creators with $$$$-sign thumbnails is based on their groundbreaking research from this collaboration.



They have their own LLMs such as PaLM


Why don't they make public apps (at least)?



