Leela Zero – Go engine with no human-provided knowledge (github.com/gcp)
142 points by pmontra on Nov 16, 2017 | 47 comments



I applaud the work that went into this!

But at the same time I feel sad about the state of science, where working code is still not considered part of the paper, so the code is almost never published, let alone published as Free Software for other scientists (and hobbyists) to tinker with, fix, improve, and adapt.

For example, why isn't it possible for everyone interested to let Leela Zero play against AlphaGo Zero, on their own hardware (or their university's hardware)?


> so the code is almost never published

Even worse, a friend of mine who is a researcher in neuroscience/BCI has found that many of the papers couldn't be reproduced at all, or rested on flawed or incorrect conclusions. Since the authors typically don't publish their code, it's very difficult for others to check the correctness.


Moreover, in this context "code" is not just Python/C++/Java, but also spreadsheets!

Publishing spreadsheets is perhaps an even more urgent issue than publishing other types of code.

And of course the underlying (anonymized) data in the case of statistical work, but that's another topic. (Open Access, Open Data and Free Software are very closely related, though.)


And even when it is made available, it is often of very poor quality. I heard about someone who studied the programs used for analyzing DNA sequences. While tinkering with one, he found a bug, fixed it and submitted a pull request. Two years later, the pull request still had not been merged by the maintainer of this 'standard' package. He also said the software was of poor quality. For example, in theory four base pairs could be stored in one byte; instead, the letters A, C, T, or G are simply stored as one byte each.
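
For anyone curious, the packing in question is simple: each base needs only 2 bits, so four fit in a byte. A minimal Python sketch (the table and helper names are made up for illustration, not taken from any real bioinformatics package):

    # Hypothetical sketch: pack each base (A, C, G, T) into 2 bits, four per
    # byte, instead of storing one ASCII letter per byte.
    _ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
    _DECODE = {v: k for k, v in _ENCODE.items()}

    def pack(seq: str) -> bytes:
        """Pack a base sequence, assumed here to have a length divisible by 4."""
        out = bytearray()
        for i in range(0, len(seq), 4):
            b = 0
            for base in seq[i:i + 4]:
                b = (b << 2) | _ENCODE[base]
            out.append(b)
        return bytes(out)

    def unpack(data: bytes, n_bases: int) -> str:
        """Recover the sequence; note the masks and shifts needed on every access."""
        bases = []
        for byte in data:
            for shift in (6, 4, 2, 0):
                bases.append(_DECODE[(byte >> shift) & 0b11])
        return "".join(bases[:n_bases])

    assert unpack(pack("ACGTTGCA"), 8) == "ACGTTGCA"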


Yeah, that's what my friend was saying too. He mentioned one often-cited paper whose code, once he fixed a bug he had found in it, no longer supported the paper's conclusion at all. He also complained about the poor quality of a lot of the code. :-(


I haven't tested this myself, but from what I have heard it is not necessarily a good idea to store several different things in a single byte like this. While it may consume less memory, there is a performance cost on each access, since you may need to do masks and shifts.

Of course, there is a non-trivial trade-off: more cycles per access, but more objects fit in cache, etc.

Some quick googling turned up this source, for example, which seems to hint that in some situations it is at least not worth it: https://stackoverflow.com/questions/4240974/when-is-it-worth...

Lastly, it seems quite harsh to call the code poor quality because of something like that. Unless it really is very important to save the extra memory, the maintenance cost of packing things efficiently might alone make it not worth it. Even production code bases such as the JVM will happily use a single byte (or even more?) to store a boolean, and such things probably occur all over the place.


unless the author uses memory in the most efficient way possible (preferably writing LLVM-IR directly) it will be hard for reviewers to run DNA analysis algorithms on their smart fridges /s


Really, the academic world is sometimes obsessed with pseudocode.

It might make some didactic sense, but it makes less and less sense as time goes by, when you can ship the actual code and show actual results.


Even worse is when you realize that's a growing concern in deep learning.


Agreed, but the main goal of the average researcher is getting the paper published. The code is a byproduct, probably not written too well (they are professionals of another profession), and maybe they don't know how to properly distribute it. Furthermore, their publishing channel is made for papers/PDFs, not source code.

In this case the author is DeepMind/Google, so they know how to code and how to distribute it.

However, we need not only the code but also the network. Leela Zero is code, but more importantly it's an effort to spread the work of training the network among many people. Each participant volunteers some hardware (CPU/GPU) and contributes a part of the training runs. The README states that the estimated training duration on commodity hardware would be 1,700 years. We're not that patient :-)


Could you design a cryptocurrency that used neural network training as part of a proof-of-work function? Then you'd actually get something useful at the end instead of brute-force hash collisions.


To "port" current PoW methods you'd need to find a task that is expensive to compute but trivial to verify; the hard part is finding a task that fulfils this condition and is also useful to actually compute.

In any case, you'd still be running a competition between nodes for the fastest/best solution, with lots of duplicated effort and consumed resources. That is the key part of crypto consensus AFAIU, and regardless of the "backend task", there is always going to be waste in this architecture.


The expensive task in machine learning is the training; it can be verified by checking for a performance improvement on a validation set. You can make use of the duplicated effort if everyone trains on different examples: their learned models can then be merged/ensembled together for better performance.

That's the ideal situation, when everyone plays by the rules. To make that scheme work as a "crypto"currency, you'd also have to make sure that breaking the rules doesn't bring a benefit. E.g. the validation examples can't be known beforehand, otherwise someone could just overfit to them to get a higher score with less work. But I think it might be possible to make it work.
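
Roughly what that verification step could look like, as a toy Python sketch (all the names here are made up; a real scheme would also need the hidden validation examples described above):

    # Toy sketch of the "cheap to verify" idea: a node re-scores the submitted
    # model on a held-out validation set and accepts the work only if it beats
    # the previously accepted score. Verification is a single evaluation pass,
    # far cheaper than the training that produced the candidate.
    from typing import Callable, List, Tuple

    Example = Tuple[list, int]        # (features, label)
    Model = Callable[[list], int]     # maps features to a predicted label

    def accuracy(model: Model, validation: List[Example]) -> float:
        correct = sum(1 for x, y in validation if model(x) == y)
        return correct / len(validation)

    def verify_submission(candidate: Model,
                          validation: List[Example],
                          best_score_so_far: float) -> bool:
        """Accept the submitted 'work' only if it measurably improves on the
        current best model."""
        return accuracy(candidate, validation) > best_score_so_far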


> it can be verified by checking for performance improvement on a validation set.

Not all gradient updates improve performance


Not all attempts to find a Bitcoin hash of the right difficulty succeed either. Miners would have to continue training until they are fairly certain that it has resulted in an improvement.


Which should be possible. For instance, your PoW could be "design a Go-playing neural network that's better than the one the system currently challenges you with".

Hard to make, easy to verify.


> easy to verify

This may be a lot harder than you think.

How do you decide that one AI plays better than another? By playing lots of games. These games must be random, i.e. not known beforehand, otherwise the new AI could cheat.

But then every cryptocurrency node would use a different set of game settings for validation, which means that some (unlucky) nodes may reject the new AI while others accept it. The result would be regular net splits, making it very hard to establish and keep the global consensus that is absolutely needed for a stable cryptocurrency system.

The core issue here is that validation must be 100% deterministic. Each node must run exactly the same calculations for validation, otherwise the consensus network is doomed.


Ship N seeds for the PRNG along with the program to be verified. Each seed provides one game. N is large enough to effectively remove "luck" from the skill comparison, and large enough that finding a set of seeds that lets a cheating, weaker player win is harder than just producing a better Go player.

I'm assuming all games are played against the incumbent; the seeds are basically proof of wins. I guess you could still cheat, though, by copying the incumbent, getting a 50-50 ratio of games, and then just finding enough winning seeds to make it 49-51.
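
A rough sketch of how every node could reach the same verdict, assuming the engines and the game playout are fully deterministic (everything below is hypothetical, including the toy play_game placeholder):

    # Every node derives the same N seeds (say, from the previous block hash),
    # replays the same games between challenger and incumbent, and therefore
    # reaches exactly the same accept/reject decision.
    import hashlib

    def game_seeds(block_hash: str, n_games: int) -> list:
        """Derive N reproducible seeds that no miner could have known in advance."""
        return [int(hashlib.sha256(f"{block_hash}:{i}".encode()).hexdigest(), 16)
                for i in range(n_games)]

    def play_game(challenger, incumbent, seed: int) -> str:
        """Placeholder: a real implementation would play a full, deterministic
        Go game; here the 'engines' are just deterministic functions of the seed."""
        return "challenger" if challenger(seed) > incumbent(seed) else "incumbent"

    def challenger_wins(challenger, incumbent, block_hash: str,
                        n_games: int = 400, threshold: float = 0.55) -> bool:
        """Same inputs, same verdict, on every honest node."""
        seeds = game_seeds(block_hash, n_games)
        wins = sum(1 for s in seeds
                   if play_game(challenger, incumbent, seed=s) == "challenger")
        return wins / n_games >= threshold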


I think this is a very interesting idea. Except I would probably use a random forest instead of an ANN for something like this. It is trivially parallelizable, you can train on subsets of features, you can split and merge your ensembles/forests, and so on.
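
For illustration, here is roughly what the split/merge property looks like with scikit-learn. This pokes at sklearn internals (estimators_ and n_estimators) and assumes both halves of the data contain the same classes, so treat it as a sketch rather than a recipe:

    # Two forests trained on different subsets of the data, then "merged" by
    # concatenating their trees; the combined forest votes with all 100 trees.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    half = len(X) // 2

    rf_a = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[:half], y[:half])
    rf_b = RandomForestClassifier(n_estimators=50, random_state=1).fit(X[half:], y[half:])

    rf_a.estimators_ += rf_b.estimators_
    rf_a.n_estimators = len(rf_a.estimators_)
    print(rf_a.score(X, y))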


It's getting better these days with projects like http://www.gitxiv.com/


This is super cool! As soon as I read the AlphaGo Zero paper I wanted to do a reimplementation as well, but apply the strategy to train a chess engine instead. Performance turned out to be way more important than I anticipated (I haven't built many big projects before) and I've been less inclined to work on it recently since I realized the necessary computing resources to train it properly would cost hundreds if not thousands of dollars in Google Cloud GPU time. I also wrote it in Python because I've done all of my TensorFlow work in Python so far.

https://github.com/emdoyle/chess_ai

That's my repo, if anyone wants to help me make it faster or possibly fix parts of it (hard to be sure the MCTS and NN are faithfully replicated) I would welcome pull requests!

Also, I gitignored my TensorFlow Serving stuff, but it's basically required to be able to serve the models and communicate via RPC to get predictions (the RPC client is visible in the repo).
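
For anyone wondering what "faithfully replicating the MCTS" mostly comes down to: the core is the PUCT selection rule from the AlphaGo Zero paper. A minimal Python sketch (not code from the repo above, just the rule itself with a made-up Node structure):

    # Pick the child maximizing Q + U, where U weights the policy network's
    # prior by how unexplored the move still is.
    import math
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        prior: float                 # P(s, a) from the policy network
        visit_count: int = 0         # N(s, a)
        value_sum: float = 0.0       # sum of values backed up through this node
        children: dict = field(default_factory=dict)

        @property
        def q(self) -> float:        # Q(s, a): mean value of this subtree
            return self.value_sum / self.visit_count if self.visit_count else 0.0

    def select_child(node: Node, c_puct: float = 1.5):
        """Return the (move, child) pair with the highest Q + U score."""
        total_visits = sum(c.visit_count for c in node.children.values())

        def score(child: Node) -> float:
            u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
            return child.q + u

        return max(node.children.items(), key=lambda kv: score(kv[1]))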


I used to have a screensaver that played out recorded games of go when you were away from your computer. It'd be fun to have something similar for the training of this.

At the moment, it seems like it's outputting moves, but it'd be lovely to have a nice graphical representation.


was it this one? http://draves.org/goban/


>>> If you are wondering what the catch is: you still need the network weights.

That's why I'm very worried about neural net stuff: you can share all the code you want, but the secret sauce (the weights) is easy to hide. The GPL doesn't quite apply here.

It's probably true of hundreds of other applications, but somehow I have the feeling it's different with AI, because the data "embodies" the behavior much more.


This is why lots of large companies are happy to open-source internal code, no? Because ultimately it's data which is valuable, not the algorithms.


If you compile the data inside the program it's probably GPLed too and must be made public as soon as the program is distributed. Workarounds: the data go inside an external library or are downloaded from a service. Then you're right to worry: the real algorithm would be secret.


> The GPL doesn't quite apply here.

You can license your trained network data under the GPL if you choose. It applies fine.

You are right that the valuable IP is in the data, but this is no different from Doom without the WADs, or MAME without ROMs.


"Recomputing the AlphaGo Zero weights will take about 1700 years on commodity hardware"

Oh well...

As much as the hardware advances are important, maybe there will be better ways to train networks with limited hardware

(training even a small CNN with limited resolution inputs on a CPU is slow)


Keep reading. It already has the capability for you to contribute using your local machine, folding@home style.


Didn't the Zero version from Google train on something like 10 TPUs? Or was that only for inference?

It seems like a CPU or even a GPU isn't really suited for this kind of task.


"Train" as in do the gradient computation: 64 GPU.

Generate the data to do that in the first place: 2000 TPU, each about 45 TFLOPS (or 2-4 x a GTX 1080 Ti).
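
As a back-of-envelope check, those numbers are roughly consistent with the 1,700-year figure quoted elsewhere in this thread (assuming that figure means a single commodity GPU-class machine, which is my reading of the README):

    # Purely illustrative arithmetic; every number here is a rough guess.
    commodity_gpu_years = 1700
    tpus = 2000
    gpu_equivalents_per_tpu = 3   # midpoint of the "2-4x a GTX 1080 Ti" estimate

    years = commodity_gpu_years / (tpus * gpu_equivalents_per_tpu)
    print(f"~{years:.2f} years, or about {years * 12:.1f} months")  # ~0.28 years, ~3.4 months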


AlphaGo Zero trained on 2000 TPUs


I was a bit confused by the title. They really shouldn't name a game after a popular programming language.


I wonder why we’ve settled on the Japanese name for the game rather than the Chinese (weiqi) or Korean (baduk) names. Using one of those would make it a lot less confusing.


The game came to Europe and America from Japan, and for decades the only translated foreign literature on it was Japanese.

Whenever I do a google search I use "baduk" instead. I think most people writing about go are smart enough to throw that word in there somewhere.


I do the same sort of thing when searching whenever I run into trouble. Even before Google introduced their language, it was a nearly unsearchable name.


the game is a bit older


Please tell me you did that on purpose.


We will never know. Hacker News filters out emojis.


A total of 22593 games have been submitted. 416 clients have submitted games. You can see live stats here: http://zero.sjeng.org/


> Recomputing the AlphaGo Zero weights will take about 1700 years on commodity hardware

We're seeing the reversal of the PC revolution, and most people don't seem to notice or care.


On the contrary.

Google achieved such high computational power by building custom chips. No one in the 90s could have afforded the custom ASICs that ran "Deep Blue", and certainly no one except Disney in the 90s could afford the giant cluster of computers used to render "Toy Story".

The story of the "PC revolution" is always: Big Company spends billions on custom ASICs -> hardware companies take notice -> NVIDIA releases "Volta", which emulates the TPU, in 2018. (https://devblogs.nvidia.com/parallelforall/inside-volta/)

Soooo yeah. We're actually really, really close to consumer-level TPUs. Within a year or two. Google's secret sauce is being replicated by hardware companies now, and consumers will likely reap the rewards.

There's always a few years of delay between state-of-the-art custom machines and wide-spread adoption of consumer hardware.

-----------

Despite their success in these games, I'm personally not really sold on the broad applicability of Neural Networks. But hey, I'm willing to be proven wrong. There's a lot of experimentation being done and the "TPU" architecture is definitely very interesting (and the way "Tensors" map to simple linear-algebra problems is really what makes all of this possible).

I really wouldn't be too gloomy about Google's head start. They did invent the TPU architecture, after all; it's only fair that they get to play with it for a few years before mass-market consumers can.


I could swear Toy Story was a Pixar production long before it was Disney...


What's Google's secret sauce with these datasets anyway? How are they planning to monetize being the best go player in the world? I don't understand the incentive for not releasing it.


Looks very interesting! I'm an OpenCL beginner so I really like to see how other people are actually doing OpenCL development.

Some notes:

* OpenCL code is stored in strings: https://github.com/gcp/leela-zero/blob/master/src/OpenCL.cpp

* OpenCL Build Flags are: m_program.build("-cl-mad-enable -cl-fast-relaxed-math -cl-no-signed-zeros -cl-denorms-are-zero");

Which is telling: I've been struggling with -O2 on my AMD R9 290x, but it seems common knowledge that the AMD OpenCL optimizer is basically unusable. No -O2 to be found here. I guess that's how people have been dealing with that issue...
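
Leela Zero's version of this is C++, but the same pattern (kernel source kept in a string, built with fast-math style flags and without -O2) is easy to try from pyopencl if you just want to experiment with the flags; the trivial kernel below is only a placeholder:

    # Minimal pyopencl sketch: compile a kernel from a source string with the
    # same build options Leela Zero passes, then run it on a small buffer.
    import numpy as np
    import pyopencl as cl

    KERNEL_SRC = """
    __kernel void scale(__global float *buf, const float factor) {
        int i = get_global_id(0);
        buf[i] *= factor;
    }
    """

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    program = cl.Program(ctx, KERNEL_SRC).build(
        options="-cl-mad-enable -cl-fast-relaxed-math "
                "-cl-no-signed-zeros -cl-denorms-are-zero")

    data = np.arange(8, dtype=np.float32)
    buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                    hostbuf=data)
    program.scale(queue, data.shape, None, buf, np.float32(2.0))
    cl.enqueue_copy(queue, data, buf)
    print(data)   # doubled values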


The AlphaGo paper contains a reasonable amount of detail, but not nearly enough to allow a true reimplementation. This should be described as "an attempt to reproduce" or some such. It should be expected to differ in details, and quite possibly some of those details might turn out to be important!


Does "fairly faithful reimplementation" not reflect this nuance? I'm not sure what you think was missing from the Alpha Go Zero paper in terms of methods, BTW. The whole thing was fairly simple [1] and all the knowledge is discovered. I think that was DeepMind's point.

Of course if Alpha Go Zero had some proprietary secret sauces that they omitted from the paper and that are needed for things to converge or whatever, you are right.

[1] I mean if you look at the code, most of it is a fast Go engine, and code to compute a residual network stack on the GPU. These aren't things that the DeepMind paper has to spell out.
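
To give a sense of how little there is to spell out: this is roughly the residual block the AlphaGo Zero paper describes, sketched here in Keras terms (it is not Leela Zero's actual OpenCL code, and the tower depth below is just a placeholder; the real tower is much deeper):

    # Conv -> BN -> ReLU -> Conv -> BN -> skip connection -> ReLU, stacked.
    import tensorflow as tf
    from tensorflow.keras import layers

    def residual_block(x, filters=256):
        skip = x
        y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
        y = layers.BatchNormalization()(y)
        y = layers.ReLU()(y)
        y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
        y = layers.BatchNormalization()(y)
        y = layers.Add()([y, skip])
        return layers.ReLU()(y)

    # A 19x19 board with 17 input planes, as in the paper, then a short tower:
    inputs = layers.Input(shape=(19, 19, 17))
    x = layers.Conv2D(256, 3, padding="same", use_bias=False)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    for _ in range(4):
        x = residual_block(x)
    model = tf.keras.Model(inputs, x)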



