KataGo, Leela Zero, and all the other AIs certainly didn't cost that much (the people running them wouldn't have had that kind of money or resources), and they are probably stronger than AlphaGo Zero. I don't think it's at all fair to say this number is a good ballpark estimate of how much it would cost to replicate this experiment. It's wrong as a calculation of Google's costs, it's wrong per the title ("How much did AlphaGo Zero cost?"), and it's also wrong as an estimate of the cost of replication.
> KataGo and Leela Zero and all the other AIs certainly didn't cost that much
And you base that claim on what, exactly? Leela Zero is trained by the community, which donates self-play resources. Just because you outsource your costs to volunteers doesn't mean it's free!
To arrive at a realistic estimate, you'd need the average cost of electricity, the hardware cost (prorated by actual use), and of course the opportunity costs.
Since you can't do that, I'd argue you have no idea what the true training cost of these projects really is compared to on-demand/cloud pricing.
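To illustrate what such an estimate would even have to include, here is a minimal sketch; every value in it is a made-up placeholder (which is exactly the point), not a figure from Leela Zero or KataGo:

```python
# Sketch of the cost components a realistic estimate of donated compute would
# need. All numbers below are placeholders, not data from any of these projects.
hours_donated = 300_000       # total volunteer GPU-hours (placeholder)
watts_per_gpu = 250           # average power draw per card (placeholder)
price_per_kwh = 0.20          # $/kWh; varies wildly by country (placeholder)
electricity = hours_donated * watts_per_gpu / 1000 * price_per_kwh

gpu_price = 800               # $ per volunteer GPU (placeholder)
gpu_lifetime_hours = 20_000   # useful life of a card (placeholder)
hardware = gpu_price / gpu_lifetime_hours * hours_donated  # prorated by use

opportunity = 0.0             # value of the donated time elsewhere (unknown)

print(f"electricity ~ ${electricity:,.0f}, hardware ~ ${hardware:,.0f}, "
      f"total ~ ${electricity + hardware + opportunity:,.0f}")
```

The point is just the structure: electricity plus prorated hardware plus opportunity cost, each of which varies from volunteer to volunteer and is never recorded anywhere.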
Leela Zero did use some commercial providers for training. Mostly, we used the free offers for new GCP/AWS/etc. members, so of course that only accounts for a fraction of the total.
To give an order of magnitude: there are about 20M training games, and IIRC a V100 could complete one game in roughly a minute, so that's 300k-ish GPU-hours of value. Obviously, while V100s were the fastest, other GPUs were more cost-efficient.
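As a rough sanity check of that figure, here is the same back-of-envelope in code; the game count and per-game time come from the comment above, while the on-demand V100 price is an assumed placeholder:

```python
# Back-of-envelope value of Leela Zero's self-play, using the numbers above.
games = 20_000_000               # ~20M self-play training games
minutes_per_game = 1.0           # rough V100 throughput (IIRC, see above)
gpu_hours = games * minutes_per_game / 60
print(f"GPU-hours: {gpu_hours:,.0f}")               # ~333,000, i.e. "300k-ish"

assumed_v100_rate = 3.0          # $/hour on-demand, placeholder price
print(f"on-demand value: ${gpu_hours * assumed_v100_rate:,.0f}")   # ~$1M
```

With those placeholder numbers it comes out to around $1M of on-demand value, though cheaper GPUs, spot pricing, and donated hardware would all shift that figure.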
Go is a small community, and there's just no chance it managed to donate $36M worth of anything; whatever the electricity and hardware costs, that's simply too much money. KataGo's page says it was developed using resources donated by Jane Street, the company its developer worked for, and the magnitude is way off there too: sure, they're a quantitative trading firm, but it's implausible that they'd donate $36M to develop a Go AI.
The cost estimate was based on what you and I would have to pay if we were to train AlphaGo Zero in 40 days on the given hardware, using the reported number of games and resources.
Just replace the self-play TPU resources with commodity hardware, or even just cheaper GPU compute providers, and you'd reduce the cost 10-fold simply by not using TPUs. The same goes for the number of self-play games.
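To make the 10-fold figure concrete, here is a minimal comparison; both hourly rates are illustrative placeholders rather than quoted cloud prices, and it assumes the same accelerator-hours on either platform:

```python
# Illustrative: the same self-play workload priced on cloud TPUs vs cheaper
# commodity/spot GPUs. All rates are placeholders, and per-device throughput
# differences are ignored for simplicity.
accelerator_hours = 300_000       # hypothetical total self-play hours

assumed_tpu_rate = 6.00           # $/hour, placeholder cloud TPU price
assumed_gpu_rate = 0.60           # $/hour, placeholder commodity/spot GPU price

tpu_cost = accelerator_hours * assumed_tpu_rate
gpu_cost = accelerator_hours * assumed_gpu_rate
print(f"TPU-priced:       ${tpu_cost:,.0f}")
print(f"commodity-priced: ${gpu_cost:,.0f}")
print(f"ratio: {tpu_cost / gpu_cost:.0f}x")        # 10x with these placeholders
```

The exact ratio obviously depends entirely on which prices you plug in, which is the whole disagreement here.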
That still doesn't change the estimate itself. If the other projects had used Google TPUs, they could well have cost roughly as much as the estimate.
I really don't understand what you're trying to argue against here.
Replicating the experiment would normally mean running the same code for the same length of time, which is what the article measures. If you use a different approach, like KataGo's, it's not a replication anymore.
How is Leela Zero stronger if it does the same calculations with less compute time?