
Disclosure: I work on Google Cloud.

I want to highlight this paragraph from the post:

> Here at Google Cloud, we want to provide customers with the best cloud for every ML workload and will offer a variety of high-performance CPUs (including Intel Skylake) and GPUs (including NVIDIA’s Tesla V100) alongside Cloud TPUs.

We fundamentally want Google Cloud to be the best place to do computing. That includes AI/ML and so you’ll see us both invest in our own hardware, as well as provide the latest CPUs, GPUs, and so on. Don’t take this announcement as “Google is going to start excluding GPUs”, but rather that we’re adding an option that we’ve found internally to be an excellent balance of time-to-trained-model and cost. We’re still happily buying GPUs to offer to our Cloud customers, and as I said elsewhere the V100 is a great chip. All of this competition in hardware is great for folks who want to see ML progress in the years to come.




Any plans to support AMD GPUs and the Radeon Open Compute project? The AI/ML community really needs viable alternatives to NVIDIA, otherwise they will continue to flex pricing power. Google, via TensorFlow, is in a phenomenal position to promote open source alternatives to the proprietary Deep Learning software ecosystem that we see today with CUDA/CuDNN.


Google would happily accept patches to enable support for it.

AMD hopefully has a team writing such patches now. It makes business sense for them to do so.

Google is being price-gouged by Nvidia even more than the general public, and has even more incentive to level the playing field.


Or the opposite - they're getting nice savings in return for not actively developing or encouraging CUDA/cuDNN alternatives.


Did you guys ever reveal the internal math model of TPU 2?

We know the V100 does FP16/FP32 on its tensor cores; when will you follow suit?

Edit: sort of, from https://www.theregister.co.uk/2017/12/14/google_tpu2_specs_i...

"32-bit floating-point precision math units for scalars and vectors, and 32-bit floating-point-precision matrix multiplication units with reduced precision for multipliers."

So what does "reduced" mean exactly?


We still don’t document it exactly, but [1] shows that bfloat16 is supported on lots of ops.

[1] https://cloud.google.com/tpu/docs/tensorflow-ops
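
For anyone unfamiliar with the format: bfloat16 is just the top 16 bits of an IEEE float32 (1 sign bit, 8 exponent bits, 7 mantissa bits), so it keeps float32's dynamic range while giving up precision. A minimal numpy sketch of the idea (plain truncation here; real hardware typically rounds to nearest even):

    import numpy as np

    def to_bfloat16(x):
        # Keep only the top 16 bits of each float32 (sign, 8 exponent,
        # 7 mantissa bits), then view the result as float32 again so the
        # precision loss is easy to inspect.
        bits = np.asarray(x, dtype=np.float32).view(np.uint32)
        truncated = bits & np.uint32(0xFFFF0000)
        return truncated.view(np.float32)

    print(to_bfloat16(np.float32(3.14159265)))  # 3.140625 -- only the top 7 mantissa bits survive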



That doesn’t prove that the chip operates at 16 bits. For example, we could do 18-bit multipliers (or anything >= 16) and still use 16-bit floats.


ATI demonstrated FP24 was frickin' awesome over a decade and a half ago. It wouldn't surprise me in the least if you went somewhere like that, but it perplexes me as to why you think that's secret sauce in any way, long after ATI nearly destroyed NVIDIA with FP24 back in the early days of DirectX 9 and NV3x.


This isn’t exactly correct. ATI pulled a “fast one” and went with 24-bit even though the initial DX9 spec called for 16/32-bit floats, which NVIDIA followed.

Once DX9 was split into DX9b and DX9c, that “advantage” went away and NVIDIA proved that 16/32-bit was better, something ATI also had to adopt once MSFT told them enough was enough.

24-bit is only better as long as it can do everything 32-bit can do and it’s advantageous to build hardware with 24-bit FPUs instead of 32-bit FPUs that can also do 2x 16-bit ops per cycle.

Basically, only if the silicon cost allows you to fit far more 24-bit FPUs than 32/16-bit ones.

And history proved that this isn’t the case.

For gaming, eventually even the 2:1 FPUs went away, since they are costlier than 32-bit-only FPUs with promotion.

Maybe in the future we’ll have a 24-bit FPU that can also do three 8-bit ops or a 16-bit + 8-bit op per cycle, if that turns out to be more beneficial than the current 2:1 16/32-bit model.


I personally would stick to FP32 across the board for my ML efforts, but we have an entire cottage industry of people coming up with approximations to drive up perf and perf/W, all of which will prove irrelevant until Moore's Law runs out IMO. And even then, I'll still stick to FP32 personally. Speaking from direct experience, bulletproof mixed precision is tough.
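
For what it's worth, one concrete reason mixed precision is hard to make bulletproof: small gradients silently underflow to zero in fp16, which is why frameworks pair it with tricks like loss scaling. A toy numpy illustration (the magnitudes and the scale factor are made up for the example; this is nobody's actual API):

    import numpy as np

    grad = np.float32(1e-8)              # a small but meaningful gradient in fp32
    print(np.float16(grad))              # 0.0 -- silently lost if stored in fp16

    # Loss scaling: scale up before the fp16 step, divide back out in fp32.
    scale = np.float32(65536.0)
    scaled = np.float16(grad * scale)    # ~6.55e-4, comfortably representable in fp16
    print(np.float32(scaled) / scale)    # ~1e-8, recovered for the fp32 weight update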


I don't think it is secret sauce. If you're gonna let customers send operations to these TPUs, one could figure out what kind of multiplier is used almost immediately upon inspection of a few inputs and outputs.
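
Roughly: assuming you can run an arbitrary multiply on the device and read the result back, you can probe for the smallest 1 + 2^-k that survives a multiply by 1. A rough sketch, using numpy's own float types as stand-ins for the black-box op (and, as noted upthread, this only reveals the precision visible at the outputs, not the internal multiplier width):

    import numpy as np

    def effective_mantissa_bits(multiply, max_bits=60):
        # Largest k for which multiply(1 + 2**-k, 1) still differs from 1 --
        # a rough proxy for the precision the op rounds its results to.
        for k in range(1, max_bits):
            if multiply(1.0 + 2.0 ** -k, 1.0) == 1.0:
                return k - 1
        return max_bits

    print(effective_mantissa_bits(lambda a, b: np.float32(a) * np.float32(b)))  # 23
    print(effective_mantissa_bits(lambda a, b: np.float16(a) * np.float16(b)))  # 10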


>We fundamentally want Google Cloud to be the best place to do computing.

Lower. Network. Egress. Pricing. By. Two. Orders. Of. Magnitude.

Market rate is close to $1 per TB outbound. Your rate is $80-$120 per TB. That's just embarrassing.


> high-performance CPUs (including Intel Skylake)

Any plans for ryzen?


We’re always exploring the best hardware for the dollar. We’re a founding member of OpenPOWER and to your question about AMD parts, we’ve previously (publicly) run Opterons when they were the best choice. At this time, we don’t have any announcements to make :).

But I’d like to note that even if we were to use parts internally at Google (or not!), for Cloud what matters is market demand. If there really were enormous customer demand for, say, ARM64, then we would look into it, even if the rest of Google wasn’t interested.



