> Just for reference, Nvidia's flagship GPU (3090)'s FP32 performance is 35.5 TFLOPS.
In the context of ML, Nvidia's flagship is the A100, which has 312 TFLOPS. You can also compare with a TPU device, which has 180 TFLOPS (v2) or 420 TFLOPS (v3). You can reliably use at least the TPU v2 on Colab for free.
It's not really fair to compare a discrete GPU to a mobile GPU; I only provided this as a comparison for someone who might have one of these at home. And btw, you are talking about TF32 performance, not FP32. TF32 is a reduced-precision format: a 10-bit mantissa with an 8-bit FP32-range exponent, 19 bits in total. The A100's actual FP32 performance is lower than the 3090's, at 19.5 TFLOPS:
https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Cent...
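For anyone who wants to try it, TF32 in PyTorch is just a pair of global flags (these are the real settings; the matmul and timing below are only an illustrative sketch, and the flags only take effect on Ampere-class GPUs like the A100):

    import torch

    # Allow TF32 tensor cores for matmuls and cuDNN convolutions.
    # On pre-Ampere GPUs these flags have no effect and math stays in FP32.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

    # Illustrative timing of a large "FP32" matmul that may now run as TF32.
    a = torch.randn(8192, 8192, device="cuda")
    b = torch.randn(8192, 8192, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    c = a @ b
    end.record()
    torch.cuda.synchronize()
    print(f"matmul: {start.elapsed_time(end):.2f} ms")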
FP16 performance is also relevant, as a lot of people now train in FP16. The default for PyTorch/TF is still FP32.
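You can check the PyTorch side in one line (torch.get_default_dtype is the standard call; the tensor is just an example):

    import torch

    # Fresh tensors are created as 32-bit floats unless you opt in to lower precision.
    print(torch.get_default_dtype())   # torch.float32
    x = torch.randn(3)
    print(x.dtype)                      # torch.float32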
I’ve mostly been using TPUs and non-TF32 GPUs at work lately, so I don’t have any practical experience with TF32, but the sales pitch seems pretty good. Do you have any personal experience with whether it’s as much of a drop-in replacement for FP32 as they suggest?
I haven't used TF32 personally, but I think the sales pitch is not too far off. Most of the time I use mixed-precision training, which should be similar to FP16/TF32 in terms of performance. It does speed up training tremendously.
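In PyTorch that's basically the torch.cuda.amp recipe; here is a minimal sketch (the tiny linear model, optimizer, and random data are just placeholders):

    import torch
    from torch.cuda.amp import autocast, GradScaler

    model = torch.nn.Linear(1024, 1024).cuda()                # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # placeholder optimizer
    scaler = GradScaler()   # scales the loss so FP16 gradients don't underflow

    for _ in range(10):
        x = torch.randn(64, 1024, device="cuda")
        target = torch.randn(64, 1024, device="cuda")
        optimizer.zero_grad()
        with autocast():    # ops run in FP16 where safe, FP32 elsewhere
            loss = torch.nn.functional.mse_loss(model(x), target)
        scaler.scale(loss).backward()   # backprop on the scaled loss
        scaler.step(optimizer)          # unscale gradients, then step
        scaler.update()                 # adjust the loss scale for the next step
    print(f"final loss: {loss.item():.4f}")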