Inference takes only seconds on a GPU, but compare the FLOPS of modern GPUs and CPUs: for matrix multiplications they differ by roughly two orders of magnitude, so seconds on the GPU become minutes on the CPU. And don't forget that inference has to scale in the data center; it runs repeatedly for many users.
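
To make the gap concrete, here is a rough timing sketch using PyTorch (the 4096×4096 matrix size and iteration count are arbitrary assumptions; the exact speedup depends entirely on the hardware, but on typical machines it lands around two orders of magnitude):

```python
import time
import torch

def time_matmul(device: torch.device, n: int = 4096, iters: int = 10) -> float:
    """Average time for one n x n matrix multiplication on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    _ = a @ b  # warm-up so one-time initialization doesn't skew the timing
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        _ = a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()  # GPU kernels are async; wait before stopping the clock
    return (time.perf_counter() - start) / iters

cpu_t = time_matmul(torch.device("cpu"))
print(f"CPU: {cpu_t * 1e3:.1f} ms per matmul")
if torch.cuda.is_available():
    gpu_t = time_matmul(torch.device("cuda"))
    print(f"GPU: {gpu_t * 1e3:.1f} ms per matmul")
    print(f"Speedup: {cpu_t / gpu_t:.0f}x")
```

A single matmul is the core operation inference repeats thousands of times per request, so whatever ratio this prints is roughly the ratio between your GPU and CPU inference latency.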