FP64 is what HPC is built on. FP32 works on the cards too (at the same rate or faster). I don't know the status of FP16 or FP8.
Some architectures provide fast FP16->FP32 and FP32->FP16 conversion instructions, so you can DIY the memory bandwidth saving by storing in FP16 and computing in FP32. That always seemed reasonable to me, but I don't know whether the AMD hardware people are going, or will go, down that path.
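A minimal sketch of that DIY approach, using only the Python standard library: values are stored as IEEE 754 half precision (the `e` format code in `struct`, 2 bytes per element instead of 4 for FP32) and widened before the arithmetic. Python floats are FP64, so the widened math here only stands in for FP32 registers on a GPU, and the function names are illustrative, not taken from any GPU library.

```python
import struct

def pack_fp16(values):
    """Store values as IEEE 754 half precision: 2 bytes per element."""
    return struct.pack(f"<{len(values)}e", *values)

def unpack_fp16(buf):
    """Widen stored half-precision values back to Python floats for arithmetic."""
    n = len(buf) // 2
    return list(struct.unpack(f"<{n}e", buf))

def dot_from_fp16(a_buf, b_buf):
    # Narrow storage on "load", convert, then accumulate in wider precision.
    a = unpack_fp16(a_buf)
    b = unpack_fp16(b_buf)
    return sum(x * y for x, y in zip(a, b))

weights = [0.5, -1.25, 2.0, 3.75]
inputs = [1.0, 2.0, 0.25, 4.0]

w16 = pack_fp16(weights)
x16 = pack_fp16(inputs)

print(len(w16), "bytes as FP16 vs", len(struct.pack("<4f", *weights)), "bytes as FP32")
print("dot product:", dot_from_fp16(w16, x16))
```

The cost is precision: a value like 0.1 does not survive the round trip through FP16 unchanged, so this only pays off where the narrower storage format is acceptable.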
Sure, but Radeon cards are not HPC accelerators. A modest 7800 XT, for example, which would be a great card for SD, has 76 TFLOPS at FP16, 37 TFLOPS at FP32 and 1.16 TFLOPS at FP64.
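Putting those figures side by side makes the gap concrete. This is a quick sketch that takes the numbers quoted above as given (they are peak rates, not sustained throughput):

```python
# Peak throughput for the 7800 XT as quoted in the comment above, in TFLOPS.
peak = {"FP16": 76.0, "FP32": 37.0, "FP64": 1.16}

def ratio(fast, slow):
    """How many times higher the peak rate of `fast` is than that of `slow`."""
    return peak[fast] / peak[slow]

print(f"FP16 vs FP32: {ratio('FP16', 'FP32'):.1f}x")
print(f"FP32 vs FP64: {ratio('FP32', 'FP64'):.1f}x")
print(f"FP16 vs FP64: {ratio('FP16', 'FP64'):.1f}x")
```

On these numbers FP64 runs roughly 32 times slower than FP32, which is why a consumer card like this is a poor fit for FP64-bound HPC work but a reasonable one for FP16/FP32 workloads.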
Keeping all those FPUs busy is another problem and not easy, but in cases where it can be done FP32 is clearly desirable.
More importantly, if you specify FP16 but the hardware only supports FP32, the library should emit a warning and work anyway, doing transparent casts behind your back as necessary.
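A sketch of that fallback behaviour, assuming a hypothetical library call `scale` and a simulated device whose only native precision is FP32 (`DEVICE_SUPPORTED` is made up for illustration). Storage in each format is emulated by round-tripping through the `struct` module:

```python
import struct
import warnings

# Hypothetical: the precisions this imaginary device can execute natively.
DEVICE_SUPPORTED = {"fp32"}

def _round_to(dtype, values):
    """Emulate storage in `dtype` by packing and unpacking with struct."""
    code = {"fp16": "e", "fp32": "f"}[dtype]
    fmt = f"<{len(values)}{code}"
    return list(struct.unpack(fmt, struct.pack(fmt, *values)))

def scale(values, factor, dtype="fp16"):
    """Hypothetical library call: multiply values by factor in `dtype`."""
    compute = dtype
    if dtype not in DEVICE_SUPPORTED:
        # Warn, but keep going in a precision the device does support.
        warnings.warn(f"{dtype} not supported on this device; computing in fp32",
                      RuntimeWarning, stacklevel=2)
        compute = "fp32"
    inputs = _round_to(dtype, values)                         # caller's data in the requested format
    result = _round_to(compute, [x * factor for x in inputs])  # transparent upcast for the math
    return _round_to(dtype, result)                           # cast back to the dtype the caller asked for
```

The caller still gets results in the format they requested; the only visible difference is the warning.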