Even if you add 1 unit of rounding error at each processing stage (and since rounding tends not to be malicious, you may well do better than that), you'd need 128 poorly implemented processing stages for a 32-bit float to be reduced to mere 16-bit integer precision - and in practice, likely more.
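Here's a rough sketch of where a figure like 128 can come from. The assumptions are mine: each stage costs at most one unit in the last place, the errors all pile up in the same direction (the worst case), and the constant and function names are made up for illustration.

```python
# Assumptions (illustrative only): each stage adds at most one unit in the
# last place, and the errors accumulate in the worst case (same direction).

FLOAT32_STORED_MANTISSA_BITS = 23  # plus one implicit leading bit
INT16_BITS = 16


def stages_to_consume(spare_bits: int) -> int:
    """Worst-case count of 1-unit errors needed to eat `spare_bits` bits."""
    return 2 ** spare_bits


# Counting only the 23 stored mantissa bits against 16 integer bits.
conservative = stages_to_consume(FLOAT32_STORED_MANTISSA_BITS - INT16_BITS)

# Counting the implicit leading bit as well (24-bit significand).
with_implicit_bit = stages_to_consume(
    FLOAT32_STORED_MANTISSA_BITS + 1 - INT16_BITS
)

print(conservative, with_implicit_bit)  # prints: 128 256
```

Either way, random rounding errors partly cancel instead of stacking, so the real number of stages would be higher still.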
When it comes to clipping or loss of data at the low end: 32-bit floats have an 8-bit exponent (254 usable values), so the loudest full-precision unclipped signal is about 1,529 dB (!) louder than the softest unquantized signal (254 doublings of amplitude at roughly 6.02 dB each). Even with mediocre centering, that's more than enough.
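The dynamic-range arithmetic can be checked with a few lines. This is a sketch assuming IEEE 754 single precision and ignoring subnormals; the variable names are mine. Treating sample values as amplitudes (20·log10) gives roughly 1,529 dB, while reading the same ratio as a power ratio (10·log10) gives roughly 765 dB.

```python
import math

# float32 normal numbers use exponents -126..127 (subnormals ignored here).
MIN_EXP, MAX_EXP = -126, 127
exponent_values = MAX_EXP - MIN_EXP + 1

smallest_normal = 2.0 ** MIN_EXP                       # softest full-precision value
largest_finite = (2.0 - 2.0 ** -23) * 2.0 ** MAX_EXP   # loudest finite value

ratio = largest_finite / smallest_normal
amplitude_db = 20 * math.log10(ratio)  # samples treated as amplitudes
power_db = 10 * math.log10(ratio)      # same ratio treated as a power ratio

print(exponent_values, round(amplitude_db), round(power_db))  # prints: 254 1529 765
```

For comparison, 16-bit integer audio covers about 96 dB on the amplitude convention, so either figure leaves an enormous margin.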
I don't think 64-bit audio is likely to make a noticeable difference, even for processing purposes, outside of highly specialized niches.