I think he's saying that the MMX versions of these functions perform worse than what the compiler could produce from the plain C code, because the compiler would be able to use SSE2 for all floating-point math. Hence, these elaborately hand-optimized code paths are actually "pessimizations".
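To make that concrete, here's a rough sketch of the kind of plain C loop I mean. The function `scale_add` and its parameters are made up for illustration; it is not one of the functions being discussed. The idea is only that a scalar loop like this gives the compiler room to pick SSE/SSE2 instructions on its own when targeting x86-64 with optimization enabled (for example `-O3` with GCC), though whether it actually vectorizes depends on the compiler and flags.

```c
#include <stddef.h>

/* Hypothetical example, not taken from the code under discussion.
   Computes out[i] = a[i] * gain + b[i] in plain scalar C.
   `restrict` tells the compiler the arrays do not overlap, which
   makes it easier for it to vectorize the loop by itself. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float gain, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * gain + b[i];
    }
}
```

If you want to check what your compiler does with it, look at the generated assembly (e.g. `gcc -O3 -S`) rather than taking my word for it.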