Clearly this isn't the case. Plenty of neat C++ "reference implementation" code ends up 5x faster when hand optimized, parallelized, vectorized, etc.
There are some transformations that compilers are really bad at: rearranging data structures, switching out algorithms for equivalent ones with better big-O complexity, generating & using lookup tables, bit-packing things, using caches, hash tables and Bloom filters for time/memory trade-offs, etc.
The spec doesn't prevent such optimizations, but current compilers aren't smart enough to find them.
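To make the first item concrete, here's a hypothetical sketch of the data-structure rearrangement a compiler won't do for you: turning an array-of-structs into a struct-of-arrays so a field-wise sum reads contiguous memory. The particle types and field names are invented for the example; a compiler can't make this change itself because it alters the type's layout, which other code (and the ABI) may depend on.

    #include <vector>

    // Array-of-structs: summing x drags y and z through the cache as well,
    // and the loads are strided, which hurts vectorization.
    struct ParticleAoS { float x, y, z; };

    float sum_x_aos(const std::vector<ParticleAoS>& ps) {
        float s = 0.0f;
        for (const auto& p : ps) s += p.x;
        return s;
    }

    // Struct-of-arrays: the same sum walks one contiguous float array,
    // which the compiler can vectorize with unit-stride loads.
    struct ParticlesSoA { std::vector<float> x, y, z; };

    float sum_x_soa(const ParticlesSoA& ps) {
        float s = 0.0f;
        for (float v : ps.x) s += v;
        return s;
    }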
Imagine the outcry if compilers switched algorithms. How can the compiler know my input size and input distribution? Maybe my dumb algorithm is optimal for my data.
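For instance (names and sizes invented for the sketch): with only a handful of keys, a "dumb" linear scan over a flat vector often beats std::unordered_map, because there's no hashing and no pointer chasing, and the compiler has no way of knowing the table stays tiny.

    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    // "Dumb" option: linear scan over a flat array of pairs.
    // For a handful of entries this stays in one or two cache lines.
    int lookup_flat(const std::vector<std::pair<std::string, int>>& table,
                    const std::string& key) {
        for (const auto& [k, v] : table)
            if (k == key) return v;
        return -1;
    }

    // "Smart" option: hash lookup. Wins asymptotically, but pays for
    // hashing and bucket indirection on every call.
    int lookup_hashed(const std::unordered_map<std::string, int>& table,
                      const std::string& key) {
        auto it = table.find(key);
        return it == table.end() ? -1 : it->second;
    }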
Compilers can easily detect the size and shape of the problem at runtime and run different code for different problem sizes. Many already do this for loop unrolling; e.g. if you memcpy 2 bytes, they won't even branch into the fancy SIMD version.
This would just be an extension of that: if the code creates and uses a linked list, but the list turns out to be 1M items long and accessed entirely by index, branch to a different version of the code that uses an array instead, etc. A hand-written sketch of that kind of size-based dispatch is below.
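A minimal sketch (the cutoff of 64 is invented, not tuned): for a sorted vector, a linear scan wins for tiny inputs while binary search wins once the input is large, so the code picks a path at runtime the same way memcpy implementations do.

    #include <algorithm>
    #include <vector>

    // Dispatch on problem size at runtime: tiny inputs take the simple path,
    // large inputs take the asymptotically better one. The cutoff below is a
    // placeholder, not a measured value.
    bool contains(const std::vector<int>& sorted, int key) {
        if (sorted.size() <= 64) {
            for (int v : sorted) {
                if (v == key) return true;
                if (v > key) return false;  // sorted, so we can stop early
            }
            return false;
        }
        return std::binary_search(sorted.begin(), sorted.end(), key);
    }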
If I know my input shape in advance and write the correct algorithm for it, I don't want any runtime checking of the input and the associated costs for branching and code size inflation.
That's my question. I'm also under the impression that these optimizations CAN be made manually, but I find it surprising that "current compilers aren't smart enough to find them" isn't improving.