> you understand what the code intends, and the compiler does not
It seems to me that the best approach to this would be to feed more information to the compiler rather than writing assembly yourself. Otherwise you give up portability.
You are right. A lot of times you can rearrange the code in various ways, and the compiler will happily generate more efficient code.
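As a minimal sketch of what "feeding more information to the compiler" can look like, here is a C99 example using `restrict`. The function name `scale_add` is made up for illustration. Whether a given compiler actually vectorizes the loop depends on the compiler and flags, so treat this as an assumption to check in the generated output.

```c
#include <stddef.h>

/*
 * Hypothetical example function (not from the thread).
 * Without `restrict`, the compiler has to assume dst and src might overlap,
 * which can block or complicate vectorization. With `restrict`, the caller
 * promises they do not overlap, so the loop iterations are independent.
 */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

The code stays plain portable C. The only cost is that the caller must honor the no-overlap promise.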
When that's not possible, the way to keep it portable is to write a generic C function, then write an optimized version of the same function that is compiled only when you're targeting an architecture that supports it.
At work it's rare to need to compile the same code for various architectures, but sometimes it happens.
That's generally better when it's possible, but it's not always possible.
If you do start writing assembly, you usually want to have it exist next to a higher-level version of the code that you can toggle on and off. This lets you maintain portability (just compile the higher-level version on platforms where you don't yet have assembly) and makes it easy to try out new optimizations that wouldn't require assembly.
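A rough sketch of that layout, assuming GCC or Clang extended inline assembly on x86. The `USE_ASM` macro and the `bswap32_*` names are made up for illustration. Any other platform, or a build with `-DUSE_ASM=0`, falls back to the plain C version.

```c
#include <stdint.h>

/* Hypothetical toggle: build with -DUSE_ASM=0 to force the portable version. */
#ifndef USE_ASM
#define USE_ASM 1
#endif

/* Portable reference version: always compiled, works on any platform. */
static uint32_t bswap32_generic(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

#if USE_ASM && defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* x86-only version, written with GCC/Clang extended inline assembly. */
static uint32_t bswap32_asm(uint32_t x)
{
    __asm__("bswap %0" : "+r"(x)); /* reverse the byte order in place */
    return x;
}
#define bswap32 bswap32_asm
#else
/* No assembly for this platform yet (or toggle off): use the C version. */
#define bswap32 bswap32_generic
#endif
```

Callers only ever use `bswap32`, so the two versions can be compared against each other in tests and benchmarks. In this particular case an optimizing compiler may well turn the generic version into the same instruction on its own, which is exactly the kind of thing the toggle makes easy to check.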