
> My usual guess would be that you can often hope for a 50% speedup in a tight loop by dropping from C to assembly

The problem with inline assembler is that it is almost opaque to the optimizer. By adding some inline asm, you may inhibit a lot of optimizations that could give better performance overall.

For this kind of task it is often much better to use intrinsics (e.g. xmmintrin.h for SSE) or compiler extensions such as __attribute__((vector_size(16))). This way you can utilize the CPU features you have available while still allowing the optimizer to perform high-level optimizations.




While there is a lot to be said for the maintainability of intrinsics, I have found inline assembly to be significantly better for performance. And this is precisely because it prevents the compiler from blindly performing 'optimizations' on a section of code you've already optimized by hand. This thread offers an example and some numbers: http://software.intel.com/en-us/forums/topic/480004


I was under the impression that the parts of performance-oriented programs typically converted to assembly are, in essence, small profiled hotspots such as very tight loops. As such, I doubt there's much real performance to be gained from applying high-level optimizations across that code, as intrinsics/extensions make possible.

But I'm certainly no expert in this area, so take my opinion with a large grain of salt.



