Byte (endian) swapping is another one. GCC and Clang can recognize some hand-written implementations of it and replace them with a single instruction (such as bswap on x86).
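For illustration, this is roughly the shift-and-mask form that tends to get recognized. It is only a sketch: the name swap32 is made up here, and whether it really collapses to one instruction depends on compiler version, optimization level and target, so check the generated assembly.

```c
#include <stdint.h>

/* Portable 32-bit byte swap written with shifts and masks.
 * swap32 is a hypothetical name used only for this example.
 * With optimization enabled, GCC and Clang can often turn this
 * pattern into a single byte-swap instruction, but that is not
 * guaranteed. */
static inline uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |  /* byte 0 -> byte 3 */
           ((x & 0x0000FF00u) <<  8) |  /* byte 1 -> byte 2 */
           ((x & 0x00FF0000u) >>  8) |  /* byte 2 -> byte 1 */
           ((x & 0xFF000000u) >> 24);   /* byte 3 -> byte 0 */
}
```

Both compilers also provide __builtin_bswap32 if you would rather not rely on pattern matching, at the cost of portability to other compilers.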
On the other hand, until relatively recently they were bad at recognizing loads and stores of larger integers implemented with byte accesses and shifts, and at optimizing them into a single wide load or store (on architectures where unaligned access is allowed). Writing it this way lets you delegate any little/big/weird endian concerns entirely to the compiler.
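A minimal sketch of the byte-loads-and-shifts style, with made-up helper names (load_le32, load_be32, store_le32). The code never looks at the host byte order, so the result is the same on any machine; whether the compiler merges the four byte accesses into one wide access is up to the compiler and target.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer.
 * Each byte is widened to uint32_t before shifting so the shift
 * never happens on a (signed) int. */
static inline uint32_t load_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]        |
           ((uint32_t)p[1] <<  8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Same idea for big-endian data: only the shift amounts change. */
static inline uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |
            (uint32_t)p[3];
}

/* Write a 32-bit value to a byte buffer in little-endian order. */
static inline void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >>  8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}
```

Because the helpers take a byte pointer, they also sidestep alignment and aliasing problems that come with casting the buffer to uint32_t *.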