I would guess that any interesting SIMD hardware exposes shuffles in one form or another, since they're so powerful. I would even go so far as to define "interesting SIMD hardware" as hardware that exposes shuffles. (For an example of their power, see http://software.intel.com/en-us/articles/using-simd-technolo... which uses shuffles to implement very fast 4x4 matrix multiplies, a common operation in games.)
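To show why shuffles matter for 4x4 matrix multiplies, here is a minimal scalar sketch of one common technique. It is not necessarily the linked article's exact code. Each row of the result is built by broadcasting ("splatting") one lane of A's row with a shuffle, then multiply-accumulating against the matching row of B. The vectors are plain 4-tuples standing in for SIMD registers, and the names `shuffle`, `vmul`, `vadd` and `mat4_mul` are hypothetical.

```python
# Illustrative model: a 4-lane "SIMD register" is a 4-tuple. No real SIMD here.

def shuffle(v, idx):
    """Single-source shuffle: lane j of the result is v[idx[j]]."""
    return tuple(v[i] for i in idx)

def vmul(a, b):
    """Lane-wise multiply."""
    return tuple(x * y for x, y in zip(a, b))

def vadd(a, b):
    """Lane-wise add."""
    return tuple(x + y for x, y in zip(a, b))

def mat4_mul(a_rows, b_rows):
    """C = A * B for row-major 4x4 matrices stored as four 4-lane rows."""
    out = []
    for a in a_rows:
        acc = (0, 0, 0, 0)
        for k in range(4):
            # Broadcast lane k of A's row to all four lanes with a shuffle,
            # then multiply-accumulate against row k of B.
            splat = shuffle(a, (k, k, k, k))
            acc = vadd(acc, vmul(splat, b_rows[k]))
        out.append(acc)
    return out
```

On real hardware, each `shuffle(a, (k, k, k, k))` maps to a single shuffle instruction, such as SSE's `_mm_shuffle_ps(a, a, _MM_SHUFFLE(k, k, k, k))`. This replaces the horizontal adds or transposes a naive dot-product formulation would need.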
Prior art:
- LLVM's shufflevector instruction http://llvm.org/docs/LangRef.html#shufflevector-instruction (not meant for human consumption, but it is an example of a multi-architecture backend with SIMD shuffle support)
- OpenCL's shuffle https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/s... (Also note the `.s0123` "swizzle" syntax there.)
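Both of these share the same core idea: the result is built lane by lane from a mask of indices. Below is a rough scalar model of that idea, not either project's actual implementation. In the two-source form, as with LLVM's shufflevector, indices below `len(a)` select from the first vector and the rest select from the second, so the result has one lane per mask entry. The swizzle helper mimics OpenCL's `.sN` notation, where each hex digit names a lane. The helper names `shuffle2` and `swizzle` are hypothetical. The model does not handle undefined/poison mask lanes or OpenCL's rules for out-of-range mask bits.

```python
def shuffle2(a, b, mask):
    """Two-source shuffle: mask index i picks lane i of the concatenation a ++ b."""
    ab = tuple(a) + tuple(b)
    for i in mask:
        if not 0 <= i < len(ab):
            raise ValueError(f"mask index {i} out of range for {len(ab)} lanes")
    return tuple(ab[i] for i in mask)

def swizzle(v, spec):
    """OpenCL-style swizzle such as 's0123': each hex digit after 's' names a lane."""
    if not spec or spec[0] not in "sS":
        raise ValueError("swizzle spec must start with 's' or 'S'")
    return tuple(v[int(c, 16)] for c in spec[1:])

a, b = (1, 2, 3, 4), (5, 6, 7, 8)
print(shuffle2(a, b, (0, 4, 1, 5)))  # interleave the low halves
print(swizzle(a, "s3210"))           # reverse the lanes
```

The interleave mask `(0, 4, 1, 5)` yields `(1, 5, 2, 6)`, the same pattern as an "unpack low" instruction. This is one reason a generic mask-based shuffle can express many of the fixed-function permutes that individual instruction sets provide.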