How would you like shuffles exposed? One of the things we really tried to do with this design is make sure it's NOT tied to one particular hardware implementation of SIMD.
I would guess that any interesting SIMD hardware exposes shuffles in one form or another, since they're so powerful. I'd even go as far as defining "interesting SIMD hardware" as hardware that exposes shuffles. For an example of their power, see http://software.intel.com/en-us/articles/using-simd-technolo..., which uses shuffles to implement very fast 4x4 matrix multiplies (a very common operation in games).
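A minimal sketch of one common shuffle-based 4x4 multiply (not necessarily the exact scheme in the linked article), assuming x86 with SSE and row-major float[16] matrices; mat4_mul_sse is a hypothetical name. Each element of a row of A is broadcast to all four lanes with _mm_shuffle_ps, and the corresponding result row is accumulated from the four rows of B.

```c
#include <xmmintrin.h> /* SSE: __m128, _mm_shuffle_ps, _MM_SHUFFLE */

/* Hypothetical helper: out = a * b for row-major 4x4 float matrices.
   Assumes an x86 target with SSE. */
void mat4_mul_sse(const float a[16], const float b[16], float out[16])
{
    /* Load the four rows of B once; they are reused for every row of A. */
    __m128 b0 = _mm_loadu_ps(&b[0]);
    __m128 b1 = _mm_loadu_ps(&b[4]);
    __m128 b2 = _mm_loadu_ps(&b[8]);
    __m128 b3 = _mm_loadu_ps(&b[12]);

    for (int i = 0; i < 4; ++i) {
        __m128 row = _mm_loadu_ps(&a[4 * i]);
        /* Broadcast a[i][k] to all lanes. The shuffle control must be a
           compile-time constant, so the four broadcasts are written out. */
        __m128 x = _mm_shuffle_ps(row, row, _MM_SHUFFLE(0, 0, 0, 0));
        __m128 y = _mm_shuffle_ps(row, row, _MM_SHUFFLE(1, 1, 1, 1));
        __m128 z = _mm_shuffle_ps(row, row, _MM_SHUFFLE(2, 2, 2, 2));
        __m128 w = _mm_shuffle_ps(row, row, _MM_SHUFFLE(3, 3, 3, 3));
        /* Result row i = a[i][0]*B0 + a[i][1]*B1 + a[i][2]*B2 + a[i][3]*B3. */
        __m128 r = _mm_add_ps(_mm_add_ps(_mm_mul_ps(x, b0), _mm_mul_ps(y, b1)),
                              _mm_add_ps(_mm_mul_ps(z, b2), _mm_mul_ps(w, b3)));
        _mm_storeu_ps(&out[4 * i], r);
    }
}
```

All four rows of B and each row of A are loaded before the corresponding output row is stored, so the whole multiply is 16 shuffles, 16 multiplies and 12 adds with no scalar element access.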
Couldn't you expose machine-dependent stuff in a subclass?
SSE has plenty of other useful instructions, such as PMOVMSKB (handy for pulling the result of a vectorized comparison into a general-purpose register, yay!), and there are the SSE4.2 string instructions (sometimes useful outside string processing, too), etc.
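To illustrate the PMOVMSKB point, here is a minimal sketch that finds the first occurrence of a byte in a 16-byte block. It assumes x86 with SSE2 and GCC or Clang (for __builtin_ctz); find_byte16 is a hypothetical name. The compare produces 0xFF/0x00 per byte, and _mm_movemask_epi8 (PMOVMSKB) packs those 16 results into an ordinary int that scalar code can branch on.

```c
#include <emmintrin.h> /* SSE2: _mm_cmpeq_epi8, _mm_movemask_epi8 (PMOVMSKB) */

/* Hypothetical helper: index of the first byte equal to target in a
   16-byte block, or -1 if there is none. Assumes SSE2 and GCC/Clang. */
int find_byte16(const unsigned char block[16], unsigned char target)
{
    __m128i data   = _mm_loadu_si128((const __m128i *)block);
    __m128i needle = _mm_set1_epi8((char)target);
    /* Lane-wise compare: matching bytes become 0xFF, the rest 0x00. */
    __m128i eq = _mm_cmpeq_epi8(data, needle);
    /* PMOVMSKB: bit i of mask is the top bit of byte i of eq. */
    int mask = _mm_movemask_epi8(eq);
    if (mask == 0)
        return -1;
    /* The lowest set bit corresponds to the first matching byte. */
    return __builtin_ctz((unsigned)mask);
}
```

Longer buffers can be scanned 16 bytes at a time with the same test, dropping to bit manipulation only when a chunk's mask is non-zero.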
Newer extensions (AVX-512) will also have mask registers for masked operations.
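What a masked operation does can be shown without AVX-512 hardware. The sketch below is a plain-C model, not real AVX-512 code, of merge-masked addition as performed by _mm512_mask_add_ps(src, k, a, b): lane i receives a[i] + b[i] when bit i of the 16-bit mask k is set and otherwise keeps src[i]. The name mask_add_model is hypothetical.

```c
#include <stdint.h>

/* Scalar model of merge-masking over 16 float lanes, mirroring the
   semantics of _mm512_mask_add_ps. Not a vectorized implementation. */
void mask_add_model(const float src[16], uint16_t k,
                    const float a[16], const float b[16], float dst[16])
{
    for (int i = 0; i < 16; ++i)
        /* Bit i of the mask selects between the sum and the old value. */
        dst[i] = ((k >> i) & 1u) ? a[i] + b[i] : src[i];
}
```

The zero-masking form, _mm512_maskz_add_ps, differs only in writing 0.0f to unselected lanes instead of keeping src.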
It sounds like they're trying to maintain platform independence while still abstracting platform-specific low-level operations. If you use machine-specific code in subclasses, how do you generate MSIL that remains platform-independent?
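One general pattern for keeping a single portable API while still using machine-specific instructions is to pick the implementation per target and keep a plain fallback. The sketch below is a compile-time C analogue, assuming GCC/Clang (__SSE__) or MSVC on x64 (_M_X64); sum4 and HAVE_SSE are hypothetical names. A JIT that turns MSIL into machine code on the target could, in principle, make the same choice at that point, but this is an illustration of the pattern, not a description of how .NET does it.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h> /* SSE: _mm_shuffle_ps, _mm_cvtss_f32 */
#define HAVE_SSE 1
#endif

/* Hypothetical portable API: callers see one function; the build picks
   an SSE shuffle path when available and plain C otherwise. */
float sum4(const float v[4])
{
#ifdef HAVE_SSE
    __m128 x = _mm_loadu_ps(v);
    /* Swap adjacent lanes (v1, v0, v3, v2) and add:
       lanes become (v0+v1, v0+v1, v2+v3, v2+v3). */
    __m128 t = _mm_add_ps(x, _mm_shuffle_ps(x, x, _MM_SHUFFLE(2, 3, 0, 1)));
    /* Swap the two halves and add: lane 0 becomes (v0+v1)+(v2+v3). */
    t = _mm_add_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(1, 0, 3, 2)));
    return _mm_cvtss_f32(t);
#else
    /* Portable fallback with the same association order as the SSE path. */
    return (v[0] + v[1]) + (v[2] + v[3]);
#endif
}
```

Both paths add in the same order, so they return identical results; only the instructions differ.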
If you really needed something like that, couldn't you use C++/CLI and expose those operations from your own unmanaged library? You would, of course, lose portability, but that seems like a possible workaround.
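As a sketch of the native side of that workaround, the function below exposes one SSE operation through a plain C ABI that a C++/CLI wrapper (or a P/Invoke declaration) could call. It assumes x86 with SSE and a shared-library build; simd_transpose4x4 and SIMD_EXPORT are hypothetical names. _MM_TRANSPOSE4_PS is a macro from xmmintrin.h that expands to a short sequence of SSE shuffle-type instructions.

```c
#include <xmmintrin.h> /* SSE: __m128, _MM_TRANSPOSE4_PS */

/* Make the symbol visible when built as a DLL / shared object. */
#if defined(_WIN32)
#define SIMD_EXPORT __declspec(dllexport)
#else
#define SIMD_EXPORT __attribute__((visibility("default")))
#endif

/* Hypothetical native entry point: transposes a row-major 4x4 float
   matrix in place using SSE. Plain C ABI, so managed code can call it. */
SIMD_EXPORT void simd_transpose4x4(float m[16])
{
    __m128 r0 = _mm_loadu_ps(&m[0]);
    __m128 r1 = _mm_loadu_ps(&m[4]);
    __m128 r2 = _mm_loadu_ps(&m[8]);
    __m128 r3 = _mm_loadu_ps(&m[12]);
    /* Rewrites r0..r3 so each now holds a column of the original matrix. */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
    _mm_storeu_ps(&m[0], r0);
    _mm_storeu_ps(&m[4], r1);
    _mm_storeu_ps(&m[8], r2);
    _mm_storeu_ps(&m[12], r3);
}
```

Keeping the boundary to a flat float pointer avoids marshalling anything more complex than an array, at the cost of tying that library to x86.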