How would immutability help with performance? (I'm not trying to ask a leading question, just curious.) Value types are obviously nice for cache performance (less pointer chasing, more linear data), but I generally associate immutability with better correctness and worse performance (not much worse, but worse).
edit: On topic, this is really, really cool. I was playing around with SIMD intrinsics in C++ a few days ago (and realized that the compiler was in most cases generating equal or better code with the auto-vectorizer than me using intrinsics) so I have kind of a new-found interest in the topic.
First, making them immutable means you don't have to worry about memory barriers. That could be huge for data that's being shared by multiple threads. While these types logically have multiple elements, in reality they all fit within a single SSE register, so the performance cost of having to worry about mutability could easily annihilate any boost you might get from being able to fiddle with the vector unit.
(Guess one-and-a-half is that, since these values are meant to fit in a single CPU register, they're really more analogous to atomic types than they are to objects, anyway.)
Second guess is that it's more of a "pit of success" thing than a performance thing. Mutable value types in .NET are really problematic. I've seen so many bugs resulting from them that nowadays I consider them grounds for automatic rejection in code reviews.
The immutable design came from the class library folks (not my team [the JIT folks]). I believe the analogy to atomic types (integers aren't mutable) is pretty sound. The API really was cleaner by making them immutable. If you want to allow mutation, the API surface area really explodes, resulting in dramatically more JIT work to make them perform well.
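To make the "integers aren't mutable" analogy concrete, here is a minimal C++ sketch of what an immutable vector API looks like from the caller's side. `Float4` and `lane` are hypothetical names invented for illustration; this is not the .NET type, only a model of the design choice: no setters, and every operation returns a new value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical four-lane float vector (illustration only, not the .NET API).
class Float4 {
public:
    Float4(float a, float b, float c, float d) : lanes_{{a, b, c, d}} {}

    // Read-only lane access; there is deliberately no setter.
    float lane(std::size_t i) const {
        assert(i < 4);
        return lanes_[i];
    }

    // Arithmetic returns a fresh value, the same way `a + b` does for ints.
    friend Float4 operator+(const Float4& x, const Float4& y) {
        return Float4(x.lanes_[0] + y.lanes_[0],
                      x.lanes_[1] + y.lanes_[1],
                      x.lanes_[2] + y.lanes_[2],
                      x.lanes_[3] + y.lanes_[3]);
    }

private:
    std::array<float, 4> lanes_;
};
```

With this shape, `Float4 c = a + b;` leaves `a` and `b` untouched, so the API surface stays limited to constructors, reads, and operators.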
I would guess this is due to the underlying ISAs. Most SIMD ISAs don't have a quick way to, say, replace the 9th byte in a 16 byte vector. Since the whole point of SIMD is to make stuff go faster, it makes perfect sense to omit operations which are slow.
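As a rough illustration of the "replace the 9th byte" case, the portable C++ sketch below models a single-element update as "copy the whole vector, patch one byte, hand back the new value". `Bytes16` and `with_byte` are hypothetical names, and this is plain array code rather than intrinsics; it only shows the shape of the operation, not what any particular compiler or ISA emits for it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for a 16-byte SIMD value (assumption: plain array, no intrinsics).
using Bytes16 = std::array<std::uint8_t, 16>;

// Hypothetical helper: returns a copy of `v` with one byte replaced.
// The argument is taken by value, so the caller's vector is never modified.
Bytes16 with_byte(Bytes16 v, std::size_t index, std::uint8_t value) {
    assert(index < v.size());
    v[index] = value;  // patches the local copy only
    return v;
}
```

Replacing the 9th byte is then `with_byte(v, 8, 0xFF)` (index 8, counting from zero), which touches all 16 bytes' worth of storage to change one of them.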