If what's needed is speed, neither Java nor C# is an optimal choice.
True, but if you're just trying to get some numerical speed out of some specific routines in a larger .NET application then having access to SSE like this could still be very helpful. Calling into native libraries from .NET isn't necessarily a performant option because of the cost of marshaling data back and forth between the managed and unmanaged memory spaces.
That marshaling overhead can be pretty significant; in my own experience I'm usually hard-pressed to come up with a C++ implementation that can beat the C# code it's meant to replace outside of a microbenchmark. If the C# code now has the option of banging on SSE then I'm not sure it'll even be worth trying to trot out C++.
I agree. This clearly covers the most common use case ("I don't want to write C/C++, but I want to make this thing faster"). Having support for SIMD in the standard library, even if it comes at the expense of having to use specific types (as with NumPy), is definitely the way to go.
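To make the "specific types" point concrete, here's a rough sketch of what that looks like. I'm assuming the SIMD support in question is the `System.Numerics.Vector<T>` type; the class and method names (`SimdSketch`, `Dot`) are made up for illustration and the code isn't tuned.

```csharp
using System;
using System.Numerics;

public static class SimdSketch
{
    // Dot product of two equal-length arrays using Vector<float>.
    // Vector<float>.Count is the number of lanes the runtime picks
    // (e.g. 4 with SSE, 8 with AVX); check Vector.IsHardwareAccelerated
    // to see whether the JIT actually emits SIMD instructions.
    public static float Dot(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int width = Vector<float>.Count;
        Vector<float> acc = Vector<float>.Zero;
        int i = 0;

        // Vectorized part: multiply and accumulate `width` elements at a time.
        for (; i <= a.Length - width; i += width)
        {
            acc += new Vector<float>(a, i) * new Vector<float>(b, i);
        }

        // Horizontal sum of the accumulator lanes.
        float sum = Vector.Dot(acc, Vector<float>.One);

        // Scalar tail for the leftover elements.
        for (; i < a.Length; i++)
        {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

The data stays in ordinary managed arrays the whole time, so there's nothing to marshal; the cost is that you have to write the loop in terms of the vector type instead of plain floats.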
You can, and I've generally had better luck with unsafe code than with C++ code. Unsafe code creates GC overhead through pinning, though, so it can also end up doing more harm than good if you're not careful. It's another spot where I've found that microbenchmarks can be misleading: the performance cost that pinning incurs is insidious and hard to measure.
Usually in the managed world you have references that point to an object. The objects themselves don't live in the same place forever; they may be moved by the GC (to consolidate "holes" in memory). References are updated to reflect that movement, so you don't notice. However, when using unsafe code (which has pointers and pointer arithmetic) you need to keep the objects in place. That's pinning, and it essentially forces the GC to work around those islands of pinned objects.
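For anyone who hasn't seen it, a minimal sketch of pinning with the C# `fixed` statement looks something like this. The class and method names (`PinningSketch`, `SumPinned`) are hypothetical, and it needs to be compiled with unsafe code enabled (`AllowUnsafeBlocks`).

```csharp
using System;

public static class PinningSketch
{
    // Sums an array through a raw pointer. The `fixed` statement pins
    // `data` so the GC cannot relocate it while `p` is in use.
    public static unsafe float SumPinned(float[] data)
    {
        float total = 0f;
        fixed (float* p = data)
        {
            for (int i = 0; i < data.Length; i++)
            {
                total += p[i]; // plain pointer indexing into the pinned array
            }
        } // pin is released here; the GC is free to move `data` again
        return total;
    }
}
```

A pin like this only lasts for the duration of the block. As far as I know, it's the longer-lived pins (for example via `GCHandle.Alloc` with `GCHandleType.Pinned`) that are more likely to leave the GC working around those islands.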