SIMD design itself isn't constant between different processor families. Any purported standardized language for scalar/vector hybrid either has to rely on a smart optimizer or be utterly platform specific.
> SIMD design itself isn't constant between different processor families. Any purported standardized language for scalar/vector hybrid either has to rely on a smart optimizer or be utterly platform specific.
That is indeed part of the problem. There might be enough of a lowest common denominator there to standardize, as there is with atomics; I don't know. But I'm not saying that C needs to add SIMD support. I'm saying that any low-level language needs to directly expose machine functionality, which includes some SIMD stuff on some classes of processor.
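To make the trade-off concrete, here is a rough sketch in C of what "platform specific versus trusting the optimizer" tends to look like in practice. The function name `add_floats` is made up for illustration; the intrinsics are the usual SSE and NEON ones, and any other target falls through to a plain loop that the compiler may or may not vectorize.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* x86 SSE intrinsics */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* Arm NEON intrinsics */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* x86: four floats per iteration, unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#elif defined(__ARM_NEON)
    /* Arm: same operation, different names and types. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail, and the entire loop on any other target;
       here vectorization is left to the optimizer. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The two vector branches do the same thing, yet share no types or function names, which is roughly the portability gap being discussed.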
Maybe there will be a shakeout, like how scalar processors largely shook out to being byte-addressable machines with flat address spaces and pointers one machine word wide, as opposed to word-addressable systems that fit two pointers in a machine word (the PDP-10 family) or segmented systems, of which there were many, including the redoubtable IBM PC. C can definitely run on those "odd" systems, which weren't so odd when C was first being standardized, but char array access definitely gets more efficient when the machine can access a char in one opcode. (You could have a char the same size as an int. It's standards-conformant. But it doesn't help your compiler compile code intended for other systems.) C could standardize SIMD access once that happens. Until then, it would be nice to have a semi-portable high-level assembly which targets all 'sane' architectures and is close to the hardware.
You’re mistaken about the PDP-10. Yes, you could pack two pointer-to-word pointers into a single word; but a single word could also contain a single pointer-to-byte. See http://pdp10.nocrew.org/docs/instruction-set/Byte.html for all the instructions that deal with bytes, including auto-increment! And bytes could be any size you want, per pointer, from 1 to 36 bits.
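For anyone who hasn't seen one, here is a rough C emulation of how I understand those byte pointers to work. It is a simplified sketch, not a faithful model: the struct and function names are mine, 36-bit words are held in the low bits of a `uint64_t`, and the indirect and index fields of a real byte pointer are left out. `ldb` and `ildb` are modeled on the LDB and ILDB instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 36u

/* Simplified byte pointer: position, size, and word address only. */
typedef struct {
    unsigned pos;   /* P: number of bits to the right of the byte */
    unsigned size;  /* S: byte width in bits, chosen per pointer */
    size_t   addr;  /* Y: index of the word holding the byte */
} byte_ptr;

/* Load the byte the pointer currently designates (like LDB). */
uint64_t ldb(const uint64_t *mem, byte_ptr p)
{
    assert(p.size >= 1 && p.size <= WORD_BITS);
    assert(p.pos + p.size <= WORD_BITS);
    uint64_t mask = ((uint64_t)1 << p.size) - 1;
    return (mem[p.addr] >> p.pos) & mask;
}

/* Advance to the next byte, moving to the next word when the
   current one has no room left for a whole byte. */
void ibp(byte_ptr *p)
{
    if (p->pos >= p->size) {
        p->pos -= p->size;
    } else {
        p->addr += 1;
        p->pos = WORD_BITS - p->size;
    }
}

/* Increment, then load (like ILDB): the auto-increment form. */
uint64_t ildb(const uint64_t *mem, byte_ptr *p)
{
    ibp(p);
    return ldb(mem, *p);
}
```

Starting with `pos` at 36 and `size` at 7, five successive `ildb` calls walk the five 7-bit characters packed into one word, and the sixth moves on to the next word. Change `size` to 6 or 9 and the same pointer walks six or four bytes per word instead.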