> The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by compiler and hardware. On Intel GPUs, however, this abstraction has profound performance implications as the underlying ISA is SIMD and important hardware capabilities cannot be fully utilized
What? That makes no sense.
GPU processor cores are basically just SIMD with a different color hat. The SASS assembly simply has _only_ SIMD instructions - and with the full instruction set being SIMD'ized, it can drop the mention of "this is SIMD" and just pretend the individual lanes are instruction-locked threads.
So, an OpenCL compiler would do very similar parallelization on a GPU and on an Intel CPU. (It's obviously not exactly the same, since the instruction sets differ, the vector widths differ, and Intel CPUs support several vector widths that can all be active at the same time, etc.)
So, the hardware capabilities can be utilized just fine.