
> The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by compiler and hardware. On Intel GPUs, however, this abstraction has profound performance implications as the underlying ISA is SIMD and important hardware capabilities cannot be fully utilized

What? That makes no sense.

GPU processor cores are basically just SIMD with a different color hat. The SASS assembly simply has _only_ SIMD instructions - and with the full instruction set being SIMD'ized, it can drop the mention of "this is SIMD" and just pretend individual lanes are instruction-locked threads.
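A toy sketch of that equivalence (plain Python, with an illustrative lane count, not any real ISA): the scalar body each "thread" executes is the same computation as one SIMD sweep across the warp's lanes.

```python
WARP = 8  # illustrative lane count, not a real hardware width

def kernel_body(lane, a, b):
    """Scalar code as a CUDA/OpenCL programmer writes it:
    each logical thread sees only its own lane id."""
    return a[lane] + b[lane]

def warp_execute(a, b):
    """What the SIMD hardware effectively does: one lockstep
    sweep of the same instruction across all lanes."""
    return [kernel_body(lane, a, b) for lane in range(WARP)]

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
print(warp_execute(a, b))  # one "vector add" over 8 lanes
```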

So, an OpenCL compiler would do very similar parallelization on a GPU and on an Intel CPU. (It's obviously not exactly the same, since the instruction sets do differ, the widths are not the same, and Intel CPUs have multiple vector widths which could all be in use at the same time, etc.)
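One wrinkle the compiler has to handle either way is branch divergence: scalar source with an `if` becomes branchless SIMD that computes both sides and selects per lane with a mask. A minimal sketch of the idea (illustrative only, not what any particular compiler actually emits):

```python
def scalar_kernel(x):
    # Scalar source: each work-item takes its own branch.
    if x > 0:
        return x * 2
    return -x

def vectorized_kernel(xs):
    # What a vectorizing compiler conceptually emits: evaluate both
    # sides for every lane, then blend per lane under a predicate mask.
    mask = [x > 0 for x in xs]
    then_vals = [x * 2 for x in xs]
    else_vals = [-x for x in xs]
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]

xs = [3, -1, 0, 5]
# The masked vector form matches running the scalar kernel per lane.
assert vectorized_kernel(xs) == [scalar_kernel(x) for x in xs]
```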

So, the hardware capabilities can be utilized just fine.




Modern NVIDIA GPUs (since Volta) drop that pretence at the ISA level: each thread has its own instruction pointer there.

On modern NVIDIA machines, your GPU ISA is scalar, not vector.



