Don't know if NVIDIA is keen to understand their hardware, but they are obviously very interested in users understanding their hardware. I originally started with i86 assembler and stopped after the i860, which as far as I remember was the first Intel processor with branch prediction. It's a nightmare for control freaks, especially on CISC processors with variable clock cycles.
GPU programming with CUDA and PTX feels like programming on a single-core CPU without tasks and threads, with deterministic behavior, but in a multidimensional space. And every hour spent avoiding an 'if' pays off in terms of synchronization and therefore speed.
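To make the point about avoiding an 'if' concrete, here is a minimal sketch of the same computation written with and without a data-dependent branch. All names (`scale_branchy`, `scale_branchless`, `scale_kernel`, `scale_on_gpu`) and the scaling factors are made up for illustration. Take it with a grain of salt: for a branch this small the compiler will often emit a predicated or select instruction on its own, so the gain really shows up with larger divergent code paths inside a warp.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical example: double values above a threshold, halve the rest.
// Branchy version: threads of one warp may take different paths.
__host__ __device__ inline float scale_branchy(float x, float threshold) {
    if (x > threshold) {
        return x * 2.0f;
    }
    return x * 0.5f;
}

// Branchless version: the comparison becomes 0.0f or 1.0f and picks
// the factor arithmetically, so every thread runs the same instructions.
__host__ __device__ inline float scale_branchless(float x, float threshold) {
    float above = static_cast<float>(x > threshold);  // 0.0f or 1.0f
    return x * (0.5f + 1.5f * above);                 // factor 0.5f or 2.0f
}

__global__ void scale_kernel(const float* in, float* out, int n, float threshold) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The bounds check is still an 'if', but it is only non-uniform
    // in the last, partially filled warp.
    if (i < n) {
        out[i] = scale_branchless(in[i], threshold);
    }
}

// Host helper (error checking omitted to keep the sketch short).
std::vector<float> scale_on_gpu(const std::vector<float>& host_in, float threshold) {
    int n = static_cast<int>(host_in.size());
    std::vector<float> host_out(n);
    if (n == 0) return host_out;

    float* dev_in = nullptr;
    float* dev_out = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dev_in, bytes);
    cudaMalloc(&dev_out, bytes);
    cudaMemcpy(dev_in, host_in.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(dev_in, dev_out, n, threshold);
    cudaDeviceSynchronize();

    cudaMemcpy(host_out.data(), dev_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_in);
    cudaFree(dev_out);
    return host_out;
}
```

Both helper functions are marked `__host__ __device__`, so you can compare them on the CPU first and then look at the generated PTX to see whether the branchy version really ends up as a branch or as a select.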