There's some great information about how GPUs are organized here. It's interesting that GPUs and CPUs aren't that different, even though maximizing performance requires different considerations. SIMD lanes are just wider on the GPU, and you have to treat memory differently (registers are bigger, main memory has higher latency but better bandwidth, etc).
It's also funny that Nvidia lists the total number of SIMD lanes as "CUDA cores", meaning that a single compute unit with 64 SIMD lanes counts as 64 CUDA cores. That's cheating, if you ask me :P
It's also funny that Nvidia lists the total number of SIMD lanes as "CUDA cores", meaning that a single compute unit with 64 SIMD lanes counts as 64 CUDA cores. That's cheating, if you ask me :P