Hacker News new | past | comments | ask | show | jobs | submit login

GPU instructions sets change every year/generation.

Your software would not run next year if you directly targeted the instruction set.

NVIDIA does document their PTX instruction set (a level above what the hardware actually runs):

https://docs.nvidia.com/cuda/parallel-thread-execution/index...




> GPU instructions sets change every year/generation.

1. It's every two-to-three years.

2. It's not like they change into something completely different.

3. PTX is not a hardware instruction set, it's just an LLVM IR variant

4. I don't need to only target an instruction set directly, but it does help to know what instructions the hardware actually executes. And this is just like for CPUs (ok, not just like, because CPUs have u-ops, and I don't know that GPUs have those).


4. The actual instruction set is called SASS. You can view it with Nsight, which can show all 3 levels, CUDA+PTX+SASS:

https://docs.nvidia.com/nsight-visual-studio-edition/3.2/Con...


You can view it, but there's no documentation for it, nor a listing of all instructions. You need to guess what the instructions actually do. Sometimes it's not so difficult, like IADD3; but sometimes it's not at all trivial.


The only "documentation" we have is in the form of the binary utility docs, which has a list of SASS instructions

https://docs.nvidia.com/cuda/cuda-binary-utilities/index.htm...

though there is no guarantee this is exhaustive, no opcodes either (though you could reverse engineer it using cuobjdump -sass and a hex editing like I've been doing). I'm pretty sure some of the instructions in the list are deprecated as well (95% percent sure that PMTRIG does nothing >Volta)


...But literally the exact same is the case for CPUs and yet we do have public and constant instruction set for decades now?

Altough ofcourse CPU's instruction are also just a frontend api that behind the scenes is implemented using microcode, which probably is much less stable.

But the point is, if we could move one level 'closer' on gpus, just like we have it on cpus, it would stop the big buisness gate-keeping that exsists when it comes to current day GPU apis/libraries


That's not true, you can run a 30 years old binary on modern CPUs. The machine code you feed to the CPU didn't change much.

That's not true for GPUs, the machine code changes very frequently. You feed your "binary" (PTX, ...) to the driver, and the driver compiles it to the actual machine code of your actual GPU.


But that's not inherent due to gpus completely revolutionizing their aproach every generation, is it?

The main difference is that with cpu, the translation unit is hidden inside the cpu itself. With gpus, the translation is moved from the device to the driver.

Old openGL code also will run in card that is newer then code itself.

The only difference is that with the cpus, it's open standard what is the instruction set, while on gpus, instruction sets are defined by third parties (DX12,Vulkan,OpenGL) while it falls to nvidia to implement them.


OpenGL code is actually running on the CPU. From this CPU running code you can send shaders to the driver, in source code form or bytecode form, and the driver does the final compilation to GPU machine code.


Pedantry. Yes you are running your code on the cpu, but you are using opengl defined api to do so. Having a text generation and parsing, or having a function calls passed to/from drivers- those a simply implementation details


Modern CPUs being able to run 30 year old binaries as-is is pretty different than a GPU requiring a driver to compile byte code into machine code at runtime.


How is that different?


You seem like you're being purposefully obtuse about this, so I'm going to disengage after this. But, generally speaking, a programming language which compiles to machine code which is executed directly by the CPU is widely considered to have a significantly different architecture than one which is compiled to byte code to be executed by a virtual machine. This architectural choice has many downstream ramifications. And the difference in execution model between CPUs and GPUs is similar in nature and comes with similar ramifications. Such as, for example, operating system support for GPUs.


Being obtuse is when you don't accept currently popular bounds of abstractions are God given axioms.


PTX is more or less the same as x86. What's even better is that unlike x86, Nvidia allows you to see and step through the SASS instructions which PTX compiles into. Although SASS is not documented, seeing it alongside the PTX makes it possible to infer the meaning. In contrast to this you can't see the micro-ops running on x86 cpus (although people have inferred quite a lot about them).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: