The Cell processor was a very different architecture from x86. It sacrificed cache coherency and required the programmer to manually manage each core's local memory, in exchange for state-of-the-art performance. This was all done in C (although a FORTRAN compiler was also available, of course). The Cell processor simply introduced new intrinsic functions to the C compiler to let the programmer access the new hardware functionality. It all worked perfectly fine with the rest of C, although people felt it was too difficult to program and the architecture quickly went extinct.
NVIDIA GPUs are also innovative hardware, mentioned in the article, and CUDA is also just an extension of C. CUDA is wildly popular, and lots of higher-level abstractions have been built on top of it. The only thing lower level than CUDA is the NVVM IR, which is generated by the C compiler (e.g. the LLVM NVVM backend) and is only compiled into final machine code by the GPU driver at run time. So C is the lowest level.
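To illustrate (a minimal sketch; the kernel name and buffers are hypothetical): the only non-C parts are the __global__ qualifier, the built-in thread indices, and the <<<...>>> launch syntax, while the body is plain C.

    // Minimal CUDA kernel sketch: C with a few extensions.
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // built-in thread index
        if (i < n)
            y[i] = a * x[i] + y[i];                      // plain C expression
    }

    // Launched from ordinary host-side C code, e.g.:
    // saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);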
The problem doesn't lie with the language; it lies with x86 processors and the different trade-offs that companies like Intel must make, such as trying to sell processors to developers who have been instructed by their employers to be productive and use a "safe", high-level language (e.g. Java CRUD application developers, JavaScript web developers, etc.).
> NVIDIA GPUs are also innovative hardware, mentioned in the article, and CUDA is also just an extension of C
CUDA uses a completely different programming model from C. There is no such thing as a C virtual machine behind CUDA, and just because the syntax is identical (to make adoption easier) doesn't mean the semantics are.
> The problem doesn't lie with the language, it lies with the x86 processors and different trade-offs that companies like Intel must make
You are aware of the fact that the mentioned UltraSPARC Tx CPUs are also highly susceptible to Spectre? These CPUs feature up to 8x SMT ("hyperthreading" in Intel jargon) and thus face the same issues as x86 when it comes to parallelism and speculative execution.
The problems are evident across hardware architectures.
CUDA's programming model is not /completely/ different from C's. It's not even /very/ different. Most of the C abstract machine (what I think you meant when you wrote virtual machine) carries over directly.
What is quite different is the performance characteristics of certain constructs due to the underlying GPU architecture (esp. memory access and branches).
Obviously, there are extensions related to GPU-specific things, but those are quite few and far between (though important for performance). Almost everything related to GPU control looks and acts like library functions.
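For instance (just a sketch, not a complete program, and the buffer is hypothetical), typical device control reads like ordinary C library calls:

    int count = 0;
    cudaGetDeviceCount(&count);                           // query available GPUs
    cudaSetDevice(0);                                     // pick one
    float *d_buf = NULL;
    cudaMalloc((void **)&d_buf, 1024 * sizeof(float));    // allocate device memory
    cudaMemset(d_buf, 0, 1024 * sizeof(float));           // initialize it
    cudaDeviceSynchronize();                              // wait for outstanding work
    cudaFree(d_buf);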
> Most of the C abstract machine (what I think you meant when you wrote virtual machine) carries over directly.
Tomato tomato, yes I meant the abstract machine.
C's abstractions, however, do not carry over directly. C's programming model is strictly serial, whereas CUDA's model is task parallel.
CUDA assumes a memory hierarchy and separate memory spaces between a host and a device - both concepts are fundamentally unknown in the C programming model.
The lowest level of abstraction in CUDA is a thread, whereas threads are optional in C and follow rules that don't apply in CUDA (and vice versa). There's no thread hierarchy in C, and type qualifiers like volatile are specified differently.
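A rough sketch of what I mean (hypothetical names; "scale" stands for some __global__ kernel) - the separate allocations, the explicit copy between address spaces, and the launch geometry have no counterpart in the C abstract machine:

    int n = 1 << 20;
    float *h_buf = (float *)malloc(n * sizeof(float));    // host address space
    float *d_buf = NULL;
    cudaMalloc((void **)&d_buf, n * sizeof(float));       // separate device address space
    cudaMemcpy(d_buf, h_buf, n * sizeof(float),
               cudaMemcpyHostToDevice);                   // explicit transfer between the two
    dim3 grid(4096), block(256);                          // grid/block/thread hierarchy
    scale<<<grid, block>>>(d_buf, n);                     // runs on the device, not the host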
The assignment operator in CUDA is a different beast from C's assignment with very specific rules derived from the host/device-separation.
Function parameters behave differently between C and CUDA (e.g. C has nothing like the 4 KiB limit on parameters passed to a __global__ function; in fact, no such mechanism even exists in C).
I could continue with the different semantics and rules of type qualifiers like "volatile" and "static", scopes, linkage, storage duration, etc. But I won't.
CUDA uses C++ syntax and a few extensions to make C(++) programmers feel at home and provide a heterogeneous computing environment that doesn't rely on API calls (like OpenCL does). That doesn't mean both environments share the same programming model and semantics, starting with the separation of host and device, which isn't a C concept.
Yes, everything you write is true, it's a great list. I wish Nvidia had a section of their programming guide that succinctly stated the differences. I've been writing CUDA for a long time, and once you grok the host/device stuff, it's still mostly just C/C++. I've only been bitten by a few of these things a handful of times over the last 10 years, and only when doing something "fancy".
Niagara doesn't do speculative execution, and its SMT model is wildly different from Intel's - specifically, it is a variation of a barrel CPU, where you duplicate a minimal amount of resources to hold state and then execute X instructions per thread in round-robin fashion. A similar setup is used on POWER8 and newer (which allows you to dynamically change the number of threads available).
The CPUs still include speculative execution (starting with the T3, Oracle introduced speculative and out-of-order execution in the S3 pipeline), and Oracle had to release patches to mitigate the Spectre v1 and v2 vulnerabilities; see Oracle Support Document 2349278.1.