The Cell processor was a very different architecture from x86. It sacrificed cache coherency and required the programmer to manually manage each core's local memory, in exchange for state-of-the-art performance. This was all done in C (although a FORTRAN compiler was also available, of course). The Cell processor simply introduced new intrinsic functions to the C compiler to let the programmer access the new hardware functionality. It all worked perfectly fine with the rest of C, although people felt it was too difficult to program and the architecture quickly went extinct.
NVIDIA GPUs are also innovative hardware, mentioned in the article, and CUDA is also just an extension of C. CUDA is wildly popular, and lots of higher-level abstractions have been built on top of it. The only thing lower level than CUDA is the NVVM IR, which is generated by the C compiler (e.g. the LLVM NVVM backend) and is only compiled into final machine code by the GPU driver at run time. So C is the lowest level.
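To illustrate (a minimal sketch; the kernel name and buffers are hypothetical): the only non-C parts are the __global__ qualifier, the built-in thread indices, and the <<<...>>> launch syntax, while the body is plain C.

    // Minimal CUDA kernel sketch: C with a few extensions.
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // built-in thread index
        if (i < n)
            y[i] = a * x[i] + y[i];                      // plain C expression
    }

    // Launched from ordinary host-side C code, e.g.:
    // saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);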
The problem doesn't lie with the language; it lies with x86 processors and the different trade-offs that companies like Intel must make, such as trying to sell processors to developers who have been instructed by their employers to be productive and use a "safe", high-level language (e.g. Java CRUD application developers, JavaScript web developers, etc.).
> NVIDIA GPUs are also innovative hardware, mentioned in the article, and CUDA is also just an extension of C
CUDA uses a completely different programming model from C. There is no such thing as a C virtual machine behind CUDA, and just because the syntax is identical (to make adoption easier) doesn't mean the semantics are.
> The problem doesn't lie with the language, it lies with the x86 processors and different trade-offs that companies like Intel must make
You are aware of the fact that the mentioned UltraSPARC Tx CPUs are also highly susceptible to Spectre? These CPUs feature up to 8x SMT ("hyperthreading" in Intel jargon) and thus face the same issues as x86 when it comes to parallelism and speculative execution.
The problems are evident across hardware architectures.
CUDA's programming model is not /completely/ different from C's. It's not even /very/ different. Most of the C abstract machine (what I think you meant when you wrote virtual machine) carries over directly.
What is quite different is the performance characteristics of certain constructs due to the underlying GPU architecture (esp. memory access and branches).
Obviously, there are extensions related to GPU-specific things, but those are quite few and far between (though important for performance). Almost everything related to GPU control looks and acts like library functions.
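For instance (just a sketch, not a complete program, and the buffer is hypothetical), typical device control reads like ordinary C library calls:

    int count = 0;
    cudaGetDeviceCount(&count);                           // query available GPUs
    cudaSetDevice(0);                                     // pick one
    float *d_buf = NULL;
    cudaMalloc((void **)&d_buf, 1024 * sizeof(float));    // allocate device memory
    cudaMemset(d_buf, 0, 1024 * sizeof(float));           // initialize it
    cudaDeviceSynchronize();                              // wait for outstanding work
    cudaFree(d_buf);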
> Most of the C abstract machine (what I think you meant when you wrote virtual machine) carries over directly.
Tomato tomato, yes I meant the abstract machine.
C's abstractions, however, do not carry over directly. C's programming model is strictly serial, whereas CUDA's model is task parallel.
CUDA assumes a memory hierarchy and separate memory spaces between a host and a device - both concepts are fundamentally unknown in the C programming model.
The lowest level of abstraction in CUDA is a thread, whereas threads are optional in C and follow rules that don't apply in CUDA (and vice versa). There's no thread hierarchy in C, and type qualifiers like volatile are specified differently.
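A rough sketch of what I mean (hypothetical names; "scale" stands for some __global__ kernel) - the separate allocations, the explicit copy between address spaces, and the launch geometry have no counterpart in the C abstract machine:

    int n = 1 << 20;
    float *h_buf = (float *)malloc(n * sizeof(float));    // host address space
    float *d_buf = NULL;
    cudaMalloc((void **)&d_buf, n * sizeof(float));       // separate device address space
    cudaMemcpy(d_buf, h_buf, n * sizeof(float),
               cudaMemcpyHostToDevice);                   // explicit transfer between the two
    dim3 grid(4096), block(256);                          // grid/block/thread hierarchy
    scale<<<grid, block>>>(d_buf, n);                     // runs on the device, not the host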
The assignment operator in CUDA is a different beast from C's assignment with very specific rules derived from the host/device-separation.
Function parameters behave differently between C and CUDA (e.g. C has nothing like the 4 KiB limit on parameters passed to a __global__ function; in fact, no such mechanism even exists in C).
I could continue with the different semantics and rules of type qualifiers like "volatile" and "static", scopes, linkage, storage duration, etc. But I won't.
CUDA uses C++ syntax and a few extensions to make C(++) programmers feel at home and provide a heterogeneous computing environment that doesn't rely on API calls (like OpenCL does). That doesn't mean both environments share the same programming model and semantics, starting with the separation of host and device, which isn't a C concept.
Yes, everything you write is true, it's a great list. I wish Nvidia had a section of their programming guide that succinctly stated the differences. I've been writing CUDA for a long time, and once you grok the host/device stuff, it's still mostly just C/C++. I've only been bitten by a few of these things a handful of times over the last 10 years, and only when doing something "fancy".
Niagara doesn't do speculative execution, and its SMT model is wildly different from Intel's - specifically, it is a variation of a barrel CPU, where you duplicate a minimal amount of resources to hold state and then execute X instructions per thread in round-robin fashion. A similar setup is used on POWER8 and newer (which allows you to dynamically change the number of threads available).
The CPUs still include speculative execution (starting with the T3, Oracle introduced speculative and out-of-order execution in the S3 pipeline), and Oracle had to release patches to mitigate the Spectre v1 and v2 vulnerabilities; see Oracle Support Document 2349278.1.