C Is Not a Low-level Language (2018) (acm.org)
204 points by goranmoomin on Dec 28, 2021 | 192 comments



The Cell processor was a very different architecture from x86. It sacrificed cache coherency and required the programmer to manually manage each core's cache, in exchange for state-of-the-art performance. This was all done in C (although a FORTRAN compiler was also available, of course). The Cell processor simply introduced new intrinsic functions to the C compiler, to allow the programmer to access the new hardware functionality. It all worked perfectly fine with the rest of C, although people felt it was too difficult to program and the architecture quickly went extinct.

NVIDIA GPUs are also innovative hardware, mentioned in the article, and CUDA is also just an extension of C. CUDA is wildly popular, and lots of higher-level abstractions have been built on top of it. The only thing lower level than CUDA is the NVVM IR code, which is generated by the C compiler (e.g. the LLVM NVVM backend) and is only compiled into final machine code by the GPU driver at run time. So C is the lowest level.

The problem doesn't lie with the language, it lies with the x86 processors and different trade-offs that companies like Intel must make, such as trying to sell processors to developers who have been instructed by their employers to be productive and use a "safe" and high level language (e.g. Java CRUD application developers, or JavaScript web developers, etc).

edit: typos


> NVIDIA GPUs are also innovative hardware, mentioned in the article, and CUDA is also just an extension of C

CUDA uses a completely different programming model from C. There is no such thing as a C virtual machine behind CUDA, and just because the syntax is identical (to make adoption easier) doesn't mean the semantics are.

> The problem doesn't lie with the language, it lies with the x86 processors and different trade-offs that companies like Intel must make

You are aware that the mentioned UltraSPARC Tx CPUs are also highly susceptible to Spectre? These CPUs feature up to 8x SMT ("hyperthreading" in Intel jargon) and thus face the same issues as x86 when it comes to parallelism and speculative execution.

The problems are evident across hardware architectures.


CUDA's programming model is not /completely/ different from C's. It's not even /very/ different. Most of the C abstract machine (what I think you meant when you wrote virtual machine) carries over directly.

What is quite different is the performance characteristics of certain constructs due to the underlying GPU architecture (esp. memory access and branches). Obviously, there are extensions related to GPU-specific things, but those are quite few and far between (though important for performance). Most everything related to GPU control looks and acts like library functions.


> Most of the C abstract machine (what I think you meant when you wrote virtual machine) carries over directly.

Tomato tomato, yes I meant the abstract machine.

C's abstractions, however, do not carry over directly. C's programming model is strictly serial, whereas CUDA's model is task parallel.

CUDA assumes a memory hierarchy and separate memory spaces between a host and a device - both concepts are fundamentally unknown in the C programming model.

The lowest level of abstraction in CUDA is a thread, whereas threads are optional in C and follow rules that don't apply in CUDA (and vice versa). There's no thread hierarchy in C, and type qualifiers like volatile are specified differently.

The assignment operator in CUDA is a different beast from C's assignment, with very specific rules derived from the host/device separation.

Function parameters behave differently between C and CUDA (e.g. CUDA's 4 KiB limit on parameters passed to a __global__ function; no such mechanism even exists in C).

I could continue with the different semantics and rules of type qualifiers like "volatile" and "static", scopes, linkage, storage duration, etc. But I won't.

CUDA uses C++ syntax and a few extensions to make C(++) programmers feel at home and provide a heterogeneous computing environment that doesn't rely on API calls (like OpenCL does). That doesn't mean both environments share the same programming model and semantics, starting with the separation of host and device, which isn't a C concept.
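
To make the host/device split concrete, here is a minimal host-side sketch against the CUDA runtime's C API (sizes are arbitrary and error checking is omitted; the kernel launch itself is left out because the <<<...>>> launch syntax requires nvcc and is exactly the kind of construct plain C has no notion of):

    #include <cuda_runtime.h>
    #include <stdlib.h>

    int main(void) {
        const size_t n = 1 << 20;
        float *host_buf = malloc(n * sizeof *host_buf);
        float *dev_buf = NULL;

        /* Device memory lives in a separate address space: dev_buf is a valid
           C pointer value, but the host must never dereference it. */
        cudaMalloc((void **)&dev_buf, n * sizeof *dev_buf);

        /* Moving data between the two spaces is an explicit, directional copy. */
        cudaMemcpy(dev_buf, host_buf, n * sizeof *dev_buf, cudaMemcpyHostToDevice);
        /* ... a kernel launch would go here ... */
        cudaMemcpy(host_buf, dev_buf, n * sizeof *dev_buf, cudaMemcpyDeviceToHost);

        cudaFree(dev_buf);
        free(host_buf);
        return 0;
    }

Two address spaces and explicit, directional copies have no counterpart in the C abstract machine.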


Yes, everything you write is true, it's a great list. I wish Nvidia had a section of their programming guide that succinctly stated the differences. I've been writing CUDA for a long time, and once you grok the host/device stuff, it's still mostly just C/C++. I've only been bitten by a few of these things a handful of times over the last 10 years, and only when doing something "fancy".


This. Is why I love parsing HN’s comments. Thank you.


Niagara doesn't do speculative execution, and its SMT model is wildly different from Intel's. Specifically, it is a variation of a barrel CPU, where you duplicate a minimal amount of resources to hold state and then execute X instructions per thread in round-robin fashion. A similar setup is used on POWER8 and newer (which allows you to dynamically change the number of threads available).


The CPUs still include speculative execution (starting with the T3, Oracle introduced speculative and out-of-order execution in the S3 pipeline), and Oracle had to release patches to mitigate the Spectre v1 and v2 vulnerabilities; see Oracle Support Document 2349278.1.


NVVM IR compiles down to PTX which then is compiled by the driver to a shader language (or something like that).


This is a rewording of the original title, which is:

C Is Not a Low-level Language: Your computer is not a fast PDP-11.

"Trying to expose a PDP-11" is misleading here, because how it's written in this title suggests that C is not a low level language because it fails at exposing a PDP-11.

Rather, the article argues that modern processors are built to expose an abstraction resembling a fast PDP-11, which lacks any real way to represent their true complexity.

It's an interesting article, but this misleading title might give the wrong idea.


This article presents a false premise.

Intel (and other) processors present a serial programming model in spite of parallelism under the hood because that is required for stable semantics in assembly language programming, and for the instruction set architecture to be a stable target for any higher level language whatsoever. It's not because of the expectations of the C programming model.

The unoptimized, instruction by instruction fetch-decode-execute model of the instruction set pins down what the code means, which is super important, otherwise there is chaos.

Moreover, machine language executables must continue to work across evolution of the architecture family. The way a new Intel processor is today has as much to do with C compilers as it does with the need for someone out there to run MS-DOS or Windows 95.

Compilers could better deal with chaos at the architecture level, because breakages at the architecture level can be treated as a new back-end target, and code can be recompiled. It's the code that doesn't get recompiled that you have to worry about.


>Moreover, machine language executables must continue to work across evolution of the architecture family. The way a new Intel processor is today has as much to do with C compilers as it does with the need for someone out there to run MS-DOS or Windows 95.

Apologies, but I think this belief that compatibility must be maintained is false. Apple has made three incompatible binary architecture transitions and dealt with the issues via emulation. Each time, the performance gains from the new architecture have masked the inefficiencies of the emulation layer until applications could be rebuilt. It just takes a willingness to say no to legacy compatibility. Microsoft is finally trying to take a similar step with WinARM. I suspect the x86 architecture will get relegated to the dustbin within a few more years. The instruction set is just too polluted and chaotic at this point. As an example, the Apple M1 instruction decoder is much wider than x86's, and part of why that was practical to implement in a power-efficient way is that the M1 instruction set is much less variable in length.


But converting (even just-in-time) between two architectures is not that hard — if all new code could get better performance/efficiency at minimal overhead for “legacy” applications, it may well be worth it.


"Low level" is relative. The article says that there doesn't exist any "low level" languages that programmers can use, this isn't a very helpful definition. If we go by the languages that programmers actually have access to, then machine code is as close as you get to the metal, and C maps very well to machine code. So by any reasonable definition C is a low level language for programmers. For a hardware engineer who works on CPU architecture it is of course different, but we are talking from a programmers perspective here.


Sigh...

The point the article is making is that this is "low level" only in the programmer's mind. For example, the mental model you have when programming in Python is one mental model, and the model you have for C is another. C's might be faster in most implementations, but it is still a mental model, and one that is somewhat removed from what actually happens when your CPU executes instructions.

People have trained themselves to think in the Python (or JS or Java, etc.) mental model while remembering that it doesn't map to what actually happens 100%. People have NOT made that leap for C, because people still labor under the delusion that their CPU is a fast PDP-11.


Most of the time when I've written performance-sensitive C or C++, I've spent most of it looking at disassembly or pipeline models, trying to spin up an instruction mix that will utilize the CPU backend in a good way.

In the best case, the compiler finds the instructions I want, and often it does a very good job with details like register allocation.

So the abstraction leaks and inverts like hell. But still, in many cases the C program can remain portable, even if none of the performance tuning is - on a different chip it targets the wrong pipeline and memory subsystem traits.

When you need to rely on intrinsics, it is different - but I'd say intrinsics are closer to inline asm than to C. Saying that as someone who might have ported a DSP library from SSE4 to AltiVec mostly with #defines. I know. Also, I'm happy it wasn't the other way around!
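
As a rough sketch of what such a #define port can look like (the wrapper names here are made up; the real pain is in semantic mismatches, e.g. alignment and unaligned-load behaviour, which a sketch like this glosses over):

    #if defined(__SSE__)
      #include <xmmintrin.h>
      typedef __m128 vec4f;
      #define vload(p)     _mm_load_ps(p)      /* assumes 16-byte aligned p */
      #define vadd(a, b)   _mm_add_ps((a), (b))
      #define vstore(p, v) _mm_store_ps((p), (v))
    #elif defined(__ALTIVEC__)
      #include <altivec.h>
      typedef vector float vec4f;
      #define vload(p)     vec_ld(0, (p))      /* also wants 16-byte alignment */
      #define vadd(a, b)   vec_add((a), (b))
      #define vstore(p, v) vec_st((v), 0, (p))
    #endif

    /* Add two float arrays four lanes at a time; n must be a multiple of 4
       and the pointers 16-byte aligned. */
    static void add4(float *dst, const float *a, const float *b, int n) {
        for (int i = 0; i < n; i += 4)
            vstore(dst + i, vadd(vload(a + i), vload(b + i)));
    }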


What is a "pipeline model"?

Google doesn't give me any relevant results

EDIT: Think I found an answer

  > "Some very experienced programmer from another company told me about some low-level code-optimization tips that targeting specific CPU, including pipeline-optimization, which means, arrange the code (inlined assembly, obviously) in special orders such that it fit the pipeline better for the targeting hardware."
From https://stackoverflow.com/questions/14657247/pipeline-optimz...

An answer in the comments points to this as a resource:

https://www.agner.org/optimize/


What I think of as "pipeline model" is similar to what LLVM-MCA produces in its "Timeline View" [1]

It basically tries to model statically how instructions travel through the pipeline of a particular CPU, which is useful for finding bottlenecks.

1: https://llvm.org/docs/CommandGuide/llvm-mca.html#timeline-vi...
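
For a concrete workflow (the flags and CPU name are just one plausible choice), compile a small kernel to assembly and pipe it into llvm-mca:

    /* dot.c -- one possible way to inspect it:
         clang -O2 -S -o - dot.c | llvm-mca -mcpu=skylake --timeline
       llvm-mca then models, cycle by cycle, how the emitted instructions
       move through the chosen CPU's pipeline and where they stall. */
    float dot(const float *a, const float *b, int n) {
        float acc = 0.0f;
        for (int i = 0; i < n; ++i)
            acc += a[i] * b[i];
        return acc;
    }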


GP probably means something like IACA


Slightly off topic question: how can I get started with pipeline models and with optimizing the instructions that get executed? What tools would one use to inspect what gets executed, trace the instructions and measure the execution time?

Could you recommend any books on this topic? Thanks!


Ah, one resource that slipped my mind was Jon Stokes' microarchitecture articles [0] at Ars Technica when it was still good (it's all gadget/lifestyle/policy stuff nowadays).

Jon also has this book which I seem to remember was fairly good [1]. Don't be put off by the age - uarch on CPU side has mostly been more and more of the same for x86 chips.

0: https://arstechnica.com/author/hannibal/

1: https://www.goodreads.com/book/show/610830.Inside_the_Machin...


One good starting point is the LLVM Machine Code Analyzer [1]

What it does is use the scheduling info known to the LLVM optimizers to model how a particular CPU is going to execute your machine code.

I was lucky to start doing this back in the day when the 486 was common and the Pentium was brand new. The 80486 could do some instructions in parallel if you arranged them very carefully, and the Pentium greatly boosted this capability. I used AMD CodeAnalyst (free) back then. I read Zen of Code Optimization by Mike Abrash, which explained those particular microarchitectures very carefully. It may still be worth reading to understand how CPUs have evolved, but as it is uarch-specific it will not be of great practical use.

The pipelines back then were simple enough to memorize, so I spent some boring classes in senior high school plotting various software blitter algorithms on grid paper. Nowadays the superscalar capability is huge, and you are better off taking a more statistical approach first - which execution units are stalled or underutilized - and seeing if you can tweak the instruction mix or find a false dependency that prevents register renaming.

For someone starting out I would recommend studying some smaller Arm chip that has limited superscalar capabilities. Sadly I can't name drop a book that would be a great help in that.

1: https://llvm.org/docs/CommandGuide/llvm-mca.html


Thank you so much. This is awesome info. I'll check out LLVM MCA and I've also ordered the book. Happy holidays!


compiler inspector is fun: https://gcc.godbolt.org/

perf related tools on linux: https://perf.wiki.kernel.org/index.php/Main_Page

Didn't read it but it's a well known reference: the dragon book (Compilers: Principles, Techniques, and Tools) 2nd edition has stuff on machine-dependent optimizations (chapters 10 and 11 apparently). Most likely a good read.


Context for others who may have not read "Compilers" by Aho: it's a great textbook and a wonderful resource to learn about compilers, but I wouldn't call it a "good read". A good read is "The Pragmatic Programmer", for some. The dragon book is raw knowledge and it's filled with proofs.

The content in the dragon book is the equivalent of a two semester course on compilers, and that's with a prof and TA. Ideally, to get the full benefit of reading this book, you need to reserve a year of your free time after 5pm and be ready to build an optimizing compiler.

It's one of those books that requires your full attention for an extended period of time and you come out the other side a stronger developer, only because it didn't kill you with knowledge.

Edit: I realized now you may be saying the 2 chapters (10 and 11) are most likely a good read to learn about optimizations, not that the entire book is a "good read" in general. Makes sense - I'll leave the comment up, with the disclaimer that I'm referring to reading the entire book cover to cover.


The dragon book is a bad compiler book.

CPU pipelines are extremely dynamic since the slowest actions affecting them (memory reads) are also the least predictable (depends on what ends up in the cache). So it’s usually not worth trying to control them so precisely, but knowing how the first layers work can be good. For x86 the best resources are Agner Fog’s manuals and then the official Intel/AMD ones.


Agreed with "usually". When you model pipelines, you must know the cache behaviour of your workload, otherwise it is a waste of time. But when you do, in order to approach theoretical machine limits, you do need airtight core resource utilization in your kinner loops.


> trying to spin up an instruction mix

But this may change from compiler to compiler and from version to version of the same compiler. I think you may be better off writing assembly code directly rather than trying to coerce the compiler to do exactly what you want.


You'd be surprised how stable it actually is in practice. The occasional big swings in benchmarks tend to be due to compiler A pattern-matching an idiom that compiler B does not. Regressions from A.1 to A.2 are rare, and usually either bugs or a sign that the optimizer's default target has shifted and now neglects whatever uarch you regressed on.


> People have trained themselves to think in the Python (or JS or Java, etc.) mental model while remembering that it doesn't map to what actually happens 100%. People have NOT made that leap for C, because people still labor under the delusion that their CPU is a fast PDP-11.

I don't see this. I worked on high-performance software, and everyone was aware that the CPU isn't just a fast PDP-11. This was before 2018, so teaching C programmers about their language can't be the point of the article; everyone who needs to care about performance already knows these things.


> The article says that there don't exist any "low level" languages that programmers can use, which isn't a very helpful definition.

This is not the definition used in the article. There is literally a paragraph called "what is a low level language" at the beginning:

> Think of programming languages as belonging on a continuum, with assembly at one end and the interface to the Starship Enterprise's computer at the other. Low-level languages are "close to the metal," whereas high-level languages are closer to how humans think. For a language to be "close to the metal," it must provide an abstract machine that maps easily to the abstractions exposed by the target platform. It's easy to argue that C was a low-level language for the PDP-11.

--

> C maps very well to machine code.

The article's core point is literally a contradiction to this. I'm not sure what to say. Perhaps you ought to read it again or provide more arguments as to why you think it is true.

> A modern Intel processor has up to 180 instructions in flight at a time (in stark contrast to a sequential C abstract machine [..])

> [..] the C abstract machine's memory model: flat memory. This hasn't been true for more than two decades.

> Unfortunately, simple translation providing fast code is not true for C.

--

> "Low level" is relative.

> For a hardware engineer who works on CPU architecture it is of course different, but we are talking from a programmer's perspective here.

The point being made in the article is that programmers do care about cache and pipeline behavior (to get decent performance in intensive parts) and about threading, both of which are transparent to C. And also that languages otherwise seen as "high-level" (usually because of memory management) sometimes have aspects that map better to these features (hence are lower level than C in some other aspects).


You say “ This is not the definition used in the article. There is literally a paragraph called "what is a low level language" at the beginning:”

Then say C fails this test of being low-level:

> A modern Intel processor has up to 180 instructions in flight at a time (in stark contrast to a sequential C abstract machine [..])

> [..] the C abstract machine's memory model: flat memory. This hasn't been true for more than two decades.

> Unfortunately, simple translation providing fast code is not true for C.

None of this is true for assembly either. To my knowledge, neither x86 nor ARM exposes primitives to control runtime instruction parallelism or the cache. You just have to know what that looks like and adjust for it implicitly as best you can (still with no guarantees). Maybe a CPU analogue of Vulkan would work, where there are low-level primitives and vendor-specific plugins, but I don't know. GPU programming has gotten significantly harder and more error-prone with Vulkan. Additionally, game programming doesn't need pixel-perfect behavior, while you do typically want that out of your CPU (yes, that's slightly inaccurate given GPU compute advancements, but that's still significantly more expensive as a development target and reserved for problems where the value is worth it).


You'll find that you are indeed correct, and that it is the point of the article that neither C nor assembly is a compelling low-level programming language.


But by that definition, programming in binary isn't low level either! If, by that definition, no low-level programming is possible, then it's not a very useful definition.

(I mean, the point that "it's higher level than you think" might be a reasonable one to make. But arguing about the definition of "low level" may not be the best way to make that point.)


I agree with your take, but I still find the article really informative regarding the relative high-levelness of x86 instructions and as someone else wrote: “while js, java, etc. developers learned that the programming model used by the language is quite distinct from the CPU’s one, C programmers like to live in their fantasy land where they just have a very fast PDP-11” (I may have combined a few sentences from the thread, but the point is the same)

By the way: what's your take on VLIW architectures? Possibly with the current hardware-level optimizations moved to software?


So I think we could say that 1) C is low level, and 2) the distance between C and what's really going on is much larger than it used to be, partly because the distance between assembly opcodes and what's really going on is much larger than it used to be.

Re VLIW architectures: I don't know enough to have a take about them specifically. But I fear moving the optimizations from hardware to software, because it's going to be really hard to optimize as well as Intel/AMD/Arm do, and it's going to be really easy to mess something up. (I don't write assembly if I can help it - I let the compiler writer do it. I don't manage memory if I don't need to. And so on.)

I can see some people might benefit from this, but I suspect that it's bleeding-edge people only. Most of us won't gain from going this route.

Note well: Just my take. Your mileage may vary.


They do expose cache control operations, but they’re rarely useful. If you get a lot of cache misses then you can add prefetches but this is very sensitive to what processor you’re on so it’s not always worth it.
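
For example, a sketch using the GCC/Clang __builtin_prefetch builtin (the rough x86 intrinsic equivalent is _mm_prefetch); whether it helps at all depends heavily on the data layout, the access pattern and the specific CPU:

    struct node { long value; struct node *next; };

    long sum_list(const struct node *n) {
        long total = 0;
        while (n) {
            /* Hint that the next node will be read soon; arguments are
               address, 0 = read, 1 = low temporal locality. Purely advisory. */
            if (n->next)
                __builtin_prefetch(n->next, 0, 1);
            total += n->value;
            n = n->next;
        }
        return total;
    }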

And of course if you’re writing EFI code sometimes you haven’t turned the RAM on yet and cache is what you’re executing out of.


> The article's core point is literally a contradiction to this.

No. The article's core point is that machine code maps poorly to what the CPU does.

Modern CPUs expose no low level access to the way they execute code.


> For a language to be "close to the metal," it must provide an abstract machine that maps easily to the abstractions exposed by the target platform.

C does map perfectly well to the abstractions exposed by an x86 CPU. By this definition, C is a low-level language. The author himself states that modern processors are fast PDP-11 emulators. This is exactly the "abstractions exposed by the target platform".

If C is not a low level language by that definition, then nothing is. Assembly uses the same PDP-11 abstract machine model that C does. Hence the parents point that this definition is useless.


> The author himself states that modern processors are fast PDP-11 emulators.

Not sure how to say it, but they state the exact opposite! Modern processors are not PDP-11 emulators, and people are led to believe they are because of the immense work done by compilers (hence not low level).

> Assembly uses the same PDP-11 abstract machine model that C does.

Not really (assembly has things for SMP memory consistency and cache management), but even that is irrelevant, since the article's point is a call to stop shoehorning modern processor capabilities onto that old sequential model and to design an actual low-level language with full access to these capabilities (explicitly, not implicitly as is currently done for e.g. ILP).


>Not sure how to say it, but they state the exact opposite! Modern processors are not PDP-11 emulators.

The author very clearly states that modern processors _are_ presenting a PDP-11 interface. The author argues that they shouldn't be doing that, but all modern processors are still presenting a PDP-11-like abstract machine:

> The root cause of the Spectre and Meltdown vulnerabilities was that processor architects were trying to build not just fast processors, but fast processors that expose the same abstract machine as a PDP-11.


It's not done by the compilers, though. It's done by the CPU itself.


Yeah, low level is relative. Assembly would be the place to manipulate hardware, but then the developer needs to have knowledge of the chip design, etc.

Depending on what you want to do. One can also go a level lower with machine code, or with physics, as with some quantum computers coming on the scene nowadays.


> C maps very well to machine code

Again, that depends. If you're a compiler engineer, it's likely you might disagree with that statement, because your job is to create passes that change that C into machine code that barely resembles the original source but performs the same operation, faster - except that "same operation" isn't even defined in terms of the machine code; it's defined in terms of an abstract machine that doesn't exist outside of the C standard. Does C really map well to machine code when the compiler reorders your statements, enregisters your variables, unrolls your loops and inlines your functions? I think that really depends on where you're sitting on the ladder of relativity.
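
A tiny illustration of the gap: given a loop like the one below, a modern optimizer is free to unroll it, vectorize it, keep sum in a register throughout, or even replace the loop with a closed-form expression (Clang, for instance, does this at -O2), so the emitted machine code need not contain a loop at all.

    long triangle(long n) {
        long sum = 0;
        for (long i = 0; i < n; ++i)
            sum += i;
        return sum;   /* often compiled to roughly n * (n - 1) / 2 */
    }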


C is a 3rd-level-of-abstraction language. Currently, there is no 4th level of abstraction, so C is a high-level language by the definition of «level of abstraction».

Language levels of abstraction:

1. Machine code

2. Assembler

3. A high-level language

C does allow low-level things, e.g. access to machine registers or embedding of assembler or machine code, but it is also capable of quite a high level of abstraction.

Some 3rd-level languages allow a higher level of abstraction, or hide low-level details, e.g. memory management, the stack, byte order, bits per number, etc.; however, none of them improves programmer productivity by an order of magnitude.
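
As a sketch of that kind of low-level escape hatch (GCC/Clang extended inline assembly, x86-64 only; a compiler extension, not standard C):

    #include <stdint.h>

    /* Read the x86 timestamp counter. RDTSC returns the low 32 bits in EAX
       and the high 32 bits in EDX, hence the "=a" / "=d" constraints. */
    static inline uint64_t read_tsc(void) {
        uint32_t lo, hi;
        __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }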


Err... what? Did you make these terms up yourself, or are these established in some literature already? Because they make no sense.

C isn't converted into assembler language unless you ask the compiler to. For example, LLVM lowers IR AST structures into target specific machine code. Assembler is never involved.

ASM is just a human readable representation of machine code for a particular ISA. So in your mental model, C and ASM exist at the same level.

Further, Java bytecode would exist above the machine code level (again, alongside C then, since the VM is compiled to machine code) and Java on top of that.

It just doesn't make any sense to define languages like this. I don't see any benefit, and I don't even see it as being correct.


perhaps he/she meant 3rd generation language. https://en.wikipedia.org/wiki/Third-generation_programming_l...


The first 3 generations of languages were powered by an increased level of abstraction, which was enabled by the increased capabilities of hardware. The 4th generation can (will?) be powered by AI.


Just because it's on Wikipedia doesn't mean it's a real thing. The only source for that page is some bizarro, circa-1998 website about "computer jargon". These are not academic terms that I've personally ever heard of. Happy to be wrong, but I'm not going to pretend they have any utility or legs.


3GL, 4GL, and 5GL were common terms 30 years ago or so. More industry and advertising terms than academic. And 5GL was more like a Japanese government grant program than any particular set of language features, IIRC.


My memory of the whole "nGL" language thing was that it was more a marketing term than anything you would find in an academic paper. It was particularly common in the IBM world, for example in books[1] and advertisements[2]

[1] https://www.oldcomputerbooks.com/pages/books/R504/james-mart...

[2] I can't find a good example, but I remember seeing ads for "4GL" tools in Dr Dobbs regularly in the mid-90s.


On the PC, I remember platforms such as dBase were often referred to as 4GLs. In retrospect, it feels like it was a short way to say "very high-level domain-specific language for databases".


The "high-level lang, assembly, machine code" model is pretty much how it's taught in schools and random internet tutorials. As usual, the real world is a little more nuanced.


> C isn't converted into assembler language unless you ask the compiler to. For example, LLVM lowers IR AST structures into target specific machine code. Assembler is never involved

That depends on the compiler. GCC, for instance, most certainly emits assembler code as part of its standard compilation process.


GCC, to my knowledge, has long since switched to an IR as well. It's called GIMPLE [1]. Can you please substantiate your claim?

[1]. https://gcc.gnu.org/wiki/GIMPLE


Well, there's this[1] page that describes the GCC architecture:

"The SSA form is also used for optimizations. GCC performs more than 20 different optimizations on SSA trees. After the SSA optimization pass, the tree is converted back to the GIMPLE form which is then used to generate a register-transfer language (RTL) form of a tree. RTL is a hardware-based representation that corresponds to an abstract target architecture with an infinite number of registers. An RTL optimization pass optimizes the tree in the RTL form. *Finally, a GCC back-end generates the assembly code for the target architecture using the RTL representation.* Examples of back-ends are x86 back end, mips back end, etc."

Then, there's the official GCC documentation[2]:

"Compilation can involve up to four stages: preprocessing, compilation proper, assembly and linking, always in that order. GCC is capable of preprocessing and compiling several files either into several assembler input files, or into one assembler input file; then each assembler input file produces an object file, and linking combines all the object files (those newly compiled, and those specified as input) into an executable file."

You can also just run GCC -- either with --verbose or --save-temps -- and observe what the compiler toolchain is doing.

[1] https://en.m.wikibooks.org/wiki/GNU_C_Compiler_Internals/GNU...

[2] https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Overall-Option...
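
A quick way to see the intermediate files for yourself, assuming a typical GCC installation:

    /* hello.c -- compile with:  gcc --save-temps -o hello hello.c
       Besides the executable, GCC leaves behind hello.i (the preprocessed
       source), hello.s (the generated assembly) and hello.o (the assembled
       object file). */
    #include <stdio.h>

    int main(void) {
        puts("hello");
        return 0;
    }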


Not really sure what you’re arguing for. All optimization passes operate on GIMPLE.

> SSA GIMPLE is low level GIMPLE rewritten in SSA form.

I’m by no means a compiler internals expert, but to my knowledge there’s no translation from RTL to assembly. The compiler emits the machine code directly without first emitting assembler. You have to ask the compiler to serialize to assembly explicitly.

> several assembler input files, or into one assembler input file; then each assembler input file produces an object file,

That doesn’t sound right. Seems more like an abstract description. You can’t dump assembly out of an object file either. You have to do a conversion from an intermediary form to get back out a form of the assembly.

I think at this point it’s splitting hairs but assembly is not used as an intermediary at any point, even for generating machine code.


> Not really sure what you’re arguing for. All optimization passes operate on GIMPLE.

You asked me for a citation. I provided two, as well as a means by which you could see for yourself that what I am saying is correct.

Believe what you want, but all of the GCC documentation, as well as the output of the compiler if you run with --verbose or --save-temps supports what I have described of GCC's operation.

The compiler internals manual[1] also describes the backend machine descriptions, and how they are used in detail:

"There are three main conversions that happen in the compiler:

1. The front end reads the source code and builds a parse tree.

2. The parse tree is used to generate an RTL insn list based on named instruction patterns.

3. The insn list is matched against the RTL templates to produce assembler code."

[1] - https://gcc.gnu.org/onlinedocs/gccint/Overview.html


Such a definition is just useless and should therefore be abandoned.

> however, none of them improves programmer productivity by an order of magnitude.

I disagree. I can write an efficient SQL statement in 2 minutes, but implementing an efficient multi-"table" join with spill to disk took me much longer than 20 minutes.


Agreed. I can't really believe I am reading this discussion. Languages are just user interfaces. It's OBVIOUS that there's a difference between the granularity of the Java UI and the C UI. It's also OBVIOUS that such a difference would greatly impact the speed of use. I could write a garbage collector in C; in Java it's already there. The whole point is discussing those differences, so anyone saying they are the same has completely missed the point of the discussion.

It's like saying Windows 11 and OS/2 are the same because they are both Operating Systems.


What you're missing is the level of comparison. Everything is different from everything else if you're looking only at differences. But if you want to classify concepts, you need to determine what are the similarities. So, yes, if you're talking about objects in the category of operating systems, W11 and OS/2 are in the same category. If you're looking at languages that offer high level constructs, C and Java are at the same level. Languages that offer only direct machine instructions (assembly) are at an inferior level.


SQL is a special purpose language. Special purpose languages can be amazing (and SQL certainly is), but here we’re comparing general purpose languages.

In this context, the question is more, are Lisp or Haskell, or Python 10 times as productive as C? I used to think so, but now I seriously doubt it.

I mean, the reason I’m so much faster at writing small Python programs than I am at writing equivalent C programs, is because Python’s standard library is so damn huge. This is not a case of the language being more productive, this is a case of the work already being done.


If someone gave you a library of functions for doing table joins in C, it wouldn't take you much longer than those 2 minutes as well.


I'd argue that the class of languages running within a virtual machine like Java (JVM) or .NET (CLR) ought to be placed in a higher level of abstraction. These are so detached from the physical reality that they become cross-platform, which in itself is also the goal.


C is the most cross-platform language today, because almost every platform supports C. JVM is written in C, for example, for portability.


The most prevalent JVM is written mostly in C++, not C.

Your portability point mostly stands, though.


Respectfully: are these levels just something you created for this comment, or are they actually something you can back up with literature that agrees that they are the right levels or that they even exist at all?


These levels are learned by software engineers in an introduction to compilers.

«A Compiler is a software that typically takes a high level language (Like C++ and Java) code as input and converts the input to a lower level language at once.»

«Assembler A translator from assembly language programs to machine language programs.

Compiler A translator from “high-level” language programs into programs in a lower-level language.

Transpiler A translator from high-level language programs into programs in a different high-level language.»

«The compiler is software that converts a program written in a high-level language (Source Language) to low-level language (Object/Target/Machine Language).»


That, in the conventional classification, C is considered a high level language, is definitely what I learned in some languages class. This seems like an odd result given that it is probably the least abstract language that the vast majority of programmers might ever encounter, outside the exercise or two on assembly that they might see in an architectures class. Sort of how, as far as I know, most (all?) PIC and AVR microcontrollers would be considered "very large scale integration" chips.

This is the best possible outcome, though -- since they used up these kinds of "biggest classification that doesn't sound totally silly" labels in the 70's and 80's or whatever, we don't have to worry about classifying languages anymore, which was not really all that productive to do anyway.

Although I do wonder what level Verilog should be considered.


I would define fourth level languages by what they allow you to do, or what foot-guns they provide you with.

For example, surely languages like TypeScript are "level 4", as you cannot manipulate memory directly but only use the language's higher-level type structures.


At the fourth level of abstraction, no procedures or functions are used. Copilot is at the 4th level.


The 1980's want their marketing back.

https://en.wikipedia.org/wiki/Fourth-generation_programming_...

They considered things like Excel 4GL: you program some formulas and Excel magically figures out what the necessary steps are to do the computation.

A lot of vendors created graphical programming languages and declared them 4GL. The term mostly died because nobody could stop laughing.


The term died, the dream festered on. It got rebranded as model-driven development, and many in the embedded world bought into it in the late 90s; some have only ditched it in the last year or two, while plenty more have followed a second rebranding and keep drawing... There are some good graphical languages: Lego Mindstorms, CodeSpells, GNU Radio Companion, Simulink, and maybe, sometimes, LabVIEW. Those languages aren't object oriented; they represent data flow. I suspect that makes a difference. But mostly they are for quick prototyping or small designs. And that makes an even bigger one. Diagrams don't scale well.


Level of abstraction ≠ Generation of languages.

The first 3 generations of languages were powered by higher levels of abstraction, which were possible because of better machines.

However, a further increase in the level of abstraction did not make developers an order of magnitude more productive, except in niche areas. Switching to a better development process (Scrum) can achieve an order-of-magnitude improvement in developer performance, while switching programming languages cannot.

Anybody can claim that their product is a next gen revolutionary market shaking breakthrough. It's the self-promotion, not a science.


Erm...Copilot still writes procedures and data types, most of which I have to describe one by one.

And I still have to review all of them.

How is that "4th level abstraction" whatever the current usage of the term?


3rd level languages are compiled into 2nd level languages (assembler, or byte code, or intermediate representation), then into machine code.

Transpilers are compiling from one 3rd level language into another 3rd level language.

Code generators or macros are receiving data and producing code in a 3rd level language.

A 4th-level language receives a plain-text explanation of a goal and produces code in a 3rd-level language. An AI-based tool is the only kind of tool capable of that, because such a process requires understanding of the goal, not a mechanical transformation of input into output. It's possible to implement such an AI in Lisp, Prolog, or using ML. Copilot is an example of such an AI.


>A 4th-level language receives a plain-text explanation of a goal and produces code in a 3rd-level language.

Such an explanation would read like this:

    I require some glue code so the system our company uses to pass messages between our services, can talk to the system of the company we talked to 2 days ago. Also, there should be a web interface or something to monitor it.
Not like this:

    A function that queries the user from the database by its surname


Such an explanation can be like this:

  apt-get install foo bar baz
Just a few words, but they can generate very complex, ready-to-run systems.

Unfortunately, this is not a general-purpose language, but apt + shell is.


How is this the "explanation of a goal"?

I issue an imperative command to apt-get, and it tries to execute it.

It has no knowledge of why I want foo bar and baz installed, nor can it decide if these 3 programs make sense for what I am trying to do, or if there is a better way to do it (maybe there is a package "bazinga", that can replace the foo-bar-baz stack?).

A goal-aware package manager would look like this:

   super-apt "i need the system ready to function as a webserver, with a high availability database and backup capabilities"


> A goal-aware package manager would look like this:

A typical package manager is already goal-aware, like `make`. Its input DSL is far from English, but it's not hard to translate your English sentence into «apache postgres rsync» by matching text to package descriptions.

A package manager requires a solver for dependencies, constraints, and conflicts, which is far from trivial. For example, Fedora (Red Hat) switched from an in-house solver for yum/dnf to the zypper SAT solver (libzypp) developed by SuSE/Ximian/Novell.

Unfortunately, package managers are DSL, not a general purpose language.


Which layer is Forth? xD


WTF is a layer?

If you are asking what level of abstraction is used by Forth, then the answer is: 2nd. It directly exposes the stack, memory, and machine code and provides a thin wrapper on top of that. Developers can then build higher levels of abstraction, of course. It's possible to do object-oriented programming in machine code, because a compiler can.

The trickier question is: what level is Lisp? Is it assembler for a Lisp machine (because of opcodes like car, cdr, etc.)? Or, maybe, is it a 4GL because of its advanced meta-programming possibilities?


That does not describe all Forths. A Forth running on a Forth CPU would certainly be a 2nd level, but depending on how the Forth is implemented and designed, it could easily be seen as a 3rd level. Arguably, Forth is (can be) more abstract than C and almost a quasi-FP.


It's possible to implement high-level abstractions in machine code, or compile a high-level language into low-level machine code; however, it's not possible to hide low-level, machine-dependent details in a low-level language. It's not possible to hide machine opcodes in machine code, or CPU registers in assembler, or the stack in Forth, so a developer must learn these things and deal with them, which makes the development process slower.

So, it's expected to see an order-of-magnitude improvement in development speed, on average, when jumping from machine code to asm, or from asm to a 3rd-level language. Forth doesn't improve speed of development by an order of magnitude compared to asm, because of the steep learning curve and low-level stack management. I have tried to learn Forth multiple times, but still cannot program in Forth freely, which makes it unique among the 20+ other programming languages I know.


IMHO Forth has a longer learning curve than other languages and requires a significant mental-model shift if you are used to conventional languages. It does not suit everyone. It seems more like learning a new human language with a large number of words.

Chuck Moore created Forth to improve his productivity versus conventional tools, and there is some anecdotal evidence that it worked for him and those that worked with him, particularly where direct hardware control on new hardware could be interrogated and validated via the interpreter rather than an Edit/Assemble/Link/Test loop.


Which layer is an embedded Lua script controlling execution of a Java program running on the JVM within Rosetta2 and then getting translated into ARM microcode within the M1 CPU?


M1 doesn’t have microcode; that’s an AMD/Intel implementation detail.


I don't know about layer, but according to this book[1], it is a fourth-generation language. I've never associated it with RPG and FoxPro, but I guess you learn something new every day.

[1] - https://www.discoverbooks.com/Forth-The-Fourth-Generation-La...


Good take - defining a "low-level language" feels like trying to define a "cold temperature". It's all very relative.

We could say that a temperature is "freezing" if it is under 0C, and "boiling" if it is over 100C. But it's hard to nail down "cold" or "warm", as any group sharing a thermostat will tell you.

You can easily place arbitrary assembly commands or memory values in C code. Plenty of embedded codebases are littered with asm blocks in performance-critical areas. It's hard to see it as a high-level language in the age of JavaScript/Python, but maybe the Overton window has shifted.


Yeah IDK.

I know in college a lot of my profs referred to C as a 'mid-level' language (or, by others, as a 'systems' language), while C# 1.2 and Java were 'high level'. They really only considered ASM low level, but that was almost 20 years ago.


This one caught my eye:

> for example, you must be able to compare two structs using a type-oblivious comparison (e.g., memcmp), so a copy of a struct must retain its padding.

This definitely doesn't work in the real world because the padding bytes will contain random junk which isn't copied along in some situations, depending on the compiler and optimization level.
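
A minimal sketch of the trap (the exact layout and outcome depend on the ABI, compiler and optimization level):

    #include <stdio.h>
    #include <string.h>

    struct s { char c; int i; };   /* usually 3 padding bytes between c and i */

    int main(void) {
        struct s a, b;
        memset(&a, 0xAA, sizeof a);   /* fill everything, padding included */
        memset(&b, 0x55, sizeof b);
        a.c = b.c = 1;                /* the members are now identical ... */
        a.i = b.i = 2;
        /* ... yet memcmp may still report a difference, because the padding
           bytes were never made equal and are not touched by member writes. */
        printf("equal: %d\n", memcmp(&a, &b, sizeof a) == 0);
        return 0;
    }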

Also, IMHO what the article calls "low level" might be important for compiler writers, but isn't really all that relevant for most programmers when the comparison is to "high level" languages like Java, C#, Javascript or Python.

In my mind, the most important property of a low-level language is that it provides explicit control over how data is laid out in memory; this is usually an afterthought in high-level languages, if it is possible at all.

Or more generally: how much explicit control does the language allow before the programmer hits the "manual optimization wall"? In that sense C is fairly high level, especially without non-standard extensions, but still much lower level than most other programming languages. I think there is definitely room for more experimental languages between C and assembly.


> In that sense C is fairly high level, especially without non-standard extensions, but still much lower level than most other programming languages.

In practice you are using GNU C, etc. The Stack Overflow consensus that you are supposed to stick to the standard no matter the complexity of the workarounds is rather annoying and seems to be quite prevalent.


The reason for sticking to the standard is that inevitably someone will try to compile the code with another compiler. It's far easier just to stick to the spec than to fix non-standard code later.

For some reason it has been me doing the fixing several times, as the original dev thought it was completely fine. I hope to save others the pain.


IMHO the only practical solution to this problem is to compile and test the code at least on the popular compilers (e.g. GCC, Clang and MSVC), which is quite easy today with CI services like GH Actions.

The standard also often doesn't tell you which features are actually supported by different compilers (e.g. MSVC famously will never be C99 compliant, but eventually - really soon now! - C11 compliant), so the language standard is in reality more like a recommendation of which features are more likely to work across compilers than others.

Finally, even standard compliant code may still trigger a lot of warnings, and those warnings are different on different compilers or different versions of the same compiler, so testing on different compilers is needed anyway to cleanup warnings.


It's not an either-or. Write standard-compliant code as much as possible, and test it with real-world compilers (and work around their quirks or limitations if needed). Somebody trying to compile it 20 years later will thank you.


MSVC is already C17 compliant, minus optional annexes.


C's pre/post-increment is from B, which was designed for an older architecture, not the PDP, and B already had pre/post increment and decrement. It's a well-known misconception that C was based on the PDP's instructions; it doesn't hold water against the facts. C did redesign one thing for the PDP, and that's byte addressing... and we're still using byte addressing most places, and frankly, if you don't, C isn't incapable of dealing with that.


> and we're still using byte addressing most places

Which is a masquerade in itself, as in the silicon, addressing works at the memory line level.


Yet pointers are byte-granular. A more interesting question is rather: why are pointers not bit-granular by now? It would just "waste" 3 bits, and 2.3 exabytes ought to be enough for anybody ;)


The PDP-10 had variable-width byte pointers, but 8-bit bytes are what caught on.


The CPU bus is what's important. Bus addressing is at the byte level.


On a modern CPU, byte-level addressing is but smoke and mirrors for what is a fetch of a line-sized memory chunk followed by a mask-and-shift.


Not at the ISA level. Programmers don’t have access to the smoke and mirrors besides some cache management stuff


Depends on the CPU specifics, and a lot of non-x86 designs made it explicit that memory is not byte-addressable without special steps.

This makes those architectures incompatible with the latest C and C++, though, because they require concurrent, independent access to individual bytes (writing one byte must not disturb its neighbors).
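
A sketch of why, assuming C11 <threads.h> is available: the two struct members below are distinct memory locations, so C11 requires this program to be race-free, which in practice demands byte-granular stores (or an atomic read-modify-write of the containing word) from the hardware.

    #include <stdio.h>
    #include <threads.h>

    struct pair { char a; char b; };   /* two adjacent bytes */
    static struct pair p;

    static int writer_a(void *arg) { (void)arg; p.a = 1; return 0; }
    static int writer_b(void *arg) { (void)arg; p.b = 2; return 0; }

    int main(void) {
        thrd_t t1, t2;
        thrd_create(&t1, writer_a, NULL);
        thrd_create(&t2, writer_b, NULL);
        thrd_join(t1, NULL);
        thrd_join(t2, NULL);
        /* Both stores must survive; a word machine doing a plain
           read-modify-write of the shared word could lose one of them. */
        printf("%d %d\n", p.a, p.b);
        return 0;
    }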


The most important non-x86 arch is aarch64, which is in fact byte-addressable (more so than x86)! So is mips64.


Using C on a word machine is just a bad idea.


But the x86_64 ISA is smoke and mirrors; proof is the large increase in performance that can be obtained when one is aware of these tricks.


The problem is not with C alone, however.

Speculative execution, branch prediction, and lookahead across the next 25 instructions, wasting huge amounts of power... I mean, what?

The article even mentions that there is another way:

> In contrast, GPUs achieve very high performance without any of this logic, at the expense of requiring explicitly parallel programs.

Yes, C doesn't support that very well. But it could, and other languages, namely Rust, Go & Julia already can. So maybe it's time to do in CPU design what Go did in language design, and hit the brakes on complexity?

We don't need smarter processors; we need processors that can do high throughput on many execution units, and languages that support that well.


There are already processors that are highly parallel with high-throughput execution. GPUs, as you point out. The reason they haven't replaced CPUs is not laziness; it's that many problems are not trivially parallel. Today's mix of CPUs that are fast at serial execution plus multiple cores, SIMD, and GPUs seems to give a good mix of flexibility to program for.


Rust and Go don't support GPU-style parallelism natively (I guess Julia probably does). You wouldn't even want that for 99% of programs. It's only useful for big mathsy tensor operations with very little flow control.

Most programs are not like that at all.


> we need processors that can do high throughput on many execution units, and languages that support that well.

Language support is very difficult for this, for a whole bunch of reasons - and it often requires redesigning the entire program and its data structures. It is still the case that most code the end-user is waiting for is JITted Javascript, which is why Apple focused so much effort on making that fast - and which is forced to be single-threaded. Hence all the big/little CPU designs; you get one or two high-speed high-power cores, and some low-speed low-power cores.


    go doSomethingWith(x)

    Threads.@threads for x = 1:42

    thread::spawn    
How is that difficult? The problem is that people are taught that concurrency and the capacity for parallel execution are somehow difficult. They really aren't.

> It is still the case that most code the end-user is waiting for is JITted Javascript

That's a problem with JavaScript, not with language design. JS is simply not a very good language, and its lack of support for parallel processing is just one of its many problems.


It's difficult for several reasons, and you've identified one of them: people aren't taught how to write code that can take advantage of concurrency. Except…this includes you.

I work as a performance engineer, and a lot of my job is actually undoing concurrency written by people who do it incorrectly and create problems worse than they could've ever have without it. People will farm work out to a bunch of threads, except they'll have the work units be so small that the synchronization overhead is an order of magnitude more than the actual work being done. They'll create a thread pool to execute their work and forget to cap its size, or use an inappropriate spawning heuristic, and cause a thread explosion. They'll struggle mightily to apply concurrency to a problem that doesn't parallelize trivially, due to involved data dependencies, and write complex code with subtle bugs in it.

Writing concurrent code is hard. In general, nobody actually wants concurrency*, it's just a thing we deal with because single-threaded performance has stopped advancing as fast as we'd want it to. As an industry we're slowly getting more familiar with it, and providing better primitives to harness it safely and efficiently, but the overall effort is a whole lot harder than just slapping some sort of concurrent for loop around every problem.

*Except for some very rare exceptions that cannot be time shared
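
For contrast, a minimal sketch (plain pthreads, arbitrary sizes) of the coarse-grained version of the textbook parallel sum: each thread gets one large contiguous chunk and touches no shared state inside its loop, so synchronization happens once per thread instead of once per element.

    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define NTHREADS 4

    static double data[N];

    struct chunk { int begin, end; double partial; };

    static void *sum_chunk(void *arg) {
        struct chunk *c = arg;
        double s = 0.0;
        for (int i = c->begin; i < c->end; ++i)
            s += data[i];
        c->partial = s;              /* no shared state touched in the loop */
        return NULL;
    }

    int main(void) {
        for (int i = 0; i < N; ++i) data[i] = 1.0;

        pthread_t tid[NTHREADS];
        struct chunk chunks[NTHREADS];
        for (int t = 0; t < NTHREADS; ++t) {
            chunks[t].begin = t * (N / NTHREADS);
            chunks[t].end   = (t == NTHREADS - 1) ? N : (t + 1) * (N / NTHREADS);
            pthread_create(&tid[t], NULL, sum_chunk, &chunks[t]);
        }

        double total = 0.0;
        for (int t = 0; t < NTHREADS; ++t) {
            pthread_join(tid[t], NULL);
            total += chunks[t].partial;   /* the only reduction step */
        }
        printf("%f\n", total);
        return 0;
    }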


This. As well as actually partitioning the problem. Things like DOM updates tend to be bad for this because the way they're specified requires you to use the result of a previous computation.

No amount of concurrency will save you from memory bandwidth problems, and can quite often make them worse.


> the synchronization overhead is an order of magnitude more than the actual work being done.

Yes, this is a common pitfall of concurrent programming applied incorrectly. I am aware of this. I am also aware that this is something that can be measured with appropriate benchmarks.

Just like the thread-explosion problem of unchecked worker pools, it isn't solved, but is made a lot easier to handle, by baking the capability to map logical execution threads onto OS threads right into the language.

> Writing concurrent code is hard.

Writing really good concurrent code is hard. But not harder than writing good abstractions, or writing performant code, or writing maintainable code.


Your program does nothing! It's the reduction stage that causes the bottleneck.


Please no :( unparallelizable tasks are already slow enough. My guitar effects rack (where effects all have to be processed one after the other, by design) uses barely less CPU% in a 2020 computer than in a 2011 one and it is extremely frustrating.


> uses barely less CPU% in a 2020 computer than in a 2011 one and it is extremely frustrating.

But if that's the case, what's the point of all that extra complexity in the CPU, if in the end the benefits seem to be minuscule?


> But if that's the case, what's the point of all that extra complexity in the CPU, if in the end the benefits seem to be minuscule?

They aren't. 10 years ago, single-thread performance gains were achieved by upping the core frequency. That trend died when it hit physical limitations, and we've been stuck at 4-5 GHz ever since. In order to get more performance, all these tricks (caches, speculative execution, data parallelism, etc.) had to be employed in addition to more cores.

In audio processing this means that a modern laptop can process more effects and tracks than a beefy workstation could in 2011. Sure, each single effect still taxes the CPU pretty badly; but in contrast to 2011, this means you can easily run dozens in parallel without breaking a sweat or endless fiddling with buffer sizes to keep latency and processing capability in balance.


all this "extra complexity", branch prediction, pipelines, multiple levels of cache, speculative execution was mostly there since the late 80s, 90s in CPU design ; the Pentium pro already had all of this to some level. The last decade was in large part about more SIMD and more cores and it's been a real PITA when your workflow does not benefit much from it because the state at t depends on the state at t-1. But the improvement of these features is definitely not negligible ; at the beginning of the SPECTRE / Meltdown / ... mitigations the loss of performance was double-digit big% in some cases.


This isn't really true. Micro-op caches are fairly new, branch predictors are massively improved, caches have gone from 1 level to 3, lots of operations have gotten way more efficient (64 bit division for example has gone from around 60 cycles to 30 cycles between 2012 and now). Out of order execution has also massively improved, which allows for major speed increases.


L3 caches have been in consumer Intel CPUs since 2008 and uop caches were already there in the Pentium 4 (released in 2000, almost 22 years ago :-)). Hardly new. Of course there are interesting iterative improvements, but nothing earth-shattering.


You might note that neither 2008 nor 2000 are the 1980s which was the time you previously referred to.


Double-checked, and L3 was actually also there in the P4 in 2003; and the P4 itself had been in the works since 1998. For me that's closer to the late 90s (which is also what I referred to) than to today; that's almost as many years as there were between the two world wars...


> languages that support that well

Ada has had built-in tasking for a long time, and is now also getting built-in parallel block and parallel loop (iterator and for) constructs in Ada 2022.


It's really fun (?) to work with hardware designers, for whom even the registers that the code accesses are considered "high-level".

Abstractions, all the way down... even our digital bits are a joke to the analog designers.



One should always think of C as a high-level assembler. Not less, not more. Everything else (parallel programming, threading, ...) is a "higher-level" paradigm, where C's "robustness" is more of an obstruction than a help.

Imo, from a hardware designer's perspective C is about as "high" as it goes, while for software engineers it's often "as low as it goes".


Have you RTFA?

Because that is exactly the point that the article refutes.


I am lost here, the mentioned bugs are a result of optimizations like speculative execution, branch prediction, prefetching etc.

These are language-independent optimizations. For example, any language (that allows for loop-like constructs) compiled to Intel machine code and executed on an Intel processor will be exposed to these bugs; it is not C specific. Am I missing anything?


> any language (that allows for loop-like constructs) compiled to Intel machine code and executed on an Intel processor will be exposed to these bugs; it is not C specific.

Well, that's kind of the point, that Intel/x86 and most/all modern processors implement an abstract machine that's basically made for C, papering over the underlying instruction parallelism and absurdly complex memory model with all kinds of crazy front-end decoder business to allow the CPU to ingest that machine code and pretend like instructions are executed in order, with C-style control flow.

You could (in principle) prevent these sorts of bugs by creating a new kind of machine, but that machine would be incapable of running C software, at least efficiently. There are several ideas being alluded to here; another is directly exposing the underlying instruction-level parallelism, which has been attempted before in VLIW processors like Intel's Itanium chips. You could make the argument that a big part of their problem was at the compiler level, trying to map C-style semantics to the CPU, while maybe a different language/compiler would have extracted more performance.

Trying to summarize the author's idea, modern CPUs have a lot of hidden potential behind a restrictive "virtual machine". If that layer were stripped clean, we could (the idea goes) get more performance and parallelism, and potentially more security, at the cost of compatibility/interoperability with legacy software.


> Intel/x86 and most/all modern processors implement an abstract machine that's basically made for C, papering over the underlying instruction parallelism and absurdly complex memory model with all kinds of crazy front-end decoder business to allow the CPU to ingest that machine code and pretend like instructions are executed in order, with C-style control flow.

The issue is that none of this has to do with C, really at all. C relies on the semantics exposed by the machine code. The ISA does not expose speculative execution or pipelining. C, and asm, and literally all software conforms to the ISA because that's all there is.

Calling out C specifically is just clickbait, IMO. The author makes a great point about how the x86 ISA may not be a great abstraction for modern CPUs.


As mentioned under the older thread of this same article, many hardware APIs suffer from having to provide a C-compatible interface/memory model. I remember reading that a particular GPU’s elegant memory model was butchered so that C-programmers could do something with it? My memory is hazy on the details though.


> I am lost here, the mentioned bugs are a result of optimizations like speculative execution, branch prediction, prefetching etc.

>

> These are language-independent optimizations. For example, any language (that allows for loop-like constructs) compiled to Intel machine code and executed on an Intel processor will be exposed to these bugs; it is not C specific. Am I missing anything?

1. There's machine code that is exposed via the ISA (the public machine code that compilers generate), and there's machine code that exists and is used but is not exposed.

2. The author is making the argument that the machine code that is exposed is designed around the memory model of the C programming language, which itself was designed around the memory model of the PDP.

Put the above two together and (if you squint really hard, and ignore things like logic and reason) the conclusion is that the modern x86/X64 ISA is suboptimal because of the PDP.

The actual reality is that all the popular programming languages are imperative and have the concepts of a stack, a heap, and in-order execution of instructions.

Because all languages appear to converge on the same basic concepts in order to be commonly accepted, I think it is doubtful that any alternative machine and memory model would have arisen in the absence of a language like C or a machine like the PDP.

I think this because of the existence of other languages that offer alternative machine and/or memory models, and those languages have existed for decades without being popular.


> The actual reality is that all the popular programming languages are imperative and have the concepts of a stack, a heap, and in-order execution of instructions. Because all languages appear to converge on the same basic concepts in order to be commonly accepted, I think it is doubtful that any alternative machine and memory model would have arisen in the absence of a language like C or a machine like the PDP.

But as we can see, this model could not keep up with performance improvements, so much more complexity got implemented beneath the surface of the old model. The author's point is to be aware of the mismatch here, and that perhaps we should stop believing the "lies" that C tells us.

I personally believe that we would be much better off with lower level instructions exposed to us, and putting the complexity in software. That way CPU vulnerabilities could be patched, and I believe we could create much better optimizations, and faster CPU design iterations.


At least some of this complexity and abstraction layer provided by the hardware is done so that the same binary can run on multiple different implementations, though. If you expose low level instructions corresponding to the specific CPU implementation then you lose "this app runs on all these Android phones" and also "this process can migrate between CPUs in a big/little setup", which would be unfortunate.


Well, I meant it more in terms of an x86-to-microcode JIT compiler, but in software. So even existing code could potentially be run in exactly the same way it is now, but instead of cumbersome hardware pipelines, this could be done entirely in software, where the complexity ceiling is perhaps somewhat higher. This JIT compiler could do the same "magic" that current CPUs do (reorder, branch predict, etc.) and even more, while in case of a bug it could be fixed without buying a new processor.


Ah, so a Transmeta style approach? That's certainly feasible, in the sense that their technology worked, but I think it would be tricky at best to match the performance of the more standard do-it-in-hardware approach.


> I personally believe that we would be much better off with lower level instructions exposed to us, and putting the complexity in software

There have been several initiatives that sound vaguely like that, none of which actually worked out commercially. Principally Itanium. At the same time, there is no question that it's possible to gain a lot in performance if you're both willing and able to use a programming environment like CUDA.

It seems to me that the article never actually articulates an alternative in enough detail to take seriously. It doesn't seem to make any falsifiable claim.

In my view the most plausible explanation for why things in this area look the way they look was articulated in DJB's "The Death of Optimizing Compilers" talk. I'm not surprised that this ACM piece was written by somebody that works on optimizing compilers. Perhaps that shouldn't be relevant, but I can't help but suspect that it is.


The bugs are not caused by speculative execution.

The bugs are caused by programmers believing they have control over memory addresses and registers in the CPU only because they are hardcore programmers writing in “low level C”. So when they peek behind the abstraction, the ship now has two captains, one being the programmer and one being the compiler. When both of them hit the gas at the same time all the undefined behavior bugs appear.

It's a somewhat extreme analogy, but you could compare it to writing code without mutex locks because the code will only ever be executed in a single thread, and then going multithreaded anyway. On an old CPU and an old compiler it will work, but once you rev things up it will crash and burn. In the case of C, the language spec always catered for this possibility, but many programmers thought they were smarter than the compiler, leading to today's bugs.

Other languages expose neither the control nor the temptation to insert such sticks into the machinery. Both sides of the abstraction have a much clearer mutual understanding of where the border of the language ends, with taller guardrails in place to prevent stepping over it.


> The bugs are not caused by speculative execution.

Actually, they are.

It seems like you're arguing against the existence of any kind of low-level hardware/software interface. C is strongly associated with that interface, but it isn't the same thing.


If hardware is significantly more advanced, but C is largely the same as it was in PDP-11 days, then C is now, relatively speaking, less able to control the low level than it was back then.

C has decent escape hatches for direct machine control via a simple ABI and even inline assembly, but the behavior of C code itself is arguably under-specified on modern hardware.
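For example, something as simple as reading the x86 time-stamp counter has no portable C spelling at all; you reach for a compiler-specific escape hatch (a GCC/Clang-style sketch, x86-64 only):

    #include <stdint.h>

    /* GCC/Clang extended inline assembly; not ISO C. */
    static inline uint64_t read_tsc(void) {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }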


Being under-specified towards the hardware is a feature, to allow optimization.

Now I'm not arguing all the UB in C is great; in fact, the opposite. But that should be prohibited at the language layer and not at the machine layer. See Rust for a better solution.


100% specification would be a mis-feature. But you can't easily use the C language to tell the compiler that your ternary assignment has a 50/50 probability for each branch, meaning it probably should issue an instruction like cmov on x86. You can use non-standard builtins to say it's 99/1 so it should just branch predict, and those builtins are so ubiquitous it's not such a big deal, but still not technically possible with just the standard C language.
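A sketch of the non-standard builtins in question (GCC/Clang; __builtin_expect_with_probability needs GCC 9 or newer):

    #include <stdlib.h>

    int clamp_to_zero(int x) {
        /* Standard C has no way to say "this is 50/50, prefer cmov": */
        int y = (x < 0) ? 0 : x;

        /* Non-standard: hint that this branch is almost never taken,
           so the compiler optimizes layout for the fall-through path
           and leans on branch prediction. */
        if (__builtin_expect(x > 1000000, 0))
            abort();

        /* GCC 9+ can also express the 50/50 case explicitly:
           __builtin_expect_with_probability(x < 0, 1, 0.5) */
        return y;
    }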

In a PDP-11 world, there was no need for such things. In the superscalar predictive world of today, there are a lot of such things you need to specify with non-standard builtins and inline assembly. C gets the job done, but basically ever since the Pentium, it has relied more and more on those non-standard escape hatches, in order to be the best portable assembler. It's still the best compared to other languages, but in absolute terms, it is less good at that task with every hardware generation. Rust doesn't improve on this by the way.

That's the obvious stuff. Compiler writers using UB footguns as one of the most powerful optimization tools is another problem on top of that. You might have to use signed indices into some bytes in order to let the compiler assume your index increment has no overflow (that would be UB), again just to be able to issue cmovs. That's an awkwardly indirect way to do that. Arguably C would be better off specifying addition operators or inline functions for unsigned integers that the programmer promises will never overflow. I don't use Rust, but my understanding is all integer types panic on overflow in debug, and wrap in release -- at least there is a non-stable unchecked_add.
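To illustrate the signed-index trick mentioned above (whether the compiler actually emits cmov or wider index registers is compiler- and target-dependent):

    #include <stdint.h>

    /* 32-bit unsigned index: wraparound at 2^32 is defined behavior,
       so the compiler must preserve it, which can block widening the
       index math to a single 64-bit register on x86-64. */
    int64_t sum_u(const int32_t *a, uint32_t off, uint32_t n) {
        int64_t s = 0;
        for (uint32_t i = 0; i < n; i++)
            s += a[i + off];   /* i + off may wrap; must be honored */
        return s;
    }

    /* 32-bit signed index: overflow is UB, so the compiler may assume
       i + off never wraps and generate tighter address arithmetic. */
    int64_t sum_s(const int32_t *a, int32_t off, int32_t n) {
        int64_t s = 0;
        for (int32_t i = 0; i < n; i++)
            s += a[i + off];
        return s;
    }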


Now I see what you mean by under-specified in the case of branch hints. But does the solution need to involve giving more direct machine access? Similar benefits can be had with higher-layer abstractions as well, without adding more UB; in fact, the examples below remove UB. Take the restrict keyword in C, which theoretically could be automatically inferred in Rust (not sure if they actually do that nowadays). Iterators, or as you say better addition operators, can hide the details of array indexing and overflows. Hinting the probability of a certain branch certainly sounds like a higher-layer construct as well, not something to expose a direct machine instruction for.


I’m not proposing machine specific additions to C. One of my complaints is that the C language doesn’t have enough features, so I end up using machine specific assembly or compiler specific builtins. And I’m complaining about C pushing developers onto the knife edge of UB or performant code — I don’t want more UB either.

I want higher level constructs in the C language (restrict isn’t a great feature, but it’s high level, so like that) that can map to the actual feature set found on actual machines since 2004 (and compiler writers can do that mapping per machine). But C is stuck with a simplified model of the machine that ignores what almost all hardware can do these days.

I think I’ll always have to dip into assembly/intrinsics sometimes, so I’m not looking for super advanced/rich features. Actually, I think the real benefit would be giving compiler writers ways to improve performance without pushing devs onto the UB knife edge.


Sounds like he's saying that a low level language without invisible abstractions no longer exists for modern CPUs. Thus the calls to rethink both the hardware and language.


Maybe this doesn't matter so much. Remember, the CPU may be rewriting your code and reordering many operations. The latest Intel processors can examine your code and write new microcode to do it more efficiently. Heck, even identifying a fault to an instruction in 'your' code has become problematic!


"Assembly isn't a Low-Level Language: Your heterogeneous multicore Apple M1 with integrated GPU and multi-tier SRAM cache isn't a 1970s PDP-11 minicomputer, you can't just MOV RAX, EAX and expect it to work, you complete and utter jackwagon!"


> C Is Not a Low-level Language Anymore

Pretty sure C was always considered a high-level language.


"High" and "low" are relative adjectives. The meaning of "high-level language" has changed. C was considered a high-level language, and it is now considered a low-level language. Not by everyone, of course. There are still a pair of definitions for "high-level language" and "low-level language" that draw the line right above assembly. I won't say "nobody" uses those definitions anymore; lots of people learned them and many still use them. I will say that it is pointless to act as if those are the only definitions of the terms anymore.


I think I'm confident in saying that K&R saw C as a lower-level language. As you say, it is relative [1]. I just don't think enough people considered C to be a high-level language (given that Smalltalk, APL, and Lisp were about) to make your broader characterization that "C was considered a high-level language."

Here's my reasoning:

From "The C Programming Language" book (1st. ed., 1978) at https://archive.org/details/TheCProgrammingLanguageFirstEdit... we can read 'C is not a "very high level language"' (p. ix) and 'C is a relatively "low level" language' (p. 1).[1]

They describe what "low level" means to them: "This characterization is not pejorative; it simply means that C deals with the same sort of objects that most computers do, namely characters, numbers, and addresses."

And on page 2 we see how they don't regard C as the lowest level: "Of 13000 lines of system code, only about 800 lines at the very lowest level are in assembler."

I also found "The C Programming Language" paper in The Bell System Technical Journal (1978) saying "All three languages [BCPL, B, and C] are rather low-level", at https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6770408... .

Now to the [1], I found the paper "Implementing LISP in a high‐level language" from 1977, where that high-level language is BCPL, which is a precursor to C, so clearly a good number of people at the time would have considered C a high-level language, at the very least in the context of developing a Lisp.


K&R are not the last word on this. They made their comment in 1978, and now it's 2021, and computing is very different.

"Characters, numbers, and addresses" are very much not what CPUs deal with internally today. Most languages no longer reference addresses directly, and "characters and numbers" live behind abstractions of their own.

The point is that C assumes a certain model of computing that was baked into both hardware and software from the late 70s onwards. That model has been superseded, but hardware and software still lose a lot of cycles emulating it. The claim is that this is both inefficient and unnecessary.

But the advantage of the C model is that it's simple, comprehensible, and general.

If you expose more of what goes on inside a modern CPU, programming becomes more difficult. If you build a CPU optimised for some specific other language abstractions you bake other assumptions and compromises into the hardware, and other languages become less efficient.

So if you want to replace the C model you'd first have to define an industry standard for - say - highly parallel languages with object orientation. That is not a small or simple project. And previous attempts to tie hardware to more abstract languages haven't ended well.

So C persists not because it's high or low level, but because it's general in a way that other potential abstractions aren't.

This is not to say that alternatives couldn't be both more general and more performant. It's more a reminder that designing performant alternatives is harder than it looks, and this is not a solved problem.

My guess (FWIW) is that nothing credible will emerge until radically new technologies become more obviously better for general purpose computing - whatever that looks like - than current models.


> K&R are not the last word on this. They made their comment in 1978, and now it's 2021, and computing is very different.

Yes, but K&R back then are relevant to refuting GP's contention that

>> C was always considered a high-level language.


Thank you for clarifying my intent!


People don't call C high-level, but I also don't see people call it low-level even in a casual setting with newer programmers. Instead I see it called a system-level language.


Kernighan and Ritchie referred to C as a 'relatively "low level" language' in their 1978 book.


Sounds like "systems language" is a nice way to say "relatively low level" then?


K&R write:

> It has been closely associated with the UNIX system, since it was developed on that system, and since UNIX and its software are written in C. The language, however, is not tied to any one operating system or machine; and although it has been called a “system programming language” because it is useful for writing operating systems, it has been used equally well to write major numerical, text processing, and data-base programs.

https://archive.org/details/TheCProgrammingLanguageFirstEdit...


So it sounds like they're endorsing that concept, and just don't want it to be a limiting term in terms of what people expect in regards to portability and scope?

"System level" does that just fine right? There's not much confusion about if C is tied to Unix anymore after all...


I really don't understand the aim of your inquiry.

Yes, I see it called a systems programming language.

But unlike you, I see people call it low-level. The easiest counter-example was to point to K&R, which was my textbook in college. (Yes, pre-ANSI). And there are many people who still say that, as I found in a quick Google Scholar search:

] Although the Java platform has been used as a multi-language platform, most of the low-level languages (such as C, Fortran, and C++) - (2016) https://dl.acm.org/doi/pdf/10.1145/2998415.2998416?casa_toke...

] Lifting these restrictions is primarily motivated by our desire to target low-level languages, such as C with pthreads - (2011) https://dl.acm.org/doi/pdf/10.1145/1929553.1929558?casa_toke...

] Use-after-free vulnerabilities have plagued software written in low-level languages, such as C and C++, - (2020) https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9152661...

Now that you've seen people call it low-level, you can't truthfully write a comment like you did at https://news.ycombinator.com/item?id=29710906 .


No, I can and I will, and it will still be truthful.

Because to most people with a firm understanding of the nature of English my statement means:

"When I, in current times not 40 years ago, hear people talk about C, they most often refer to it as a systems level language".

-

Of course, I forget this is HN and there are some people who think that it means:

I have never seen "C" and "low level" on the same line of text!

For these unfortunate cases, there is a belief that 40 year old K&R references (...) and some hastily assembled search results will change my reality... but that's a separate issue I'm not interested in.

Those people are definitely free to consider me a liar, the world will keep spinning for the rest of us.


I don't like responding to anecdotes with anecdotes, so rather than reply with a (IMO pointless) "what?! I hear people talk about C as a low-level language far more often than I hear them talk about it as a systems language", I prefer to give something more substantial.

Restricting my hastily assembled search to HN, I easily find comments from within the last few months referring to C as a low-level language ... and yes, as a systems language too.

HN is a casual setting with newer programmers.


C is the lowest-level language in common use, short of assembly language.

C lacks any high level abstractions. All the abstractions it does offer are hiding register and stack slot assignments (as local variables), code entry point addresses (as function names), ALU instructions (as operators), branching (as control-flow statements), and address arithmetic (as pointer operations). All of these are low, machine-level abstractions.

C was never a high-level language, even from its first day. It was specifically intended as a portable assembly language, by someone used to coding assembly language, for use in porting an OS coded in assembly language to a new target host.


> C lacks any high level abstractions. All the abstractions it does offer are [...]

By that standard, practically all languages lack high level abstractions. Garbage collection? Hides pointer chasing and marking of memory locations. Dynamic dispatch? Hides a pointer to a function table. Functional programming? Hides pointers to closures. Closures? Hide pointers to data and function pointers.


And structures, unions, pseudo-metaprogramming via the macro preprocessor, and no exposure to I/O unless on a CPU with MMIO.

JOVIAL and Algol dialects were also designed for creating OSes and no one calls them low level.


> And structures, unions, pseudo-meta programming via the macro processors,

Those exist in macro assemblers, for they are extremely thin abstractions, no thicker than jumping to a label instead of jumping to an absolute or relative address.

> no exposure to IO unless on a CPU with MMIO.

Well, since not all processors have I/O instructions (or dedicated I/O pins), the easiest way to implement portability is simply to not provide direct access to them in the language, and let library functions handle it.
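And on a machine that does have memory-mapped I/O, the usual C idiom is nothing more than a volatile pointer to a register address (the address below is made up for illustration):

    #include <stdint.h>

    /* Hypothetical device status register; real addresses come from
       the hardware's memory map or a vendor header. */
    #define STATUS_REG ((volatile uint32_t *)0x40021000u)

    static int device_ready(void) {
        /* volatile forces a real load on every read; that's the only
           help the language gives you, the rest is convention. */
        return (*STATUS_REG & 0x1u) != 0;
    }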


> Those exists in macro assemblers, for they are extremely thin abstractions, no thicker than jumping to a label instead of jumping to an absolute or relative address.

They're much more than that because of type aliasing, which is what lets you write ->, ., and = operations all day without each one literally being a memory access in asm.


Which kind of proves the point that C doesn't provide all the necessary capabilities for a systems programming language.

As for macro assemblers, the IBM i one supports OOP constructs, so are OOP languages now low level?


Also functions, loops, conditional statements, and arrays are considered abstractions.


>C lacks any high level abstractions

macros


Looks like deciding whether C is low-level or not needs a precise or formal definition of "low-level language". Without that, it comes down to opinion.


I am wondering if this excellent essay has surfaced again because I just tweeted it in reply to a popular Twitter account.

I dispute your conclusion, and this is the reasoning, as expressed in the article:

"High level" is tricky to define, because high is a relative assessment. But "low level" has a clear, agreed meaning: relatively similar to the machine's instruction set and architecture; close to the metal; comparable to assembly language.

And the point of the article is that C is close to the architecture of a PDP-11. Modern CPUs are nothing like the PDP-11 and haven't been for a third of a century or more. C models the architecture of a 1970s minicomputer, and 21st century computers are nothing like 1970s minis – they just run similar OSes.

If C is not close to the real architecture, then it's not low level.

The fact that there's nothing mainstream which is closer is irrelevant. The C programming model is nothing like modern multicore SIMD superscalar 64-bit CPUs with out-of-order execution, branch prediction etc.

If it isn't close to the metal, then it isn't low-level. C is neither any more. QED.


Pretty sure K&R calls C a high level language. Just a matter of perspective, innit: a high level language to assembly programmers, a low level language to .NET/JVM/web programmers, and prob something like a lower-mid-level language to someone looking at the whole tower from a distance


This article is my go-to URL when arguing on the internet with people who are under the illusion that "C is portable assembler" when it really isn't, and whose mental model is not what a computer actually does today (or has done for the past few decades).


Another way is to ask them to describe how to perform certain systems programming tasks with the restriction that the code has to compile in pure ISO C mode.

The answer will always be external Assembly.


As hinted at by the article, LLVM IR [1] is a lower-level language, and yet it's only intermediate, as per the I in IR.

And it's true that the actor model makes writing parallel programs easier. I tend to use queues and message passing when I write multi threaded programs in sequential languages like Python or Ruby. That's easier to do in a language like Elixir. Unfortunately when I work with Elixir I'm a little discouraged by all the boilerplate needed to make supervisors and GenServers work. I think there is a lot of room for improvement for a higher level language that makes most of that disappear.

[1] https://llvm.org/docs/LangRef.html#instruction-reference


I'm not sure if it's the appropriate solution but have you looked at higher level abstractions in elixir, like Flow, Broadway, GenStage?


> In C, a read from an uninitialized variable is an unspecified value and is allowed to be any value each time it is read. This is important, because it allows behavior such as lazy recycling of pages: for example, on FreeBSD the malloc implementation informs the operating system that pages are currently unused, and the operating system uses the first write to a page as the hint that this is no longer true. A read to newly malloced memory may initially read the old value; then the operating system may reuse the underlying physical page; and then on the next write to a different location in the page replace it with a newly zeroed page. The second read from the same location will then give a zero value.

What? What is the benefit of such behaviour?

What do other OSes do in this regard?


Lots of programs malloc a lot of memory and do nothing with it for a while. This allows the OS to wait for a low-load time to handle memory allocation.
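The mechanism the quoted paragraph describes is essentially madvise(MADV_FREE); a rough sketch of what an allocator does when it retires a page (real allocators track whole runs/arenas, not single pages):

    #include <stddef.h>
    #include <sys/mman.h>

    void retire_page(void *page, size_t page_size) {
        /* Tell the kernel the contents are disposable.  Until the page
           is actually reclaimed, reads still return the old bytes;
           after reclaim, reads return zeros.  The first write cancels
           the hint, which is the behavior described above. */
        madvise(page, page_size, MADV_FREE);
    }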


Is there much benefit from delaying the zeroing to first-write rather than first-read (which I think is effectively where Linux does it)?


Probably not... but both ways are fine.


I think the article is just incorrect and every system uses allocation on first read.


Effing great article. I am starting down the path of learning rust and wondering how its mutable-first design and ownership model alleviate some of the problems identified in making C fast on contemporary machines.


Thanks for this interesting read.

Say a platform superseded the C platform the article is describing, unleashing more power and parallelism in computing: what would the implications be for Linux and other operating systems?


I doubt that outside of special cases this would actually provide significant benefits.

Suitable languages already exist (see Erlang), but the fact of the matter is that a vast array of problems don't benefit from parallelism or aren't parallelisable at all. Not to mention that a good chunk of the benefits would be negated by the increase in coordination and synchronisation between parallel tasks.


I have an old button that says “C combines the power of assembly language with the flexibility of assembly language.” Might have changed now, but it was a humorous yet valid observation then.


Is there a language that exposes the stuff below assembly? Like assembly language presents a nice sequential ordered view of processors as if they execute instructions 1 by 1, but we know the reality is quite different - each instruction is many micro instructions, they compute data dependencies and parallelize accordingly, pipeline & predict heavily etc. We do so much to fit async data dependent computation into sequential models, only for that to be translated back into async/parallel by the CPU. I wonder what that kind of PL would look like?


Perhaps HDLs like (System)Verilog and VHDL? Then you truly are interacting with the architecture and you care much more about things like timing, clock cycles, etc.


> which attempt to hide latency

It does an exceptionally good job at these attempts. If you think manually managed caches are fun, read [1] for an illustration of the amount of effort required to sum an array on an architecture where on-chip RAM is manually managed. Another interesting case was the Cell CPU in the PS3; I don't have hands-on experience, but I've read that it was equally hard to develop for.

> A low-level language for such processors would have native vector types of arbitrary lengths.

A low-level language would have native vector types of exactly the same lengths as the underlying hardware. "Arbitrary" is overkill unless the CPU supports arbitrary-length vectors.

Despite not being specified as part of the language standard, all modern C and C++ implementations support these things. Specifically, when compiling to AMD64 instructions, the compilers implement the native vector types and vector intrinsics defined by Intel. Same with NEON: all modern compilers implement what's written by ARM.
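For instance, the Intel-defined intrinsics look roughly like this and compile with any modern C compiler targeting AMD64 (a toy example, ignoring alignment and tail handling):

    #include <immintrin.h>

    /* Add two arrays of 4 floats with SSE. */
    void add4(float *dst, const float *a, const float *b) {
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        _mm_storeu_ps(dst, _mm_add_ps(va, vb));
    }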

> you must be able to compare two structs using a type-oblivious comparison (e.g., memcmp)

Using memcmp on structures is not necessarily a great idea: the padding bytes can be random garbage; their values are not specified.
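A minimal example of the problem (the padding between c and i is typical, but layout is implementation-defined):

    #include <stdio.h>
    #include <string.h>

    struct S {
        char c;   /* usually followed by 3 bytes of padding */
        int  i;
    };

    int main(void) {
        struct S a, b;              /* padding bytes left indeterminate */
        a.c = b.c = 'x';
        a.i = b.i = 42;
        /* Member-wise equal, but memcmp also compares the padding,
           so either output is possible. */
        puts(memcmp(&a, &b, sizeof a) == 0 ? "equal" : "different");
        return 0;
    }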

> with enough high-level parallelism, you can suspend the threads.. The problem with such designs is that C programs tend to have few busy threads.

Not just C programs. User input is serial: it can only interact with one application at a time. Display output is serial: it delivers a sequence of frames at 60 Hz. Web browsers tend to have few busy threads because JavaScript is single-threaded; also, streaming parsers/decompressors/decryptors are not parallelizable.

> ARM's SVE (Scalar Vector Extensions)—and similar work from Berkeley—provides another glimpse at a better interface between program and hardware.

Just because it's different does not automatically make it better. The main problem with scalable vectors is that they seem to be designed for problems CPUs are no longer solving. For massively parallelizable, vertical-only FP32 and FP64 math, GPGPU is the way to go: an order of magnitude faster while also being much more power efficient. CPU SIMD is used for more than vertical-only math. One thing is non-vertical operations, i.e. shuffles; a trivial use case is transposing a 4x4 matrix in 4 registers. Another is operations on very small vectors; CPUs even have a DPPS instruction for FP32 dot product. For both use cases, scalable vectors make little sense.
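To make that concrete, both of those non-vertical use cases are a couple of intrinsics on x86 (SSE/SSE4.1; build with -msse4.1):

    #include <smmintrin.h>   /* SSE4.1: _mm_dp_ps (the DPPS instruction) */

    /* Dot product of two 4-float vectors in one instruction. */
    float dot4(__m128 a, __m128 b) {
        return _mm_cvtss_f32(_mm_dp_ps(a, b, 0xF1));
    }

    /* Transpose a 4x4 matrix held in 4 registers, no memory traffic. */
    void transpose4(__m128 *r0, __m128 *r1, __m128 *r2, __m128 *r3) {
        _MM_TRANSPOSE4_PS(*r0, *r1, *r2, *r3);
    }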

> a garbage collector becomes a very simple state machine that is trivial to implement in hardware

People tried that a few times, first with Lisp, then with Java chips. General-purpose CPUs were better.

> Running C code on such a system would be problematic, so, given the large amount of legacy C code in the world, it would not likely be a commercial success.

nVidia did precisely that, made a processor designed purely for compute speed. I wouldn't call them a commercial failure.

[1] https://www.nvidia.com/content/GTC-2010/pdfs/2260_GTC2010.pd...


This article could have been 1/8th of its length and the message would still be clear. Modern writing is vague, overly detailed and inelegant.


Not really; as evidence, look at the number of comments here that failed to get the point.

When you present a viewpoint to people whose profession and living depend on believing something contrary to it, it's necessary to present a lot of evidence and solid reasoning to get even a small number of the smartest of such people to consider your point.



