C Is Not a Low-level Language (2018) (acm.org)
343 points by goranmoomin on Dec 27, 2019 | 167 comments



Intel and HP attempted to deliver what this paper advocates. They designed an entirely new processing element and compiler architecture that promised to deliver high performance with less complexity. It was called EPIC and manifested as Itanium. Billions of dollars and the best minds at the disposal of an industry wide consortium couldn't make it work; one year ago the last Itanium shipped. The market has spoken.

You can run unmodified, compiled OS/360 code from the 1960s on a z15 machine built this year. The market values the tools and code it has invested in far more than any idealized computing model you care to speculate about.

The flaws in contemporary CPUs that device manufacturers perpetrated on their customers for almost 20 years are not the fault of C and its users. They are the fault of reckless manufacturers that squandered their reputation in the name of performance and, ironically, helped perpetuate the lack of innovation in programming techniques called out in this paper.


So true. The Spectre and Meltdown flaws have nothing to do with the abstract machine that a processor implements. You might as well claim that the vulnerabilities are the result of the ISA (which itself was born to smooth over differences among machines). Rather, these vulnerabilities are a case of a confused deputy. There is nothing to say you should not speculate; just do not do so across security boundaries, leaving breadcrumbs for an adversary to discover.


Though nowadays if AWS or Azure or GCP finds a way to run some proprietary database more efficiently using a new kind of language and CPU architecture, the scale there somewhat changes the playing field.


Sure, if they find one.

The issue with Itanium was not that the market didn’t care about superior performance. The issue was that Itanium was slow. The market didn’t care for theoretical beauty over actual performance.

It is not the first time Intel has done something similar. In the 80s they tried to introduce an ”object oriented” processor. It was ridiculously slow and no one cared about it.


I'm not in the field, but I'd wager we're due for a shift in the current software-hardware relationship. With the end of Moore's law, continuing to improve practical computing performance will require making more efficient use of transistors. Or maybe processors will go 3D and we'll have a few decades of relative status quo - I'm hardly Nostradamus


Plus even at AWS scale, I imagine HW homogeneity of the fleet is more important than efficiency of a single application, unless it's insanely more efficient for a top 10 service. Homogeneity improves predictability, load balancing, operations, security, and all sorts of other things.


On the other hand, AWS offers AMD machines and ARM machines in addition to the typical Intel stuff. If there was some new architecture that offered a 50% speedup, and all you had to do was recompile your code, I assume they'd support it in a heartbeat.


In the context of the article, the changes would by necessity be far more invasive than "recompile your code". The premise is that C cannot be an effective implementation language. A full rewrite would be required.


They also have GPUs.


Not to forget the i860 and Pentium 4. Intel really has a great track record of total failures.


> The market has spoken.

The market “demands” cheap, turnkey, easily replaceable programmers who don’t really know what they’re doing, and justifies this with “I made a website last weekend, programming is easy, you’re just pretending it’s hard to keep out competition”. Until software engineering is treated as actual professional engineering, time, money and resources will continue to be wasted frivolously.


Having previously worked in a “professionalized” field (architecture) before a career change to software engineering - please no.

Having a credentialing body that has to review and certify you just kills wages and innovation, and makes it take an outrageously long time to enter the field. But, more importantly, it will not prevent bad programmers from existing and shipping bad software.

Further, software engineering most assuredly is treated as actual professional engineering in most of the industry. That there are many companies that don’t treat it that way just means there are a lot of poorly led or non-software companies out there, which is no different than any other field.


Informatics Engineering is a professionalized field in some countries.

What happens in Portugal is that although most people tend not to do the admission exam, all universities that give engineering degrees (including in computing) need to be certified by the order, and signing legal contracts for projects as Engineer does require the exam approval.


Did you know that there was a PE exam for software engineering?

It was discontinued because less than 100 people applied for it over the course of several years.


No one took the exam because it was effectively impossible.

To become a PE, first the candidate has to pass one of the Fundamentals of Engineering exams to become an engineer-in-training. Except, whoops, there wasn't ever a software-specific FE exam; the most relevant one is the EE/Comp. E. exam. Take a look at the list of topics: https://ncees.org/wp-content/uploads/FE-Ele-CBT-specs.pdf Most developers aren't going to pass that even with a CS degree.

Secondly, you need 4-8 years of supervision by a licensed engineer. Again, whoops, there are barely any software developers with a PE license, so who would they get to supervise them?

Only then do you get to take the PE exam for software engineering. Frankly, the situation was so absurd that one has to suspect that NSPE didn't want to certify software developers as PEs.


Country specific, others still have such exams in place.


There is a growth process in any profession. Nobody starts out as an expert just like nobody starts out as an adult. Wages to such are not a waste, but money well spent on training. There are degrees of skill, and jobs to match each level. And at the end of the day, the best quality education comes through real life experience.


How does it matter? What 'unqualified programmers' (in your speak) say has no bearing on the outcome/quality of actual software that is being created elsewhere by the 'professionals', right?

Unlike other professions, software is ambiguously placed in between art and commerce. Perhaps the only art form that drives commerce heavily compared to music/painting etc. Just like there are no certifying bodies for qualified painters/sculptors/musicians (even if they exist, they are not stopping actual artists from creating and sharing stuff), creating a certifying body for software will not take anyone anywhere.


Is there movement towards professionalizing software engineering? What factors are for and what are against?

I like the idea of professionalizing but also understand it can be constraining to the labor market, but could increase “quality” of products?


Speaking for the Portuguese market, it is not a movement; it has already existed for about 30 years.

https://pt.wikipedia.org/wiki/Engenharia_Inform%C3%A1tica#Co...

Quality assurance of five-year degree programmes, with assessment of university study plans.


[flagged]


Indeed, to use the title "Engineer" one has to earn it, by taking a university degree in Informatics Engineering and passing the respective assessment.


> Informatics Engineering

Coding isn't and never will be Engineering.


While that might be true to some extent, I argue there are lots of Engineering Principles that should be taught and required for coding before people call themselves Engineers.

Software Development right now has a little too much BroScience.


Got into programming because it was fun and interesting. I’ve watched what was “correct” change many times over the years. Or every few months in JavaScript land.

The thought of being told what is correct is just wrong. It would crush so many people. So much innovation would be stifled.

Had a test question in college asking what is better, white on black or black on white for text. Knew I was wasting my time with “formal” education.


I agree. But it isn't about being right or wrong as in other engineering disciplines, where there is scientific evidence for those facts.

The principles I was thinking of are about making trade-offs: cost, performance, time and TCO. And it isn't about which set of trade-offs is right, it is about making those trade-offs knowingly and justifying them. Throughput vs latency, CPU vs memory, etc. A lot of this seems to be learned on the job and from mistakes. I wish more of it were taught.


If what one understands by coding is laying bricks and doing plumbing, surely.


The vast majority of programmers don't rise to the professionalism of a plumber.


One more reason they should not be allowed to entitle themselves Engineers.


[flagged]


Unfortunately some people are happy to taint the profession.


> your gate-keeping

Yep, that's _exactly_ what the "I made a website last weekend, so programming isn't hard, programmers just pretend it's hard" types say.


> the lack of innovation in programming techniques called out in this paper.

What kind of innovation? Compiler optimizations? New language paradigms?

Am I right to think computers should adopt a more parallel architecture and design, expand on techs like OpenCL and CUDA, and generalize those techniques to everything done by a computer, because we've hit a frequency limit?

We often see software being either sharded, distributed, load balanced, etc, but it seems that there is much more performance possible if we start building chips that force the programmer to divide up a task. Of course it seems like a very big paradigm shift, which might be too expensive and complicated, but since bleeding-edge techs like deep learning cannot be properly accomplished with traditional computers, I tend to believe the computer model of today is outdated.


> "Am I right to think computers should adopt a more parallel architecture and design, and to expand on techs like OpenCL and CUDA, and generalize those technique to everything done by a computer, because we've hit a frequency limit?"

https://en.wikipedia.org/wiki/Amdahl%27s_law

"...Amdahl's law is often used in parallel computing to predict the theoretical speedup when using multiple processors. For example, if a program needs 20 hours using a single processor core, and a particular part of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour. Hence, the theoretical speedup is limited to at most 20 times {\displaystyle \left({\dfrac {1}{1-p}}=20\right)}{\displaystyle \left({\dfrac {1}{1-p}}=20\right)}. For this reason, parallel computing with many processors is useful only for highly parallelizable programs..."

So, in answer to your question, no.
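
To make the arithmetic concrete, here is a minimal C sketch of that formula, using the p = 0.95 from the Wikipedia example (the processor counts are arbitrary, just for illustration):

    /* Amdahl's law: speedup = 1 / ((1 - p) + p / n) for n processors. */
    #include <stdio.h>

    int main(void) {
        const double p = 0.95;                 /* parallelizable fraction */
        for (int n = 1; n <= 4096; n *= 4) {   /* n = number of processors */
            double speedup = 1.0 / ((1.0 - p) + p / n);
            printf("%4d processors -> %5.2fx speedup\n", n, speedup);
        }
        return 0;                              /* approaches 20x, never exceeds it */
    }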



In my field, computational chemistry, you are definitely right: when we have more parallelism available, we go for increased accuracy and scope, not reduced latency. The latency is set by human schedules (a coffee break, overnight, etc). So Amdahl's Law does not apply.


> Am I right to think computers should adopt a more parallel architecture and design, expand on techs like OpenCL and CUDA, and generalize those techniques to everything done by a computer, because we've hit a frequency limit?

Intel Skylake has 8 execution ports with 6-wide dispatch PER CORE. Typical code can achieve more than 1 instruction per clock tick, maybe 3 or 4 instructions/clock if you really work hard at your optimization, with 6 instructions/clock as the hard limit due to the uOp cache.

That's a normal CPU. Modern CPUs are incredibly parallel. They just "pretend" to be serial, so that the typical programmer doesn't have to think of those parallelization issues. A combination of compiler (aka: dependency cutting), and CPU (aka: Tomasulo's Algorithm) works together to achieve this parallelism (out-of-order, superscalar, pipelined).
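
A minimal C sketch of what that hidden parallelism means for ordinary-looking code (function names are made up for illustration; the exact gain depends on the core and on floating-point reassociation rules):

    /* Both functions do the same additions. The first is one long dependency
     * chain, so the core waits on each add. The second keeps four independent
     * chains in flight, which the out-of-order scheduler can spread across
     * execution ports; that is the freedom behind "3 or 4 instructions/clock". */
    double sum_serial(const double *a, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i];                  /* each add waits for the previous one */
        return s;
    }

    double sum_unrolled(const double *a, int n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        int i;
        for (i = 0; i + 4 <= n; i += 4) {
            s0 += a[i];                 /* four independent dependency chains; */
            s1 += a[i + 1];             /* the scheduler can issue them in     */
            s2 += a[i + 2];             /* parallel on different ports         */
            s3 += a[i + 3];
        }
        for (; i < n; i++)
            s0 += a[i];                 /* leftover elements */
        return (s0 + s1) + (s2 + s3);
    }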

----------

EPIC / VLIW are "leaky abstractions", which bleed the parallelism into the assembly language. But you don't get a lot of parallelism out of EPIC / VLIW, not compared to SIMD anyway.

So if you really have a huge amount of parallelism available, SIMD seems to be a superior methodology (and modern compilers can auto-vectorize loops when the compiler detects the parallelism).

I'm just not sure where EPIC / VLIW techniques come in handy. It's not parallel enough to compete with SIMD, but it's still more complicated than traditional CPUs.


A lot of what is done in a computer is branching logic code, and that's not so easy to deploy on opencl or cuda which are to a first approximation arithmetic engines.


> Intel and HP attempted to deliver what this paper advocates. They designed an entirely new processing element and compiler architecture that promised to deliver high performance with less complexity. It was called EPIC and manifested as Itanium. Billions of dollars and the best minds at the disposal of an industry wide consortium couldn't make it work; one year ago the last Itanium shipped. The market has spoken.

Because SIMD seems to be easier in practice when you actually need performance.

Every high-performance application seems to dip into explicit SIMD-parallelism: H264 / H265 encoders, GPU-based deep learning... video game shaders and raytracing, etc. etc. All of which get more performance from SIMD than VLIW / EPIC.

I think the market has spoken: if you are aiming for explicit instruction level parallelism, why not go for 32-way parallelism per clock tick (NVidia Volta or AMD RDNA architectures) instead of just 3 or 4 parallel assembly statements (aka a "bundle") that EPIC / VLIW Itanium can do?

----------

Another note: normal CPUs these days are approximately 6-way parallel. The Skylake i9-9900k can execute 6-uops per clock tick from the uop cache (ex: 2x Load instructions, 2x Additions, 1x XOR, 1x 512-bit SIMD statement). In addition to pipelines, reorder buffers, and other such tricks to "extract" more parallelism from instruction streams.

EPIC / VLIW just happens to sit in an uncomfortable spot. It's "different enough" that it requires new compiler algorithms and new ways of thinking. But it's not "dramatic enough" to create the kind of huge parallelism that SIMD can easily represent.

Back in the 90s and 00s, it was probably assumed that SIMD-compute was too hard to program, while traditional CPUs couldn't scale very easily.

EPIC / VLIW was wrong on both counts. OpenCL and CUDA made SIMD far easier to program, while traditional CPUs became increasingly parallel. And that is the history IMO.


Intel and HP only failed because AMD exists and was allowed to design chips that provided an alternative to Itanium.

Without AMD, the market would not have spoken anything.


> AMD exists and was allowed to design

What authority do you propose to gate who is and is not allowed to design semiconductors?

The x86 ISA started at 8 bits, sharing design elements with an earlier 4 bit device. It was successfully extended first to 16 and then 32 bits. Extension to 64 bits was inevitable. The fact that a hungry competitor fulfilled this inevitability before the market forced Intel to do so is an interesting detail and little more. The fact that this obvious evolution was all that was necessary to wipe out Itanium is telling.


The authority of the patents and license agreements that Intel had with AMD, partially forced by a court agreement; without them x64 would never have happened in a legal way.

The Intel 80x86 architecture was successfully extended to 16 and then 32 bits, by Intel.


> The Intel 80x86 architecture was successfully extended to 16 and then 32 bits, by Intel.

Intel would have been forced to extend x86 to 64 bits sans AMD. The market wanted address space. It did not want a "better" ISA, a new programming paradigm and all the other junk. The part of Intel still listening to customers understood this, and Project Yamhill (Intel's 64 bit extension to x86) was underway at least a year before the first AMD x86_64 device appeared and after the first Itanium devices were sold. They knew they had a product the market didn't want and started moving to 64 bit x86 before AMD even delivered their first K8.


Market would have taken whatever was available to buy.


You think Intel would have just stopped manufacturing x86 chips?


Intel would have been able to drive the market into Itanium, since x64 would not exist.


this is not how economics works


Indeed, economics require competition, which only happened thanks to AMD x64 alternative.


They also built it upon the promise of a "sufficiently smart compiler" which never manifested, and ensured there was no second source to provide market competition.


EPIC also had numerous problems. The most severe I think was that it exposed the innermost workings of the chip.

That sounds like an awesome idea, but it's actually a major issue because once you expose something you freeze it in time forever.

Pretty much all modern processors larger than in-order embedded cores are basically virtual machines implemented in hardware. The actual execution units are behind a sophisticated instruction decoder that schedules operations to achieve maximum instruction level parallelism and balance many other concerns including heat and power use in modern designs.

The presence of this translation layer frees the innermost core of the chip to evolve with almost total freedom. Even if fundamental innovations were discovered like practical trinary logic or quantum acceleration of some kind, this could safely be kept behind the instruction decoder.

EPIC, on the other hand, by exposing the core freezes it. I predict that if EPIC had taken over, eventually you'd have... drum roll... an instruction decoder for each EPIC "lane" or whatever, doing exactly what today's instruction decoders do. It probably would have ended up evolving into a simultaneous multithreaded vector processor with cores that look not unlike today's cores, complete with pipelines and instruction schedulers and all the rest of that stuff.

I can imagine one scenario where EPIC and other designs that show their guts could work, but it would require the cooperation of operating system vendors. (Stop laughing!)

OSes could implement the instruction decoder layer in software, transpiling binaries from a standard bitcode like WASM, JVM bytecode, LLVM intermediate code, and/or even pre-existing instruction sets like X86 and ARM to the underlying instruction set of the processor core. Each new processor core would require what amounts to a driver that would look not unlike an LLVM code generator.

Have fun getting OS vendors to do that. Another major problem would be that CPU vendors would be incentivized to distribute these things as opaque blobs, making it very hard for open source OSes to support new chip versions. It would be a bit like the ARM / mobile phone binary blob hell, which is one of the factors making it hard to ship open source phone OSes or make open phones.

Keeping the instruction decoder on the silicon basically just avoids this whole shitshow. It lets the CPU vendors keep their innovations closed as they wish without imposing that closed-ness on the OS or apps.

The final issue with kernel compilation is that the performance probably wouldn't be much better than what we get now. We'd trade the overhead of an instruction decoder in silicon for a lot of JIT or AOT compilation and caching in the OS kernel. The performance and power use hit might be just as large or larger.


>That sounds like an awesome idea, but it's actually a major issue because once you expose something you freeze it in time forever.

Here is an example that should be pretty familiar. When you have an auto-vectorizing compiler and it generates AVX instructions, you will have to recompile your program when AVX2 or AVX-512 comes out.

With EPIC this problem is extended across the entire program. The ADD instruction is twice as fast? You now have to recompile everything to use ADDv2.
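
A hedged illustration of that treadmill, assuming a GCC-style toolchain (the flags are the usual ones, but exact code generation varies by compiler and version):

    /* saxpy.c: a trivially auto-vectorizable loop. The same source has to be
     * rebuilt to target wider vector units, e.g.:
     *     gcc -O3 -mavx     -c saxpy.c      (256-bit AVX)
     *     gcc -O3 -mavx512f -c saxpy.c      (512-bit AVX-512)
     * A binary built for AVX keeps issuing AVX code even on an AVX-512 machine,
     * unless you ship several versions and dispatch between them at runtime. */
    void saxpy(float a, const float *x, float *y, int n) {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }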


> OSes could implement the instruction decoder layer in software, transpiling binaries from a standard bitcode like WASM, JVM bytecode, LLVM intermediate code, and/or even pre-existing instruction sets like X86 and ARM to the underlying instruction set of the processor core. Each new processor core would require what amounts to a driver that would look not unlike an LLVM code generator.

We already have that; the bitcode is called "source code" and the instruction decoder is called a "compiler".


> Have fun getting OS vendors to do that

That is exactly what IBM and Unisys mainframes do with their language environments, and has been picked up on Windows Store, Android and watchOS as well.


Yeap. One of my roommates from college worked at HP during the Itanium development.. it was great on paper but customers didn't want it or need it. The bigger picture is throwing everything away every few model years isn't "innovation," it's planned-obsolescence consumerism and pointless churn. Turing completeness and standardization > whiz-bang over-engineering.


> The bigger picture is throwing everything away every few model years isn't "innovation," it's planned-obsolescence consumerism

x86* is a terrible architecture, though. The world would be a much better place without it.

> Turing completeness and standardization > whiz-bang over-engineering

Turing completeness and standardization are fine, but there were and are better things to standardize on. "Something with a reasonable amount of thought going into it" is not "over-engineered."

Claiming "planned-obsolescence" would be reasonable if people weren't going through all of this work to maintain fifty years of (near) compatibility with an architecture originally used in a calculator.


You aren't wrong that x86 is terrible. But it's only terrible because it has survived so long and offered so much value through so many periods of change. I believe that any architecture that lives long enough will become "terrible".

It's not a bug. It's the scar tissue of hard won success. Long live x86!


x86 is a terrible architecture. If we changed it overnight the world would be an extremely marginally better place (if we ignore backward compatibility). In reality, on large powerful machines the ISA matters very little. Where the ISA matters (low-end mobile, embedded), x86 has never been a thing.


x86 is a lousy architecture, but x86-64 isn't as bad; at least it has a good number of registers unlike x86.

It's easy to prove x86 is lousy by seeing its success in embedded and mobile devices: it hasn't had any, despite trying (Atom). ARM reigns supreme here.

x86 manages to hold on because of 1) backwards compatibility with proprietary software (namely Windows), and 2) inertia. We don't really notice it that much because we've covered over it with abstraction layers: almost no one writes assembly code any more.

We would be better off if we switched to something better, but we wouldn't notice it much; we might see a slight amount of power savings, and OS developers would be happier. But the benefits just aren't worth the costs. It's unfortunate, though: the major chipmakers should be able to just develop a nice, clean-sheet architecture which retains the good parts of x86 (like the PCIe bus and enumeration, things missing on ARM) and cleans up the bad things, and users shouldn't have to do anything other than make sure to select the Linux distro that matches that architecture. The presence of binary-only proprietary software that only works on x86(/64) is probably the biggest reason we're stuck with it: a competing CPU maker can't just come up with something new and have it "just work" (after getting their compiler changes merged into GCC and LLVM of course).


> x86 is a lousy architecture, but x86-64 isn't as bad; at least it has a good number of registers unlike x86.

Funny thing.

Whether you boot a modern x86 system in 32 bit mode or 64 bit mode doesn't change the number of registers you're using. You're still using the 32-128 physical registers on the core. That's why x86_64 code isn't particularly faster (sometimes slower) than the same code compiled for x86_32 mode.

People have this zany idea that assembly language is a low level language. It's not. When the CPU executes 32 bit x86 code it "compiles" it to uops that are totally unrecognizable to us and use dozens to hundreds of registers.

The thing about embedded kinda exposes your mental bias. When you scale up the x86 instruction decoder from Intel Atom scale to Xeon scale, the instruction decoder gets a little bit more complicated, but it's given a ton more tools to use. So sure, Atom sucks at embedded, but x86 is still king at desktop and beyond, and ARM will never be able to challenge it.

If ARM wanted to challenge x86 in terms of single threaded performance, it would need to do the same thing x86 does: have a super complicated instruction decoder that maps the 32 logical registers defined by the ISA to its 128 physical ones, reschedules everything, renames stuff where appropriate, identify loads that can be elided, etc. And all of the advantages of having a simple ISA go out the window, because the ISA is an illusion.

Unfortunately Intel's Architecture Code Analyzer is dead. LLVM MCA is almost as good. I recommend you play around with it a bit some time. CPUs kinda don't give a crap if you're speculating four loop iterations ahead and you're just using the same few registers over and over again for multiple purposes.


>When the CPU executes 32 bit x86 code it "compiles" it to uops that are totally unrecognizable to us and use dozens to hundreds of registers.

Yes, but the x86 code itself can't address hundreds of registers, only a handful, because it assumes that's all there is (because that's all there was in the actual x86 processors way back). So the fact that you're using a complicated instruction decoder to get around this and make use of much more capable hardware underneath seems to me to be a big source of inefficiency: surely you would have more performance if your ISA could directly use the hardware resources, instead of needing a super complicated instruction decoder.

>So sure, Atom sucks at embedded, but x86 is still king at desktop and beyond, and ARM will never be able to challenge it.

ARM is already challenging it. They have ARM-64 servers now. Here's a place selling them, from a quick Google search: https://system76.com/servers/starling

>If ARM wanted to challenge x86 in terms of single threaded performance, it would need to do the same thing x86 does: have a super complicated instruction decoder that maps the 32 logical registers defined by the ISA to its 128 physical ones, reschedules everything, renames stuff where appropriate, identify loads that can be elided, etc.

Ok, then why not just make a new (or at least extended) ISA that makes direct use of all those things, instead of needing a super complicated instruction decoder? We already have lots of different ISAs for embedded CPUS: ARM has all kinds of variants (ARMv7, ARMv9, etc.), and MIPS does too. For best performance, you have to compile for the exact ISA you're targeting. We don't do this for desktop stuff mainly because Microsoft isn't going to make 30 different versions of Windows, but for embedded systems it's perfectly normal because everything is compiled from source.


> ARM is already challenging it. They have ARM-64 servers now. Here's a place selling them,

TBH that's not challenging x86, any more than Atom is challenging ARM in the embedded space. The fact that they're for sale doesn't mean they're good. Those server CPUs have terrible performance per core.

> Ok, then why not just make a new (or at least extended) ISA that makes direct use of all those things, instead of needing a super complicated instruction decoder?

Directly accessing all the parts which are hidden behind the ISA is called VLIW, and the performance is terrible every time someone tries to reinvent it. It sucked even when Intel released the Itanium, which ran Windows.

The problem is that many of the data dependencies are dependent on timings which aren't available at compile time. (Integer division, for instance, the latency is sensitive to the values of its operands. To say nothing of cache timing.) A super complicated instruction decoder knows what data it has and what it doesn't while it is making decisions about what uops to dispatch on the half dozen or so lanes it's managing. A sufficiently advanced compiler does not, so a VLIW has to wait for all data in the half dozen or so lanes to become available before it is allowed to dispatch the instruction. If you want to do "interesting" rescheduling/renaming, you need to bring back the super complicated instruction decoder. (AFAIK later Itaniums started down the path of a complicated instruction decoder, but the Itanium was canned long before its complexity started approaching contemporary x86 standards. It would have been interesting to watch that develop.)

I think you're fundamentally misunderstanding how much stuff the instruction decoder does. To be fair, I'm not doing it its full justice, (how can I? It won't fit.) but I think you're too quick to think all a CPU does is perform the assembly instructions which are fed to it. As the article states, modern computers aren't just fast PDP-11s.


> ARM is already challenging it. They have ARM-64 servers now. Here's a place selling them, from a quick Google search: https://system76.com/servers/starling

From the site: "Starling Pro ARM is currently unavailable. We’ll loop you in when we have more to share. Want to be the first to learn when it arrives? Get exclusive access before anyone else."

Which is sadly the typical story with ARM servers - limited availability.


Fewer architectural registers means more expensive stack spills and reloads. 8 (7 really, and often 6) registers is too few. 16 is about right for the vast majority of programs.

In theory memory can be renamed as well, but this was not done until the very latest Intel and AMD architectures and, IIRC, it still has to be conclusively confirmed.


>It's easy to prove x86 is lousy by seeing its success in embedded and mobile devices: it hasn't had any, despite trying (Atom). ARM reigns supreme here.

By that logic everything is lousy. x86 is lousy because it couldn't replace ARM. ARM is lousy because it couldn't replace x86.


I chuckled at the calculator portion of the comment. It's hilarious when you think about it.


Something about this constantly appearing trope bugs me.

I began programming C and assembler on the VAX and the original PC. At that time, C was a reasonable approximation of the assembly code level. We didn't get into expanding C to assembly that much but the translation was reasonably clear.

As far as I know, what's changed between that mid-80s world and now is that a number of levels below ordinary assembler have been added. These are naturally somewhat confusing, but they aim to emulate the C/assembler model that existed way back then. These levels involve memory protection, task switching, caches and all the things involved in having the current zillion-element Intel CPU behave approximately like the 16-register CPU of yore, but much, much faster.

I get the "there's more on heaven and earth than your flat memory model, Horatio" (apologies to Shakespeare).

BUT, I still don't see any of that making these "Your Ceeee ain't low-level no more sucker" headlines enlightening. A clearer way to say it would be: "now the plumbing is much more complicated and even C programmers have to think about it".

Because... adding levels below C and conventional assembler still leaves C exactly as many levels below "high level" language as it was before and if there's a "true low level language" for today I'd like to hear about it. And the same sorts of programmers use C as when it was a low level language and the declaration doesn't even give any context, doesn't even bother to say "anymore" and yeah, I'm sick of it.

Edit: plus this particular actual article is primarily a rant about processor design with C just pulled into the fight as a stand-in for how people normally program and modern processors treat that.


> Because... adding levels below C and conventional assembler still leaves C exactly as many levels below "high level" language as it was before and if there's a "true low level language" for today I'd like to hear about it. And the same sorts of programmers use C as when it was a low level language and the declaration doesn't even give any context, doesn't even bother to say "anymore" and yeah, I'm sick of it.

Not really. For many purposes, C is not any more low-level than a supposedly "higher level" language. 20 years ago one could argue that it made sense to choose C over Java for high-performance code because C exposed the low-level performance characteristics that you cared about. More concretely, you could be confident that a small change to C code would not result in a program with radically different performance characteristics, in a way that you couldn't be for Java. Today that's not true: when writing high-performance C code you have to be very aware of, say, cache line aliasing, or whether a given piece of code is vectorisable, even though these things are completely invisible in your code and a seemingly insignificant change can make all the difference. So to a large extent writing high-performance C code today is the same kind of programming experience (heavily dependent on empirical profiling, actively counterintuitive in a lot of areas) as writing high-performance Java, and choosing to write a program with extreme performance requirements in C rather than Java because it's easier to control performance in C is likely to be the wrong tradeoff.
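
One concrete, hedged instance of the kind of invisible effect being described (access pattern rather than aliasing, but the same flavour): the two functions below are almost identical C, yet on a large matrix the second is typically several times slower, and nothing in the source says so.

    #define N 1024

    /* Walks memory sequentially: every cache line fetched is fully used. */
    long sum_rows(const int m[N][N]) {
        long s = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];
        return s;
    }

    /* Same arithmetic with the loops swapped: each access strides N ints
     * ahead, so most of them miss in cache. */
    long sum_cols(const int m[N][N]) {
        long s = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];
        return s;
    }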


C has aged better than Java though. While Java still pretty much expects that a memory access is cheap relative to CPU performance like in the mid-90's, at least C gives you enough control over memory layout to tweak the program for the growing CPU/memory performance gap with some minor changes.

In Java and other high-level languages which hide memory management, you're almost entirely at the mercy of the VM.

IMHO "complete control over memory layout and allocation" is what separates a low-level language from a high-level language now, not how close the language-level instructions are to the machine-level instructions.


Picking Java is a bad example, plenty of high level languages are AOT compiled and provide the same control over memory allocation as C does.

And Java is in the process of getting those features as well anyway.


What popular "high level" languages might we consider? Scanning various lists, you have bytecode-based languages (Java, .NET languages, etc.), you have the various scripting languages (Python, Ruby, Perl, etc.), you have languages compiled to the JVM (Scala, Elixir, etc.), and you have extensions to C (C++, Objective-C). It seems all of those are either built on the C memory model with extensions or use an underlying VM.

> provide the same control over memory allocation as C does.

But the argument in this thread is about something or other eventually being lower-level than C, right? C++, Objective-C, D and friends are "high-low": they provide higher-level structure on the basic C model. Which in most conceptions puts them higher than C, but we can put them at the same level if we want, hence the "high low" usage, which is common, I didn't invent it.

Basically, the flat memory model that C assumes is what optimization facilities in these other languages might grant you. Modern CPUs emulate this model and deviate from it through a combination of some memory accesses taking longer than others and bugs in the hardware. But neither of these things is a reason for a programmer not to use this model normally; it's a reason to be aware, add "hints", choose modes, etc. (though it's better if the OS does that).

And maybe different hardware could use a different sort of overt memory. BUT, the C programming language is actually not a bad way to manipulate mixed memory, so multiple memory types wouldn't particularly imply "ha, no more C now". But a lot of this is cache, and programmers manipulating cache directly seems like a mistake most of the time. And GPUs? Nothing about GPUs implies no more C (see Cuda, OpenGL - C++? fine).


.NET-based languages include C++ as well, and .NET has had AOT compilation to native code in multiple forms for ages.

Latest versions of C# and F# also do make use of the MSIL capabilities used by C++ on .NET.

Then if we move beyond those into AOT-compiled languages with systems programming capabilities still in use in some form: D, Swift, FreePascal, RemObjects Pascal, Delphi, Ada, Modula-3, Active Oberon, ATS, Rust, Common Lisp, NEWP, PL/S, Structured Basic dialects; more could be provided if going into more obscure languages.

C is neither the genesis of systems programming, nor did it provide anything that wasn't already available elsewhere, other than easier ways to shoot yourself.


It is literally impossible to write any reasonable high performance software in Java. (Yes, I've worked with Java devs who thought they had written high performance software, but they had no point of reference). This is mostly due, among other things, to the way modern CPUs implement cache, and the way Java completely disregards this by requiring objects to be object graphs of randomly allocated blobs of memory. A language that allows for locality of access can easily be an order of magnitude faster, and with some care, two orders of magnitude.


Its "literally possible" to do this with Unsafe, and has been for a long time. You get a block of memory, the base address, then you put things in it.

Just because its not the "idiomatic java style" doesn't mean its not Java. You might do this because you can use this for the parts that really need hand-tuned performance, then rely on the JVM/ecosystem for the parts that don't need it.


Java forces you to use profiling; at least with C you can see the exact instructions your compiler outputs. Missing the fancy vector instructions? Modify your code until you can guarantee it's vectorized. With Java you are at the mercy of the JVM to do the right thing at runtime.


Not that I disagree with what you're saying, but I thought you'd find it interesting: you can dump the JIT assembly from Hotspot JVM pretty readily to make sure things like inlining are happening as you'd expect.


You can also view the entire compiler internals in a visual way using the igv tool. You can actually get much better insight into how your code is getting compiled on a JVM like the GraalVM than with a C compiler.

However, I will admit that this is very obscure knowledge.


Java also allows to see the exact instructions the JIT/AOT compiler outputs, it is a matter of learning the tools.


The instructions no longer tell the whole story though. Maybe you can tell whether your code is vectorised, but you can't tell whether your data is coming out of main memory or L1 cache, and to a first approximation that's the only thing that matters for your program's performance.


> if there's a "true low level language" for today I'd like to hear about it

"Steel Bank Common Lisp: because sometimes C abstracts away too much" https://www.pvk.ca/Blog/2014/03/15/sbcl-the-ultimate-assembl...

ATS - C plus everything you could get from modern types, including type-safe pointer arithmetics https://www.youtube.com/watch?v=zt0OQb1DBko

Compiling Haskell to hardware http://conal.net/blog/posts/haskell-to-hardware-via-cccs


I don't see how these get past the limitations of assembly (and C) language mentioned in the post above. None of those links seem to indicate anything about exposing the cache hierarchy or out of order execution to developers.


The author's point was that it's hard to separate discussion of modern CPU design from the constraints of C. Not from a technical perspective but from a pragmatic/commercial one.

The take away for me was that while C is obviously a higher level abstraction than CPUs, it’s a mistake to think that C has been designed for that hardware, when nowadays it’s the other way around.


But even if the hardware is designed for C instead of the other way around, getting away from C is going to be hard; indeed, if the hardware is designed for C, that kind of makes C the lowest level you can count on.

I'm just saying that the authors group together a variety of claims under "C is not low level" but while the claims themselves might be reasonable, they don't support the base point.


The article blames C for the processor designs that emulate older (non-parallel) processors. I think it can be summarised as C having a relatively straightforward translation into hardware-friendly assembly in the past, but these days both major CPUs and compilers are working hard to preserve the same model, so that neither the assembly is hardware-friendly nor efficient/optimizing C compilers are straightforward.


But "C has a bad effect" is a lot different from "C is not low level". Maybe C had a bad effect because it's low level and they should have been emulating the lisp model instead - for all I know.

Edit: It seems like it's really the flat memory model with single processing that CPUs and compiler designers have been trying to preserve, and that's not C-specific. Indeed, I think higher level languages are effectively more wedded to that.

I mention in another reply: Nvidia GPUs, complex memory model, programmed with Cuda, a C++ system. It really seems like C as such only enters here as something to beat up on to make other points (whatever the validity of the other points).


There are different definitions of "low level language", but I think a charitable interpretation of the one in this article is that they view processors as providing a virtual machine, and count "levels" from what's more efficient for hardware (that is, the bits below that virtual machine). Though maybe a somewhat strange/confusing title was picked to attract attention.


OK, following that, in the case of the GPU, C++ code is translated into SPX "assembler" but SPX is very much a macro system and the programmer doesn't get access to the true low level.

And I don't think we'll escape the situation that there will low-level emulation code that programmers can't and should not access. It's good to know but that doesn't change the levels programmers normally work with.


>Maybe C had a bad effect because it's low level and they should have been emulating the lisp model instead - for all I know.

Of course lisp machines were a thing. They went out of fashion when more conventional architectures could run lisp code faster than their dedicated machines.


> I began programming C and assembler on the VAX and the original PC. At that time, C was a reasonable approximation of the assembly code level. We didn't get into expanding C to assembly that much but the translation was reasonably clear.

Right: On the VAX, there wasn't much else for a compiler to do other than the simple, straightforwards thing, and I'm including optimizations like common subexpression elimination, dead code pruning, and constant folding as straightforwards. Maybe loop unrolling and rejuggling arithmetic to make better use of a pipeline, if the compiler was that smart.

> As far as I know, what's changed that mid-80s world and now is that a number of levels below ordinary assembler have been added.

You make good points about caches and memory protection being invisible to C, but they're invisible to application programmers, too, most of the time, and the VAX had those things as well.

Another thing that's changed is that chips have grown application-visible capabilities which C can't model. gcc transforms certain string operations into SIMD code, which vectorizes it and turns a loop into a few fast opcodes. You can't tell a C compiler to do that portably without relying on another standard. C didn't even get official, portable support for atomics until C11.

You can dance with the compiler, and insert code sequences and functions and hope the optimizer gets the hint and does the magic, but that's contrary to the spirit of a language like C, which was a fairly thin layer over assembly back in the heyday of scalar machines. I don't know any modern language which fills that role for modern scalar/vector hybrid designs.
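
For reference, the portable C11 atomics mentioned above look roughly like this (a minimal single-threaded sketch just to show the API shape):

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_long counter;   /* statics are zero-initialized */

    void hit(void) {
        /* atomic increment; relaxed ordering is enough for a plain counter */
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    }

    int main(void) {
        hit();
        hit();
        printf("%ld\n", atomic_load(&counter));   /* prints 2 */
        return 0;
    }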


SIMD design itself isn't constant between different processor families. Any purported standardized language for scalar/vector hybrid either has to rely on a smart optimizer or be utterly platform specific.


> SIMD design itself isn't constant between different processor families. Any purported standardized language for scalar/vector hybrid either has to rely on a smart optimizer or be utterly platform specific.

That is indeed part of the problem. There might be enough lowest-common-denominator there to standardize, like there is with atomics, I don't know, but I'm not saying that C needs to add SIMD support. I'm saying that any low-level language needs to directly expose machine functionality, which includes some SIMD stuff on some classes of processor.

Maybe there will be a shakeout, like how scalar processors largely shook out to being byte-addressable machines with flat address spaces and pointers one word size large, as opposed to word-addressable systems with two pointers to a machine word (the PDP-10 family) or segmented systems, like lots of systems plus the redoubtable IBM PC. C can definitely run on those "odd" systems, which weren't so odd when C was first being standardized, but char array access definitely gets more efficient when the machine can access a char in one opcode. (You could have a char the same size as an int. It's standards-conformant. But it doesn't help your compiler compile code intended for other systems.) C could standardize SIMD access once that happens. However, it would be nice to have a semi-portable high-level assembly which targets all 'sane' architectures and is close to the hardware.


You’re mistaken about the PDP-10. Yes, you could pack two pointer-to-word pointers into a single word; but a single word could also contain a single pointer-to-byte. See http://pdp10.nocrew.org/docs/instruction-set/Byte.html for all the instructions that deal with bytes, including auto-increment! And bytes could be any size you want, per pointer, from 1 to 36 bits.

Not to say that I disagree with your main point.


C#/.NET has vector operations in the System.Numerics.Vectors namespace, which will use SSE, AVX2 or NEON if available. However, there are numerous SIMD instructions that cannot be mapped that way.

For that reason, .NET Core 3 added SIMD intrinsics, so now you can use AVX/whatever instructions directly.

If I remember correctly someone made a very performant physics engine with the vector API


I found it insightful, because it goes on to discuss how C has had a huge effect on the programming model all the way from the processor to the compiler and it’s led to problems in performance and security. They suggest that it may be worthwhile designing or using a language that can better handle how processors are structured today.


> ...how processors are structured today

But it admits that processors today are structured to look like processors of the mid-80s (even though they definitely aren't). Today's processors have lower levels, but those lower levels aren't intended to be accessed by ordinary programs using any language. Maybe that's a bad thing, but we've migrated to a very different argument here.

Maybe they're saying we could have processors which wouldn't emulate the PDP-11. Sure, maybe we could. Like, say, Nvidia GPUs programmed using the Cuda system, which is based on... C++, which also "isn't a low level language".

I mean, GPUs definitely have emerged as an alternate way to use the bizillion transistors of modern chips, but broadly I don't see this as invalidating C (or C++) as a next-step-up-from-assembly level of programming. I mean, the reality is C actually is used in all sorts of complicated memory spaces and chips, often mixed with assembler.

IE, the "not low level" claim still is kind of a troll imo.


While CUDA is designed to make C++ run optimally well (several CppCon talks about it), they are also designed to run any language with a PTX backend well, including C, Fortran, Java, Haskell and .NET.


I think you touched close to an opinion of mine, which is that anything lower level than C becomes very hard for ordinary people to work in. I remember DSPs in the 80s and 90s, where to program them you needed to deeply understand how the machine worked. And guess what: eventually manufacturers ported C over to them so that programmers could be productive when working on the non-performance-critical parts of the code base.

If anything, modern processors are even worse under the hood, with the added problem that you can't feed one of them raw 'true' instructions fast enough to keep them from stalling.


> Because... adding levels below C and conventional assembler still leaves C exactly as many levels below "high level" language as it was before and if there's a "true low level language" for today I'd like to hear about it.

Me too, actually. I know you meant that rhetorically, but what if you designed an instruction set that better matched modern processor designs and then built a low-level compiled language on top of it? My hunch is that modern processor designs are so complicated that you’d have to do similar amounts of abstraction to make the language usable, but I’m not sure.


The optimizers are much stronger nowadays. They rewrite the program, so the resulting assembly might have nothing to do with the code you wrote.

Especially where undefined behavior is involved. Decades ago you did not need to care about undefined behavior. You wrote a + b, and you knew the compiler would emit an ADD instruction for +, and that x86's ADD does not distinguish between signed and unsigned operands, so you got the same result for signed and unsigned numbers regardless of overflow. But nowadays the optimizer comes along, says "wait, signed overflow is undefined", and optimizes the entire function away.
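
The classic sketch of that effect (whether the fold actually happens depends on the compiler and the optimization level):

    /* Because signed overflow is undefined, an optimizer may assume x + 1 > x
     * always holds and fold this function to "return 1"; the add-and-compare
     * you might expect from a 1970s view of C never reaches the binary. */
    int increment_is_larger(int x) {
        return x + 1 > x;   /* undefined behaviour when x == INT_MAX */
    }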


> still leaves C exactly as many levels below "high level" language as it was before

And the levels below assembly are inaccessible. Assembly/machine code is the lowest level that's accessible.


I have read through the comments already posted and the comments from the previous HN discussion linked also. I can’t help but feel like I got something completely different from this article than everyone else. I am convinced that Chisnall used the ‘C is not a Low-Level Language’ title as click bait. The actual point of the article is to push the view that the x86 ISA, according to him, is structured the way it is purely out of the desire to make the massive amount of existing C code run faster. His argument is essentially that ‘low-level’ programmers are not delusional, but are purposely being deluded by chip manufacturers. C and x86 assembly, according to the article, are not low level because they have only a passing relevance to the actual architecture of modern CPUs. Chisnall then goes on to argue that a low level language would require an ISA that presents a clearer picture of the actual architecture and would be geared for performance given the actual functionality of the CPU. He then bats around several features that could be part of an ISA for a multi-core, hierarchical-memory, pipelined chip: alternate memory models, changes in register structure and count, a push for immutability, and other features that would adapt the ISA to reflect what actually constitutes performant code.

I’m all for his vision: it seems like there could be an x86 ISA translation layer, or a portion of cores dedicated to maintaining x86 compatibility, while transitioning to a new ISA. In fact, just have a new ISA be the target the CPU reduces x86 to, and also expose that underlying ISA. But as said elsewhere in the thread, it’s been tried before and it hasn’t worked yet.


But it's not clear that this has been tried. Itanium wasn't exactly that, was it?

Edit: It would be easy to imagine exposing some of the lower level details in a new ISA that lives alongside x86, and then allowing languages that have the right abstractions to make use of them, creating better fits between existing abstract programming models and underlying computational resources...


This article quotes Perlis' famous saying that "a programming language is low-level when it calls attention to the irrelevant", and I am reminded of another Perlis aphorism:

« Adapting old programs to fit new machines usually means adapting new machines to behave like old ones. »


Only met him once, but the man was a giant of computer science.


A number of claims here and in the original article are inaccurate interpretations of the almost random walk that we have made to get to our modern processor designs and the C programming language.

Instruction-level parallelism and out-of-order execution were done by the seminal CDC 6600 as early as 1964, at the time one of the world’s fastest computers. (I remember a conversation with Seymour Cray about the difficulty of handling machine state during an interrupt on such an architecture.) The C programming language didn’t come along until almost 10 years later.

As the article says, C is a good fit for the architecture of the PDP-11, a mini-computer very different from the mainframes of the time. There were many competing visions for what a “high-level” programming language ought to look like back then: Pascal (1970), LISP (pre-1960), Prolog (1972), FORTRAN (pre-1960), COBOL (pre-1960), Smalltalk (1972), Forth (1970), APL (1966), Algol (pre-1960), Jovial (pre-1960), PL/1 (1964), CLU (1974). As a professional developer and CS grad student during this period I was well aware of these alternatives. Many of these had escape hatches to gain low level access to the machines that they ran on, and libraries were customarily written in assembly language. C came along during this period. It wasn’t my favorite; the whole pointer/array punning seemed unnecessary to me.

Why did C prevail? Was it because it was low-level? No, there were other well established languages capable of low level work. I recall, a few reasons: Unix, DEC’s PDP-11, and Yourdon.

First Unix was amazing and C was the favored language on Unix. Just being able to type man on a TTY or VDT and see the man page for a command was so novel. Unix OS and commands were written in C.

Second, the PDP-11 was a very popular good machine and Unix ran on the PDP-11. Unix on a PDP-11 was a lot more fun than dropping off decks of punch cards to run on an IBM 360 or CDC mainframe.

Edward Yourdon was an influential American software consultant, author and lecturer. He had picked C over Pascal and other languages as a “practical” general purpose high level language to recommend.

Meanwhile, hardware was not at all a monoculture; even though the PDP-11 was a commercial success, it was a simple machine and just a mini-computer. There were many attempts at alternative architectures: Harvard memory architecture machines, capability-based addressing, programmable wide microcode, LISP machines, RISC systems. I’ve programmed or designed systems for most of these.

So why did the x86 become one of the dominant architectures? It was because of the power of mass production of integrated circuits. Computers using other architectures can be built, but like the LISP machines, they will be slower and more expensive than mass produced processors.


All those languages are very serial and branch-happy too.

What would the alternatives be? SIMD, basically, and some sort of language layered on that -- array languages most likely -- and a complete retraining of programmers.

TFA also talks about the UltraSPARC CMT architecture and says it's a bad fit for C because most C programs don't use a lot of threads. That's nonsense though, since in fact there are many C10K-style, NPROC threads/processes applications out there. Sure, many applications remain that are thread-per-client, but those were obsolete in the 90s, and most such apps I run into are Java apps because Java didn't tackle async I/O way back when. I suppose Java is also C's fault since Java resembles C.

C is a scapegoat here, but TFA still has a point if we ignore that part of it: our programming languages (not just C) are serial and branch-happy, even when they have well-developed threading and parallelism features, and this translates to pressure on CPUs to do a lot of branch prediction.

But we do have less-radical ways out. For example, the CMT architecture results in pretty lousy per hardware thread performance, but pretty good overall performance with minimal or no Spectre/Meltdown trouble (because the architecture can elide most or all branch prediction) -- this won't do for laptops, but there's no reason it shouldn't do for cloud given server applications written in C10K/CPS/await styles.

My bet would be on a hybrid world with a mix of CMT and SIMD, and maybe also some deeply pipelined cores. CMT CPUs for services, SIMD for relevant applications, and deeply-pipelined CPUs for control purposes.


The evolution of minicomputers recapitulated the evolution of mainframes. The evolution of microprocessors recapitulated the evolution of minicomputers.


The money quote:

---

...processor architects were trying to build not just fast processors, but fast processors that expose the same abstract machine as a PDP-11. This is essential because it allows C programmers to continue in the belief that their language is close to the underlying hardware.

---

All else follows: hardware parallelism and the memory hierarchy are not exposed to standard C. The compiler rewrites the code ruthlessly, replacing loops with sequential instructions, vector instructions, etc. (or not, and then you wonder why, and how to trigger the optimization).
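For example (a minimal sketch; whether this actually becomes SIMD code depends entirely on the compiler, flags and target, e.g. gcc/clang at -O2/-O3):

    /* Nothing in this source says "vectorize"; the compiler may emit SIMD
       instructions, unroll the loop, or leave it scalar. */
    void saxpy(float *restrict y, const float *restrict x, float a, int n) {
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }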

C compilers do a number of things to continue supporting abstractions from 50 years ago. The article suggests that maybe other approaches, not compatible with C, could be considered for CPUs (not just GPUs).


The proposed benefit of C is that it is “close to the metal”, and from that follows that the generated code is “obvious” and thus its performance characteristics are “easy” to reason about.

It turns out that none of these three things are actually true. That just leaves us with a language poorly adapted to today’s use cases and simultaneously hardware that has optimized for the C abstract machine in not always useful or secure ways.


You're quite correct in saying that none of those things are true but you're wrong in saying that these are "the proposed benefit" of C. I feel that's a common misconception really.

A huge advantage of C that most people seem to forget these days is that it's quite a minimalist language by today's standards so it's relatively easy to learn and reason about.

But maybe the most compelling advantage is that its underlying architectural concepts map well to most real-life CPUs so it's a much better basis for compilers to generate efficient code than many languages. This is a subtly different thing from being "close to the metal". It's really more like "has compatible design concepts with CPUs". The abstract machine of C has relatively few gotchas when converting to machine code and that's what it's really about.

That doesn't mean that C's the "high level assembler" people speak about though. It's not. Take a look at some optimised assembler output from a C compiler and you'll see that's patently not true. It can be very difficult to even understand how the C source code relates to the generated assembler code - often you'll see that none of the same operations occur and nothing happens in the same order.
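A small illustration (clang in particular is known to do this; details vary by compiler and flags):

    /* The source describes a loop; the optimised output often contains no
       loop at all, just a closed-form n*(n-1)/2 style computation. */
    unsigned sum_to(unsigned n) {
        unsigned s = 0;
        for (unsigned i = 0; i < n; i++)
            s += i;
        return s;
    }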


C is easy to learn but not to reason about. First of all, C is really C + macro-C, two separate languages that don't know about each other. Second, all the damn UB. Third, weak typing; and fourth, memory errors. So no, reasoning about C is far from easy.
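A tiny example of the macro-C point (the classic textbook gotcha):

    #include <stdio.h>

    /* The preprocessor rewrites text with no knowledge of C's grammar or
       precedence rules. */
    #define SQUARE(x) x * x

    int main(void) {
        printf("%d\n", SQUARE(1 + 2));  /* expands to 1 + 2 * 1 + 2 == 5, not 9 */
        return 0;
    }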


These arguments sound like the arguments of someone who hasn't programmed much in C. C's undefined behavior is a non-issue in almost any realistic programming situation - it would only normally crop up in cases where you're deliberately doing something strange like overflowing a variable. The typing is a non-issue if you don't do stupid things. Memory errors can be an issue but tools like valgrind make them a relatively minor hassle. None of these affect your ability to reason about C, at least not as a normal programmer.


> overflowing a variable

...can easily happen inadvertently, just like any other exceptional condition. And C can't even handle those in a deterministic way, because it has no built-in exceptions; you have to rely on libraries or remember to check error codes. Not easy to reason about (see the sketch at the end of this comment).

> typing is a non-issue

When all your function ptrs have to be cast to and from (void*) - hell yeah it's an issue.

> minor hassle

Then why are buffer overflow exploits in C programs on the news?
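On the overflow point above, a minimal sketch of why UB makes reasoning hard (behaviour depends on compiler and flags, but gcc and clang commonly do this at -O2):

    /* Signed overflow is undefined, so the compiler may assume it never
       happens and reduce this whole function to "return 1"; the intended
       overflow check silently disappears. */
    int will_not_overflow(int x) {
        return x + 1 > x;
    }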


> When all your function ptrs have to be cast to and from (void*) - hell yeah it's an issue.

I don't know how you got this impression. This is 100% incorrect.


It is comparatively easier to reason about C than about other languages that are as useful as C (in terms of the number and quality of libraries, and portability).


Wait; what? If C is not a Low-Level Language, then what is a Low-Level Language?

"The features that led to these vulnerabilities, along with several others, were added to let C programmers continue to believe they were programming in a low-level language when this hasn't been the case for decades."

Now C is again the root of all evils...

But I'm afraid that's not right: all those CPU optimizations (branch prediction, speculative execution, caches, etc.) are not tied to any specific language.

They have been designed to make existing programs run faster; if all our software stack was written in Java, Lisp or PHP, I think that on the hardware front, most of the same decisions would have been made.


> Wait; what? If C is not a Low-Level Language, then what is a Low-Level Language?

Assembly, actual machine code. (Contrary to the article, C was never a low-level language, when it was younger it was literally a textbook high-level language because it allows abstracting from the specific machine, and while it's less likely to be what a textbook points to as an example today, that hasn't changed.)


Following the metrics of the article, assembly language isn't low level either. Assembly language only gives you access to 16 integer registers and the 16 (?) SSE/AVX SIMD registers on x86_64. It doesn't give you access to the far larger physical register file, integer or SIMD. Assembly instructions do not map to uops, no matter how much we pretend they do. We couldn't even program uops if we wanted to. These instructions are not executed in the order we specify them, and some of them are not executed at all: modern CPUs have their own dead code detectors and will drop instructions if they feel like it.

Assembly language programmers have less control over the micro-ops than raw JVM bytecode programmers have over the x86_64 instructions that eventually get executed.


Right, but that's the hardware interface. The CPU consumes a compressed instruction stream. Compression is achieved by the compiler via a lossy mapping of effectively infinite virtual registers onto a finite architectural register set. This stream is then re-inflated by the CPU, which removes the false dependencies that mapping introduced (register renaming) and cleans up the spilling it forced (caching).

If this seems absurdly complex, it might be because of the absurd complexity. But the alternative has been tried, and tried and tried (RISC, VLIW), and always a failure. Well fuck.


I mean... sure?

But what's the point, what low level languages are there? The linked article is arguing that C isn't low level because modern CPUs behave so differently than what their hardware interface suggests they do. If we accept this argument, then assembly language isn't low level either, because it suffers from all these same limitations. If assembly language isn't low level, then why is "low level" even a phrase?

My point is that if you're going to argue that C isn't low level, then it's hard to argue that assembly language is. Conversely, if you're going to argue that assembly language is low level, it's hard to argue that C isn't. So it's flippant to argue (in this thread) that assembly language is low level without also rebuking the article or coming up with a persuasive argument as to why C shouldn't be lumped in with assembly language.

Personally, I think the article is wrong. C is low level. It is useful to distinguish between C and Python in terms of C is low level and Python is high level. It is a useful mental model, therefore I'm keeping it. But if people in this thread are going to make both arguments that the article is correct and assembly language is low level, you'll need to justify that fairly strongly.

I'm also not arguing that RISC or (ew) VLIW are the answer.


> The linked article is arguing that C isn't low level because modern CPUs behave so differently than what their hardware interface suggests they do.

Which is right in conclusion, but wrong on reasoning.

C isn't low level because, by design, it allows writing code that works on very different hardware interfaces by abstracting away from what the particular machine does, independently of whether or not the CPU behaves the way its interface suggests. This is why decades ago C was a textbook example of an HLL, and nothing relevant to that description has changed in the intervening period.

> It is useful to distinguish between C and Python in terms of C is low level and Python is high level.

Python is in the general class of languages for which the term very high level programming language was created, and, yes, it's useful to distinguish between Python and C (hence the term coined for that purpose), but it's also useful to distinguish between Assembly and C (hence the terms coined for that purpose.)


> allows writing code that works on very different hardware interfaces by abstracting away from what the particular machine does

Looks at a 8/16bit in-order processor with synchronous, byte-at-a-time memory access and perhaps 1Kbit of on-chip registers total.

Looks at a 64bit, out-of-order, speculative, multicore behemoth with a 64 (or 72) bit data bus accessed by an embarrassingly complicated asynchronous protocol, and cached in multiple MB of on-die RAM, as well as dozens of general purpose registers and hundreds if not thousands of special-purpose or model-specific registers.

Looks at QEMU and other x86 interpreters.

So what you're saying is that x86 assembly is a very bad high-level language?


"Itanium was in-order VLIW, hope people will build compiler to get perf. We came from opposite direction - we use dynamic scheduling. We are not VLIW, every node defines sub-graphs and dependent instructions. We designed the compiler first. We build hardware around the compiler, Intel approach the opposite." https://www.anandtech.com/show/13255/hot-chips-2018-tachyum-...


>Assembly language programmers have less control over the micro-ops than raw JVM bytecode programmers have over the x86_64 instructions that eventually get executed.

Can you expand on this a little? There is no compilation that happens for the assembly code as far as I am aware. Wouldn't that execute all of the code serially? I am not an expert in this domain, just curious.


> There is no compilation that happens for the assembly code as far as I am aware. ... Wouldn't that execute all of the code serially?

Nope, that's largely what the article is getting at, more or less. Modern x86 processors optimize the x86 machine code so much that they quite literally 'compile' it down to what are called micro operations, and those micro operations are what the CPU actually executes. And then it goes beyond that, because the x86 machine code doesn't really map to the processor's actual implementation, so the CPU does extra things like register renaming, where it dynamically maps the 16 or so registers exposed in the machine code to say 64 or 128 internal registers (so an instruction like `inc %eax` may actually just write the incremented '%eax' to a completely new internal register rather than modifying the existing value, with that new internal register becoming the new `%eax`).

And it uses all of this to aggressively execute the machine code completely out of order, by seeing which instructions have dependencies on other instructions and determining which can be executed out-of-order without affecting the end result. The point of doing this is that there are lots of actions that can stall the processor, with the big two being branching and fetching memory (either from cache or main memory). The CPU is much faster than memory and even cache, so any time you have to go to either of those you take a big performance hit, but if you can continue executing instructions during that time (because they don't depend on that memory) then you get a lot more performance.

For branching, it effectively prevents the out-of-order execution at that point because the CPU doesn't know what instruction will be executed after the branch. The CPU can do 'branch prediction' however, where it guesses the result of the branch and then keeps executing from that point while waiting for the branch to be resolved. If the guess was right, then there is no delay. If the guess was wrong, the work it did was thrown out and it starts executing from the right location.

Note that, generally speaking, none of these are bad things by themselves, I would even argue they're great things, and adding such features to a processor is somewhat inevitable if you want to retain decades of compatibility like we have. But it has arguably resulted in hardware bugs like Spectre and Meltdown, though I would argue it's a lot more nuanced than that and than the article implies. And none of this really has to do with C, we're only talking about x86 assembly (which exists in the way it does almost purely for backwards compatibility).

Intel and AMD do not expose the micro operations in any form, preventing a lot of what the article is talking about. But at the same time, you can easily argue that's a good thing because if they did they would either need to support whatever form they expose for the next decade (And eventually result in a different set of weird optimizations to boost performance while maintaining compatibility), or you'd have to compile different versions of your code for every new CPU (Which would be a disaster).

Edit: I left out one more relevant detail (which I'm only including because the article talks about it a fair amount) - the CPU requests memory in chunks called 'cache lines', usually 32, 64, or 128 bytes in size. This means that whether or not the CPU will have a particular piece of memory when your code is executing is a more complicated question, because if multiple parts of your code reference memory within the same cache line, it will be a lot faster since it will only require one memory fetch. And code that has no branches will all be in the same cache line (or consecutive cache lines), which makes the out-of-order execution simpler since all of the code is already fetched. And more still, there's a complex process for ensuring consistency of cached memory across multiple cores/CPUs. Older CPUs didn't bother doing any of this because memory was fast enough to simply be read/written on demand without slowing the CPU down, so the x86 instruction set (generally) acts as though you're reading/writing directly to main memory, without any cache, and it's up to the CPU to maintain that illusion.
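If you want to see the branch prediction effect described above for yourself, here's a rough, self-contained sketch (timings are machine-dependent, and a compiler may emit a conditional move or vectorize the loop, which hides the effect):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Counting the same values twice: once in random order, once sorted.
       The "v[i] >= 128" branch is hard to predict on random data and trivial
       to predict on sorted data, which is typically visible in the timings. */
    static int cmp(const void *a, const void *b) {
        return *(const unsigned char *)a - *(const unsigned char *)b;
    }

    static long long count_big(const unsigned char *v, size_t n) {
        long long sum = 0;
        for (size_t i = 0; i < n; i++)
            if (v[i] >= 128)
                sum += v[i];
        return sum;
    }

    int main(void) {
        enum { N = 1 << 24 };
        unsigned char *v = malloc(N);
        for (size_t i = 0; i < N; i++)
            v[i] = (unsigned char)rand();

        clock_t t0 = clock();
        long long a = count_big(v, N);
        clock_t t1 = clock();

        qsort(v, N, 1, cmp);               /* same values, now sorted */

        clock_t t2 = clock();
        long long b = count_big(v, N);
        clock_t t3 = clock();

        printf("unsorted: %lld (%ld ticks)  sorted: %lld (%ld ticks)\n",
               a, (long)(t1 - t0), b, (long)(t3 - t2));
        free(v);
        return 0;
    }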


Mapping to uops is a trivial translation that hardly counts as compilation. Everything else is dynamic scheduling and speculation, which is also not compilation as it is (mostly) data dependent.


> Mapping to uops is a trivial translation that hardly counts as compilation.

That's fair, but now we're just arguing the semantics of what is and isn't compilation :) I understand your criticism though, it's just a 'translator'.


> Following the metrics of the article

The metrics of the article may describe a useful distinction, but it's not really the one the language levels terminology was designed to capture, though it is not too distantly related.


> Assembly, actual machine code

Actually, the author is effectively arguing that x86 assembly is not low-level. Which is somewhat true, but none of the levels under x86 assembly are exposed to the programmer, for the most part.


Microcode. Back when C became a thing, microcode only existed on 'big iron'. That changed 20+ years ago.

The microarchitecture is the 'real' architecture you're running on, the ISA that assembly language and C code is written against is a facade. It has value in that we don't need to rewrite everything every few years when the microarchitecture changes, but the downside is what we consider low level programming languages talking 'directly' to the hardware are now going through another layer of abstraction.


A matter of perspective.

”In a low-level language like C...” - applications programmer

”In a high-level language like C...” - chip designer


Only marginally lower, but a common example is Ada.

You can actually describe hardware registers sanely and portably in Ada. You cannot do that in C.

(It obviously still works, because C is ubiquitous, and so processor and compiler vendors try their hardest to "make it work", but that's no accomplishment of C.)
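For contrast, the usual C idiom looks something like this (a hypothetical UART at a made-up address; the layout, address and bit mask are purely illustrative):

    #include <stdint.h>

    /* C gives no portable way to pin a structure to a bus address or a field
       to exact bit positions; "volatile" plus a hard-coded cast is convention
       backed by implementation-defined behaviour, not by the standard. */
    typedef struct {
        volatile uint32_t data;     /* offset 0x00 */
        volatile uint32_t status;   /* offset 0x04 */
        volatile uint32_t control;  /* offset 0x08 */
    } uart_regs;

    #define UART0 ((uart_regs *)0x40001000u)

    static void uart_putc(char c) {
        while ((UART0->status & 0x01u) == 0)   /* spin on a TX-ready bit */
            ;
        UART0->data = (uint32_t)c;
    }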


Good previous discussion with 316 comments: https://news.ycombinator.com/item?id=16967675


The talk of out-of-order execution and caching doesn't make much sense. You could write in machine code and still have no idea how long a memory access will take or in what order your instructions will execute, so by this article's logic machine code is a high-level language. Maybe "sequential instructions with flat memory model" is a high-level abstraction of what a modern machine does, but it is the only abstraction we have. The article proves not that C is high-level but that modern CPUs offer only a high-level interface.

(And you can get around some of this, too. You can issue prefetches and so forth.)
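For instance, gcc and clang expose a prefetch hint (whether it helps, or is even emitted, depends on the target and the access pattern; the distance of 16 here is a guess, not a tuned value):

    #include <stddef.h>

    /* Manually hint the prefetcher a few iterations ahead. */
    double sum_with_prefetch(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++) {
            if (i + 16 < n)
                __builtin_prefetch(&a[i + 16], 0 /* read */, 1 /* low temporal locality */);
            s += a[i];
        }
        return s;
    }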


But would this prevent the next Spectre? Not necessarily. Hardware vendors would continue to optimize execution of machine code in unexpected ways, including ways that allow for side channel attacks. They wouldn't be optimizing for C code, but they would still be optimizing for some portable low-level language.

Unless we made that low-level language constantly change as hardware microarchitecture improves (thus giving up on portability), I think we'd be back with the same problem of a mismatch between low-level languages and underlying CPU architecture.


TFA doesn't even propose a non-serial programming paradigm that programmers can easily learn. I suppose array languages, maybe? It's hard to imagine a CPU architecture where there's no branches, or minimal branches, or a language that greatly minimizes them yet can be used to implement the sorts of software we're fond of. A lot of what we do with computers is highly parallel, but also highly serial with lots of logic -- is that the fault of C, or is that natural and C is just a scapegoat? My money is on the latter.


I program in ARM assembler, C, Rust, Java and Clojure on a daily basis, and for me C is definitely a low level language.

For me a "level" up is something that helps me structure my program in a fundamentally better way. Java has VM and garbage collector, Lisps have macros and REPL. For me these are fundamental enablers to create different types of flows that would not be at all practical in assembly or C.

The difference between assembly and C is just that you need a couple of instructions in assembly to get the equivalent of a line of code in C, but the fundamental program structure is the same.

The automation is nice but other than reduced overhead the same problems that are difficult in Assembly are still difficult in C and for me this means they are roughly on a same level.


> The difference between assembly and C is just that you need a couple of instructions in assembly to get the equivalent of a line of code in C, but the fundamental program structure is the same.

This is absolutely not true, unless you turn off all optimizations. If you take that route you find that your C code is slower (sometimes orders of magnitude slower) than code written in other languages.

C is much less interesting if you insist that "a couple of instructions in assembly" == "a line of C code". It is also much less interesting without undefined behavior (which is what allows many of the optimizations that make C "fast" and thus "closer to the metal" in many people's minds).

I don't fully agree with the article, but I certainly think it is long past time for a new systems programming language that is safe-by-default. There is plenty of proof that we can define away most types of "undefined behavior", get many kinds of memory safety, while still providing escape hatches for situations where it really matters. Unfortunately we are still firmly in the "denial" phase with half of our industry arguing that C is just fine and dandy.


Compiler optimizations are something that people were doing by hand in assembly long before they were automated in C compilers.

You will also notice that compiler optimizations have not much to do with a language being lower or higher level. You go for a higher level language because you have performance to spare and you would instead want to write your large application with less effort.


The level up that C represents is abstract flow-control structures and abstraction over the processor's instructions.


I call these automations because they automate what you could otherwise do by hand, like in the old days.

An experienced C and assembly developer can take almost any piece of assembly code and easily write C equivalent of it and vice versa, take any piece of C code and write equivalent assembly.


> An experienced C and assembly developer can take almost any piece of assembly code and easily write C equivalent of it...

If the assembly was originally compiled from C, the new "C equivalent" can have radically different flow than the original had.

Modern compilers can do rather crazy and surprising transformations to your code.

They might omit large chunks or even turn your loop structure "inside out" to enable vectorization optimizations.

They can pattern match your algorithm and replace it with something faster. See: https://lemire.me/blog/2016/05/23/the-surprising-cleverness-...
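The linked post is about exactly this kind of thing; as a sketch, a loop like the following is known to be recognized by clang's loop-idiom pass and replaced with a single popcount instruction on targets that have one:

    /* Kernighan-style bit count: the "obvious" loop may never execute as
       written; the compiler can substitute a popcnt instruction. */
    int count_bits(unsigned long long x) {
        int c = 0;
        while (x) {
            x &= x - 1;   /* clear the lowest set bit */
            c++;
        }
        return c;
    }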


I can do the same with e.g. Turbo/Free Pascal, yet no one claims it is a high level assembler.


The point is that C has no concept of caches. Yet efficient cache usage is crucial to writing performant low level code.

So C programmers have to engage in cargo-culted patterns to try to trigger the correct behaviour from their optimising compiler.
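A classic example of what the language doesn't express (same arithmetic, very different cache behaviour; nothing in C marks one version as the fast one):

    #define N 1024

    /* Row-major traversal touches memory contiguously (one cache line serves
       several iterations); column-major traversal strides by N doubles and
       can miss the cache on nearly every access. */
    double sum_rows(double m[N][N]) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];       /* unit stride */
        return s;
    }

    double sum_cols(double m[N][N]) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];       /* stride of N * sizeof(double) bytes */
        return s;
    }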


The instruction set architecture has no concept of it either, nor of instruction level parallelism, branch prediction or speculative execution. There is no way of getting this right other than knowing how the processor is implemented under the hood and issuing the right assembly based on that. Often experimentation is needed to get it right.

To fix this, the instruction set would need to be changed, yet it also needs to be kept stable to stay compatible with earlier versions. Thus the instruction set remains as it is, with the exception that new instructions may be added.


I'm always amazed that multi-thread code works as well as it does. I can have some data in multiple caches being used by multiple threads on separate CPUs and as long as I get the memory barriers right, it will work. Combine that with predictive branching and out-of-order execution and it feels even more magical.
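A minimal sketch of the "get the memory barriers right" part, using C11 atomics (compile with -pthread; the release store pairs with the acquire load, so the write to payload is visible to the reader regardless of what the caches and out-of-order machinery do underneath):

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    static int payload;
    static atomic_int ready;

    static void *producer(void *arg) {
        (void)arg;
        payload = 42;                                        /* plain write */
        atomic_store_explicit(&ready, 1, memory_order_release);
        return NULL;
    }

    static void *consumer(void *arg) {
        (void)arg;
        while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
            ;                                                /* spin until published */
        printf("%d\n", payload);                             /* guaranteed to print 42 */
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }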


Yeah, multi-threading on a modern processor basically seems like witchcraft to me.


You need to define low-level language carefully. As the article says, a language like assembly is indeed close to the metal, however each assembly language is too close to some particular metal to work on other metal. On the other hand, C is probably as close to the metal as you can get and still run on a z80, z8000, m68k, i386, i286, arm, etc.

We would all love to C(!) a better common low-level language that was common across architectures but there is not one that I know of. Anyone?

There is indeed a problem associated with Spectre, Meltdown, etc., but associating that with the C language seems like a misdirection.


This paper revealed to me some of the things that modern designs have adopted, which the regular uninformed coder would never notice (and may not need to, most of the time). But recently I have been looking into HFT computers and I see that these things are running C code (a friend works at a small startup and said they use C programs for most of the orders) on regular computers, most even using off the shelf hardware (Intel's 9900KS and 9990XE are hot targets, and AnandTech and ServeTheHome have shown off some of the hardware). HFTs are highly sensitive to optimizations, and it is my understanding that the lower they go to the hardware, the better the returns, given how competitive it can get.

With so much in the middle from a high level C program to low level instructions on a CPU, I wonder if we will see companies like JP Morgan and Morgan Stanley (big ones with money and time to invest) enter the chip business heavily. This could then bring back some of those optimizations to the consumer space and startups in the area of fast and efficient C code then might get into trouble. As of now this area seems to be open to compete.


Sure, a new processor that is designed to be optimized for a different threading and memory model might be better in a lot of ways, but backwards compatibility is important. We can't simply throw away all of the existing software written in C, and in most cases a regression in performance for C programs would be unacceptable. Sure, such a processor might find niche use cases, where it doesn't need to run any legacy software (including the OS) and can take advantage of massive parallelism, but I don't see it being able to replace the prevalent abstract computer model used in CPUs today.


Is assembly a low level language? It presumably benefits from instruction parallelism, branch prediction, caching too.


Not sure, but someone basically told me nasm is not an assembler because it has optimization options


There are no objectively defined tiers for what level each programming language is on; when people describe a language as low-level they are speaking relatively. They're saying 'picture a typical programming language, I'm talking about something like that except more low-level'.

Almost all people would consider C to be a low-level language compared to most languages they're familiar with. If you work in assembly all day then maybe you don't think C is a low level language, but those people aren't the ones calling it low-level.

Unless someone wants to define a cutoff for what a 'low-level' language is (and that would be a bad idea, the way it's used now is very useful and working as intended - it'd be better to make a new word) then I think the way the term is typically used is perfectly fine.

I could see an argument for saying C is 'less low-level than people typically assume', but saying it's not low level makes me think it's on the same level as Java or something.

I'm probably just nitpicking though. I always feel like people ignore the idea that the point of language is to communicate what you're thinking to someone else and have them interpret what you're trying to say as closely as possible, and language is already pretty good at evolving in a way that optimizes this. Trying to change that, or forcing people to address technicalities that are outside the scope of what they're trying to communicate, only complicates it (ignoring obvious exceptions, like if you're writing a scientific paper and saying things that are factually wrong for the sake of clarity).


There is no surer route to being blasted on HN than to suggest there is anything about C that makes it slow, or distant from the actual machine.

Certainly, C is close to assembly language, but assembly language is itself a compiled and heavily rewritten and optimized language, nowadays.

The actual machines we have today are so complex that we are not smart enough to program them in actual machine language or anything close to it, but people insist they want a machine language that looks primitive, and close to C. So the manufacturers give us that, and then compile the hell out of it, in hardware, stoically trying to extract performant instruction streams from the vague hints we provide via the instructions we are willing to give them -- instructions that resemble C.

We are stuck in a deadly embrace: any new language must perform well on a machine designed to emulate the C abstract machine, and any new processor has to emulate that abstract machine.

A new language designed to direct the operations of a wholly different design, no matter how capable, has no chance to succeed.

The only way forward may be for a language to be designed to program FPGAs directly, and bypass the whole C-industrial complex. Unfortunately, FPGAs are still mired in medieval-guild-style secrecy, so there is no more access to their internals than to mainstream C engines. The access offered is via Verilog or VHDL, which resembles C. It is not clear to what degree the guts of FPGAs are compromised by this orientation.

If somebody ever musters the courage to publish a fully exposed FPGA that can be programmed directly, and can be latticed by the tens or hundreds for increased power, then it will become possible to create a wholly new language that need not be efficiently translatable for execution on machines that resemble all the existing C machines. I won't be holding my breath.


> The root cause of the Spectre and Meltdown vulnerabilities was that processor architects were trying to build not just fast processors, but fast processors that expose the same abstract machine as a PDP-11.

What would an abstract machine that better matched current processors look like?

Edit: Probably should've read the rest of the article first


This paper might be of interest: https://arxiv.org/pdf/1902.05178.pdf

it explains the 'concept' of these kinds of attacks and why it's more or less impossible to make an abstraction which does not suffer from such flaws in the presence of high precision timers, which are in turn needed for high precision applications (not sure which exactly; I guess realtime applications or things which need to measure very precisely, maybe audio/video?)

the paper goes a bit further and IMO is a bit simpler to read than the original Spectre/Meltdown papers.


OoO execution, virtual memory, SIMD, virtualization, microcoding, caches, and other features that are claimed to exist to propagate the illusion of a PDP-11 all predate or are contemporary with the creation of C.

They have been invented because they objectively make computers faster or easier to use.


Don't GPUs also do prefetching and use internal texture, tile, and rasterization caches? Don't some GPU drivers attempt shader optimization to maximize ILP? Didn't some GPUs have, or used to have, multiple FMAD and special ALU pipes?

I'm a little iffy at the idea that we should consider GPU models safe, as GPUs for most of their history were single user, and there hasn't been a lot of time to attack virtualized containers sharing GPUs.


> Low-level languages are "close to the metal," whereas high-level languages are closer to how humans think.

Assembly instructions, or the numbers they map to, aren't that hard to think about, and I prefer them to object hierarchies, for example.


Will RISC-V have any effect on this? Is there a way to use RISC-V that solves at least some of these problems? It seems like it should be a priority to solve this, no?


I wish my computer had 1024 fast PDP-11 cores in it.


That is an interesting idea. I wonder if a small processor core could be stamped out on an FPGA so that hundreds of small processors can run simultaneously.

Then I wish my c++ objects each ran in a separate (virtual) processor instance. They could have event signalling built into the language as well as the hardware.

It might force people to partition their design for more fine grained parallelism. C++ objects use function calls for interfacing with objects. Using object methods for event handling is just a ridiculous hack. Each object should have its own memory space.


> I wonder if a small processor core could be stamped out on an fpga and hundreds of small processors can run simultaneously

I think it's very likely that this idea is well explored. CellBE, Tilera, Xeon Phi/MIC, GPUs, etc.

> Then I wish my c++ objects each ran in a separate (virtual) processor instance. They could have event signalling built into the language as well as the hardware.

You're right to identify that the challenge for this kind of design is to come up with a programming model and an associated IPC or I/O or memory tier/caching mechanism. The HPC space is a graveyard of accelerator concepts that never reached critical mass. GPUs have been the rare success. They can amortize their business across multiple industries. They're already built in to a lot of PCs, so they're easy to experiment with. OpenGL and later CUDA/OpenCL created an abstraction that was fast and somewhat portable (not as much in CUDA's case). The abstraction relieved you of the burden of having to know much about the device's internal design while still being quite fast.

> Each object should have its own memory space.

I don't think I know what advantage this provides. Can you share more? What do you do about composition? Nested memory spaces? Sounds challenging and potentially high overhead.


Yes, the examples you gave all have multiple processing elements, but they are vector processors. I was talking about simple and cheap scalar processors.

GPUs rely on symmetry to simplify the hardware design. Multiple (like 64) processing elements share the same instruction decoder. They have to access adjacent registers. So they become vector processors.

CPUs devote a huge chip area to caches and instruction pipelines. GPUs took out much of that area and complexity and replaced it with raw floating point computing power. For certain applications this has proven to be a good trade off.

What I described was a similar trade off: replacing a few heavily pipelined processors with massive amounts of cache memory by many smaller, cheaper cores. I wonder if it might prove to be the optimal micro-architecture for certain applications, given certain languages and certain data patterns.


I'm not sure how many of these[0] would fit in the largest FPGA you could easily buy.

[0] https://wfjm.github.io/home/w11/


It likely has. GPU.


Porting e.g. the original Quake to that would be quite an undertaking.


If C is not low-level anymore and not fast without a ton of optimizations, then let's see a truly low-level language that's as fast or faster. Let's see Erlang or whatever consistently beat C. That's supposed to be easier for a language that maps better to the modern hardware, right? So where is it?


Nowhere, because modern hardware is only developed to be accessed in a C-like way.

Of course that cripples it, but that is what the market asks for, so that is the only kind produced.

Or you could look at HPC. Ever wondered why they use Fortran?


<joke>Why has no one mention Go yet?</joke>



