C and C++ prioritize performance over correctness (swtch.com)
265 points by bumbledraven on Aug 18, 2023 | 543 comments



> There are undeniably power users for whom every last bit of performance translates to very large sums of money, and I don’t claim to know how to satisfy them otherwise.

That is the key, right there.

In the 1970s, C may have been considered a general-purpose programming language. Today, given the landscape of languages currently available, C and C++ have a much more niche role. They are appropriate for the "power users" described above, who need every last bit of performance, at the cost of more development effort.

When I'm working in C, I'm frequently watching the assembly language output closely, making sure that I'm getting the optimizations I expect. I frequently find missed optimization bugs in compilers. In these scenarios, undefined behavior is a tool that can actually help achieve my goal. The question I'm always asking myself is: what do I have to write in C to get the assembly language output I expect? Here is an example of such a journey: https://blog.reverberate.org/2021/04/21/musttail-efficient-i...

I created the https://github.com/protocolbuffers/upb project a long time ago. It's written in C, and over the years it has gotten to a state where the speed and code size are pretty compelling. Both speed and code size are very important to the use cases where it is being used. It's a relatively small code base also. I think focused, performance-oriented kernels are the area where C makes the most sense.


> In the 1970s, C may have been considered a general-purpose programming language. Today, given the landscape of languages currently available, C and C++ have a much more niche role. They are appropriate for the "power users" described above, who need every last bit of performance, at the cost of more development effort.

I really don't think this is true. I've worked in CAD and video games and embedded software, and in all those you're likely using C++ (not to mention a lot of desktop software that can't afford to embed a chromium instance for electron.) For some reason people here just assume that anything that isn't web development or backend server CRUD stuff is a niche.

As much attention as something like Rust or whatnot gets on hacker news, the reality is that if you can't afford garbage collector pauses, and you need performance, you're using C/C++ most of the time.


Rust is not a GC language. It can and does achieve the same perf as modern C++ written with the latest flavor of the C++ safety guardrails.

The set of reasons to start a new project in C/C++ is small, and the list is shrinking by the day.


I love Rust, but as a daily C++ user (embedded microcontrollers, not supported by Rust yet, and then native Linux tooling to interface with them) what I find most frustrating about Rust is the number of crates that need to be knit together to make working software. Often these crates are from a single dev who last committed in 2020 or something. I really wish there was something like go’s standard library in Rust. That has been my barrier to adoption, and believe me I WANT to use it at work and I don’t want to juggle chainsaws anymore.


I'm sympathetic. I'm partway into an embedded project on an stm32 / m0 for which there _is_ good rust support through Embassy and it's utterly magical. And at the same time, trying to do something for which there isn't a good no-std crate is, er, anti-magical to the point of forcing me to change the design. The ecosystem isn't as comprehensive yet.

But when it works, wow. This has been the most fun embedded project I've ever done with the least debugging.


I have replaced my use of C (and C++) in embedded with Rust for the last couple of years. Sure, some parts are still missing or immature, but overall also very promising. And I do enjoy it so much more than the MCU manufacturer's please-insert-your-c-code-between-these-comments-and-pray kind of development.


Are you doing this on personal projects or commercial projects? I'm pretty happy with c++17 on all my embedded commercial projects


Strictly commercial. But if one is happy with another language, I guess there is little reason to switch.


Er how is that different from C++? It doesn't have a go-like std library either, and you can totally use Rust without crates.io in the same manner.


I suppose that wasn't clear. My mistake. I use C/C++ for microcontroller code every day, because Rust doesn't support my microcontroller, doesn't have a real RTOS, and I'm making real hardware. Something that needs to run in the field for 10+ years, where I can't get out there and ensure it's updated, and my company doesn't want to invest millions of dollars in beta testing embedded Rust. So embedded is out for now, but I'm looking forward to when it's supported fully and out of beta. Most embedded controllers come with some RTOS and libraries they support, written in C.

For tooling, I can context switch out of C++ which does have boost, into [rust, go, python… etc] and deal with that switch, or just be lazy and write it in C++. I've tried to write three tools in Rust so far, and the pain of not having a good stdlib, of essentially searching the internet for a blog post that solves my issue, then finding the blog post was written by one dev 4 years ago to pump up their library, which was never supported after that… it's a bit exhausting.

Again, before y'all attack: this is from the perspective of a willing, excited customer. I want to use Rust at work and advocate for it. Just saying, it's not easy in the state it's in.


Which microcontroller are you using? Rust support for embedded targets is slowly improving, so there might be a beta build for your chip.


ESP-32, the Rust support is not mature enough to ship commercially.

C-410 from Qualcomm, which is a whole different world, I can’t even look at the source code.


In C++, you'll often see projects just use boost, which is a big monolith of useful libraries.


Embedded microcontrollers are not the place where boost is used.


Counterpoint: I use Boost on embedded microcontrollers. (Albeit only header-only components of Boost.)


The people behind the ESP32 are writing a Rust compiler... I have played with it only a little, but it worked

Edit:spelling


Not a Rust compiler; they provide crates to interface with their C code in ESP-IDF, which is essentially a FreeRTOS port that supports the dual-core Espressif chips, which vanilla FreeRTOS does not. Also some of their own libraries for things like MQTT, which I found unfortunately subpar in comparison to the vanilla FreeRTOS code.

It’s all beta software, but here is what they list in the docs:

Services like Wi-Fi, HTTP client/server, MQTT, OTA updates, logging etc. are exposed via Espressif's open source IoT Development Framework, ESP-IDF. It is mostly written in C and as such is exposed to Rust in the canonical split crate style:

- a sys crate to provide the actual unsafe bindings (esp-idf-sys)
- a higher level crate offering safe and comfortable Rust abstractions (esp-idf-svc)

The final piece of the puzzle is low-level hardware access, which is again provided in a split fashion:

- esp-idf-hal implements the hardware-independent embedded-hal traits like analog/digital conversion, digital I/O pins, or SPI communication - as the name suggests, it also uses ESP-IDF as a foundation
- if direct register manipulation is required, esp32c3 provides the peripheral access crate generated by svd2rust

More information is available in the ecosystem chapter of The Rust on ESP Book.


A full Rust compiler instead of using the "de facto" one? On a language as complicated as Rust, that sounds scary unless they have a lot of financial backing and a significant number of compiler devs.


Why would they not just adapt the code generator for the reference compiler? Rust is a moving target, committing to a new compiler at this point is a massive investment.


However, Rust doesn't have a "I know what I'm doing, let me be" switch. No, "unsafe" is not that.

I have a couple of languages under my belt; C, C++, Go, Python, and Java are the primary ones. C and C++ are reserved for "power use". By power use, I mean scientific/HPC stuff or any code where I need the full capacity of the processor.

To get that, sometimes you need -ffast-math, sometimes said undefined behavior, or code patterns which translate to better code for the processor you target.

Rust doesn't allow any of that, and moreover limits the way I can write my programs, so no thanks.

For security-sensitive compiled code, I can use Rust, but I can't use Rust anywhere and everywhere. It also has its niche, and it can't and won't replace C or C++ at the end of the day. It'll get its slice (which might be sizeable), a couple of compiler implementations, and will evolve to become another mainstream, boring language, which is good.

Also, if you give half an effort, C++ is pretty secure and robust to begin with, esp. in the pointers department. Just be mindful, and run a couple of valgrind tests on your code, and you're set. Been there, done that.


> To get that, sometimes you need -ffast-math, sometimes said undefined behavior, or code patterns which translate to better code for the processor you target.

There isn't really any fundamental limitation of Rust for those situations. A better reason why C++ is better than Rust for compute is the support for OpenACC, OpenCL, OpenMP, CUDA, ROCm/HIP, etc.

> Also, if you give half an effort, C++ is pretty secure and robust to begin with, esp. in the pointers department.

The problem is that even if a "Safe" C++ could be defined, it would need to interoperate with the rest of the "Unsafe" C++ world. Perhaps a programmer can write absolutely safe C++ themselves but all bets are off once you're working with other programmers.

> Just be mindful, and run a couple of valgrind tests on your code, and you're set. Been there, done that.

Seriously? A "couple of valgrind tests" would be insufficient for catching all but the most trivial and/or blatant memory safety issues in C++, especially when dealing with malicious users. Although I agree that dynamic analysis like valgrind and sanitizers should be the default for C++ development, industrial grade static analysis tools are at least as important. And needless to say, if you find Rust complains about code too much then just wait for what static analysis tooling will complain about.


That kind of integration is another reason, yes, but this is a passable moat, so I'm not talking about it much (mind you, I live in HPC world).

If programming is a team sport, every team player should be up to the level your project needs. If they are not, they should raise their bars. Seriously.

Yes, seriously, be mindful. Implement allocation and destruction first, fill your code between these two steps.

A couple of Valgrind tests is actually a whole procedure. It was an unfortunate downplay by me [0], probably because I'm so used to Valgrind that it's second nature.

[0]: https://news.ycombinator.com/item?id=31218757


So what about multithreaded code with potential race conditions that sometimes result in memory errors? Valgrind is already only able to test the code paths that actually ran, let alone find race-condition-y bugs.

Also, this elitist notion of raising the bar is an instant "turnoff" for me; with all due respect, I can only assume from that that the given person may not be as much of an expert as they think. Not saying that it applies to you as well, but you don't leave me much other choice.


Memory safe - I will give you that. All the guarantees you listed are fine for synchronous Rust. But race conditions and deadlocks are not unknown in async Rust. Lots of race condition issues in tokio - including issues where futures leak.


That’s true, and usually I am the one pointing this out, that Rust is only data race free :D thanks for the pointer!


> Rust doesn't have a "I know what I'm doing, let me be" switch.

Every piece of code you write is security-sensitive. IMO It is our professional responsibility to our users that we treat it that way.

For a vast majority of situations, even if you know what you are doing, you are human and therefore will make mistakes.

If the (probably) most heavily invested C++ codebase, Chrome, has memory use-after-free/overflow bugs, there is no way that truly safe C++ can be written at any product-scale.


> Every piece of code you write is security-sensitive.

No, not every application is CRUD, not every application is interactive, and no, not every application serves users and communicates with the outside world in a way that allows it to be abused during its lifetime.

This doesn't mean it gives me the freedom to write reckless code all over, but the security model of an authentication module and a mathematical simulation is vastly different.

> there is no way that truly safe C++ can be written at any product-scale.

I agree to disagree here, because of my own experience.


Every other job in the world has a concept of liability; that will eventually arrive in the computing world at scale.


The assumptions you make are almost palpable at this point.

Simulation precision and accuracy already carry tons of liability.

Talking like I or we don’t care about consequences is disturbing, esp. with the experience you have.


Well, your responses look like you don't care, as if you had any clue about my experience as well, unless the sightseeing from that high-altitude cloud is giving you mirages.

Any cybersecurity law that requires citizens' votes for adoption will certainly get my signature.


If running static analyzers, testing for memory leaks, running end to end tests which takes days, adding strong bounds checking, and failing fast and with great noise as soon as something does not look right is not caring, you’re right. I don’t care.

All this is done for code which has no network access, does no kernel-level work, and doesn't even accept any input after parsing the input model and making sure that it's well formed with no loose ends.

Imagine what I’d do if it does any kind of authentication or user affecting work.


Call me surprised. OK, fair enough.


During my PhD I wrote a lot of simulation code in Matlab. This code was only ever run by me on my personal machine for the purpose of generating plots for my thesis. How is that security-sensitive?

We all agree that security is important and a lot of code running in security-sensitive contexts is bad. That's very different from saying that all code is security-sensitive.


Matlab is memory-safe, though, which is the kind of safety issue we're talking about.


> Matlab is memory-safe, though

Who sold you that notion?

Matlab is proprietary glue holding together modules written in Fortran, C, C++, and Java that allows the ad hoc inclusion of DLLs put together by graduate students on the fly.

In what universe is any of that "memory safe"?


AFAIK it used to be written/interpreted in Java (though it uses a JIT compiler nowadays, doesn’t it?), and in that sense the language itself is memory safe. Of course if you link to a dependency written in an unsafe language you can have memory issues, but that is not what people commonly mean by “a memory safe language”.


> Of course if you link to a dependency written in an unsafe language

Like the matrix handling, plotting, and other modules dragged in from the 1990s standard FORTRAN library era.

Don't get me wrong, I have placed a lot of trust in those kinds of numeric libraries since the 1980s - NIST and other national labs code and review to a high standard - but they weren't then and aren't now "memory safe" just very probably and mostly bug free.

Unless they were, say, rewritten in Java .. and that's a whole other big ball of issues .. translated code.

I'm happy to play along and agree that the JIT compiled 'Matlab language' parts of Matlab are memory safe - but to the best of my experience with past versions that's just the glue portions - the reason so many people use Matlab is for the chunks that get glued together.


Aren’t many of these Fortran libraries heavily used all throughout almost any kind of application, somewhere deep down their dependency tree?


Absolutely .. and generally written to a very high standard with a lot of checking, rechecking and running forwards and backwards across multiple architectures with extensive technical mailing lists.

They can still have bugs, memory issues, be called with parameters that cause resource blow outs and aren't "memory safe" in the sense that modern managed languages are.

Matlab is exactly as reliable as well written Fortran and C code can be.


> To get that, sometimes you need -ffast-math, sometimes said undefined behavior, or code patterns which translate to better code for the processor you target.

> Rust doesn't allow any of that, moreover limits me the way I can write my programs, no thanks.

I'm willing to prove you wrong here. Can you give some concrete examples?

For -ffast-math you can most certainly enable it, but AFAIK for now only in a localized fashion through intrinsics, e.g.:

https://doc.rust-lang.org/std/intrinsics/fn.fadd_fast.html

https://doc.rust-lang.org/std/intrinsics/fn.fmul_fast.html

So instead of doing this the dumb way and enabling -ffast-math for the whole program (which most certainly won't need it) you can profile where you're spending the majority of your time, and only do this where it'll matter, and without the possibility of randomly breaking the rest of your numeric code.

Personally I find this to be a vastly better approach.
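
For illustration, a minimal sketch of that localized opt-in (the dot-product function is made up for the example; this needs a nightly toolchain, since the intrinsics linked above are unstable):

    // Nightly-only sketch: fast-math semantics confined to one hot loop,
    // while the rest of the program keeps strict IEEE behavior.
    #![feature(core_intrinsics)]
    use std::intrinsics::{fadd_fast, fmul_fast};

    /// Safety: the caller must guarantee the inputs contain no NaNs or
    /// infinities, which is why these intrinsics are `unsafe`.
    unsafe fn dot_fast(a: &[f64], b: &[f64]) -> f64 {
        let mut acc = 0.0;
        for (&x, &y) in a.iter().zip(b) {
            acc = fadd_fast(acc, fmul_fast(x, y));
        }
        acc
    }

    fn main() {
        let a = [1.0, 2.0, 3.0];
        let b = [4.0, 5.0, 6.0];
        println!("{}", unsafe { dot_fast(&a, &b) });
    }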


Thanks for the information, I'll look into that. There's another comment talking about an algorithm I have implemented, it should be around in this thread.

> So instead of doing this the dumb way and enabling -ffast-math for the whole program (which most certainly won't need it)...

When you write HPC enabled simulation code, you certainly need it, and this is where I use it.


Is HPC code magically immune to values containing infs or NaNs?


In what sense does rust not expose the full power of the processor to the user? Can you give a concrete example?

> Just be mindful, and run a couple of valgrind tests on your code, and you're set.

Thousands of severe CVEs every year attest to the effectiveness of that mindset. "Just git gud" is not a meaningful thing to say, as even experienced devs routinely make exploitable (and other) mistakes.


Rust probably won’t allow me to implement a couple of lockless parallel algorithms I have implemented in my Ph.D., which are working on small matrices (3K by 3K). These algorithms allow any number of CPUs do the required processing without waiting for each other, and with no possibility of stepping on each other, by design.

In the tests, it became evident that the speed of the code is limited by the memory bandwidth of the system, yet for the processors I work with that limit is also very near the practical performance limit of the FPUs. I could have squeezed a bit more performance by reordering the matrices to abuse the prefetcher better, but it was good enough (~30x faster than the naive implementation, with better accuracy) and I had no time.

Well, the method by which I verify the said codebase is here [0]. Also, if the BSD guys can write secure code in C, everybody can. I think their buffer overflow error count is still <10 after all these years.

[0]: https://news.ycombinator.com/item?id=31218757


> Rust probably won’t allow me to implement a couple of lockless parallel algorithms I have implemented in my Ph.D., which are working on small matrices (3K by 3K)

That’s a bold claim to make without evidence. Things like that totally seem doable in Rust without a lot of ceremony. You might have to launder some lifetimes through unsafe if you can’t convince the compiler statically, but that’s not that hard.

> Also, if BSD guys can write secure code with C, everybody can. I think their buffer overflow error count is still <10 after all these years.

As was the problem with Linux enthusiasts claiming Linux was safer because of fewer CVEs, BSD similarly benefits from having almost no attention paid to it plus very few core developers, so the feature growth rate is minimal. I don't buy the claim that BSD's qualities extrapolate anywhere rather than a more mundane explanation, or that BSD is somehow the exception that proves the rule, given that no other successful project seems to be able to do the same.


I would happily give it a try, when I have some time on the side. But that day is not today, unfortunately.

> I don't buy the claim that BSD's qualities extrapolate anywhere rather than a more mundane explanation, or that BSD is somehow the exception that proves the rule, given that no other successful project seems to be able to do the same.

It's mostly about the "it's done when it's done" mentality and always putting discipline before writing code fast. It's a vastly different mentality when compared to modern, conventional software development. This mentality is also what makes Debian Debian.

On the other end of the spectrum, we have the stressed solo developer syndrome, which brings us OpenSSL with all its problems and grievances.


> Rust probably won’t allow me to implement a couple of lockless parallel algorithms I have implemented in my Ph.D.

Why? Can you be more specific? Rust has built-in support for atomics and essentially the same memory model as C++, e.g.:

https://doc.rust-lang.org/std/sync/atomic/struct.AtomicU32.h...

And you can sidestep essentially any requirements of normal Rust using an `UnsafeCell` (e.g. that's how mutexes are implemented, with an `UnsafeCell` holding the data and an atomic).

I've written my share of lockless algorithms and I haven't really noticed any deficiency compared to C++, but then, I'm not an expert in this area, so if there is a deficiency here I'd love to hear about it.
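
To make that concrete, here's a toy example (not the parent's algorithm, just the pattern described above): a spinlock built from an AtomicBool plus an UnsafeCell, which is roughly how real mutexes are put together.

    use std::cell::UnsafeCell;
    use std::hint::spin_loop;
    use std::sync::atomic::{AtomicBool, Ordering};

    // An atomic flag guarding data kept in an UnsafeCell: the building blocks
    // mentioned above, assembled into the simplest possible lock.
    pub struct SpinLock<T> {
        locked: AtomicBool,
        data: UnsafeCell<T>,
    }

    // Safety: access to `data` is serialized by the `locked` flag below.
    unsafe impl<T: Send> Sync for SpinLock<T> {}

    impl<T> SpinLock<T> {
        pub const fn new(value: T) -> Self {
            Self { locked: AtomicBool::new(false), data: UnsafeCell::new(value) }
        }

        pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
            // Acquire: spin until we flip the flag from false to true.
            while self
                .locked
                .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_err()
            {
                spin_loop();
            }
            // Safety: we hold the lock, so no other thread can touch `data`.
            let result = f(unsafe { &mut *self.data.get() });
            // Release: publish our writes before letting the next thread in.
            self.locked.store(false, Ordering::Release);
            result
        }
    }

    fn main() {
        let counter = std::sync::Arc::new(SpinLock::new(0u64));
        let handles: Vec<_> = (0..4)
            .map(|_| {
                let c = counter.clone();
                std::thread::spawn(move || {
                    for _ in 0..1_000 {
                        c.with(|n| *n += 1);
                    }
                })
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        assert_eq!(counter.with(|n| *n), 4_000);
    }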


A lot of people who criticize C++ seem to still think it's 2008, before the crazy amount of compiler checks/linters/standard practices that have appeared since C++11 was released.

I like rust and I dick around with it a bit on some small REST projects I have for home automation but for day-to-day work I stick with c++ until rust support picks up in the industry a lot more than it currently is.


Because many of us only know C++ best practices from conference slides, and our own side projects, while we cry in despair when looking at typical corporate code bases.

Why do you think Bjarne Stroustrup's main subject at conferences, for several years now, has been how to write C++ properly without C-style coding?


What is it that Rust doesn't allow? There really are only very few hard limits I know of, e.g. tail call elimination.

If anything, it has quite good inline asm support.


I phrased it poorly, I didn't mean to imply Rust is GC'd. I mean that it's still niche. C++ very much is not niche, which is my point. The other non-niche languages generally are GC'd (java, C#, javascript, etc.)

> The set of reasons to start a new project in C/C++ are few and the list is shrinking by the day.

That'd be true if every project started from scratch with only the language standard library. And yet..... almost any project you start is going to be dependent on a large chunk of code you didn't write, even on greenfield projects.


> That'd be true if every project started from scratch with only the language standard library. And yet..... almost any project you start is going to be dependent on a large chunk of code you didn't write, even on greenfield projects.

I think this is true, and I'd refine my original statement accordingly. My original comment was thinking more from first principles, not as much about pragmatic considerations of ecosystem support.

If we were to disregard the momentum of existing ecosystems, I think C/C++ would be niche choices today: very important for certain, focused use cases, but not meant for the masses. Taking into account the momentum of existing ecosystems however, they still play a large role in many domains.


New large scale C++ projects are started every single day. There are more than 5 million C++ programmers out there. Compared with that Rust is but a drop in the ocean.


> For some reason people here just assume that anything that isn't web development or backend server CRUD stuff is a niche.

I think that's because anything more "low level" (using the phrase freely) quickly becomes highly specialized, whereas web / CRUD development is a dime a dozen.

Source: Am web / CRUD developer. It's a weird one, on the one side I've been building yet another numeric form field with a range validation in the past week, but on the other I can claim I've worked in diverse industries; public transit, energy, consumer investment banking, etc. But in the end it just boils down to querying, showing and updating data, what data? Doesn't matter, it's all just strings and ints with some validation rules in the end.

But there's my problem, I don't know enough about specialized software like CAD or video games or embedded software to even have an interest in heading in that direction, let alone actually being able to find a job opening for it, let alone being able to get the job.


For me the key is that I can lay things out in memory exactly the way I want, if necessary to the point where I can fit things entirely in cache when I need the performance, and only break out of the cache when I'm done with the data. This is obviously not always possible, but the longer you can keep it up the faster your code runs, and the gains you can get like that are often quite unexpected. I spend a lot more time with gprof than with a debugger.
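
The idea is language-agnostic; here is a toy sketch (in Rust for brevity, with made-up types) of the classic array-of-structs vs struct-of-arrays split it describes: keep the hot field in its own dense array so a scan only pulls the bytes it actually needs through the cache.

    #![allow(dead_code)] // the cold fields exist only to illustrate layout

    // Array-of-structs: scanning `x` drags every 32-byte Particle through the cache.
    #[derive(Clone)]
    struct Particle {
        x: f64,
        y: f64,
        z: f64,
        mass: f64,
    }

    // Struct-of-arrays: the hot field is contiguous, 8 bytes per scanned element.
    struct Particles {
        x: Vec<f64>,
        y: Vec<f64>,
        z: Vec<f64>,
        mass: Vec<f64>,
    }

    fn sum_x_aos(ps: &[Particle]) -> f64 {
        ps.iter().map(|p| p.x).sum()
    }

    fn sum_x_soa(ps: &Particles) -> f64 {
        ps.x.iter().sum()
    }

    fn main() {
        let aos = vec![Particle { x: 1.0, y: 2.0, z: 3.0, mass: 4.0 }; 8];
        let soa = Particles {
            x: vec![1.0; 8],
            y: vec![2.0; 8],
            z: vec![3.0; 8],
            mass: vec![4.0; 8],
        };
        assert_eq!(sum_x_aos(&aos), sum_x_soa(&soa));
    }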


Perf is a niche, here's another: address space lets you talk to hardware.

VM and OS abstractions have been so successful that you can go a whole career without talking directly to hardware, but remember, at the bottom of the abstraction pile something has to resolve down to voltages on wires. Function calls, method names, and JSON blobs don't do that. So what does? What abstraction bridges the gap between ASCII spells and physical voltages?

Address space. I/O ports exist, but mostly for legacy/standard reasons. Address space is the versatile and important bridge out of a VM and into the wider world. It's no mistake that C and C++ let you touch it, and it's no mistake that other languages lock it away. Those are the correct choices for their respective abstraction levels.


Idk what you're on about, I mmaped a `/dev/uio` from Python this morning. Yeah, I had to add it in a .dts file and rebuild my image, but even slow as shit high level languages like Python let you bang on registers if you really want to.


That worked because you were on an OS that supported it, using a device with simple enough behavior that things like timing or extra/elided writes didn't matter. It's great when that works, but there are very many environments and devices for which that option won't exist.


> That worked because you were on an OS that supported it

I mean... yeah? I'd be surprised if most operating systems didn't have a facility for directly accessing memory mapped devices in some capacity. Even Windows does I think. Have any examples of an OS that doesn't support banging directly on memory?

> using a device with simple enough behavior that things like timing or extra/elided writes didn't matter

Yeah, but that's less of a function of the language like GP was referring to and more a property of using a non-RTOS that doesn't play nice with timing sensitive things from user-space. The language itself is not really the issue there, which was my point.


>I mmaped a `/dev/uio` from Python this morning.

You might want to go and have a look at the implementation of 'mmap' in CPython. Here's a clue: it's in the name 'CPython'.


There's nothing special about C that makes it possible to use mmap. Just because CPython happens to be implemented in C doesn't mean anything in this context. I could do the same thing in Go or Hare, neither of which have any C in them and be able to do the same thing.
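
For instance, a rough Rust sketch of the same trick (the device path, map length, and register offset are made up for illustration, and it assumes the libc crate): open a UIO device, mmap it, and poke a register through a volatile pointer.

    use std::fs::OpenOptions;
    use std::os::unix::io::AsRawFd;

    fn main() -> std::io::Result<()> {
        // Hypothetical UIO device exposed by the kernel; same idea as the
        // Python example upthread.
        let f = OpenOptions::new().read(true).write(true).open("/dev/uio0")?;
        let len = 4096;
        let base = unsafe {
            libc::mmap(
                std::ptr::null_mut(),
                len,
                libc::PROT_READ | libc::PROT_WRITE,
                libc::MAP_SHARED,
                f.as_raw_fd(),
                0,
            )
        };
        assert_ne!(base, libc::MAP_FAILED, "mmap failed");

        unsafe {
            // Volatile access so the compiler can't elide or reorder the
            // register write. Offset 0x10 is invented for the example.
            let reg = (base as *mut u8).add(0x10) as *mut u32;
            reg.write_volatile(0x1);
            println!("reg readback: {:#x}", reg.read_volatile());
            libc::munmap(base, len);
        }
        Ok(())
    }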


Since it's an OS written in C, sure, it does have some helper C wrappers.

It is still “syscall”s at the end of the day, and many programming languages can output assembly just fine.


Perhaps, but C and C++ assume flat address spaces and modern hardware includes many programmable devices with their own device memory, e.g. GPUs. Naturally this discontinuity causes a great deal of pain and many schemes have been developed to bridge this gap such as USM (Unified Shared Memory).

Personally I would like to see a native language which attempts to acknowledge and work with disjoint and/or "far away" address spaces. However the daunting complexity of such a feature would likely exclude it from any portable programming language.


Those disjoint memory addresses can be an absolute pain to deal with; ask anyone who had to spend time dragging performance out of the PS3 despite it being faster on paper. UMA/USM can also bring its own set of issues when you have pathological access patterns that collide with normal system memory utilization.

For what it's worth, UMA/USM wasn't built to bridge a gap but rather to offer greater flexibility in resource utilization for embedded platforms, and that's been moving upstream (along with tiling GPUs) over the years. With UMA you can page in other data on a use-case basis, which is why they were relatively popular in phones: if you don't have a bunch of textures loaded on the GPU you can give that space back to other programs. Although come to think of it, we used to stream audio data from GPU memory on certain consoles that didn't have large discrete system memory (the bus connecting System <-> GPU memory had some pretty harsh restrictions, so you had to limit it to non-bursty, low-throughput data, which audio/music fit well into).


There is a technical report for C (from the C standards committee) called 'Embedded C' which extends C with 'named address space' storage qualifiers. So you can do 'float _Gpu myarray[1<<14];' As far as I know, nobody uses it.


Rust is making advances here (look at “embedded Rust” efforts). I am curious since I haven’t written kernel code since C went sideways: how easy is it to write a driver that has to manipulate registers with arbitrary behavior at arbitrary addresses with modern C compilers and avoid all undefined behavior? I seem to recall Linus has a rant on this.


Thanks for giving an actual example of such optimizations. In my personal experience my C++ (and Rust) code was often outperformed by the JVM's optimizations so I've found it hard to relate to the tradeoffs C++ developers assume are obvious to the rest of us.


I struggle to resonate with what you are saying, as my experience is the opposite. I'm curious where this discrepancy is rooted. Reckless hypothesis: are you working on majority latency or majority throughput sensitive systems?

I have seen so, so, so many examples of systems where latencies, including and especially tail latencies, end up mattering substantially and where java becomes a major liability.

In my experience, carefully controlling things like p99 latency is actually the most important reason C++ is preferred, rather than the more vaguely specified "performance".


The specific example that comes to mind was translating a Java application doing similarity search on a large dataset into fairly naive Rust doing the same. Throughput I guess. It may be possible to optimize Rust to get there but it’s also possible to (and in this case did) end up with less understandable code that runs at 30% the speed.

Edit: And probably for that specific example it’d be best to go all the way to some optimized library like FAISS, so maybe C++ still wins?


I've seen C++ systems that are considerably slower than equivalent Java systems, despite Java's lack of stack allocation and its boxing. It's mostly throughput: malloc is slow, the C++ smart pointers cause memory barriers, and the heap fragments. Memory management for complex applications is hard and the GC often gives better results.


I've seen so many flat profiles due to shared_ptr. Rust has done a lot of things right, but one thing it really did well was putting a decent amount of friction into std::sync::Arc<T> (and offering std::rc::Rc<T> when you don't want atomics!) vs &mut T or even Box<T>. Everyone reaches for shared_ptr when 99% of the time unique_ptr is the correct option.
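
Roughly, the ladder looks like this (illustrative only, with a hypothetical Config type): each step up costs something visible at the use site, so reaching for shared ownership is a deliberate act rather than the default.

    use std::rc::Rc;
    use std::sync::Arc;

    struct Config {
        retries: u32,
    }

    fn main() {
        // Unique ownership: no reference counting at all (the unique_ptr analogue).
        let owned: Box<Config> = Box::new(Config { retries: 3 });

        // Shared within one thread: non-atomic refcount.
        let shared: Rc<Config> = Rc::new(Config { retries: 3 });
        let _another = Rc::clone(&shared);

        // Shared across threads: atomic refcount (the shared_ptr analogue).
        let sync_shared: Arc<Config> = Arc::new(Config { retries: 3 });
        let handle = {
            let c = Arc::clone(&sync_shared);
            std::thread::spawn(move || c.retries)
        };
        assert_eq!(handle.join().unwrap(), 3);
        let _ = owned.retries + shared.retries;
    }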


I see comments like yours more than I see anyone suggesting actually using shared_ptr widely. In my experience, most people (that use smart pointers - many do not) prefer to use unique_ptr where they can.


I don't think anyone is suggesting shared_ptr explicitly, more that if you're coming from Python/Java/etc it's similar to a GC memory model and a bit more familiar if that's where your experience is. I've observed in a number of codebases that unless you're setting a standard of unique_ptr by default, it's fairly easy for shared_ptr to become widely used.

FWIW I consider shared_ptr a bit of a code smell, a sign that you don't have your ownership model well understood; there are cases for it, but 90% of the time you can eliminate it with a more explicit design (and avoid the chance of leaks plus the penalties from atomics).


Bear in mind that my ultimate perspective is that you shouldn't use smart pointers (or C++) at all.

But even if you think they have some value, it isn't a flaw in a language if it's not immediately obvious how to write code in it for people that are new to it. If you're coming from Python or Java, then learn how to write C++. There are probably as many books on C++ as on any other language out there.


Badly written C++ will be as slow as well written Java. Absolutely. But there is no way any Java code will perform better than well optimised C++. High performance C++ uses custom allocators to completely avoid using malloc (for example).


From my experience with embedded coding you are correct. Most stuff lives and dies enclosed in a single call chain and isn't subject to spooky action at a distance. And the stuff that is, I often firewall behind a well-tested API.


It would be fascinating if you could give any such comparison examples. The only time I've seen JVM come anywhere close to C++ in normal usage is if the C++ code was written like it was Java - that is, lots of shared_ptr heap allocations for nearly everything. Or perhaps you're one of the rare few that write Java code like it was C instead? You can definitely get something fast like that, but it seems all too rare.


There is no universe where well written and optimised C++ code will be out-performed by Java code. You might get similar performance in Java if you create zero objects (except during initialisation) and then manually manage the usage of objects at runtime. However only if the matching C++ code is written naively.


I think the reason to use C/C++ over java has less to do with the various optimizations and more to do with control over memory layout (and thus at least indirect control over cache usage and so on). Plus you remove a lot of "noise" in terms of performance (GC hiccups, JIT translation, VM load time, etc.).


+1. Part of the problem is that x86 assembly is hardly programming the metal anymore. The performance characteristics of processors have changed over the years as well, and compilers can be more up-to-date than humans.


x86 assembly doesn't represent the actual opcodes the CPU executes anymore, but it's still the low level "API" we have to the CPU. Even if assembly isn't programming to the metal, it's definitely more to the metal than C, and C is more to the metal than Java, etc. Metalness is a gradient


> Metalness is a gradient

It would be better to say that "Metalness" is a sort of "Feature Set". IMO, most programmers would tend to agree that C++ is far closer to Java than C is, yet C++ is every bit as low level as C is. Indeed, even a managed language like C# supports raw pointers and even inline assembly if one is willing to get their hands dirty.


There is expressivity and abstraction-ability which are language features, and there is “levelness” or metalness as is it being referred to.

Low and high level languages have two definitions I have seen, one is more objective, but utterly useless: only assemblies are low-level, everything else is high. The other one is more about what can be controlled in a given language’s idiomatic usage, I believe that’s what most people actually refer to intuitively - e.g. C#/Java have ways to manipulate pointers, but you would generally not say that.

In that regard, C, C++, and Rust are quite close to the metal, with the latter two actually being even closer, as they have native control over SIMD datatypes, which C doesn't have. Note, this is only a partial ordering, as e.g. C#/Java also have control over vector instructions, yet I wouldn't go around claiming them as close to the metal as C.
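
For example, the kind of explicit SIMD control being referred to, via the stable std::arch intrinsics (x86_64 is assumed here purely for illustration):

    // Add two 4-lane f32 vectors with SSE intrinsics, which are part of the
    // x86_64 baseline, so no extra target features are needed.
    #[cfg(target_arch = "x86_64")]
    fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
        let mut out = [0.0f32; 4];
        unsafe {
            let va = _mm_loadu_ps(a.as_ptr());
            let vb = _mm_loadu_ps(b.as_ptr());
            _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(va, vb));
        }
        out
    }

    fn main() {
        #[cfg(target_arch = "x86_64")]
        println!("{:?}", add4([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]));
    }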

Going back to expressivity, C++ is very expressive, while C is very inexpressive, but that is a completely orthogonal axis.


Lol, I like the word "metalness".


In the 1970s, the alternative to C was assembly. In fact, I can remember hearing stories of the fights needed to pry every-day, normal developers---not systems programmers, not power users---off of assembly to a "higher level language".

It wasn't until the 90s that it became clear that C was not appropriate for applications work (and this was in the era of 4-16MB machines), although that took a long time to sink in.


In the 1960s they already had languages such as Lisp, Fortran, Algol, Basic... Even Pascal is two years older than C, and ML also came out around that time.

The statement "the alternative to C was assembly" is simply incorrect.


For practical purposes - involving real world constraints in memory size, cpu power and storage - C was the tool of choice; it ran on just about everything, allowed you to get half decent performance without having to resort to assembler, and was extremely memory efficient compared to other languages. Compilers were affordable (Turbo C for instance, but also many others such as Mark Williams C) even for young people like me (young then...) and there was enough documentation to get you off the ground. Other languages may have been available (besides the ubiquitous BASIC, and some niche languages such as FORTH) but they were as a rule either academic, or highly proprietary and very, very expensive.

So that left C (before cfront was introduced). And we ran with it, everybody around was using C, there wasn't the enormous choice in languages that you have today, you either programmed in C or you were working in assembler for serious and time-critical work.


Compilers for Modula-2 and Pascal dialects were also quite affordable, many of them sold by the same companies that were selling the C compilers; there were even product packages to get all the languages in one go.

Easily proven by looking into computing magazine archives.

So let's not distort history.


I'm not 'distorting history', I'm just relating what was practical. Yes, there was Modula-2 and yes there was Pascal. But both felt extremely academic and restricted. Pascal still lives on today in the form of Delphi, which has to be a record for being in continuous use as a complete environment. I even paid for both Pascal and Modula-2 on the BBC Micro but performance was just horrible compared to the BASIC that came with that machine in spite of being compiled, compilation took a long time, even for simple programs. Wirth's books were enlightening though and I took a lot of lessons and algorithms from them and ported them to C (some of those I'm still using today!).


Yes, you definitely are, given how Modula-2 and Pascal were widely adopted across the European continent, especially in demoscene circles in the PC and Amiga ecosystems.

As if the BBC Micro was an example of the industry adoption at large during those days.

Any Byte, Computer Shopper, DDJ, The C Users Journal (later The C/C++ Users Journal), Crash, Amiga Format, Input, PC Techniques, You Computer, Your Sinclair,... from 1980 - 1995 thereabouts, shows a different picture in article contents and ad sections, in terms of language adoption and available set of compilers being sold.


One of my favorite quotes comes from DDJ in the late 80s, from a letter to the editor. Something like:

"Object oriented programming is just functions with persistent state and multiple entry points, and we all learned how bad that was in the 70s."


I'm not sure why this upsets you as much as it does. For me the 'common' languages in use were various flavors of BASIC, COBOL, various flavors of Assembly and C in that order and those were doing a lot of work compared to the rest of the large number of languages that was technically available. I personally liked and worked with many more but they weren't exactly in mainstream use even though you could get them if you wanted to.

It's a bit like me saying that today Erlang is widely adopted. Yes it is, in some circles and I absolutely love it. But even though it has some adoption the big heavy lifters today are Python, C++, C and Java. Maybe C# on account of MS and probably Javascript should rate a mention. That doesn't mean that the other 2500 or so computer programming languages do not exist or do not have relevance. But it is a realistic reflection of market share. At the computer store where I occasionally worked you could see this reflected in the requests for boxed copies of compilers and interpreters for various languages.

FoxPRO + Delphi became a very powerful combination for SME administrative systems (and before that DBIII) and once Apple launched Objective-C that got some significant market share on their platform too. But any kind of commercially developed software outside of those niches was likely to be built on the list above.

I've taken a great interest in various programming languages and learned a lot of them to the point that I could write code in them to compare them, so I'm well aware all of these (and many more!) existed. The machines I worked with were: Dragon32, BBC Micro, Acorn Atom, Apple II, C64 (though, mostly from a hardware point of view, to fix them), Atari ST (lots of work on that one), IBM PC, some mainframes (notably: IBM 4381, Sperry 1100/90) and probably others that I don't remember right now.

Some of those only had proprietary languages and compilers on them, and on some things were more free. If there was a language on any of those systems that made it large enough to rate an article in the magazines you mentioned, I probably played with it. But across all of the people that I worked with at the time, I've met exactly one person that programmed in Pascal, and they abandoned the project because it was dog slow (it got ported to assembly...), and Modula-2 or Oberon I've never come across in the wild other than in my own experiments.

BASIC, COBOL, Assembly and C were the languages that I recall being in mainstream use. That does not invalidate your experience, it may simply not match mine.


Yeah, my experience was indeed something else, regarding the software industry in Iberian Peninsula, Germany, France and Switzerland from 1970's until modern times.

What triggers me is the usual kind of message that C wiped everything away, being the first of its kind, when on other realities, it only started to matter when UNIX tooling became part of the picture.

On my bubble C became a thing, when Windows 3.0 went mainstream.

Coding in C on MS-DOS was mostly preparing exercises for Xenix programming classes; there was a single tower that students had to take turns on, so we needed to prepare everything in advance in Turbo C 2.0.

On Amiga it was mostly Assembly and AMOS, on the demoscene community I was part of.


In what year did you start programming professionally and in what languages was that done and on what kind of machines?


Ah, trying a trick question?

In 1992, using a mix of Turbo Pascal, DBase III+ and Clipper on MS-DOS computers, and Novell Netware.


> Ah, trying a trick question?

No, I was just curious. Why would you even think I was trying to trick you? And into what?

You normally make a ton of sense but in this thread I can't follow you and I sense a ton of emotion and projection that are entirely out of character for you.


In practice those were either proprietary or required beefy machines that the people writing C couldn't get.

On the limited environment where C got created, it was the only option. And everybody suddenly adopted that limited environment, because it was cheap. And then it improved until you could use any of those other languages, but at that point everybody was using C already.


Fortran predates C by over a decade, and dozens of compilers existed for it by the mid-60s. Much of that legacy code is still in use to this day in scientific computing. One example: John Pople won his Nobel prize for work he did (and implemented in Fortran) in computational chemistry - the prize was delayed until '98 but he did the work in the 60s. The software he commercialized it into, Gaussian, still gets releases to this day and remains one of the most popular computational chemistry packages.

It's really dependent on which field you're in. Not all scientific computing requires a beefy computer, but for a very long time it (and I guess LISP) dominated scientific computing. That said, I think it's a very good point to bring up the network effect of using C - if I need to hire a developer in 1985, it's probably easier to find someone with industry (not academic) experience who knows C than it is to find someone who knows Fortran.

I do kinda prefer Fortran to C though, it's so much cleaner to do math in. Maybe somewhere there's an alternate universe where IBM invented C and Bell invented Fortran to win the popularity war.


I tried to get access to FORTRAN and it was just way too expensive and required machines that I would not have been able to get close to. C ran on anything from Atari STs, IBM PCs, Amiga's and any other machine that an ordinary person could get their hands on.

The other mainstream language at the time was BASIC, comparable to the way PHP is viewed today by many.

And with the advent of the 32 bit x86 era GCC and djgpp as well as early Linux systems really unlocked a lot of power. Before then you'd have to have access to a VAX or a fancy workstation to get that kind of performance out of PC hardware. It's funny how long it took for the 386 to become fully operational, many years after it was launched you had to jump through all kinds of hoops to be able to use the system properly, whereas on a 68K based system costing a tiny fraction that was considered completely normal.


My perspective is a bit biased by scientific computing, I do more of that than enterprise stuff (and Python has been fine for personal use). It's cool to see the perspective of someone who was around for the early stages of it though.

How did people see Fortran back then - nowadays it's seen as outdated but fast, but was it seen as interesting, and what drove you to seek it out?

Other side question if it's okay, I keep seeing references to VAXen around historical documents and usenet posts from the 80s and 90s, what made them special compared to other hardware you had back then?


My one experience with FORTRAN was when working for a big dutch architect who made spaceframes, I built their cad system and a colleague built the finite element analysis module based on an existing library. We agreed on a common format and happily exchanged files with coordinates, wall thicknesses and information about the materials the structure was made out of and forces in one direction and a file with displacements in the other. It worked like a charm (combined C and FORTRAN). I thought it was quite readable, it felt a bit archaic but on the whole not more archaic than COBOL which I had also worked with.

The reason that library existed in FORTRAN was that it had a native 'vector' type and allowed for decent optimization on the proper hardware (ie: multiply and accumulate) which we did not have. But the library had been validated and was approved for civil engineering purposes, porting it over would have been a ton of work and would not have served any purpose, besides it would have required recertification.

As for VAXen: A VAX 11/780 is a 32 bit minicomputer, something the size of a large fridge (though later there were also smaller ones and today you could probably stick one on a business card). It had - for the time - a relatively large amount of memory, and was a timesharing system; in other words, multiple people used the same computer via terminals.

They weren't special per se other than that a whole raft of programmers from those days cut their teeth on them either because they came across them in universities or because they worked with them professionally. They were 'affordable' in the sense that you did not need to be a multinational or a bank in order to buy one, think a few hundred thousand $ (which was still quite a fortune back then).

I had occasional access to one through the uni account of a friend, but never did any serious work with them. The first powerful machine I got my hands on was the Atari ST, which had a 68K chip in it and allowed the connection of a hard drive. Together those two things really boosted my capabilities, suddenly I had access to a whole 512K of RAM (later 1M) and some serious compute. Compared to a time shared VAX it was much better, though the VAX had more raw power.

Concurrent to that I programmed on mainframes for a bank for about a year or so, as well as on the BBC Micro computer (6502 based) and the Dragon 32 (a UK based Color Computer clone).

Fun times. Computing back then was both more and less accessible than it is today. More because the machines were so much simpler, less because you didn't have the internet at your disposal to look stuff up.


There was a freely available FORTRAN on the Fred Fish Amiga disks in the late 80s. It came with a weird license---free for everything, but it couldn't be used for military or military-related things.

The 386 was hampered by backwards compatibility with the weird memory architecture for the 286 and 8086, IIRC. 68Ks were just soooo much easier.


Oh cool, I never knew that. 386 flat mode was easy enough, but only djgpp unlocked that properly, everything else was a hodgepodge of the weirdest memory models.

I remember making a bootloader for my own OS using TurboC that somehow had to fish four files from a minix based filesystem and dump them in memory at certain addresses and then switch to flat mode and start executing the kernel which would then initialize the system properly. That was some really weird mixed mode voodoo to switch back and forth between protected mode and real mode to be able to both use BIOS calls to fetch blocks from disk and to be able to park the data in the right spots in contiguous memory.

Very tricky stuff to get right, it took me forever before I had the first indication that something was executing after the inevitable hail Mary jump to the first instruction in the loaded kernel. But I got it to work. I actually had a guitar footpedal hooked to the reset button because it took me too many dives under the desk to recover from a crash. That was a very slow development cycle without any chance of debugging. At some point I had 8 leds hooked up to the printer port to use some 'out' opcodes to tell me where I had gotten to in the code. Poor mans emulator :)


FORTRAN didn't officially grow a program stack until FORTRAN 90 (?), although I'm told some of the earlier FORTRAN 77-era compilers supported it.

I remember meeting people who could not wrap their heads around the idea that function parameters weren't all passed by reference. The idea of recursion would have caused them to detonate.


IIRC, Mac OS Classic was written in Pascal, as were other operating systems. C just won the popularity contest.


That was about a decade after C had already won.


The Xerox Alto was early 70's and that GUI OS was written in Pascal. I always thought Pascal was better designed and less bizarre than C. Null terminated strings were particularly a bad idea.


"Why Pascal is Not My Favorite Programming Language", Brian W. Kernighan, April 2, 1981:

https://www.cs.virginia.edu/~evans/cs655/readings/bwk-on-pas...

One of the major problems, as I recall, was that Wirth's basic Pascal definition was very limited---the max 256 character strings, for example. So the makers of "production" Pascal compilers added extensions to do all of the things that you would want to do and unfortunately those extensions were all incompatible.


A famous rant ignoring Pascal dialects, while C still lacked a standard and was a bunch of dialects all over the place.


I've never seen a C dialect, other than not supporting bit fields. The language was usually the same although the libraries were all over the place.


Then you haven't used any C compilers during the CP/M, 8-bit and 16-bit days.

Here are two, RatC, Small-C.

Additionally what do you think GCC C is, in regards to K&R C and ANSI C?


Oh, by 73. Impressive. I didn't know that.

Still, that was a much more powerful machine than the ones people wrote C for. And when the cheap segment of those "fridge computers" became powerful enough to run whatever you wanted to put on it, people started using small workstations. And when those workstations became powerful enough, we got PCs.

It's only when the PCs got powerful enough that we could start to ignore the compiler demands and go with whatever language fit us better. (The later reductions all were based on cross-compiling, so they don't matter here.)


Go check the capabilities of systems written in high level languages like ESPOL and NEWP in 1961.

Let's not say that a 1961 system is more powerful than a PDP-11.


Mesa not Pascal.


Yes, apologies, I misremembered. Mesa is definitely a descendant of Pascal though, more like Modula-2. Very ahead of its time.


Sorry for being a bit pedantic, but it predates Modula-2. Niklaus Wirth got in contact with it during his first sabbatical at Xerox PARC; after returning to ETHZ, Modula was born, and shortly thereafter Modula-2 and the Lilith workstation, with the OS written in Modula-2.

Several years later, a similar encounter with Mesa's evolution, Mesa/Cedar, gave birth to Oberon.

As for being ahead of its time, very much indeed.

Many people relate to Smalltalk and Interlisp-D, and are unaware of strongly typed systems programming at Xerox, with tooling that only came to be decades later.

Mesa authors were quite adamant that in order to succeed, Mesa had to support comparable tooling to Smalltalk and Interlisp-D environments.


The fact that those things existed does not refute the GP's point. Many companies well into the 80s at least had programmers and codebases that had to be weaned away from assembly, despite the fact that higher-level alternatives had existed for a while. That's just empirical fact, regardless of the reasons. I myself started programming in 68K assembly on the original Mac because I couldn't afford a compiler and interpreted languages (there were at least Lisp and Forth) couldn't be used to write things like desk accessories. Remember, gcc wasn't always something one could just assume for any given platform. The original statement is correct; yours is not, because of limited perspective.


How many Lisp Machines did Texas Instruments sell? (I mean outside of that one lab at UT Austin.) :-)

I'm talking about applications developers who came out of the small mainframe/minicomputer world of the 70s and into the workstation/microcomputer world of the 80s. They started with assembly, and prying them off of it was as hard as convincing engineers to use FORTRAN. Convincing those application developers to use a garbage collected language, Java, was hard in the 90s.


Only inside Bell Labs, the world outside was enjoying high level systems programming languages since 1958 with the introduction of JOVIAL.


Where by "the world outside" you mean DoD contractors?


Everyone that was using JOVIAL, ESPOL, PL/I, PL/S, PL/X, PL.8, PL/M, Mesa, ALGOL 60 and 68 based languages, Modula-2, Solo Pascal, Concurrent Pascal, Lisp, BASIC compilers,....

Burroughs, IBM, DEC, Xerox, ETHZ, MIT, Intel,....


The sheer amount of software that you're likely using right now that's written in C would seem to contradict your claim.


The market share of C in desktop, server, and other "enterprise" applications has drastically dropped since the 90s. Nowadays it's quite rare to see C chosen for new projects where it isn't required. In fact, despite how pervasive it is, a huge amount of C code cannot be built on a pure C toolchain, i.e. C is essentially like Fortran.


It still hasn't entirely sunk in. :-)

Plus, legacy code.


Why is C inappropriate for applications work?


The technical arguments should be obvious, e.g. spending one's complexity budget on manual memory management and avoiding footguns. But one amusing anecdote is that the open source GNOME project founders were so traumatized by their experience building complex GUI apps in C (already a dubious idea 20 years ago) that they started building a C# compiler, and the Mono project was born.


In 2023, you'll be hard pushed to find a GNOME application actually using C# and Mono. The vast majority of GNOME components are written in C, with a large number of them written in Vala and some in Rust and JavaScript.


It was indeed a big yak to shave. Cloning a Microsoft technology probably wasn't a good idea for OSS adoption either.


I'm sure; there are still threads that come up from time to time in GNU circles about whether Mono is actually FOSS (TLDR: it is), in which those of a more paranoid personality allege an EEE attempt by Microsoft. If this were the case, Microsoft clearly did not succeed even without Java as a legitimate competitor! Interestingly, Mono has had most success from its use in the Unity game engine.


You don't spend "complexity budget" on manual memory management. Manual memory management is much simpler and easier than trying to manage complex arrangements where things are being allocated and deallocated all over the place, all the time.

Gnome programs are largely still written in C, but it isn't actually C. Not really. It's glib, which is almost a whole new language written on top of C. It's its own little world and it's not a good world. For example, it calls abort() whenever it fails to allocate memory.


> You don't spend "complexity budget" on manual memory management.

Of course you do, especially in applications where there is no benefit to be gained versus using a GC. This is why Java was such a huge success, despite offering very little else over C++ other than GC (I think OCaml is a much better example of a GC language). Consider that an entire book has been written on such details as move semantics. For most GUI apps, a GC or other automatic memory management has proved fine.

> Gnome programs are largely still written in C, but it isn't actually C.

It's still C whatever libraries are used. It's still manual memory management with footguns. This is why they created "Vala".


>Of course you do, especially in applications where there is no benefit to be gained versus using a GC.

I am afraid you misunderstood my comment. Of course C isn't good for writing code that calls 'malloc' millions of times per second. That's just bad code, which you can write in any language. You should not be dynamically allocating memory all over the place. If you have sane allocation strategies, then having to 'manually' manage them is completely fine.
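To make "sane allocation strategy" concrete, here is a minimal arena (bump) allocator sketch; the names and the fixed-capacity, no-growth policy are only illustrative:

    /* Minimal arena allocator: allocate by bumping a pointer, free everything
       at once by resetting. Capacity handling and alignment policy are
       simplified for illustration. */
    #include <stddef.h>

    typedef struct {
        char  *base;
        size_t used;
        size_t cap;
    } Arena;

    static void *arena_alloc(Arena *a, size_t n) {
        n = (n + 15) & ~(size_t)15;              /* keep 16-byte alignment */
        if (a->cap - a->used < n) return NULL;   /* caller decides how to fail */
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    static void arena_reset(Arena *a) { a->used = 0; }  /* "free" everything */

With a per-frame or per-request arena like this, "manual" memory management mostly collapses into a handful of reset calls.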

>This is why Java was such a huge success, despite offering very little else over C++ other than GC (I think OCaml is a much better example of a GC language).

Java is the quintessential example of a language that was made popular through marketing and hype.

Getting reasonable performance out of the JVM requires far greater expertise than doing the same in C. The JVM's GC is insanely complicated and has a million different knobs which can drastically affect performance. Or they can apparently do nothing. Until you turn other knobs...

>Consider that an entire book has been written on such details as move semantics.

There are no 'move semantics' in C. Yet another way in which it is superior to C++. C++, which is a horrible language designed by nerds who enjoy standardese more than they love their own children, has 'move semantics' bolted on with an arcane 'rvalue references' mechanism that gets more complicated in every version of the standard.

>For most GUI apps, a GC or other automatic memory management has proved fine.

Retained-mode UI is something that is only reasonable with automatic memory management. The way that lifetimes of objects work in those APIs really does require a garbage collector because it's all so dynamic.

The immediate-mode UI approach, which has many other benefits as well, works perfectly fine with manual memory management.

>It's still C whatever libraries are used. It's still manual memory management with footguns. This is why they created "Vala".

Objective-C is more like C than "glib" C is. As the suckless guys said:

"glib - implements C++ STL on top of C (because C++ sucks so much, let's reinvent it!), adding lots of useless data types for "portability" and "readability" reasons. even worse, it is not possible to write robust applications using glib, since it aborts in out-of-memory situations. glib usage is required to write gtk+ and gnome applications, but is also used when common functionality is needed (e.g. hashlists, base64 decoder, etc). it is not suited at all for static linking due to its huge size and the authors explicitly state that "static linking is not supported"."


Immediate mode UI? Not having to allocate everywhere? Memory allocation that tells you when you run out of memory? Static linking?

What systems are you thinking about?


Try writing a generic, reusable data structure in C. It's agony.


You don’t need to write generic and reusable data structures in C. You write a data structure suited for the problem at hand, which often means that it’s going to be simpler and more performant because of the known constraints around it.


It could be more performant because of the known constraints around it, or it could be an ad-hoc, informally-specified, bug-ridden, slow implementation of half of some data structure. At least with a generic and reusable data structure you have a known reliable building block. Again, performance over safety.


> It could be more performant

No, it almost always is. The designers of a generic library can't anticipate the use case, so can't make appropriate tradeoffs.

For example, compare `std::unordered_map` to any well written C hash table. The vast majority of hash tables will never have individual items removed from them, but a significant amount of complexity and performance is lost to this feature.
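As a hedged sketch of the kind of specialization I mean (insert-only, open addressing, linear probing; the hash function and the omission of growth/rehashing are arbitrary choices here):

    /* Insert-only string-keyed table: no deletes means no tombstones and no
       erase logic. Capacity must be a power of two, and the table is assumed
       to never fill up (growth omitted for brevity). */
    #include <stdint.h>
    #include <string.h>

    typedef struct { const char *key; uint32_t value; } Slot;
    typedef struct { Slot *slots; size_t mask; /* capacity - 1 */ } Table;

    static uint64_t hash_str(const char *s) {       /* FNV-1a, nothing fancy */
        uint64_t h = 1469598103934665603u;
        while (*s) h = (h ^ (unsigned char)*s++) * 1099511628211u;
        return h;
    }

    /* Returns the matching slot, or the empty slot where the key belongs. */
    static Slot *table_find(Table *t, const char *key) {
        for (size_t i = hash_str(key) & t->mask; ; i = (i + 1) & t->mask) {
            if (t->slots[i].key == NULL || strcmp(t->slots[i].key, key) == 0)
                return &t->slots[i];
        }
    }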


> No, it almost always is.

A library author can spend ridiculous amounts of time refining and optimizing their implementations, far more than any application programmer could afford or justify.

> The designers of a generic library can't anticipate the use case, so can't make appropriate tradeoffs.

This is definitely not true. Take C++ for instance, not only is it possible to specialize generic code for particular types, but it's absolutely routine to do so. Furthermore, with all sorts of C++ template features (type traits, SFINAE, CRTP, Concepts, etc) even user-defined types can be specialized, in fact it's possible to provide users with all sorts of dials and knobs to customize the behavior of generic code for their own use case. This functionality is not just a quality-of-life improvement for library users, it has profound implications for performance portability.

> For example, compare `std::unordered_map` to any well written C hash table.

std::unordered_map is a strawman. There are a plethora of generic C++ hash tables which would match, if not soundly outperform, their C counterpart. Also, even if we blindly accepted your claim, then how do you explain qsort often being beaten by std::sort or printf and its variants being crushed by libfmt? What about the fact that Eigen is a better linear algebra library than any alternative written in C?


> A library author can spend ridiculous amounts of time

That's true. But simply having knowledge of the goal and a few simplifying assumptions can beat all the optimization in the world. In other words, a polished sub-optimal approach isn't as good as just having a better approach. `std::unordered_map` is heavily optimized, but can't make any tradeoffs because it's a general tool.

> plethora of generic C++ hash tables which would match, if not soundly outperform, their C counterpart.

Post one.

> not only is it possible to specialize generic code for particular types, but it's absolutely routine to do so.

Yep, it can do type-based specialization, but not application-based specialization. That requires a programmer.

> how do you explain qsort often being beaten by std::sort

A standard library C function often cannot be inlined to remove the comparison function pointer call, whereas std::sort trivially can.

If you wrote one yourself for a particular problem, it would not have this issue. This is actually a great example of where C excels, because the choice of sorting algorithm depends so much on the kind of data you are sorting.
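To illustrate (a rough sketch; the point is only the indirect call, not the choice of insertion sort):

    /* qsort pays a function-pointer call per comparison: */
    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }
    /* usage: qsort(v, n, sizeof v[0], cmp_int); */

    /* A purpose-built sort has the comparison inlined by construction
       (insertion sort shown only because it is short): */
    static void sort_ints(int *v, size_t n) {
        for (size_t i = 1; i < n; i++) {
            int x = v[i];
            size_t j = i;
            while (j > 0 && v[j - 1] > x) { v[j] = v[j - 1]; j--; }
            v[j] = x;
        }
    }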

Let me be clear about my claim: tailor made solutions to each problem will almost always be faster than generic solutions. Do you really disagree with that? If you want to argue that maybe it's not productive to work that way, that's a different argument.


"If you wrote one yourself for a particular problem, it would not have this issue. This is actually a great example of where C excels because the choice of sorting algorithm so much depends on the kind of data you are sorting."

Did I get it right that you argue for re-implementing, in each of your apps, whichever sorting algorithm best fits your data?

Why not instead use a generic library implementing a particular sorting algorithm, parameterized by the data type and maybe by some policies specifying minor variations of the algorithm?

"Let me be clear about my claim: tailor made solutions to each problem will almost always be faster than generic solutions. Do you really disagree with that?"

I do. I don't think even you invent a special sorting algorithm for each of your applications that need sorting.


> post one

Abseil or folly both have optimized hashtables, I believe. Rust's standard HashMap follows the same design. It involves SIMD to look for a bucket whose hash matches the query's, so redoing it in C every time you need a hash table would be quite impractical.


> Let me be clear about my claim: tailor made solutions to each problem will almost always be faster than generic solutions. Do you really disagree with that?

I disagree with it in the sense that I disagree with the statement "A human will always be able to write the same or better assembly than a C compiler, because humans can learn the compiler's tricks and make optimizations which the compiler is not allowed to make." It's a true statement, but it's so detached and irrelevant that it hardly matters.

Generic code has proven itself time and time again, even Go caved in and supported it.


The assembly example isn't a good comparison. We have compilers that can generate a lot of assembly tricks most programmers wouldn't write. We don't have a compiler that can analyze the logical constraints of a programming problem and simplify the data structures in the library.

> Generic code has proven itself time and time again, even Go caved in and supported it.

I'm not saying anything against the language feature generics. There is plenty of use for them even in a self contained code base.


Generic data structures turning out to be more efficient than his specialized C data structures is exactly what happened to Bryan Cantrill when he ported a carefully optimized C program to naive Rust.

> Yes, you read that correctly: my naive Rust was ~32% faster than my carefully implemented C.[0]

> As a result, this code spends all of its time constantly updating an efficient data structure to be able to make this decision. For the C version, this is a binary search tree (an AVL tree), but Rust (interestingly) doesn’t offer a binary search tree — and it is instead implemented with a BTreeSet, which implements a B-tree. B-trees are common when dealing with on-disk state, where the cost of loading a node contained in a disk block is much, much less than the cost of searching that node for a desired datum, but they are less common as a replacement for an in-memory BST[1]

> So, where does all of this leave us? Certainly, Rust’s foundational data structures perform very well. Indeed, it might be tempting to conclude that, because a significant fraction of the delta here is the difference in data structures (i.e., BST vs. B-tree), the difference in language (i.e., C vs. Rust) doesn’t matter at all.[1]

> Implementing a B-tree this way, however, would be a mess. The value of a B-tree is in the contiguity of nodes — that is, it is the allocation that is a core part of the win of the data structure. I’m sure it isn’t impossible to implement an intrusive B-tree in C, but it would require so much more caller cooperation (and therefore a more complicated and more error-prone interface) that I do imagine that it would have you questioning life choices quite a bit along the way. (After all, a B-tree is a win — but it’s a constant-time win.)[1]

> All of this adds up to the existential win of Rust: powerful abstractions without sacrificing performance.[1]

[0]: http://dtrace.org/blogs/bmc/2018/09/18/falling-in-love-with-...

[1]: http://dtrace.org/blogs/bmc/2018/09/28/the-relative-performa...


Yes a library author can spend a lot of time refining and optimising their generic data structure, but can never escape that it is generic.

No amount of optimisation will make a hash table designed for items to be removed competitive with one where items do not need to be removed.

>Take C++ for instance, not only is it possible to specialize generic code for particular types, but it's absolutely routine to do so.

So it's not a generic data structure, then. When you specialise a template, you essentially write a concrete data structure for a particular type. Rather than writing a big generic data structure that's inefficient and then specialising it for the particular type, it is much easier just to write that specialised data structure in the first place.

>std::unordered_map is a strawman. There are a plethora of generic C++ hash tables which would match, if not soundly outperform, their C counterpart.

How is it a strawman? It's in the standard library.

>Also, even if we blindly accepted your claim, then how do you explain qsort often being beaten by std::sort or printf and its variants being crushed by libfmt?

printf is on the order of 50 years old. libfmt was written about 5 minutes ago. Do you take into account in your comparison the many more years in which printf has been useful? Do you take into account the amount of time it takes printf to compile vs a huge C++ library like libfmt?

Do you take into account all the code that has been slowed down by C++ programmers writing bad code and assuming a sufficiently smart compiler will inline everything for them? Do you take into account all the horrifically slow iostreams code out there?

qsort and std::sort do completely different things. Comparing them is absurd. qsort takes the size and comparison operator at runtime. std::sort requires them to be specified at compile time. I frequently use qsort in a way that you simply could not use std::sort, because those things are runtime-variable.

The proper comparison to std::sort is the implementation of a sorting algorithm written in C, specialised to the code it was written to work with. Then you can debate 'is it worth using this for the minor performance gain' etc. But comparing it to qsort is inane and demonstrates you don't even know what the two functions do.


"So it's not a generic data structure, then."

In generic C++ code, you can specialize a part of the generic algorithm to tune it to a particular use case. Usually it takes the form of a small class template which can be specialized for a particular type and is used by the generic algorithm operating on that type. This class template is called a trait, policy, or strategy depending on the way it is used.

"qsort and std::sort do completely different things. Comparing them is absurd. qsort takes the size and comparison operator at runtime. std::sort requires them to be specified at compile times."

Not at all, you can pass a function pointer to std::sort just as well if you need to [0]. Most of the time you don't need this indirection but in C you are stuck with it unless you copy-paste-edit qsort.

[0] https://godbolt.org/z/M1v8azojT


For one example,

https://github.com/tmmcguire/rust-toys/blob/master/alternati...

is a program that mmap's an anagram dictionary file and builds a fast-n-dirty hashmap dictionary over the file data. It took about an afternoon to write and was pretty decent.

https://maniagnosis.crsr.net/2014/08/letterpress-cheating-in...


Unordered map is a known design issue, on the same order as std::vector<bool>. C doesn't even have std::vector<int>.


That must be why every C project has its own string, dynamic array, hashtable, etc. It's definitely more performant to have several different implementations of the same thing fighting for icache


It's a nice thought, but in practice C binaries are orders of magnitude smaller than those produced by any other language. Also, compilers make that tradeoff for inlining all the time.


To be fair, most C++ projects (outside of embedded and games) use STL for string/dynamic array/hashtable, and while that standardization is certainly convenient I'm not sure STL is generally faster than most hand written C data structures, even with the duplicated code.


Entirely true. Unfortunately, that comes at a cost of programming time and effort and a decent amount of risk.


Agony might be a bit much, and I'm not trying to defend C, because this is one of the strong reasons for using C++/STL...

But generally one should just reach for a library before writing a complex data structure, regardless of language. And for example, the Linux kernel does a fine job of implementing some fairly complex data structures in a reusable way with little more than the macro processor and sometimes a support module. In a way, the ease of writing linked lists etc. is why there are so many differing implementations. So, if your application is GPL, just steal the kernel's implementations; they are reasonably well optimized, tend to be largely debugged, are standalone, etc.
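For instance, the kernel-style intrusive list boils down to something like this (a sketch, not the kernel's actual code; it assumes a circular list with a sentinel head):

    /* Intrusive doubly linked list: the node lives inside the payload struct,
       and container_of recovers the payload from a node pointer. */
    #include <stddef.h>

    struct list_node { struct list_node *next, *prev; };

    #define LIST_HEAD_INIT(name) { &(name), &(name) }   /* empty circular list */

    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    struct task {
        int id;
        struct list_node link;   /* this task's place in some list */
    };

    static void list_insert_after(struct list_node *pos, struct list_node *n) {
        n->next = pos->next;
        n->prev = pos;
        pos->next->prev = n;
        pos->next = n;
    }

    /* Given a node pulled out of the list, get back to the task:
       struct task *t = container_of(node, struct task, link); */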


Try doing the same in Go and it's even worse.


That's writing some other language in C syntax. You use arrays or you write a specialized version for the use case.


What does that have to do with writing an application?


The short answer is that C makes it easy to do things that you probably shouldn't ought to be doing. (The answer to the opposite question, "what work is C appropriate for?", is systems and embedded stuff, where doing-things-you-probably-shouldn't is largely the name of the game.)

Another answer is that C doesn't have much in the way of guardrails; in a world where programmer time is much more expensive than machine time, guardrails make programming much faster.


Every language makes it easy to do things that you probably shouldn't be doing. I don't know how many times I've seen naive accidentally-quadratic string handling code in high level languages, for example.

Your answer is just buzzword after buzzword. What makes C bad for writing applications?


SIGSEGV. Or those times when you really wish you had gotten a SIGSEGV.


Write correct code.


When talking about hot-path optimizations, though, assembly is still a good alternative.


It's quite clear to me that C is appropriate for applications work. Most applications I use on a daily basis are written in C (and were written after the 90s), and most of the rest are written in C++. What makes you think you can't write applications in C?


You obviously can, but doing so takes a considerable effort. Not making that effort leads to lots and lots of bugs.


> When I'm working in C, I'm frequently watching the assembly language output closely, making sure that I'm getting the optimizations I expect. I frequently find missed optimization bugs in compilers.

Do you repeat this exercise when you upgrade or change compilers?


For hot functions that were carefully optimized, I have test harnesses that run them and measure their cycle counts, which are asserted to be equal to the cycle count after hand-optimization (+/- some small margin if there is non-determinism). This is sensitive to regressions while having a very low false-positive rate due to being (approximately) deterministic.
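Not the actual harness, but the shape is roughly this (assuming x86 and rdtsc; pinning, warm-up, and serialization are elided, and the numbers are made up):

    /* Regression check: the hot function's best-case cycle count must stay
       within a small margin of the value recorded after hand-optimization. */
    #include <assert.h>
    #include <stdint.h>
    #include <x86intrin.h>

    extern void hot_function(void);       /* the carefully optimized kernel */

    #define EXPECTED_CYCLES 120u          /* illustrative, recorded earlier */
    #define MARGIN            8u

    static void check_hot_function(void) {
        uint64_t best = UINT64_MAX;
        for (int i = 0; i < 1000; i++) {  /* best-of-N filters out noise */
            uint64_t t0 = __rdtsc();
            hot_function();
            uint64_t t1 = __rdtsc();
            if (t1 - t0 < best) best = t1 - t0;
        }
        assert(best <= EXPECTED_CYCLES + MARGIN);
    }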


The easy way to achieve that is to freeze the assembly once it is generated and to keep the C around inline as documentation, as well as a way to regenerate the whole thing should it come to that (and then, indeed, you'll need to audit the diff to make sure the code generator didn't just optimize away your carefully unrolled loops).


Is there any tooling for that or are you talking hypothetically?


Just standard unix tooling, what else do you need? It's as powerful a set of text manipulation tools as you could wish for.


If the assembly is that important I'd find a way to put it into my unit tests. Create a "golden" for the assembly and it should trigger a diff if it changes.


Read your article (and Cloudflare's), and as someone who uses musttail heavily I don't understand the hype. As you mentioned in your blog, you can get tail call optimizations with -O2; musttail just gives you that guarantee, which is nice. But the article makes it sound as if it unlocks something that was not possible before and that interpreters will greatly benefit from it, when it's more reasonable to ask your user to compile with optimizations on than it is to ask them to use a compiler that supports musttail (gcc doesn't). Moreover, musttail has a lot of limitations that make it hard to use in more complex interpreter loops.


Ordinarily you're at the whim of the optimizer whether calls in tail position are made in a way that grows the stack or keeps it constant. musttail guarantees that those calls can and are made in a way that does not let the stack grow unbounded, even without other conventional -On optimization. This makes the threaded interpreter design pattern safe from stack overflow, where it used to be you'd have to make sure you were optimizing and looking carefully at the output assembly.

If nothing else, musttail aids testing and debugging. Unoptimized code uses a lot more stack because it hasn't spent the time to assign values to registers, and people often debug unoptimized code precisely because having values live on the stack makes debugging easier. The combination of unoptimized code and calls in tail position not made in a way that keeps stack size constant means you hit stack overflow super easily. musttail means that problem is at least localized to the maximum stack use of each function, which is typically not a problem for small-step interpreters. Alternatives to musttail generally involve detecting somehow whether or not enough optimization was enabled and switching to a safer, slower interpreter if not positive... but that just means your debug and optimized builds work totally differently, not at all ideal!
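For readers who haven't seen it, the pattern in question looks roughly like this (a clang-only sketch; the opcode set and VM struct are made up):

    /* Threaded interpreter where each opcode handler tail-calls the next one.
       With musttail, the "call" is guaranteed to reuse the current frame, so
       the C stack stays flat no matter how long the bytecode is. */
    #include <stdint.h>

    typedef struct { int64_t stack[256]; int sp; } Vm;
    typedef void (*Handler)(Vm *vm, const uint8_t *pc);

    extern const Handler dispatch[256];   /* one handler per opcode */

    #define NEXT(vm, pc) \
        __attribute__((musttail)) return dispatch[*(pc)]((vm), (pc) + 1)

    static void op_push1(Vm *vm, const uint8_t *pc) {
        vm->stack[vm->sp++] = 1;
        NEXT(vm, pc);                     /* replaces this frame, no growth */
    }

    static void op_halt(Vm *vm, const uint8_t *pc) {
        (void)vm; (void)pc;               /* a plain return unwinds everything */
    }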


I do buy it in the case of C++, but C? You can’t even have a proper data structure with it due to it being so inexpressive - and surely data structures are one of the most significant factors of performance.

Besides, the industry hands down chose C++ for the truly performance-oriented workloads.


Yeah, I don't use C any longer unless I absolutely have to. I mostly do embedded, so that happens sometimes. I much prefer C++11 on up.


I just want to thank you for that blog post! I’m currently writing a little bytecode vm for something and this really improves my code! Luckily I’m currently already only targeting clang, so I’m ok with using a non-portable feature.


With modern CPUs, that kind of hand-tuned assembly gets rarer and rarer. Pipelines and caching and branch prediction make it hard to know what's actually going to be faster. And even when you do know enough about the CPU, you only know that CPU -- sometimes only one model of a CPU.

There's still a niche for it, but it's tiny and it keeps getting tinier.


That... really isn't all that true. I find that myth mostly perpetuated by people who don't do any kind of performance work or have any understanding of just how terribly unperformant most code out there is.


That is my observation as a compiler writer.


Of course the problem is that the vast corpus of legacy C code was not written with such aggressive compilers in mind.


> In effect, Clang has noticed the uninitialized variable and chosen not to report the error to the user but instead to pretend i is always initialized above 10, making the loop disappear.

No. This is what I call the "portable assembler"-understanding of undefined behavior and it is entirely false. Clang does not need to pretend that i is initialized with a value larger than 10, there is no requirement in the C standard that undefined behavior has to be explainable by looking at it like a portable assembler. Clang is free to produce whatever output it wants because the behavior of that code is literally _undefined_.

Also, compilers don't reason about code the same way humans do. They apply a large number of small transformations, each of these transformations is very reasonable and it is their combination that results in "absurd" optimization results.

I agree that undefined behavior is a silly concept but that's the fault of the standard, not of compilers. Also, several projects existed that aimed to fully define undefined behavior and produce a compiler for this fully defined C, none of them successful.
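For reference, the kind of code under discussion is along these lines (not necessarily the article's exact listing):

    #include <stdio.h>

    int main(void) {
        int i;                       /* never initialized */
        for (; i < 10; i++)          /* reading i here is undefined */
            printf("%d\n", i);
        return 0;
    }

Since the read of i has no defined behavior, a conforming compiler can emit a main that prints nothing at all, without having to "pretend" that i holds any particular value.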


1. The people creating the C standard were adamant that just following the standard was not sufficient to produce a "fit-for-purpose" compiler. This was intentional.

2. They were also adamant that being a "portable assembler" with predictable, machine-level semantics was an explicit goal of the standard.

3. The C standard actually does have text giving a list of acceptable behaviours for a compiler and "silently remove the code" is not in that list. And this text used to be normative, but was later made non-normative.

So I blame the people who messed with the standard, and guess who those people were?


> The C standard actually does have text giving a list of acceptable behaviours for a compiler

The exact opposite is explicitly stated in the standard (from C11 section 3.4.3):

    undefined behavior

    behavior, upon use of a nonportable or erroneous program
    construct or of erroneous data, for which this International
    Standard imposes no requirements 
The standard then lists some examples of undefined behavior, and it's true that "silently removing the code" is not in the list. Still, I think it's pretty clear that it's acceptable behavior, since the standard just stated it imposes no requirements.


"Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."

Note "permissible" and "ranges from ... to".

Again, this used to be normative in the original ANSI standard. It was changed in later versions to no longer be normative. Exactly as I wrote.


Which is logically equivalent to imposing no requirements. "ignoring the situation completely with unpredictable results" does not meaningfully constrain the possible behaviors.


That turns out not to be the case.

“Ignoring” is not “taking action based on”

Ignoring is, for example, ignoring the fact that an array access is out of bounds and performing the array access.

Ignoring is not noticing that there is undefined behavior and removing the access and the entire loop that contains the access. Or a safety check.


> ignoring the fact that an array access is out of bounds and performing the array access

This doesn't mean anything. The standard imposes no requirements that an array access is translated into a memory access by the compiler. And that's a good thing because it enables optimizations that can be critical to achieve decent performance.

Assume for example that the compiler is able to prove (after inlining) that all elements of an array are equal to zero. This array is then used in a loop where in each iteration an index into this array is calculated and the corresponding array element is added to an accumulator. If the standard were to impose a rule that array accesses have to correspond to memory accesses, this code could not be optimized because one of the computed indices might lie outside of the array and the compiler is then forced to produce assembly that performs the memory access.

With UB, however, the compiler is allowed to assume that all accesses to the array are valid and will be able to completely remove the (very cache-unfriendly) loop.
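Sketched out, the scenario described above looks something like this (names illustrative):

    /* table is internal to the file and never written, so after analysis the
       compiler knows every element is zero. */
    static int table[256];

    int sum_selected(const int *idx, int n) {
        int acc = 0;
        for (int i = 0; i < n; i++)
            acc += table[idx[i]];    /* an out-of-range idx[i] would be UB */
        return acc;
    }

Because an out-of-range access is UB, the compiler may assume every idx[i] is in bounds, observe that every in-bounds element is zero, and reduce the whole function to "return 0;". Under a strict "every array access is a memory access" rule, it would have to keep the loop.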


So, generally, optimizations shouldn't be allowed? Dead code removal shouldn't be allowed? Substituting constants shouldn't be allowed? Because "ignoring" UB is pretty much what the compiler did, and then let its optimization passes run to their conclusion.

> ignoring the fact that an array access is out of bounds and performing the array access

The idea that this is even meaningful is precisely the "portable assembler" misconception.


Except the misconception is that it is not.

"Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:”

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf

p10, line 39

"C code can be portable. "

line 30


I think you’re misinterpreting. Maybe it’s clearer if we elide the relative subclause:

  undefined behavior

  behavior … for which this International Standard imposes no requirements
That is an obvious definition of what is undefined behavior. It’s not giving license to do whatever. That said the ship has sailed and what implementors do obviously matters more than what the standard says.


If there are no requirements on what it's doing, how is that not a license to do whatever?

There is not even a requirement that a theoretical program that contains, e.g., only the preceding code would still maintain any invariants. So I don't see what an instance of "whatever" that violates "no requirements" would look like.


Read with either an atypical or malicious degree of literalism, the standard supports ransomwaring your machine and indeed your entire network in the face of undefined behavior.

What it actually means is that the standard doesn’t require doing the impossible, not that a program error is license to generate malicious code. That’s why it gives a (formerly normative) list of possible approaches.


No maliciousness is required. "1 + 2" produces "3" only because the standard defines it so -- it is therefore required that 1 + 2 produce 3. When the standard does not say what to do, there is no requirement.

You said "it's not a license to do whatever" which is only true because it isn't a license. The code can still do whatever, because there is no requirement that it do anything other than "whatever".

You seem to be arguing from a position that the code has meaning unless somehow UB gets involved, but the reality is that these bits and bytes don't mean anything until the spec tells us what meaning it has.


> until the spec tells us what meaning it has.

That's a commonly held belief, but wrong. My first C compiler predated the ANSI C standard. Yet the code very much had meaning, just not in terms of the C standard.

The C standard defines the minimum a C compiler has to do to be in compliance with the C standard. It is not a complete definition, and this is intentional.


> > until the spec tells us what meaning it has.

> That's a commonly held belief, but wrong. My first C compiler predated the ANSI C standard. Yet the code very much had meaning, just not in terms of the C standard.

Much like /* plain English comments */ have meaning, just not in terms of the C standard.

Talking about prestandard C isn't relevant because we're talking about what a standards conforming compiler can do with undefined behaviour per the text of the standard. I do not find your counterexample compelling.

> The C standard defines the minimum a C compiler has to do to be in compliance with the C standard.

The standard does not define what a compiler has to do. Instead, it defines the C abstract machine as well as syntax attached to behaviour in terms of that abstract machine. When a program which is restricted to that syntax is executed, it must have the effects (upon the C abstract machine*) as specified in the standard.

Sometimes the standard doesn't specify anything for a given syntax in a given abstract machine state, and sometimes the standard explicitly specifies that "the behavior is undefined".

> It is not a complete definition, and this is intentional.

I'm not sure what you mean. Compiler extensions exist (and are generally regarded as a necessary evil), is that what you're referring to? Or are you referring to implementation-defined things (all of which are explicitly called out in the standard)? Or something else? A citation, if you can?

* Notably the C language says nothing about the real machine. Supposing you have an I/O port at address 0x9000, accessing "*((char*)0x9000) = 1;" does not need to access memory address 0x9000 at all. Of course, compiler engineers are not silly and implemented it as accessing 0x9000 on the underlying machine. Similarly the standard committee is careful to ensure that the language it defines can be implemented without needed the overhead of a VM or interpreter, though sometimes they get that wrong. (For example, see the issues combining multiple alloca() and C99's "int foo[*];" in the same function.)*


> we're talking about what a standards conforming compiler can do

No. You may be, I am not.

I am talking about what a reasonable C compiler can do, and what the ANSI standards committee intended.

Adhering to the ANSI/ISO standard is at best a necessary condition for producing a useful C compiler. It is most definitely not a sufficient condition. As I've pointed out many times before, this was intentional.

And the existence of pre-standard C compilers that worked is, of course, clear evidence that it is not a necessary condition either. Or at least was not.

The C standard leaves a lot "undefined" or "implementation defined" that is actually well-defined on a concrete machine the compiled code runs on. If you seriously think the intention of all this was to allow demons to fly out of nostrils or for compilers to start mining bitcoins, which is all perfectly legal by the standard, well, I don't really know what to say.

Particularly because the creators of the standards clearly stated so. In the very standard itself. Now they didn't make that language mandatory, so yeah, you can make a "standards compliant" compiler that violates that intent. But it will be a sucky compiler.

C ≠ The ANSI/ISO C standard.


> The C standard leaves a lot "undefined" or "implementation defined" that is actually well-defined on a concrete machine the compiled code runs on.

This assumes there is a one-to-one correspondence between C constructs and machine code, which isn't true. C isn't a "portable assembler" and compilers are luckily able to choose whatever machine code they think will perform best under the assumption that there is no undefined behavior.

> But it will be a sucky compiler.

I don't think gcc or clang are sucky compilers at all. In fact, their ability to aggressively optimize valid code is extremely helpful for producing high-performance programs.

> C ≠ The ANSI/ISO C standard

So C is defined neither by the standard _nor_ by existing implementations and is instead defined to be whatever your headcanon is?


> This assumes there is a one-to-one correspondence between C constructs and machine code

No it does not.

> which isn't true.

Actually, it is true in a vast majority of cases.

> C isn't a "portable assembler"

Not sure why people keep repeating this despite it being so obviously and patently untrue:

"Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:”

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf

p10, line 39

"C code can be portable. "

line 30


> And the existence of pre-standard C compilers that worked is, of course, clear evidence that it is not a necessary condition either. Or at least was not.

The difference is that those compilers didn't have a spec to follow or disobey anyway. They could always just say "whatever happened is right", and you could argue that the compiler did something _unhelpful_, but the compiler broke no promise because it made no promise.

> Particularly because the creators of the standards clearly stated so. In the very standard itself.

They clearly stated the opposite:

""" * Undefined behavior --- behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message) """

from ANSI C89 (§1.6).

The compiler may ignore the situation completely leading to unpredictable results. If the compiler ignores a situation, such as a signed integer overflow, and that leads to unpredictable results, such as "next time we see an if-statement, execute both the if-block and the else-block", then that's 100% conforming. That's exactly what was intended. This sort of thing can happen, suppose the evaluation of an if-condition ends up in some CPU flags, and the compiler knows the two flags are exclusive because the only way they wouldn't be is if the program had a signed integer overflow.

FWIW, GCC 1.17, released in January 1988, would launch 'nethack' during the compile if it detected a #pragma that it didn't understand. The idea of interpreting it as "anything can happen" is neither incorrect nor new. (Technically in this case, an unknown pragma has implementation-defined behavior which is the same as UB plus a requirement that the behavior must be documented in the compiler's manual.) It was a bad idea though and they removed it because a compiler that does that is, well, not good. We call this property "quality of implementation", but it wasn't a correctness issue.


> didn't have a spec to follow or disobey anyways.

Exactly. Yet they were still C compilers. So the idea that "being a C compiler" is the same as "follows the spec" is clearly nonsense. You can follow the spec and not be a usable C compiler, and you can be a usable C compiler and not follow the spec.

And the C spec not being complete was intentional, because otherwise too many already existing C compiler would have not had a chance of becoming ANSI compliant, and thus the spec would have been meaningless.

> They clearly stated the opposite:

> [..]

> Permissible undefined behavior ranges from ignoring ...

Very clearly gives a list of "permissible behaviors". Now if you believe that one of those options is "do anything you please" when it both clearly doesn't say that (it just says "ignore with unpredictable results") AND it doesn't make logical sense, I'm not sure how to help you.

(The two other options for permissible behavior are clearly completely redundant if one of them permits you to do anything you want whatsoever).

> "next time we see an if-statement, execute both the if-block and the else-block", then that's 100% conforming

It is "conforming" to the spec that has made that part non-binding. It does not conform to "ignore the situation", because the unpredictable behavior mentioned in the spec is that of the environment, not the compiler.


> Very clearly gives a list of "permissible behaviors"

A statement like "we have products ranging from cake decorating to peanut crocheting to jousting lances" does not mean that this is a list of the only three types of items in the store. This construction is called a "false range" in English. When you have a range of something that does not have an order, it means "varied things, left unspecified". It's very clearly not supposed to be an exhaustive list, merely a few examples.

So the standard lists three examples. The first is that the runtime behaviour of the program may do any-unpredictable-thing. The second is that the compiler may 'behave in a documented manner' and maybe issue an error. They wanted these two examples because they didn't want any misunderstanding that UB was limited to what could be shown UB statically at compile time, nor that it was limited to only having effects on the program at runtime.

I'm not honestly sure why they bothered adding the third "oh, the compiler or the program terminates with an error message". I could speculate that this is what they wish would actually happen, and including it in the list improves the chance of that.

> unpredictable behavior mentioned in the spec is that of the environment, not the compiler.

I don't believe that's correct -- I think it appertains to the program not the environment or the compiler -- but it doesn't matter either way. The environment is responsible for supporting execution of the program, so if it's unpredictable then it follows that any unpredictable things can happen to your program -- it would be like trying to run on a CPU that's experiencing physical failures.


> A statement like "we have products ranging from cake decorating to ...

If you have a statement like that, that is likely true. However, this is not a statement like that.

1. It gives a range of 3 permissible options. If one of these "permissible" options is "you can do anything", what are the other 2 options doing on that list?

2. Even worse for your interpretation, the very word permissible only makes sense if there are things that are not permissible. So once again, "you can do anything" makes no sense.

Both of these are nonsensical. Now had they actually written "you can do anything", this would be easy: they wrote nonsense. But they didn't write "you can do anything". What they actually wrote was "ignoring the situation completely with unpredictable results". That this somehow (how?) means "I can do anything" is purely your interpretation. And your interpretation leads to nonsense. So clearly your interpretation is wrong, particularly when there is an alternative interpretation that does not lead to nonsense.

In addition, the "interpretation" that does not lead to nonsense is the one that takes the words literally. "Ignore the situation". Not "act on the situation and then do anything I damn well please".

3. Even a heterogenous range restricts. Yes, "we have products ranging from cake decorating to peanut crocheting to jousting lances" does not mean that those are the only items in the store. But even with your somewhat odd choice of items, if you go into that store and ask for an aircraft carrier, you will get odd looks, because the range of items mentioned clearly restricts the items they stock to non-aircraft carriers.

> it would be like trying to run on a CPU that's experiencing physical failures.

No, it is like reading beyond the range of an array: the machine will attempt to read from that location; that may return a value, we don't know what value, or it may signal a fault. What it does is not defined by the standard, hence undefined behavior.

It's not *that* hard.


> 1. It gives a range of 3 permissible options. If one of these "permissible" options is "you can do anything", what are the other 2 options doing on that list?

Before the list it clearly says, "behavior, [... when UB occurs elided ...], for which the Standard imposes no requirements."

How can it both impose no requirements, yet simultaneously impose a requirement that it come out of that list of three options?

> That this somehow (how?) means "I can do anything" is purely your interpretation.

Purely mine?

* https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...

* https://blog.regehr.org/archives/213

* https://blog.llvm.org/2011/05/what-every-c-programmer-should...

* https://stackoverflow.com/questions/32132574/does-undefined-...

* https://en.wikipedia.org/wiki/Undefined_behavior#Examples_in...

* https://www.youtube.com/watch?v=ehyHyAIa5so

I must have an amazing number of sock puppet accounts.


> Purely mine?

Surely you've observed that bad ideas and even errors are not immune to spread.


That is also true.

And it turns out that the opinion "we can do anything we want" is based on the non-normativity of the section we were debating about, not on misinterpreting it.


You started the discussion by saying that we can understand what the committee meant by simply reading the text in the standard that they wrote. Remember? Why then does it matter if it's non-normative text (aka. informative)? If it's non-normative it's there to explain what they were thinking.

I double-checked my copy of the C89 standard (draft), and as far as I can tell the text is normative. Non-normative text includes footnotes and appendices and editor's notes in square brackets, but I didn't see any of those involved. Sometimes there's a section that states 'the following text is non-normative' but I didn't see any of that either. Why do you think it's non-normative?

If that text is non-normative, where is the normative definition of undefined behavior?

Regardless, I think I understand what's happened here. If I may? You've read the text and you've repeatedly commented that we have to discard the "it allows you to do anything" interpretation because that interpretation is nonsense.

I'll stipulate that when reading English, we regularly have to discard nonsense interpretations. "Out the window, the mountains looked over a beautiful lake", we discard the interpretation where the mountains have eyes and are looking out a window. This is a normal part of reading English.

I posit that you believe that the "anything can happen" interpretation is impossible so strongly that no possible wording would ever lead you to the interpretation intended by the committee.

So instead, how about I explain why the committee chose to define UB this way? How it isn't nonsense?

It's so that a C compiler could use the plain "add" instruction for an addition in C across all the crazy CPU designs. Here, let me make a simple example, this isn't a real CPU. Suppose the CPU has a status register which contains "signed overflow" as one of its bits. This bit is set or cleared when you do an ALU operation, including ADD. The same status register is reused when doing a memory operation, but that bit is reused to indicate whether you're going to the first or second bank of memory. The CPU authors think that this is great, if you're doing pointer math and you add a pointer and an integer then you transparently overflow from the first bank of memory into the second and it looks like a contiguous address space! The system integrator (or, motherboard designer, roughly) decides to use bank selection for a different purpose. There's no way the first bank of memory would ever be completely full (nobody buys or sells that much RAM) so they put the RAM on bank 0 and the I/O ports on bank 1. Their system has memory-mapped I/O! So far, everybody's done something that seems sensible to them. Third, the C programmer writes "*p = x + y;". What happens if x + y are signed ints and the addition overflows? The signed overflow bit gets set, then the STORE instruction accesses the I/O ports instead of the memory!

Is the C compiler buggy for not inserting an extra instruction that clears the sign bit of the register? The committee intentionally decided that no, this is how signed integers should work and if you want potentially slower integers with guaranteed semantics that you should use unsigned integers instead. I think that attaching well-definedness to signed/unsigned was a bad move, but this is what they did. (And it does what you've said you want in other comments: the + in C becomes whatever the machine ADD instruction does!)

The C committee invented undefined behavior as a way to ensure that the compiler really could ignore the situation. (FWIW, real CPU ISAs back then had all kinds of interesting ideas. We hadn't yet agreed that bytes are 8 bits. Or that we should use 2's complement. Some designers looked at division by zero as an invalid operations and thought that this was an excellent feature that should be brought over to other operations like add and mul, hence "trap values" in the C standard.)

C and Unix were a commercial success (CPU firms could skip writing their own OS every time), and starting then, CPU designers made ISAs where C could be easily lowered to efficient assembly on their machines. This notion that there was an efficient lowering from C to the CPU, at the time C was standardised, is an anachronism. C created UB to handle then-contemporary CPUs, then later CPUs created ISAs that matched C syntax. If you don't work in compilers or assembly or CPU design, this might be surprising, but if you aren't intentional about making your ISA work well in C, it's easy to accidentally make one which doesn't. Intel MMX famously couldn't be targeted from compilers because the compilers don't have sufficient information to solve where to put the necessary EMMS instructions. Oops!

Decades down the road, CPU designs evolved and started creating new patterns that don't match C well -- and the C language didn't evolve with them. What expression is PSHUFB in C? Or VPMASKMOVD? The CPU firms knew enough to make sure that the compilers could support their new instructions, so they added these as CPU-specific extensions. If you wanted them, you had to write non-portable code that only worked when compiling to target their CPU and not others.

The compiler engineers believe in C being a portable language. If you write code using SSE or AVX and compile it to ARM with clang, it will compile and port the SSE builtins to ARM Neon vector extensions. Doing this required the compilers to be a whole lot smarter about how the code works, and is a large part of the source of modern complaints about compilers "exploiting" undefined behaviour. I counted 23 ADD instructions in contemporary x86-64, assuming you include fused instructions (LEA) and exclude things like OR (saturating add without carry between bits) and XOR (addition with 1-bit vector lanes, lol).

Finally, C defines an abstract machine. In this context, machine is a "term of art" in computer science; popular types of machines include finite automata (the machine side of the 'regular expression' language), push-down automata and Turing machines. If you've seen those before, you may know that they're usually pictured as a directed graph, with states drawn as nodes and state transitions as directed edges, the edges labelled with the circumstance under which this edge is taken. Now, C's machine and the three I listed have a key difference: those three are all decision machines, meaning they exist to either accept or reject an input string. The C machine is a functional machine: it describes a function that transforms an input to an output. (In this treatment, you may picture side-effects as being part of the output.) The C standard defines how such C abstract machines are written down (the C syntax) and what semantics they have: states, and state transitions. The question is what happens when you are in a state and receive an input for which the standard does not define any particular state transition? In a finite automaton or PDA or Turing machine you have a single state named "error" (or "reject") at which point you reject the input string as not being a member of the set that the machine is deciding (aka., your input string fails to match the regex, and we're done). In a functional machine, we don't traditionally have such a state. By defining UB in the way they did, the C standard is stating that when no state transition is specified, it may go anywhere, including to states that aren't required to exist and on which the standard places no requirements.
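A tiny, concrete instance of that signed/unsigned split (the constant-folding on the signed version is what gcc and clang typically do at -O2):

    /* Signed overflow is UB, so the compiler may assume x + 1 never wraps
       and fold the comparison to a constant: */
    int always_true(int x)       { return x + 1 > x; }   /* typically: return 1; */

    /* Unsigned arithmetic wraps by definition, so the check must survive: */
    int usually_true(unsigned x) { return x + 1 > x; }   /* 0 when x == UINT_MAX */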


If it doesn't change whether the argument is correct or not, why did mpweiher bring it up?


Citation needed. Which people? What did they actually say? What was the text that supposedly forbade this interpretation of UB? Please don't tell me this is again that tired wankery over "permissible" versus "possible". As if the choice of synonym mattered.


It's a rather infamous change between C89 and C99 where the description of UB was changed from basically "don't do this" to "please do this, and compilers can do whatever they want if you do".


The definition of "undefined behavior" did not change in the way you describe between C89/C90 and C99. In both editions, one possible consequence of undefined behavior is "ignoring the situation completely with unpredictable results" -- i.e., compilers can do whatever they want.

There is no "don't do this" or "please do this" in either edition. Both merely describe the possible consequences if you do.

C90: undefined behavior: Behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

If a "shall" or "shall not" requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this Standard by the words "undefined behavior" or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe "behavior that is undefined."

C99: undefined behavior behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). EXAMPLE An example of undefined behavior is the behavior on integer overflow.

(Some of the wording in the C90 definition was moved to the Conformance section in C99.)


“Permissible” ≠ “possible”


True -- but how does that affect the semantics?

Both definitions say that undefined behavior can be dealt with by "ignoring the situation completely with unpredictable results". There are no restrictions on what can happen.

(The standard joke is that it can make demons fly out of your nose. Of course that's not physically possible, but it would not violate the standard.)


Ignoring ≠ taking action based on

The standard joke is a joke, because it is wrong.


> The standard joke is a joke, because it is wrong.

No, it's a joke because it's _physically_ impossible but allowed.

That said, clang and gcc are of course not antagonistic and don't make use of UB to anger their users. Instead, they make use of UB to aggressively optimize (valid) programs, which is important because C is used for a lot of high-performance code where every bit of optimization can save a lot of time and money. The fact that this sometimes leads to invalid programs (i.e. programs with no defined behavior according to the standard) being optimized to correct but "absurd" results is just a trade-off compiler writers like to take. Mostly because such programs are erroneous anyway, even under a strict "portable assembler" view, which again the standard does not enforce (or even encourage).


> The standard joke is a joke, because it is wrong.

No, it is a joke because it is silly.

It is correct and intended for pedagogy.


"Permissible" and "possible" are not synonyms.


If they mean the same thing, why was it changed? Hint: they don't mean the same thing. At all.


You have never changed the wording of anything you had written to make your intent clearer?


C is not a "portable assembler".

An assembly language program specifies a series of CPU instructions.

A C program specifies runtime behavior.

That's a huge semantic difference.


"Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:”

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf

p10, line 39

"C code can be portable. "

line 30


> No. This is what I call the "portable assembler"-understanding of undefined behavior and it is entirely false.

"C has been characterized (both admiringly and invidiously) as a portable assembly language" - Dennis Ritchie

The idea of C as a portable assembler is not without its problems, to be sure -- it is an oxymoron at worst, and a squishy idea at best. But the tendency of compiler people to refuse to take the idea seriously, even for a second, just seems odd. The Linux kernel's memory-barriers.txt famously starts out by saying:

"Some doubts may be resolved by referring to the formal memory consistency model and related documentation at tools/memory-model/. Nevertheless, even this memory model should be viewed as the collective opinion of its maintainers rather than as an infallible oracle."

Isn't that consistent with the general idea of a portable assembler?

> I agree that undefined behavior is a silly concept but that's the fault of the standard, not of compilers.

The people that work on compilers have significant overlap with the people that work on the standard. They certainly seem to share the same culture.


> But the tendency of compiler people to refuse to take the idea seriously, even for a second, just seems odd.

It's not taken seriously because it shouldn't be taken seriously. It's a profoundly ignorant idea that's entirely delusional about reality. Architectures differ in ways that are much more profound than how parameters go on the stack or what arguments instructions take. As a matter of fact the C standard bends over backwards in the attempt of not specifying a memory model.

Any language that takes itself seriously is defined in terms of its abstract machine. The only alternative is the Perl way: "the interpreter is the specification", and I don't see how that's any better.


> It's not taken seriously because it shouldn't be taken seriously

I really don't know what you're arguing against. I never questioned the general usefulness of an abstract machine. I merely pointed out that a large amount of important C code exists that is in tension with the idea of an all-important abstract machine. This is an empirical fact. Is it not?

You are free to interpret this body of C code as "not true ISO C", I suppose. Kind of like how the C standard is free to remove integer overflow checks in the presence of undefined behavior.


> As a matter of fact the C standard bends over backwards in the attempt of not specifying a memory model.

I mean, C explicitly specifies a memory model and has since C11


I wonder what the best solution here is, then. A different language that actually is portable assembly, one that has less undefined behaviour or simpler semantics (e.g. RIIR), or making -O0 behave as portable assembly?


Step 1: Define just what "portable assembly" actually means.

An assembly program specifies a sequence of CPU instructions. You can't do that in a higher-level language.

Perhaps you could define a C-like language with a more straightforward abstract machine. What would such a language say about the behavior of integer overflow, or dereferencing a null pointer, or writing outside the bounds of an array object?

You could resolve some of those things by adding mandatory run-time checks, but then you have a language that's at a higher level than C.


> Perhaps you could define a C-like language with a more straightforward abstract machine. What would such a language say about the behavior of integer overflow

Whatever the CPU does. Eg, on x86, twos complement.

> or dereferencing a null pointer

Whatever the CPU does. Eg, on X86/Linux in userspace, it segfaults 100% predictably.

> or writing outside the bounds of an array object?

Whatever the CPU does. Eg, on X86/Linux, write to whatever is next in memory, or segfault.

> You could resolve some of those things by adding mandatory run-time checks, but then you have a language that's at a higher level than C.

No checks needed. Since we're talking about "portable assembly", we're talking about translating to assembly in the most direct manner possible. So dereferencing a NULL pointer literally reads from address 0x0.


> What would such a language say about the behavior of integer overflow

Two's complement (i.e. the result which is equivalent to the mathematical answer modulo 2^{width})

> dereferencing a null pointer

A load/store instruction to address zero.

> writing outside the bounds of an array object

A store instruction to the corresponding address. It's possible this could overwrite something important on the stack like a return address, in which case the compiler doesn't have to work around this (though if the compiler detects this statically, it should complain rather than treating it as unreachable)


The reason not to define these things is exactly so C can be used as a high-level assembler, and the answer is always “whatever it is that the CPU naturally does”

"Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:”

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf

p10, line 39

"C code can be portable. "

line 30


That's an interesting opinion.

But it has very little to do with the C programming language.


> The idea of C as a portable assembler is not without its problems

The main problem is that C is not a "portable assembler". You mainly argue that it should be, but it simply isn't (and hasn't been for a long time if it ever was).

> The people that work on compilers have significant overlap with the people that work on the standard. They certainly seem to share the same culture.

Isn't that beside the point? If you want C to be a "portable assembler" you have to write a standard that specifies its behavior. The compilers will then follow.


> Also, compilers don't reason about code the same way humans do. They apply a large number of small transformations, each of these transformations is very reasonable and it is their combination that results in "absurd" optimization results.

So true.

It baffles me how sometimes I hear colleagues of mine just "assuming" that the compiler will deal with something. Something that would require really high-level reasoning to come true.

It's absurd.


IMO it all comes down to 'Premature optimization is the root of all evil.' That saying was, at its best, not great. But it seems at some point down the road the 'premature' part of that sentence was lost to history and it became something much worse. It goes some way towards explaining why we now need machines that would have been considered supercomputers just a couple of decades prior in order to run a text editor.

Unreal Engine is a beautiful example of this. The source code of that engine is absolutely full of 'premature optimizations.' For instance repeatedly converting a Quaternion to a Rotator (Euler angles) is basically never going to get red-lined on a profiler, let alone be a bottleneck. Yet there's a caching class used in the engine for this exact purpose. It's all of these little 'premature optimizations' that end up creating a system that runs vastly better than most any other engine out there, feature for feature.
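To make that concrete, here is a minimal C sketch of such a conversion cache. This is my own illustration, not Unreal's actual FQuat/FRotator code: the struct names, the cache layout and the Euler convention are all assumptions.

    #include <math.h>
    #include <string.h>

    typedef struct { float x, y, z, w; } Quat;
    typedef struct { float pitch, yaw, roll; } Rotator;

    /* The "expensive" conversion, using one common Euler convention. */
    static Rotator quat_to_rotator(Quat q)
    {
        Rotator r;
        r.roll  = atan2f(2.0f * (q.w * q.x + q.y * q.z),
                         1.0f - 2.0f * (q.x * q.x + q.y * q.y));
        r.pitch = asinf(2.0f * (q.w * q.y - q.z * q.x));
        r.yaw   = atan2f(2.0f * (q.w * q.z + q.x * q.y),
                         1.0f - 2.0f * (q.y * q.y + q.z * q.z));
        return r;
    }

    typedef struct { Quat last_q; Rotator last_r; int valid; } RotatorCache;

    static Rotator cached_to_rotator(RotatorCache *c, Quat q)
    {
        /* Bitwise comparison is enough for a cache key: a different bit
         * pattern just means one extra (correct) recomputation. */
        if (!c->valid || memcmp(&c->last_q, &q, sizeof q) != 0) {
            c->last_q = q;
            c->last_r = quat_to_rotator(q);
            c->valid  = 1;
        }
        return c->last_r;
    }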


It's a dogma pendulum like "gotos are evil," "YAGNI," object-oriented design (is good/bad), etc.

A pattern or anti-pattern gets identified and lessons drawn from it. Then people take the lesson as dogma and drive it to absurd extremes. Naturally this doesn't have the desired effect, and then the baby gets thrown out with the bathwater, people do the exact opposite and take that to absurd extremes.

Caching potentially expensive (relatively speaking) operations isn't "premature optimization," it's a design principle. It's much harder to retrofit something like this into a project after the fact than having and using it from the start.

"Premature optimization" is applying complicated micro-optimizations early on that won't survive any moderate code change.


It depends on the compiler really. C is nice in that you can choose a compiler fit for your purpose, but that makes my next example harder, so I'm going to discuss Go. I was doing leetcode recently and one of the problems was to combine some set of sorted linked arrays into one bigger array. There are a few ways to do this but I ran two of them to see which was better.

1.) I select the lowest from the set of first elements in the arrays, add it to the end of my new list, then increment the iterator for that array.

2.) Dump all of them into one array then sort the new array entirely.

Surprisingly the second option was faster and didn't have a higher memory cost. From what I could tell this was because it used fewer memory allocations, could run better optimizations, and the standard library in go has a good sorting algorithm. One of the greatest skills I think we can develop as programmers is knowing when to trust the compiler and when not to.
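For illustration, here is a rough C sketch of the two strategies (the original experiment was in Go; the function names and the naive linear selection in the first version are my own):

    #include <stdlib.h>
    #include <string.h>

    /* Strategy 1: repeatedly pick the smallest remaining head element. */
    static void merge_kway(int **arrs, const size_t *lens, size_t k, int *out)
    {
        size_t *pos = calloc(k, sizeof *pos);   /* per-array read position */
        size_t total = 0, filled = 0;
        for (size_t i = 0; i < k; i++) total += lens[i];
        while (filled < total) {
            size_t best = k;                    /* k means "none found yet" */
            for (size_t i = 0; i < k; i++)
                if (pos[i] < lens[i] &&
                    (best == k || arrs[i][pos[i]] < arrs[best][pos[best]]))
                    best = i;
            out[filled++] = arrs[best][pos[best]++];
        }
        free(pos);
    }

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Strategy 2: concatenate everything, then sort the whole thing once. */
    static void merge_concat_sort(int **arrs, const size_t *lens, size_t k, int *out)
    {
        size_t off = 0;
        for (size_t i = 0; i < k; i++) {
            memcpy(out + off, arrs[i], lens[i] * sizeof *out);
            off += lens[i];
        }
        qsort(out, off, sizeof *out, cmp_int);
    }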

All of this is to say I agree with you in principle, but there is nuance and a degree to which we can trust the compiler, but only if we've chosen one that we are familiar with and that is well suited for the task at hand.


Not necessarily just a matter of details of the assembly, but also of cache utilization.


That's why the author prefaced the sentence with "In effect". The effect in this case is the same as if the compiler had pretended i was initialized to a high value - the loop is omitted.


It's still fundamentally a misunderstanding of what the compiler is doing, and thinking about it that way is just going out of your way to cause confusion. The compiler saw a use of an uninitialized value, concluded that it must not be reachable as that's the only way for it to be a legal C program, and then deleted the dead code. You can make up all sorts of other behaviors that the compiler could have done which would have had the same result in a specific scenario, but why would you?
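For concreteness, a minimal sketch of the kind of loop being discussed (my reconstruction, not necessarily the article's exact code); an optimizing compiler may delete the loop entirely, because the only way to reach the comparison is to read the uninitialized i:

    #include <stdio.h>

    int main(void)
    {
        int i;                     /* never initialized */
        while (i < 10) {           /* reading i here is undefined behavior */
            printf("%d\n", i);
            i++;
        }
        return 0;
    }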


I don't think the author misunderstands what the compiler is doing, just saying that it's probably unexpected behavior from the perspective of the program's author.


I don't think the compiler's approach is to try and fix the code.

> it must not be reachable as that's the only way for it to be a legal C program, and then deleted the dead code.

Rather, it would make more sense if the approach were more like "this branch is UB so I can do whatever is most convenient in terms of optimization". In this case that was merging the two branches but discarding the code for the UB branch.

But from a behavioral point of view all these formulations describe the same result


The compiler isn't trying to "fix" anything and I'm not sure where that idea came from. The core concept is that the compiler assumes the program is valid and optimizes based on that assumption. If UB would occur if certain values are passed as an argument to a function, the compiler assumes that those values won't be passed. If UB occurs unconditionally in a function, the compiler assumes that the function won't be called.
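A small sketch of that kind of assumption (a hypothetical function, not taken from the article): since the dereference would be undefined for a null pointer, the compiler may conclude the pointer is never null and drop the later check.

    #include <string.h>

    int name_length(const char *name)
    {
        size_t n = strlen(name);   /* undefined behavior if name == NULL...   */
        if (name == NULL)          /* ...so the compiler may assume this test */
            return -1;             /* is always false and delete the branch.  */
        return (int)n;
    }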


> The core concept is that the compiler assumes the program is valid

I was referring to this.

IIRC the spec language is that the compiler is free to assume that UB never happens, but from an operational perspective I believe that the compiler simply stops caring about those cases.

by this I mean that for code like

    #include <stdio.h>

    int main() {
        printf("Hello, ");
        0/0;               /* integer division by zero: undefined behavior  */
        *(int *)NULL;      /* null pointer dereference: undefined behavior  */
        printf("World!\n");
        return 0;
    }
most compilers will just pretend that the bad lines did not exist

https://godbolt.org/#z:OYLghAFBqd5TKALEBjA9gEwKYFFMCWALugE4A...


The compiler is not trying to fix your program. It is just assuming that somehow those lines are never hit, and then is deleting the dead code to make your binary smaller.


I agree, I was only disagreeing with this view:

> > The core concept is that the compiler assumes the program is valid

I believe that it is more correct to say that the compiler is unrestricted in what it does with UB, so it handles it in the most convenient way (often toward optimizing binary size, runtime performance, and/or compilation time)

Taken to its conclusion, assuming that the program is valid would give the same behaviour. I just believe that is not what happens operationally.


> No. This is what I call the "portable assembler"-understanding of undefined behavior and it is entirely false. Clang does not need to pretend that i is initialized with a value larger than 10, there is no requirement in the C standard that undefined behavior has to be explainable by looking at it like a portable assembler. Clang is free to produce whatever output it wants because the behavior of that code is literally _undefined_.

You're kind of lawyering this. Sure, it's "undefined", but is that useful to anyone outside of compiler writers? How useful is it to have a program that's very fast but entirely wrong? If the behavior is undefined, I want an error, not a free license for the compiler to do whatever the hell it wants.


> How useful is it to have a program that's very fast but entirely wrong?

This only affects you if your program has undefined behavior; at that point your program was wrong anyway, you were just lucky that the compiler compiled it into something that works. With a different compiler or a different target architecture you might not be so lucky. So even the old "portable assembly" kind of undefined behavior is actually the opposite of portable.

> Sure, it's "undefined", but is that useful to anyone outside of compiler writers?

It may surprise you that compiler writers are not evil people that take joy in the suffering of their users. They implement these aggressive optimizations because they actually produce better results for valid programs. It just happens that this coincides with more "absurd" results if the program has undefined behavior. But at that point your program was wrong anyway.

If you absolutely want "portable assembler"-semantics for your C program you can do that by using (very) old C compilers. Your program will not get optimized as aggressively, but "portable assembler"-semantics and optimizations are inconsistent with each other anyway.


Doesn't -Wall give you the error you want?


Not always - in some cases you need -Wextra or even -Weverything to get a warning. [Edit: was thinking of GCC here, and yesterday's similar thread - clang may be more forthcoming with warnings, I don't know.]


Exactly, and in addition to that, making the loop disappear is not only good for performance but from a compiler perspective much easier than producing a helpful error message that explains the situation well.


> Clang is free to produce whatever output it wants because the behavior of that code is literally _undefined_.

> They apply a large number of small transformations, each of these transformations is very reasonable

One of those transformations detects that we are reading uninitialized memory and acts accordingly. Given that in most cases we do it by mistake, I think that doing anything other than raising an error (or at least a big warning) is not a very reasonable thing to do. For those cases where such behavior is desired, a compiler flag could be provided.

The fact that the standard allows the current behavior, doesn't mean that the compiler should do it.


> One of those transformation detects that we are reading uninitialized memory and acts accordingly.

This might be pedantic but variables do not have to correspond to memory locations. The compiler could decide to keep the variable in a register or the compiler might realize that the variable never changes and apply constant folding eliminating it entirely. Assuming that variables correspond to memory locations (or really any assumptions about how code is translated into assembly) is a part of the "portable assembler"-understanding of C that simply doesn't apply to (optimizing) C compilers.
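A tiny illustration of that point (my example): the local variable below needs no memory location at all.

    int twice_limit(void)
    {
        int limit = 10;    /* no memory location required: likely a register */
        return limit * 2;  /* and under -O2 this whole function folds to 20  */
    }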

> Given that most of the case where we do it is by mistake, I think that doing anything other than raising an error (or at least a big warning) is not a very reasonable thing to do.

I agree that the compiler should warn if it can prove that a program's behavior is undefined unconditionally. I don't think it's as easy as you make it out to be, but I encourage you to check out the source code of clang and fix this issue if that's possible. That said, in practice undefined behavior is often encountered conditionally (e.g. the behavior of this function is undefined if and only if the first argument is non-null and the second argument is null). Optimizing such programs is trivial (simply assume that undefined behavior cannot happen and optimize accordingly); detecting that the program actually calls that function with invalid parameters, on the other hand, is _literally_ impossible in general, and even a best-effort approach would be computationally expensive and miss a lot of cases.


Is all UB silly? E.g., wouldn't fully defining what happens when one goes beyond the end of an array impose a non-trivial performance hit for at least some code?


Yes. But there's middle ground between fully-defined behavior (lots of slow checks) and what current compiler-writers think UB is (do whatever I want).

Specifically, implement UB the way it is described in the standard: pretend it isn't UB, do it anyway, consequences be damned. That's what "ignore the situation with unpredictable results" actually means.


> compiler-writers think UB is

The current standard is _very_ explicit that undefined behavior is indeed undefined, i.e. "do whatever you want".

> pretend it isn't UB, do it anyway, consequences be damned.

This explicitly isn't a requirement, but even if it were, "ignoring the situation completely with unpredictable results" can be interpreted in numerous ways. One of these ways is "ignoring any cases in which UB is encountered" which is exactly what compilers are doing. Then again, saying "the compiler didn't ignore the situation and as a consequence I got results I didn't predict" isn't a strong argument when the standard specifically told you that you will get unpredictable results.


"The Party told you to reject the evidence of your eyes and ears. It was their final, most essential command."

There's a certain poetic consistency in ignoring arbitrary portions of the standard to justify ignoring arbitrary portions of the input code.


Which part of the standard is ignored? Again, the standard is _very_ explicit about what undefined behavior means. If you don't like that you can either try to change the standard or use the numerous command line options provided by most compilers to tell your compiler that you would like certain undefined behaviors have a defined meaning.

Saying that compilers shouldn't ignore code with undefined behavior is like saying compilers shouldn't ignore the body of an if-statement just because the condition evaluated to false.


You're right on one point: the standard is very explicit.

And because it is explicit—a fact you yourself just admitted—the fact that silent erasure of non-dead code is not a listed option in response to UB means that it is not allowed.


The standard is explicit that the behavior of code with undefined behavior is well... undefined and that implementations can do whatever they want.


Reasonable people can disagree as to whether that interpretation is valid.

No reasonable person can say that it is explicit. It simply, factually, is not. At no point in any version of the C Standard does the text "implementations can do whatever they want" appear.

I have no time for blatant and insulting dishonesty. We're done here.


"3.4.3 undefined behavior

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements"

It is _very_ explicit. The following note is (as all notes are) not normative. So even if the note would cast any doubt (it really doesn't), it can safely be ignored.


There are enough high-performance languages without undefined behavior and I don't think they suffer heavily for it.


> Also, compilers don't reason about code the same way humans do.

Not these compilers for sure. But I don't agree that all compilers are broken.

> They apply a large number of small transformations, each of these transformations is very reasonable and it is their combination that results in "absurd" optimization results.

Humans use techniques like natural deduction to apply a series of transformations that do not lead to absurd results.


> Humans use techniques like natural deduction to apply a series of transformations that do not lead to absurd results.

  All men are immortal.
  Socrates is a man.
  Therefore, Socrates is immortal.
If you were to give a group of CS students the exercise to determine whether this deduction is valid, they would all answer yes. So no, humans that apply natural deduction can also derive absurd results.

That said, I don't think all C programmers would agree that the result of that optimization is absurd in the first place.


> Clang is free to produce whatever output it wants because the behavior of that code is literally _undefined_

The standard says no such thing. It lists a number of acceptable ways to handle undefined behavior and none of them are do anything you like.


> I agree that undefined behavior is a silly concept but that's the fault of the standard, not of compilers.

While undefined behavior is the standard's fault, its interpretation is up to the compiler.


The reason for leaving integer overflow undefined was not primarily because of one's complement machines. It was for loop optimization.

Consider for (int i = 0; i < ARRAY_WIDTH; i++) out[i] = in[i + offset];

Assuming that i + offset, out + i, and in + i + offset do not overflow allows the loop to be cleanly vectorized without checking for wraparound.

The compiler developers in the 80s were trying to come up with rules that didn't require C to be 5x slower than Fortran on scientific applications, while dealing with the consequences of a[i] being equivalent to *(a + i).
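Spelled out as a compilable sketch (ARRAY_WIDTH is just an assumed constant here): with a signed index the compiler may take i + offset to never wrap, which is what lets it widen the induction variable to pointer size and vectorize without a wraparound check.

    #define ARRAY_WIDTH 1024   /* assumed constant, for illustration only */

    void copy_window(int *out, const int *in, int offset)
    {
        for (int i = 0; i < ARRAY_WIDTH; i++)
            out[i] = in[i + offset];   /* i + offset assumed never to overflow */
    }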


It sure would be nice if modern CPUs had two sets of integer instructions: one that wraps, and one that triggers an exception on wrap, with zero overhead in non-wrapping case.

Then we could compile all code with the latter, except for specifically marked edge cases where wrap is a desired part of the logic.


That doesn't help at all if your loop variable is a 32 bit int that your compiler decided to transform away into vectorized loads from a 64 bit pointer.

But that's exactly one of the transformations that get enabled by assuming undefined overflow.


But in that case, it's not going to be wrapping either. We'll just read beyond the end of the buffer, which a bounds check should catch.

Or perhaps I'm not thinking of the specific sequence that would 1) not wrap during modifying index and 2) not hit bounds check after.

It would need to be a requirement that compilers can't upcast all your ints to 64 bit ones, do all the math, and then write them back - would need specific instructions for each size.


ARM actually kind of has that. The register file has an overflow flag that is not cleared on subsequent operations (sticky overflow). So instead of triggering an exception, which is prohibitively expensive, you can do a series of calculations and check afterwards if an overflow happened in any of them. A bit like NaN for floating point. From what I understand the flag alone is still costly, so we will have to see if it survives.


It doesn't matter if triggering the exception is expensive. At that point overflow has already occured, so your program state is now nonsense, and you might as well just let it crash unhandled. Much better outcome than reading memory at some mysterious offset.

If just having the ability for an exception to occur during an instruction causes overhead, that would be a big problem though.

Edit to add: We need to do the check on every operation. Just going through one iteration of the loop might have already corrupted some arbitrary memory, for example. Manually inserted checks on some flag bits don't scale to securing real programs.


> At that point overflow has already occured, so your program state is now nonsense, and you might as well just let it crash unhandled.

The trick is to check and clear the flag before any instruction that would have a side-effect, that depends on the arithmetic result.

IEEE 754-compliant floating point units have a similar behaviour with NaN that is a bit more versatile: an arithmetic instruction results in NaN if any operand is NaN, but an instruction with side-effect (compare, convert or store) will raise an exception if given a NaN.


Think about it like that. If you allow an exception you essentially create many branches with all their negative consequences. With the sticky bit you combine them to one branch (that is still expensive[1]).

[1] https://news.ycombinator.com/item?id=8766417


That's quite interesting and reasonable


You can compile code with -fwrapv and for most programs the overhead is minimal (the exception being that if you're writing number crunching code, the overhead is going to be huge). For my personal projects I have -fwrapv as part of the default compiler flags, and I remove the flag in opt builds. I honestly haven't caught that many bugs using it, but for the few bugs it did catch it saved me a lot of debugging time.
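As an illustration of what the flag changes (a standard example, not from the parent comment): without -fwrapv a wraparound test like the one below can legally be folded away, while with -fwrapv it keeps its intended meaning.

    int will_wrap(int x)
    {
        /* Without -fwrapv, signed overflow is UB, so the compiler may fold
         * this test to 0.  With -fwrapv it means exactly x == INT_MAX. */
        return x + 1 < x;
    }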



It could be, if the hardware supported it. Consider this quote from that page:

"in some debug configurations overflow is detected and results in a panic"

That's not good enough. We want to always detect it! Many critical bugs are caused by this in production builds too. Solving it at the language level would require inserting branches on every integer operation which is obviously not acceptable.


> That's not good enough. We want to always detect it

So, select the configuration where that's the behaviour? overflow-checks = true

> Solving it at the language level would require inserting branches on every integer operation

Yes, so that's what you have to do if you actually want this, if you won't pay for it then you can't have it.


Well, the whole point of my post was that I would really like a hardware feature that does it without overhead. How that would work behind the scenes, I have no idea. Not a hardware engineer.


If you really want it, use it. Hardware vendors optimise stuff they see being done, they don't optimise stuff that somebody mentioned on a forum they'd kinda like but have never used because it was expensive. Maybe if you have Apple money and can buy ARM or something then you could just express such whims, but for mere mortals that's not an option.

Newer CPUs in several lines clearly optimise to do Acquire-Release because while it's not the only possible concurrency ordering model, it's the one the programmers learned, so if you make that one go faster your benchmark numbers improve.

Modern CPUs often have onboard AES. That's not because it won a competition, not directly, it's because everybody uses it - without hardware support they use the same encryption in software.

The Intel 486DX and then the Pentium happened because it turns out that people like FPUs, and eventually enough people were buying an FPU for their x86 computer that you know, why not sell them a $100 CPU and a $100 FPU as a single device for $190, they save $10 and you keep the $80+ you saved because duh, that's the same device as the FPU nobody is making an actual separate FPU you fools.

Even the zero-terminated string, which I think is a terrible idea, is sped up on "modern" hardware because the CPU vendors know C programs will use that after the 1970s.


MIPS for example has this. It has `addu` for normal integer addition that does not trap and `add` if you want to trap on overflows.


x86 had overflow checking support (via OF flag and INTO insn) but it got slowed down and later dropped from 64 bit mode.


The solution is to have the compiler automatically split iteration into a known in-bounds part and a possibly-out-of-bounds part. In this case, generating an additional check that {ARRAY_WIDTH < (INT_MAX - offset)} would be sufficient to guarantee that {i + offset} doesn't wrap around, enabling further reasoning in a specialized copy of the loop. (In this example, it's unclear what the relation of ARRAY_WIDTH is to in and out.)

The HotSpot C2 compiler (and AFAIK, Graal) do partition iteration spaces in this way.

This does have some complexity cost in the compiler, of course, and it produces more code, especially if the compiler generates multiple copies of the loop. But given that relatively few super-hot loops of this kind are in typical programs, it is worth it.
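A hedged C sketch of that loop-splitting idea (not the actual HotSpot/Graal transformation; ARRAY_WIDTH is an assumed constant): guard a fast copy of the loop with a check that the index arithmetic cannot wrap, and fall back to a slower copy otherwise.

    #include <limits.h>

    #define ARRAY_WIDTH 1024   /* assumed constant, for illustration only */

    void copy_window_guarded(int *out, const int *in, int offset)
    {
        if (offset >= 0 && offset < INT_MAX - ARRAY_WIDTH) {
            /* Fast path: i + offset provably cannot overflow, so the compiler
             * is free to widen the induction variable and vectorize. */
            for (int i = 0; i < ARRAY_WIDTH; i++)
                out[i] = in[i + offset];
        } else {
            /* Slow path: emulate two's-complement index arithmetic explicitly
             * (the conversion back to int is implementation-defined, not UB). */
            for (int i = 0; i < ARRAY_WIDTH; i++)
                out[i] = in[(int)((unsigned)i + (unsigned)offset)];
        }
    }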


>But given that relatively few super-hot loops of this kind are in typical programs, it is worth it.

In C programs these kind of super-hot loops are quite common, because if someone didn't have many such loops they probably wouldn't need to write in C/C++. And if C had the same overhead as Java in these cases, then people who needed to squeeze out every drop of performance would use a different language.


> these kind of super-hot loops are quite common

People say this, then they write microbenchmarks. Then compilers optimize the heck out of these microbenchmarks (and numerical kernels). Rinse and repeat for several decades, and every dirty trick that you can think of gets blessed under the guise of UB.

When in reality, real programs do not spend all their time in one (or even a handful) of super-hot loops. C was designed to let compilers have a field day so that 1980s supercomputers could get 5% more FP performance, by hook or by crook. These kinds of dangerous optimizations do not make much difference for the vast majority of application code, which spends its time chasing pointers and chugging through lots of abstraction layers.


> When in reality, real programs do not spend all their time in one (or even a handful) of super-hot loops.

This paper from 2015 finds that a large amount of CPU in Google data centers is spent in a relatively small number of core components, known as the "datacenter tax": https://static.googleusercontent.com/media/research.google.c...

    > [We] identify common building blocks in the lower levels of the software stack.
    > This “datacenter tax” can comprise nearly 30% of cycles
    > across jobs running in the fleet, which makes its constituents
    > prime candidates for hardware specialization in future server
    > systems-on-chips.
Some of the components they identify are memmove() (definitely a loop), protocol buffers (proto parsing is a loop), and compression/decompression (a loop).


>C was designed to let compilers have a field day so that 1980s supercomputers could get 5% more FP performance, by hook or by crook.

C was designed to help create Unix. It just happened to turn out to be something that could be compiled into more efficient code than most other languages at the time, without being overly difficult to work with.

From what I understand, 80s supercomputers were more likely to run Fortran. Fortran has fewer problems with pointer aliasing than C, so a Fortran compiler could generate better code than a C compiler for the type of code they would be running.


The idea that only a tiny portion of code is "hot" is just not true.

But you're also missing the entire idea anyway. It's not about bounds checking, it's about at what point overflow occurs. `int` is a sized type, so if overflow is defined for it then it has to overflow at exactly that size. This prevents using the native word size of the machine when it's larger than that of `int`, which these days it very often is since int is 32-bit. So you couldn't promote it to a 64-bit relative address, as then it would overflow at 64 bits instead of 32 bits.


> This prevents using native size of the machine if it's larger than that of `int`.

Having implemented multiple compiler backends for 64-bit machines, this is not true. All 64-bit machines I am aware of, and certainly all modern ones, do 32-bit arithmetic just fine.


Yes it is, see x86 example in https://youtu.be/yG1OZ69H_-o at about the 40 minute mark for a real world example


It's actually really annoying to get an example from a video, but nevertheless...

The example shows you still didn't really seem to grok my point, which I will state again. A compiler can introduce additional checks for overflow that guard a large region (e.g. the repeated code in that example plus the loop after--maybe even the whole function!) and then make exactly the same transformations in the guarded region. The result is exactly the same machine code. It will go fast and not go out-of-bounds or wrap around. If the program does go out-of-bounds or wrap around, the guard will dynamically catch that and branch to a slower copy of the code that does exactly the right behavior at the right place. This is exactly how Java JITs approach C code quality. They use all the same tricks, they just have to try harder to handle the edge cases.

And besides, there are several other ways to get that code--instead of using int32_t, it could have been uintptr_t, which replaces UB with platform-specific behavior but nevertheless lets the compiler do the LEA and other addressing mode tricks. No need to resort to UB.
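A sketch of that alternative (my illustration): a pointer-width unsigned index has fully defined wraparound, yet is already 64-bit on x86-64, so the compiler can still fold it straight into addressing modes without reasoning about signed overflow.

    #include <stdint.h>

    void copy_n(int *out, const int *in, uintptr_t offset, uintptr_t n)
    {
        for (uintptr_t i = 0; i < n; i++)
            out[i] = in[i + offset];   /* unsigned: wraparound is fully defined */
    }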

The entire mindset of opening holes in the language specification to allow compilers to optimize without thinking hard is just wrong. Programs have bugs. They go out of bounds. Preserving the exact out-of-bounds behavior is what allows programmers to debug their code. Checking bounds is what keeps programs from getting pwned all the time. It's not an opportunity for magic go-fast at the expense of security and debuggability.

We're here in this muck today because of a persistent and deliberate choice to be hostile to programmers, programs, and people running those programs because compiler designers didn't want to work hard. That doesn't fly in literally any other programming language domain; C is the odd duck and it's frankly ruined people's reasoning centers.


> The entire mindset of opening holes in the language specification to allow compilers to optimize without thinking hard is just wrong.

I would counter that 2's complement overflow is a worse design in practice, as easily 99% of the time overflow happening at all is a bug. Rust almost got this right by making overflow be a panic in debug builds, but then they took the worst option in making it defined 2's complement in release builds.

But also the mindset that everything must be extensively defined is just wrong. Nobody's standard library does this anyway, but also it's not feasible to do. You cannot define the exact outcome of data races, for example. UB is inevitable, it'll always exist in all languages. The debate is as such over where you put that line, not whether or not it exists at all


> UB is inevitable

There's a huge difference between UB (as in C/C++) and nondeterminism, so I don't accept this assertion. UB in C gives no semantics to executions containing it; it is neither spatially nor temporally contained. No other programming language has such a ridiculous and program- and programmer-hostile concept. And for good reason. It makes reasoning about buggy programs a non-starter. And all programs have bugs. We debug them by analyzing their runtime behavior! Crashing immediately upon a bug, exactly the same way on every machine, is the absolute best way to find and fix bugs that cannot be caught statically.


It is more true in good code than it is in bad code. If you use atomic reference counting everywhere or you're constantly allocating memory and freeing it all over the place, like a lot of "modern" code, you do tend to get quite flat profiles. But in good code, which allocates very infrequently, and is designed around flat arrays of data being processed in loops, you naturally do sit in tight loops a lot.


No, it still isn't true. Consider things like a web browser. Huge codebases where damn near the entire thing is "hot." Or OS's like iOS and Android - again, huge codebases where everything is "hot" (everything is either impacting responsiveness or battery life, after all), and vanishingly few if any "super hot tight loops." Same again with game engines like Unreal. It does have data processing hot loops like physics simulations, but it also has audio paths, rendering, AI, etc... that are also all still very much hot. Even the code that does loading, a "rare" event, is still hot because the user is blocked on it.

The "hot data processing loop" definitely exists in some systems, notably those that really just do one single thing, but those are also far from even the majority case. At least not on anything where the end user is directly paying the computational bill (aka, very few client apps fit the "single hot loop" profile)

But if you go pitch Apple with a "hey, we want iOS to be 20-30% slower because some people are uncomfortable with narrow contracts" they would rightly laugh you right out of the room.


Web browsers are the worst offenders for what I described: highly dynamic retained-mode GUIs with huge amounts of memory allocation all over the place. They are bloated and very much badly designed, so even simple websites slow any 5+-year-old computer to a crawl.

If people cared about performance (including latency and power usage) on mobile they wouldn't write such heavy apps. Reality is, people plug their phones in every day and so power usage isn't actually that important. Kept in mind, but clearly not optimised for.


That sounds very unlikely given that unsigned overflow is defined. The original motivation for undefined behaviour was not performance. I'm pretty sure autovectorisation was not a thing in 1989. The author's theory sounds far more likely.

I'd love to see an actual citation though if you aren't just guessing.


Maybe the idea hadn't been invented back then, but it seems obvious to me that the correct response is an iterator protocol. Or at least a hardcoded for-in syntax.


Adding an iterator in C++ means adding at least 2 more objects, multiple function calls, operator overload, templates, the whole package.

You don't trust your compiler to optimize the sane trivial C case, but you trust it to optimize all that garbage away?


If your data is sequential, creating an iterator in C++ is as simple as returning a begin and end pointer, and it will be optimized away at any optimization level other than -O0.

https://godbolt.org/z/WEjzEr5j4


But iterating over pointers is once again optimized with lots of undefined behavior at the corners. So you are replacing one source of undefined behavior with another.


Replacing undefined behavior at the program-level with undefined behavior written and tested as part of the standard library, usually vendored and distributed in concert with the compiler, seems like an obvious net-positive to me.


Pointer arithmetic optimization based on undefined behavior is a problem regardless.

Life is always better after minimizing the total number of types of undefined behavior.


> with lots of undefined behavior at the corners.

What behavior is undefined in incrementing a pointer between a begin and end range?


Of course a basic iteration between begin()/end() will never contain out of range elements, but neither will valid increment between two integers. No need for iterators in that case either.

Say I want to do something fancy, like getting element number 11 from an array.

With an integer index I can pass 11, with random access iterators I can use begin() + 11.

Now my array only has five elements. So I check.

11 < 5? Comparison valid, successfully avoided a crash.

begin() + 11 < end()? How were the rules for pointer comparison again, something about within an allocation and one past the end?


> something about within an allocation and one past the end?

Yeah, I forgot about that. So I agree there is some subtlety which is likely to catch beginners.

Your example could safely be:

   if (std::distance(begin, end) > 5) 
Another approach I would recommend is to write a `guarded_advance` which takes an integer and the end pointer.

Also note that the situation you are describing is still a little unusual because the baseline assumption is it takes linear time to advance an iterator by more than 1 increment.

> but neither will valid increment between two integers. No need for iterators in that case either.

The purpose of an iterator is to abstract data structure access. The coordinate inside a complex data structure may not be representable by an integer.


A pointer is a valid C++ iterator.

> but you trust it to optimize all that garbage away?

Yep, if you learn about compilers you learn what kind of optimizations are easy and hard for them to make. The kind that just flattens a few levels of nested definitions are easier.


Care to explain? An iterator is a nice high level concept, but the CPU still has to do the &in + i + offset arithmetic. I don't see how replacing `i` with syntactic sugar changes the need to check for overflow.


I think the point the GP is making is that with an iterator protocol, the iterator implementation itself is free to make a different choice on implementation strategies, based on the shape of the data and the hardware available, and this is transparent to client code. So for example, a container containing only primitive ints or floats on a machine with a NVidia Hopper GPU might choose to allocate arrays as a multiple of 64, and then iterate by groups of 64, taking advantage of the full warp without needing any overflow checks. Obviously a linked list or an array of strings couldn't do this, but then, they wouldn't want to, and hiding the loop behind an iterator lets the container choose the appropriate loop structure for the format and hardware.

I've heard criticisms of C and C++ that they are simultaneously too high-level and too low-level. Too high-level in that the execution model doesn't actually match the sort of massively parallel numeric computations that modern hardware gives, and too low-level that the source code input into the compiler doesn't give enough information about the real structure of the program to make decisions that really matter, like algorithm choice.

It's interesting that the most compute-intensive machine learning models are actually implemented in Python, which doesn't even pretend to be low-level. The reason is because the actual computation is done in GPU/TPU-specific assembly, so Python just holds the high-level intent of the model and the optimization occurs on the primitives that the processor actually uses.


"It's interesting that a lot of performance-critical code tends to be written in C++, which sometimes pretends not to be that low level. The reason is because the actual performance critical code is really running CPU-specific assembly, so C++ just holds the high level intent of the model and the optimization happens on the primitives that the processor actually uses."


> the iterator implementation itself is free to make a different choice on implementation strategies

That's just UB with more steps. What will the spec say? "Behavior of integer overflow is undefined. Unless the overflow happens within an iterated for loop, in which case the behavior is undefined and the iterator can do whatever it wants".

> I've heard criticisms of C and C++ that they are simultaneously too high-level and too low-level.

I've heard this as well, and I think there is some truth to it, but C is the least-bad offender relative to any other language.

C maps extremely well to assembly. The fact that assembly no longer perfectly captures the implementation of the CPU has nothing to do with C. Every other general purpose[1] language has to target the same abstraction that C does.

Given that reality, C in fact maps better to the hardware than any other language. Because it is faster than any other language. Any higher level language that gives the compiler more information about algorithm choice is slower than C is. That's the bottom line.

[1] This is ignoring proprietary, hardware specific tools like CUDA. That's clearly in a different category when discussing programming languages, IMO.


The reasoning behind the decision to make integer overflow UB matters here. As the thread starter mentioned, that reasoning was loops: you shouldn't need overflow checks on the counter in everyday loops. Take loops out of the equation, along with certain high-performance integer computations where arguably you should have a dedicated FixedInt type, and the logical spec behavior might be silent promotion to BigInt (like JS, Python 3, Lisp, Scheme, Haskell) or a panic (like in Rust).

> [1] This is ignoring proprietary, hardware specific tools like CUDA. That's clearly in a different category when discussing programming languages, IMO.

Arguably they should be part of the conversation. One main reason for the recent ascendancy of NVidia over Intel is that they're basically unwrapping all the layers of microcode translation that Intel uses to make a modern superscalar processor act like an 8086, and saying "Here, we're going to devote that silicon to giving you more compute cores, you figure out how to use them effectively."


> implemented in Python,

A program which constructs an AST out of python classes and spits out GPU code is a compiler. The python is never executed.

Trivially, compilers can generate code faster than their host language, but that doesn't make the host language fast. The compiler would be even faster if it were written in C++.


That's sort of the point. If you want to make programs really fast and really expressive, your target language ought to be as close to the hardware as possible, and your source language ought to be as expressive as possible, and then you just write a compiler to translate between them.


If you don't initialize the initial iterator state, or compare with something that isn't a valid iterator of the same container, you kinda end up with the same issues.

The complaint here is that the warnings/errors for the integer case don't seem to be on by default. With the warnings command line flags enabled, this case is easily detected by the compiler, same as if it was some iterator object.


I don't think that was the historical justification, but regardless, it is a terribly limiting and oblique hack of a convention for guiding the compiler. Why should this guidance only be possible for signed values? Why have such an important distinction be controlled in such an indirect fashion?

Just 2 reasons why I prefer explicit vectorization anywhere that I feel it is important to have vectorized code.


> It would certainly not hurt performance to emit a compiler warning about deleting the if statement testing for signed overflow, or about optimizing away the possible null pointer dereference in Do().

I think that the nature of Clang and LLVM makes this kind of reporting non-trivial / non-obvious. Deletions of this form may manifest as a result of multiple independent optimizations being brought to bear, culminating in noticing a trivially always-taken, or never-taken branch. Thus, deletion is just the last step of a process, and at that step, the original intent of the code (e.g. an overflow check) has been lost to time.

Attempts to recover that original intent are likely to fail. LLVM instructions may have source location information, but these are just file:line:column triples, not pointers to AST nodes, and so they lack actionability: you can't reliably go from a source location to an AST node. In general though, LLVM's modular architecture means that you can't assume the presence of an AST in the first place, as the optimization may be executed independent of Clang, e.g. via the `opt` command-line tool. This further implies that the pass may operate on a separate machine than the original LLVM IR was produced, meaning you can't trust that a referenced source file even exists. There's also the DWARF metadata, but that's just another minefield.

Perhaps there's a middle ground with `switch`, `select`, or `br` instructions in LLVM operating on `undef` values, though.


C and C++ are most often used when performance is the highest priority. Undefined behavior is basically the standards committee allowing the compiler developers maximum flexibility to optimize for performance over error checking/handling/reporting. The penalty is that errors can become harder to detect.

It appears the author is a Go advocate. I assume they are valuing clearly defined error checking/handling/reporting (the author's definition of correctness) over performance. If that's what you are looking for, consider Go.


Nowadays, in 1980's....

"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue.... Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels? Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."

-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming


Hum... We have to move further than this citation. The 1980s C was much more secure than our current one.

The undefined behavior paradoxes were only added in the 90s, when optimizing compilers became logic inference engines, fed with the unquestionable truth that the developer never exercises UB.

Just because it was a sane language for kernel development at the 1980s, it doesn't mean it is one now.


Yeah, because the only way to achieve performance in 1980's C on 16 bit platforms was to litter it with inline Assembly, thus UB based optimisation was born to win all those SPEC benchmarks in computer magazines.


Funny thing is that the same thing stopping people from writing those crazy¹ optimizers in the 1980s was exactly the lack of capacity of the computers to run them.

Which means that they appeared exactly at the time the need for them became niche. And yet everybody adopted them due to the strong voodoo bias we all have about computer-related tasks.

1 - They are crazy. They believe the code has no UB even though they can prove it has.


A great quote. She's absolutely right. C has absolutely polluted people's understanding of what compiler optimizations should be. Compilers should make a program go faster, invisibly. C makes optimization everyone's problem because the language is so absolutely terrible at defining its own semantics and catching program errors.


> It appears the author is a Go advocate

A bit of an understatement. The author is the current Golang project lead and a member since its inception.


He’s more like the BDFL.


The undefined behavior I struggle with keeps me from better performance though. I have something like [(uint32_t value) >> (32 - nbits)] & (lowest nbits set). For the case of nbits=0, I would expect it to always return 0: even if the right shift of a 32-bit value by 32 bits is undefined behavior, the bit-wise AND with 0 should make it always result in 0. But I cannot leave it that way, because the compiler assumes that undefined behavior never happens and might optimize out everything.
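Roughly, the code in question looks like the first function below (my paraphrase of the description above); the usual workarounds are to special-case nbits == 0 or to do the shift at a wider width:

    #include <stdint.h>

    /* As described: undefined behavior when nbits == 0, because the shift
     * count becomes 32. For 0 < nbits < 32 everything is well defined. */
    uint32_t get_bits(uint32_t value, unsigned nbits)
    {
        return (value >> (32 - nbits)) & ((1u << nbits) - 1u);
    }

    /* One workaround: branch on the edge case (or shift a 64-bit value). */
    uint32_t get_bits_checked(uint32_t value, unsigned nbits)
    {
        return nbits == 0 ? 0
                          : (value >> (32 - nbits)) & ((1u << nbits) - 1u);
    }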


Exactly. The irony in all of this is that C is not a portable assembler. It'd be better if it were[1]!

If you want the exact semantics of a hardware instruction, you cannot get it, because the compiler reasons with C's abstract machine that assumes your program doesn't have undefined behavior, like signed wraparound, when in some situations you in fact do want signed wraparound, since that's what literally every modern CPU does.

[1] If the standard said that "the integer addition operator maps to the XYZ instruction on this target", that'd be something! But then compilers would have to reason about machine-level semantics to make optimizations. In reality, C's spec is designed by compiler writers for compiler writers, not for programs, and not for hardware.


I think that the undefined behaviour should be partially specified. In the case you describe, it should require that it must do one of the following:

1. Return any 32-bit answer for the right shift. (The final result will be zero due to the bitwise AND, though, regardless of the intermediate answer.) The intermediate answer must be "frozen" so that if it is assigned to a variable and then used multiple times without writing to that variable again then you will get the same answer each time.

2. Result in a run-time error when that code is reached.

3. Result in a compile-time error (only valid if the compiler can determine for sure that the program would run with a shift amount out of range, e.g. if the shift amount is a constant).

4. Have a behaviour which depends on the underlying instruction set (whatever the right shift instruction does in that instruction set when given a shift amount which is out of range), if it is defined. (A compiler switch may be provided to switch between this and other behaviours.) In this case, if optimization is enabled then there may be some strange cases with some instruction sets where the optimizer makes an assumption which is not valid, but bad assumptions such as this should be reduced if possible and reasonable to do so.

In all cases, a compiler warning may be given (if enabled and detected by the compiler), in addition to the effects above.


I wanted to reply that your point 3 should already be possible with C++ constexpr functions because they don't allow undefined behavior. But it seems I was wrong about that, or maybe I'm doing it wrong:

    #include <cstdint>
    #include <iostream>

    // BITBUFFER is a 64-bit constant defined elsewhere in the original code;
    // see the godbolt link below for the full example.

    [[nodiscard]] constexpr uint64_t
    getBits( uint8_t nBits )
    {
        return BITBUFFER >> ( 64 - nBits ) & ( ( 1ULL << nBits ) - 1U );
    }

    int main()
    {
        std::cerr << getBits( 0 ) << "\n";
        std::cerr << getBits( 1 ) << "\n";
        return 0;
    }
The first output will print a random number, 140728069214376 in my case, while the second line will always print 1. However, when I put the ( ( 1ULL << nBits ) - 1U ) part into a separate function and print the values for that, then getBits( 0 ) suddenly always returns 0, as if the compiler suddenly understands that it will AND with 0.

    template<uint8_t nBits>
    [[nodiscard]] constexpr uint64_t
    getBits2()
    {
        return BITBUFFER >> ( 64 - nBits ) & ( ( 1ULL << nBits ) - 1U );
    }
In this case, the compiler will only print a warning when trying to call it with getBits2<0>. And here I kinda thought that constexpr would lead to errors on undefined behavior, partly because it always complains about uninitialized std::array local variables being an error. That seems inconsistent to me. Well, I guess that's what -Werror is for ...

Compiled with -std=c++17 and clang 16.0.0 on godbolt: https://godbolt.org/z/qxxWW93Tx


Unfortunately constexpr doesn't imply constant evaluation. Your function can still potentially be executed at runtime.

If you use the result in an expression that requires a constant (an array bound, a non-type template parameter, a static_assert, or, in c++20, to initialize a constinit variable), then that will force constant evaluation and you'll see the error.

Having said that, compilers have bugs (or simply not fully implemented features), so it is certainly possible that both GCC and clang will fail to correctly catch UB during constant evaluation in some circumstances.
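
To make that concrete, here's a minimal sketch (the BITBUFFER value is made up by me; -std=c++17 assumed) where forcing constant evaluation turns the shift-by-64 UB into a hard compile error:

    #include <cstdint>

    constexpr uint64_t BITBUFFER = 0xF123456789ABCDEFULL;  // made-up value for the sketch

    [[nodiscard]] constexpr uint64_t
    getBits( uint8_t nBits )
    {
        return BITBUFFER >> ( 64 - nBits ) & ( ( 1ULL << nBits ) - 1U );
    }

    int main()
    {
        constexpr auto one = getBits( 1 );      // fine: forced compile-time evaluation
        static_assert( getBits( 1 ) == 1 );     // also forces constant evaluation
        // constexpr auto bad = getBits( 0 );   // error: shift by 64 is UB in a constant expression
        return static_cast<int>( one );
    }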


Ah thanks, I was not aware that these compile-time checks are only done when it is evaluated in a compile-time evaluating context.

To add to your list, using C++20 consteval instead of constexpr also triggers the error.


Eh. The existence of Rust (and Zig, to a lesser extent) proves that you can, in fact, have both: highest performance and safe, properly error-checked code without any sort of UB.

UB is used for performance optimizations, yes, but all of these difficult to diagnose UB issues and bugs happen because C++ makes it laughably easy to write incorrect code, and (as shown by Rust) this is by no means a requirement for fast code.


The Computer Language Benchmarks Game has C++ outperforming Rust by around 10% for most benchmarks. Binary trees is 13% faster in C++, and it's not the best C++ binary tree implementation I've seen. k-nucleotide is 32% faster in C++. Rust wins on a few benchmarks like regex-redux, which is a pointless benchmark as they're both just benchmarking the PCRE2 C library, so it's really a C benchmark.

> because C++ makes it laughably easy to write incorrect code

I was going to ask how much you actually program in C++, but I found a past comment of yours:

> I frankly don't understand C++ well enough to fully judge about all of this


> Rust wins on a few benchmarks like regex-redux, which is a pointless benchmark as they're both just benchmarking the PCRE2 C library, so it's really a C benchmark.

The Rust #1 through #6 entries use the regex crate, which is pure-Rust. Rust #7[rust7] (which is not shown in the main table or in the summary, only in the "unsafe" table[detail]) uses PCRE2, and it is interestingly also faster than the C impl that uses PCRE2[c-regex] as well (by a tiny amount). C++ #6[cpp6], which appears ahead of Rust #6 in the summary table (but isn't shown in the comparison page)[comp], also uses PCRE2 and is closer to Rust #7.

[comp]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[detail]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[rust7]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[c-regex]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[cpp6]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


I mean, it's outperforming C as well in that particular benchmark.

Lies, damn lies, and benchmarks?

I can at least say, the performance difference between C, C++, and Rust, is splitting hairs.

If you want to write something performant, low level, with predictable timing, all three will work.

I'm spending a lot of time building projects with Rust & C++ these days. The issue/tradeoff isn't performance with C++, but that C++ is better for writing unsafe code than Rust.

https://www.p99conf.io/2022/09/07/uninitialized-memory-unsaf...


> C++ makes it laughably easy to write incorrect code

It also provides a lot of mechanisms and tools to produce correct, safe code, especially modern C++. In most codebases you're not seeing a lot of pointer arithmetic or void pointers or anything of that nature. You hardly even see raw pointers anymore, instead a unique_ptr or a shared_ptr. So yes, you can write incorrect code because it's an explicit design goal of C++ not to treat you like a baby, but that doesn't mean that writing C++ is inherently like building a house of cards.


You can do any rust optimization yourself in C++ (ie. aliasing assumptions), whereas rust makes the other way around very difficult, often forcing you to use multiple layers of indirection where c++ would allow a raw pointer, or forcing an unwrap on something you know is infallible when exceptions would add no overhead, etc. Rust programmers want people to believe that whatever appeases the supposedly zero cost borrow checker is the fastest thing to do even though it has proven to be wrong time and time again. I can’t tell you how many times I’ve seen r/rust pull the “well why do you want to do that” or “are you sure it even matters” card every time rust doesn’t allow you to write optimized code.


> You can do any rust optimization yourself in C++ (ie. aliasing assumptions)

I don’t think that is entirely true. C++ doesn’t have any aliasing requirements around pointers, so if the compiler sees two pointers it has to assume they might alias (unless the block is so simple it can determine aliasing itself, which is usually not the case), but in Rust mutable references are guaranteed to not alias.

This was part of the reason it took so long to land the “noalias” LLVM attribute in Rust. That optimization was rarely used in C/C++ land so it had not been battle tested. Rust found a host of LLVM bugs because it enables the optimization everywhere.


While standard C++ has no equivalent of a noalias annotation, it's wrong to say that it has no aliasing requirements. To access an object behind a pointer (or a glvalue in general), the type of the pointer must be (with a few exceptions) similar to the type of the pointee in memory, which is generally the object previously initialized at that pointer's address. This enables type-based alias analysis (TBAA) in the compiler, where if a pointer is accessed as one type, and another pointer is accessed as a dissimilar type, then the compiler can assume that the pointers don't alias.

Meanwhile, Rust ditches TBAA entirely, retaining only initialization state and pointer provenance in its memory model. It uses its noalias-based model to make up for the lack of type-based rules. I'd say that this is the right call from the user's standpoint, but it can definitely be seen as a tradeoff rather than an unqualified gain.
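
A tiny sketch of what TBAA buys the C++ compiler (the function and names are mine): because counts[i] is accessed as int and *scale as float, the compiler may assume the two pointers don't alias and keep *scale in a register instead of reloading it on every iteration.

    // Under TBAA, stores through `counts` (int lvalues) cannot modify the
    // float object behind `scale`, so *scale can be hoisted out of the loop.
    void scale_counts(int* counts, const float* scale, int n)
    {
        for (int i = 0; i < n; ++i)
            counts[i] = static_cast<int>(counts[i] * *scale);
    }

Rust gets the same freedom from its guarantee that &mut references don't alias, rather than from the types, which is the tradeoff described above.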


Why couldn’t the Rust compiler also assume that dissimilar types don’t alias?


Because existing unsafe code written in stable Rust depends on the ability to convert raw pointers and references from one type to another, as long as their memory layouts match. That's the whole premise of the bytemuck crate [0], and it's the basis for things like the &str-to-&[u8] or &[u8]-to-&str conversions in the standard library.

[0] https://docs.rs/bytemuck/latest/bytemuck/


Isn't that the point of unsafe blocks in rust? So you can write optimized code when you need to and the rust borrow checker won't let you?


Unsafe blocks are subject to the same borrow checking that the rest of the language is.


That is correct. However, raw pointers are not borrow checked, in safe Rust they're largely useless, but in unsafe Rust you can use raw pointers if that's what you need to do to get stuff done.

As an example inside a String is just a Vec<u8> and inside the Vec<u8> is a RawVec<u8> and that is just a pointer, either to nothing in particular or to the bytes inside the String if the String has allocated space for one or more bytes - plus a size and a capacity.


> when exceptions would add no overhead

Isn't the overhead for C++ exceptions quite significant, especially if an exception is thrown?

Exception handling can also increase the size of the binary because of the additional data needed to handle stack unwinding and exception dispatch.

I think a number of optimizations are made quite a bit more complex by exception handling as well.


The argument for the Exception price is that we told you Exceptions were for Exceptional situations. This argument feels reasonable until you see it in context as a library author.

Suppose I'm writing a Clown Redemption library. It's possible to Dingle a Clown during redemption, but if the Clown has already dingled that's a problem so... should I raise an exception? Alice thinks obviously I should raise an exception for that, she uses a lot of Clowns, the Clown Redemption library helps her deliver high quality Clown software and it's very fast, she has never dingled a Clown and she never plans to, the use of exceptions suits Alice well because she can completely ignore the problem.

Unfortunately Bob's software handles primarily dingling Clowns, for Bob it's unacceptable to eat an exception every single damn time one of the Clowns has already been dingled, he demands an API in which there's just a return value from the dingling function which tells you if this clown was already dingled, so he can handle that appropriately - an exception is not OK because it's expensive.

Alice and Bob disagree about how "exceptional" the situation is, and I'm caught in the middle, but I have to choose whether to use exceptions. I can't possibly win here.


Like I said, this argument doesn’t work because you can use options in c++ but you can’t use exceptions in rust. So when there’s an occasion where you want to avoid the overhead of an option or result in rust - well too bad.


Yes you can. They're called panics.


Exceptions cost performance when thrown whereas return values always cost performance.

If all you care about is outright performance, having the option for exceptions is easily the superior choice. The binary does get bigger but those are cold pages so who cares (since exceptions are exceptional, right?)


Given who invented it, Go can be thought of as, "What C might have been if we could have done it."

Go really is in many ways more similar to early C in spirit than modern C is.


Expanding this for those not familiar with Go's history: Ken Thompson, formerly of Bell Labs and co-creator of B and Unix, was deeply involved in Go's early days. Rob Pike, also ex-Bell Labs, was also one of Go's principal designers.

I can't find a source now, but I believe Rob described Russ Cox as "the only programmer I've met as gifted as Ken." High praise indeed.


Go is modelled after Plan 9's C toolchain (the [1-9]c compilers, which is where Go's cross-compiling heritage comes from) and Limbo, so that makes a lot of sense. Both come from the same people, after all: Unix -> Unix v8 -> Plan 9 -> Go.


I think this allowance is a mistake.

I suspect that there is some huge number of developer hours that have been wasted, and huge amount of money wasted, on cleaning up after security breaches and finding and fixing security issues. I suspect that those numbers dwarf any losses that might have arisen due to reduced developer productivity or reduced performance when using a (hypothetical) C-like language that doesn't allow the compiler to do these sorts of things.


"a (hypothetical) C-like language that doesn't allow the compiler to do these sorts of things."

It's not very hypothetical in 2023. There are plenty of languages whose compilers don't do this sort of thing and attain C-like performance. There isn't necessarily a single language that exactly and precisely replaces C right now, but for any given task where you would reach for C or C++ there's a viable choice, and one likely to be better in significant ways.

I also feel like this is missed by some people who defend this or that particular treatment of a particular undefined behavior. Yeah, sure, I concede that given the history of where C is and how it got there and in your particular case it may make sense. But the thing is, my entire point is we shouldn't be here in the first place. I don't care about why your way of tapping a cactus for water is justifiable after all if you consider the full desert context you're living in, I moved out of the desert a long time ago. Stop using C. To a perhaps lesser but still real extent, stop using C++. Whenever you can. They're not the best option for very many tasks anymore, and if we discount "well my codebase is already in that language and in the real world I need to take switching costs into account" they may already not be the best option for anything anymore.


I'm sorry, but the world doesn't really align with these ideas, sometimes that is. It's understandable, but at the same time it really isn't.


I assume you're referring to "my codebase is already in C/C++"? Which I did acknowledge?

Because otherwise, what the world is increasingly not aligning with is using C when you shouldn't be. Security isn't getting any less important and C isn't getting any better at it.


This reasoning is why software keeps getting slower and more bloated, build times increase, and latency goes up despite having orders of magnitude more compute power.


If whatever language you're thinking of does that, it isn't one of the ones I'm talking about. I sure as heck am not talking about Python here. Think Rust, D, Nim, in general the things floating along at the top of the benchmarks (that's not a complete list either).


I don't see how this solves anything, Nim's backend is C, which means it should suffer from the same pitfalls. They probably clean it up and eliminate UB, but it should still exist.


Yeah, but with less bugs, more features, and faster development. I mean I hate Electron with a passion but it means everybody gets a client. Let’s not pretend that it’s all worse rather than a set of tradeoffs.


I'll stop using C when there is a faster alternative. When nanoseconds count, there is no competition.


You don't need one faster. You need one as fast. These options generally exist. Rust seems to have crept its way right up to "fast as C"; it isn't really a distinct event, but https://benchmarksgame-team.pages.debian.net/benchmarksgame/... (it tends to do better on the lower ones so scroll a bit). There are some other more exotic options.

C isn't the undisputed speed king any more. It hasn't necessarily been "roundly trounced", there isn't enough slack in its performance for that most likely, but it is not the undisputed speed king. It turns out the corners it cuts are just corners being cut; they aren't actually necessary for performance, and in some cases they can actually inhibit performance. See the well-known aliasing issues with C optimizations for an example. In general I expect Rust performance advantages to actually get larger as the programs scale up in size and Rust affords a style that involves less copying just to be safe; benchmarks may actually undersell Rust's advantages in real code on that front. I actually wouldn't be surprised that Rust is in practice a straight-up faster language than C on non-trivial code bases being developed in normal ways; it is unfortunately a very hard assertion to test because pretty much by definition I'm talking about things much larger than a "benchmark".


The allowance wasn't "tradeoff performance time with developer time".

The point of undefined behavior is "if we specify this, we cause problems for some of the compile targets for this language".


Yes, but then compiler vendors started abusing UB to increase performance while silently decreasing safety/correctness. If the compiler creates a security bug by optimizing away a bounds check the programmer explicitly put there, that's a problem.

https://thephd.dev/c-undefined-behavior-and-the-sledgehammer...


The author, Russ Cox, was one of the inventors of Go.


  res, err := InterceptNuke(enemyNuke)
  if err != nil {
    fmt.Println("NUCLEAR LAUNCH DETECTED!")
    log.Fatal(err)
  } else {
   fmt.Printf("Phew, we're safe. Nuke intercepted after %d seconds.\n", res)
  }
Very clearly defined errors indeed.


On one hand this is sort of "duh" (none of the examples in the article were surprises to me). But I think it's very useful to phrase it this way. So many people seem to shrug off the dangers in using C (C++ of course has its issues, though I'd argue it's easier these days to use C++ correctly), especially for security-critical code. It may be a helpful argument to point out that C & C++ were not designed with the correctness of programs in mind.


I think DJB at some point expressed the desire for a boring compiler with an absolute minimum of UB, and I concur.

I want some sort of flag that disables all this nonsense.

* Uninitialized variable? Illegal, code won't compile.

* Arithmetic overflow? Two's complement

* Null pointer call? Either won't compile, or NULL pointer dereference at runtime worst case.

Yeah, I'm aware -fwrapv and the like exist. But it'd be nice to have a -fno-undefined-behavior for the whole set.


> Arithmetic overflow? Two's complement

I don't really want this either; I'd rather the program abort. The vast majority of situations where I'm using signed arithmetic, I never want a positive number to overflow to negative (and vice versa).

Unsigned arithmetic is already designed to wrap back to zero; that's useful for things like sequence numbers where wrapping is ok after you "run out".
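
A sketch of the sequence-number idiom being alluded to (the helper name is mine); it relies entirely on the defined wraparound of unsigned arithmetic:

    #include <stdint.h>
    #include <stdbool.h>

    /* Serial-number comparison: unsigned subtraction wraps, so the distance
       a - b stays meaningful even after the counter wraps past zero. True
       when a is ahead of b by less than half the number space. */
    static inline bool seq_after(uint32_t a, uint32_t b)
    {
        uint32_t d = a - b;   /* well-defined modular arithmetic */
        return d != 0 && d < 0x80000000u;
    }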


This is possible, but unless it's accelerated by the processor it comes at a great cost. You'd need a branch after every arithmetic operation, unless the compiler could prove the math wouldn't overflow.

The ARM mode described elsewhere in the thread where there's an overflow flag that persists across operations would help; then you could do a check less frequently.

A mode where you get a processor exception would be great, if adding that doesn't add significant cost to operations that don't overflow. Assuming the response to such an exception is expected to be a core dump, the cost of generating such an exception can be high; of course, if someone builds their bignumber library around the exception, that won't be great.
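
For reference, GCC and Clang already expose the flag-checking version as a builtin; a sketch of an abort-on-overflow add (the wrapper name is mine):

    #include <stdio.h>
    #include <stdlib.h>

    /* __builtin_add_overflow performs the add and reports overflow (typically
       via the CPU's overflow flag), so the extra cost is one well-predicted,
       normally untaken branch per addition. */
    static inline int checked_add(int a, int b)
    {
        int result;
        if (__builtin_add_overflow(a, b, &result)) {
            fprintf(stderr, "signed overflow in checked_add\n");
            abort();
        }
        return result;
    }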


Trapping on integer overflow was completely standard before C came along. Fortran and Pascal compilers did it. (Lisp compilers transparently switched to bignums :)


> You'd need a branch after every math

A predictable branch is basically free. In your case such a branch is almost never taken.


"basically free" is very different then actually "free". Adding a ton of extra branches comes at a cost. Even though it is cheap it still takes an instructions that could be used for something else. You fill your branch prediction target cache must faster, ejecting branches you actually care about. It also makes it harder for the compiler to move code around and harder for the prefetcher to break data dependencies. This all adds up to non-trival overhead. You can tell your C compiler to always check arithmetic, but most don't because of the cost.


Stuff like this sounds great but if you dig into the details of actually trying to implement it, it's not at all simple, and it would inevitably slow down the already-terrible compilation speed of C++.

We have static analysis tools which do a decent job. But like, there's a reason Clang has UBSan to detect undefined behavior at runtime. It's a hard problem.


Not really, at least for two of the three examples mentioned.

Failing compilation on an uninitialized variable is easy. The compiler can already warn about this situation (and gcc and clang at least allow you to promote individual warnings to errors). Making this default would be simple, and not at all a performance concern.

Allowing signed arithmetic to overflow (in a defined 2's complement manner) would be just exactly what the hardware does on modern machines, so there'd be no slowdown there. Sure, the compiler would no longer be allowed to omit code that checks for overflow, but that's fine: if the programmer truly doesn't care, they won't write that check in the first place.

(Changing these two behaviors might have backward-compatibility concerns, though.)

You are of course correct that NULL dereference checking would incur a performance penalty at runtime. However, the compiler should be able to catch some subset of cases at compile time. At the very least, there could be a mode where it warns you at compile time that some dereferences could result in SIGSEGV. Unfortunately, I think it would be hard to get that warning to a point where there weren't a lot of false positives, so such a warning would be routinely ignored to the point of uselessness.


> You are of course correct that NULL dereference checking is would incur a performance penalty at runtime. However, the compiler should be able to catch some subset of cases at compile time. At the very least, there could be a mode where it could at least warn you at compile-time that some dereferences could result in SIGSEGV.

SIGSEGV on null pointer deference is not the problem. It's actually fine, in the sense that it predictably terminates the program. The problem is that modern optimizing compilers don't guarantee that SIGSEGV will happen; they may rewrite your program to do insane shit instead, as the example in the blog post shows. So we don't need NULL checks at runtime; we just need the compiler to stop doing insane optimizations.


> Failing compilation on an uninitialized variable is easy.

On an uninitialised variable, sure, but it's impossible to tell at compile time whether the program is using an uninitialised value.

Compiler authors also appear, to me, to be malicious.

In the past, they worked hard to provide warnings when they detected uninitialised variables.

In the rare case, now, that the compiler is able to tell that a line is using an uninitialised value, instead of issuing a warning, it simply removes the offending code altogether.

If compiler authors now were more like compiler authors of the past, they'd value the user interface enough to make any and all dead code a compile error.

Imagine if past compiler authors had their compiler, when seeing the use of an uninitialised variable, simply go ahead and remove any code that used that variable.

Imagine how poor a user experience that would be considered, and then ask yourself why they feel it is okay now to simply remove dead code.

There is no situation where actual source code lines should be removed rather than warned.

Typing on phone so not going to go into all the data flow explanations.


This already mostly exists. It means building with asan/msan/tsan/ubsan enabled. And... you'll slow your code down by several factors.


Sanitizers are debugging tools; they don't guarantee you'll actually catch issues when you run into them.


Right. This is why I said "mostly" in my post. If you want guaranteed detection of all forms of UB you'll need even higher overheads. Like, I really don't know how you'd design a system that is guaranteed to detect all data races.


With gcc you can use e.g. "-fsanitize=undefined,address -fsanitize-undefined-trap-on-error".

In my opinion, any C or C++ program must always be compiled with such options by default, both for debug and release builds.

Only when an undesirable impact on performance is measured should the compilation options be relaxed, and only for those functions where it matters.


Just turn on all warnings in your C++ compiler, and treat warnings as errors. For example, uninitialized variables are easy to catch as a warning and then turn into an error. More sophisticated compilers can warn about the other issues too.


This handles a few cases but nowhere near all of them. Null pointer dereference, use-after-free, data races, and much much more are all global properties with no hope of the compiler protecting you.


We're only considering UB conditions here, not errors in general that may be impossible to detect. Every UB condition can be detected by the compiler because, after all, the compiler needs to check for UB to generate code. All it takes is for the compiler to generate warnings/errors when this occurs. If this is not done by your compiler, ask them to add this feature instead of just complaining about UB in general.


> , after all, the compiler needs to check for UB to generate code.

No the compiler does not check for UB to generate code. A lot of UB (runtime out of bounds, use after free) are very difficult to detect statically (at least without specialized annotations).

The compiler applies transformations to the code that are valid (in the sense that they produce an equivalent program) only in the absence of UB.

So if the input program contains UB, the transformed program produced by the compiler may or may not be valid.


All of the things I described are UB.


Correctness can come from the programmer in ways that performance cannot.

We usually aim for correctness, and in small programs we often achieve it 100%.

100% maximum performance is rarely achieved in a high level language.

If you try hard at absolute correctness, you can get there; not so with performance.

So, obviously, that must mean performance is harder than correctness.

If you turn a performance aspect of a system into a correctness requirement (e.g. real time domain, whether soft or hard) you have hard work ahead of you.

In programming, we often make things easier for ourselves by sacrificing performance. Oh, this will be executed only three times over the entire execution lifetime of the program, so doesn't have to run fast. Thus I can just string together several library functions in a way that is obviously correct, rather than write loops that require proof.

That said, if you have certain kinds of correctness requirements, C becomes verbose. What is easy to express in C is an operation on inputs which have already been sanitized (or in any case are assumed to) so that the operation will correctly execute. It's not a priority to make it convenient to do that validating, or to just let the operation execute with inputs and catch the problem.

E.g. say that we want to check whether there would be overflow if two numeric operands were multiplied. It gets ugly.
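
For the curious, here is roughly what that pre-check looks like in portable C/C++ (a sketch along the lines of the usual CERT-style recipe; on GCC/Clang the __builtin_mul_overflow builtin hides all of this):

    #include <limits.h>
    #include <stdbool.h>

    /* True if a * b would overflow int. Each division below is chosen so
       that the check itself cannot overflow, which is what makes it ugly. */
    bool mul_would_overflow(int a, int b)
    {
        if (a > 0) {
            if (b > 0) return a > INT_MAX / b;
            else       return b < INT_MIN / a;          /* b <= 0 */
        } else {
            if (b > 0) return a < INT_MIN / b;          /* a <= 0 */
            else       return a != 0 && b < INT_MAX / a;
        }
    }

(C23 finally standardizes this sort of thing as ckd_mul in <stdckdint.h>.)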


Performance can absolutely come from the programmer. It's a question of defaults. One can default to correctness and opt-in to performance (in the places where critically needed, often a small portion of the program) much more easily than one can default to performance and opt-in to correctness.


> Performance can absolutely come from the programmer.

To a point, sure. But your language and tools dictate an upper bound on speed. You are never going to get a trading app written in Python to be faster than your competitor's app written in C.


Yes, performance does come from how a program is written, like choice of data structures and algorithms.

But performance is also

- defined by fairly simple, measurable parameters that everyone agrees on, like execution time, code size or memory use;

- subject to automation: compilers can take a program which, if run literally as written, is slow, and speed it up. (Or make it smaller or require less memory.)

Correctness is not like this.

- There is no universal definition of correctness; it is the degree to which a program matches its arbitrary specification.

- Compilers cannot take a program that is incorrect and optimize it to have more correctness. They don't read the specification.

If I ask someone to write a program which demonstrates how a null pointer deference results in a predictable segmentation fault, then this will be a correct program:

  int main(void) { *((char *) NULL) = 0; }
A correct program could crash in this way even if none of its requirements say that it should exercise null dereference or any other error on purpose.

A correct program could be given incorrect data, for which its behavior is not specified. The program is then being misused; the ill behavior doesn't demonstrate that it's incorrect.

A program which has no requirements for handling unexpected inputs, and does not do so, cannot be safely deployed in certain situations, like when it runs in one security domain, but inputs are coming from a different one. That situation implicitly adds requirements, which that program does not meet; effectively the program is interpreted as incorrect. That shows that there can be misunderstandings and disagreements about what is correct, which boils down to disagreeing on what is and isn't a requirement.

We would never disagree that a program which calculates something in 3 seconds is faster (on that input case) than one which does the same in 5 seconds. (All else being same; hardware, OS, ...)


C = Control

I think in the world of C, the compiler assumes the author knows what they're doing, and historically, you were probably supposed to use a separate tool, a linter [1] (static analysis tool), to help catch mistakes.

HN's hard disks were a recent (2022-07) victim of a firmware developer's lack of understanding of integer overflow exceptions. [2] The firmware was likely written in C, or "C++ C". A fix was released in 2020. Another reminder to update the firmware of these disks. [3]

++crash;

[1] https://en.wikipedia.org/wiki/Lint_(software)

[2] https://en.wikipedia.org/wiki/Integer_overflow

[3] https://www.thestack.technology/ssd-death-bug-40000-hours-sa...


Undefined behavior is the opposite of programmer control. As the examples in the blog post show, you can write code that explicitly dereferences a NULL pointer, or enters an infinite loop, and the compiler will think surely the programmer didn't mean to do that, and literally remove code that you wrote.

It's true that historically C has not done much to protect programmers from their mistakes, but historically mistakes just meant suffering the natural consequences (such as a SIGSEGV on NULL pointer dereference). But these days, when you make a mistake, C compilers will exploit your mistake to the maximum extent possible, including changing the meaning of your programs.


IMHO, having an additional debug mode where all optimisation steps are performed but with asserts inserted to validate all preconditions would mitigate a lot of problems.

For example, adding an "assert(x <= INT_MAX - 100)" in the first example.

Running test suites on this debug binaries would find a lot of problems.


The issue of UB has everything to do with how compilers implement it. If people are having problems, they should complain to compiler writers. They always have the option of creating slower code that checks for obvious problems like uninitialized variables. However, if a company/project writes a compiler that is a little slower than the competitor, people will almost always complain that it is a bad compiler. So the result is what you have nowadays: they're always looking for every small opportunity to generate faster code at the expense of safety.


Why complain to the compiler writers? As you say, people want the fastest compilers for their language, so compilers will prioritize that over other concerns. Users may rant and complain, but they won't use a slower compiler.

If you really want less UB, switch to a different language or change the language! Complain to the standards committee, have them define behavior that is currently undefined, or impose restrictions on allowable behaviors. Compilers are always going to optimize to the extent that the language standard allows, so change the standard.


> However, if a company/project writes a compiler that is a little slower than the competitor, people with almost always complain that it is a bad compiler.

That is bullshit. There are plenty of projects that would gladly trade performance for more correctness. I would go as far as to say most projects would make that choice if articles like this get mindshare.

“It’s mostly as fast as clang but errors upon UB” is an easy sell


"[..] this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. "

What you describe fits for "implementation defined behavior". If you want to write code that works with different compilers, a single compiler with every UB defined, gains you nothing. If you don't need that flexibility, you can just use a language without UB in the first place.


You do. Take the EraseAll example

Ideally it's a compile time error. Less ideally it jumps to a NULL pointer and immediately crashes.

Either means the programmer must fix the code, which also stops being a problem on other compilers.


"Ideally it's a compile time error."

I agree with that, but the standard does not agree with both of us. My point is that choosing C or C++ makes sense if you see an advantage in programming against an ubiquitous and almost universal standard. If you have the freedom to implement against a particular compiler implementation there is no good argument to not also taking the freedom to choose a different language altogether.

I know in the real world this is not so easy. I worked in automotive with their certified C compilers long enough and I wouldn't have had the choice to select another language. Doesn't mean everyone wouldn't have been better off with a language without UB in the first place and I think we are getting there.


Nonsense. The standard literally doesn't bind us in any way whatsoever. UB means there are no rules, anything goes. If the compiler is free to make demons fly out of my nose, it's equally free to produce a nice error message to the effect of "Don't do that".

I'd much prefer if the standards people started defining every possible instance of UB as a fatal error or as an implementation defined behavior, but that's not strictly required.


Some instances of UB are data-dependent, and some would require solving the halting problem to statically distinguish them from non-UB. What you propose is therefore not generally possible at compile time, and at runtime only with considerable performance impact.


Thanks for pointing that out. Detecting UB is hard and sometimes impossible in C and C++. If you design your language accordingly you can avoid that. At least I think that is how Rust deals with it, but I'm happy to be convinced otherwise.


If some form of UB is impossible to detect, then the compiler cannot do anything about it either, making the whole debate useless. Any action taken by a compiler relating to UB must be for some cause that is detectable at compilation time.


That’s not quite true, the compiler could add code to detect all UB at runtime, and then abort or whatever. It’s just that this would also pessimize UB-free code, and most compilers opt to not do that, at least in release mode.


That is not true and very important to understand. The optimizations a compiler does by eliding UB stuff are often very effective and I think no one in the thread has questioned that. It is a fallacy, though, to think that these optimizations could be replaced by diagnostics. That would be hard in most cases and sometimes impossible.


So fix it to the extent possible. Eg, this: https://t.co/Z1Ib2fIyu6

Is easily fixable.

A. static Function Do; -- This is a syntax error, initialization is mandatory.

B. It's initialized to nullptr, and compiled to jmp 0x0


Yes, you can fix it in some cases, but the real errors usually happen in complex code where it is much harder or impossible to detect at compile time. The examples in TFA are very simplified for exposition purposes, they are not representative of ease-of-detection.


I don't see how this particular case could ever be hard to deal with.

Using uninitialized memory is UB, so initialize every byte of memory allocated to NULL. Dereference NULL pointers as-is. Done.

I'd be quite happy with a best effort approach. If you can't be perfect in every case, at least handle the low-hanging fruit.


This approach would also impact UB-free code that initializes the memory later. C compilers don’t want to pessimize such code. The difficulty lies in identifying only code with actual UB.


That's perfectly fine with me. If the compiler can prove the code is initialized twice, then it's free to deduplicate it. Otherwise I'll happily eat the cost and preserve more of my sanity when debugging.


You’ll happily eat the cost, but the target audience of C compilers traditionally doesn’t. That’s the point made by TFA. C prioritizes performance over safety.


I agree with you more than you think.

"if the standards people started defining every possible instance of UB as a fatal error or as an implementation defined behavior,"

We can wish for that, but the standard does not do that and never will.

Writing C or C++, first and foremost, means writing code against the standard and we have to deal with what the standard says.

Of course we are free to give up on the standard and target a particular implementation of C or C++ in a particular compiler. If you also like to call that writing C or C++ I would not argue with you. It is still a very different thing because you give up the primary advantage of C and C++, that you can compile basically everywhere. My point now is that if you give that up it makes no sense to stick with C or C++ in the first place in this day and age.


> Writing C or C++, first and foremost, means writing code against the standard

This was never true and never will be. There are tons of C and C++ code that cannot be compiled in more than a few compilers. The standard is the lowest common denominator, but all compilers have something beyond the standard that is used in practice. Just check the Linux code, for example.

> if you give that up it makes no sense to stick with C or C++

No, the whole point of UB is that every compiler can do what it wants in that situation. C/C++ actively embrace differences between compilers, while trying to standardize the core meaning of the language.


Linux is an excellent example. There has been a fork that made it compile with the Intel compiler and if I remember correctly it was moderately faster. Of course it went nowhere.

Writing standard conformant code is hard and you do it if you have a good reason to do it. A lot of software has.

The Linux kernel hasn't and that is fine. If your project also doesn't have that constraint, good for you. 30 years ago you would still choose C and a particular compiler and you could get things done. Nowadays, why bother? You could choose a language that has no UB, compiles on any hardware that is reasonably common and is still performant.


> My point now is that if you give that up it makes no sense to stick with C or C++ in the first place and in this day and age.

You wouldn't be giving anything up. The standard defining something as UB means you absolutely shouldn't be doing that. So a compiler doing something defined in case of UB can't harm you in any way, and doesn't deviate from the standard, because the standard prescribes no rules in the case you're invoking.


With giving up I meant giving up portability by targeting a particular implementation of C or C++ in a particular compiler. The standard is what it is and discussing a hypothetical standard is moot.


You don't give up any portability.

If the standard says that say, dereferencing a NULL pointer is UB it means you're not supposed to do that ever, on any OS or compiler. A compiler can choose to do something sensible like producing a fatal error message without any downside, since per standard that's not ever supposed to happen anyway.


To not give up portability, all compilers would have to do what you propose. They won't as long as the standard doesn't mandate it. So we are back at the point where we agree that the standard contains some unfortunate things.

Now, standard conformant compilers do exactly what you propose, they are just not C or C++ compilers and the standard is not the C or C++ standard.


> To not giving up portability all compilers would have to do what you propose.

Not in a lot of cases. Eg, let's suppose that GCC defines a NULL pointer dereference to act exactly like a normal one. That is, the compiler dereferences the pointer, and whatever the CPU/OS says is going to happen when you read from 0x0, happens.

Meanwhile, Clang continues with behaviors that can say lead to a function being entirely replaced with "return true".

There's no problem whatsoever with this. You're still not supposed to dereference null pointers. It's just that on GCC in particular you get a predictable error. And being an error you can't really rely on it -- your program still crashes and crashes are still undesirable, and therefore you will fix whatever leads to that outcome. And on Clang maybe you don't crash but the program doesn't do what you expect it to, which is still a bug to be fixed. You have a bug in both versions which manifests in different ways (but that is fine, because UB says anything goes so there's no requirement whatsoever for both compilers to have the program break identically), but it's the same bug that needs the same fix.

After the bug fix, your GCC compiled version doesn't crash and runs correctly, and your Clang compiled version doesn't crash and runs correctly. The behavior in the end is identical.


> If the standard says that say, dereferencing a NULL pointer is UB it means you're not supposed to do that ever, on any OS or compiler.

Says who?

The standard doesn't say that. At most it says that if you're trying to write portable C, you shouldn't do that. But not everyone is trying to write something portable.


The standard does say it. UB means there are absolutely no rules regarding what happens. It's not implementation defined, it's "absolutely anything can happen and it can change unpredictably".


> UB means there are absolutely no rules regarding what happens.

I will acknowledge that the standard says this, but the original post said "you're not supposed to do that ever".

Those are not the same thing.


They are.

First, because the compiler isn't bound to any promise. If in version 2.0, you can still use memory immediately after freeing it, version 2.0.1 is perfectly free to make the program misbehave. No consistency has been promised.

Second, because per the linked article, compilers treat UB in a particular, weird way. Compilers assume you will not invoke it. Hence this example:

    #include <stdio.h>

    int main() {
        for(int i; i < 10; i++) {
            printf("%d\n", i);
        }
        return 0;
    }
Which a real compiler compiles to this:

    #include <stdio.h>

    int main() {
        return 0;
    }

Why? Because usage of an uninitialized variable is UB. UB is something a programmer doesn't intend to invoke. Therefore the compiler concludes "(undefined)i < 10" never happens. Therefore the loop never runs. Therefore the loop can be removed.

UB is not just non-portable, modern compilers treat it as a weird, radioactive state they avoid touching.


I acknowledge all of your points, and agree to everything you've said.

Except for the actual point that the compiler not being bound by any promise means you should never do it.

If you want to walk it back to "it's very rare that intentionally using UB is a good idea", I could agree with that. But never is a bridge too far.


> Except for the actual point that the compiler not being bound by any promise means you should never do it.

> If you want to walk it back to "it's very rare that intentionally using UB is a good idea", I could agree with that. But never is a bridge too far.

No. The point of writing code is having the computer do the thing you want. If there's no predictable outcome, then it should never be done.

BTW, the outcome can also change depending on optimization level. So this can produce code that works in debug mode and breaks in release mode or vice versa. If there's one thing I don't look forward to, it's a program that's impossible to debug because debugging changes its behavior.


> The point of writing code is having the computer do the thing you want.

Correct. We definitely agree here. Most importantly, we didn't talk about the specification at all in this sentence. The behavior of the computer is what we care about. And UB only exists in the spec, it's not a thing compilers or computers do (it's an absence of specification about what a theoretical compiler would do).

> If there's no predictable outcome, then it should never be done.

I also agree here. But, with a particular compiler there is almost always a predictable outcome.

> BTW, the outcome can also change depending on optimization level.

And here, I think you're actually being too conservative. It can change with much more than just the optimization level. If you're doing intentional UB, you'd want to freeze not only your optimization level, but your entire flag set. And changing any of them could lead to months of debugging and fixing things.

This trade-off is almost never worth it. But I maintain that the "almost" belongs in that sentence.


> I also agree here. But, with a particular compiler there is almost always a predictable outcome.

It's potentially variable depending on other things involved, like system headers you might be including, and system updates.

> And here, I think you're actually being too conservative. It can change with much more than just the optimization level. If you're doing intentional UB, you'd want to freeze not only your optimization level, but your entire flag set. And changing any of them could lead to months of debugging and fixing things.

Well, isn't that a pleasant prospect there. As somebody who reviews PRs my answer to anything along those lines is: over my dead body.

> This trade-off is almost never worth it. But I maintain that the "almost" belongs in that sentence.

Okay, I'll grant you it's a fine idea if you want to sabotage a project and drive the other people to madness. Which if you're being paid for it may be actually illegal.

Otherwise, I still go with "never". If you need to do something this bizarrely specific, if it's worth doing at all, then it's time to do it in assembly, where you can do any random weird thing you want without the compiler getting in your way.


I mean, just as an example, if you're working on a weird embedded system (so you can't really change compilers) where certain parts that are UB in the spec are guaranteed to have certain behavior?

Additionally assuming function pointers are the same size as void *, which isn't guaranteed by the spec, when you're working on platform specific code. Although they may start optimizing that one at some point.


"Performance versus correctness" is the same design tradeoff as "Worse is Better":

https://www.dreamsongs.com/RiseOfWorseIsBetter.html

However our tolerance to accept "worse" over "better" is waning since we have more capable hardware, better tools, and "worse" leads to more problems later such as security vulnerabilities.


>For example, a common thing programmers expect is that you can test for signed integer overflow by checking whether the result is less than one of the operands, as in this program:

    #include <stdio.h>

    int f(int x) {
        if(x+100 < x)
            printf("overflow\n");
        return x+100;
    }
>Clang optimizes away the if statement. The justification is that since signed integer overflow is undefined behavior, the compiler can assume it never happens, so x+100 must never be less than x. Ironically, this program would correctly detect overflow on both ones'-complement and two's-complement machines if the compiler would actually emit the check.

My god…


What the article misses is that the code wouldn't work on all hardware. Take MIPS for example, where the signed overflow would generate a hardware exception that the OS might have implemented to do anything from nothing to killing the generating process.

C was never standardized solely on the basis of what's most performant. The vast majority of explicit UB is there because someone knew of a system or situation where that assumption wasn't true.


> C was never standardized solely on the basis of what's most performant. The vast majority of explicit UB is there because someone knew of a system or situation where that assumption wasn't true.

So? They could have called it implementation defined. The reasoning you present was broken then, as it is broken now.

Your MIPS example displays no reason for UB to exist.


Implementation defined means it has a defined behavior on every implementation. In the MIPS case defining that behavior would force the compiler to generate strictly signed instructions for signed values and define how the runtime platform handles these interrupts, rather than leaving the compiler free to generate "whatever works".

Look, a lot of UB is frankly stupid and shouldn't be in a modern language spec, including signed overflow. I'm not defending that, only giving an example where you have to break compatibility to eliminate it.


> In the MIPS case defining that behavior would force the compiler to generate strictly signed instructions for signed values and define how the runtime platform handles these interrupts,

I respectfully disagree: in this particular scenario, it's enough for the compiler vendor to write "generates an interrupt on overflow" with no indication of what the handling should be, and still be well within implementation-defined behaviour.

After all, the standard specifies what `raise()` does, and what `signal()` does, but doesn't specify how the runtime is going to handle `raise(signum)` when the program has not yet called `signal(signum, fptr)`.

This scenario you presented displays exactly why UB should be replaced with IB in the standard. I've yet to see one good reason (including performance reasons) for why the dereferencing of a NULL pointer (for example) cannot be documented for the implementation.

With UB, we have real-world examples of a NULL pointer dereference causing compilers to omit code resulting in a program that continued to run but with security implications. If changed to IB, the compiler would be forced to emit that code and let the dereference crash the program (much better).


I'm not convinced a "common" C program would ever do such a thing. Even the "Effective C" book (one of the best books on C, imo) discusses why this is wrong and why you should take advantage of `limits.h` for bounds checking.

This is just bad programming. The compiler, like usual, is correct because you're not in reality checking anything. You've made an obviously non-sensical statement that the overflowed value will be less than the value. Compiler optimizes it away. You can argue the semantics of UB here but this particular UB is borderline a solved problem in any practical sense.

To be fair, a static analyzer should be able to catch this.


>You've made an obviously non-sensical statement—

Therefore it should fail to compile.

You can even spin it as a performance enhancement: the whole program can be optimized away!


The problem is, that’s not how the logical inference in a compiler/optimizer works. It’s very difficult to translate such an optimization back to “this statement has been optimized away”, in the general case.


If it's so difficult to figure out if the consequences of their optimization game are sensible or not, then they shouldn't play it in the first place.

Who actually wants it to behave this way, other than compiler engineers competing on microbenchmarks? Who is this language even for?


It’s a side effect of desirable optimizations for UB-free code. You can’t have both at the same time.


Desirable for whom? Are the beneficiaries of it going to pay for the externalities they are generating, like polluters should?

You can say "ordinary workaday programmers benefit from speed improvements en passant", but they weren't really given a choice in the matter, were they? When programmers are given an explicit binary choice of "correct, but slightly slower", and "wrong, but slightly faster", they pick the former in practically all cases (or they should, at any rate). But they can't make this choice; the compiler and spec writers go behind their backs and construct these inscrutable labyrinths, then blame everyone else for getting lost in them.


-fwrapv is there for people to use. People don't use it.

Yes, defaults matter, but they matter both ways. People benchmark with the default options, and a compiler with -fwrapv turned on by default will lose those benchmarks, and the "ordinary workaday programmers" still end up trading correctness for speed, since one person at their workplace ran one benchmark once and picked the compiler that won on speed.


You cannot check for overflow like this, because you're causing it. There are other ways to do this without adding first:

  #include <limits.h>

  int safe_add(int a, int b) {
    if (a > 0 && b > INT_MAX - a) {
        /* deal with overflow... */
    } else if (a < 0 && b < INT_MIN - a) {
        /* deal with underflow... */
    }
    return a + b;
  }


This is why I hate it when people describe C as "portable assembler". The Usenet comp.lang.c FAQ was already thirty years ago warning people not to write overflow checks like this.


I think what really annoys me is that this looks like actual malice on the part of the standards writers, and less severe malice on the part of the compiler authors.

I know I should attribute it to stupidity instead but ...

The standards writers could have made all UB implementation defined, but they didn't. The compiler authors could have made all UB implementation defined, and they didn't.

Take uninitialised memory as an example, or integer overflow:

The standard could have said "will have a result as documented by the implementation". They didn't.

The implementation can choose a specific behaviour such as "overflow results depend on the underlying representation" or "uninitialised values will evaluate to an unpredictable result."

But nooooo... The standard keeps adding more instances of UB in each revision, and the compiler authors refuse to take a stand on what overflow or uninitialised values should result in.

Changing the wording so that all UB is now implementation defined does not affect legacy code at all, except by forcing the implementation to document the behaviour and preventing a compiler from simply optimising out code.


The standard does make all definable UB implementation defined. The compiler writers intentionally misread the standard to allow these kinds of optimizations.

C17 Draft, § 3.4.3: "Possible undefined behavior ranges from...

. . . ignoring the situation completely with unpredictable results, to . . .

. . . behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message) . . .

. . . to terminating a translation or execution (with the issuance of a diagnostic message)."

Compiler writers like to language-lawyer this in two ways. First, they interpret "possible" to mean that this is a non-exhaustive list, but that isn't how interpretation of such a document typically works. Expressio unius est exclusio alterius -- the express inclusion of one is the exclusion of others -- means that any list is exhaustive unless explicitly indicated otherwise. In other words, the reality is that, according to the standard, these are the only possible options.

Second, they interpret "ignoring the situation completely" to mean "actively seeking out the situation and deleting whole sections of code based on that." This is quite self-evidently a dishonest interpretation.


The optimization opportunities do indeed come from the first option, "ignoring the situation completely with unpredictable results". That is: the compiler assumes that the undefined behavior will not happen, and optimizes based on it. They do not "actively seek out the situation", it's in fact the opposite, they assume that the situation simply won't happen. And "deleting whole sections of code" is just the normal "unreachable code" optimization: code which cannot be reached on any feasible execution path will be deleted.


That is an obvious misreading.

Assume you have an unsigned char array r[] declared with 100 elements and the compiler encounters the following line of code:

r[200] = 0xff;

"ignoring the situation completely with unpredictable results" means translating that to the appropriate object code without bounds checking and no guarantees are made about whether that's a segfault or what your environment does.

"assum[ing] that the situation simply won't happen" means eliding the code entirely.

The first is required by the standard. The second is prohibited by the standard.

There is no section of any version of the standard that permits compilers to assume that anything which might be UB or predicated on UB is dead code.

Compilers must either ignore the fact that it's UB, or behave in a documented implementation-defined manner, or stop translation.


While you have convinced me that the compiler authors are more culpable than the standards committee, I still feel that they are ultimately responsible - the buck stops with them.

They're free to, and easily able to, add the word "exhaustively" when listing possible behaviours, but they don't. They can even do away with it entirely and replace it with implementation-defined, which makes compilers non-conformant if they optimise out code.


You're quoting from a footnote. The footnotes are not part of the normative text, so how you choose to interpret it is irrelevant.

The actual definition of undefined behavior given by § 3.4.3 is: "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements."

Since no requirement is imposed, compilers can do as they want.


> You're quoting from a footnote. The footnotes are not part of the normative text

I am not quoting from a footnote.

Footnote 1 pertains to § 1.

Footnote 2 pertains to § 3.19.5.

There are no footnotes pertaining to § 3.4.3.

And do you have a citation for the proposition that footnotes or notes of any other kind are not normative?


Apologies, it's a note, not a footnote. But notes in ISO standards aren't normative either.

This is a general principle of ISO standards, see https://share.ansi.org/Shared%20Documents/Standards%20Activi...:

> 6.5 Other informative elements

> 6.5.1 Notes and examples integrated in the text

> Notes and examples integrated in the text of a document shall only be used for giving additional information intended to assist the understanding or use of the document. They shall not contain requirements ("shall"; see 3.3.1 and Table H.1) or any information considered indispensable for the use of the document, e.g. instructions (imperative; see Table H.1), recommendations ("should"; see 3.3.2 and Table H.2) or permission ("may"; see Table H.3). Notes may be written as a statement of fact.


I don't view "You shall not do this" and "If you do this, it's meaningless" as identical statements.

Besides, clang does the same thing with -std=c89, where this provision was emphatically not a note.


I think it's a bit out of line to call this malice.

The C standard was written when people wanted a language that was easy to read and write, but could have similar performance to hand-written assembly. Performance, as the article title points out, was a bigger priority than correctness or eliminating foot guns.

Most of us here weren't around when many/most programs were written in assembly, and when machines were very constrained in how much memory they had and how many CPU cycles they could spare. Nowadays, performance concerns usually come behind development speed and correctness. So it's hard to truly feel what trade-offs the original designers of C (and the original C89 standards committee) had to make.

I'm not convinced implementation-defined behavior is really any better, though. Packagers and end-users should not have to ensure that the code they're compiling is "compatible" with the compiler or hardware architecture they want to use. Developers shouldn't have to put a "best compiled with FooC compiler" or "only known to work on x86-64" disclaimer on their code. That would not be an improvement.

I do agree that, these days, there's no excuse for writing language specifications that include undefined behavior. But C has a lot of baggage.


> I think it's a bit out of line to call this malice.

I apologise, but note (warning: weasel words follow) I was careful to say it looks like malice to me, not that I have any indication that it actually was malice.

> I'm not convinced implementation-defined behavior is really any better, though. Packagers and end-users should not have to ensure that the code their compiling is "compatible" with the compiler or hardware architecture they want to use.

I think it is better, purely because then the resulting code can't simply be omitted when an integer may overflow, the code still has to be emitted by the compiler.

Right now all the worst bits of UB have to do with the compiler optimising out code. With IB replacing UB, the implementation will have to pick a behaviour and then stick with it. Much safer.
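
A concrete sketch of the kind of thing I mean (illustrative only, not taken from the article): with signed overflow as UB, a typical optimizer is allowed to fold a post-hoc overflow check to a constant, whereas with implementation-defined wrapping the check would have to survive.

    /* With signed overflow being UB, the compiler may reason that x + 100
       can never wrap, so this function may be optimized to "return 0".   */
    int would_overflow(int x) {
        return x + 100 < x;
    }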


That is usually a myth.

From the people that were there at the time.

"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue.... Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels? Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."

-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming


I fail to see the point of your post in the context of this thread.

What, in the parent post, do you consider a myth?


C's original performance.

Optimizations taking advantage of UB had to be introduced during the 1990s to make it a reality.

Until the widespread adoption of 32-bit hardware, even junior Assembly devs could relatively easily outperform C compilers.

Which is what the interview is about, originally C lacked the optimization tools, relying on developers, with tons of inline Assembly.


> Which is what the interview is about, originally C lacked the optimization tools, relying on developers, with tons of inline Assembly.

I'm afraid that is not the conclusion I draw from the snippet you posted.

It's very clear to me that the HLL in question at that time prevented the programmer from performing the low-level optimisations that the C programmer could optionally do. The debate appeared to be whether to let the language exclusively optimise the code, or just do no optimisation and let the programmer do it.

This is why the sentence "Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve." is in there - it's because the HLL languages literally wouldn't let the programmer optimise at the level that C would.

Honestly, if the proponent for the HLL language didn't add in "Use the HLL, but not for OS and low-level code where you really need to fine-tune", you'd have a point, but they said it, and so you don't.


Right, that is why C code was polluted with inline Assembly until optimizers started taking advantage of UB.

Inline Assembly isn't C.


Of course inline assembly is C. Most languages don't let you do inline assembly. That isn't because they don't like it, but because the way those languages get translated to machine code is highly variable. C is the only language where the semantic model is simple enough that you can see how every language construct would translate to assembly language and write your own inline assembly language in between.
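
For concreteness, here is roughly what that looks like with the GCC/Clang extended-asm syntax on x86-64 (a sketch only; the exact syntax varies between compilers):

    /* Add b into a using one x86-64 instruction. */
    static inline long add_asm(long a, long b)
    {
        __asm__("addq %1, %0"   /* operand 1 is added to operand 0      */
                : "+r"(a)       /* a is read and written, in a register */
                : "r"(b));      /* b is read from a register            */
        return a;
    }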


It definitely isn't.

What page of C standard describes it?

As for the rest of your comment, unless you are using a PDP-11, you will be surprised.


The more important detail is that "standard C" recognises inline asm as a common-as-dirt C extension.

Page 514 (of version n1256) of the ISO C Standard

Moving target, to be sure, but it's been there for 20+ years now.

Annex J.5.10 The asm keyword

to quote a stack overflow answer (that sounds a lot like an answer commonly given back in the day on Usenet and #C IRC channels)

     informative, not normative, so an implementation need not provide inline assembly, and if it does it's not prescribed in which form. But it's a widespread extension, though not portable since compilers do indeed implement it differently.
https://www.open-std.org/JTC1/SC22/WG14/

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf


C predates the C standard, which describes a common subset of C implementations. It is useful for writing code which works across a variety of implementations. For example, you can write code that works on the Byzantine optimising compilers, which will attempt to render your code incorrectly if at all possible, but will make strictly conforming programs run very quickly. But it is not necessary to write code to the C standard, and many C implementations do not conform to it. The Plan9 C compiler, for example, expects 'void main(void)' for a program's main function, while the standard requires 'int main(void)'.


Let's make it easier then, what page of K&R C describes inline Assembly?


A book about C isn't a comprehensive guide to every C implementation.


Likewise, compiler-specific extensions that aren't part of the respective standard don't define universal truths about languages.

Had you been clever, you would have pointed out that inline Assembly is defined in the ANSI C++ standard, with the mention of asm (content) language construct, leaving the meaning of content implementation defined.

As it is also defined in the Ada language standard, Modula-2 standard revision, and Oakwood Guidelines for Oberon-2 implementation.

That is the problem: mixing up C with compiler-specific features, confusing the language with its standard, and assuming no other language on the planet offers similar capabilities.


I am not remotely interested in what the ISO standard says about C, and I am even less interested in what the ANSI standard says about C++, a completely different language.

C is not defined by or constrained to the standard. It predates it and every implementation extends it, as is intended by the standard.

The person confusing the standard with the language is you.


My recollection is that trying to keep up with Fortran's performance on numeric code was an impetus for taking advantage of UB (variations on what eventually became the "restrict" keyword were discussed but rejected for C89, and that idea comes directly from Fortran, which can assume arguments do not alias).


> The standards writers could have made all UB implementation defined, but they didn't.

This is not possible without performance impact, because, in the general case, whether a program constitutes UB depends on data and/or previous program flow (halting problem), and thus would require additional runtime checks even for code that happens to never run into UB.
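
As a sketch of what that means in practice (illustrative only): making, say, out-of-bounds access implementation-defined rather than undefined would in general force a check like the one below onto every access whose index the compiler cannot prove to be in range.

    #include <stddef.h>

    /* Hypothetical "always defined" indexing: the extra branch is paid even
       by callers that never pass an out-of-range index.                     */
    unsigned char load(const unsigned char *buf, size_t len, size_t i)
    {
        if (i >= len)   /* run-time check that cannot always be proved away */
            return 0;   /* some documented fallback behaviour               */
        return buf[i];
    }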


> less severe malice on the part of the compiler authors.

This is optimization run amok and I'd call it malice. If C/C++ are designed to let me shoot myself in the foot, the compiler should let me shoot myself in the foot instead of erasing my code.


This is why I implemented two's-complement with unsigned types, that don't have UB on overflow.


like x > 2147483547u in this case?


Correct. It will still overflow, but it's not UB and won't be optimized away.


by 'overflow' do you mean 'set the high bit'


Correct.

I have routines that take the unsigned values and interpret them as signed, two's-complement values.

So you would write this:

    val = temp1 + temp2;
    if (y_ssize_lt(val, temp1)) y_trap();
That `y_ssize_lt()` function computes whether `val` is less than `temp1` as though they were signed, two's-complement integers. But because they are actually unsigned, the compiler cannot be malicious and delete that code.
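
One way such a comparison helper can be written (a sketch only, assuming 64-bit values):

    #include <stdint.h>
    #include <stdlib.h>

    /* Compare two unsigned 64-bit values as though they were signed
       two's-complement integers, using only unsigned arithmetic (no UB).
       Flipping the sign bit maps the signed ordering onto the unsigned one. */
    static int y_ssize_lt(uint64_t a, uint64_t b)
    {
        const uint64_t sign = UINT64_C(1) << 63;
        return (a ^ sign) < (b ^ sign);
    }

    static void y_trap(void)
    {
        abort();   /* stand-in for whatever trap mechanism is actually used */
    }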


i see, thanks

maybe we should develop a nonmalicious compiler



This has to be the worst part of UB: That the compiler can assume it never happens. This way UB can affect your program even if the code that would exhibit the behavior is never executed.


If anyone tried to pull this kind of language lawyering in my job they would be fired

Maybe it's about time to fire the C compiler developers


-fwrapv and it's all good :)


It really isn't optional as a C developer to turn on warnings and get your code warning-free. This will eliminate the most common undefined behavior issues, like uninitialized variables, though unfortunately not all of them.


linux, qemu, postgres and many other projects written by "C developers" use and rely on fwrapv, fno-strict-aliasing and others.


They also turn on warnings, so they catch uninitialized variables and the like.


This is why people who put "C/C++" on their resume don't usually know either language. In C++, integers are defined to be two's complement now, so that check is acceptable.


That, or they know that C23 includes two’s complement[1].

[1]: https://en.cppreference.com/w/c/23


Integer overflow is still undefined behavior in C23.


Not true, as the post explains.


C++20 made the switch. See P1236. It appears C23 has followed suit.


If P1236 (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p12...) is really what it says ("alternative wording") then I don't believe that's true. Certainly P0907R3 (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p09...) is clear:

> Status-quo If a signed operation would naturally produce a value that is not within the range of the result type, the behavior is undefined. The author had hoped to make this well-defined as wrapping (the operations produce the same value bits as for the corresponding unsigned type), but WG21 had strong resistance against this.

My understanding is that the 2s complement change in C++20 ended up defining bitwise operations but not arithmetic.


Two's complement does not mean signed overflow became defined. So the check is still wrong. And no, I do not think making it defined would lead to more correct programs. You then simply have really difficult wrap-around bugs instead of signed overflow, which can be found more easily with the UB sanitizer or turned into traps at run-time.
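
For example (a sketch), a deliberate overflow like the one below is reported at run time when built with -fsanitize=signed-integer-overflow, or can be turned into a trap with -ftrapv:

    /* build e.g. with: cc -O2 -fsanitize=signed-integer-overflow overflow.c */
    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int x = INT_MAX;
        x = x + 1;            /* UBSan reports the signed overflow here */
        printf("%d\n", x);
        return 0;
    }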


It made the switch, sure, but overflow remains undefined.


Maybe I'm completely misunderstanding things, but this article mentions arithmetic overflow as being a problem in C and C++ but isn't this exactly a problem in Go as well? There's no checked arithmetic in Go, right?

I guess the difference between Go and C/C++ is that Go at least won't optimize away the overflow check? But it still doesn't insert it for you either.


In Go, overflow is not checked, but it is well-defined in the spec including `programs may rely on "wrap around"`: https://go.dev/ref/spec#Integer_overflow


>The problem is that C++ defines that every side-effect-free loop may be assumed by the compiler to terminate. That is, a loop that does not terminate is therefore undefined behavior. This is purely for compiler optimizations, once again treated as more important than correctness.

It is crazy that people decided to do it. It's not only weird, but also wrong.

There's no such thing as "side-effect-free" code.

There are always side effects like CPU usage, temperature, mem usage, etc.

There is a whole class of attacks based on those.


> There's no such thing as "side-effect-free" code.

This is kind of like “Ceci n'est pas une pipe.” In that it’s useful to remind people that there is no such thing as side-effect-free code, strictly speaking, but as soon as you remind people, you go back to talking about whether a particular piece of code has side effects. Someone points at the object depicted in the painting and asks what it is, and you say “It’s a pipe.”

Basically, to reason about code, it’s useful to ignore certain classes of side effects, most of the time.


> There are always side effects like CPU usage, temperature, mem usage, etc.

If you are writing a loop in your code with no classic side effects hoping to regulate temperature, that’s probably a bad and unreliable design to start with, whether or not the compiler optimizes it away.

If you read a temperature sensor to validate that the loop is doing the right thing and adjust frequency based on the result, it's still a little crazy, but it's less completely insane and no longer free of classic side effects.


All I'm saying is that code doesn't have to perform stuff like prints, file writes, or HTTP calls in order to generate side effects.

You can run heavy computation and not do anything with the result, just in order to increase CPU usage, right? (which then should result in a temp increase ;))


The C abstract machine isn't a physical machine, so temperature is not a side effect :)


What is meant are side effects as defined by the C abstract machine. It has no concept of CPU speed, temperature, etc.


Side-effect is a Word of Power in the context of the standard with very specific meaning that doesn't include, for example, temperature increase. Even if you use it as a Control key replacement.


This is a cool summary of UB, but speaking to the title... Well yeah. Did anyone suggest otherwise?


It should be "C and C++ Implementations" because nothing in the standard requires UB to be exploited for optimization instead of adding run-time checks.


The standard also doesn't require UB to have run-time checks, which is probably for the goal of performance. There was an implementation in mind when it was designed.


There were C implementations with bounds checking or implementations that trap on use of an invalid pointer or on read of uninitialized variables etc. A large part of UB was introduced for such reasons and not for performance. The standard was (and still is) about allowing a wide range of implementations.


To me this article is interesting because

* it is thorough, detailed and very thoughtful

* rsc’s essays usually end up with a major Go language change, so there’s a good chance this article is the seed of some change to Go and undefined behavior, or correctness or performance.

Even if it’s not a major upcoming change (I bet it is, tho) rsc is an extremely insightful dev.


There is no major upcoming Go change related to this post. This is just a much longer version of https://research.swtch.com/plmm#ub that I drafted years ago and finally cleaned up enough to post.


It’s fine, I’m not sad or anything


Yeah, I agree. I was editing my comment to praise the article's body as you were replying.


A periodic reminder is good.


Clearly we can have compilers and static analyzers that can catch a lot of what's UB in the standards.

To me the only real question is: what should the default be for your compiler? I don't want to be flooded with warnings when I'm working on a small file of handcrafted near-assembly code but I'm willing to change my compiler/tool options away from some more conservative, non-standard default settings.


It isn't super obvious as someone interested in C or C++ what compiler flags and static analyzers I should turn on.

I mean (and I'm asking because I'd genuinely like to bookmark someplace) is there a group that keeps an up-to-date list of aggressive flags and static analyzers to use?


I'm not sure about static analyzers, but here are the clang warnings I use for my personal C++20 projects. You will get an enormous number of warnings from typical C++ code. I fix all of them, but doubt it would make sense to do so in a commercial environment.

-Weverything - this enables every clang warning

-Wno-c++98-compat - warns about using newer C++ features

-Wno-c++98-compat-pedantic - warns about using newer C++ features

-Wno-padded - warns about alignment padding. I optimize struct layout, so this warning only reports on cases I couldn't resolve.

-Wno-poison-system-directories - I'm not cross-compiling anything.

-Wno-pre-c++20-compat-pedantic - warns about using newer C++ features


For static analysis I use CodeChecker; it's a wrapper on top of the Clang static analyzer and clang-tidy (linter). It now also supports cppcheck, but I disabled it (too many false alarms). It's free and open source, and I find it useful. Make sure you use it with a version of LLVM/Clang built with support for Microsoft's Z3 solver (that's the case in Debian stable, so it should be OK in most distros).

For the flags I would start with "-Wall -Werror", then maybe disable some warnings based on the code base / use.

All this assuming a GCC/Clang friendly code base.


Not that I know of, probably something you'll have to dig into based on your use case. I've used Coverity for static analysis, which is great but pricey. We have a corporate license so no-brainer for me.


-Wall -Werror (in clang and, I think, on GCC) is a good way to start. In general, clang is great at warning about UB. Also, AddressSanitizer is a great tool.


Regarding uninitialized variables, there is a proposal to make them default to zero-initialized:

http://wg21.link/P2723

Under "5. Performance" section, it claims the performance impact to be negligible, and there is also a mechanism to opt-out.


I guess we are rediscovering the 40 year old wisdom: https://www.dreamsongs.com/RiseOfWorseIsBetter.html


There are a lot of warning and error options. Turn the guardrails and sanitizers on during development and testing. Turn the optimizations on when you ship.

Check out: -Wall -Werror https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html


This is likely in response to this relevant related submission discussed yesterday:

A Guide to Undefined Behavior in C and C++ (2010)

https://news.ycombinator.com/item?id=37165042 (166 comments)

A worthwhile read, IMO. Cheers.


This needs a NSFW tag! Good lord...


The C++ standard library prioritises backward compatibility first and foremost, over performance or correctness. That's why some elements of the C++ standard library are slower than their counterparts in slow, interpreted languages.


rsc should really know better, he has been around long enough.

C and C++ have as a basic design rationale

    "as close to machine performance as possible"

    "don't pay for what you don't use"

    "don't break backwards compatibility"
The last one is responsible for most of the issues. It would be nice to fix stuff like UB, but the reason that C/C++ is so popular and still alive is precisely because it cares more about compatibility than about fixing stuff like that.

Breaking existing code is extremely uncool for long-lived codebases.


Undefined behavior permits backwards compatibility to be broken, because when a compiler adds a new optimization that exploits UB, existing programs can change behavior, breaking code that previously worked. The blog post cites infinite loops as an example. Thus, eliminating UB would improve backwards compatibility.
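
A hedged sketch of the kind of loop in question (the body has no side effects and the condition isn't a constant expression, so the compiler is allowed to assume the loop terminates):

    void spin(unsigned n)
    {
        while (n != 0) { }   /* never terminates when n != 0, yet a compiler
                                may assume it does and remove the loop       */
    }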


Compatibility is defined in terms of a contract between program and language implementation. If the program violates the contract by containing UB, then compatibility does not apply.

Taking your argument, compilers couldn’t even add any benign optimization without breaking compatibility, because that would change the runtime of the affected code.


> Compatibility is defined in terms of a contract between program and language implementation. If the program violates the contract by containing UB, then compatibility does not apply.

Indeed, furthering my point that having UB in a language undermines backwards compatibility.

> Taking your argument, compilers couldn’t even add any benign optimization without breaking compatibility, because that would change the runtime of the affected code.

That does not follow from what I said.


The point is, you have to define compatibility with regards to what. Otherwise nothing can ever change. The purpose of the C standard is to provide that reference point, via its definition of what constitutes a conforming program and a conforming language implementation. You only get compatibility problems when you step out of that scope.


the basic premise is wrong: it's not a tradeoff, which would require someone to say "doing this correct and expected thing blows up because another correct and expected thing can go faster".

the problem is mixing in incorrect or unexpected things (like someone altering a global via a debugger). to some extent, it's fair to complain about archaic nonportabilities (such as whether dereferencing NULL will fault, or long-gone numeric representations) - the standards committees should take a deep breath and DTRT in those cases.


> C and C++ do not require variables to be initialized on declaration (explicitly or implicitly) like Go and Java. Reading from an uninitialized variable is undefined behavior.

Took me some time to understand that part :)

I always thought that an unassigned variable of type slice T was deemed uninitialized. (but then, what of assigning nil to a previously non-nil variable? Is it considered deinitialized?)

In fact, at the language level it could be considered uninitialized/deinitialized. That's the variable typestate. (cf definite assignment, ineffectual assignment analysis)

For the compiler, it still "points" to a memory location so it is initialized.

Am I right? (more than 344 comments, no one will find this one question lol :-)


Note that in Zig, unsigned overflow is also UB in release unchecked mode. Don't like it? Don't use unchecked mode, or use wrapping operators.


> To some extent, all languages do this: there is almost always a tradeoff between performance and slower, safer implementations.

Well, probably except Rust, from which you get both performance and safety. Sometimes it comes at a cost in ergonomics, though. Some people, including Rust's creator himself, would trade performance for better ergonomics in the case of asynchronous programming. They prefer non-zero-cost green threads to the current state of async programming in Rust.


Rust does tradeoff performance for safety. Example: integer division [1].

[1]: https://godbolt.org/z/71sxnfff5


Not always all of the performance. Rust encourages array access with superfluous bounds checks, unnecessary UTF8 validation, and uses tagged sum types over the more memory efficient raw union types.


Professional C/C++ coders tend to recognize the history of cross-platform compatibility issues, and tend to gravitate to a simpler subset of the syntax to avoid compiler ambiguity (GNU gcc is ugly, but greatly simplified the process).

Try anything fancy, and these languages can punish you for it later on down the life-cycle. However, claiming they lack correctness shows a level of ignorance.

Best of luck, =)


If only we had a modern language that did both... (which was strangely omitted from this article)


Masterful conclusion in the last paragraph!


one day, a bug like this will take down an airliner or a power grid and kill thousands of people. and there will be a congressional hearing about it. and some ageing compiler engineers will have to explain to a panel of irate and disbelieving senators the concept of UB. they will have to explain that, while computers can add two numbers correctly, we choose to make them not do that. for performance.


I have some opinions of my own about how it should work.

> Uninitialized variables

The compiler should be allowed to do one of:

1. Assume any arbitrary value, as long as it remains consistent until a new value is assigned (which might or might not depend on the previous value). (However, the exact value need not be known at compile-time, and might not even be known at run-time if the program doesn't care about the exact value (in the example if the loop is removed then the exact value doesn't matter at run-time). It might be different every time the program runs.) Removing the loop in the example is valid; it would also be valid to start at a negative number (since you did not specify an unsigned type), but if for example you declare i before the loop instead of inside and then you add printf("%d\n",i); after the loop, then the program is required to print a number which is at least 10 and can fit in the int type; if standard library optimizations are enabled then it would be allowed to e.g. replace it with puts("42"); instead (but if the loop is not removed, then the number printed by this must be 10).

2. Read the value from a register or memory which is then used as the storage for that variable (for whatever time it needs to, just as though it was given a value which was returned from a function with no side-effects), but without initializing it.

3. Do whatever the target instruction set does (which seems very unlikely to me to do anything other than the above).

> Arithmetic overflow

The C specification should specify two's complement (it is the only good representation of integers anyways). Fortunately GCC has -fwrapv and -ftrapv, and I often use -fwrapv to avoid problem with signed overflow.

> Infinite loops

There is a rationale that makes some sense, but it would nevertheless be better not to optimize out the loop entirely like that. However, if standard library optimizations are enabled then it should be allowed to replace the loop with:

  for(;;) pause();
> Null pointer usage

In this case, the program should just actually attempt to read, write, or call the null address, which should result in a segfault or other run-time error if possible (instead of being optimized out). However, the compiler is allowed to assume that such a call can never return, if it wishes (whether or not that is actually true on the target computer).

In the case of a read or write that is not marked as volatile, the compiler may substitute any value for a read and may optimize out a write, although it does not have to; it may also do what is specified above.

However, the optimization shown in the article should be allowed if Do is a local variable which is uninitialized, rather than being initialized as null. (You can disable optimizations (or possibly change some options) if you don't like that.)


If you don't like undefined behavior then avoid it. It's not hard, unless you're not paying attention.


This is a really, really misguided take on C and C++...

They don't prioritize anything. They are just bad languages with a random assortment of features mostly reflecting how people thought about computers in the 70's.

Just think about this: vectorization is one of the obvious ways to get better performance in a wide range of situations. Neither C nor C++ has any support for that. Parallelism -- C++ kinda has something... C has zilch. Memory locality anyone?

I mean, Common Lisp has a bunch of tools to aid compiler in optimizing code in the language itself, whereas C has like... "inline" and the ability to mess with alignment in structs, and I cannot really think about anything else. Stuff like UB isn't making the language perform better or easier to optimize. It's more of a gimmick that compiler authors for C and C++ found to produce more efficient code. It's a misfeature, or a lack of feature, that allowed for accidental beneficial side-effects. Intentional optimization devices in a language are tools that prove the code to be intentionally correct first, and that proof allows the compiler to elide some checks or to generate a more efficient equivalent code, based on the knowledge of the correctness of the proof (or, at least, upon explicit instructions from the code author to generate "unsafe" code).


> vectorization is one of the obvious ways to get better performance in a wide range of situations.

Or, well, in a narrow range of situations where you have certain kinds of hardware with vectorization support.

> C has zilch

C has operating systems written in it which use parallel processing internally and make parallel processing available to applications. True, that may be done using techniques that are not described in ISO C.

Common Lisp has undefined behavior, a familiar example being (rplacd '(1 . 2) 3): clobbering a literal object.

Optimizations in Common Lisp are exactly like UB in C: you make declarations which specify that certain things hold true, and in an unsafe mode like (declare (optimize (safety 0) (speed 3))), the compiler blindly trusts your assertions and generates code accordingly. If you asserted that some variable is a fixnum, but it's actually a character string, that character string value's bit pattern will likely be treated as a fixnum. Or other consequences.

Common Lisp is nice in that you can control speed and safety tradeoffs on a fine granularity, but you do trade safety to get speed.

Common Lisps (not to mention Schemes) have extensions, just like C implementations. In plenty of code you see things like #+allegro (do this) #+sbcl (do that).


> certain kinds of hardware

This certain kind of hardware exists on virtually all consumer-grade and server-grade CPUs produced in the last decade. This is where optimizations matter in the first place.

It doesn't matter that operating systems are written in C: it has no concurrency in the language. Here are examples of concurrency in the language: in Java you have "synchronized", in Go you have "select", in Erlang you have "receive", in Ada you have "task", and so on. These languages provide some way to deal with concurrency, and language implementations must have concurrency primitives.

There's nothing like that in C. You don't have to implement C in such a way that it supports concurrency in any way. No part of the standard requires it (unlike in C++, where you have threads since C++11).

> Optimizations in Common Lisp are exactly like UB in C:

My guess is that you have never even seen Common Lisp. You just wrote absolute nonsense.


> This is where optimizations matter in the first place.

It's also where optimizations often don't matter.

> My guess is that you have never even seen Common Lisp

(invoke-restart 'next-guess)

> C: it has no concurrency in the language.

That is now literally false. ISO C now has threads and atomics.

Also, C has supported signals for eons: C code can be interrupted to run a signal handler, which requires reentrancy.

However, it is correct that concurrent systems have been written in C without support in the language. In C, concurrency can be in a library/module, and you can develop that as part of the system from the ground up.

This has enabled OS researchers to explore new concurrency mechanisms. The stagnant backwaters of Eiffel and Ada would not have come up with the Futex, for instance. Or RCU.

In C, when you write a function that works only with its local variables, that is understood to be reentrant (with pretty much every compiler on the planet except maybe something for an 8 bit microcontroller with a kilobyte of RAM.)

So that is actually one very important, basic piece of language-level support for concurrency.

Code can be written in C to run on bare hardware such that with only a minimal amount of glue, C functions can handle interrupts.

When a C function accesses external resources, that is obvious, and those accesses can be protected using external means also. That's generally how it works.


If optimization doesn't matter, then why are you talking about optimization?

This is just plain stupid... Vectorization is a huge subject in many different kinds of software: RDBMSs, scientific computing / HPC, even advertising... and then you come and tell all those people spending billions of dollars a year on optimization that it... doesn't matter? How much does it have to matter before it "really" matters?

Well, then you've never understood what you were dealing with. Or your reading-comprehension skills are so bad that you cannot understand it when someone describes the difference between something that happens intentionally and something that happens through oversight... But my guess is that you are just pretending to be stupid, because you enjoy trolling, and that's your way to do it.

C signals aren't a required part of the language... (a "standard compliant" implementation doesn't have to have them), but the C language is just poorly designed. There shouldn't have been such a thing as a "not required part of the language", but it happened out of the immediate convenience of someone who didn't want to apply themselves more to making a better design.

Exactly the same is true for threads and atomics. A "standard compliant" implementation doesn't have to have them. Portable code shouldn't use this stuff.

Read it again: this discussion is about the language, not libraries. If you think that the ability to make a system call makes a language concurrent, then it's a worthless definition, because it blurs the important distinction between languages that are designed to be concurrent and those which aren't. Everything short of stuff like HTML becomes a concurrent language, but you cannot have any meaningful discussion about the benefits or downsides of an approach to concurrency, because it's "outsourced" to something outside the language.

> C functions can handle interrupts.

What... like processor interrupts? What on Earth are you talking about... Are you getting your text from a chatbot or something?


> Vectorization is a huge subject in many different kinds of software, RDBMs, scientific computing

A lot of which is written in C. C compilers have support for vector instructions.

Vector instructions can also be used via language extensions (that can look like a macro library and can be made to work without vectorization).
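
One common shape of such an extension is the GCC/Clang vector type syntax (a sketch only; not ISO C):

    /* A 16-byte vector of four floats, using a compiler extension. */
    typedef float v4sf __attribute__((vector_size(16)));

    v4sf add4(v4sf a, v4sf b)
    {
        return a + b;   /* typically compiles to a single SIMD add */
    }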

> C signals aren't a required part of the language.

ISO C defines signals, which are a required part of a hosted implementation; same as printf or malloc. It is not a "full blown" signal specification like in POSIX. sigprocmask isn't in C, neither is sigsetjmp/siglongjmp.

> like processor interrupts?

Yes. Interrupt handling requires some processor-specific code to which the interrupt is vectored first; it does certain necessary things like save the machine state and acknowledge the interrupt. From that, C code can be dispatched. The meat of the interrupt handling is in C. Lots of systems are written on the "bare hardware" like that.

> this discussion is about language, not libraries.

Since I'm an actual Lisp person, I do not care about this difference.

"Language" is just the core special forms handled by an interpreter or compiler; library is everything else, including the run-time support needed by those special forms.

The "special forms" part of C brings little to the table in terms of concurrency other than the ability to write reentrant code. From that departure point, you can do a lot.

> There shouldn't have been a thing like "not required part of the language"

Interfaces and constructs that are described in a standard, but optional, still exist. There is a difference between that and not standardizing it at all.

Optional or not is mainly just a game of words: what can be implemented and what can be left out such that the implementation can still call itself "conforming".

If you make, say, threads required, then what it means is that C implementations for small systems will opt out of implementing that anyway, and just not be able to call themselves conforming.

Nothing changes; just the game of words and labels plays out differently.

> Everything short of stuff like HTML becomes a concurrent language

That is false. For instance Awk isn't a concurrent language. We can't write a re-entrant function in Awk that can be called safely at interrupt time.

Most of what defines a concurrent language is in the model of the run-time. For instance, the concurrency in Java actually comes from the JVM, and there are other languages on the JVM which borrow the concurrency model.

> Exactly the same is true for threads and atomics. A "standard compliant" implementation doesn't have to have them. Portable code shouldn't use this stuff.

If a program uses ISO C threads or atomic and needs to ported to some system/compiler where they are not available, it is likely possible to implement them in the program itself. I could probably knock it off in under a week.


Wow, don’t get so invested in hating a technology, it’s just a tool.



