Clang vs. Clang (cr.yp.to)
248 points by dchest 5 months ago | 399 comments



> compiler writers refuse to take responsibility for the bugs they introduced, even though the compiled code worked fine before the "optimizations". The excuse for not taking responsibility is that there are "language standards" saying that these bugs should be blamed on millions of programmers writing code that bumps into "undefined behavior"

But that's not an excuse for having a bug; it's the exact evidence that it's not a bug at all. Calling the compiler buggy for not doing what you want when you commit Undefined Behavior is like calling dd buggy for destroying your data when you call it with the wrong arguments.


I think this is actually a mistake by the author since the rant is mostly focused on implementation defined behavior, not undefined.

The examples they give are all perfectly valid code. The specific bugs they're talking about seem to be compiler optimizations that replace bit-twiddling arithmetic with branches, which isn't a safe optimization if the bit twiddling happens in a cryptographic context, because it opens the door to timing attacks.
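To make the concern concrete, here is a minimal sketch in C of the kind of branch-free selection that constant-time code tends to rely on. The function name ct_select_u32 is hypothetical and not taken from the article. Nothing in the C standard stops a compiler from turning this back into a branch, which is the complaint.

```c
#include <stdint.h>

/* Returns a if choose == 1 and b if choose == 0, with no source-level branch.
 * choose must be exactly 0 or 1. */
uint32_t ct_select_u32(uint32_t choose, uint32_t a, uint32_t b)
{
    /* Unsigned subtraction wraps, so 0 - 1 is all ones and 0 - 0 is zero. */
    uint32_t mask = (uint32_t)0 - choose;
    return (a & mask) | (b & ~mask);
}
```

The source has no if statement, but the abstract machine only promises the returned value, not the instructions used to compute it.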

I don't think it's correct to call either the source code or the compiler buggy; it's the C standard that is underspecified for the author's liking, and that creates security bugs on some targets.

Ultimately though I can agree with the C standard authors that they cannot define the behavior of hardware, they can only define the semantics for the language itself. Crypto guys will have to suffer because the blame is on the hardware for these bugs, not the software.


The blog post does, at the very end, mention the thing you should actually do.

You need a language where you can express what you actually meant, which in this case is "Perform this constant time operation". Having expressed what you meant, now everybody between you and the hardware can co-operate to potentially deliver that.

So long as you write C (or C++) you're ruined, you cannot express what you meant, you are second guessing the compiler authors instead.

I think a language related to WUFFS would be good for this in the crypto world. Crypto people know maths already, so the scariest bits of such a language (e.g. writing out why you believe it's obvious that 0 <= offset + k + n < array_length so that the machine can see that you're correct or explain why you're wrong) wouldn't be intimidating for them. WUFFS doesn't care about constant time, but a similar language could focus on that.


> what you actually meant, which in this case is "Perform this constant time operation".

This notion is not expressible in most (or almost all) high-level programming languages. Or, to put it another way: If you wrote C or C++ you never mean to say "perform this guaranteed-constant-time CPU operation" (ignoring asm instructions and hardware-aware builtins).


Not only does it mention that, the author also implemented that language, Jasmin, with some collaborators. He does mention that, but he doesn't mention that he'd already done it previously with the unreleased qhasm.


> You need a language where you can express what you actually meant, which in this case is "Perform this constant time operation". Having expressed what you meant, now everybody between you and the hardware can co-operate to potentially deliver that.

Yeah, even with assembly, your timing guarantees are limited on modern architectures. But if you REALLY need something specific from the machine, that's where you go.


Most architectures have a subset of their instructions that can be called in constant time (constant time meaning data-independent time). Things like non-constant pointer loads and branches are obviously out, and so are div/mod in almost all chips, but other arithmetic and conditional moves are in that set.

CPUs are actually much better at making those guarantees than compilers are.


That's no longer true: https://www.intel.com/content/www/us/en/developer/articles/t... x86 hardware does not guarantee anything unless you enable "Data Operand Independent Timing Mode", which is disabled by default and AFAIK can only be enabled by kernel-level code. So unless operating systems grow new syscalls to enable this mode, you're just out of luck on modern hardware.


In the most purely theoretical sense, you are correct, but everything Intel and AMD have said indicates they will still offer strong guarantees on the DOIT instructions:

https://lore.kernel.org/all/851920c5-31c9-ddd9-3e2d-57d379aa...

In other words, they have announced the possibility of breaking that guarantee years before touching it, which is something the clang developers would never do.


If you tell the compiler, it can generate code with only those instructions.


And besides, Assembly should not be a scary thing.

Even most managed languages provide a way to get down into Assembly dumps from their AOT and JIT compiler toolchains.

Maybe we need some TikTok videos showing how to do such workflows.


There's nothing terribly difficult to writing specific routines in asm. It's kinda fun, actually. Assembly _in the large_ is hard, just because you need to be next-level organized to keep things from being a mess.


I find it rather easy as long as you don’t have to interact with the OS. That’s where it becomes messy, with inflexible data structures and convoluted arguments in ABIs designed to be used from higher level languages.

If you are doing really low level stuff, sometimes it’s worth rolling out your own little RTOS with just the bits and pieces you need.

Not that long ago, I realised most 8-bit (and MS-DOS) computer games had their own RTOS woven into the game code, with multi-tasking, hardware management, IO, and so on.


Yeah...Old 8-bit computers didn't exactly have any OS services. :)


Some very minimal things such as reading or writing to disk or files, reading the keyboard, or other very simple activities


> it's the C standard that is underspecified for the author's liking

Isn't this unreasonable? Here we are, 52 years down the road with C et al. and suddenly it's expected that compiler developers must consider any change in the light of timing attacks? At what point do such new expectations grind compiler development to a halt? What standard would a compiler developer refer to to stay between the lines? My instincts tell me that this would be a forever narrowing and forever moving target.

Does timing sensitivity ever end? Asked differently: is there any code that deals in sensitive information that can't, somehow, be compromised by timing analysis? Never mind cryptographic algorithms. Just measuring the time needed to compute the length of strings will leak something of use, given enough stimuli and data collection.

Is there some reason a cryptographic algorithm developer must track the latest release of a compiler? Separate compilation and linking is still a thing, as far as I know. Such work could be isolated to "validated" compilers, leaving the insensitive code (if that concept is even real...) to whatever compiler prevails.

Also, it's not just compilers that can "optimize" code. Processing elements do this as well. Logically, therefore, must we not also expect CPU designers to also forego changes that could alter timing behavior? Forever?

I've written a lot of question marks above. That's because this isn't my field. However, invoking my instincts again: what, short of regressing to in-order, non-speculative cores and freezing all compiler development, could possibly satisfy the expectation that no change is permitted to reveal a timing difference where none existed before?

This all looks like an engineering purity spiral.


> Isn't this unreasonable? Here we are, 52 years down the road with C et al. and suddenly it's expected that compiler developers must consider any change in the light of timing attacks?

We already went through a similar case to this: when the C++11 multithreaded memory model was introduced, compiler authors were henceforth forced to consider all future optimizations in light of multithreading. Which actually forced them to go back and suppress some optimizations that otherwise appeared reasonable.

This isn't to say the idea is good (or bad), but just that "compiler development will grind to a halt" is not a convincing argument against it.


It is completely unreasonable though to assume that a compiler should now preserve some assumed (!) timing of source operations.

It would be reasonable to implement (and later standardize) a pragma or something similar that specifies timing constraints for a subset of the language. But somebody would need to do the work.


An attribute for functions that says "no optimisations may be applied to the body that would change timings" seems like a reasonable level of granularity here, and if you were conservative about which optimisations it allowed in version zero it'd probably not be a vast amount of work.
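No such attribute exists today as far as this thread establishes, so what follows is only a sketch of a workaround some constant-time code uses instead: an empty inline-asm "value barrier" that hides a value from the optimizer. It relies on the GCC/Clang inline-asm extension, not ISO C, the function names are hypothetical, and it is a best-effort measure rather than a guarantee.

```c
#include <stdint.h>

/* Best-effort optimization barrier (GCC/Clang extension, not ISO C).
 * The empty asm statement claims to read and modify x, so the compiler
 * can no longer reason about the value of x past this point. */
static inline uint32_t value_barrier_u32(uint32_t x)
{
    __asm__ volatile("" : "+r"(x));
    return x;
}

/* Branch-free select whose mask is hidden from the optimizer.
 * choose must be exactly 0 or 1. */
uint32_t ct_select_barrier(uint32_t choose, uint32_t a, uint32_t b)
{
    uint32_t mask = value_barrier_u32((uint32_t)0 - choose);
    return (a & mask) | (b & ~mask);
}
```

Because the compiler cannot prove the mask is all ones or all zeros, it has less reason to rewrite the expression as a conditional jump, but that is an observation about current compilers and not a contract.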

I'm sort of reminded of the software people vs. hardware people stuff in embedded work, where ideally you'd have people around who'd crossed over from one to another but (comparing to crypto authors and compiler authors) there's enough of a cultural disconnect between the two groups that it doesn't happen as often as one might prefer.


Why not just specify "all branches of this code must execute in the same amount of time", and let the compiler figure it out for the architecture being compiled for?


Instruction duration isn't constant even within the same arch. You cannot have branches in constant-time code.

I do wonder though how often cpu instructions have data-dependent execution times....


(Correction: You cannot have data-dependent branches. You can of course have branches to make a fixed-iterations for loop, for example)
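A minimal sketch of that distinction, assuming a hypothetical helper named ct_bytes_equal: the loop branch depends only on the public length n, while the secret bytes only ever feed arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the two n-byte buffers are equal and 0 otherwise.
 * The loop always runs n iterations; there is no early exit on mismatch. */
int ct_bytes_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++) {   /* branch depends only on public n */
        diff |= (uint8_t)(a[i] ^ b[i]);
    }
    /* Map diff == 0 to 1 and anything else to 0 without branching:
     * for diff in 1..255, diff - 1 is below 256, so bit 8 is clear;
     * for diff == 0, 0 - 1 wraps to all ones and bit 8 is set. */
    return (int)((((uint32_t)diff - 1u) >> 8) & 1u);
}
```

Whether the compiled code keeps this shape is up to the compiler, which is the subject of the thread.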


The difference is that threads have enormously wider applicability than timing preservation does.


That is orthogonal to whether compiler development would grind to a halt though, was my point.


The author is somewhat known for his over-the-top rants. When you read his writing in the context of a 90's flamewar, you'll find them to be quite moderate and reasonable. But it comes from a good place; he's a perfectionist, and he merely expects perfection from the rest of us as well.


> he's a perfectionist, and he merely expects perfection from the rest of us as well.

Nicely put, but at the end perfectionism is a flaw.


Not when computer security is concerned.


In all things, moderation. Security must be evaluated as a collection of tradeoffs -- privacy, usability, efficiency, etc. must be considered.

For example, you might suspect that the NSA has a better sieve than the public, and conclude that your RSA key needs to be a full terabyte*. We know that this isn't perfect, of course, but going much beyond that key length will prevent your recipient from decrypting the message in their lifetime.

* runtime estimates were not performed to arrive at this large and largely irrelevant number


Security is a tradeoff. Perfect security is not using a computer.


Security is a constant tradeoff, and trading off time for perfectionism is not a good one to take.


We had ~30 years of "undefined behaviour" practically meaning "do whatever the CPU does". It is not new that people want predictable behaviour, it simply wasn't a talking point as we already had it.


When were those decades? Any decent optimizing compiler built in the last couple of decades exploits undefined behavior.


You pretty much answered your own question. ~20 years ago and back. But I think it is also worth pointing out that it has gotten worse; those 20 years have been a steady trickle of new foot guns.


It's not even that. Yes, in the case of signed integer overflow, you usually get whatever the CPU gives as the answer for the sum. But you also have the famous case of an if branch checking for a null pointer being optimized away. And even in the case of integer overflow, the correct way to check for it isn't intuitive at first, because you need to check for integer overflow without the check itself falling under UB.

EDIT: just to make my point clear: the problem with UB isn't just that it exists, it is also that compiler optimizations make it hard to check for it.
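As an illustration of a check that does not itself overflow, here is a small sketch (the function name add_would_overflow is hypothetical). The intuitive test `a + b < a` already performs the overflowing addition, so a compiler may assume it is always false for positive b and delete it.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow int. The check never overflows:
 * it only computes INT_MAX - b for b > 0 and INT_MIN - b for b < 0,
 * both of which stay in range. */
bool add_would_overflow(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;
    }
    if (b < 0) {
        return a < INT_MIN - b;
    }
    return false;
}
```

Call it before doing the addition, not after.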


When do NULL checks get optimised away, apart from the case where the pointer has already been accessed/is known at compile time to be invalid?


I couldn't find the example I saw many years ago again; I could only find a GCC bug that has since been fixed. So you're likely right and I just didn't remember correctly. But we can still have the zeroing out of memory before deallocation silently optimized away. Naive attempts at checking for signed integer overflow could perhaps also be silently optimized away. My general point is that, if the compiler determines that the code has dead instructions or unneeded checks, it is very likely that the programmer's mental model of the code is wrong. So, just like the case of using an integer as a pointer or vice versa, we should have a warning telling the programmer that some code is not going to be compiled. The same goes for the null pointer check: a warning would make the programmer realize that the check is happening too late and should instead be performed earlier.
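For the zeroing case, a common workaround is to write through a volatile-qualified pointer so the stores are not treated as dead. This is a sketch with a hypothetical name (secure_zero); it is a widely used convention rather than something ISO C strictly promises, and where available a dedicated function such as C23's memset_explicit or the explicit_bzero found in glibc and the BSDs is the more direct tool.

```c
#include <stddef.h>

/* Overwrites len bytes at buf with zeros. The volatile-qualified pointer
 * makes each store an access the compiler is expected not to elide, unlike
 * a plain memset on a buffer that is about to be freed. */
void secure_zero(void *buf, size_t len)
{
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--) {
        *p++ = 0;
    }
}
```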


We have ill defined behaviour, implementation defined behaviour, erroneous behaviour, unspecified behaviour, undefined behaviour.

Undefined behaviour isn't exactly what most people think it is.


> Is there some reason a cryptographic algorithm developer must track the latest release of a compiler?

Tracking the latest release is important because:

1. Distributions build (most? all?) libraries from source, using compilers and flags the algorithm authors can't control

2. Today's latest release is the base of tomorrow's LTS.

If the people who know most about these algorithms aren't tracking the latest compiler releases, then who else would be qualified to detect these issues before a compiler version bearing a problematic optimization is used for the next release of Debian or RHEL?

> Logically, therefore, must we not also expect CPU designers to also forego changes that could alter timing behavior?

Maybe? [1]

> freezing all compiler development

There are many, many interesting areas of compiler development beyond incremental application of increasingly niche optimizations.

For instance, greater ability to demarcate code that is intended to be constant time. Or test suites that can detect when optimizations pose a threat to certain algorithms or implementations. Or optimizing the performance of the compiler itself.

Overall I agree with you somewhat. All engineers must constantly rail against entropy, and we are doomed to fail. But DJB is probably correct that a well-reasoned rant aimed at the community that both most desires and most produces the problematic optimizations has a better chance at changing the tide of opinion and shifting the rate at which all must diminish than yelling at chipmakers or the laws of thermodynamics.

[1] https://en.m.wikipedia.org/wiki/Spectre_(security_vulnerabil...


> This all looks like an engineering purity spiral.

To get philosophical for a second, all of engineering is analyzing problems and synthesizing solutions. When faced with impossible problems or infinite solution space, we must constrain the problem domain and search space to find solutions that are physically and economically realizable.

That's why the answer to every one of your questions is, "it depends."

But at the top, yes, it's unreasonable. The C standard specifies the observable behavior of software in C. It does not (and cannot) specify the observable behavior of the hardware that evaluates that software. Since these behaviors are architecture- and application-specific, it falls to other tools for the engineer to find solutions.

Simply put, it isn't the job of the C standard to solve these problems. C is not a specification of how a digital circuit evaluates object code. It is a specification of how a higher level language translates into that object code.


> short of regressing to in-order, non-speculative cores

I guess you are referring to GPU cores here.

It's a joke, but it hints that in-order, non-speculative cores can be powerful computers nonetheless.


They are a totally different kind of powerful computer though, you can't compare them for sequential workloads.


You make a fine point. If you follow this regression to its limit you're going to end up doing your cryptography on an MCU core with a separate, controlled tool chain. TPM hardware has been a thing for a while as well. Also, HSMs.

This seems a lot more sane than trying to retrofit these concerns onto the entire stack of general purpose hardware and software.


Where suffer means "not be lazy, implement the assembly for your primitives in a lib, optimize it as best as you can without compromising security, do not let the compiler 'improve' it"


But then you're not writing C, except maybe as some wrappers. Wanting to use C isn't laziness. Making it nearly unfeasible to use C is the most suffering a C compiler can inflict.


There's no reason that C should be suitable for every purpose under the sun.


Fiddling some bits cross-platform is supposed to be one of them.


As was pointed out elsewhere, fiddling bits with constant time guarantees isn't part of the C specification. You need a dedicated implementation that offers those guarantees, which isn't clang (or C, to be pedantic).


It's not in the spec right now but it still feels solidly in C's wheelhouse to me.


Fortunately you don't have to go 100% one way or the other. Write your code in C, compile and check it's correct and constant time, then commit that assembly output to the repo. You can also clean it up yourself or add extra changes on top.

You don't need to rely on C to guarantee some behaviour forever.


I agree that an extension (e.g. a pragma or some high-level operations similar to ckd_{add,sub,mul}) that allows writing code with fixed timing would be very useful.
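For readers who have not met the ckd_ family: those C23 operations take a result pointer and report overflow through a boolean return value. The sketch below shows the same shape using the GCC/Clang builtin __builtin_add_overflow (a compiler extension, not ISO C) so it builds on older toolchains; the wrapper name checked_add_i32 is hypothetical. It illustrates only the interface style, not any timing guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

/* Adds a and b. On success stores the sum in *out and returns true;
 * on overflow leaves *out untouched and returns false. */
bool checked_add_i32(int32_t a, int32_t b, int32_t *out)
{
    int32_t sum;
    if (__builtin_add_overflow(a, b, &sum)) {
        return false;   /* the mathematical result does not fit in int32_t */
    }
    *out = sum;
    return true;
}
```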

But we generally have the problem that there are far more people complaining than actually contributing usefully to the ecosystem. For example, I have not seen anybody propose or work on such an extension for GCC.


The problem doesn't stop there, if you want to ensure constant time behaviour you must also be able to precisely control memory loads/stores, otherwise cache timings can subvert even linear assembly code. If you have to verify the assembly, might as well write it in assembly.


The cryptographic constant time requirement only concerns operations that are influenced by secret data. You can't learn the contents of say a secret key by how long it took to load from memory. But say we use some secret data to determine what part of the key to load, then the timing might reveal some of that data.
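A sketch of how code typically avoids the second case, assuming a hypothetical helper ct_lookup_u8: instead of loading table[secret_index] directly, it reads every entry and keeps only the wanted one with a mask, so the addresses touched do not depend on the secret.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Returns table[secret_index] while reading every entry of the table.
 * Assumes secret_index < n; n itself is treated as public. */
uint8_t ct_lookup_u8(const uint8_t *table, size_t n, size_t secret_index)
{
    uint8_t result = 0;
    for (size_t i = 0; i < n; i++) {
        size_t diff = i ^ secret_index;
        /* nonzero is 1 when diff != 0 and 0 when diff == 0: for any
         * nonzero value, either it or its negation has the top bit set. */
        size_t nonzero = (diff | ((size_t)0 - diff)) >> (sizeof(size_t) * CHAR_BIT - 1);
        /* mask is 0xFF only for the matching index, 0x00 otherwise. */
        uint8_t mask = (uint8_t)(nonzero - 1);
        result |= (uint8_t)(table[i] & mask);
    }
    return result;
}
```

This costs a full pass over the table per lookup, which is the usual price for hiding the access pattern.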


Not when the precise timing of operations matters.


The optimizations are valid because the C standard and the semantics of its abstract machine don't have a concept of timing.


The problem is that c and c++ have a ridiculous amount of undefined behavior, and it is extremely difficult to avoid all of it.

One of the advantages of Rust is that it confines any potential UB to unsafe blocks. But even in Rust, which has defined behavior in a lot of places that are UB in C, if you venture into unsafe code, it is remarkably easy to accidentally run into subtle UB issues.


It’s true that UB is not intuitive at first, but “ridiculous amount” and “difficult to avoid” is overstating it. You have to have a proof-writing mindset when coding, but you do get sensitized to the pitfalls once you read up on what the language constructs actually guarantee (and don’t guarantee), and it’s not that much more difficult than, say, avoiding panics in Rust.


In my experience it is very easy to accidentally introduce iterator invalidation: it starts with calling a callback while iterating, add some layers of indirection, and eventually somebody will add some innocent looking code deep down the call stack which ends up mutating the collection while it's being iterated.


I can tell you that this happens in Java as well, which doesn’t have undefined behavior. That’s just the nature of mutable state in combination with algorithms that only work while the state remains unmodified.


Okay, but that’s the point. This is UB in C/C++, but not in Java, illustrating the fact that C and C++ have an unusually large amount of UB compared to other languages.


That is not UB. That is simply mutable data. The solution here is static analysis (Rust) or immutable persistent collections.


Depending on your collection, iterator invalidation _is_ UB. Pushing to a vector while iterating with an iterator will eventually lead to dereferencing freed memory, as any push may cause the vector to grow and move the allocation. The standard iterator for std::vector is a pointer to somewhere in the vector's allocation at the time the iterator is created, and it is left dangling after the vector reallocates.
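The same hazard can be shown in plain C with a hypothetical growable array built on realloc (the int_vec type and its functions are illustrative, not from any library). A pointer into data saved before a push may dangle afterwards; iterating by index and re-reading data on every step avoids holding such a pointer.

```c
#include <stdlib.h>

/* Hypothetical minimal growable array of ints. */
typedef struct {
    int *data;
    size_t len;
    size_t cap;
} int_vec;

/* Appends value; returns 0 on success, -1 on allocation failure.
 * May move data, which invalidates any pointer into the old buffer. */
int int_vec_push(int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL) {
            return -1;
        }
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* Appends a copy of every element present on entry. The element is read
 * through v->data[i] before each push, so no stale pointer is ever used. */
int int_vec_duplicate_all(int_vec *v)
{
    size_t n = v->len;
    for (size_t i = 0; i < n; i++) {
        if (int_vec_push(v, v->data[i]) != 0) {
            return -1;
        }
    }
    return 0;
}
```

Writing the loop with `int *it = v->data; ... it++` instead would read freed memory as soon as a push reallocates, which is the C counterpart of the std::vector case.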


In the context of C++ and STL it is UB.

They are in the process of rewording such cases as erroneous instead of UB, but it will take time.


…what’s the difference?


If it is UB, the compiler is allowed to optimize based on the assumption that it can't happen. For example, if you have an if in which one branch leads to UB and the other doesn't, the compiler can assume that the branch that led to UB will never happen and remove it from the program, and even remove other branches based on "knowing" that the branch condition didn't happen.

If it's simply erroneous, then it behaves like in every other language outside C and C++: it leaves memory in a bad state if it happens at runtime, but it doesn't have any effect at compile time.
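The null-pointer case mentioned earlier in the thread is a compact illustration. In the sketch below (function name hypothetical), the hazardous ordering appears only in a comment; the function itself checks before dereferencing, so no path relies on UB.

```c
#include <stddef.h>

/* Hazardous ordering, shown only as a comment:
 *
 *     int first = *p;            // UB if p is NULL, so p != NULL is assumed
 *     if (p == NULL) return -1;  // and this check may be removed as dead
 *
 * Checking before the dereference keeps every path well defined. */
int first_or_default(const int *p, int fallback)
{
    if (p == NULL) {
        return fallback;
    }
    return *p;
}
```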


What exactly does "a bad state" mean?


A state that memory is not expected to be in according to the theoretical semantics of the program.

For example, if you do an out of bounds write in C, you can set some part of an object to a value it never should have according to the text of the program, simply because that object happened to be placed in memory next to the array that you wrote past the end of.

According to the semantics of the C abstract machine, the value of a non-volatile object can only change when it is written to. But in a real program, writes to arbitrary memory (which are not valid programs in C's formal semantics) can also modify C objects, which would be called a "bad state".

For example, take this program, and assume it is compiled exactly as written, with no optimizations at all:

  #include <stdio.h>

  void foo(void) {
    int x = 10;
    char y[3];
    y[4] = 1; /* out-of-bounds write: y only has indices 0..2 */
    printf("x = %d", x);
  }
In principle, this program should print "10". But, the OOB write to y[4] might overwrite one byte of x with the value 1, leading to the program possibly printing 1, or printing 266 (0x010A), or 16777226 (0x0100000A), depending on how many bytes an int has and how they are laid out in memory. Even worse, the OOB write may replace a byte from the return address instead, causing the program to jump to a random address in memory when hitting the end of the function. Either way, your program's memory is in a bad state after that instruction runs.


This is just UB.


Yes, this is an example of UB leaving memory in a bad state.

If you want an example of something that is not UB leaving memory in a bad state, here is some Go code:

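  package main

  import "fmt"

  var global = 7

  func main() {
    // Two goroutines write to global with no synchronization (a data race).
    go func() {
      global = 1000000
    }()
    go func() {
      global = 10
    }()
    fmt.Printf("global is now %d\n", global)
  }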
The two concurrent writes may partially overlap, and global may have a value that is neither 7 nor 10 nor 1000000. The program's memory is in a bad state, but none of this is UB in the C or C++ sense. In particular, the Go compiler is not free to compile this program into something entirely different.

Edit: I should also note that a similar C program using pthreads or win32 threading for concurrent access is also an example of a program which will go into a bad state, but that is not UB per the C standard (since the C standard has no notion of multithreading).


The C standard definitely has opinions on races


Please share a link.


I mean like C11 has a whole atomics addition and memory model to go with it


I'm familiar with that very sort of bug, but I don't see how it's a failure of the language. To be convinced of that I think I'd at least need to be shown what a good solution to that problem would look like (at the level of the language and/or standard library).


Rust statically enforces that you have exclusive access to a collection to mutate it. This prevents also having an active iterator.

You also have languages using immutable or persistent data structures in their std lib to side-step the problem.


So surely you know by heart the circa 200 cases documented in ISO C, and the even greater list documented in the ISO C++ standard documents.

Because despite knowing both since the 1990s, I would rather leave that to static analysis tools.


I've spent hours debugging memory alignment issues. It's not fun. The problem is that you don't know (at first) the full space of UB. So you spend the first 10 years of programming suffering through all kinds of weird UB and then, at the end of the pipeline, claim "pftt, just git gud at it. C is perfect!".


Maybe I got lucky, because on my first C job I got told to make sure to stick to ISO C (by which they probably mostly meant not to use compiler-specific extensions), so I got down the rabbit hole of reading up on the ISO specification and on what it does and doesn’t guarantee.

Making sure you have no UB certainly slows you down considerably, and I strongly prefer languages that can catch all non-defined behavior statically for sure, but I don’t find C to be unmanageable.

Memory alignment issues only happen when you cast pointers from the middle of raw memory to/from other types, which, yes, is dangerous territory, and you have to know what you are doing there.
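A sketch of the usual way to stay out of that territory when parsing raw bytes (the function name load_u32_unaligned is hypothetical): copy the bytes into a properly typed object with memcpy instead of casting the pointer.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value stored in host byte order at an arbitrary offset.
 * Dereferencing (const uint32_t *)(buf + offset) would be undefined if the
 * address is misaligned; memcpy has no alignment requirement. */
uint32_t load_u32_unaligned(const unsigned char *buf, size_t offset)
{
    uint32_t value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

Compilers commonly turn a fixed-size memcpy like this into a single load on targets that allow unaligned access, so the safe form does not have to be slower.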


It isn't so much that it is unintuitive, for the most part[1], but rather that there are a lot of things to keep track of, and a seemingly innocuous change in one part of the program can potentially result in UB somewhere far away. And usually such bugs are not code that is blatantly undefined behavior, but rather code that is well defined most of the time, but in some edge case can trigger undefined behavior.

It would help if there was better tooling for finding places that could result in UB.

[1]: although some of them can be a little surprising, like the fact that overflow is defined for unsigned types but not signed types


I think the reason that signed integer overflow is undefined is that it wasn't uncommon at the time to have architectures that didn't represent signed integers with 2's complement.

Even today you may find issues with signed integers. The ESP32 vector extensions seem to saturate signed integer overflow [1].

[1]: https://bitbanksoftware.blogspot.com/2024/01/surprise-esp32-...


I agree. I do not find UB very problematic in practice. It is still certainly a challenge when writing security-sensitive code to fully make sure there is no issue left. (Also, of course, model checkers and run-time verified code such as eBPF exist.)

But the amount of actual problems I have with UB in typical projects is very low by just some common sense and good use of tools: continuous integration with sanitizers, using pointers to arrays instead of raw pointers (where a sanitizer then does bounds checks), avoiding open coded string and buffer operations, also abstracting away other complicated data structures behind safe interfaces, and following a clear policy about memory ownership.
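A small sketch of the pointer-to-array idea, with a hypothetical SAMPLE_COUNT and function name: because the parameter type carries the length, the compiler and tools such as -fsanitize=bounds have a known index range to check against, which a bare `int *` does not give them.

```c
#include <stddef.h>

#define SAMPLE_COUNT 8   /* hypothetical fixed size for illustration */

/* Takes a pointer to the whole array, so the element count is part of
 * the parameter type rather than a separate, unchecked convention. */
int sum_samples(int (*samples)[SAMPLE_COUNT])
{
    int total = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; i++) {
        total += (*samples)[i];
    }
    return total;
}
```

A caller passes `&array`, and passing an array of a different length is a type mismatch the compiler can diagnose.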


Would you mind sharing how you became sensitized to UB code? Did you just read the C spec, carefully digest it, and then read/write lots of C? Or do you have other recommendations for someone else interested in intuiting UB as well?


I hung out in comp.std.c, read the C FAQ (https://c-faq.com/), and yes, read the actual language spec.

For every C operation you type, ask yourself what is its "contract", that is, what are the preconditions that the language or the library function expects the programmer to ensure in order for their behavior to be well-defined, and do you ensure them at the particular usage point? Also, what are the failure modes within the defined behavior (which result in values or states that may lead to precondition violations in subsequent operations)? This contractual thinking is key to correctness in programs in general, not just in C. The consequences of incorrect code are just less predictable in C.


What helped me was instrumenting a build of an older game engine version with Clang's UB sanitizer and running it for a few weeks. Granted, I had to get the research approved by management to have that much time, but I learned some things I had never seen in twenty-ish years of using C++.


I'm sorry but OP seems to be vastly overestimating their abilities. Every study about bugs related to UB shows that even the best programmers will make mistakes, and often mistakes that are nearly impossible to prevent without static tools because of the action-at-a-distance nature of the harder ones (unless you had the whole code base in your head, and you paid enormous attention to the consequences of every single instruction you wrote, you just couldn't have prevented UB).


> Every study about bugs related to UB

Are about C++. There's an order of magnitude difference in the cognitive level to visually spot UB in C code vs visually spotting UB in C++ code.


You mean studies from Google, which explicitly has a culture of dumbing down software development, and heavily focuses on theoretical algorithmic skills rather than technical ones?


Google hires the best developers in the world. They pay well beyond anyone else except the other big SV tech giants, who compete for the best. I don't work for them but if money was my main motivator and they had jobs not too far from me I would totally want to. My point is: don't pretend you're superior to them. You're very likely not, and even if you are really good, they're still about the same level as you. If you think they're doing "dumb" development, I can only think you're suffering from a very bad case of https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect , without meaning disrespect.


Ignoring the fact this isn't even true, you're completely misunderstanding my point.

As I said, Google does not prioritize for technical expertise, primarily because that's quite individual-centric. They're a large organization and their goal is to make sure people are formatted and replaceable.

They hire generally smart people and mold them to solve problems with the frameworks that they've previously built, with guidelines that are set such that anyone can come and pick it up and contribute to it, in order to facilitate maintenance in case of turnover or changing teams.

They also hire a lot of people straight out of university, with many people spending large portions of their career there without seeing much of the outside world.

As a result, their workforce is not particularly adept at using third-party tools in a variety of situations; they're optimized to know how things work at Google, which is its own sub- (or arguably mono-)culture.

Being an expert requires using a tool in many diverse codebases for widely different applications and domains. A large organization like this is not a particularly good data point to assess whether people can become good experts knowledgeable about the gotchas of a programming language.


> their goal is to make sure people are formatted and replaceable.

Why else would corporations be the exclusive authors of "middle-level" languages like Java, C#, and Go? ;p JS and Python are too slow, but we can't find enough C, C++, or Rust developers! Let's invent these abominations instead with which we can churn out more good-enough mediocrity per quarter than ever before!


You say this, and yet every single major project written in C has undefined behavior issues. The Linux kernel even demanded and uses a special flag in GCC to define some of this UB (especially the most brain dead one, signed integer overflow).


The linux kernel nowadays uses the fact that signed overflow is UB to detect problems using sanitizers. It turns out the defined unsigned wraparound is now the hard problem.


> but “ridiculous amount” and “difficult to avoid” is overstating it

Maybe you can argue that C doesn't have a “ridiculous amount” of UB (even though the number is large), but C++ is so much worse I don't think saying it's “ridiculous” is off the mark.

And not only is the amount already ridiculous, but every new feature introduced in modern versions of C++ adds its own brand new UB!


If you count the number of UB cases in the standard, then yes, 200 cases is high. There is some ongoing effort to eliminate many of them. But it should also be noted that almost all of those cases are not really problematic in practice. The problematic ones are signed overflow, out-of-bounds, use-after-free, and aliasing issues. Signed overflow is IMHO not a problem anymore because of sanitizers. In fact, I believe that unsigned wraparound is much more problematic. Out-of-bounds and use-after-free can be dealt with by having good coding strategies and for out-of-bounds issues I expect that we will have full bounds safety options in compilers soon. Aliasing issues can also mostly be avoided by not playing games with types. Use-after-free is more problematic (and where the main innovation of Rust is). But having a good ownership model and good abstractions also avoids most problems here in my experience. I rarely have actual problems in my projects related to this.
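To illustrate the unsigned wraparound point: the wrapped result is fully defined, so the language has nothing to complain about, yet a wrapped size computation can still under-allocate a buffer. A minimal sketch, assuming GCC or Clang (`__builtin_mul_overflow` is a compiler builtin, not standard C); `alloc_array` is a hypothetical helper name used only for illustration:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for `count` elements of `size` bytes.
 * Returns NULL if count * size would wrap around, instead of silently
 * allocating a buffer that is too small. */
void *alloc_array(size_t count, size_t size) {
    size_t total;
    /* GCC/Clang builtin: returns true if the multiplication overflowed. */
    if (__builtin_mul_overflow(count, size, &total)) {
        return NULL;
    }
    return malloc(total);
}
```

A caller would treat a NULL return like any other allocation failure.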


> Signed overflow is IMHO not a problem anymore because of sanitizers

IIRC the overflow in SHA3's reference implementation was hard to catch even for static analysis tools, and had the practical impact of making it easy to generate collisions.


A sanitizer could have transformed this into a run-time trap.


> Out-of-bounds and use-after-free can be dealt with by having good coding strategies

You're basically saying that every project in the wild has bad “coding strategy”…

> I expect that we will have full bounds safety options in compilers soon

Which will be disabled in most places because of the overhead it incurs.

> But having a good ownership model and good abstractions also avoids most problems here in my experience. I rarely have actual problems in my projects related to this.

It's easier said than done when you have no way of enforcing the ownership, and practically intractable when not working alone on a codebase.


No, I am not saying that every project in the wild has a bad "coding strategy". Some of the most reliable software I use every day is written in C. Some of it I have used for decades without ever encountering a crash or similar bug. So the meme that "all C code crashes all the time because of UB" is clearly wrong. It is not intractable; in my experience you just have to document some rules and occasionally make sure they are followed. But I agree that a formal system to enforce ownership is desirable.


> So the meme that "all C code crashes all the time because of UB" is clearly wrong.

It's not about crash at all, but “all software has security vulnerabilities because of UB” is unfortunately true.

> It is not intractable, in my experience you just have to document some rules and occasionally make sure they are followed.

If even DJB couldn't get that part perfectly I'm pretty certain you cannot either.


Not all software is security relevant. I run a lot of large simulations. I do not care at all if that software would crash on specially prepared inputs. I do care that it's as fast as possible.


> Not all software is security relevant

You're right, only software that has actual users cares about security.

> I run a lot of large simulations. I do not care at all if that software would crash on specially prepared inputs.

But it's not the 50s anymore and digital simulation is a tiny fraction of the code ever written nowadays so it's not a very good argument.

> I do care that it's as fast as possible.

You don't realize it but it ruins your entire argument. If speed is all that matters for your use-case then:

- there's no way you can use runtime bound-checks, and you unconditionally need the compiler to optimize as much as possible around UB, even if it breaks your program every once in a while.

- you likely can't afford dynamic memory allocation, which makes the UAF/double free kind of bugs irrelevant. Not out of “good coding strategy” but because you never free in the first place…

These are hypotheses that don't apply to the software industry at large.


Right and there are other ways to achieve strong security guarantees than memory safety, e.g. at the OS level by sandboxing critical operations.


1. It's much more expensive than using a memory-safe language in the first place (maybe cheaper if you have a big codebase already, but still very expensive and not worth it at all for new code)

2. Sandbox escapes are commonplace, and not everything can even be sandboxed at all.


if anything, I think the wording “ridiculous amount” of UB in the context of C is understating things.


This is about asking the compiler for a constexpr and receiving a runtime evaluation, not ownership semantics.


I think the biggest problem is people conflating "undefined" with "unknowable". They act like because C doesn't define the behavior you can't expect certain compilers to behave a certain way. GCC handles signed overflows consistently, even though the concept is undefined at a language level; the same goes for many other UBs. And the big compilers are all pretty consistent with each other.

Is it annoying if you want to make sure your code compiles the same in different compiler sets? Sure, but that's part of the issue with the standards body and the compiler developers existing independent of each other. Especially considering plenty of times C/C++ have tried to enforce certain niche behaviors and GCC/ICC/Clang/etc have decided to go their own ways.


This is dead wrong, and a very dangerous mindset.

All modern C and C++ compilers use the potential of UB as a signal in their optimization options. It is 100% unpredictable how a given piece of code where UB happens will actually be compiled, unless you are intimately familiar with every detail of the optimizer and the signals it uses. And even if you are, seemingly unrelated changes can change the logic of the optimizer just enough to entirely change the compilation of your UB segment (e.g. because a function is now too long to be inlined, so a certain piece of code can no longer be guaranteed to have some property, so [...]).

Your example of signed integer overflow is particularly glaring, as this has actually triggered real bugs in the Linux kernel (before they started using a compilation flag to force signed integer overflow to be considered defined behavior). Sure, the compiler compiles all signed integer operations to processor instructions that result in two's complement operations, and thus overflow on addition. But, since signed integer overflow is UB, the compiler also assumes it never happens, and optimizes your program accordingly.

For example, the following program will never print "overflow" regardless of what value num has:

  #include <stdio.h>

  int foo(int num) {
    int x = num + 100;
    if (x < num) {
      printf("overflow occurred");
    }
    return x;
  }
In fact, you won't even find the string "overflow" in the compiled binary, as the whole `if` is optimized away [0], since, per the standard, signed integer overflow can't occur, so num + 100 is always greater than num for any (valid) value of num.

[0] https://godbolt.org/z/zzdr4q1Gx
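For contrast, here is a minimal sketch of an overflow check that the optimizer has no license to remove, because it tests the operand before the addition and so never executes the overflowing operation. The name `add_100_checked` and the bool-plus-out-parameter shape are assumptions made for illustration, not something from the comment above:

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true and stores num + 100 in *out when the sum fits in an int;
 * returns false (leaving *out untouched) when it would overflow.
 * The comparison happens before the addition, so no UB is ever executed. */
bool add_100_checked(int num, int *out) {
    if (num > INT_MAX - 100) {
        return false;
    }
    *out = num + 100;
    return true;
}
```

The check stays meaningful at any optimization level, because every expression in it is defined for every possible `num`.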


> This is dead wrong, and a very dangerous mindset.

It's "dead wrong" that compilers independently choose to define undefined behavior?

Oh, ok; I guess I must just be a stellar programmer to never have received the dreaded "this is undefined" error (or its equivalent) that would inevitably be emitted in these cases then.


I've explained and showed an example of how compilers behave in relation to UB. They don't typically "choose to define it", they choose to assume that it never happens.

They don't throw errors when UB happens, they compile your program under the assumption that any path that would definitely lead to UB can't happen at runtime.

I believe you think that because signed integer addition gets compiled to the `add` instruction that overflows gracefully at runtime, this means that compilers have "defined signed int overflow". I showed you exactly why this is not true. You can't write a C or C++ program that relies on this behavior, it will have optimizer-induced bugs sooner or later.
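If two's-complement wrapping is what a program actually wants, one way to ask for it without relying on UB is to do the arithmetic in unsigned and convert back. This is a sketch under stated assumptions: converting an out-of-range unsigned value to `int` is implementation-defined rather than undefined, GCC documents it as reduction modulo 2^N, and Clang behaves the same way in practice. `wrapping_add` is a hypothetical name:

```c
#include <limits.h>

/* Wrapping addition without signed-overflow UB.
 * Unsigned arithmetic wraps by definition; the conversion back to int is
 * implementation-defined (assumed here to be modular, as on GCC/Clang). */
int wrapping_add(int a, int b) {
    return (int)((unsigned int)a + (unsigned int)b);
}
```

Under that assumption, `wrapping_add(INT_MAX, 1)` yields `INT_MIN`, and the compiler cannot assume the overflow never happens.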


No, I said it had consistent behavior to the compiler.

You seem to think I'm saying "undefined behavior means nothing, ignore it"; when what I'm actually saying is "undefined behavior doesn't mean the compiler hasn't defined a behavior and doesn't necessarily = 'bad'". There's dozens of "UB" that C (and C++) developers rely on frequently, because the compilers have defined some behavior they follow; to the point critical portions of the Linux Kernel rely on it (particularly in the form of pointer manipulations).

TL;DR - Is UB unsafe to rely on generally? Yes. Should you ignore UB warnings? Definitely not. Does UB mean that the compiler has no idea what to do or is lacking some consistent behavior? Also, no.

Know your compiler, and only write code that you know what it does; especially if it's in a murky area like "UB".


Isn't this a terrible failure of the compiler though? Why is it not just telling you that the `if` is a noop?? Damn, using IntelliJ and getting feedback on really difficult logic when a branch becomes unreachable and can be removed makes this sort of thing look like amateur hour.


    if(DEBUG) {
       log("xyz")
    }
Should the compiler emit a warning for such code? Compilers don't behave like a human brain, maybe a specific diagnostic could be added by pattern matching the AST but it will never catch every case.


There's a world of difference between code that's dead because of a static define, and code that's dead because of an inference the compiler made.

A dead code report would be a useful thing, though, especially if it could give the reason for removal. (Something like the list of removed registers in the Quartus analysis report when building for FPGAs.)


> There's a world of difference between code that's dead because of a static define, and code that's dead because of an inference the compiler made.

Not really, that’s the problem. After many passes of transforming the code through optimization it is hard for the compiler to tell why a given piece of code is dead. Compiler writers aren’t just malicious as a lot of people seem to think when discussions like this come up.


Yeah, I know the compiler writers aren't being deliberately malicious. But I can understand why people perceive the end result - the compiler itself - as having become "lawful evil" - an adversary rather than an ally.


  #define DEBUG i<num+100
The example is silly, but you should get the point. DEBUG can be anything.


Fair point, however your example is a runtime check, so shouldn't result in dead code.

(And if DEBUG is a static define then it still won't result in dead code since the preprocessor will remove it, and the compiler will never actually see it.)

EDIT: and now I realise I misread the first example all along - I read "#if (DEBUG)" rather than "if (DEBUG)".


I am guessing there would be a LOT of false positives from compilers removing dead code for good reason. For example, if you only use a portion of a library's enum then it seems reasonable to me that the compiler optimizes away all the if-else branches for enum values that will never manifest.


I don't think it is unreasonable to have an option for "warn me about places that might be UB" that would tell you if it removes something it thinks is dead because it assumed UB doesn't happen?


That's the core of the complaints about how modern C and C++ compilers use UB.


The focus was certainly much more on optimization instead of having good warnings (although many commercial products focus on that). I would not blame compiler vendors exclusively; paying customers certainly also prioritized this.

This is shifting though, e.g. GCC now has -fanalyzer. It does not detect this specific coding error though, but it does detect, for example, issues such as dereferencing a pointer before checking for null.
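For readers unfamiliar with the "dereference before null check" pattern mentioned here, a small sketch. `struct packet` and `packet_length` are hypothetical names, and the buggy ordering appears only in a comment:

```c
#include <stddef.h>

struct packet {          /* hypothetical type for illustration */
    int length;
};

/* Buggy ordering (shown only as a comment):
 *     int len = p->length;        dereference first ...
 *     if (p == NULL) return -1;   ... so this check may be deleted
 * Dereferencing a null p would be UB, so the compiler may assume p is
 * non-null and drop the later check. */

/* Correct ordering: test the pointer before touching it. */
int packet_length(const struct packet *p) {
    if (p == NULL) {
        return -1;
    }
    return p->length;
}
```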


Yes it is. Don't let those who worship the standard gaslight you into thinking any differently.


There are only two models of UB that are useful to compiler users:

1) This is a bad idea and refuse to compile.

2) Do something sensible and stable.

Silently fail and generate impossible to predict code is a third model that is only of use to compiler writers. Hiding behind the spec benefits no actual user.


I think this is a point of view that seems sensible, but probably hasn't really thought through how this works. For example

  some_array[i]
What should the compiler emit here? Should it emit a bounds check? In the event the bounds check fails, what should it do? It is only through the practice of undefined behavior that the compiler can consistently generate code that avoids the bounds check. (We don't need it, because if `i` is out-of-bounds then it's undefined behavior and illegal).

If you think this is bad, then you're arguing against memory unsafe languages in general. A sane position is the one Rust takes, which is that by default, yes indeed, you should always generate the bounds check (unless you can prove it always succeeds). But there will likely always be hot inner loops where we need to discharge the bounds checks statically. Ideally that would be done with some kind of formal reasoning support, but the industry is far from that atm.

For a more in depth read: https://blog.regehr.org/archives/213
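The trade-off described above can be sketched in plain C: an explicit checked accessor for the general case, and an unchecked loop where the loop condition already proves the index is in range. `checked_get` and `sum_all` are hypothetical names, and this is only one possible shape for such an API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Checked read: returns false instead of invoking UB when i is out of range. */
bool checked_get(const int *arr, size_t len, size_t i, int *out) {
    if (i >= len) {
        return false;
    }
    *out = arr[i];
    return true;
}

/* Hot loop: the loop condition already guarantees i < len, so a
 * per-element check would be redundant and plain indexing is used. */
long sum_all(const int *arr, size_t len) {
    long total = 0;
    for (size_t i = 0; i < len; ++i) {
        total += arr[i];
    }
    return total;
}
```

In C the "proof" for the hot loop lives only in the programmer's head, which is the gap the comment above is pointing at.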


> What should the compiler emit here?

It should emit an instruction to access memory location some_array + i.

That's all most people that complain about optimizations on undefined behavior want. Sometimes there are questions that are hard to answer, but in a situation like this, the answer is "Try it and hope it doesn't corrupt memory." The behavior that's not wanted is for the compiler to wildly change behavior on purpose when something is undefined. For example, the compiler could optimize

  if(foo) {
      misbehaving_code();
      return puppies;
  } else {
      delete_data();
  }
into

  delete_data();


I think the "do the normal" thing is very easy to say and very hard to do in general. Should every case of `a / b` inject a `(b != 0) && !(a == INT_MIN && b == -1)` check? If that evaluates to `false` then what should the program do? Or: should the compiler assume this can't happen? Languages with rich runtimes get around this by having an agreed upon way to signal errors, at the expense of runtime checking. An example directly stolen from the linked blog post:

  int stupid (int a) {
    return (a+1) > a;
  }
What should the compiler emit for this? Should it check for overflow, or should it emit the asm equivalent of `return 1`? If your answer is check for overflow: then should the compiler be forced to check for overflow every time it increments an integer in a for loop? If your answer is don't check: then how do you explain this function behaving completely weird in the overflow case? The point I'm trying to get at is that "do the obvious thing" is completely dependent on context.
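To make the division case concrete: `a / b` is undefined in C for exactly two kinds of input, `b == 0` and `INT_MIN / -1` (the quotient does not fit in an `int`). A sketch of what an explicit check looks like when the programmer, rather than the compiler, decides how to report the error; `checked_div` is a hypothetical name:

```c
#include <limits.h>
#include <stdbool.h>

/* Returns false for the inputs where `a / b` is undefined:
 * division by zero, and INT_MIN / -1. Otherwise stores the quotient. */
bool checked_div(int a, int b, int *out) {
    if (b == 0 || (a == INT_MIN && b == -1)) {
        return false;
    }
    *out = a / b;
    return true;
}
```

The cost is a couple of comparisons per division, which is the price the comment is asking whether every program should be forced to pay.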


The compiler should emit the code to add one to a, and then code to check if the result is greater than a. This is completely evident, and is what all C and C++ compilers did for the first few decades of their existence. Maybe a particularly smart compiler could issue a `jo` instead of a `cmp ax, bx; jz `.

The for loop example is silly. There is no reason whatsoever to add an overflow check in a for loop. The code of a standard for loop, `for (int i = 0; i < n; i++)` doesn't say to do any overflow check, so why would the compiler insert one? Not inserting overflow checks is completely different than omitting overflow checks explicitly added in the code. Not to mention, for this type of loop, the compiler doesn't need any UB-based logic to prove that the loop terminates - for any possible value of n, including INT_MAX, this loop will terminate, assuming `i` is not modified elsewhere.

I'd also note that the "most correct" type to use for the iteration variable in a loop used to access an array, per the standard, would be `size_t`, which is an unsigned type, which does allow overflow to happen. The standard for loop should be `for (size_t i = 0; i < n; ++i)`, which doesn't allow the compiler to omit any overflow checks, even if any were present.


The interesting case is what should the code do if inlined on a code path where a is deduced to be INT_MAX.

A compiler will just avoid inlining any code here, since it's not valid, and thus by definition that branch cannot be taken, removing cruft that would impact the instruction cache.


The original code is not invalid, even by the standard. It's not even undefined behavior. It is perfectly well defined as equivalent to `return true` according to the standard, or it can be implemented in the more straightforward way (add one to a, compare the result with a, return the result of the comparison). Both are perfectly valid compilations of this code according to the standard. Both allow inlining the function as well.

Note that `return 1 < 0` is also perfectly valid code.

The problem related to UB appears if the function is inlined in a situation where a is INT_MAX. That causes the whole branch of code to be UB, and the compiler is allowed to compile the whole context with the assumption that this didn't happen.

For example, the following function can well be compiled to print "not zero":

  int foo(int x) {
    if (x == 0) {
      return stupid(INT_MAX);
    } else {
      printf("not zero");
      return -1;
    } 
  }

  foo(0); //prints "not zero"
This is a valid compilation, because stupid(INT_MAX) would be UB, so it can't happen in a valid program. The only way for the program to be valid is for x to never be 0, so the `if` is superfluous and `foo` can be compiled to only have the code where UB can't happen.

Edit: Now, neither clang nor gcc seem to do this optimization. But if we replace stupid(INT_MAX) with a "worse" kind of UB, say `*(int*)NULL = 1`, then they do indeed compile the function to simply call printf [0].

[0] https://godbolt.org/z/McWddjevc


I don't know what you're ranting on about.

Functions have parameters. In the case of the previous function, it is not defined if its parameter is INT_MAX, but is defined for all other values of int.

Having functions that are only valid on a subset of the domain defined by the types of their parameters is a commonplace thing, even outside of C.

Yes, a compiler can deduce that a particular code path can be completely elided because the resulting behaviour wasn't defined. There is nothing surprising about this.


The point is that a compiler can notice that one branch of your code leads to UB and elide the whole branch, even eliding code before the UB appears. The way this cascades is very hard to track and understand - in this case, the fact that stupid() is UB when called with INT_MAX makes foo() be UB when called with 0, which can cascade even more.

And no, this doesn't happen in any other commonly-used language. No other commonly-used language has this notion of UB, and certainly not this type of optimization based on deductions made from UB. A Java function that is not well defined over its entire input set will trigger an exception, not cause code calling it with the parameters it doesn't accept to be elided from the executable.

Finally, I should mention that the compiler is not even consistent in its application of this. The signed int overflow UB is not actually used to elide this code path. But other types of UB, such as null pointer dereference, are.


It is perfectly possible to write a function in pure Java that would never terminate when called with parameter values outside of the domain for which it is defined. It is also possible for it to yield an incorrect value.

Your statement that such a function would throw an exception is false.

Ensuring a function is only called for the domain it is defined on is entirely at the programmer's discretion regardless of language. Some choose to ensure all functions are defined for all possible values, but that's obviously impractical due to combinatorial explosions. Types that encapsulate invariants are typically seen as the solution for this.
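A rough sketch of what "types that encapsulate invariants" can look like in C, applied to the `a + 1` example from earlier in the thread. The names are hypothetical, and C cannot stop callers from filling in the struct by hand, so the invariant holds by convention only:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical wrapper: an int known to be strictly less than INT_MAX.
 * The only intended way to build one is via incrementable_from(). */
typedef struct {
    int value;
} incrementable;

bool incrementable_from(int raw, incrementable *out) {
    if (raw == INT_MAX) {
        return false;
    }
    out->value = raw;
    return true;
}

/* a.value + 1 cannot overflow here because the invariant was checked
 * once, at construction time. */
int increment(incrementable a) {
    return a.value + 1;
}
```

The domain check runs once where the value enters the system, and every function taking an `incrementable` can then rely on it.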


I didn't claim that all functions are either correct or throw an exception in Java. I said that UB doesn't exist in Java, in the sense of a Java program that compiles, but for which no semantics are assigned and the programmer is not allowed to write it. Of the situations that are UB in C or C++, some are well-defined in Java (signed integer overflow, non-terminating loops that don't do IO/touch volatile variables), many others throw exceptions (out of bounds access, divide by 0), and a few are simply not possible (use after free). Another few are what the C++ standard would call "unspecified behavior", such as unsynchronized concurrent access.

And yes, it's the programmer's job to make sure functions are called in their domain of application. But it doesn't help at all when the compiler prunes your code-as-written to remove flows that would have reached an error situation, making debugging much harder when you accidentally do call them with illegal values.


If you want the compiler to output exactly the code as written (or as close as possible to it for the target architecture), then most compilers support that. It's called turning off optimizations. You can do that if that's what you want.

Optimizing compilers on the other hand are all about outputting something that is equivalent to your code UNDER THE RULES OF THE LANGUAGE while hopefully being faster. This condition isn't there to fuck you over; it's there because it is required for the compiler to do more than very very basic optimizations.


> Optimizing compilers on the other hand are all about outputting something that is equivalent to your code UNDER THE RULES OF THE LANGUAGE while hopefully being faster.

The problem here is how far you stretch this "equivalent under the rules of the language" concept. I think many agree that C and C++ compilers have chosen to play language lawyer games for little performance gain in real-world code, while introducing very real bugs.

As it stands today, C and C++ are the only mainstream languages that have non-timing-related bugs in optimized builds that aren't there in debug builds - putting a massive burden on programmers to find and fix these bugs. The performance gain from this is extremely debatable. But what is clear is that you can create very performant code without relying on this type of UB logic.


Ah, but what if it writes so far off the array that it messes with the contents of another variable on the stack that is currently cached in a register? Should the compiler reload that register because the out of bounds write might have updated it? Probably not, let's just assume they didn't mean to do that and use the in-register version. That's taking advantage of undefined behavior to optimize a program.


> Ah, but what if it writes so far off the array that it messes with the contents of another variable on the stack that is currently cached in a register? Should the compiler reload that register because the out of bounds write might have updated it? Probably not, let's just assume they didn't mean to do that and use the in-register version.

Yes, go ahead and assume it won't alias outside the rules of C and hope it works out.

> That's taking advantage of undefined behavior to optimize a program.

I don't know if I really agree with that, but even taking that as true, that's fine. The objection isn't to doing any optimizations. Assuming memory didn't get stomped is fine. Optimizations that significantly change program flow in the face of misbehavior and greatly amplify it are painful. And lots of things are in the middle.


> That's all most people that complain about optimizations on undefined behavior want

If this was true most of them could just adopt Rust where of course this isn't a problem.

But in fact they're often vehemently against Rust. They like C and C++ where they can write total nonsense which has no meaning but it compiles and then they can blame the compiler for not reading their mind and doing whatever it is they thought it "obviously" should do.


I could be wrong here since I don't develop compilers, but from my understanding many of the undefined behaviours in C are the product of not knowing what the outcome will be for edge cases or due to variations in processor architecture. In these cases, undefined behaviour was intended as a red flag for application developers. Many application developers ended up treating the undefined behaviours as deterministic provided that certain conditions were met. On the other hand, compiler developers took undefined behaviour to mean they could do what they wanted, generating different results in different circumstances, thus violating the expectations of application developers.


I think the problem is that some behaviours are undefined where developers expect them to be implementation-defined (especially in C's largest remaining stronghold, the embedded world) - i.e. do what makes sense on this particular CPU.

Signed overflow is the classic example - making that undefined rather than implementation-defined is a decision that makes less sense to those of us living in today's exclusively two's-complement world than it would have done when it was taken.

It's become more of an issue in recent years as compilers started doing more advanced optimisations, which some people perceived as the compiler being "lawful evil".

What it reminds me of is that episode of Red Dwarf with Kryten (with his behavioural chip disabled) explaining why he thought it was OK to serve roast human to the crew: "If you eat chicken then obviously you'd eat your own species too, otherwise you'd just be picking on the chickens"!


Why not just turn off (or down) optimizations? I mean, optimization is not even activated by default


Unfortunately it's not necessarily specified what counts as "an optimisation". For example, the (DSP-related) compiler I worked on back in the day had an instruction selection pass, and much of the performance of optimised code came from it being a very good instruction selector. "Turning off optimisations" meant not running compiler passes that weren't required in order to generate code, we didn't have a second sub-optimal instruction selector.

And undefined behaviour is still undefined behaviour without all the optimisation passes turned on.


> It should emit an instruction to access memory location some_array + i.

That's definitely what compilers emit. The UB comes from the fact that the compiler cannot guarantee how the actual memory will respond to that. Will the OS kill you? Will your bare metal MCU silently return garbage? Will you corrupt your program state and jump into branches that should never be reached? Who knows. You're advocating for wild behavior but you don't even realize it.

As for your example. No, the compiler couldn't optimize like that. You seem to have some misconceptions about UB. If foo is false in your code, then the behavior is completely defined.


> If foo is false in your code, then the behavior is completely defined.

That's the point. If foo is false, both versions do the same thing. If foo is true, then it's undefined and it doesn't matter. Therefore, assume foo is false. Remove the branch.


Yes! This is exactly the point. It is undefined, so given that, it could do what the other branch does, so you can safely remove that branch.

you get it, but a lot of other people don't understand just how undefined, undefined code is.


We do. We just wish undefined was defined to be a bit less undefined, and are willing to sacrifice a bit of performance for higher debuggability and ability to reason.


Why not use -fsanitize=signed-integer-overflow ?


It could do what the other branch does, in theory.

But let me put it this way. If you only had the misbehaving_code(); line by itself, the compiler would rightly be called crazy and malicious if it compiled that to delete_data();

So maybe it's not reasonable to treat both branches as having the same behavior, even if you can.


> Silently fail and generate impossible to predict code is a third model that is only of use to compiler writers. Hiding behind the spec benefits no actual user.

A significant issue is that compiler "optimizations" aren't gaining a lot of general benefit anymore, and yet they are imposing a very significant cost on many people.

Lots of people still are working on C/C++ compiler optimizations, but nobody is asking if that is worthwhile to end users anymore.

Data suggests that it is not.


What data?


TFA? Quoting:

    Compiler writers measure an "optimization" as successful if they can find any example where the "optimization" saves time. Does this matter for the overall user experience? The typical debate runs as follows:

    In 2000, Todd A. Proebsting introduced "Proebsting's Law: Compiler Advances Double Computing Power Every 18 Years" (emphasis in original) and concluded that "compiler optimization work makes only marginal contributions". Proebsting commented later that "The law probably would have gone unnoticed had it not been for the protests by those receiving funds to do compiler optimization research."

    Arseny Kapoulkine ran various benchmarks in 2022 and concluded that the gains were even smaller: "LLVM 11 tends to take 2x longer to compile code with optimizations, and as a result produces code that runs 10-20% faster (with occasional outliers in either direction), compared to LLVM 2.7 which is more than 10 years old."

    Compiler writers typically respond with arguments like this: "10-20% is gazillions of dollars of computer time saved! What a triumph from a decade of work!"
We are spinning the compilers much harder and imposing changes on end programmers for roughly 10-20% over a decade. That's not a lot of gain in return for the pain being caused.

I suspect most programmers would happily give up 10% performance on their final program if they could halve their compile times.


> We are spinning the compilers much harder and imposing changes on end programmers for roughly 10-20% over a decade. That's not a lot of gain in return for the pain being caused.

> I suspect most programmers would happily give up 10% performance on their final program if they could halve their compile times.

10% at FAANG scale is around a billion dollars per year. There's a reason why FAANG continues to be the largest contributor by far to LLVM and GCC, and it's not because they're full of compiler engineers implementing optimizations for the fun of it.


> There's a reason why FAANG continues to be the largest contributor by far to LLVM and GCC, and it's not because they're full of compiler engineers implementing optimizations for the fun of it.

And, yet, Google uses Go which is glop for performance (Google even withdrew a bunch of people from the C/C++ working groups). Apple funded Clang so they could get around the GPL with GCC and mostly care about LLVM rather than Clang. Amazon doesn't care much as their customers pay for CPU.

So, yeah, Facebook cares about performance and ... that's about it. Dunno about Netflix who are probably more concerned about bandwidth.


Half of what? I'm not overly concerned about how long a prod build & deploy takes if it's automated. 10 minute build instead of 5 for 10% perf gain is probably worth it. Probably more and more worth it as you scale up because you only need to build it once then you can copy the binary to many machines where they all benefit.


Can't you give it a different -O level?

-O0 gives you what you are after.


You would be very wrong on that last point.


Fun fact: you and GP are both right. The goals of the 'local' build a programmer does to check what he wrote are at odds with the goals of the 'build farm' build meant for the end user. The former should be optimized to reduce build time and the latter to reduce run time. In gamedev we separate them into different build configurations.


Right, and if anything, compilers are conservative when it comes to the optimization parameters they enable for release builds (i.e. with -O2/-O3). For most kinds of software even a 10x further increase in compile times could make sense if it meant a couple of percent faster software.


The result of a binary search is undefined if the input is not sorted.

How do you expect the compiler to statically guarantee that this property holds in all the cases you want to do a binary search?


If something is good for compiler developers, it is good for compiler users, in the sense that it makes it easier for the compiler developers to make the compilers we need.


Russ Cox has a nice article about it: C and C++ Prioritize Performance over Correctness (https://research.swtch.com/ub)


I think you're replying to a strawman. Here's the full quote:

> The excuse for not taking responsibility is that there are "language standards" saying that these bugs should be blamed on millions of programmers writing code that bumps into "undefined behavior", rather than being blamed on the much smaller group of compiler writers subsequently changing how this code behaves. These "language standards" are written by the compiler writers.

> Evidently the compiler writers find it more important to continue developing "optimizations" than to have computer systems functioning as expected. Developing "optimizations" seems to be a very large part of what compiler writers are paid to do.

The argument is that the compiler writers are themselves the ones deciding what is and isn't undefined, and they are defining those standards in such a way as to allow themselves latitude for further optimizations. Those optimizations then break previously working code.

The compiler writers could instead choose to prioritize backwards compatibility, but they don't. Further, these optimizations don't meaningfully improve the performance of real world code, so the trade-off of breaking code isn't even worth it.

That's the argument you need to rebut.


Perhaps the solution is also to rein in the language standard to support stricter use cases. For example, what if there was a constant-time { ... }; block in the same way you have extern "C" { ... }; . Not only would it allow you to have optimizations outside of the block, it would also force the compiler to ensure that a given block of code is always constant-time (as a security check done by the compiler).


> Perhaps the solution is also to rein in the language standard to support stricter use cases.

Here's a nine-year-old comment from the author asking for exactly that:

https://groups.google.com/g/boring-crypto/c/48qa1kWignU/m/o8...


That thread already spawned a GCC-wiki page https://gcc.gnu.org/wiki/boringcc, with a quote in bold:

>The only thing stopping gcc from becoming the desired boringcc is to find the people willing to do the work.

And frankly, nine years is enough time to build a C compiler from scratch.


We can debate whether it's reasonable or not to optimize code based on undefined behavior. But we should at least have the compiler emit a warning when this happens. Just like we have the notorious "x makes an integer from a pointer without a cast", we could have warnings for when the compiler decides to not emit the code for an if branch checking for a null pointer or an instruction zeroing some memory right before deallocation (I think this is not UB, but still a source of security issues due to extreme optimizations).
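The zeroing case can be sketched in C. This is a minimal illustration, not a vetted implementation: `secure_zero` and `use_and_wipe_key` are hypothetical names, and the approach assumes the compiler keeps stores made through a `volatile` pointer, which GCC and Clang do in practice. Where available, platform functions such as `explicit_bzero` or C11 Annex K `memset_s` are the better choice.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: overwrite a buffer through a volatile pointer.
 * A plain memset() right before free() is a dead store from the
 * optimizer's point of view and may be removed; volatile stores are
 * treated as side effects that must be kept. */
static void secure_zero(void *buf, size_t len)
{
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--) {
        *p++ = 0;
    }
}

/* Hypothetical usage: fill a key buffer, wipe it, then release it. */
static int use_and_wipe_key(void)
{
    unsigned char *key = malloc(32);
    if (key == NULL) {
        return -1;
    }
    memset(key, 0xA5, 32);   /* stand-in for real key material */
    secure_zero(key, 32);    /* wipe before the memory is released */
    free(key);
    return 0;
}
```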


I would say that allowing undefined behavior is a bug in itself. It was an understandable mistake for 1970, especially for such a hacker language as C. But now if a compiler can detect UB, it should warn you about it (and mostly it does by default), and you should treat that warning as an error.

So, well, yes, if the bug is due to triggering UB, some blame should fall on the developer, too.


Plenty of undefined behavior is actually perfectly good code the compiler has no business screwing up in any way whatsoever. This is C, we do evil things like cast pointers to other types and overlay structures onto byte buffers. We don't really want to hear about "undefined" nonsense, we want the compiler to accept the input and generate the code we expect it to. If it's undefined, then define it.

This attitude turns bugs into security vulnerabilities. There's a reason the Linux kernel is compiled with -fwrapv -fno-strict-aliasing -fno-delete-null-pointer-checks and probably many more sanity restoring flags. Those flags should actually be the default for every C project.
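A small sketch of the pattern that `-fno-delete-null-pointer-checks` guards against. The struct and function names are made up for illustration. In the first function the dereference comes before the check, so a compiler that assumes UB never happens may conclude the pointer is non-null and drop the check; the second ordering has no such problem.

```c
#include <stddef.h>

struct packet {
    int length;
};

/* Risky ordering: p is dereferenced first. Since dereferencing NULL is
 * UB, an optimizer may treat the later check as always false and
 * remove it. Only call this with a valid pointer. */
static int length_check_after(struct packet *p)
{
    int len = p->length;
    if (p == NULL) {
        return -1;
    }
    return len;
}

/* Safe ordering: the check happens before any dereference, so it
 * cannot be removed on the basis of UB. */
static int length_check_first(struct packet *p)
{
    if (p == NULL) {
        return -1;
    }
    return p->length;
}
```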


> Calling the compiler buggy for not doing what you want when you commit Undefined Behavior is like calling dd buggy for destroying your data when you call it with the wrong arguments.

No, it's like calling dd buggy for deliberately zeroing all your drives when you call it with no arguments.

How did we let pedantic brainless "but muh holy standards!!!1" religious brigading triumph over common sense?

The standards left things undefined in the hopes that the language would be more widely applicable and implementers would give those areas thought themselves and decide the right thing. Not so that compiler writers can become adversarial smartasses. It even suggests that "behaving in a manner characteristic of the environment" is a possible outcome of UB, which is what "the spirit of C" is all about.

In my observations this gross exploitation of UB started with the FOSS compilers, GCC and Clang being the notable examples. MSVC or ICC didn't need to be so crazy, and yet they were very competitive, so I don't believe claims that UB is necessary for optimisation.

The good thing about FOSS is that those in power can easily be changed. Perhaps it's time to fork, fix, and fight back.


> The standards left things undefined in the hopes that the language would be more widely applicable and implementers would give those areas thought themselves and decide the right thing.

That sounds like implementation-defined behavior, not undefined behavior.


Same difference. You still have to think about what's right.


They are different by definition.


> The good thing about FOSS is that those in power can easily be changed. Perhaps it's time to fork, fix, and fight back.

Huzzah! Lead on, then.


I do not think there is a reason to fork. Just contribute. I found GCC community very welcoming. But maybe not come in with an "I need to take back the compiler from evil compiler writers" attitude.


From personal experience, they couldn't care less if they can argue it's "undefined". All they do is worship The Holy Standard. They follow the rules blindly without ever thinking whether it makes sense.

> But maybe not come in with an "I need to take back the compiler from evil compiler writers" attitude.

They're the ones who started this hostility in the first place.

If even someone like Linus Torvalds can't get them to change their ways, what chances are there for anyone else?


Okay, so you're not up to making a boringcc compiler from the nine-years old proposal of TFA's author, and you don't believe that it's possible to persuade the C implementers to adopt different semantics, so... what do you propose then? It can be only three things, really: either "stop writing crypto algorithms altogether", or "write them in some different, sane language", or "just business as usual: keep writing them in C while complaining about those horrible implementers of C compilers". But perhaps you have a fourth option?

P.S. "All they do is worship The Holy Standard. They follow the rules blindly without ever thinking whether it makes sense" — well, no. Who do you think writes The Holy Standard? Those compiler folks actually comprise quite a number of the members of JTC1/SC22/WG14, and they are also the ones who actually get to implement that standard. So to quote JeanHeyd Meneide of thephd.dev, "As much as I would not like this to be the case, users – me, you and every other person not hashing out the bits ‘n’ bytes of your Frequently Used Compiler — get exactly one label in this situation: bottom bitch".


This was a very good example for the nonconstructive attitude I was talking about.


> They're the ones who started this hostility in the first place.

"How dare these free software developers not do exactly what I want."

Talk about being entitled. If you can't manage to communicate your ideas in a way that will convince others to do the work you want to see done then you need to either pay (and find someone willing to do the work for payment) or do the work yourself.


I don’t really think there is either, but I figured it was a funny way to present the “there never was anything to prevent you from forking in the first place” argument.


Optimizing compilers that don't allow disabling all optimizations makes it impossible to write secure code with them. Must do it with assembly.


Disabling all optimizations isn't even enough- fundamentally what you need is a much narrower specification for how the source language maps to its output. Even -O0 doesn't give you that, and in fact will often be counterproductive (e.g. you'll get branches in places that the optimizer would have removed them).

The problem with this is that no general purpose compiler wants to tie its own hands behind its back in this way, for the benefit of one narrow use case. It's not just that it would cost performance for everyone else, but also that it requires a totally different approach to specification and backwards compatibility, not to mention deep changes to compiler architecture.

You almost may as well just design a new language, at that point.


> You almost may as well just design a new language, at that point.

Forget “almost”.

Go compile this C code:

    void foo(int *ptr)
    {
        free(ptr);
        *ptr = 42;
    }

This is UB. And it has nothing whatsoever to do with optimizations: any sensible translation to machine code is a use-after-free, and an attacker can probably find a way to exploit that machine code to run arbitrary code and format your disk.

If you don’t like this, use a language without UB.

But djb wants something different, I think: a way to tell the compiler not to introduce timing dependencies on certain values. This is a nice idea, but it needs hardware support! Your CPU may well implement ALU instructions with data-dependent timing. Intel, for example, reserves the right to do this unless you set an MSR to tell it not to. And you cannot set that MSR from user code, so what exactly is a compiler supposed to do?

https://www.intel.com/content/www/us/en/developer/articles/t...


It isn't just UB to dereference `ptr` after `free(ptr)` – it is UB to do anything with its value whatsoever. For example, this is UB:

    void foo(int *ptr)
    {
        assert(ptr != NULL);
        free(ptr);
        assert(ptr != NULL);
    }

Why is that? Well, I think because the C standard authors wanted to support the language being used on platforms with "fat pointers", in which a pointer is not just a memory address, but some kind of complex structure incorporating flags and capabilities (e.g. IBM System/38 and AS/400; Burroughs Large Systems; Intel iAPX 432, BiiN and i960 extended architecture; CHERI and ARM Morello). And, on such a system, they wanted to permit implementors to make `free()` a "pass-by-reference" function, so it would actually modify the value of its argument. (C natively doesn't have pass-by-reference, unlike C++, but there is nothing stopping a compiler adding it as an extension, then using it to implement `free()`.)

See this discussion of the topic from 8 years back: https://news.ycombinator.com/item?id=11235385

> And you cannot set that MSR from user code, so what exactly is a compiler supposed to do?

Set a flag in the executable which requires that MSR to be enabled. Then the OS will set the MSR when it loads the executable, or refuse to load it if it won't.

Another option would be for the OS to expose a user space API to read that MSR. And then the compiler emits a check at the start of security-sensitive code to call that API and abort if the MSR doesn't have the required value. Or maybe even, the OS could let you turn the MSR on/off on a per-thread basis, and just set it during security-sensitive processing.

Obviously, all these approaches require cooperation with the OS vendor, but often the OS vendor and compiler vendor is the same vendor (e.g. Microsoft)–and even when that isn't true, compiler and kernel teams often work closely together.


> Set a flag in the executable which requires that MSR to be enabled. Then the OS will set the MSR when it loads the executable, or refuse to load it if it won't.

gcc did approximately this for decades with -ffast-math. It was an unmitigated disaster. No thanks. (For flavor, consider what -lssl would do. Or dlopen.)

> Another option would be for the OS to expose a user space API to read that MSR. And then the compiler emits a check at the start of security-sensitive code to call that API and abort if the MSR doesn't have the required value.

How does the compiler know where the sensitive code starts and ends? Maybe it knows that certain basic blocks are sensitive, but it’s a whole extra control flow analysis to find beginning and ends.

And making this OS dependent means that compilers need to be more OS dependent for a feature that’s part of the ISA, not the OS. Ick.

> Or maybe even, the OS could let you turn the MSR on/off on a per-thread basis, and just set it during security-sensitive processing.


> How does the compiler know where the sensitive code starts and ends?

Put an attribute on the function. In C23, something like `[[no_data_dependent_timing]]` (or `__attribute__((no_data_dependent_timing))` using the pre-C23 GNU extension).

> And making this OS dependent means that compilers need to be more OS dependent for a feature that’s part of the ISA, not the OS. Ick.

There are lots of unused bits in RFLAGS, I don't know why Intel didn't use one of those, instead of an MSR. (The whole upper 32 bits of RFLAGS is unused – if Intel and AMD split it evenly between them, that would be 16 bits each.) Assuming the OS saves/restores the whole of RFLAGS on context switch, it wouldn't even need any change to the OS. CPUID could tell you whether this additional RFLAGS bit was supported or not. Maybe have an MSR which controls whether the feature is enabled or not, so the OS can turn it off if necessary. Maybe even default to having it off, so it isn't visible in CPUID until it is enabled by the OS via MSR – to cover the risk that maybe the OS context switching code can't handle a previously undefined bit in RFLAGS being non-zero.


I am not talking about UB at all. I am talking about the same constant-time stuff that djb's post is talking about.


Execution time is not considered Observable Behavior in the C standard. It's entirely outside the semantics of the language. It is Undefined Behavior, though not UB that necessarily invalidates the program's other semantics the way a use-after-free would.


This is pretty persnickety and I imagine you're aware of this, but free is a weak symbol on Linux, so user code can replace it at whim. Your foo cannot be statically determined to be UB.


Hmm, not sure, I think it would be possible to mark a function with a pragma as "constant time", and the compiler could make sure that it indeed is that. I think it wouldn't be impossible to actually teach it to convert branched code into unbranched code automatically for many cases as well. Essentially, the compiler pass must try to eliminate all branches, and the code generation must make sure to only use data-constant-time ops. It could warn/fail when it cannot guarantee it.


clang::optnone


"Optimizing compilers that don't allow disabling __all__ optimizations"


It’s not well-defined what counts as an optimization. For example, should every single source-level read access of a memory location go through all cache levels down to main memory, instead of, for example, caching values in registers? That would be awfully slow. But that question is one reason for UB.


Or writing code that relies on inlining and/or tail call optimization to successfully run at all without running out of stack... We've got some code that doesn't run if compiled O0 due to that.


do these exist? who's using them?


If your "secure" code is not secure because of a compiler optimization it is fundamentally incorrect and broken.


There is a fundamental difference of priorities between the two worlds. For most general application code any optimization is fine as long as the output is correct. In security critical code information leakage from execution time and resource usage on the chip matters but that essentially means you need to get away from data-dependent memory access patterns and flow control.


Then such code needs to be written in a language that actually makes the relevant timing guarantees. That language may be C with appropriate extensions but it certainly is not C with whining that compilers don't apply my special requirements to all code.


That argument would make more sense if such a language was widely available but today in practice it isn't so we live in the universe of less ideal solutions. Actually it doesn't really respond to DJB's point anyway, his case here is that the downstream labor cost of compiler churn exceeds the actual return in performance gains from new features and that a change in policy could give security-related code a more predictable target without requiring a whole new language or toolchain. For what it's worth I think the better solution will end up being something like constant-time function annotations (not stopping new compiler features) but I don't discount his view that absent human nature maybe we would be better off focusing compiler dev on correctness and stability.


> his case here is that the downstream labor cost of compiler churn exceeds the actual return in performance gains from new features

Yes but his examples are about churn in code that makes assumptions that neither the language nor the compiler guarantees. It's not at all surprising that if your code depends on coincidental properties of your compiler that compiler upgrades might break it. You can't build your code on assumptions and then blame others when those assumptions turn out to be false. But then again, it's perhaps not too surprising that cryptographers would do this since their entire field depends on unproven assumptions.

A general policy change here makes no sense because most language users do not care about constant runtime and would rather have their programs always run as fast as possible.


I think this attitude is what is driving his complaints. Most engineering work exists in the context of towering teetering piles of legacy decisions, organizational cultures, partially specified problems, and uncertainty about the future. Put another way "the implementation is the spec" and "everything is a remodel" are better mental models than spec-lawyering. I agree that relying on say stability of the common set of compiler optimizations circa 2015 is a terrible solution but I'm not convinced it's the wrong one in the short term. Are we really getting enough perf out of the work to justify the complexity? I don't know. It's also completely infeasible given the incentives at play, complexity and bugs are mostly externalities that with some delay burden users and customers.

Personally I'm grateful the cryptographers do what they do, computers would be a lot less useful without their work.


The problem is that preventing timing attacks often means you have to implement something in constant time. And most language specifications and implementations don't give you any guarantees that any operations happen in constant time and can't be optimized.

So the only possible way to ensure things like string comparison don't have data-dependent timing is often to implement it in assembly, which is not great.

What we really need is intrinsics that are guaranteed to have the desired timing properties, and/or a way to disable optimization, or at least certain kinds of optimization, for an area of code.
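For reference, this is roughly what the usual C attempt at a comparison without data-dependent timing looks like. It is only a sketch: `ct_equal` is a hypothetical name, and nothing in the C standard stops a compiler from turning the loop into something with an early exit, which is exactly the complaint above.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare two buffers of equal length without an early exit.
 * Returns 1 if they match, 0 otherwise. The loop always visits every
 * byte and accumulates the differences with OR instead of branching
 * on the first mismatch. */
static int ct_equal(const void *a, const void *b, size_t len)
{
    const uint8_t *pa = (const uint8_t *)a;
    const uint8_t *pb = (const uint8_t *)b;
    uint8_t diff = 0;

    for (size_t i = 0; i < len; i++) {
        diff |= (uint8_t)(pa[i] ^ pb[i]);
    }

    /* Map diff == 0 to 1 and diff != 0 to 0 with arithmetic only:
     * for d in 1..255, d - 1 is below 256, so bit 8 is clear;
     * for d == 0, d - 1 wraps to all ones, so bit 8 is set. */
    unsigned int d = diff;
    return (int)(1u & ((d - 1u) >> 8));
}
```

A caller would use it as `if (ct_equal(mac, expected, 16)) { ... }`; the branch on the final result is fine because the result itself is not secret.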


Intrinsics which do the right thing seems like so obviously the correct answer to me that I've always been confused about why the discussion is always about disabling optimizations. Even in the absence of compiler optimizations (which is not even an entirely meaningful concept), writing C code which you hope the compiler will decide to translate into the exact assembly you had in mind is just a very brittle way to write software. If you need the program to have very specific behavior which the language doesn't give you the tools to express, you should be asking for those tools to be added to the language, not complaining about how your attempts at tricking the compiler into the thing you want keep breaking.


The article explains why this is not as simple as that, especially in the case of timing attacks. Here it's not just the end result that matters, but how it's done. If any code can be changed to anything else that gives the same results, then this becomes quite hard.

Absolutist statements such as this may give you a glowing sense of superiority and cleverness, but they contribute nothing and are not as clever as you think.


The article describes why you can’t write code which is resistant to timing attacks in portable C, but then concludes that actually the code he wrote is correct and it’s the compiler’s fault it didn’t work. It’s inconvenient that anything which cares about timing attacks cannot be securely written in C, but that doesn’t make the code not fundamentally incorrect and broken.


It's secure code we use.

I'm sure you know who DJB is.


Why is knowing who the author is relevant? Either what he posts is correct or it is not, who the person is is irrelevant.


If you have ub then you have a bug and there is some system that will show it. It isn't hard to write code without ub.


It is, in fact, pretty hard as evidenced by how often programmers fail at it. The macho attitude of "it's not hard, just write good code" is divorced from observable reality.


Staying under the speed limit is, in fact, pretty hard as evidenced by how often drivers fail at it.


It's more complex than that for the example of car speed limits. Depending on where you live, the law also says that driving too slowly is illegal because it creates an unsafe environment by forcing other drivers on, e.g., the freeway to pass you.

But yeah, seeing how virtually everyone on every road is constantly speeding, that doesn't give me a lot of faith in my fellow programmers' ability to avoid UB...


Some jurisdictions also set the speed limit at, e.g., the 85th percentile of drivers' speed (https://en.wikipedia.org/wiki/Speed_limit#Method) so some drivers are always going to be speeding.

(I'm one of those speeders, too; I drive with a mentality of safety > following the strict letter of the law; I'll prefer speed of traffic if that's safer than strict adherence to the limit. That said, I know not all of my peers have the same priorities on the road, too.)


And to be specific, some kinds of UB are painfully easy to avoid. A good example of that is strict aliasing. Simply don't do any type punning. Yet people still complain about it being the compiler's fault when their wanton casting leads to problems.
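When reinterpreting bits is genuinely needed, the well-defined route is `memcpy` rather than a pointer cast. A minimal sketch, with `float_bits` as a made-up name and the assumption that `float` is a 32-bit IEEE 754 type (true on mainstream platforms, not guaranteed by the standard):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this platform. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Return the object representation of a float as an integer.
 * Writing *(uint32_t *)&f instead would violate strict aliasing;
 * memcpy copies the bytes and is well defined. Compilers typically
 * reduce this to a single register move. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```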


People write buffer overflows and memory leaks because they are not careful. The rest of ub are things I have never seen despite running sanitizers and a large codebase.


Perhaps you’re not looking all that hard.


Sanitizers are very good at finding ub.


Sure. That's just a function of how much UB there is, rather than them catching it all.


Only if developers act as grown ups and use all static analysers they can get hold of, instead of acting as they know better.

The tone of my answer is a reflection of what most surveys state, related to the actual use of such tooling.


Do you know what UBSAN is? Have you used it?


yes, my ci system runs it with a comprehensive test suite.


I like Bernstein but sometimes he flies off the handle in the wrong direction. This is a good example, which he even half-heartedly acknowledges at the end!

A big chunk of the essay is about a side point — how good the gains of optimization might be, which, even with data, would be a use-case dependent decision.

But the bulk of his complaint is that C compilers fail to take into account semantics that cannot be expressed in the language. Wow, shocker!

At the very end he says “use a language which can express the needed semantics”. The entire essay could have been replaced with that sentence.


There's an important point to be made here: those who define the semantics of C and C++ shovel an unreasonable amount of behavior into the bucket of "undefined behavior". Much of this has dubious justifications, while making it more difficult to write correct programs.


To be pedantic, I think you're speaking about unspecified behavior and implementation defined behavior. Undefined behavior specifically refers to things that have no meaningful semantics, so the compiler assumes it never happens.

Unspecified behavior is anything outside the scope of observable behavior for which there are two or more ways the implementation can choose.

Since the timing of instructions on machines with speculative execution is not observable behavior in C, anything that impacts it is unspecified.

There's really no way around this, and I disagree that there's an "unreasonable" amount of it. Truly the problem is up to the judgement of the compiler developers what choice to make and for users to pick implementations based on those choices, or work around them as needed.


I am referring to undefined behavior.

For example, consider the case of integer overflow when adding two signed numbers. C considers this undefined behavior, making the program's behavior undefined. All bets are off, even if the program never makes use of the resulting value. C compilers are allowed to assume the overflow can never happen, which in some cases allows them to infer that numbers must fit within certain bounds, which allows them to do things like optimize away bounds checks written by the programmer.

A more reasonable language design choice would be to treat this as an operation that produces an unspecified integer result, or an implementation-defined result.
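To make the bounds-check point concrete: a check written as `if (a + b < a)` relies on the overflow it is trying to detect, so the compiler may assume it is always false for positive `b`. A sketch of a check that stays within defined behavior (the name `checked_add` is made up for this example):

```c
#include <limits.h>
#include <stdbool.h>

/* Add two ints, reporting overflow instead of triggering it.
 * The comparisons are arranged so that no intermediate expression
 * can itself overflow: INT_MAX - b is safe when b > 0, and
 * INT_MIN - b is safe when b < 0. */
static bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* the sum would not fit in an int */
    }
    *out = a + b;
    return true;
}
```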

Edit: The following article helps clear up some common confusion about undefined behavior:

https://blog.regehr.org/archives/213

Unfortunately this article, like most on the subject, perpetuates the notion that there are significant performance benefits to treating simple things like integer overflow as UB. E.g.: "I've heard that certain tight loops speed up by 30%-50% ..." Where that is true, the compiler could still emit the optimized form of the loop without UB-based inference, but it would simply have to be guarded by a run-time check (outside of the loop) that would fall back to the slower code in the rare occasions when the assumptions do not hold.


Signed integer overflow being undefined has these three consequences for me: 1. It makes my code slightly faster. 2. It makes my code slightly smaller. 3. It makes my code easier to check for correctness, and thus makes it easier to write correct code.

Win, win, win.

Signed integer overflow would be a bug in my code.

As I do not write my own implementations to correctly handle the case of signed integer overflow, the code I am writing will behave in nonsensical ways in the presence of signed integer overflow, regardless of whether or not it is defined. Unless I'm debugging my code or running CI, in which case ubsan is enabled, and the signed overflow instantly traps to point to the problem.

Switching to UB-on-overflow in one of my Julia packages (via `llvmcall`) removed like 5% of branches. I do not want those branches to come back, and I definitely don't want code duplication where I have two copies of that code, one with and one without. The binary code bloat of that package is excessive enough as is.


Agreed. If anything, I'd like to have an unsigned type with undefined overflow so that I can get these benefits while also guaranteeing that the numbers are never negative where that doesn't make any sense.


That's what zig did, and they solved the overflow problem by having separate operators for addition and subtraction that guarantee that the number saturates/wraps on overflow.
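C has no such operators, but the same explicit choice can be approximated with the `__builtin_add_overflow` builtin available in GCC and Clang. A rough sketch; the function names are invented for illustration, and the builtin is a compiler extension rather than standard C:

```c
#include <limits.h>

/* Wrapping add: the builtin stores the two's-complement wrapped sum
 * in r and returns nonzero if the true sum did not fit. No UB. */
static int add_wrapping(int a, int b)
{
    int r;
    (void)__builtin_add_overflow(a, b, &r);
    return r;
}

/* Saturating add: clamp to INT_MAX or INT_MIN on overflow.
 * Overflow is only possible when a and b share a sign, so the sign
 * of b tells us which bound was crossed. */
static int add_saturating(int a, int b)
{
    int r;
    if (__builtin_add_overflow(a, b, &r)) {
        return (b > 0) ? INT_MAX : INT_MIN;
    }
    return r;
}
```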


It would also be nice if hardware would trap on signed integer overflow. Of course, since the most popular architectures do not, new architectures do not either.


The point is much of what the C standard currently calls undefined behavior should instead be either unspecified or implementation-defined. This includes the controversial ones like strict aliasing and signed overflow.

Additionally, part of the problem is compiler devs insisting on code transforms that are unsound in the presence of undecidable UB, without giving the programmer sufficiently fine control over such transforms (at best we have a few command line flags for some of them, worst case you'd need to disable all optimizations including the non-problematic ones.)


For example the recent realloc change in C23. I was surprised the previously used behaviour, even if inconsistent across implementations, was declared UB. Why not impdef?


Agreed, C23 screwed over a lot of backwards compatibility


> A big chunk of the essay is about a side point — how good the gains of optimization might be, which, even with data, would be a use-case dependent decision.

I think this was useful context, and it was eye-opening to me.


If you were not aware of this then you might reflect on the part of my comment that he doesn’t bring up: how good or bad the gains are is use-case dependent. Every program optimizes for a use case, sometimes pessimizing for others (e.g. an n^2 algo that’s worthwhile because it is believed to only be called on tiny vectors).

IMHO he was overgenerous on the optimization improvement of compilers. Often an optimization will make a difference in a tiny fraction of a percent. The value comes from how often that optimization can be applied, and how lots of optimizations can in aggregate make a bigger improvement just as a sand dune is made of tiny grains of sand.


certainly


Yep. DJB fell flat here. There were a lot of elitist religious opinions espoused without evidence.


C and C++ are unsuitable for writing algorithms with constant-time guarantees. The standards have little to no notion of real time, and compilers don't offer additional guarantees as extensions.

But blaming the compiler devs for this is just misguided.


That was my thought reading this article. If you want to produce machine code that performs operations in constant time regardless of the branch taken, you need to use a language that supports expressing that, which C does not.


Heck, CPUs themselves aren't suitable for constant time operations. At any time, some new CPU can be released which changes how quick some operations are.


It is not a problem that different CPUs have different execution times; the problem is if the same CPU, running the same instruction, has a timing difference depending on the data it operates on. In this regard CPUs have actually gotten better, specifically because it is a feature that AMD and Intel have pursued.


That includes branch predictions among other CPU optimizations.


If you have data-dependent branches then you have already lost. If you don't then I fail to see what data the branch predictor could possibly leak.


Not always. At least for RISC-V there is the Zkt extension which guarantees data independent execution time for some instructions. I assume there's something similar for ARM and x86.

It does pretty much require you to write assembly though. I think it would definitely make sense to have some kind of `[constant_time]` attribute for C++ that instructed the compiler to ensure the code is constant time.


If you want to get very paranoid most instructions probably use slightly different amounts of power for different operands which will change thermal output which will affect CPU throttling. I'm not sure there are any true constant time instructions on modern high-performance CPUs. I think we have just agreed that some instructions are as close as we can reasonably get to constant time.


Or microcode updates to existing CPUs!


> If you want to produce machine code that performs operations in constant time regardless of the branch taken

Nobody is asking for that. That's the whole point. Crypto code that needs to be constant time with regard to secret data needs to avoid branching based on secret data, but the optimizer is converting non-branching code into branching code.
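For readers who haven't seen the pattern, here is a minimal sketch of the kind of source in question. The name `ct_select` is made up for illustration and is not taken from the article. The function picks one of two values with a mask instead of an `if`, so the source itself contains no branch on the secret bit.

```c
#include <stdint.h>

/* Branch-free select: returns a when choose == 1 and b when choose == 0.
   choose is assumed to be exactly 0 or 1. */
uint32_t ct_select(uint32_t choose, uint32_t a, uint32_t b)
{
    /* 0 - 1 wraps to all ones, 0 - 0 stays zero (unsigned wrap is defined). */
    uint32_t mask = (uint32_t)0 - choose;
    return (a & mask) | (b & ~mask);
}
```

The C standard only requires the compiler to preserve the returned value, so nothing stops an optimizer from recognizing this as a select and emitting a conditional branch. Whether a given compiler version does so has to be checked in the generated assembly.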


What languages are suitable for writing algorithms with constant-time guarantees?


According to some comments under this submission, even x86 assembly isn't suitable, or only under specific circumstances that are generally not available in userspace.


At this time, the idea of a constant-time operation embedded into a language’s semantics is not a thing. Similar for CPU architectures. Our computing base is about being fast and faster.


It’s worth noting that, on Intel CPUs, neither clang nor anything else can possibly generate correct code, because correct code does not exist in user mode.

https://www.intel.com/content/www/us/en/developer/articles/t...

Look at DOITM in that document — it is simply impossible for a userspace crypto library to set the required bit.


> because correct code does not exist in user mode.

User mode code can run in the correct mode. What it cannot do is toggle the mode on/off. Once toggled on, it works perfectly fine for userspace; this could become e.g. a per-process flag enabled by a prctl syscall, with the MSR adjusted during scheduler task switching.


Couldn't you syscall into the kernel to set the flag, then return back into usermode with it set?


So your compiler is supposed to emit a pair of syscalls each function that does integer math? Never mind that a pair of syscalls that do WRMSR may well take longer than whatever crypto operation is between them.

I have absolutely nothing good to say about Intel’s design here.


What's the alternative?


An instruction prefix that makes instructions constant time. A code segment bit (ugly but would work). Different instructions. Making constant time the default. A control register that’s a user register.


Since we already have some reasons to sign in an enclave, why not just design a cryptographic processor which is highly unoptimized and highly predictable? Since the majority of code benefits immensely from the optimizations, it doesn't seem reasonable to cripple it.


So instead of just doing the rather fast elliptic curve math when getting a TLS connection request by using a standard crypto library, I’m supposed to call out to a cryptographic coprocessor that may or may not even support the operation I need? Have you seen what an unbelievable mess your average coprocessor is to use, Intel or otherwise?

CPUs have done just fine doing constant time math for decades. It’s at best a minor optimization to add data dependence, and Intel already knows (a) how to turn it off and (b) that it’s sometimes necessary to let it be turned off. Why can’t they add a reasonable mechanism to turn it off?


The version of this that I want to see is a CPU that gives you a core that doesn't have caches or branch prediction on which you can write custom code without having to worry about timing attacks.


You could just leave it on.

I agree it's not great.


> [..] whenever possible, compiler writers refuse to take responsibility for the bugs they introduced

I have seldom seen someone discredit their expertise that fast in a blog post. (Especially if you follow the link and realize it's just basic fundamental C stuff of UB not meaning it produces an "arbitrary" value.)


No, I think you're just speaking past each other here. You're using "bug" in reference to the source code. They're using "bug" in reference to the generated program. With UB it's often the case that the source code is buggy but the generated program is still correct. Later the compiler authors introduce a new optimization that generates a buggy program based on UB in the source code, and the finger-pointing starts.

Edit: What nobody likes to admit is that all sides share responsibility to the users here, and that is hard to deal with. People just want a single entity to offload the responsibility to, but reality doesn't care. To give an extreme analogy to get the point across: if your battery caught fire just because your CRUD app dereferenced NULL, nobody (well, nobody sane) would point the finger at the app author for forgetting to check for NULL. The compiler, OS, and hardware vendors would be held accountable for their irresponsibly-designed products, "undefined behavior" in the standard be damned. Everyone in the supply chain shares a responsibility to anticipate how their products can be misused and handle them in a reasonable manner. The apportionment of the responsibility depends on the situation and isn't something you can determine just by asking "was this UB in the ISO standard?"


> just speaking past each other here

no I'm not

if your program has UB it's broken and it doesn't matter if it currently happens to work correctly under a specific compiler version, it's also fully your fault

sure there is shared responsibility through the stack, but _one of the most important aspects when you have something like a supply chain is to know who supplies what under which guarantees taking which responsibilities_

and for C/C++ it's clearly communicated that it's solely your responsibility to avoid UB (in the same way that for batteries it's the battery vendor's responsibility to produce batteries which can't randomly catch fire, the firmware vendor's responsibility to use the battery driver/charging circuit correctly, and your OS's responsibility to ensure that a random program faulting can't affect the firmware, etc.)

> be misused and handle them in a reasonable manner

For things provided B2B, that's in general only the case in contexts involving end users, likely accidents, and similar.

Instead it's the responsibility of the supplier to be clear about what can be done with the product and what not, and if you do something outside of the spec it's your responsibility to continuously make sure it's safe (or in general ask the supplier for clarifying guarantees wrt. your usage).

E.g. if you buy capacitors rated for up to 50C environmental temperature that happen to work at up to 80C, you still can't use them at 80C because there is 0% guarantee that even other capacitors from the same batch will also work at 80C. In the same way compilers are only "rated"(1) to behave as expected for programs without UB.

If you find it unacceptable because it's too easy to end up with accidental UB, then you should do what anyone in a supply chain with a too risky to use component would do:

Replace it with something less risky to use.

There is a reason the ONCD urged developers to stop using C/C++ and similar where viable, because that is pretty much just following standard supply chain management best-practice.

(1: just for the sake of wording. Though there are certified, i.e. ~rated, compiler revisions)


> your program has UB it's broken and it doesn't matter if it currently happens to work correctly under a specific compiler version, it's also fully your fault

Except that compiler writers essentially decide what's UB. Which is a conflict of interest.

And they add UB, making previously non-UB code fall under UB. Would you call such code buggy?


> Except that compiler writers essentially decide what's UB.

No, the C/C++ standards specify what is UB. So, as long as you don't switch targeted standard versions, the brokenness of your code never changes.

Compilers may happen to previously have never made optimizations around some specific UB, but, unless you read in the compiler's documentation that it won't, code relying on it was always broken. It's a bog standard "buggy thing working once doesn't mean it'll work always".


> No, the C/C++ standards specify what is UB.

And the compiler writers have a stranglehold on the standards bodies. They hold more than 50% of the voting power last time I checked.

So yeah, compiler writers decide what's UB.


The vast majority of UB usually considered problematic has been in the standards for decades, long before compilers took as much advantage of it as they do now (and the reasons for including said UB back then were actual hardware differences, not appeasing compiler developers).

Are there even that many UB additions? The only thing I can remember is realloc with size zero going from implementation-defined to undefined in C23.


Yes, but that does not change the fact that compiler writers have control of the standard, have had that control since probably C99, and have introduced new UB along with pushing the 00UB worldview.


What introduced UB are you thinking of? I'll admit I don't know how much has changed, but the usually-complained-about things (signed overflow, null pointer dereferencing, strict aliasing) are clearly listed as UB in some C89 draft I found.

C23's introduced stdc_trailing_zeros & co don't even UB on 0, even though baseline x86-64's equivalent instructions are literally specified to leave their destination undefined on such!

00UB is something one can argue about, but I can't think of a meaningful way to define UB that doesn't impose significant restrictions on even basic compilers, without precisely defining how UB-result values are allowed to propagate.

e.g. one might expect that 'someFloat == (float)(int8_t)someFloat' give false on an input of 1000, but guaranteeing that takes intentional effort - namely, on hardware whose int↔float conversions only operate on ≥32-bit integers (i.e. everything - x86, ARM, RISC-V), there'd need to be an explicit 8-to-32-bit sign-extend, and the most basic compiler just emitting the two f32→i32 & i32→f32 instructions would fail (but is imo pretty clearly within "ignoring the situation completely with unpredictable results" that the C89 draft contains). Sure it doesn't summon cthulhu, but it'll quite likely break things very badly anyway. (whether it'd be useful to not have UB here in the first place is a separate question)
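As a rough sketch of how calling code can stay clear of this particular UB, the range check can be done in floating point before the narrowing cast. The helper name `float_to_i8` is hypothetical. It assumes the usual C rule that a float-to-integer conversion truncates toward zero and is only defined when the truncated value fits the target type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Converts f to int8_t only when the truncated value is representable.
   Returns false and leaves *out untouched for out-of-range values and NaN. */
bool float_to_i8(float f, int8_t *out)
{
    /* NaN fails both comparisons, so it is rejected here as well. */
    if (!(f > -129.0f && f < 128.0f)) {
        return false;
    }
    *out = (int8_t)f;  /* truncated value is within [-128, 127] */
    return true;
}
```

With this guard, an input of 1000 is reported as unrepresentable instead of depending on whatever the target's conversion instructions happen to produce.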

Even for 'x+100 < x' one can imagine a similar case where the native addition & comparison instructions operate on inputs wider than int; using such for assuming-no-signed-wrap addition always works, but would mean that the comparison wouldn't detect overflow. Though here x86-64, aarch64, and RISC-V all do provide instructions for 32-bit arith, matching their int. This would be a bigger thing if it were possible to have sub-int-sized arith.
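For comparison, a sketch of the check written so that it never performs the overflowing addition at all (the function name is made up for illustration). It compares against `INT_MAX - 100`, which is always representable, so the result does not depend on how the compiler or the hardware treats signed wraparound.

```c
#include <limits.h>
#include <stdbool.h>

/* True when x + 100 would exceed INT_MAX. The addition itself is never
   evaluated, so no signed overflow can occur. */
bool add100_would_overflow(int x)
{
    return x > INT_MAX - 100;
}
```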


Which UB upsets you? Can you be specific so we can revert it?


All of it. But especially anything added after C89 that was not already there implicitly.

Edit: okay, not all of it. I was hyperbolic. Race conditions and data races should be UB. But anything that can be implementation-defined should be.


So your issue is not at all any specific thing or action anyone took, but just in general having UB in places not strictly necessary. And "Especially anything [different from The Golden Days]", besides being extremely cliche, is a completely arbitrary cutoff point.

A given compiler is free to define specific behavior for UB (and indeed you can add compiler flags to do that for many things); the standard explicitly acknowledges that with "Possible undefined behavior ranges from […], to behaving during translation or program execution in a documented manner characteristic of the environment".


Sigh...yes, I don't want any UB where it's not necessary.

But if you must have a concrete example, how about realloc?

In C89 [1] (page 155), realloc with a 0 size and a non-NULL pointer was defined as free:

> If size is zero and ptr is not a null pointer, the object it points to is freed.

In C99 [2] (page 314), that sentence was removed, making it undefined behavior when it wasn't before. This is a pure example of behavior becoming undefined when it was not before.

In C11 [3] (page 349), that sentence remains gone.

In C17 [4] (page 254), we get an interesting addition:

> If size is zero and memory for the new object is not allocated, it is implementation-defined whether the old object is deallocated. If the old object is not deallocated, its value shall be unchanged.

So the behavior switches from undefined to implementation-defined.

In C23 [5] (page 357), the wording completely changes to:

> ...or if the size is zero, the behavior is undefined.

So WG14 made it UB again after making it implementation-defined.

SQLite targets C89, but people compile it with modern compilers all the time, and those modern compilers generally default to at least C99, where the behavior is UB. I don't know if SQLite uses realloc that way, but if it does, are you going to call it buggy just because the authors stick to C89 and their users use later standards?

[1]: https://web.archive.org/web/20200909074736if_/https://www.pd...

[2]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

[3]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

[4]: https://web.archive.org/web/20181230041359if_/http://www.ope...

[5]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3047.pdf
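For code that has to build under any of these standards, one defensive option is to never let a zero size reach `realloc` in the first place. This is only a sketch: `xrealloc_or_free` is a hypothetical helper name, and it simply spells out the C89 "size zero frees" meaning by hand.

```c
#include <stdlib.h>

/* Hypothetical wrapper: size 0 always means "free and return NULL",
   so realloc(ptr, 0) is never called. */
void *xrealloc_or_free(void *ptr, size_t size)
{
    if (size == 0) {
        free(ptr);   /* free(NULL) is a no-op */
        return NULL;
    }
    /* On failure this returns NULL and the old block is still valid. */
    return realloc(ptr, size);
}
```

Callers need to distinguish the two NULL cases themselves: with a nonzero size NULL means the allocation failed, with a zero size it means the block was released.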


If SQLite wants exactly C89, it can just require -std=c89, and then people compiling it with a different standard target are to blame. This is just standard backwards incompatibility, nothing about UB (in other languages requiring specific compiler/language versions is routine). Problems would arise even if it was changed from being a defined 'free(x)' to being a defined 'printf("here's the thing you realloc(x,0)'d: %p",x)'. (whether the C standard should always be backwards compatible is a more interesting question, but is orthogonal to UB)

I do remember reading somewhere that a real platform in fact not handling size 0 properly (or having explicitly-defined behavior going against what the standard allowed?) was an argument for changing the standard requirement. It's certainly not because compiler developers had big plans for optimizing around it, given that both gcc and clang don't: https://godbolt.org/z/jjcGYsE7W. And I'm pretty sure there's no way this could amount to any optimization on non-extremely-contrived examples anyway.

I had edited one of my parent comments to mention realloc, so if we both landed on the same example, there's probably not that many significant other cases.


> If SQLite wants exactly C89, it can just require -std=c89, and then people compiling it with a different standard target are to blame.

Backwards compatibility? I thought that was a target for WG14.

> This is just standard backwards incompatibility, nothing about UB

But UB is insidious and can bite you with implicit compiler settings, like the default to C99 or C11.

> whether the C standard should always be backwards compatible is more interesting, but is a question orthogonal to UB

If it's a target, then it should be.

And on the contrary, UB is not orthogonal to backwards compatibility.

Any UB could have been made implementation-defined and still be backwards compatible. But it's backwards-incompatible to make anything UB that wasn't UB. These count as examples of WG14 screwing over its users.

> I do remember some mention somewhere of a real platform in fact not handling size 0 properly being an argument for reducing the standard requirement.

So WG14 just decides to screw over users from other platforms? Just keep it implementation-defined! It already was! And that's still a concession from the pure defined behavior of C89!

> I had edited one of my parent comments to mention realloc, so if we both landed on the same example, there's probably not that many significant other cases.

I beg to differ. Any case where UB was implicit just because it wasn't defined in the standard could have easily been made implementation-defined instead.

Anytime WG14 adds UB that doesn't need to be UB, it is screwing over users.


> Backwards compatibility? I thought that was a target for WG14.

C23 removed K&R function declarations. Indeed backwards-compatibility is important for them, but it's not the be-all end-all.

Having a standard state exact possible behavior is meaningless if in practice it isn't followed. And it wasn't just implementation-defined, it had a specific set of options for what it could do.

> Any case where UB was implicit just because it wasn't defined in the standard could have easily been made implementation-defined instead. Any UB could have been made implementation-defined and still be backwards compatible. But anything that wasn't UB that now is counts as an example of WG14 screwing over its users.

If this is such a big issue for you, you could just name another example. It'd take, like, 5 words to say another feature in question unnecessarily changed. I'll happily do the research on how it changed over time.

It's clear that you don't like UB, but I don't think you've said anything more than that. I quite like that my compiler will optimize out dead null comparisons or some check that collapses to a 'a + C1 < a' after inlining/constant propagation. I think it's quite neat that not being able to assume signed wrapping means that one can run sanitizers that warn on such, without heaps of false-positives from people doing wrapping arith with it. If anything, I'd want some unsigned types with no unsigned wrapping (though I'd of course still want some way to do wrapping arith where needed)


> Having a standard state exact possible behavior is meaningless if in practice it isn't followed.

No, it means that the bug is documented to be in the platform, not the program.

> If this is such a big issue for you, you could just name another example. It'd take, like, 5 words to say another feature in question unnecessarily changed.

Okay, how about `signal()` being called in a multi-threaded program? Why couldn't they define it in C11 such that it could be called? Obviously, such a thing didn't really exist in C99, but it did in POSIX, and in POSIX, it wasn't, and still isn't, undefined. Why couldn't WG14 have simply made it implementation-defined?

> I quite like that my compiler will optimize out dead null comparisons or some check that collapses to a 'a + C1 < a' after inlining/constant propagation.

I'd rather not be forced to be a superhuman programmer.


> No, it means that the bug is documented to be in the platform, not the program.

Yes, it means that the platform is buggy, but that doesn't help anyone wanting to write portable-in-practice code. The standard specifying specific behavior is just giving a false sense of security.

> Okay, how about `signal()` being called in a multi-threaded program? Why couldn't they define it in C11 such that it could be called?

This is even more definitely not a case of compiler developer conflict of interest. And it's not a case of previously-defined behavior changing, so that set remains still at just realloc. (I wouldn't be surprised if there are more, but if it's not a thing easily listed off I find it hard to believe it's a real significant worry)

But POSIX defines it anyway; and as signals are rather pointless without platform-specific assumptions, it's not like it matters for portability. Honestly, having signals as-is in the C standard feels rather useless to me in general. And 'man 2 signal' warns to not use 'signal()', recommending the non-standard sigaction instead.

And, as far as I can tell, implementation-defined vs undefined barely matters, given that a platform may choose to define the implementation-defined thing as doing arbitrary things anyway, or, conversely, indeed document specific behavior for undefined things. The most significant thing I can tell from the wording is that implementation-defined requires the behavior to be documented, but I am fairly sure there are many C compilers that don't document everything implementation-defined.

> I'd rather not be forced to be a superhuman programmer.

All you have to do is not use signed integers for doing modular/bitwise arithmetic just as much as you don't use integers for doing floating-point arithmetic. It's not much to ask. And the null pointer thing isn't even an issue for userspace code (i.e. what 99.99% of programmers write).

I do think configuring behavior of various things should be more prevalent & nicer to do; even in cases where a language/platform does define specific behavior, it may nevertheless be undesired (e.g. a+1<a might not work for overflow checking if signed addition was implementation-defined (and, say, a platform defines it as saturating), and so portable projects still couldn't use it for such).
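A small sketch of the unsigned version of that idiom, with a made-up function name. Unsigned arithmetic is defined to wrap modulo 2^N, so the "sum came out smaller than an operand" test is well-defined here, unlike its signed counterpart.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a + b does not fit in 32 bits. The cast keeps the result at
   32 bits even on a platform where int is wider than uint32_t. */
bool u32_add_wraps(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b) < a;
}
```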


If you want a programming language without undefined behaviour, you want something that's not C.


Correct. Which is why I made my own. But C is still better than other languages because it is small.


It looks small, but it's not really -- the C abstract machine differs too much from the actual hardware it's running on.

You could write a "CVM", akin to the JVM, that runs C code in a virtual environment that matches the abstract machine. Or you can let your compiler deal with the differences, which leads to unhappiness such as is exhibited in this discussion thread and the article it's discussing.


Suppliers generally decide what guarantees they are able and willing to give, yes.


> if your battery caught fire just because your CRUD app dereferenced NULL, nobody (well, nobody sane) would point the finger at the app author for forgetting to check for NULL.

I think pretty much anyone sane would and would be right to do so. Incorrect code is, well, incorrect and safety critical code shouldn’t use UB. Plus, it’s your duty as a software producer to use an appropriate toolchain and validate the application produced. You can’t offload the responsibility for your failure to do so to a third party (that doesn’t stop people from trying all the time with either their toolchains or a library they use, but that shouldn’t be tolerated and should be pointed out as the failure to properly test and validate that it is).

I would be ashamed if fingers were pointed towards a compiler provider there unless said provider certified that its compiler wouldn’t do that and somehow lied (but even then, still a testing failure on the software producer part).


> I think pretty much anyone sane would and would be right to do so. Incorrect code is, well, incorrect and safety critical code shouldn’t use UB

You missed the whole point of the example. I gave CRUD app as an example for a reason. We weren't talking safety-critical code like battery firmware here.


Because your example isn’t credible. But even then I don’t think I missed the point, no. You are responsible for what your application does (be it a CRUD app or any other). If it causes damage because you fail to test properly, it is your responsibility. The fact that so many programmers fail to grasp this - which is taken as self-evident in pretty much any other domain - is why the current quality of the average piece of software is so low.

Anyway, I would like to know by which magic you think a CRUD app could burn a battery? There is a whole stack of systems to prevent that from ever happening.


> There is a whole stack of systems to prevent that from ever happening.

You've almost got the point your parent is trying to make. That the supply chain shares this responsibility, as they said.

> I would like to know by which magic you think a CRUD app could burn a battery?

I don't know about batteries, but there was a time when Dell refused to honour their warranty on their Inspiron series laptops if they found VLC to be installed. Their (utterly stupid) reasoning? That VLC allows the user to raise the (software) volume higher than 100%. It was their own damn fault for using poor quality speakers and not limiting allowable current through them in their (software or hardware) drivers.


> You've almost got the point your parent is trying to make. That the supply chain shares this responsibility, as they said.

Deeply disagree. Failsafe doesn’t magically remove your responsibility.

I’m so glad I started my career in a safety critical environment with other engineers working on the non software part. The number of software people who think they can somehow absolve themselves of all responsibility for shipping garbage still shocks me after 15 years in the field.

> It was their own damn fault for using poor quality speakers

Yes, exactly, I’m glad to see we actually agree. It’s Dell’s fault - not the speaker manufacturer’s fault, not the subcontractor who designed the sound part’s fault - Dell’s fault because they are the one who actually shipped the final product.


>> ... shares this responsibility

> Deeply disagree. ... doesn't magically remove your responsibility.

??

Literally no-one in this thread is talking about "removing responsibility", except you.

> I'm so glad ... in the field.

I don't know which demon you're trying to beat back here, nor why.

> It's Dell's fault - not ...

That it is Dell's fault is not under question, but it also does not automatically absolve the speaker manufacturer or the subcontractor. Hold on, isn't that exactly the drum you've been trying to beat here?

You and I have no idea what actually went down. Maybe the speaker was wrongly rated as being able to take a higher current than it actually could. Or maybe there was a bug in the driver. Either would make someone other than Dell also responsible for the failure.

And that's what we've been trying to tell you. That responsibility is shared.


I think the author knows very well what UB is and means. But he’s thinking critically about the whole system.

UB is meant to add value. It’s possible to write a language without it, so why do we have any UB at all? We do because of portability and because it gives flexibility to compiler writers.

The post is all about whether this flexibility is worth it when compared with the difficulty of writing programs without UB.

The author makes the case that (1) there seems to be more money lost on bugs than money saved on faster bytecode and (2) there’s an unwillingness to do something about it because compiler writers have a lot of weight when it comes to what goes into language standards.


Even stipulating that part of the argument, the author then goes on a tear about optimizations breaking constant-time evaluation, which doesn’t have anything to do with UB.

The real argument seems to be that C compilers had it right when they really did embody C as portable assembly, and everything that’s made that mapping less predictable has been a regression.


But C never had been portable assembly.

Which I think is somewhat the core of the problem. People treating things in C in ways they just are not. Whether that is the "C is portable assembly" view or the "it's just bits in memory" view of things (which is often doubly wrong, ignoring stuff like hardware caching). Or stuff like writing const time code based on assuming that the compiler probably, hopefully can't figure out that it can optimize something.

> The real argument seems to be that C compilers had it right when they really did embody C as portable assembly

But why would you use such a C? Such a C would be slow compared to its competition while still prone to problematic bugs. At the same time people often seem to forget that part of UB is rooted in different hardware doing different things, including having behavior in some cases which isn't just a register/mem address having an "arbitrary value" but is more similar to C UB (like e.g. when it involves CPU caches).


> But C never had been portable assembly.

The ANSI C standards committee disagrees with you.

"Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:”

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf

p 2, line 39. (p10 of the PDF)

"C code can be portable. "

line 30


The full quote is:

> Although it strove to give programmers the opportunity to write truly portable programs, the C89 Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:” the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program (§4).

This doesn't say that C is a high-level assembly.

It just says that the committee doesn't (at that point in time) want to force the usage of "portable" C as a means to prevent the usage of C as high-level assembler. But just because some people use something as high level assembler doesn't mean it is high level assembly (like I did use a spoon as a fork once, it's still a spoon).

Furthermore the fact that they explicitly mention forcing portable C with the terms "to preclude" and not "to break compatibility" or similar I think says a lot about whether or not the committee thought of C as high level assembly.

Most importantly the quote is about the process of making the first C standard, which had to ease the transition from various non-standardized C dialects to "standard C", and I'm pretty sure that throughout history there have been C dialects/compiler implementations which approached C as high level assembly, but C as in "standard C" is not that.


It specifically says that the use of C as a "portable assembler" is a use that the standards committee does not want to preclude.

Not sure how much clearer this can be.


That statement means the committee does not want to stop it from being developed. The question is, has it? They mean a specific implementation could work as portable assembler, mirroring djb's request for an 'unsurprising' C compiler. Another interpretation would be in the context of CompCert, which has been developed to achieve semantic preservation between assembly and its source. Interestingly this of course hints at verifying an assembled snippet coming from some other source as well. Then that alternate source for the critical functions frees the rest of compiler internals from the problems of preserving constant-timeness and leakfreedom through their passes.


No.

C already existed prior to the ANSI standardization process, so there was nothing "to be developed", though a few changes were made to the language, in particular function prototypes.

C was being used in this fashion, and the ANSI standards committee made it clear that it wanted the standard to maintain that use-case.


These are aspirational statements, not a factual judgment of what that standard or its existing implementations actually are. At least they do not cover all implementations nor define precisely what they cover. Note the immediate next statement: "C code can be non-portable."

In my opinion, C has tried to serve two masters and they made a screw-hammer in the process.

The rest of the field has moved on significantly. We want portable behavior, not implementation-defined vomit that will leave you doubting whether porting introduces new UB paths that you haven't already fully checked against (by, e.g. varying the size of integers in such a way some promotion is changed to something leading to signed overflow; or bounds checking is ineffective).
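The promotion hazard mentioned above can be sketched roughly as follows. This is a minimal illustration that assumes a 32-bit `int` (common, but not guaranteed); the function name `mul_u16` is made up for the example.

```c
#include <stdint.h>

/* Hypothetical helper, assuming a 32-bit int.
 *
 * Written naively as
 *     uint32_t r = a * b;
 * both uint16_t operands are promoted to (signed) int before the multiply,
 * so 65535 * 65535 overflows int, which is undefined behavior.
 *
 * Converting one operand to uint32_t first keeps the arithmetic unsigned,
 * where wraparound is well defined. */
uint32_t mul_u16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * b;
}
```

On a target where `int` is 16 bits the naive version is fine, which is exactly why porting can open a UB path that was never exercised before.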

The paragraph further down about explicitly and swiftly rejecting a validation test suite should also read as a warning. Not only would the proposal of modern software development without a test suite get you swiftly fired today, but they're explicitly acknowledging the insurmountable difficulties in producing any code with consistent cross-implementation behavior. But in the time since then, other languages have demonstrated you can reap many of the advantages of close-to-the-metal without compromising on behavior consistency in cross-target behavior, at least for many relevant real-world cases.

They really knew what they were building, a compromise. But that gets cherry-picked into absurdity such as stating C is portable in present-tense or that any inherent properties make it assembly-like. It's neither.


These are statements of intent. And the intent is both stated explicitly and also very clear in the standard document that the use as a "portable assembler" is one of the use cases that is intended and that the language should not prohibit.

That does not mean that C is a portable assembly language to the exclusion of everything and anything else, but it also means the claim that it is definitely in no way a portable assembly language at all is also clearly false. Being a portable assembly (and "high level" for the time) is one of the intended use-cases.

> In my opinion, C has tried to serve two masters and they made a screw-hammer in the process.

Yes. The original intent for which it was designed and in which role it works well.

> The rest of the field has moved on significantly. We want portable behavior, not implementation-defined vomit that will leave you doubting whether porting introduces new UB paths that you haven't already fully checked against

Yes, that's the "other" direction that deviates from the original intent. In this role, it does not work well, because, as you rightly point out, all that UB/IB becomes a bug, not a feature.

For that role: pick another language. Because trying to retrofit C to not be the language it is just doesn't work. People have tried. And failed.

Of course what we have now is the worst of both worlds: instead of either (a) UB serving its original purpose of letting C be a fairly thin and mostly portable shell above the machine, or (b) eliminating UB in order to have stable semantics, compiler writers have chosen (c): exploiting UB for optimization.

Now these optimizations alter program behavior, sometimes drastically and even impacting safety (for example by eliminating bounds checks that the programmer explicitly put in!), despite the fact that the one cardinal rule of program optimization is that it must not alter program behavior (except for execution speed).
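To make the bounds-check case concrete, here is a hedged sketch. The function name and parameters are invented for illustration; the point is only the difference between checking after the overflow and checking before it.

```c
#include <limits.h>
#include <stdbool.h>

/* A check written in terms of the already-overflowed result, e.g.
 *     if (offset + len < offset) return false;
 * relies on signed overflow, which is UB, so a compiler is allowed to
 * assume it never fires and delete it.
 *
 * This hypothetical variant tests the operands before adding, so no
 * overflow ever occurs and there is nothing for the optimizer to remove. */
bool range_fits(int offset, int len, int capacity)
{
    if (offset < 0 || len < 0 || capacity < 0)
        return false;
    if (len > INT_MAX - offset)   /* offset + len would overflow */
        return false;
    return offset + len <= capacity;
}
```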

The completely schizophrenic "reasoning" for this altering of program behavior being somehow OK is that, at the same time that we are using UB to optimize all over the place, we are also free to assume that UB cannot and never does happen. This despite the fact that it is demonstrably untrue. After all UB is all over the C standard, and all over real world code. And used for optimization purposes, while not existing.

> They really knew what they were building, a compromise.

Exactly. And for the last 3 decades or so people have been trying unsuccessfully to unpick that compromise. And the result is awful.

The interests driving this are also pretty clear. On the one hand a few mega-corps for whom the tradeoff of making code inscrutable and unmanageable for The Rest of Us™ is completely worth it as long as it shaves off 0.02% running time in the code they run on tens or hundreds of data centers and I don't know how many machines. On the other hand, compiler researchers and/or open-source compiler engineers who are mostly financed by those few megacorps (the joy of open-source!) and for whom there is little else in terms of PhD-worthy or paid work to do outside of that constellation.

I used to pay for my C compiler, thus there was a vendor and I was their customer and they had a strong interest in not pissing me off, because they depended on me and my ilk for their livelihood. This even pre-dated the first ANSI-C standard, so all the compiler's behavior was UB. They still didn't pull any of the shenanigans that current C compilers do.


Back in 1989, when C abstract machine semantics were closer to being a portable macro processor, and stuff like the register keyword was actually something compilers cared about.


And even then there was no notion of constant-time being observable behavior to the compiler. You cannot write reliably constant-time code in C because execution time is not a property the C language includes in its model of computation.
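A typical "constant-time" idiom looks something like the sketch below (the name `ct_select_u32` is hypothetical). The source contains no branch, but since timing is not part of the language's observable behavior, nothing stops a compiler from compiling it into one.

```c
#include <stdint.h>

/* Hypothetical helper: returns a if choose == 1, b if choose == 0.
 * Branch-free in the source only; the generated code carries no such
 * guarantee. */
uint32_t ct_select_u32(uint32_t choose, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - choose;  /* 0x00000000 or 0xFFFFFFFF */
    return (a & mask) | (b & ~mask);
}
```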


But having a straightforward/predictable mapping to the underlying machine and its semantics is included in the C model of computation.

And that is actually not just compatible with the C "model of computation" being otherwise quite incomplete, these two properties are really just two sides of the same coin.

The whole idea of an "abstract C machine" that unambiguously and completely specifies behavior is a fiction.


> But having a straightforward/predictable mapping to the underlying machine and its semantics is included in the C model of computation.

While you can often guess what the assembly will be from looking at C code given that you're familiar with the compiler, exactly how C is to be translated into assembly isn't well-specified.

For example, you can't expect that all uses of the multiplication operator "*" result in an actual x86 mul instruction. Many users expect constant propagation, so you can write something like "2 * SOME_CONSTANT" without computing that value at runtime; there is no guarantee of this behavior, though. Also, for unsigned integers, when optimizations are turned on, many expect compilers to emit left shift instructions when multiplying by a constant power of two, but again, there's no guarantee of this. That's not to say this behavior couldn't be part of a specification, but it's just an informal expectation right now.
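As a small illustration (function names are made up), these two spellings compute the same value for every input, and the standard says nothing about which instruction either one becomes:

```c
#include <stdint.h>

/* Two spellings of the same unsigned computation. The standard only fixes
 * the result; whether either compiles to a multiply, a shift, an lea, or
 * something else is left to the implementation. */
uint32_t times8_mul(uint32_t x)   { return x * 8u; }
uint32_t times8_shift(uint32_t x) { return x << 3; }
```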

What I think people might want is some readable, well-defined set of attribute grammars[0] for translation of C into assembly for varying optimization levels - then, you really would be able to know exactly how some piece of C code under some context would be translated into assembly. They've already been used for writing code generator generators in compilers, but what I'm thinking is something more abstract, not as concrete as a code generation tool.

[0]: https://en.wikipedia.org/wiki/Attribute_grammar


> exactly how C is to be translated into assembly isn't well-specified.

Exactly! It's not well-specified so the implementation is not prevented from doing a straightforward mapping to the machine by some part of the spec that doesn't map well to the actual machine.


> But having a straightforward/predictable mapping to the underlying machine and its semantics is included in the C model of computation.

Not really, or at least not in a way which would count as "high level assembler". If it would the majority of optimizations compilers do today would not be standard conform.

Like there is a mapping to behavior but not a mapping to assembly.

Which is where the abstract C machine as a hypothetical machine formed from the rules of the standard comes in. Kinda as a mind model which runs the behavior mappings instead of running any specific assembly. But then it not being ambiguous and complete doesn't change anything about C not being high level assembly, actually it makes C even less high level assembly.


> If it would the majority of optimizations compilers do today would not be standard conform.

They aren't.


So you can easily tell, just by looking to the C source code, if plain Assembly instructions are being used from four books of ISA manual, if the compiler is able to automatically vectorize a code region including which flavour of vector instructions, or completely replace specific math code patterns for a single opcode.


Nobody says that implementation-defined behavior must be sane or safe. The crux of the issue is that a compiler can assume that UB never happens, while IB is allowed to. Does anyone have an example where the assumption that UB never happens actually makes the program faster and better, compared to UB==IB?


The issue is that you’d have to come up with and agree on an alternative language specification without (or with less) UB. Having the compiler implementation be the specification is not a solution. And such a newly agreed specification would invariably either turn some previously conforming programs nonconforming, or reduce performance in relevant scenarios, or both.

That’s not to say that it wouldn’t be worth it, but given the multitude of compiler implementations and vendors, and the huge amount of existing code, it’s a difficult proposition.

What traditionally has been done, is either to define some “safe” subset of C verified by linters, or since you probably want to break some compatibility anyway, design a separate new language.


> UB is meant to add value. It’s possible to write a language without it, so why do we have any UB at all? We do because of portability and because it gives flexibility to compilers writers.

Implementation-defined behavior is here for portability for valid code. Undefined behavior is here so that compilers have leeway with handling invalid conditions (like null pointer dereference, out-of-bounds access, integer overflows, division by zero ...).

What does it mean that a language does not have UBs? There are several ways to handle invalid conditions:

1) eliminate them at compile time - this is optimal, but currently practical just for some classes of errors.

2) have consistent, well-defined behavior for them - platforms may have vastly different ways of handling invalid conditions

3) have consistent, implementation-defined behavior for them - usable for some classes of errors (integer overflow, division by zero), but for others it would add extensive runtime overhead.

4) have inconsistent behavior (UB) - C way
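For a sense of what option 2 costs in practice, here is a hedged sketch for integer division. The helper name `checked_div` is hypothetical; it simply refuses the two inputs for which C leaves division undefined.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports failure instead of executing a division
 * whose behavior is undefined in C (divide by zero, INT_MIN / -1).
 * On success the quotient is stored through out. */
bool checked_div(int num, int den, int *out)
{
    if (den == 0 || (num == INT_MIN && den == -1))
        return false;
    *out = num / den;
    return true;
}
```

The price is a couple of comparisons per division, which is cheap here but is the kind of overhead that adds up for something like checking every pointer dereference.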


> It’s possible to write a language without it

Whenever you do that, programmers deride the language for being "excessively academic" or something


Fwiw clang has a `clang::optnone` attribute to disable all optimizations on a per-function basis, and GCC has the fantastic `gnu::optimize` attribute which allows you to add or remove optimizations by name, or set the optimization level regardless of compiler flags. `gnu::optimize(0)` is similar to that clang attribute. Clang also has `clang::no_builtin` to disable specifically the memcpy and memset optimizations.
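Roughly, in the older `__attribute__` spelling, that could look like the sketch below. The macro name `NO_OPT` and the function are made up for the example, and the fallback branch means the attribute is silently dropped on other compilers.

```c
/* Hypothetical macro name; picks a per-function "no optimization"
 * attribute for the compiler at hand, or nothing if neither applies.
 * Clang also defines __GNUC__, so it has to be tested first. */
#if defined(__clang__)
#  define NO_OPT __attribute__((optnone))
#elif defined(__GNUC__)
#  define NO_OPT __attribute__((optimize("O0")))
#else
#  define NO_OPT
#endif

/* Compiled without optimization regardless of the command-line -O level
 * (on Clang and GCC). */
NO_OPT
unsigned sum_to(unsigned n)
{
    unsigned total = 0;
    for (unsigned i = 1; i <= n; i++)
        total += i;
    return total;
}
```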


"The optimize attribute should be used for debugging purposes only. It is not suitable in production code. "

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attribute...


That's an interesting note. I wonder why they claim this. As far as I know, `[[gnu::optimize("-fno-tree-loop-distribute-patterns")]]` (or the equivalent #pragma) is required for implementing a memcpy function in C unless you do something funky with the build system.
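A minimal sketch of what I mean, with a made-up macro name and function name. It is guarded so that it only applies the attribute on GCC; elsewhere it compiles but gives no assurance that the loop is not turned back into a memcpy call.

```c
#include <stddef.h>

/* Hypothetical macro name. On GCC this asks the compiler not to recognize
 * the byte loop below as a memcpy pattern and replace it with a call to
 * memcpy (which would recurse if this function were memcpy itself). */
#if defined(__GNUC__) && !defined(__clang__)
#  define NO_LOOP_TO_LIBCALL \
     __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
#  define NO_LOOP_TO_LIBCALL
#endif

NO_LOOP_TO_LIBCALL
void *my_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```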


Because these optimizations are not supposed to be a part of the user interface. They're internal passes and may one day go away or be merged or subsumed etc.


Maybe that's applied to the TU that defines it? I don't see it in the glibc sources.


It's indeed used:

Definition of macro: https://sourceware.org/git/?p=glibc.git;a=blob;f=include/lib...

Use: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/memm...

For a bunch of other places -fno-builtin-* seems to be used.


My guess is that the attribute interacts weirdly with optimizations spanning function calls, like inlining. It might be safe for TUs that define a single function, like with memmove above. Having said that applying the option to the TU itself would be equivalent, and not discouraged by GCC docs.


As far as I can tell it's due to some bugs


I'm vaguely sympathetic to these crypto people's end goals (talking about things like constant time evaluation & secret hiding), but it's really not what general purpose compilers are even thinking about most of the time so I doubt it'll ever be more than a hack that mostly works.

They'll probably need some kind of specialized compiler of their own if they want to be serious about it. Or carry on with asm.


The author has written such a compiler: https://cr.yp.to/qhasm.html (or at least, a prototype for one)


Jasmin has largely replaced qhasm.


Jasmin is also an assembler for JVM bytecode, love overloaded names.


I can't help but feel we're going to think of these as the bad old years, and that at some point we'll have migrated off of C to a language with much less UB. It's so easy to express things in C that compile but that the compiler couldn't possibly guess the intent of because C doesn't have a way to express it.

For instance, in Python you can write something like:

  result = [something(value) for value in set_object]
Because Python's set objects are unordered, it's clear that it doesn't matter in which order the items are processed, and that the order of the results doesn't matter. That opens a whole lot of optimizations at the language level that don't rely on brilliant compilers inferring what the author meant. Similar code in another language with immutable data can go one step further: since something(value1) can't possibly affect something(value2), it can execute those in parallel with threads or processes or whatever else makes it go fast.

Much of the optimization of C compilers is looking at patterns in the code and trying to find faster ways to do what the author probably meant. Because C lacks the ability to express much intent compared to pretty much any newer language, they have the freedom to guess, but also have to make those kinds of inferences to get decent performance.

On the plus side, this might be a blessing in disguise like when the Hubble telescope needed glasses. We invented brilliant techniques to make it work despite its limitations. Once we fixed its problems, those same techniques made it perform way better than originally expected. All those C compiler optimizations, applied to a language that's not C, may give us superpowers.


On the Python example, the downside is that, even though the order is unspecified, people may still rely on some properties, and have their code break when an optimizer changes the order. Basically the same as UB really, though potentially resulting in wrong results, not necessarily safety issues (at least not immediately; but wrong results can turn into safety issues later on). And, unlike with UB, having a "sanitizer" that verifies that your code works on all possible set orders is basically impossible.

gcc/clang do have a variety of things for providing low-level hints to the compiler that are frequently absent in other languages - __builtin_expect/__builtin_unpredictable, __builtin_unreachable/__builtin_assume, "#pragma clang loop vectorize(assume_safety)"/"#pragma GCC ivdep", more pragmas for disabling loop unrolling/vectorizing or choosing specific values. Biggest thing imo missing being some "optimization fences" to explicitly disallow the compiler to reason about a value from its source (__asm__ can, to an extent, do this, but has undesired side-effects, and needs platform-specific register kind names).
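A rough sketch of two of these, with hypothetical macro and function names: a branch hint via `__builtin_expect`, and the empty-`__asm__` trick used as a poor man's optimization fence on a single value. The `"+r"` constraint asks for a general register, which is about as generic as inline asm constraints get.

```c
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
/* Empty asm that claims to read and modify v: the compiler can no longer
 * reason about v from the expression that produced it. */
#  define VALUE_BARRIER(v) __asm__ volatile("" : "+r"(v))
#else
#  define UNLIKELY(x) (x)
#  define VALUE_BARRIER(v) ((void)0)
#endif

/* Hint that the saturation case is rare. */
uint32_t saturating_inc(uint32_t x)
{
    if (UNLIKELY(x == UINT32_MAX))
        return x;
    return x + 1;
}

/* Returns x unchanged, but hides its provenance from the optimizer. */
uint32_t opaque(uint32_t x)
{
    VALUE_BARRIER(x);
    return x;
}
```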

There's certainly potential in higher-level intent-based optimization though. Things coming to mind being reserving space in an arraylist before a loop with n pushes, merging hashmap lookups in code doing contains→get→put with the same key, simplifying away objects/allocations from ability to locally reason about global allocation behavior.


While all that makes sense in theory none of it has actually been demonstrated to be faster than C. The compiler doesn't need to guess what the programmer is trying to do because C is close enough to the actual hardware that the programmer can just tell it what to do.


Note that C code was hardly fast outside big iron UNIX: during the 1980s and up to the mid 1990s, any half-clever developer could easily outperform the generated machine code with manually written Assembly code.

Hence why games for 8 and 16 bit home computers were mostly written in Assembly, and there were books like the Zen of Assembly Programming.

It was the way that optimizing compilers started to exploit UB in C, that finally made it fast enough for modern times.

Modern hardware has nothing to do with C abstract machine.


Funnily, part of why Python is, well, one of the slowest widely used languages is that any AOT compiler has a really hard time guessing what it does ("slowest" for pure Python only; it's still often more than fast enough).

Though then the "cognitive"/"risk" overhead of large complicated C code bases in typical company use cases (*1) makes it so that you have to be very strict/careful about doing any optimizations in C at all. In which case ironically your perf. can easily be below that of e.g. Go, Rust, C#, Java etc. (depending on use case). I.e. in the typical code base the additional optimizations the compiler can do due to better understanding as well as less risky but limited/simple ad-hoc human optimizations beat out C quite often.

In a certain way it's the same story as back then with ASM, in theory in some use-cases it's faster but in practice for a lot of real world code with real world constraints of dev-hours and dev-expertise writing C was the better business choice.

(*1) I.e. hardly any resources for optimization for most code. Potentially, in general, too few devs for the tasks/deadlines. Definitely no time to chase UB bugs.



It's certainly true that there's room for semantic optimization, but my observation is that such optimization is largely around memory allocation.

And AFAIK the only languages which tend to implement such memory optimizations are Java-like, and the only reason they bother is because of their up-front aggressive pessimization ... which the optimization can't make up for.

Edit: my point is: yes C sucks, but everybody else is worse


If you don't like C's semantics then how about using a different programming language instead of getting angry at compiler engineers.


I'm honestly unsure whether djb would actually find anything other than his qhasm tolerable (yes, even Zig). I find this particular commentary from him unsurprising.


Zig will remove many instances of UB, but it will add a nasty new one when a pass-by-value parameter aliases a parameter passed by pointer.

*: https://ziglang.org/documentation/master/#toc-Pass-by-value-...


Refreshing post that conveys a perspective I haven't seen voiced often. See also: https://gavinhoward.com/2023/08/the-scourge-of-00ub/


It's free software, they are completely free to fork it and make it have whatever semantics they want if they don't like the ISO C semantics. They can't really expect someone else to do that for them for free, and especially this sort of post is not exactly the sort of thing that would get any of the compiler people to come to djb's side.


Demonstrating how some languages and some compilers are bad at tasks such as writing constant-time crypto routines is fine. Concluding that all compilers and non-asm languages are bad is a non sequitur. Just because you don't want non-branching code to change into branching code doesn't mean you should have to do register allocation by hand. Write simple domain-specific compilers and languages, people.


> (As a side note, I would expect this conditional branch to slow down more code than it speeds up. But remember that compiler writers measure an "optimization" as successful if they can find any example where the "optimization" saves time.)

Wildly false, and I have no idea where the author is getting this idea from. If you regress people's code in LLVM, your patch gets reverted.


Very interesting article and much-needed criticism of the current standard of heuristic optimization.

Before reading this, I thought that a simple compiler could never usefully compete against optimizing compilers (which require more manpower to produce), but perhaps there is a niche use-case for a compiler with better facilities for manual optimization. This article has inspired me to make a simple compiler myself.


You don't need to get rid of all optimizations though, just the "unsafe" ones. And you could always make them opt-in instead of opt-out.

Now I'm definitely closer to a noob, but compilers already have flags like no-strict-overflow and no-delete-null-pointer-checks. I don't see why we can't make these the default options. It's already "undefined behavior" per the spec, so why not make it do something sensible. The only danger is that some pedant comes along and says that with these assumptions what you're now writing isn't "portable C" and relies on compiler-defined behavior, but in the real world if it does the correct thing I don't think anyone would care: just call your dialect "boringC" instead of C99 or something (borrowing Gavin Howard's term), and the issue disappears.


> And you could always make them opt-in instead of opt-out.

> The only danger is that some pedant comes along and says that with these assumptions what you're now writing isn't "portable C" and relies on compiler-defined behavior, but in the real world if it does the correct thing I don't think anyone would care: just call your dialect "boringC" instead of C99 or something (borrowing Gavin Howard's term), and the issue disappears.

My idea is to make a new language with some simple syntax like S-expressions. Compilation would be (almost entirely) done with lisp-like macros, but unlike lisp it would be an imperative language rather than a functional language. The main data structure would have to be a hierarchy (analogous to CONS) to facilitate these macros.

Optimizations (and Specializations) would be opt-in and would depend on the intrinsics and macros you allow in compilation. For example, you could start writing code with this default data structure, and later swap it out for some more specific data structure like a linked list or a hashtable. The most daunting problem is the issue of how the compiler selects what optimization or specializations to use; Optimizing for something like code size is straightforward, but optimizing for code speed will depend on what branches are taken at runtime. Now I suppose that the language should simply allow the programmer to manually express their preferences (which could be discovered through benchmarks/code studies).

I think that this could have a niche for manually-optimized code that requires strict static analysis and straight-forward compilation. It also could have a niche in decompilation/recompilation/reverse-engineering (I think that a similar process can run in reverse to disassemble even obfuscated code, because you could easily write a macro to reverse an ad-hoc obfuscation mechanism).

Here is another application of the language: By controlling the macros and intrinsics available at compilation, you could ensure compile-time security of userspace programs. For example, you could have a setup such that speculative execution vulnerabilities and the like are impossible to compile. I think you could safely enforce cooperative multitasking between programs.

I'll probably start with a simple assembly language like WASM, then LLVM-IR. Eventually it would have JS/C/Rust bindings to interoperate with normal libraries.

Lastly, I would like to make it so you can write routines that are portable between CPUs, GPUs, and even FPGAs, but this would be very difficult and this functionality may be better realized with a functional language (e.g. CLASP https://github.com/clasp-developers/clasp) or may require programmers to work at an uncomfortably high level of abstraction.


Why does the code need to rely on hacks to get around optimizations? Can't they be disabled per-unit by just compiling different files with different optimization flags?


You can’t realistically have a C compiler that doesn’t do any optimizations.

For one thing, CPU caches are wonders of technology, but a C compiler that only uses registers for computations but stores all results in memory and issues a load for every read will be unbearably slow.

So, you need a register allocator and if you have that, you either need (A) an algorithm to spill data to memory if you run out of registers, or (B) have to refuse to compile such code.

If you make choice A, any change to the code for spilling back to memory can affect timings and that can introduce a timing bug in constant-time code that isn’t branch-free.

Also, there still is no guarantee that code that is constant-time on CPU X also will be on CPU Y. For example, one CPU has single-cycle 64-bit multiplication, but another doesn’t.

If you make choice B, you don’t have a portable language anymore. Different CPUs have different amounts of registers, and they can have different features, so code that runs fine on one CPU may not do so on another one (even if it has the exact same amount of registers of the same size).

Phrased differently: C isn’t a language that supports writing constant-time functions. If you want that, you either have to try hard to beat a C compiler into submission, and you will fail in doing that, or choose a different language, and that likely will be one that is a lot like the assembly language of the target CPU. You could make it _look_ similar between CPUs, but there would be subtle or not so subtle differences in semantics or in what programs the language accepts for different CPUs.

Having said that: a seriously dumbed down C compiler (with a simple register allocator that programmers can mostly understand, no constant folding, no optimizations replacing multiplications by bit shifts or divisions by multiplications, 100% honors ‘inline’ phrases, etc.) probably could get close to what people want. It might even have a feature where code that requires register spilling triggers a compiler error. I am not aware of any compiler with that feature, though.

I wouldn’t call that C, though, as programs written in it would be a lot less portable.


In my experience from writing a toy compiler, the speedup you get with a reasonable set of optimizations compared to spilling each temporary result to memory is in the 2x ballpark. There are vastly different situations of course, and very inefficient ways to write C code that would require some compiler smartness, but 2x is a number that you'd have to contend with actual measured data to make the claims you made.

In many cases I'd suspect the caches are doing exactly what you alluded to, masking inefficiencies of unnecessary writes to memory, at least to an extent. You might be able to demonstrate a speedup of 100x but I suspect it would take some work or possibly involve an artificial usecase.


I recommend doing some experiments before considering no register allocation unbearably slow. I once tried running Gentoo with everything compiled -O0 and the user experience with most software wasn't significantly different. The amount of performance critical C code on a modern PC is surprisingly low. Stuff like media decoding is usually done in assembly.


> I recommend doing some experiments before considering no register allocation unbearably slow. I once tried running Gentoo with everything compiled -O0

AFAIK, register allocation is one of the few optimization passes which are always enabled on all compilers, even with -O0, so your experiment proves nothing.


It's decided by function use_register_for_decl in gcc: https://github.com/gcc-mirror/gcc/blob/releases/gcc-12/gcc/f... With -g -O0 register is only used in special cases like using the register keyword.

The memory accesses are also easily visible by disassembling the compiled binary. Performance of resulting binary at -O0 is also roughly similar to performance of binary produced by Tiny C Compiler, which doesn't implement register allocation at all.


That would bring us back to the days of 8 and 16 bit home computers, where the quality of C compilers outside UNIX, was hardly anything to be impressed about.


What is an optimization?

You wrote some code. It doesn't refer to registers. Is register allocation that minimizes spillage an optimization? How would you write a compiler that has "non-optimizing" register allocation?


Surprised to see such an incoherent and trite rant from djb.

Compilers are not your enemy. Optimizing compilers do the things they do because that's what the majority of people using them want.

It also mixes in things that have nothing to do with optimizing compilers at all like expecting emulation of 64-bit integers on 32-bit platforms to be constant time when neither the language nor the library in question have ever promised such guarantees. Similar with the constant references to bool as if that's some kind of magical data type where avoiding it gives you whatever guarantees you wish. Sounds more like magical thinking than programming.

I'd file this under "why can't the compiler read my mind and do what I want instead of just what I asked it to".


What I'd really like is a way to express code in a medium/high level language, and provide hand-optimized assembly code alongside it (for as many target architectures as you need). For a first-pass, you could machine-generate that assembly, and then manually verify that it's constant time (for example) and perform additional optimizations over the top of that, by hand.

The "compiler"'s job would then be to assert that the behaviour of the source matches the behaviour of the provided assembly. (This is probably a hard/impossible problem to solve in the general case, but I think it'd be solvable in enough cases to be useful)

To me this would offer the best of both worlds - readable, auditable source code, alongside high-performance assembly that you know won't randomly break in a future compiler update.


A point of the post that I didn't see discussed here is this:

> LLVM 11 tends to take 2x longer to compile code with optimizations, and as a result produces code that runs 10-20% faster (with occasional outliers in either direction), compared to LLVM 2.7 which is more than 10 years old.

Yes, C code is expected to benefit less from optimizations, since it is already close to assembly. But compiler optimizations in the past decades had enormous impact - because they allowed better languages. Without modern optimizations, C++ would have never been as fast as C, and Rust wouldn't be possible at all. Same arguments apply to Java and JavaScript.


Rust is possible and proves that you don’t need "optimizations" to optimize, but that optimizations are actually possible. Now that's kind of irrelevant for most of the article, which focuses on constant versus variable time, which is not really an "optimization" problem but already an optimization one; but putting that aside, Rust at least proves that a language doesn't need to allow nasal daemons to get good perf. You just apply the techniques when you actually know they are correct, not when you speculate the existence of the mythical perfect programmer (a hypothesis that has actually been disproven by studies on the subject).


I specifically addressed the claim that compiler optimizations are worthless. I did not address the other claims in the article.

In particular, however, Rust relies a lot on Undefined Behavior to optimize well. It manages to hide it (mostly) in the surface language, but in the IR they are necessary to perform well.


Let's consider this function:

  #include <stdlib.h>  /* malloc */
  #include <string.h>  /* memcpy */

  char* strappend(char const* input, size_t size) {
    char* ptr = malloc(size + 2);  /* size + 2 wraps around for huge sizes */
    if (!ptr) return 0;
    memcpy(ptr, input, size);
    ptr[size] = 'a';
    ptr[size + 1] = 'b';
    return ptr;
  }
This function is undefined if size is SIZE_MAX.

Many pieces of code have these sorts of "bugs", but in practice no one cares, because the input required, while theoretically possible, physically is not.


It does something unexpected if size is SIZE_MAX - 1, too. And it's also undefined if input is null and size is zero, which seems more likely to surprise that function's author. This is because memcpy requires valid pointer arguments even if the size is zero.

In particular, this usage invokes UB:

  const char *input = "";
  size_t len = strlen(input);
  char *buf = malloc(len);  // may return null if len is zero
  if (len) memcpy(buf, input, len);
  char *buf2 = strappend(buf, len);
(Edited for formatting.)
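
For what it's worth, both problems can be avoided with two extra checks. This is only a sketch: the name strappend_checked is made up for illustration, and it keeps the original's behaviour of not NUL-terminating the result.

```c
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>  /* malloc */
#include <string.h>  /* memcpy */

/* Hypothetical hardened variant of strappend (name is an assumption). */
char *strappend_checked(const char *input, size_t size) {
    /* Reject sizes for which size + 2 would wrap around. */
    if (size > SIZE_MAX - 2) return NULL;

    char *ptr = malloc(size + 2);
    if (!ptr) return NULL;

    /* Skip memcpy entirely for size 0, so a null input is never passed to it. */
    if (size != 0) memcpy(ptr, input, size);

    ptr[size] = 'a';
    ptr[size + 1] = 'b';
    return ptr;
}
```

With this version, strappend_checked(NULL, 0) returns a two-byte buffer holding 'a' and 'b', and the two huge sizes return a null pointer instead of overflowing.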


I was already rolling my eyes, but then I saw the unironic link to “The Death of Optimizing Compilers” and they might as well have fallen out of my head. Someone please explain to the crypto people that designing a general-purpose language around side-channel resistance is actually stupid since most people don’t need it, optimizations actually do help quite a lot (…if they didn’t, you wouldn’t be begging for them: -O0 exists), and the model of UB C(++) has is not going away. If you want to make your own dedicated cryptography compiler that does all this stuff, I honestly think you should and I would support such a thing, but when you think the whole world is conspiring against your attempts to write perfect code, maybe it’s you.


The crypto people really want to write slow code. That's what constant time means - your best case is as slow as your worst case. No one else wants that, so there's direct tension when they also want to work in some dialect of a general purpose language.


> The crypto people really want to write slow code. That's what constant time means - your best case is as slow as your worst case.

At least for a hot path constant-time algo, they want all cases to run as slow as the worst case. But just as important-- they want that algo to be as fast as is feasible. AFAICT that's the only reason we're talking about C/C++ here.

The problem with writing "slow code" would have been that all the big companies who need to go fast would have chosen to roll their own hot-shot implementations for the speed of it. That would introduce more risk into the most widely-used cases, while the "safe" version would have been relegated to the least used software.

Instead, the guy complaining in the article about compiler determinism wrote fast crypto things in C. AFAICT everybody just uses that. And he continues to complain about the potential of compiler indeterminacy-- indeterminacy in the name of optimization-- breaking the fast crypto things.

He also points out that in the cases where optimization really matters-- like ffmpeg-- the hot path code is hand-optimized and not left up to the compiler optimizer. I'd add audio plugins to that.

I'd also add fftw, which apparently has a runtime (method-space?) heuristic that checks which of its buttload of hand-optimized routines win the race on your particular cpu.


> It would be interesting to study what percentage of security failures can be partly or entirely attributed to compiler "optimizations".

I bet it's roughly none.


Deleting null pointer checks in the Linux kernel is the first one to come to mind
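
A simplified illustration of that pattern (not the kernel's actual code; the struct and function names are made up): once the pointer has been dereferenced, the compiler is allowed to assume it is non-null and drop the later check.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel object. */
struct sock_like { int flags; };

/* Buggy ordering: the dereference comes first, which is UB when s is null.
 * An optimizer may therefore treat the check below as dead code. */
int poll_flags_buggy(struct sock_like *s) {
    int f = s->flags;
    if (!s) return -1;  /* may be removed by the optimizer */
    return f;
}

/* Fixed ordering: check before any dereference, so the check must stay. */
int poll_flags_fixed(struct sock_like *s) {
    if (!s) return -1;
    return s->flags;
}
```

GCC's -fno-delete-null-pointer-checks flag turns this particular optimization off.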


That's one CVE, right? How many other vulnerabilities were caused by compiler optimizations, whether they were bugs in the compiler or allowed by the spec?


You can probably enumerate them by searching for GCC compiler flags in the corresponding bug tracker. Start with -ftrapv and -fno-strict-aliasing. Those diverge-from-C flags exist to make code slower in exchange for not being broken.
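
As a small sketch of the kind of code -fno-strict-aliasing exists for: reading a float's bits through a uint32_t pointer cast breaks the aliasing rules, while copying the bytes with memcpy is well defined. The function name float_bits is made up, and the example assumes a 32-bit float.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit float");

/* Well-defined way to look at a float's representation: copy the bytes.
 * The aliasing-violating version would be *(uint32_t *)&f. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Compilers typically turn the memcpy into a plain register move, so the well-defined version usually costs nothing.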


The article links to an attack that extracts a 512-bit secret key in 5-10 minutes:

https://pqshield.com/pqshield-plugs-timing-leaks-in-kyber-ml...

https://github.com/antoonpurnal/clangover


Oh yeah, because no security failure was ever related to undefined behavior


This is a different question though. A lot of UB issues are related to out-of-bounds accesses and use-after-free. But those are problematic even without optimization. The cases where optimizations introduce security issues are more subtle and less common. Signed overflow related issues come to mind, but there I think UB is now part of the solution via sanitizers (and errors related to unsigned wraparound, which is defined, are the far more vexing problem), and similarly for dereferencing null pointers, which can also easily be caught by sanitizers.
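
To make the unsigned wraparound point concrete: since wrapping is defined behaviour, it has to be checked for explicitly. A minimal sketch, assuming GCC or Clang (the __builtin_mul_overflow builtin is theirs; the wrapper name checked_mul_size is made up):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true and stores a * b in *out if the product fits in size_t;
 * returns false if the multiplication would wrap around. */
bool checked_mul_size(size_t a, size_t b, size_t *out) {
    return !__builtin_mul_overflow(a, b, out);
}
```

A typical use is computing count * element_size before calling malloc, and failing the allocation when the function returns false.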


Similarly to not checking array bounds, undefined behavior was once introduced in the name of efficiency - back in the ages when the performance difference really mattered.

And both are just a major headache now, and are among the reasons why few people start new projects in C.

I wonder how many such design decisions, relevant today, but with a potential to screw up future humanity, we are making right now.


Ok, as far as the efficacy/importance/tradeoff of optimizing compilers...

How do Firefox and Chrome perform if they are compiled at -O0?


The author's Clang patch is interesting, but I wonder if what he really wants is, like, a new optimization level "-Obranchless" which is like O2/O3 but disables all optimizations which might introduce new conditional branches. Presumably optimizations that _remove_ branches are fine; it's just that you don't want any deliberately branchless subexpression being replaced with a branch.

Basically like today's "-Og/-Odebug" or "-fno-omit-frame-pointer" but for this specific niche.

I'd be interested to see a post comparing the performance and vulnerability of the mentioned crypto code with and without this (hypothetical) -Obranchless.
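
For readers who haven't seen one, this is the sort of deliberately branchless subexpression at stake. It is only a sketch (the name ct_select_u32 is made up), and nothing in the C standard stops a compiler from turning it back into a branch, which is the whole complaint.

```c
#include <stdint.h>

/* Returns a if bit is 1 and b if bit is 0, using a mask instead of a branch.
 * Only the lowest bit of `bit` is used. */
uint32_t ct_select_u32(uint32_t bit, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - (bit & 1);  /* all ones or all zeros */
    return (a & mask) | (b & ~mask);
}
```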


... except that even my idea fails to help with software math. If the programmer writes `uint128 a, b; ... a /= b` under -Obranchless, does that mean they don't want us calling a runtime software division routine (__udivti3, if I'm spelling it right) that might contain branches? And if so, then what on earth do we do instead? Well, just give an error at compile time, I guess.


Not branchless, they just need it to be constant-time. That is definitely doable with pure software division.
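
A rough sketch of what that can look like at the source level, using 64-bit operands to stay portable: plain restoring division with a fixed 64 iterations and a mask in place of the comparison branch. The name ct_udiv64 is made up, the divisor must be nonzero, and whether the generated machine code is actually constant-time still depends on the compiler and the CPU.

```c
#include <stdint.h>

/* Bit-by-bit restoring division with a data-independent loop count.
 * Returns n / d and stores n % d in *rem. The result is meaningless for d == 0. */
uint64_t ct_udiv64(uint64_t n, uint64_t d, uint64_t *rem) {
    uint64_t q = 0, r = 0;
    for (int i = 63; i >= 0; i--) {
        /* Bring down the next bit of the dividend. */
        r = (r << 1) | ((n >> i) & 1);

        /* borrow is 1 exactly when r < d, computed without a comparison. */
        uint64_t diff = r - d;
        uint64_t borrow = ((~r & d) | ((~r | d) & diff)) >> 63;

        uint64_t ge = borrow ^ 1;   /* 1 when r >= d */
        uint64_t mask = 0 - ge;     /* all ones when r >= d, else zero */

        r -= d & mask;              /* subtract d only when r >= d */
        q |= ge << i;               /* set the matching quotient bit */
    }
    *rem = r;
    return q;
}
```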


Yes, a compile failure would IMHO be the only useful result in that case.


Timing attacks are a very specialized problem. If you don't care about performance, why not wrap the critical section in:

  #pragma GCC push_options
  #pragma GCC optimize ("O0")
Exploiting UB in the optimizer can be annoying, but most projects with bad practices from the 1990s have figured it out by now. UBsan helps of course.
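
Spelled out in full, the pragma approach would look something like the sketch below. The pragmas are GCC-specific and other compilers may ignore them; the function ct_memeq is a made-up example of a critical section, not code from the article.

```c
#include <stddef.h>

#pragma GCC push_options
#pragma GCC optimize ("O0")

/* Compares n bytes without an early exit. Returns 1 if equal, 0 otherwise.
 * Every byte is always examined, whatever the contents. */
int ct_memeq(const unsigned char *a, const unsigned char *b, size_t n) {
    unsigned char acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc |= (unsigned char)(a[i] ^ b[i]);  /* accumulate any difference */
    }
    return acc == 0;
}

#pragma GCC pop_options
```

Everything between push_options and pop_options is compiled at -O0, and the rest of the file keeps whatever level was given on the command line.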

I'm pretty grateful for aggressive optimizations. I would not want to compile a large C++ codebase with g++ that has itself been compiled with -O0. Even a 20% speedup helps.

The only annoying issue with C/C++ compilers is the growing list of false positive warnings (usually 100% false positives in well written projects).


> The bugs admitted in the compiler changelogs are just the tip of the iceberg. Whenever possible, compiler writers refuse to take responsibility for the bugs they introduced, even though the compiled code worked fine before the "optimizations".

This makes it difficult to read the rest of the article. Really? All compiler authors, as a blanket statement, act in bad faith? Whenever possible?

> As a cryptographic example, benchmarks across many CPUs show that the avx2 implementation of kyber768 is about 4 times faster than portable code compiled with an "optimizing" compiler.

What? This is an apples to oranges comparison. Compilers optimize all code they parse; optimizing a single algorithm will of course speed up implementations of that specific algorithm, but what about the 99.9999999% of code which is not your particular hand-optimized algorithm?


> This makes it difficult to read the rest of the article. Really? All compiler authors, as a blanket statement, act in bad faith? Whenever possible?

When I saw the link was to DJB’s site, I figured the post would contain a vitriolic and hyperbolic rant. It’s pretty on-brand for him (although, to be fair, he’s usually right.)


I don't think DJB is right here, but I do think he is one of the few "ugh compilers taking advantage of UB" people who is actually serious about it. DJB wants absolute certainty in predicting the compiled code so that he can make significantly stronger guarantees about his programs than almost anybody else needs.

The bad news for him is that the bulk of clang users aren't writing core cryptographic primitives and really DJB just needs a different language and compiler stack for his specific goals.


The bad news for us is that plenty of cryptographic (or otherwise critical) code is already written in C or C++, and when compiler writers play with their optimizations, they cause real-world problems to a good portion of the population


> (although, to be fair, he’s usually right.)

This is worth emphasizing. I actually can't think of any articles of his other than this one that miss the mark.


The "debunking NIST's calculation" one was, if I'm remembering this right, refuted by Chris Peikert on the PQC mailing list immediately after it was posted.


I’m not sure this one is wrong, especially if you’ve been bitten by underdocumented compiler or framework changes that modify behavior of previously working code.

For example, I have a small utility app built against SwiftUI for macOS 13. Compiling on macOS 14 while still linking against frameworks for 13 results in broken UI interaction in a particular critical use case. This was a deliberate change made to migrate devs away from a particular API, but it fails silently at compile time and runtime. Moving the code back to a macOS 13 machine would produce the correct result.

As a dev, I can no longer trust that linking against a specific library version will produce the same result, and I now need to think in terms of some tuple of compile host and library version.

At what point should working code be considered correct and complete when compiler writers change code generation that doesn’t depend on UB? I’m sure it’s worse for JITed languages where constant time operations work in test and for the first few hundred iterations and then are “optimized” into variable time branching instructions on a production host somewhere.


No, he’s wrong. What you’re talking about is completely different from what he is: your code doesn’t work because Apple changes the behavior of their frameworks, which has nothing to do with what compiler you’re using. There’s a different contract there than what a C compiler gives you.


It’s not quite that simple in Swift-land. There is a significant overlap between what’s compiler and what’s runtime. I’m supposedly linking against the same libraries, but changing the host platform changes the output binaries. Same code, same linked libraries, different host platform, different codegen.

Mine isn’t security critical, but the result is similarly unexpected.


It’s really not the same thing. Your complaint is about Apple’s complex framework API management, not about the compiler optimization/undefined behavior.

Swift frameworks sometimes blur the line the way I think you mean by being able to be back-deployed to earlier OS releases through embedding in your binary. Apple’s documentation is poor (and has been since the NeXT takeover in 1997), but, again, that’s not a compiler issue as such.


As someone who knows C but isn't familiar with compiler internals, I ask: would the disruptive optimizations discussed here kick in even when compiling with optimizations turned off (-O0)?

C also has other issues related to undefined behavior and it being used for what I call "extreme optimizations" (e.g. not emitting code for an if branch that checks for a null pointer). Rust is emerging as an alternative to C that aims to fix many of its problems, but how does it fare in terms of writing constant-time code? Is it similar to C, easier or more complicated?


The Rust compiler uses LLVM in the backend, so you still get all the same wild, complex compiler tricks at play. One of the most surprising to me is that you can sometimes improve performance by adding asserts to your Rust code. For example, if you write this code:

    for i in 0..1_000_000 { do_stuff(my_array[i]); }
Then the compiler will do array bounds checking in each loop iteration. If you instead add an assert!(my_array.len() >= 1_000_000) before the loop, the compiler knows the bounds checks aren't needed and the loop runs faster.

But I think being able to rely on llvm's tricks makes rust better. For example, there's usually no overhead from writing functional code in rust using iterators. The compiler generally emits the same machine code as it would if you hand-wrote the equivalent series of for() loops.


I'm sick and tired of people expecting non-standard behaviour from C/C++ compilers when there are long established standards that clearly state what is allowed and what is not. If you are writing something like Unreal Engine and you rely on UB to get all of the performance you can get without writing assembly, then you also need to know you'll have to commit to a certain version of a certain compiler if you want deterministic behaviour.


What an interesting discussion. Especially all the claims that writing it in asm would be the solution if you want secure code.

Both gcc and clang are orders of magnitude better tested than all the closed-source applications that are developed under tight timelines and that we essentially trust our lives with.

To be very clear, there are compiler bugs, but those are almost never the problem in the first place. In the vast majority of cases it starts with buggy user code. And now back to handwritten assembly…


Computer security is not a serious field. There is no other group that honestly feels "do what I meant, not what I said" is a sign of someone else's bug.


So, should we be compiling security-critical code with `-O0` then?


UB means undefined behavior

Somehow it took me long minutes to infer this.


Was hoping the title was a pun on Spy vs Spy[0].

[0] https://en.wikipedia.org/wiki/Spy_vs._Spy


Man attempts to write constant time algorithms using language that does not support constant time algorithms, but who’s really at fault here?

Find out on next week's episode of “let's blame compilers rather than my choice of language”!


Compile your code with `-O0` and shut up already.


Unfortunately GCC’s codegen for GCC’s x86 intrinsics headers is really remarkably awful at -O0, particularly around constant loads and broadcasts, because those usually use code that’s as naïve as possible and rely on compiler optimizations to actually turn it into a broadcast, immediate, or whatever. (I haven’t checked Clang.)


"Unfortunately GCC’s codegen for GCC’s x86 intrinsics headers is really remarkably awful at -O0" - but that kind of seems to be what is asked for..


No. If I say (e.g.) _mm256_set_epi32(a,b,...,c) with constant arguments (which is the preferred way to make a vector constant), I expect to see 32 aligned bytes in the constant pool and a VMOVDQA in the code, not the mess of VPINSRDs that I’ll get at -O0 and that makes it essentially impossible to write decent vectorized code. The same way that I don’t expect to see a MUL in the assembly when I write sizeof(int) * CHAR_BIT in the source (and IIRC I won’t see one).

(Brought to you by a two-week investigation of a mysterious literally 100× slowdown that was caused by the fact that QA always ran a debug build and those are always compiled at -O0.)


Seems you want the compiler to do some optimization, to improve the generated code. Or?


In this case, I’d expect constant folding to be the absolute minimum performed at all optimization levels. It is, in fact, for integers. For (integer) vectors, it’s not, even though it’s much more important there. That’s why advising cryptographers who program using vector intrinsics (aka “assembly except you get a register allocator”) to compile with GCC at -O0 is such bad advice. (Just checked MSVC and it’s better there.)

There are, however, more unambiguous cases, where Intel documents an intrinsic to produce an instruction, but GCC does not in fact produce said instruction from said intrinsic unless optimization is enabled. (I just don’t remember them because constants in particular were so ridiculously bad in the specific case I hit.)


If you constant fold and keep things in registers then you generally can't look at or change the pieces in a debugger. So everything gets written to the stack where it's easy to find.


Clang tends to put everything on the stack at -O0 and actually try to do register allocation only as an optimization.


Generally? Sure, so does GCC, but that’s IME less impactful than a pessimized vectorized routine. (Corollary that I hit literally yesterday: exclusively at -O0, until you step past the opening brace—i.e. the function prologue—GDB will show stack garbage instead of the function arguments passed in registers.)


I would put it differently:

If you want your code to contain specific assembly instructions, code in assembly. A programming language is by design a higher-level abstraction, and when you use one you shouldn't care that much about the actual assembly it produces.


Complains about branching, but doesn't even mention `__builtin_expect_with_probability`.


There's no point in mentioning something that doesn't solve the issue.


It's as close an answer as you're going to get while using a language that's unsuitable for the issue.

And in practice it's pretty reliable at generating `cmov`s ...
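
For reference, a minimal sketch of how that builtin is used, assuming a reasonably recent GCC or Clang. The function name select_hint is made up. The 0.5 tells the compiler the condition is a coin flip, which tends to favour a conditional move over a branch, but it is only a hint and guarantees nothing about the emitted code.

```c
/* Returns a if cond is nonzero, otherwise b.
 * The probability argument hints that neither outcome is predictable. */
long select_hint(long cond, long a, long b) {
    if (__builtin_expect_with_probability(cond != 0, 1, 0.5)) {
        return a;
    }
    return b;
}
```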


Why is that relevant?



