
I don't really see what's to be accomplished by most of the points of this. A program that invokes undefined behaviour isn't just invalid; it's almost certainly _wrong_. Shifting common mistakes from undefined behaviour to unspecified behaviour just makes such programs less likely to blow up spectacularly. That doesn't make them correct; it makes it harder to notice that they're incorrect.

Granted, not everything listed stops at unspecified behaviour. I'm not convinced that that's a good thing, though. Even something like giving signed integer overflow unsigned semantics is pretty much useless. Sure, you can then rely on two's complement representation, but that doesn't change the fact that you can't represent the number six billion in a thirty-two-bit integer, and it doesn't make 32-bit code that happens to depend on the arithmetic properties of the number six billion correct just because the result of multiplying two billion by three is well-defined.
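To make that concrete, here's a minimal sketch (using uint32_t so the example itself stays well-defined) of what wraparound semantics would actually produce for that multiplication:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        /* Simulate the proposed "unsigned semantics" for signed overflow:
           the product wraps modulo 2^32 instead of being six billion. */
        uint32_t two_billion = 2000000000u;
        uint32_t product = two_billion * 3u;
        printf("%" PRIu32 "\n", product);   /* prints 1705032704 */
        return 0;
    }

The result is consistent and well-defined, but it's still not the number the surrounding code was presumably written to expect.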

Then there's portability. Unaligned access is a good example of this. Sure, you can access an "int" aligned however you like on x86. It'll be slow, but it'll work. On MIPS, though? Well, the compiler could generate code to automate the scatter-gather process of accessing unaligned memory locations. This is C, though. It's supposed to be close to the metal; it's supposed to force-- I mean, let you do everything yourself, without hiding complexity from the programmer. How far should the language semantics be stretched to compensate for programmers' implicit, flawed mental model of the machine, and at what point do we realize that we already have much better tools for that level of abstraction?
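For what it's worth, the usual portable idiom for an unaligned read is an explicit memcpy into a local; a decent compiler lowers it to a single load on x86 and to the byte-wise sequence on strict-alignment targets:

    #include <stdint.h>
    #include <string.h>

    /* Read a 32-bit value from a possibly unaligned address without
       undefined behaviour; the copy is typically optimized away. */
    uint32_t load_u32(const void *p)
    {
        uint32_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }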




I have a feeling you haven't read through all of the linked posts and papers.

The problem they're trying to address is that C compilers take advantage of undefined behavior for optimizations. Such optimizations can cause very strange, unintuitive behavior that is very difficult to discover. The linked posts, papers, and even this thread provide many great examples.

You're right that the programs are wrong. The goal of this proposal is to make them wrong in reasonable ways.


>The problem they're trying to address is that C compilers take advantage of undefined behavior for optimizations

That's not a problem. That's a good thing.

>Such optimizations can cause very strange, unintuitive behavior that is very difficult to discover

That's a problem -- or at least the "difficult to discover" part is. "Strange" and "unintuitive" are helpful; they're a nice, big red flag. How does migrating from undefined behaviour to producing unspecified values make bugs easier to discover? I can see how it would make results more consistent, and the bugs easier to hunt down, but that's only useful once you know the bugs are there (inconsistency is another useful red flag here), and there are already good tools like valgrind and ubsan for tracking down the source of a bug.
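For reference, assuming a reasonably recent gcc or clang, turning on ubsan is one line:

    cc -g -fsanitize=undefined -fno-sanitize-recover=undefined prog.c

With -fno-sanitize-recover the program aborts at the first detected instance of undefined behaviour instead of limping on.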

>The goal of this proposal is to make them wrong in reasonable ways

That isn't the purview of C, though. It's a noble goal, don't get me wrong, but stepping on the optimizer's toes and reinforcing plainly bad programming practices -- I know that's not an intended effect, but it will happen -- isn't the way to do it. A better way, just as an example, would be to give the programmer a proper mechanism for encoding preconditions and other interprocedurally analysable constraints. That wouldn't reinforce bad practices, it could actually help the optimizer if done right, and it might encourage programmers to reason a little more rigorously about their code -- an ounce of prevention and all that.
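As a rough sketch of what such a mechanism could look like with today's compilers (the ASSUME name is hypothetical; __builtin_unreachable is a GCC/Clang extension):

    #include <assert.h>

    /* Check the precondition in debug builds; promise it to the
       optimizer in release builds. */
    #ifdef NDEBUG
    #define ASSUME(cond) ((cond) ? (void)0 : __builtin_unreachable())
    #else
    #define ASSUME(cond) assert(cond)
    #endif

    int average(int a, int b)
    {
        ASSUME(a >= 0 && b >= 0);  /* caller-established precondition */
        return (a + b) / 2;        /* division can become a plain shift */
    }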


Please read the linked posts and papers for answers to your questions.


Is this program "almost certainly wrong"?

    uint32_t bytesToUnsigned(uint8_t bytes[4])
    {
      return bytes[0] |
             bytes[1] << 8 |
             bytes[2] << 16 |
             bytes[3] << 24;
    }
The behavior is undefined on a system with 32-bit ints because of signed arithmetic overflow: despite the fact that all the explicit types involved are unsigned, a uint8_t gets promoted to a signed int before the left-shift, so bytes[3] << 24 can shift a one into the sign bit.

Right now it works on every compiler I've tried, but it would be perfectly valid (by the ANSI specification) for a compiler to assume that the result of that function can never have the highest bit set. In Friendly C, the result is well defined.
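For comparison, a version with explicit casts is well-defined even under the current standard, since a uint32_t operand is never promoted down to int:

    uint32_t bytesToUnsigned(uint8_t bytes[4])
    {
        return (uint32_t)bytes[0] |
               (uint32_t)bytes[1] << 8 |
               (uint32_t)bytes[2] << 16 |
               (uint32_t)bytes[3] << 24;
    }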


>Is this program "almost certainly wrong"

No, that one's a consequence of C's insane type system. The solution here isn't to change the semantics of signed integer arithmetic. The solution is to change integer promotion to use unsigned arithmetic like it should have done in the first place.


Not taking a position on this, and it's been a long while, but I seem to remember that the discussions in X3J11 on the issue of integer promotions, which mostly occurred before I joined, were long and heated.


Promoting to unsigned will not help, because promotion may not reach a type wide enough to cover the shift. (In practice it will, but unsigned int could be as narrow as 16 bits.)

Promotion of unsigned chars to unsigned int would have problems of its own, mostly because unsigned (that is, modulo-power-of-two) arithmetic is inappropriate for most uses and error-prone: it has a large, silent discontinuity right next to zero.

Alas, unsigned chars can in fact promote to unsigned int on rare platforms like DSPs where sizeof (int) == 1. Sigh.


There's a simple fix to that:

    -Wsign-conversion -Werror
You've effectively just made C into a safer language by disallowing implicit conversions that change signedness.
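As a small illustration (hypothetical snippet) of what those flags catch:

    #include <stddef.h>

    size_t shrink(size_t count)
    {
        int delta = -1;
        /* With -Wsign-conversion -Werror, the implicit int -> size_t
           conversion below is a hard error instead of a silent
           wraparound toward SIZE_MAX. */
        return count + delta;
    }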


Friendly C should also require char to be unsigned so that this code is safe:

    int my_getchar(char *foo)
    {
        if (!*foo) return -1;  // Return -1 at end of string
        else return *foo;      // Should never return -1
    }


More to the point, char should be unsigned so that this code is safe:

   #include <ctype.h>

   if (isdigit(str[i])) { ... }
The ctype functions require an int argument whose value is either the value of an unsigned char (hence nonnegative) or the negative value EOF; passing any other negative value is undefined.

This is an unfriendly pitfall in the language.
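The usual workaround today is to cast through unsigned char yourself; a small self-contained sketch:

    #include <ctype.h>
    #include <stddef.h>

    /* Count decimal digits without the ctype pitfall: the cast keeps
       a negative char value from ever reaching isdigit(). */
    size_t count_digits(const char *str)
    {
        size_t n = 0;
        for (size_t i = 0; str[i] != '\0'; i++)
            if (isdigit((unsigned char)str[i]))
                n++;
        return n;
    }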


From the examples I'm familiar with, shifting from undefined to unspecified actually makes invalid programs _more_ likely to blow up spectacularly, because they're likely to go on and try to use the unspecified value rather than having the code path that uses it quietly excised or transformed.


You aren't aware of just how many security-critical C programs invoke potentially undefined behaviour today.

Every time you write '+' in a C program between two signed values, do you do an overflow check beforehand? Do you know what it takes to write that overflow check? Or do you do a global dataflow analysis of the program to prove that it can't overflow?
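For reference, this is roughly what a correct pre-check looks like; note that it has to avoid performing the addition itself, because the overflowing addition would already be undefined:

    #include <limits.h>

    /* Return 1 if a + b would overflow int, without computing a + b. */
    int add_would_overflow(int a, int b)
    {
        if (b > 0 && a > INT_MAX - b) return 1;
        if (b < 0 && a < INT_MIN - b) return 1;
        return 0;
    }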

Saying that 95% (at a conservative guess, IMO) of C programs out there are wrong makes for a strong argument for this proposal, not against it. Arguing for a technical correctness that is observably almost impossible to achieve in the wild is what's really pointless.


> You aren't aware of just how many security-critical C programs invoke potentially undefined behaviour today.

It doesn't take much experience with C to appreciate the sheer difficulty of keeping a million lines of C UB-free. I'm not in the security field, but if the code there is anything like the codebases I've worked with, I'd be surprised to find even a "security-critical" C program of any appreciable complexity that doesn't invoke undefined behaviour.

>Every time you write '+' in a C program between two signed values, do you do an overflow check beforehand?

Personally? I use signed scalar arithmetic very rarely. The bulk of the arithmetic I do is with unsigned integers (bit vectors, really) and arbitrary-precision integers. In fact, essentially the only time I do use signed scalar arithmetic is when I can determine statically that it won't overflow. Signed integer overflow is a bitch, UB optimizations or no.

>Or do you do a global dataflow analysis of the program to prove that it can't overflow?

That's the basic idea behind seL4, for example.

>Arguing for a technical correctness that is observably almost impossible to achieve in the wild is what's really pointless.

Sure, but only because everyone writes their security-critical programs in a language that couldn't more effectively impede security if it were designed to do so. Could a "friendly" dialect help? Maybe, but the proposed changes are far from sufficient to make formal verification feasible, and frankly, I don't know how seriously I can take "security-critical" without formal verification.


> Every time you write '+' in a C program between two signed values, do you do an overflow check beforehand? Do you know what it takes to write that overflow check?

But UB isn't the problem here. You can make two's complement wraparound the defined behavior, and programs that fail to check their arithmetic are still very likely to be wrong.

This is already the case with unsigned overflow: it's defined to wrap around. Yet many, many programs are vulnerable because they do something like malloc(nitems * size) and carry on thinking they must have gotten the amount of memory they asked for.

Most of the time, you really have to do the right checks whether your arithmetic is signed or not. Whether wraparound is defined or not.
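A sketch of the kind of check that's needed either way (checked_alloc is a hypothetical helper):

    #include <stdint.h>
    #include <stdlib.h>

    /* Refuse, rather than wrap: unsigned overflow is well-defined,
       but the wrapped product would silently under-allocate. */
    void *checked_alloc(size_t nitems, size_t size)
    {
        if (size != 0 && nitems > SIZE_MAX / size)
            return NULL;  /* nitems * size would wrap */
        return malloc(nitems * size);
    }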


With two's complement wraparound as defined behaviour, you have easy ways of checking for wraparound: compare the signs of the inputs against the sign of the output.
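Concretely, assuming wraparound is defined (as with gcc/clang -fwrapv), the post-hoc check is short: addition overflowed exactly when both operands share a sign and the result's sign differs.

    /* Requires defined signed wraparound (e.g. -fwrapv); otherwise the
       overflowing a + b is itself undefined behaviour. */
    int add_overflowed(int a, int b)
    {
        int sum = a + b;
        return (a < 0) == (b < 0) && (sum < 0) != (a < 0);
    }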

In any case, I'm bored of discussing this with people who haven't studied the problem.



