
I don't really see what's to be accomplished by most of the points of this. A program that invokes undefined behaviour isn't just invalid; it's almost certainly _wrong_. Shifting common mistakes from undefined behaviour to unspecified behaviour just makes such programs less likely to blow up spectacularly. That doesn't make them correct; it makes it harder to notice that they're incorrect.

Granted, not everything listed stops at unspecified behaviour. I'm not convinced that that's a good thing, though. Even something like giving signed integer overflow unsigned semantics is pretty much useless. Sure, you can then rely on two's complement representation, but that doesn't change the fact that you can't represent the number six billion in a thirty-two-bit integer, and it doesn't make 32-bit code that happens to depend on the arithmetic properties of the number six billion correct just because the result of multiplying two billion by three is well-defined.
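To make that concrete, here's a minimal sketch (using uint32_t so the example itself stays well-defined) of what wraparound semantics would actually produce for that multiplication:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        /* Simulate the proposed "unsigned semantics" for signed overflow:
           the product wraps modulo 2^32 instead of being six billion. */
        uint32_t two_billion = 2000000000u;
        uint32_t product = two_billion * 3u;
        printf("%" PRIu32 "\n", product);   /* prints 1705032704 */
        return 0;
    }

The result is consistent and well-defined, but it's still not the number the surrounding code was presumably written to expect.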

Then there's portability. Unaligned access is a good example of this. Sure, you can access an "int" aligned however you like on x86. It'll be slow, but it'll work. On MIPS, though? Well, the compiler could generate code to automate the scatter-gather process of accessing unaligned memory locations. This is C, though. It's supposed to be close to the metal; it's supposed to force-- I mean, let you do everything yourself, without hiding complexity from the programmer. How far should the language semantics be stretched to compensate for programmers' implicit, flawed mental model of the machine, and at what point do we realize that we already have much better tools for that level of abstraction?
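For what it's worth, the usual portable idiom for an unaligned read is an explicit memcpy into a local; a decent compiler lowers it to a single load on x86 and to the byte-wise sequence on strict-alignment targets:

    #include <stdint.h>
    #include <string.h>

    /* Read a 32-bit value from a possibly unaligned address without
       undefined behaviour; the copy is typically optimized away. */
    uint32_t load_u32(const void *p)
    {
        uint32_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }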




I have a feeling you haven't read through all of the linked posts and papers.

The problem they're trying to address is that C compilers take advantage of undefined behavior for optimizations. Such optimizations can cause very strange, unintuitive behavior that is very difficult to discover. The linked posts, papers, and even this thread provide many great examples.

You're right that the programs are wrong. The goal of this proposal is to make them wrong in reasonable ways.


>The problem they're trying to address is that C compilers take advantage of undefined behavior for optimizations

That's not a problem. That's a good thing.

>Such optimizations can cause very strange, unintuitive behavior that is very difficult to discover

That's a problem -- or at least the "difficult to discover" part is. "Strange" and "unintuitive" are helpful; they're a nice, big red flag. How does migrating from undefined behaviour to producing unspecified values make bugs easier to discover? I can see how it would make results more consistent, and the bugs easier to hunt down, but that's only useful once you know the bugs are there (inconsistency is another useful red flag here), and there are already good tools like valgrind and ubsan for tracking down the source of a bug.
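For reference, assuming a reasonably recent gcc or clang, turning on ubsan is one line:

    cc -g -fsanitize=undefined -fno-sanitize-recover=undefined prog.c

With -fno-sanitize-recover the program aborts at the first detected instance of undefined behaviour instead of limping on.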

>The goal of this proposal is to make them wrong in reasonable ways

That isn't the purview of C, though. It's a noble goal, don't get me wrong, but stepping on the optimizer's toes and reinforcing plainly bad programming practices -- I know that's not an intended effect, but it will happen -- isn't the way to do it. A better way, just as an example, would be to give the programmer a proper mechanism for encoding preconditions and other interprocedurally analysable constraints. That wouldn't reinforce bad practices, it could actually help the optimizer if done right, and it might encourage programmers to reason a little more rigorously about their code -- an ounce of prevention and all that.
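As a rough sketch of what such a mechanism could look like with today's compilers (the ASSUME name is hypothetical; __builtin_unreachable is a GCC/Clang extension):

    #include <assert.h>

    /* Check the precondition in debug builds; promise it to the
       optimizer in release builds. */
    #ifdef NDEBUG
    #define ASSUME(cond) ((cond) ? (void)0 : __builtin_unreachable())
    #else
    #define ASSUME(cond) assert(cond)
    #endif

    int average(int a, int b)
    {
        ASSUME(a >= 0 && b >= 0);  /* caller-established precondition */
        return (a + b) / 2;        /* division can become a plain shift */
    }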


Please read the linked posts and papers for answers to your questions.


Is this program "almost certainly wrong"?

    uint32_t bytesToUnsigned(uint8_t bytes[4])
    {
      return bytes[0] |
             bytes[1] << 8 |
             bytes[2] << 16 |
             bytes[3] << 24;
    }
The behavior is undefined on a system with 32-bit ints because of signed arithmetic overflow: despite the fact that all the explicit types involved are unsigned, a uint8_t gets promoted to a signed int before the left-shift, so bytes[3] << 24 can shift a one into the sign bit.

Right now it works on every compiler I've tried, but it would be perfectly valid (by the ANSI specification) for a compiler to assume that the result of that function can never have the highest bit set. In Friendly C, the result is well defined.
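For comparison, a version with explicit casts is well-defined even under the current standard, since a uint32_t operand is never promoted down to int:

    uint32_t bytesToUnsigned(uint8_t bytes[4])
    {
        return (uint32_t)bytes[0] |
               (uint32_t)bytes[1] << 8 |
               (uint32_t)bytes[2] << 16 |
               (uint32_t)bytes[3] << 24;
    }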


>Is this program "almost certainly wrong"

No, that one's a consequence of C's insane type system. The solution here isn't to change the semantics of signed integer arithmetic. The solution is to change integer promotion to use unsigned arithmetic like it should have done in the first place.


Not taking a position on this, and it's been a long while, but I seem to remember that the discussions in X3J11 on the issue of integer promotions, which mostly occurred before I joined, were long and heated.


Promoting to unsigned will not help, because promotion may not reach a type wide enough to cover the shift. (In practice it will, but unsigned int could be as narrow as 16 bits.)

Promotion of unsigned chars to unsigned int would have problems of its own, mostly because unsigned (that is, modulo-power-of-two) arithmetic is inappropriate for most uses and error-prone: it has a large, silent discontinuity right next to zero.

Alas, unsigned chars can in fact promote to unsigned int on rare platforms like DSPs where sizeof (int) == 1. Sigh.


There's a simple fix to that:

    -Wsign-conversion -Werror
You've effectively just made C into a safer language by disallowing implicit conversions that change signedness.
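As a small illustration (hypothetical snippet) of what those flags catch:

    #include <stddef.h>

    size_t shrink(size_t count)
    {
        int delta = -1;
        /* With -Wsign-conversion -Werror, the implicit int -> size_t
           conversion below is a hard error instead of a silent
           wraparound toward SIZE_MAX. */
        return count + delta;
    }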


Friendly C should also require char to be unsigned so that this code is safe:

    int my_getchar(char *foo)
    {
        if (!*foo) return -1;  // Return -1 at end of string
        else return *foo;      // Should never return -1
    }


More to the point, char should be unsigned so that this code is safe:

   #include <ctype.h>

   if (isdigit(str[i])) { ... }
The ctype functions require an int argument whose value is either the value of an unsigned char (hence nonnegative) or the negative value EOF; passing any other negative value is undefined.

This is an unfriendly pitfall in the language.
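The usual workaround today is to cast through unsigned char yourself; a small self-contained sketch:

    #include <ctype.h>
    #include <stddef.h>

    /* Count decimal digits without the ctype pitfall: the cast keeps
       a negative char value from ever reaching isdigit(). */
    size_t count_digits(const char *str)
    {
        size_t n = 0;
        for (size_t i = 0; str[i] != '\0'; i++)
            if (isdigit((unsigned char)str[i]))
                n++;
        return n;
    }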


From the examples I'm familiar with, shifting from undefined to unspecified actually makes invalid programs _more_ likely to blow up spectacularly, because they're likely to go on and try to use the unspecified value rather than having the code path that uses it quietly excised or transformed.


You aren't aware of just how many security-critical C programs invoke potentially undefined behaviour today.

Every time you write '+' in a C program between two signed values, do you do an overflow check beforehand? Do you know what it takes to write that overflow check? Or do you do a global dataflow analysis of the program to prove that it can't overflow?
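For reference, this is roughly what a correct pre-check looks like; note that it has to avoid performing the addition itself, because the overflowing addition would already be undefined:

    #include <limits.h>

    /* Return 1 if a + b would overflow int, without computing a + b. */
    int add_would_overflow(int a, int b)
    {
        if (b > 0 && a > INT_MAX - b) return 1;
        if (b < 0 && a < INT_MIN - b) return 1;
        return 0;
    }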

Saying that 95% (at a conservative guess, IMO) of C programs out there are wrong makes for a strong argument for this proposal, not against it. Arguing for a technical correctness that is observably almost impossible to achieve in the wild is what's really pointless.


> You aren't aware of just how many security-critical C programs invoke potentially undefined behaviour today.

It doesn't take much experience with C to appreciate the sheer difficulty of keeping a million lines of C UB-free. I'm not in the security field, but if the code there is anything like the codebases I've worked with, I'd be surprised to find even a "security-critical" C program of any appreciable complexity that doesn't invoke undefined behaviour.

>Every time you write '+' in a C program between two signed values, do you do an overflow check beforehand?

Personally? I use signed scalar arithmetic very rarely. The bulk of the arithmetic I do is with unsigned integers (bit vectors, really) and arbitrary-precision integers. In fact, essentially the only time I do use signed scalar arithmetic is when I can determine statically that it won't overflow. Signed integer overflow is a bitch, UB optimizations or no.

>Or do you do a global dataflow analysis of the program to prove that it can't overflow?

That's the basic idea behind seL4, for example.

>Arguing for a technical correctness that is observably almost impossible to achieve in the wild is what's really pointless.

Sure, but only because everyone writes their security-critical programs in a language that couldn't more effectively impede security if it were designed to do so. Could a "friendly" dialect help? Maybe, but the proposed changes are far from sufficient to make formal verification feasible, and frankly, I don't know how seriously I can take "security-critical" without formal verification.


> Every time you write '+' in a C program between two signed values, do you do an overflow check beforehand? Do you know what it takes to write that overflow check?

But UB isn't the problem here. You can make two's complement wraparound the defined behavior, and programs that fail to check their arithmetic are still very likely to be wrong.

This is already the case with unsigned overflow: it's defined to wrap around. Yet many, many programs are vulnerable because they do something like malloc(nitems * size) and carry on thinking they must have gotten the amount of memory they asked for.

Most of the time, you really have to do the right checks whether your arithmetic is signed or not. Whether wraparound is defined or not.
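A sketch of the kind of check that's needed either way (checked_alloc is a hypothetical helper):

    #include <stdint.h>
    #include <stdlib.h>

    /* Refuse, rather than wrap: unsigned overflow is well-defined,
       but the wrapped product would silently under-allocate. */
    void *checked_alloc(size_t nitems, size_t size)
    {
        if (size != 0 && nitems > SIZE_MAX / size)
            return NULL;  /* nitems * size would wrap */
        return malloc(nitems * size);
    }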


With two's complement wraparound as defined behaviour, you have easy ways of checking for wraparound: compare the signs of the inputs against the sign of the output.
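Concretely, assuming wraparound is defined (as with gcc/clang -fwrapv), the post-hoc check is short: addition overflowed exactly when both operands share a sign and the result's sign differs.

    /* Requires defined signed wraparound (e.g. -fwrapv); otherwise the
       overflowing a + b is itself undefined behaviour. */
    int add_overflowed(int a, int b)
    {
        int sum = a + b;
        return (a < 0) == (b < 0) && (sum < 0) != (a < 0);
    }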

In any case, I'm bored of discussing this with people who haven't studied the problem.



