Depressing and faintly terrifying days for the C standard [pdf] (yodaiken.com)
209 points by signa11 on May 30, 2018 | 270 comments



At Microsoft the compiler team (Visual C++) and the Windows team are joined at the hip. I'm sure the same was true at Sun. This can lead to good decisions about undefined behavior that I hope would make Linus smile.

I recently learned of one such good engineering decision (I hope I'm remembering it correctly). Let's say you have a struct with an int32 and a byte in it. That's 5 bytes, right? But the platform alignment is a multiple of 4 bytes, so there's 3 bytes of padding (sizeof the struct is 8 bytes). If we stack-allocate an array of 11 of these and zero-initialize with = { 0 }, what would you expect to see in memory after initialization?

It turns out the answer was that the first element of the array would have its 5 bytes zeroed, but the 3 bytes of padding would be left uninitialized. Then, the remaining 10 elements of the array would be zeroed with a memset that actually zeroed all 80 remaining bytes. It sounds weird but this is a legal thing to do from the standard's perspective. All they're obligated to zero out are the non-padding bytes. This UB was leading to disclosure of little bits of kernel memory back into user mode because Windows engineers assumed that = { 0 } was the same as leaving the variable uninitialized and then memsetting the whole thing to zero. Nope!
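
To make that concrete, here's a rough sketch of the kind of code involved (struct and field names are made up, not the actual Windows code):

  #include <stdint.h>

  struct item {
      int32_t value;   /* 4 bytes */
      uint8_t flag;    /* 1 byte, followed by 3 bytes of padding to reach 8 */
  };

  void example(void) {
      struct item items[11] = { 0 };
      /* Every named member of every element is zeroed, but the standard
         does not require the 3 padding bytes per element to be zeroed,
         so stale stack contents can survive in them. */
  }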

The compiler team fixed this by always zeroing out padding too. Problem solved. There are some cases where it's not quite as fast. But it's the right engineering decision by the compiler team for their customers, both internal and external.


Minor nitpick, the zeroing (or lack thereof) of the padding is not undefined behavior, it's unspecified behavior. Undefined behavior and unspecified behavior often look and perhaps behave the same to the programmer, but have semantic differences. In the face of undefined behavior, the compiler is allowed to do pretty much anything it wants (including formatting your hard drive and/or launching the nukes). With unspecified behavior, the compiler implementer must make a conscious decision on what the behavior will be and document the behavior it will follow.


> With unspecified behavior, the compiler implementer must make a conscious decision on what the behavior will be and document the behavior it will follow.

No, what you described is implementation-defined behavior.

It may be confusing, but here's the breakdown of different kinds of behavior in the C standard:

* Well-defined: there is a set of semantics that is defined by the C abstract machine that every implementation must (appear to) execute exactly. Example: the result of a[b].

* Implementation-defined: the compiler has a choice of what it may implement for semantics, and it must document the choice it makes. Example: the size (in bits and chars) of 'int', the signedness of 'char'.

* Unspecified: the compiler has a choice of what it may implement for semantics, but the compiler is not required to document the choice, nor is it required to make the same choice in all circumstances. Example: the order of evaluation of a + b.

* Undefined: the compiler is not required to maintain any observable semantics of a program that executes undefined behavior (key point: undefined behavior is a dynamic property related to an execution trace, not a static property of the source code). Example: dereferencing a null pointer.
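
To make the distinctions concrete, a tiny sketch (f and g are made-up functions):

  #include <limits.h>

  int f(void);
  int g(void);

  void categories(void) {
      int a[4] = { 1, 2, 3, 4 };
      int b = 1;
      int ok    = a[b];       /* well-defined: always 2                              */
      int bits  = CHAR_BIT;   /* implementation-defined: documented by each compiler */
      int order = f() + g();  /* unspecified: f and g may run in either order        */
      int *p    = 0;
      int boom  = *p;         /* undefined: null dereference, no requirements at all */
      (void)ok; (void)bits; (void)order; (void)boom;
  }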


Nice comment! Here are the excerpts from n1570.pdf[1] with some punctuation added by me to compensate for the limited formatting support on this forum:

§3.4.0: behavior: external appearance or action

§3.4.1: implementation-defined behavior: unspecified behavior where each implementation documents how the choice is made. EXAMPLE: An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.

§3.4.2: locale-specific behavior: behavior that depends on local conventions of nationality, culture, and language that each implementation documents. EXAMPLE: An example of locale-specific behavior is whether the islower function returns true for characters other than the 26 lowercase Latin letters.

§3.4.3: undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements. NOTE: Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). EXAMPLE: An example of undefined behavior is the behavior on integer overflow.

§3.4.4: unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance. EXAMPLE: An example of unspecified behavior is the order in which the arguments to a function are evaluated.

[1]: WG14 working paper for the C11 standard: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf


Thank you kind sir, you win the prize for most coherent and informative technical snippet in the world. On this day anyway.


Accessing uninitialised memory is undefined. As in Nasal Demon undefined. If you don't zero out the padding, I'm sure there's a clever way to access those bytes in ways that invoke undefined behaviour.


The padding has unspecified values, which is distinct from being uninitialised.

If it were otherwise, you couldn't `memcpy()` structures around.


I believe there's an exception for the likes of `memcpy()`. Something along the lines of "type punning and reading indeterminate values is undefined except when we're reading through a `char*`", or something.

I'll check the unbelievably thick book that tries to specify C11 (I've printed it, it's over 2 pounds).


The only way to access the padding without otherwise falling into undefined behaviour is by using a char * anyway (including indirectly using a char *, through memcpy()).


You say that as if it’s entirely the responsibility of the programmer to avoid these bear traps that have been left lying around everywhere.

Why not just have the compiler zero the memory, and thereby remove the trap? Seems very sensible to me. Do you think it’s a bad idea, and if so, why?


There is a concern for performance. But that's no reason. Zero initing could be default behavior that can be declared away. E.g.

As a type qualifier keyword:

  { int x, y; /* x and y are zero */ }

  { int noinit x, y; /* x is indeterminate, y is zero */ }
Or as a declaration specifier:

  { noinit x, y; /* both x, y indeterminately-valued */ }
Or a special constant for suppressing zero initialization:

  { int x, y = noinit; /* x zero, y indeterminate */ }
Similarly, unspecified order of evaluation could be supported by explicit request:

  decl (unspec_order) { /* comma-separated list of decl items */
     a[i] = i++; /* UB */
  }

  a[i] = i++; /* well-defined */


Zero initing is already the default behavior for objects with static storage duration (and for members left out of a struct initializer).

    static int x, y; /* x and y are zero */


Good idea!


> You say that as if it’s entirely the responsibility of the programmer to avoid these bear traps that have been left lying around everywhere.

Oh no no no, I was picking on hermitdev's characterisation of the behaviour. Sure, what the compiler does with the padding bytes is unspecified. But it can still lead to undefined behaviour, if some unwary user ever reads them. A misinterpretation like that, and poof you get a security vulnerability. The C standard is insane.


Aha, sorry, I misunderstood! Glad to hear you’re on the side of sanity. :)


It's more than a logical or performance issue, it's a cultural one. In C culture, there's a sense that the programmer has direct control over the hardware. They can literally write or read to any memory address as they see fit, and have very fine-grained control over what the machine is doing.

Modern processors, with their out-of-order execution, complex caching algorithms, deep pipelines and multi-threaded hardware often render this sense of control more illusory than factual, but the C culture remains wedded to the idea that the programmer is in control.

Accordingly, it's pretty provocative to suggest that a compiler or runtime would zero out memory without you specifically saying so. Any C programmer can swap malloc (which doesn't zero memory) for calloc (which does). Whether it's a good or bad idea to do so is up to the programmer. The overall idea is: don't do anything unless I say so.


Sane compilers should do that. The standard should eventually specify it. But until it does, you cannot write portable code that expects it (though hopefully, once enough compilers are sane but before the standard is updated, you can write code that targets only those compilers and not give a fuck about the other broken garbage that tries to trap the world).


if you dereference a null pointer then this behavior is 'undefined' from the perspective of the C compiler - however it is very well defined by the operating system (if you are writing a user-space application).

No nukes get involved here - only core dumps.


If you are working on top of an operating system that launches nukes upon null pointer access, then you should consider to switch vendors.


> and document the behavior it will follow.

So it's unspecified in terms of the standard, but specified by the implementation.


Right! Basically it's up to the compiler programmer to pick a path and follow it... assuming you're talking about implementation-defined.


The problem with this kind of approach is over time it removes the ability to use any other implementation. You are no longer using C, you are using <Implementation>C.

This becomes a problem when a different implementation adds amazing tools for finding bugs (the sanitizer suite, for instance), and you can't use them because your code doesn't build with any other implementation.


Large and complex pieces of software, and operating systems in particular, tend to be tightly tied to their compilers. It is never easy and in some cases practically impossible to port to a different compiler. I expect that Microsoft has come to terms with the fact that Windows will only support being compiled by their compiler.

When a different toolchain introduces a new feature for finding bugs that would be useful for Windows, the Microsoft compiler team can add that functionality to their own tools instead of porting Windows. An advantage of this is that they can customize the feature for exactly their use case. Yes, this is the definition of NIH syndrome, but that’s how large companies work.


Large pieces of software have been able to switch some platform-specific code to other compilers (Chrome for Windows comes to mind).

This is probably way smaller than the whole of Windows, but I would not be surprised if some MS devs are already internally compiling some of their components with clang for their own dev/testing (even if just for extra warnings, etc.)

And a major part of the work of the MSVC team today seems to be about standard compliance.

But yes, I do not really expect them to switch, and actually they probably don't even have the beginning of a serious reason to do so. This is not even a case of NIH. Their compiler derives from an ancient codebase and has been continuously maintained for several decades. They "invented" it. The only modern serious competition (that cares enough about Windows compat and some of their specific techs) was started way later... They probably also have all kinds of patents and whatnot on some security mitigations that are implemented by collaboration between the generated code and (the most modern versions of) low-level parts of the Windows platform.


> You are no longer using C, you are using <Implementation>C.

If you go deep and gnarly enough with your system, this is always the case, I'm afraid. There's a famous talk at Stanford by a Coverity... founder? consultant? about this.


> You are no longer using C, you are using <Implementation>C.

This is always the case because the C standard is only a partial spec. There's no such thing as "a program written in C(++)", there's only "a program written in <Implementation> C(++)". If you compile your program with a different implementation, then it's a different program. It may work, but it may not.

What I'd really like to see is a modified version of the C/C++ standards which keeps the language the same "in spirit" but removes all undefined and implementation-dependent behaviour. This would give compiler writers a stationary (or at least slower-moving) target to aim for and make it possible for C to be portable in theory as well as (sorta) in practice.


However in this case 'Standard C' is broken and <Implementation C> is not. The proper thing is to fix the standard to require padding bytes be zero'd.


It's easy to see both sides here. The overwhelming majority of code is not written to run in a kernel at the kernel-user security boundary, so for most code paying something to initialize padding might not be a great tradeoff. That is, the vast majority of code is running within a single security domain and doesn't need to protect itself from itself.

Still, avoiding initializing padding is probably not a great example of a performance win through standards exploitation: in the example given it's not clear why you'd not just do one 8-byte zeroing write to cover the whole structure, rather than apparently splitting it into a 4-byte and a 1-byte write. Perhaps this was 32-bit code, where 8-byte writes are slightly trickier, but even two 4-byte writes are likely to perform just as well or better. Probably it's just bad codegen to treat the initialization of the struct[11] in two parts: a single struct and a 10-member array.


Luckily clang does a good job at being GCC compatible, so I don't worry too much about using GCC extensions. It's quite unlikely one of them will go away anytime soon, and they pretty much cover all architectures/platforms that have ever existed.


Usually true, unless you want to target embedded, mainframes or some industrial OSes.


It doesn't support everything; for example, Clang doesn't do nested functions.


This is pretty much what writing C code across K&R, ANSI C89 and subsets (e.g. Small-C) used to look like.


I fail to see how implementing an unspecified part of the standard in a way which _doesn't_ leak kernel memory could ever be a problem.


It's not a problem for the compiler, of course.

The problem is the C code that relies on it: effectively you are using a dialect of C which gives stronger guarantees, so you lose the ability to use any other implementation which doesn't provide those guarantees.


Yeah, I (somehow...) missed the part where the GP explained that the kernel dev's fix was to just rely on the now-updated unspecified behavior. I assumed they changed the compiler, but also memset the structure before sending.


>The problem is the C code that relies on it: effectively you are using a dialect of C which gives stronger guarantees, so you lose the ability to use any other implementation which doesn't provide those guarantees.

How do you lose it in this case? You shouldn't be reading values from those padding bytes anyway...


Tell that to a hacker. The problem here isn't that the code is functionally dependent on padding bytes, it's that when you copy those padding bytes around you are leaking information that you probably never meant to copy.

This can be problematic if you are copying kernel space memory to a user space process, for example. Let's say there's a call into the kernel that returns a copy of this 4+1 struct with three more bytes of padding. Maybe what was on the stack before the space was assigned to those last padding bytes are some information the kernel definitely shouldn't leak to user space, like some bytes of a password, and now any user space process could potentially read them simply by calling some unrelated kernel function.


It’s great that the compiler team could implement safer behaviour. But if the programmers’ intent was to zero all bits in the array they should express this explicitly in the code with a memset(). Otherwise a change in the compiler later could throw up this vulnerability again. The code should express the semantics as clearly as possible.


Then other language lawyers will come around and tell you why you should use {0} instead of memset (e.g. because for some combinations of type and architecture the zero value isn't full of zero bytes).

This example also shows how "the semantics" is a fiddly concept. The reason the standard allows leaving bytes unzeroed is because they are not "semantically" important. But they actually do matter.

The problem with the mentality that it's always the programmer's fault for not following "the rules" is that you eventually get to the point where the rules allow for no good solutions at all.


I didn't say it was always the programmers' fault. But I believe the programmer should express their intent as clearly as possible. And memset( (void*)buf, 0, buflen ) says fill-a-contiguous-array-of-unsigned-chars-of-value-zero, which is semantically different from initializing an array of structs that may have padding, and better matches the programmers' intent. It doesn't matter if the zero value is all 0 bits or not - the important thing is that the whole contiguous memory region is zeroed.

I believe C99 says chars and unsigned chars have no padding.

https://stackoverflow.com/questions/13929462/can-the-unsigne...


Another, better, response to your argument. Initializing an array of structs with = {0} does NOT tell the compiler that zeroing the entire contiguous chunk of memory matters - only that each struct should be zeroed. While memset( (void*)buf, 0, len) does tell the compiler that the entire contiguous memory chunk must be zeroed, which is what is intended.
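
A quick sketch of the difference (struct layout borrowed from the example upthread):

  #include <string.h>

  struct item { int value; char flag; };   /* 3 padding bytes per element */

  void demo(void) {
      struct item buf1[11] = { 0 };        /* members zeroed; padding bytes unspecified */
      struct item buf2[11];
      memset(buf2, 0, sizeof buf2);        /* every byte of the region zeroed, padding included */
      (void)buf1;
  }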

Language lawyers need not apply.


It may be the right thing for the moment, but what happens to that code when new versions of the compiler come out? Someday it'll start leaking information again. Relying on non-standard behavior always ends in tears.


>Relying on non-standard behavior always ends in tears.

Who's relying on unspecified behavior? Do you mean some theoretical programmer? Well of course that's not a good thing, but I don't see what it has to do with the situation in the story and the MS engineers deciding to change how they implement that unspecified part of the spec.


The kernel programmer did, in this story.

An example of such an information leak is:

  struct foo {
    int a;
    char b;
  };

  void send_foo_01(void) {
    struct foo x = {0, 1};
    write(fd, &x, sizeof(struct foo));
  }
which sends 3 bytes of the contents of the stack memory over the network, for almost every compiler except the one in the story. Running in a kernel, that could contain secrets.


Sorry, my mistake; yes, the kernel dev is now relying on the compiler to zero out those three bytes. I understand the decision to change this in the compiler, but I think the "fix" would have been to memset in the kernel code. I'd be surprised if they didn't do that, but maybe they can reasonably assume they'll never use another compiler to build the Windows kernel.


Signed integer overflow. It’s undefined because some platforms don’t use two’s complement.

Checking it after the fact is non-portable.
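
A rough sketch of what that means in practice (the after-the-fact check is the one a compiler is allowed to delete):

  #include <limits.h>
  #include <stdbool.h>

  /* Non-portable: if a + b overflows, the addition itself is already UB,
     so the compiler may remove this "check after the fact" entirely. */
  bool overflows_after(int a, int b) {
      return a + b < a;   /* assumes wraparound, which is not guaranteed */
  }

  /* Portable: compare against the limits before doing the addition. */
  bool overflows_before(int a, int b) {
      return (b > 0) ? (a > INT_MAX - b) : (a < INT_MIN - b);
  }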


But not in sane compilers, they are better than the standard.

gcc, clang, icc all assume two's complement and happily overflow via -fwrapv. There's no such compiler on Unisys anymore; they rather emulate two's complement. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm


Thankfully, not for very much longer...

http://wg21.link/P0907r2

C++, but apparently something similar is being considered for C as well.

https://twitter.com/jfbastien/status/989242576598327296?lang...


Isn't that mostly theoretical? I can't seem to find any processor that doesn't use 2's complement.


There are some old architectures that used one's complement and there are probably some still in service, e.g.

https://en.m.wikipedia.org/wiki/UNIVAC_1100/2200_series


I had the pleasure of programming on these. Having both positive and negative zeros is special and that they don't compare equal is extra special. And yet -- ones complement arithmetic is not even the oddest thing about Univac 1100s. I think the Univac corporate values statement must have included both "Dare to be different" and "Remember your past".


Is anybody writing code for them? Probably not porting modern C anyway.

I assert this 'it might not be two's-complement' excuse is nonsense.


I seriously doubt it.

I do agree with you, it's not a big deal for most devs. The only place I'd worry about it is systems where it might be expected to be in service for many years or intended to be ported frequently. In that case, you never know what sort of crazy architecture you may end up with in the future.

There's always something a bit icky with not sticking to the spec, but in the grand scheme of things relying on two's complement is not a big deal.


from the proposal http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm

"Nowadays Unisys emulates their old architecture using x86 CPUs with attached FPGAs for customers who have legacy applications which they’ve been unable to migrate. These applications are unlikely to be well served by modern C++, signed integers are the least of their problem. Post-modern C++ should focus on serving its existing users well, and incoming users should be blissfully unaware of integer esoterica."


Presumably the compiler team wrote a test for that behavior so there won’t be a regression. I assume that Windows isn’t planning on switching compilers, so they can rely on that behavior as long as the Microsoft compiler passes all its tests.


Well, the MSVC team is the last one who can propose something, since they don't even support C11 completely today, and weren't able to add C99 support for almost two decades.


They still don't support C99 fully.


They do, to the extent required by ANSI C++14; ANSI C++17 upgraded the requirements to C11.

They are pretty open that the future of systems programming on Windows is C++ and .NET Native.

For C devs, they have helped clang to work on Windows and there is WSL for UNIX like software.

Just like they have improved Visual Studio integration with clang and gcc.

As for Visual C++, the name says it all.


Now that Microsoft has a new-found love for C again, whether via Checked C research, the UDMF rewrite in C, WSL, or the upcoming Azure Sphere product, it would be interesting if they could contribute to improving the whole security story in C2X.


I don't think they are really that much in love with C aside from a few niche areas. The compiler itself still generates worse code when in C mode as it only exists for back-compat. When I worked there it was explicitly said that the new "universal" (or whatever they are called now) APIs were explicitly not targeting C. C++ on the other hand seemed to be undergoing a new renaissance. They have added some newer C features but only those required by the C++ standard.


Quite true, when speaking about Visual C++ and I have also stated the same thing multiple times here.

Their C love on the projects I mentioned, is via clang and gcc.


> Problem solved.

This "solution" also prevents code analysis tools from detecting reads from padding bytes as "non-initialized memory reads".

When a routine leaves sensitive data on the stack, the entire area used up by that sensitive data must be wiped before the routine returns (and even then, wiping the padding bytes would not be required). How about the part of the stack space not covered by that array of 11 elements?


The kernel has a separate stack, inaccessible to user-space.

Otherwise, you'd be right: a shared stack would be a giant source of information leakage from kernel space to user-space unless it was very carefully managed (probably at a significant performance cost). Hence separate stacks (which also have the advantage of not needing to make assumptions about how user-mode programs use their stack, e.g., if they are transiently using "unallocated" stack above rsp, etc.).

Probably what happened here is that this structure was copied back to user space (e.g., as the result of a system call) exposing the kernel data.


At Sun, Solaris and Studio were quite separate organizationally (and business-wise too). Mind you, the Solaris engineering group had early access to Studio and also, of course, access to the Studio team, so... not quite joined at the hip, but not too separated either.


In C11, the spec was changed to make zeroing out padding mandatory.


> This UB was leading to disclosure of little bits of kernel memory back into user mode because Windows engineers assumed that = { 0 } was the same as leaving the variable uninitialized and then memsetting the whole thing to zero

But what on earth were they doing with the padding bits?


They aren't doing anything with the padding bytes. But what probably happened was that they copied the array into userland memory, which potentially allows a malicious sandboxed program to read the padding bytes that contain bits of kernel memory.


Total guess, but they could have been including them in a hash function or struct equality check (equality with memcmp, hash by just grabbing bytes, etc.).

That would not have gone well with uninitialized padding :-)


It's not uncommon for a function to use a supplied output buffer as scratch space. So the padding could have contained pretty much anything.


I'm skeptical that it was implemented that way because of kernel memory leaking concerns.

More likely it was done to prevent Windows API calls from panicking when they accessed unset parameter structures.

As a former Windows programmer, that was one of the largest sources of errors back in the Win32 days.


But you don't access the padding; it's generally implicitly added by the compiler to comply with the ABI rules for the platform. That argument might make sense if this was about zeroing _all_ stack objects not explicitly initialised, but it's explicitly talking about the padding.


> This UB was leading to disclosure of little bits of kernel memory back into user mode

If you write inline assembler, you can access this stuff anyway. So I'm not seeing what the value is in zeroing it by the caller. The kernel callee should zero its stack frame before returning.


How so? The kernel presumably has a separate stack which is not accessible to user-space, but here information was disclosed because a structure copied back to user space was created on the stack, initialized to {0} and then member-wise assigned, with some padding bytes never being touched and thus containing whatever previous values happened to be on the kernel stack. So far this is all in kernel space, so nothing has been exposed yet.

Then, however, if this structure is copied back to user-space, e.g., as an output parameter of a kernel call, the padding bytes with the exposed data will be copied along with it (unless you get lucky and the copy routine happens to make the exact same decision with regard to padding handling).

If the kernel stack _itself_ was visible to user-space, you'd have a whole separate set of problems: you'd have to zero the whole stack (or at least the extent of the stack that could have been touched) on every kernel call.


Yes, you're right.


Like phkahler, I don't want C to grow any more. I've been a C programmer for a long time, the vast majority of my day-to-day work is still in a C codebase, and I expect to continue working in C for a while yet. Nonetheless, its time has passed. It will be around for a while, just like FORTRAN and COBOL are, but there's no good reason for new code to be written at that poor level of abstraction. Even for systems software - what I write - there are always better choices that provide higher-level data and control structures. They variously use garbage collection, reference counting, ownership rules, or whatever you call that hot mess C++ has. Writing safe, secure C code is certainly possible, but it's too much unnecessary work - especially in the concurrent and/or parallel world that any non-trivial code has to live in nowadays.

That said, I really wish proponents of other languages would get their stuff together about creating libraries that can be used from other languages. A library written in C can be used by anyone else. Many other languages are avid consumers of this functionality, many advertise it as a key feature, but very few return the favor by producing reusable code. The industry doesn't need such Balkanization. If you're one of the very many people who look down their noses at C and want to get rid of it, do your part.


> I really wish proponents of other languages

Your wish is my command! That's the purpose of D in betterC mode. You can draw an arbitrary line through your C project, and implement one side in DasBetterC, and it'll work just fine.

In fact, I've been considering reimplementing the C standard library for the Digital Mars C compiler in D!


Thank you, Walter. For everything.


welcs!


> That said, I really wish proponents of other languages would get their stuff together about creating libraries that can be used from other languages. A library written in C can be used by anyone else. Many other languages are avid consumers of this functionality, many advertise it as a key feature, but very few return the favor by producing reusable code.

I kind of agree with you on this. I have a bidirectional binding process in my hobbes project at Morgan Stanley (https://github.com/Morgan-Stanley/hobbes) such that C/C++ functions can be viewed as hobbes functions and hobbes functions can be viewed as C/C++ functions (without translation).

This works as far as C/C++ types are adequate for your programming language, which obviously does cover a huge space, but there are types that don't translate well and there are staging considerations that don't translate either (e.g. there is a logical type that can be decided, but it's not decided until what might be regarded as "run time").

There are interesting problems to be considered in this area though, happy to discuss further out of band if anyone is interested (discussions on hacker news tend to be short-lived and shallow IME).


>A library written in C can be used by anyone else.

This is because of how much of the unix clone ecosystem has been built around C workflows, and this wasn't true on Windows until linux compatibility was developed on it.

>If you're one of the very many people who look down their noses at C and want to get rid of it, do your part.

convincing linus and sysadmin greybeards to modernize linux is no small task, until then we'll all still just be scripting over archaic C apis

"science progresses one funeral at a time"


> This is because of how much of the unix clone ecosystem has been built around C workflows

Absolutely correct.

> convincing linus and sysadmin greybeards to modernize linux

That's not what I'm suggesting. Anything that's already written in C can and probably should continue to be so. What I'm suggesting is that people who prefer to work in other languages should have an easy way to make their work available beyond their own language community. I'm not talking about interpreted/scripting languages here. I'm talking about compiled/systems languages. Stuff that gets linked together, or that should be able to use some sort of dlopen/FFI back and forth fluidly despite multiple languages being involved. There's some work to be done there, but everyone seems to prefer hiding in their own language bunker instead of reaching out to others.


The problem is that other languages necessarily place more restrictions on how their data can be used in order to gain all their nice features. Since you can't control the caller you lose all those guarantees. Because of that it will never be easy to interop across higher level languages. C works well here because its low level enough that it expects few guarantees. Just keep the stack aligned and balanced and it will mostly be happy.


There are solutions, but only at platform level.

JVM, CLR, COM, UWP, DEX, TIMI, ILE are all possible approaches with various degrees of success and most of them also have C implementations available.

It is almost impossible to get some kind of universal ABI between OSes and languages without an extra level of indirection.


>>> This is because of how much of the unix clone ecosystem has been built around C workflows, and this wasn't true on Windows until linux compatibility was developed on it.

Windows is almost entirely coded in C. The Windows C API is one of the most depended upon API with the largest codebase and the longest documentation in existence.


Kind of.

After Longhorn's failure, they started to focus on COM to be the future of Windows APIs, leading to UWP.

And if masochists can implement COM in C, they need to be quite pain resistant for the UWP update.

In the meantime, the new C runtime was rewritten in C++, using extern "C" entry points.

Windows drivers can be written in C++ since Windows 8, and since then they have been migrating the code to be compilable as the C subset from C++ as well.


I meant the base Windows API. The MFC and COM were built to abstract it but never really took off.

The low level API have been fixed for a very very long time. It doesn't really need to change, opening a file is not different now than a decade ago. It gets no love and no popularity, but it runs the world.

Most development is done in higher languages nowadays, typically C#, java or python and all these languages rely on the Windows API to run. They are abstractions of C.

P.S. Drivers could already be written in C++ since Windows XP. There were a lot of constraints though because of running in the kernel.


> The MFC and COM were built to abstract it but never really took off.

MFC was the way to write real Windows applications until .NET came around.

Visual Basic was mostly used by not-so-skilled developers, creating applications that IT departments had to take care of, sometimes rewriting them into MFC ones.

Delphi and C++ Builder were the only solid alternatives outside the Microsoft world, but thanks to the way they drove prices upwards and the identity crisis at Borland, most developers went over to Microsoft products, as the platform vendor is always a safer bet for development tools.

As I mentioned, since Windows Vista all new API are COM, and UWP, which is the future of the platform is COM as well.

UWP is what .NET was supposed to be initially, COM+ Runtime, just that UWP uses .NET metadata instead of COM type libraries and allows for real instances, not only interfaces.

COM is a first class type in .NET given its original design, many of the .NET APIs are COM objects underneath, including the whole CLR native APIs.

Drivers were written in C++ on XP only by adventurous coders with their own compilers or hacks, as Visual C++ only supports kernel mode since Windows 8.

> It doesn't really need to change, opening a file is not different now than a decade ago

Actually it has changed.

OpenFile() has been deprecated and replaced by CreateFile(), which was superseded by CreateFile2().

And on Windows 10, CreateFileFromApp() and CreateFile2FromApp() should be used instead, otherwise the application won't run from the store.

And if you want to use any of the goodies from Windows 8 onwards, they are only available as UWP APIs.


> on Windows 10, CreateFileFromApp() and CreateFile2FromApp() should be used instead, otherwise the application won't run from the store.

So?

If you dislike the app store monopolies, it's stupid to herd your users into the store. Tell your users that your app is too sophisticated to run in dummy mode. Bonus: Sell to all the Win7 holdouts.

If you love app stores (monopoly rents be damned), you rank the store ecosystems and then never write a Windows version.


You missed the part that, given how things turned out, thanks to Project Centennial there is an ongoing process to bring store containers to the whole user space, MSIX.

So regardless of Win32 or UWP, everything will eventually be sandboxed.

Windows 7 is the new XP.


> Windows 7 is the new XP

Win7 is a focal point of some importance: It's where developers who don't want to be pushed into app store indentured servitude join up with the Windows compatible systems, Wine and ReactOS.

Everything before Win7 is not a focal point because it's too old. Everything after Win7 is not a focal point because nobody can handle the update treadmill. By elimination, it's Win7.

Win7 is not the new XP; it's the future of stability-oriented Windows compatible computing.


CreateFile() is not deprecated. https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...

OpenFile() could be considered deprecated but the official documentation doesn't mark it as deprecated.

You missed the point that these APIs have been stable for decades and software will continue to work without any change.


> CreateFile() is not deprecated. https://msdn.microsoft.com/en-us/library/windows/desktop/aa3....

"OpenFile() has been deprecated and replaced by CreateFile(), which was superseded by CreateFile2()."

I never said CreateFile() was deprecated. Superseded is not the same thing.

> OpenFile() could be considered deprecated but the official documentation doesn't mark it as deprecated.

"Note This function has limited capabilities and is not recommended. For new application development, use the CreateFile function."

Looks pretty much like a euphemism for deprecated.

> You missed the point that these API have been stable for decades and software will continue to work without any change.

Not when Windows puts a container around what they can see.


You don't need anybody's permission to do this. Linux has a language-agnostic system call interface. Everything that happens on the computer is accomplished through that interface, and to use it all you need to do is put some values in specific registers and issue a special instruction. This isn't the domain of C; the JIT compiler could have a special system call generation feature. You could get rid of GNU and write your own user space in Lisp if you wanted to.
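
For example, here's a rough sketch of calling write(2) directly on x86-64 Linux with no libc involved (GCC/Clang inline asm assumed):

  #include <stddef.h>

  static long raw_write(int fd, const void *buf, size_t len) {
      long ret;
      asm volatile ("syscall"
                    : "=a"(ret)
                    : "a"(1L /* __NR_write on x86-64 */),
                      "D"((long)fd), "S"(buf), "d"(len)
                    : "rcx", "r11", "memory");
      return ret;
  }

  int main(void) {
      raw_write(1, "hello from a raw syscall\n", 25);
      return 0;
  }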



> This is because of how much of the unix clone ecosystem has been built around C workflows, and this wasn't true on Windows until linux compatibility was developed on it.

Because C (and its close relatives like C++) is still the only language suitable for systems development.

Just booting any other language on bare metal is enough of an achievement that people loudly brag when they get it to just barely work. If you get it to work in C then nobody is surprised, because C is engineered for systems. Other languages aren't.

> convincing linus and sysadmin greybeards to modernize linux and is no small task, until then we'll all still be just be scripting over archaic C apis

I'm not sure replacing C with a worse language constitutes "modernizing".


If there were better options, you and others would use them. You don't use anything else so C is the best you've got.

Most of your last paragraph only amplifies what I just said. If there was another language that bested C, people would use it, but they don't.


I also think that the committee is out of touch. C99 was an awesome improvement on the language, and since then it went downhill. We don't need the extra weird C11 syntax things or the duplication of existing libraries; we want tools that scope C better, or extensions that have been proven helpful and stable (one such example is the switch(x) { case A...B: } range syntax from gcc!).

I want strict boundary checking, I want an array base type that can't be cast to a pointer. I want some sort of scoping mechanism (i.e., blocks), I want a bit of standardisation of memory barriers and such. I want #pragma once FFS -- it has been proven a good idea for 25 years.

Basically there's tons of stuff that could help make the language better -- C99 did that; C99 is a masterpiece, for example, in how you can statically initialise extremely complex data into a single block, without having to use code. It's used all over the Linux kernel (amongst other things; for example my own simavr is heavily based on that feature [0]).
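
For anyone who hasn't used it, a tiny made-up illustration of that designated-initializer style (not actual simavr code):

  struct opcode {
      const char *name;
      int cycles;
  };

  static const struct opcode opcodes[256] = {
      [0x00] = { .name = "nop", .cycles = 1 },
      [0x0c] = { .name = "add", .cycles = 1 },
      [0x95] = { .name = "ret", .cycles = 4 },
      /* every entry not listed is zero-initialized */
  };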

* Standardise the stupid bit order in bitfield declaration FFS. I've been wanting to use that feature for 30 years and I can't because they 'forgot' to make up their mind!

* Coroutine standardisation would be awesome (stack swap primitives, with boundary checks etc)....

* gcc 'sub functions' (or a derivative) would be awesome if improved to make them safe.

* Reference counted allocator (basically, get libtalloc and roll it in [1])

There are so many things that could be improved, without diverging into weird stuff nobody needs (complex math anyone??!?!).

* In fact I want SIMD. I don't need these complex types.

[0]: https://github.com/buserror/simavr/blob/master/simavr/cores/...

[1]: https://talloc.samba.org/talloc/doc/html/index.html


I can’t say much for most of your wish list, but I believe C11’s thread model/atomic operations include a standard memory barrier ( http://en.cppreference.com/w/c/atomic/atomic_thread_fence ).
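
For reference, a minimal sketch of those C11 fences in use (names are made up):

  #include <stdatomic.h>

  static int payload;
  static atomic_int ready;

  void producer(void) {
      payload = 42;                                      /* plain store        */
      atomic_thread_fence(memory_order_release);         /* standard barrier   */
      atomic_store_explicit(&ready, 1, memory_order_relaxed);
  }

  int consumer(void) {
      if (atomic_load_explicit(&ready, memory_order_relaxed)) {
          atomic_thread_fence(memory_order_acquire);     /* pairs with release */
          return payload;
      }
      return -1;
  }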


Honest question. You keep expecting all of these to be readily available for you in C (as part of the standard). Why don't you, instead, just use a language/ecosystem which already offers all (or most) of these for you today? (e.g. D/Go/Rust/Nim)

> "* Reference counted allocator (basically, get libtalloc and roll it in [1])"

Why and how should that be standardised exactly? Memory allocation is platform-dependent, hardware-dependent and generally case-specific. malloc() and free() are the lowest common denominators the standard can assume, anything beyond that is simply restrictive. If you need a "reference counted allocator", why not just find/implement one that simply suits your needs?

> "* Coroutine standardisation would be awesome (stack swap primitives, with boundary checks etc)...."

Again, what makes you think this can be standardised across the infinite span of platforms and compilation-targets, where C is often used?

> "* In fact I want SIMD. I don't need these complex types."

I'm not following your point. You're simply asking for a better abstraction for SIMD. Also, as I'm sure you're well aware, SIMD is not available everywhere. Wherever available, you have clear instruction-set APIs/ABIs you need to follow to make it work. What else is missing?


I don't see your point. There's tons of stuff in C11, for example, that is not applicable to the vast majority of places where C is used. Even in C99, basic stuff such as floating point or malloc is not available on much hardware; that doesn't stop there being a standard way of using them /when applicable/.

I know there are traps to fall into -- when I see people writing floating point code on an 8 bit AVR, I cringe, but well, 'it works'.

As far as changing language goes, you just answered your own question by mentioning 4 of the myriad of them that aren't ported to as many platforms as C, require runtimes of unknown quality, and also require a body of developers that... doesn't exist.

I've had a long enough time in the industry to have seen quite a few times a whole bunch of software done by someone who was following the fancy trendy language of the day, and required a complete rewrite in... C to be able to move on from it.

Heck, I've done similarly as well: done 20+ years of C++, gradually trying to scope down the subset of what I was using, to then realize I just might be better off with plain C -- and magic happened -- stuff still compiles/works years after it was made... And anyone/everyone can just dive in and use the codebase.


Looking at C2X list, I bet you aren't getting any of those wishes.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/PreBrno2018....

I am with you on strict boundary checking and array base types.


> weird stuff nobody needs (complex math anyone??!?!).

in my work (signal processing) I use complex math in C every day, and native language support for complex numbers is the single most important thing in my choice of C over other languages
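
For those who haven't touched it, a tiny sketch of what that native support looks like:

  #include <complex.h>
  #include <stdio.h>

  int main(void) {
      double complex w = cexp(I * 3.141592653589793 / 4.0);  /* e^(i*pi/4)                */
      double complex z = (1.0 + 2.0 * I) * w;                /* rotate 1+2i by 45 degrees */
      printf("%f %+fi\n", creal(z), cimag(z));
      return 0;
  }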

> I want an array base type that can't be cast as a pointer

just out of curiosity? why would you like that?

I mostly agree with the spirit of your comments (apart from neglecting the importance of complex math). I would add the following:

* better stack support (e.g. a standard way know whether a VLA fits)

* closures (being able to return a pointer to a local function)


> in my work (signal processing) I use complex math in C every day, and native language support for complex numbers is the single most important thing in my choice of C over other languages

Fair enough -- but I think you are in a fairly corner market -- personally I think I used complex math /twice/ over the last 30+ years... I used SIMD extensively tho, from Altivec onward!

>> I want an array base type that can't be cast as a pointer

> just out of curiosity? why would you like that?

For bounds checking. In C right now if you declare a char blob[4] and pass it as a parameter to anything, it's mostly passed down as a char * -- and that function can clobber whatever it likes. An array type would propagate the size down so bounds checking could be done for the whole lifetime of the array.
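
A quick sketch of the decay problem, and the closest thing C99 offers today, which still isn't enforced (the function names are made up):

  #include <stddef.h>

  void clobber(char *p);                     /* blob decays to char *, its size is lost     */

  void use(void) {
      char blob[4];
      clobber(blob);                         /* callee can write past 4 bytes, unchecked    */
  }

  void safer(size_t n, char buf[static n]);  /* C99: size is stated, but still not enforced */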

> * better stack support (e.g. a standard way know whether a VLA fits)

Yes, anything involving stack requires serious dirty hacking -- even for basic things as 'what are my high/low water marks'. Basically your stack is just a problem waiting to explode in your face, especially on smaller devices.

> * closures (being able to return a pointer to a local function)

That's mostly what I meant by sub-functions -- I used them quite a bit, until I had to crossport to llvm (they refused to implement them). Apple "blocks" look pretty good, but I know there are a couple alternative implementations...


> Fair enough -- but I think you are in a fairly corner market -- personally I think I used complex math /twice/ over the last 30+ years... I used SIMD extensively tho, from Altivec onward!

One of the best ways to implement SIMD support for C would be to add native quaternion and octonion support to the language!


For addition and subtraction, that sounds like one of the most evil-assembly-hack tricks I've heard for a high (-er than assembly) level language, but how would you do ... pretty much everything but addition and subtraction?


there are a lot of operators in the C language that could be overloaded as SIMD operations when acting over quaternionic and octonionic values. This would be useful for the most common operations. For the rest, there would be functions with well-thought-out names (not the current avx monstrosities), defined in #include <octonion.h>


I found the title a little misleading: there isn’t much about where the standard is heading or even where it’s been.

I would vote for the title “a rant on undefined behavior in C.”

——

Simple example: the article complains that unsigned integer overflow is defined in C while signed integer overflow is not. There is very little in the article about this except for the claim that the performance for incrementing a signed int should match the performance for incrementing an unsigned int. The writer refuses to believe otherwise, even though he accepts that undefined behavior “supposedly” allows the compiler to omit overflow checks.

It’s the “supposedly” that makes this a rant. The article’s sources mention that Clang does omit overflow checks and that the Clang team believes this makes loops up to 20% faster (“up to” because the optimization can’t be applied to all loops, and the performance increase will depend on how tight the loop is, i.e., how much overhead there is in incrementing and testing the loop variable in comparison to the loop body).


(Technically, the problem in that case is not really an overflow check, but more typically the need to extend an index from 32 to 64 bits, because 64-bit instruction sets do not support indexing with a 32-bit index.)

If you want perf in a critical tight loop that has been identified by some profiling, you can easily optimize it yourself (and yes, typically bumping the counter type to size_t / ptrdiff_t is enough). The advantage is that you can actually check that this transformation is sound according to the intent of the original code, whereas the compiler doesn't even try to check that itself; it merely blindly assumes that there is absolutely no UB ever, and to hell with it if there actually was.
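
A rough sketch of that manual fix (names are made up):

  #include <stddef.h>

  void scale(double *dst, const double *src, size_t n) {
      for (size_t i = 0; i < n; i++) {   /* pointer-sized counter: no per-iteration sign extension needed */
          dst[i] = 2.0 * src[i];
      }
  }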

But anyway, we have since invented sane languages in which we have BOTH safety and the capability to apply the kind of transformations that have a small but positive perf impact. In some cases it is actually far easier to optimize from those safe languages than from mostly unchecked ones whose culture is full of type punning and other kinds of insanity (like C is), and this is not even a recent discovery: there is a reason number crunching stuck to Fortran. So C should be considered something of a legacy programming language at this point. A very important one for historical reasons, but one should think twice before writing new critical infrastructure with it...


> the article complains that unsigned integer overflow is defined in C while signed integer overflow is not.

The article complains that signed integer overflow is undefined, while unsigned integer overflow is not. There's a difference.

> the Clang team believes this makes loops up to 20% faster

I'd love to see the benchmarks they conducted. Anyway, undefined behaviour was the wrong way to solve the problem. The C language should have grown a `for` loop that's more than a thin veneer of syntax sugar over `while`. Seriously, how hard would the following be?

  for (int i = start; end) {
      // body
  }
No explicit comparison, evaluate `start` and `end` at the start of the loop, maybe cast them to the type of i, and increment implicitly. While loops can handle the weird cases. There you have your 20%.


> > the article complains that unsigned integer overflow is defined in C while signed integer overflow is not.

> The article complains that signed integer overflow is undefined, while unsigned integer overflow is not. There's a difference.

You are correct.

> > the Clang team believes this makes loops up to 20% faster

> I'd love to see the benchmarks they conducted.

It wouldn’t surprise me if they have micro benchmarks. I convinced myself they were telling the truth based on crude instruction counting: a loop must be converted into something like:

- loop body (assume it contains at least one machine instruction)

- increment loop variable

- (for unsigned): clear overflow flag

- test loop variable, exit loop if appropriate

- goto beginning of loop

I believe they don’t have to actually check the overflow flag, it’s OK to let the overflow happen. But they do have to clear the flag to avoid a spurious error if the flag gets looked at later.

I’m no expert, so it’s possible this oversimplifies things, but removing the instruction to clear the flag does remove a big chunk of this loop. But it’s a big chunk only because it’s a tight loop.

For the record, I happily use the foreach loop constructs in C++, D, Java, C#, Python, Perl, etc. but I originally avoided them until I saw a comment by Walter Bright that there is no performance penalty in D (the compiler rewrites the loop appropriately; there may be a penalty in Java because the feature might be defined in terms of their relatively heavy iterators).


Microbenchmarks can be very misleading compared to real impact in real programs. Still, the gains allowed by UB of signed overflow (when you are lucky enough that the transformation is actually correct in the context of what the original programmer had in mind...) are positive and probably measurable even in real programs; or if hardly measurable, maybe they at least permit a few percent of whole-system perf improvement when using SMT processors. But they are more suited to other programming languages than C, and actually yes, in C++ (and probably in most languages at this point) it is better both for code readability (most important!) and performance (nice to have, but very secondary compared to code readability) to use for-each constructs rather than maintaining an index yourself.

Technically there is no overflow flag to reset; it is just that some CPU instruction sets do not support indexing with a 32-bit register when using 64-bit addressing, so you have to insert an extra sign-extend instruction if you want to support 2's-complement signed overflow on 32-bit indexes. So you typically already don't have any cost if your indexes are already size_t/ptrdiff_t, but ptrdiff_t signed overflow is still UB according to the C standard, which is also a shame, because it allows for far less interesting "optimizations" at this point (maybe a + w >= a --> true if w is positive, but that's actually typically dangerous, because that was historically what was used to check for overflow at source level, and now the compiler is suppressing all the checks!)

So all of that really is only a trade-off, and in the modern age (with e.g. a security picture that is kind of worrying) some people argue that it was a terrible idea to use this approach so carelessly. Most experts now think that no non-trivial codebase exists without potential UB in it, so it is not just rants all around; some are even working on a mathematical model of the LLVM optimizer to make it actually sound (for now, it seems that it is not, even internally -- so unfortunately, with this approach to optimisation, there is for now no mathematical justification for why the optimizations performed are actually correct even under the hypothesis of strict conformance to the C standard, and I let you imagine what happens in practice when almost no program is actually conforming...)


If there are microbenchmarks, I didn’t write them. And I’ll acknowledge that my instruction-counting approach has limits, especially since I don’t really know the details of the platform. And my approach also doesn’t account for pipelining.

But I would expect someone complaining about this optimization to do more than simply hand wave with a “supposedly.” They could instead say that the optimization can be applied when the compiler can prove x < x + 1, which it can show when both the beginning and end of the loop are known at compile time. In fact, I think it’s better to say “omit the pessimization that applies when the compiler has to allow for overflow.”

But going no farther than labeling it a “supposed optimization” turns the complaint into a standard rant.


I've done benchmarks and see no performance improvement with a small loop body. If there is a minor improvement on a trivial loop - that's not much gain.

Try it yourself.


I forgot to do this last night. I think the compiler is able to apply the optimization in many cases even if the loop variable is unsigned, so it’s more accurate to say that a pessimization is added when the compiler has to account for overflow.

If the rant had said the pessimization wasn’t all that common, or that a decent programmer should ask “how can I go through this loop fewer times?” before asking “how can I speed up each time through the loop?” I wouldn’t consider it a rant. As it is, the writer acknowledges that the other side has an argument, and then he dismisses it with a hand wave and a “supposedly.”


So "rant" is an unnecessary pejorative. Try to make your case without yelling. And I don't get your point. You have just agreed that the compiler can optimize these loops without UB. The pessimization is minor and programmer choice. So what is the argument on the other side ? How do you justify a code transformation that, e.g. produces an infinite loop by removing a programmer check for a condition that actually can occur?


If I claim somebody is ranting and raving, I’m not claiming anything about whether I’m yelling, but whether they are.

Of course, “rant” has more than one meaning. I did not mean “the writer should be held in a mental institution as a threat to themself or others,” or even “writer should feel embarrassed and ashamed for publishing this.” I meant “article makes statements that generally aren’t supported by valid arguments.”

I then gave an example that (1) was not the only case of a supporting argument not actually supporting the claim, and (2) perhaps wasn’t completely fair. But first, a story:

Recently, one of my sons had diarrhea. He complained that his sister made him sick, since she had just recovered from a cold. He never understood why we rejected that argument (yes, people with colds can give other people colds, but he didn’t have a cold), and then told him he didn’t have anything to support his claim. As far as he was concerned, he had made something that looked like a valid argument: his sister had a cold, now he was sick, q.e.d.

So when I said that the statement "'supposed' optimization" isn't really an argument, but a statement that "there are people who disagree with me, but they're wrong" (which isn't a supporting argument; if anything, it's a placeholder for one), I forgot that the essay does advance one reason to reject the "'supposed' optimization." I rejected that reason (whether it's slower to increment a value that has defined overflow than one that does not isn't a question of whether the two values are the same size), and then said the statement had no support.

It is true that at this point, I can provide other reasons to reject the compiler writers’ argument, but then the whole exercise feels like a psychic reading (writer: “I feel like blue will be important for my next argument, can you tell me why?” reader: “wow, that’s uncanny; blue is important ...”).


I don’t think that saying “the article is a rant because it doesn’t make a serious effort to respond to counterarguments” qualifies as yelling.

As for my point: in my opinion, the title promised something interesting, but the article didn’t deliver. It acknowledged that the standard committee and compiler writers disagree with its premise, but then simply asserts that the committee and compiler writers are wrong. I put it at DH3 ( http://www.paulgraham.com/disagree.html ).



You are invited to bring forward concrete proposals for changes to the C standard in the relevant committee.

http://www.open-std.org/jtc1/sc22/wg14/

Since UB allows the compiler to do anything in those situations, we can reduce the amount of UB without breaking existing “legal” code. By actually defining more behavior.

Whether a proposal is reasonable needs to be discussed in the standardization committee. All other discussions are a nice hobby. But eventually moot.


> You are invited to bring forward concrete proposals for changes to the C standard in the relevant committee.

My proposal: make the future versions of the C standard freely available (not only drafts).


Those kind of discussions have various effects, some of which I believe to be far from moot.

First, they let some people even take notice of this situation. Few developers read the standard, and even fewer write it, follow the discussions to change it (are those even open?), or write a compiler for it. The rationales are not even tracked [1]. It would actually be insanely hard to get a good understanding of these subjects by, e.g., just reading the standard, without having this kind of discussion on forums typically used by more devs than just a few dozen compiler writers...

[1]: but while I'm thinking about it, an impressive independent book has been written by Derek Jones: The New C Standard: An Economic and Cultural Commentary http://www.knosof.co.uk/cbook/cbook.html



If desired, I can comment your document over a private channel. In that case, please give a short heads-up when ready.

Note that I am not directly affiliated with WG14. So take my comments with a grain of salt.


i think you should be able to make private comments on that document. Otherwise victor.yodaiken@gmail.com


I will.


My understanding is that a lot of the undefinedness in the original C89 standard came from a desire to support non-2's-complement machines like Burroughs. Of course, in hindsight that was a total mistake. Yes, it would be a great idea to come up with a modern update to the C standard that is specified at, let's say, the level of Java.


There’s also the fact that registers may be larger than the numbers stored in them.

The best known example is that of the 80-bit floats of the 8087 FPU, but that’s relatively rare compared to loading integers into larger registers.

For example, if you compile

   if( foo + 10 > bar) goto baz
to assembly similar to

   move foo to R1
   move bar to R2
   add 10 to R1
   subtract R2 from R1
   branch-if-greater-than-zero baz
and foo and bar are 32-bits, that add can overflow if R1 and R2 also are 32-bits, but never overflows if R1 and R2 are 64-bit. That changes the result of the comparison if, for example

   foo = INT_MAX
   bar = INT_MAX
The designers of a CPU with 64-bit registers may not want to add variants of add (and subtract, multiply, etc.) that work on 32 bits (and 16 bits, and 8 bits).

They also won’t want to have their C compiler slowing down this kind of code, only because other (often relatively old and slow) CPUs exist.


But that's trivially solvable (on x86 at least). You do your arithmetic in rax and then compare the result in eax. You've got the "mask the high 32 bits" operation for free. At worst you'll need to do one bitwise and before using the value for cases where the high bits being poisoned matters. Or let me add annotations to arithmetic expressions that "this expression will definitely not need expensive cleaning up afterwards, promise", which I will need to use in maybe a dozen super-hot loops in the whole codebase. FWIW, we compile all our code with -fwrapv and don't notice any slowdown.


Of course it's trivially solvable in x86-64 because it has a full complement of 32-bit operations inherited from its x86 lineage. The GP is presumably talking about _other_ architectures when he mentions designs that might not want to add a full complement of 32-bit operations.

In any case, the problem doesn't really apply to x86 because all x86 compilers I'm aware of use the "expected" size for the various types rather than larger-than-needed-for-speed, exactly because the smaller operations are all generally available.


”But that's trivially solvable (on x86 at least)”

Even if that’s true (I don’t know enough x86 assembly to judge, but one thing I can think of is that it may require extra register-to-register moves), the fact that it isn’t an issue for one architecture won’t help a committee discussing the C standard.

Also, others in this thread describe what you call “let me annotate” as “forking C”.


As someone who was on X3J11 at the time C89 was ratified, I can tell you that supporting 1's-complement architectures was necessary. The standard would not have been approved if 2's complement had been mandated. Standards are all about compromise, even when you know that that will cause trouble later.


> I can tell you that supporting 1's-complement architectures was necessary

Can you elaborate on that point? Why was it necessary then, but not still necessary now?


Sorry, I didn't mean to imply that there are no 1's-complement systems still around. The comment I was replying to suggested it was a mistake to support 1's-complement architectures because that was the source of some of the Undefined Behavior (UB) that so troubles C. My point was that not supporting 1's complement wasn't an option then (and probably is not one now). After all, without 1's-complement support, how could I write new C apps for the Apollo Guidance Computer?

I would like to give an example of how supporting both 1's and 2's complement is the source of a specific UB, but I can't take the time to do that right now, regrettably. Similarly, supporting both Big Endian and Little Endian was necessary. As was supporting ASCII, EBCDIC, and probably Fieldata and five-level Baudot (looking at you, trigraphs). All of this generality made it hard to say anything useful in some areas, so you end up in some cases just calling it Undefined, since there was no consensus on how it should be defined.


> All of this generality made it hard to say anything useful in some areas, so you end up in some cases just calling it Undefined, since there was no consensus on how it should be defined.

This is a very valid justification of implementation defined. Maybe even unspecified. But undefined? How could you justify that?

Even if some platform does not support something at all, and goes bananas whenever you do it (say, signed overflow that traps), what would have stopped the standard from saying that whether the behavior is defined is platform dependent?

Perhaps the committee didn't anticipate how unreasonable compiler writers turned out to be?


I agree that you want as few UBs as possible. I found the text below from the Rationale useful for understanding these issues. An important feature of UB is that it doesn't require the compiler to catch certain things that can be very hard to catch, such as dereferencing NULL pointers. As noted, it also provides opportunities for enhancing the language in ways that don't break conforming programs. An implementation is allowed to treat Undefined Behavior as Implementation Defined, simply by saying what the implementation does in such cases. But it's not required to say what it does.

----

The terms unspecified behavior, undefined behavior, and implementation-defined behavior are used to categorize the result of writing programs whose properties the Standard does not, or cannot, completely describe. The goal of adopting this categorization is to allow a certain variety among implementations which permits quality of implementation to be an active force in the marketplace as well as to allow certain popular extensions, without removing the cachet of conformance to the Standard. Appendix F to the Standard catalogs those behaviors which fall into one of these three categories.

Unspecified behavior gives the implementor some latitude in translating programs. This latitude does not extend as far as failing to translate the program.

Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior.

Implementation-defined behavior gives an implementor the freedom to choose the appropriate approach, but requires that this choice be explained to the user. Behaviors designated as implementation-defined are generally those in which a user could make meaningful coding decisions based on the implementation definition. Implementors should bear in mind this criterion when deciding how extensive an implementation definition ought to be. As with unspecified behavior, simply failing to translate the source containing the implementation-defined behavior is not an adequate response.


If I remember correctly, one of the things that's likely changing in the next version of the C standard is that arithmetic will be required to be 2's complement. There just aren't enough machines around which use anything else anymore.


I suppose you both can be right. It could have been required to support 1's-complement systems at the time, but also an unfortunate mistake in retrospect.


When issues like this come up, I think the right outcome is to support the popular choice as the default, and support the unpopular choice, but with a penalty.

For example, forcing extra code to be inserted, that is, a check (or even a call out to a library), so one unified semantic is followed, and there is no undefined behaviour. Of course, you wouldn't mandate how the semantics are followed.

The problem is, it was very hard for us to accept this kind of solution when we grew up feeling like every cycle counted, so that's what I blame for a lot of the UB we see today in the standard.


Could have been (should, in some -most?- cases) implementation defined.


It's really annoying

It seems the committee is more interested in adding optimization gotchas (that nobody cares about for the most part) while making it impossible to write correct code without relying on convoluted code (which denies the benefit of "extra optimization") and "UBing everything"

C is broken, let's face it

- Builtin strings are a joke. stdlib functions are even worse.

- It was not developed with modern systems in mind.

- DIY memory management: malloc/free is like giving a kid a chainsaw to play with. Sure, every C programmer has had to work with this crap. But yeah, please complain about how my static void * should really be a volatile char *. Idiot. (Of course it's not wrong to complain, but that's like complaining about a faulty blinker in a car without brakes.) And of course the compiler is going to ignore the 'volatile' part of the pointer because f. you and item 7c of paragraph 3 of the spec lets us do that, even though it is blatantly stupid to do so.

Rust is a step in the right direction.

Making string and (memory) slices a fundamental part of the language helps. You can have null-terminated memory pointers for interoperability, but having it as a fundamental construct eliminates several problems. Also greenlets/threads/multiprocesses.


C was broken from the beginning, if you go search for what compiler researchers from the Algol school of languages had to say about it during the early 80's.

"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue....

Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels?

Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."

-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming

However, C and UNIX are symbiotic, so there is no way we can get rid of C, while keeping POSIX based OSes around us.

So, it will stay around for many decades until quantum computers or something else eventually takes off.


>C is broken, let's face it

Face what? Do you think any of this C criticism is novel or all that contentious (beyond the falling-sky attitude) in 2018?


I have the feeling that only a language fork from the GCC/Clang teams could have any chance of success. It looks like the C standard committee is not interested in changing its point of view.


GCC and Clang have already essentially forked the C language. Perhaps the most important C project—the Linux kernel—is written in the GNU dialect of C.


Although Google has managed to compile it with clang, as they removed gcc from Android.


Right, that's the point—a useful C compiler nowadays needs to be a GNU C compiler, not a standard C compiler.


To reinforce your point that is also how Microsoft decided to go on Azure Sphere.

https://www.mediatek.com/products/azureSphere/mt3620


And that was a project that took years: https://lwn.net/Articles/734071/


And is still not finished. The Linux kernel now requires the "asm goto" extension, which clang doesn't have (see https://lwn.net/Articles/748074/ and https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...).

If you check the GCC version defines (__GNUC__ and friends), you'll see that clang claims to be compatible with GCC 4.2.1, while "asm goto" was introduced later, with GCC 4.5.

The tracking bug for compiling the Linux kernel with clang is https://bugs.llvm.org/show_bug.cgi?id=4068, in case you want to follow their progress.


It is finished on Android's source tree.

https://source.android.com/setup/build/requirements

As of Android 8, only clang is used.


Woot, next up is assemble and link without binutils!


It's like the word "standard" has a meaning, i.e. "not changing". Not even for good reasons? Well, "good" is a subjective judgment. Objectively, only changes that maintain the standard would qualify, which is a contradiction. I guess adding to the standard library wouldn't be a problem, but I suspect you are rather leaning on security concerns and all that jazz that comes up when Rust is discussed. C is as open as can be, but you can't tack on a change of semantics without changing the language. Then what do you still need C for?


I want C to not grow any more. The problem with having groups to oversee things is that they usually feel the need to change or grow the thing they're supposed to be watching. Sometimes this is good, but often it's not. C is the foundation of a lot of stuff, and it'd be better to rewrite that stuff in a better language than to try and turn C into something it's not - that's what C++ was for.


I'm not too worried about C growing. It seems well and truly stuck in the 70s.

All the growth is from people doing whatever they want on top of C.


Well, you're in luck. A lot of people don't care to use anything beyond C89.


Oh please. I've been using C99 for almost 5 years now.


I would never go back. And the amount of campaigning I did at my last job to use C++11 was because all the new language features to allow compile-time checking helped us not do dumb things at run-time.

Except for the stupid TI compiler, which is still stuck on C++03.


Or C++98, depending on the chip, if some of their docs are still up to date. :\


I'm one of those. If maximum portability is desired C89 is still king.


Standard C grows at a rate of 1nm per decade... but in any case, I'm confident that your compiler will always let you select the version you like the most.


But it doesn't grow anymore; the C ecosystem is out of anyone's control. The committee is as irrelevant as it can be.


Not when you need to use certified compilers.


I can't be the only one that 99.9% of the time doesn't care _one iota_ about these mythical optimizations that compilers can introduce by exploiting undefined behavior. I just want to write straightforward, predictable code that tries its best to be safe but for one reason or another have to stick with C.

Is there a guide/reference on how to disable these optimizations in modern compilers? A list of GCC/Clang arguments that disable as much of this as possible would be greatly appreciated. I've seen a lot of posts and articles discussing C undefined behavior but almost nothing describing how to counter it.


Sure. Compile at -O0.

A huge number of seemingly-trivial optimizations depend on assuming that undefined behavior will never happen. The number of optimizations that don't depend on UB in any way is quite small. For example, if you want to get pedantic about it, even automatic promotion of local variables to registers is exploiting undefined behavior—who's to say you didn't have a pointer that just happened to point to one of them?


> Sure. Compile at -O0.

Hmm, nope. Sure, it will save you from much UB. But GCC and Clang have idiotic front ends. On purpose. They subscribe to the idea that the optimizer phase should clean up the front end mess. They're not entirely wrong, since it does simplify the front end, and does not really make the rest any more complex. It does make it slower, though.

Long story short, those compilers don't generate reasonable code by default. You need at least -O1.


I was recently surprised to find that GCC, at -O0, does some level of optimization. At least it does for multiplications. https://godbolt.org/g/ekTkBm

Edit: the irony is that if you do compile with optimizations, it does a different optimization.


What do you mean by "reasonable"? Can you give an example of code that is compiled in a more obvious, straightforward way with -O1 than -O0?


My knowledge of this is only second hand, so I don't have any specific example. I believe Chandler Carruth explains this well in one of his LLVM talks (that is, how and why the front end emits crappy intermediate code, before the optimiser kicks in).


Of course you can have a pointer to a local variable, and that variable can still spend portions of its lifetime in a register.

What won't necessarily work as expected is a shadily obtained pointer to a local variable, like by displacing &i to try to point to the same location as &j.

If &j doesn't occur anywhere (or does occur, but in a way that is optimized away), j may not even have a location.
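
A tiny sketch of the kind of "shadily obtained" pointer meant here (my own example, not from the thread):

  #include <stdio.h>

  int main(void) {
      int i = 0, j = 0;
      int *p = &i + 1;    /* guessing that j sits right after i on the stack */
      *p = 42;            /* UB: j may live in a register, or have no address at all */
      printf("%d\n", j);  /* the compiler is free to print 0 here */
      return 0;
  }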


How do you reflect the fact that pulling a pointer to a value "out of thin air" may not make any sense without either banning it entirely or defining some step of the process to be undefined behavior?


Most languages have no concept of undefined behavior and they still have compilers and runtimes that can do a huge amount of optimizations.


Because they define-away the scenarios where UB is an issue for most code. Sometimes they do this by having a GC, but Rust and Swift prove that isn't necessary.

Such languages typically do provide some form of "no promises, here there be dragons" in the form of unsafe blocks or functions. Restricting this region of unsafety is a huge benefit for programmers and compilers. Programmers because relatively few pieces of code need to be carefully reviewed. Compilers because most of the program contains little or no UB.

As a typical example, in C you can alias and type pun. Therefore to avoid all UB the compiler would need to be extremely careful in any function containing a pointer. You could return a reference to a local through a separate opaque function call, or receive multiple pointers all aliasing the same memory. To completely avoid all UB means inserting checks after every potentially mutable machine instruction, or emitting duplicate function bodies that take different paths when the pointers alias; that's assuming the compiler even has enough type information to answer the question!
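
A small sketch of what that means in practice (my own example): without a rule letting it assume the two pointers refer to different objects, the compiler must reload the count on every iteration, because any store through dst might have changed it. C's type-based aliasing rule makes that assumption legal precisely by making the aliasing store UB.

  /* hypothetical example: may dst and cnt refer to the same memory? */
  void scale(float *dst, const int *cnt) {
      for (int i = 0; i < *cnt; i++)  /* must *cnt be reloaded every time around? */
          dst[i] *= 2.0f;             /* a store through dst could alias *cnt */
  }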

With typical C#, Rust, Swift, etc you just can't cause those kinds of problems without deliberate subterfuge and use of Unsafe types or blocks.


Note that `unsafe` code blocks (or having some unsafe primitives) fundamentally result in some kind of UB in the language as a whole, and you totally can create tons of problems from them. The rest of the language has invariants it can't itself violate, but the unsafe code can (potentially much more easily than C, if there are more invariants) - that is the "UB", and the invariants that can be violated are what the compiler optimizes based on.

Lest we forget, any language with a C FFI capability must have some notion of UB, because the FFI effectively includes the C code in its own semantics (unless fully sandboxed, which may be too expensive to be done).


True, but they are quite easy to track down.

In systems like Unisys ClearPath, you can even configure the system such that only admins can execute applications with unsafe blocks.

In C any line of code, if care is not taken to use the correct compiler flags, can be a possible source of UB.


Indeed. Only wanted to point out that their optimizations also rely on UB, so they're bad examples for specifically "optimizing without relying on UB".


> In C any line of code, if care is not taken to use the correct compiler flags, can be a possible source of UB.

What do you mean?


You need to turn on all warnings as errors, static analysers and pedantic modes, depending on how each compiler exposes them.

ANSI C11 has 200 documented cases of UB, and each compiler might have additional cases, are you sure you can know all of them by heart while looking at a random line of C code?


Languages like Ada and Modula-2 were already proving it long time ago.

Or to go even further in the past, NEWP, still alive on Unisys ClearPath mainframes.


The languages that you describe must look nothing like C (other than, ironically, syntax), and must have no untyped direct memory access pointer feature at all, which usually means they rely on a GC instead for memory safety.


We can easily imagine a language that is exactly like C in every regard, but free of some gratuitous behaviors. All standard-conforming C programs work in this dialect, and a good many others also work and are portable.

For instance, we could have a dialect of C in which this is required to print "0123":

   int i = 0;
   printf("%d%d%d%d\n", i++, i++, i++, i++);
A behavior is gratuitously undefined if it is left that way for no good reason, such that programming language constructs can be undefined simply for having the wrong form. That is to say, the input values are well-defined (i has a good initial value, which we can increment four times), and the individual operations are defined also (i++ is fine by itself). But for no possible value of i is the above printf call correct.

An example of undefined behavior which is not gratuitous is overflow on integer addition. The expression i + j, where i and j are int, is not ipso facto undefined because of its form. Only for certain combinations of values of its operands is it undefined. We cannot simply banish that without banishing addition, and various ways of making overflow defined have drawbacks, like being expensive (e.g. the target machine has no native support for the particular behavior, so extra instructions have to be generated) or super-expensive, with complicated representation and memory management (switching to bignums).


It's interesting to see which things Rust has defined where C refused to, such as fixing a strict evaluation order (mostly post-order on the AST), requiring signed integers to be 2's complement or masking shift amounts (`x << n` being `x << (n % bitwidth)`).

Where does C actually gain anything nowadays? Signed integer UB is mostly useful for optimizing misuses of `int` for unsigned values (https://news.ycombinator.com/item?id=17191295), and is evaluation order even relevant anymore? (I don't think clang can pass that information down to LLVM, at all)

The only example I gave which has known drawbacks is the shift one, where modern platforms differ in the behavior, and LLVM will only optimize out the masking of the shift amount on the platforms that have that same behavior in their shift instructions (I think x86, but not ARM).
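
In C terms, the masking rule being described would look something like this (my own sketch, not from the thread):

  #include <stdint.h>

  /* shifting by >= the bit width is UB in C; masking the count makes it defined */
  uint32_t shl32(uint32_t x, unsigned n) {
      return x << (n % 32);   /* on x86 this matches what the shift instruction
                                 already does, so the mask usually costs nothing there */
  }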

However, even if you make all of these changes to C, there's still a lot of UB left, in the form of memory accesses, which is much harder to get rid of (see the other comments, some of which mention Rust as well).


> I don't think clang can pass that information down to LLVM, at all

I.e. you mean that the compiler front end has to choose an order without knowing which order will be good for LLVM, and there is no information to say "I don't actually need this specific order; pick another one that is better".


I can't think of a case in which changing the evaluation order between sequence points in a way that LLVM can't prove is safe on its own actually improves codegen.


Wasn't the ordering thing in C for stack push order in calling conventions? At least that's one theory I've heard.


Unspecified evaluation order helps dumb compilers produce better code. Whenever a particular evaluation order is convenient in order to fit some canned code generation pattern, like a function call or whatever, that desired evaluation order can just be blindly used, rather than following a forced evaluation order (using temporaries to hold the results).

Compilers for languages with strict evaluation order can still perturb evaluation order when it is safe to do so: like when expressions do not have side effects, or have effects that don't mutually interfere and are not external. They can thus avoid or eliminate the temporaries.

C is still being specified like it's 1982 and compilers have to run in a few kilobytes of RAM, in a single pass, and go straight from source to target machine code.


> and must have no untyped direct memory access pointer feature at all

Can you expand on why you feel this must be the case?

I'm thinking, for example, that dereference of a pointer could be defined to be equivalent to a platform-specific memory load instruction. It wouldn't be memory safe and would segfault like C, but it still wouldn't bring up nasal demons like C UB.


If every source-language load must result in a memory load, then you at the very least killed scalar replacement of aggregates (a critical optimization), and, depending on how you define it, you might have killed register allocation too.


You get register allocation back if you deny taking the address of variables declared with the `register` keyword - and we've gone full circle!


Taking the addresses of variables defined register is denied; that's the only modern meaning of the specifier.


Give me the rule "every source language load of a structure member, array or global variable, or anything loaded indirectly through a pointer results in an actual load" and I will still crank out fast code, by explicitly using local variables for caching the values coming from those places.


You'll also write unmaintainable code. To a large degree, optimizations exist to allow programmers to write maintainable code without sacrificing performance.


I will write absolutely clear code this way. I use that style anyway, quite often.

Optimization can work backwards in this regard.

For instance, let's consider CSE (common subexpression elimination). That allows some repetitive code to produce the same results as code that adheres to DRY.

E.g. stupid way to insert into a circular list:

   node->prev = prev;
   node->next = prev->next;
   prev->next->prev = node;
   prev->next = node;
The compiler has no idea whether node and prev are aliased or not. The assignment to node->next might be the same as prev->next, and so prev->next has to be reloaded even though it was just loaded in the previous line.

smart way: get a local variable for prev->next!

   node *next = prev->next;

   node->prev = prev;
   node->next = next;
   next->prev = node;
   prev->next = node;
Bonus: much more readable, lining up in neat columns. Just one arrow in each line. If this doesn't generate at most one load and four stores, the compiler is garbage.

Also note that we can reorder these four assignments in any of the 4! permutations and they produce the same result (unless there is aliasing, which would be unintentional and wrong regardless of the order).

Not the case in the original. For instance, it's important that prev->next is stored in node->next before the assignment to prev->next.

You can shoot yourself in the foot with local variables though. Caching is susceptible to staleness. You have to know when it is legitimate to keep using the cached value and when it must be reloaded or disused.

After our assignment prev->next = node, the next variable no longer represents the value of prev->next. In this case, since we are done, we don't care. If more code followed which still assumed that next is the original successor of prev (now the successor of node), that would be wrong.


Do you also, for example, look up the magic multiplication number in Hacker's Delight every time you want to divide by a constant?


That silliness is not comparable to effective use of local variables to both streamline the source code and get better output from the compiler.

By the way, the magic multiplication number can be worked out, because it's just fixed point math. We take the 32:32 fixed point representation of 1 and divide by 17 to get an approximation of 1/17 in 32:32 fixed: 0x100000000 / 17 = 0xF0F0F0F. That's our magic number for dividing 32 bits by 17 by doing a 64 bit multiplication. For instance 90/17 is 90 * 0xF0F0F0F = 0x54B4B4B46. The integer part of this 32:32 fixed point value is in the upper 32 bits which is 5.

There are some subtleties there, plus considerations of whether we want signed or unsigned division. Better let the compiler deal with it. Arithmetic reductions are safe optimizations. It's hard to imagine what you could do wrong so that an arithmetic reduction breaks your code, given that it produces the same result and doesn't interact with some memory aliasing where the compiler isn't informed about what you're doing.

And by the way, given this code:

  #include <stdio.h>

  struct node {
    struct node *next, *prev;
  };

  void ins_after_a(struct node *prev, struct node *node)
  {
    node->prev = prev;
    node->next = prev->next;
    prev->next->prev = node;
    prev->next = node;
  }

  void ins_after_b(struct node *prev, struct node *node)
  {
    struct node *next = prev->next;
    node->prev = prev;
    node->next = next;
    prev->next = node;
    next->prev = node;
  }
gcc 7.2.0 on Ubuntu 17 generates better code for the cleaner, streamlined second one with the local variable, for exactly the reason I gave. ins_after_a yields 6 movq instructions; ins_after_b yields 5.

Better source that is easier to reason about; better machine code: all round win.

  ins_after_a:
  .LFB23:
          .cfi_startproc
          movq  (%rdi), %rax
          movq  %rdi, 8(%rsi)
          movq  %rax, (%rsi)
          movq  (%rdi), %rax  <-- wasteful re-load of (%rdi) due to aliasing suspicion
          movq  %rsi, 8(%rax)
          movq  %rsi, (%rdi)
          ret   
          .cfi_endproc
  .LFE23:
          .size ins_after_a, .-ins_after_a
          .p2align 4,,15
          .globl        ins_after_b
          .type ins_after_b, @function
  ins_after_b:
  .LFB24:
          .cfi_startproc
          movq  (%rdi), %rax
          movq  %rdi, 8(%rsi)
          movq  %rax, (%rsi)
          movq  %rsi, (%rdi)
          movq  %rsi, 8(%rax)
          ret   
          .cfi_endproc
The language could be specified that way (accesses to structs are memory loads) for all I care and that could be helpful.


In other words, TBAA is fundamentally limited in its ability to capture the programmer's expectations about aliasing, because it must assume that any two pointers that happen to have the same type could alias – even if a person reading the code would consider that obviously unreasonable.

In this case, you could avoid the wasteful re-load by marking the function arguments as `restrict`. One of the reasons Rust's memory model is interesting is that the language statically prevents mutable aliasing in most cases, so "restrict" comes for free… well, kind of. (It's actually rather difficult to nail down the precise guarantees, especially when your compiler's backend was originally designed for C.)


Ah, but by using restrict I'm saying, "please screw my C program with C99 stuff that potentially introduces undefined behavior".

That's what restrict does: it makes some behaviors undefined and otherwise doesn't change the semantics of the program. It's completely against the grain of moving to a safer language.

Why would I do that, if I can instead beautify the source and machine code while sticking to what was available in C90.

I think restrict is mainly intended as a way of competing against Fortran. If you're processing arrays referenced by pointers, and can get the compiler to believe that they do not overlap, then the compiler can unroll the loops and rearrange the accesses and calculations in the unrolled body. Like it can load four elements from a source array, do four calculations, and then store four elements into a target, rather than interleaving. That can be done by vectorized instructions.

Here is where our technique falls short: if we write the loop body with our load-calculate-store style, it cannot be unrolled and vectorized. We have pinned down an exact behavior for all possible cases of aliasing, like self-overlap with a displacement. Vectorizing unrolls do not preserve that behavior.

Manual unrolling isn't attractive because it's a guessing game. A good amount of unrolling on one machine may give the instruction cache indigestion on another machine.


Pointers are a red herring; misuses of pointer dereferences are the manageable form of UB that programmers understand well, in reference to memory models that are straightforward. It's all the nasty gratuitous UB that wrecks the C language: behavior that could be well-defined without changing the character of the C language, or making incorrect any existing programs that are correct.

Like UB in the preprocessor. WTF? Processing a bunch of tokens at compile time should be totally safe. But no: if the ## token pasting operator glues together two tokens which do not look like one token, the behavior is undefined.

It worked differently on different compilers 35 years ago and was coded as undefined. It being undefined gave the compiler writers no incentive to fix their implementations to some common behavior (like diagnosis of an invalid token paste).


Right, because those languages are type- and memory-safe. C is not.


I'd like to see some actual evidence of some significant optimization that depends on UB.


See this sibling thread https://news.ycombinator.com/item?id=17189666 - typically anything touching memory needs invariants to be optimized, which in languages that can directly manipulate memory means there's also UB (code that can't be statically proven not to break those invariants).


you don't need UB for invariant optimization. What UB permits the compiler to do is INCORRECTLY assume invariance. This permits the compiler to "optimize" badly written C code in a way that silently changes its function - which I don't call an optimization.


I’m not sure you’ve thought this through.

Any C function that manipulates pointers (or arrays) can alias those pointers, type pun, and otherwise touch the same block of memory through different paths and/or treating the bits as different types.

Vast areas of optimization are completely closed to you if you want code to be resilient in the face of aliasing.

You should sit down with pen and paper to figure out how to optimize a simple function while preserving invariants but without UB. It would be very illuminating.


It's really irritating that people take this condescending approach. C is harder to optimize than language that e.g. do not have pointers. That is a tradeoff the language designers chose because they wanted C programmers to be able to type pun, for example. This is why C89's only limit on type punning was memory alignment. The intent of "restrict" was to enable some aliasing optimizations via opt-in. So, sure, things that are not invariant in C are not available for invariant optimization. C permits aliasing. So it calls for smarter compiler techniques. If you want a language that forbids aliasing, use one that does and then call out to C libraries. But breaking C to allow CS220 "clever optimizing tricks" to be easily used is bad engineering. Essentially, what you are saying is, "elimination of invariants is a good optimization, so pretending C has more invariants than it does is a good thing. "

So maybe come up with a real example and stop hand waving.


The UB is the result of operations which assume invariants, when they are not met. Invariants are useless if you can't assume them. Having no UB in C would require having no way to break those assumptions but C is memory/type-unsafe so that's outright impossible.

Recently I've been trying to imagine what Rust's safe abstractions that use `unsafe` code internally would look like on top of some advanced mix of dependent type theory and proofs about state-manipulating imperative programs.

At that point, the optimizer would have proofs of the invariants it can rely on and it could even potentially emit proofs that every single transformation it performed preserves the (safe) semantics of the code being optimized.

But we're not there yet. Today, we need at least a small subset of the libraries of a systems language to prevent UB "by hand". And in C (or C++, although not necessarily for the same reasons), that's even harder, as there is no notion of a "safe abstraction" (one which you cannot misuse to produce UB).


>Invariants are useless if you can't assume them.

I would think that what we want is invariants that have been validated, not assumed - especially not assumed incorrectly.


You won't be able to entirely validate most memory-related invariants yourself in a language which allows breaking memory safety; that's why I mentioned the hypothetical language which can encode proofs of invariants.

What you want is effectively automated proof-search (a hard problem) for an entirely-safe systems language (which doesn't even exist yet, AFAIK. at least not what I described).


Right. C is not designed to be memory safe. So certain invariants are harder to prove (or just not true). One problem with UB is that it allows compilers to assume false things.


Conversely, I'd like to see some actual evidence of some significant amount of UB that you could make defined without losing much optimization ability.

Everyone always rails against the bogeyman of asshole compiler writers maliciously rewriting your program to eke out 1ms on benchmarks, but nobody ever names which clang or llvm optimization pass, specifically, they think should be turned off even in -O2 or -O3.


That's a false choice. Obviously Linux code is still optimized significantly despite their extensive disabling of UB-based "optimizations".


Okay, so, specifically, which passes do you think clang/llvm should not run that it's running now in -O3 ?


Chandler Carruth's talk on the subject was really useful for my own understanding of the reasoning behind UB:

https://www.youtube.com/watch?v=yG1OZ69H_-o


Some of that is sadly C having bad defaults - like people ending up using signed 32-bit integers (i.e. `int`) to index arrays on 64-bit platforms, which keeps signed integer overflow UB relevant, for optimizing typical indexing C code.

Both C++ and Rust use pointer ranges for iteration, and Rust even forces indexing/counting to use pointer-sized unsigned integers. So Rust turned off the LLVM bit which says "signed overflow is UB" and did not really lose much from it (AFAIK, anyway).


It's an interesting talk, but, just for example: architecture-dependent behavior is not something anyone really wants to prohibit in C. It's hard to believe that DJB, who writes sophisticated C code that calls out directly to Intel vector operations, thinks that should be machine independent. And his API analogy is wrong. If you have an API that responds to out-of-contract parameters by, e.g., rewriting code, it is a bad API. People do fuzzing tests specifically to find badly implemented APIs that don't reject bad parameters. In particular, imagine an API that does one thing, the thing you expect, when called without optimizations, and breaks your program when "optimized"! I got to the node example and gave up. The compiler is not asked to check program correctness.


That is, given the question, a really moronic answer.

There are ways to apply the modern approach of optimizing beyond -O0 to C without importing all kinds of UB from the language level. You just have to actually prove the properties you want to rely on, instead of relying on wishful "UB => authorized by the standard => programmer's fault if anything goes bad" thinking.

And promoting local variables to registers CERTAINLY does NOT depend on language-level UB. It would be permitted by the general as-if rule even if something appeared to prevent it in the first place, which is not the case. You don't have to have an address if nobody wants it, and random pointers have never been required to allow access to all objects, especially those which might never have an address at all. Plus nobody ever expected that anyway. People expect 2's complement. Or at least something that cannot result in nasal demons, and given C's history, something that matches what the processor does. So 2's complement is at least not utterly stupid. So conflating the two is dishonest to the highest degree -- except maybe if the only intended audience of the C language is now experts who, e.g., write compilers. What a bright future that would be.

Hell, we dropped the hypothetical flat memory model, even without strict aliasing, for maybe 20 years (and probably 30, to be honest), and this NEVER caused the kind of issues we are talking about. So don't pretend it did, just to dismiss the real issues. OK, even then it was probably informal as hell and in some ways worse for experts, but the amount of exploited UB was also WAY smaller. Quantity matters in this area. And context too. Do you want secure OR fast embedded systems? I would prefer reasonably secure and reasonably fast. Certainly NOT fast to execute and exploit, or more probably fast to crash pathetically.

You know very well that compiling at -O0 is not going to happen in prod on tons of projects.

Don't dismiss real concerns with false "solutions", especially when mixed with proofs of your misunderstanding of the situation.


> You don't have to have an address if nobody wants it

The spec says otherwise:

> An object exists, has a constant address, and retains its last-stored value throughout its lifetime.

[C11 6.2.4.2]

But that's more of a pedantic detail. More pragmatically, you do have a point. It is indeed possible to build a C compiler that has no "undefined behavior", but only unspecified behavior, as long as the program doesn't actually violate memory safety by, say, writing to some random address it can't prove it has permission to write. For example, guessing the stack slot used for a variable and overwriting it pretty much has to be undefined behavior – it's hard to optimize anything if variables can randomly change their values without being referenced. But that's okay, because overwriting random memory is inherently unsafe without obtaining a guarantee of what that memory will be used for. On the other hand, reading from random stack memory could be unspecified. A particular stack slot might be used for a variable, a temporary expression, or nothing at all, so it's unspecified what you might find there. But the compiler will always generate a single, real load instruction, without making any assumptions about aliasing that it can't prove; thus, you'll never get logical impossibilities like "x + 1 > x".

And such a compiler could definitely produce code that's better optimized than -O0 – because -O0 is a very, very low bar (it doesn't even do register allocation, in the compilers I've seen). But I expect it would do substantially worse than a modern compiler's -O2 even on average code, with a lot of little missed optimizations that add up. (Though if you're using -fno-strict-aliasing, you're probably already eating a decent percentage of that penalty.) And in the worst cases, like tight loops that can be autovectorized only by taking advantage of undefined behavior, it might be only a fraction of the speed of the better-optimized version.

Still, it might be an interesting project, especially if you could formalize the "no undefined behavior" guarantee.


Here's an example where these optimizations are useful. Suppose I create a macro DEREF_AND_FREE(x) which expands to if(x)free(*x). Oftentimes, I'll lazily use it in places where I know x isn't null. It's more readable and maintainable than splitting the macro up into two separate macros, DEREF_AND_FREE_IF_NONNULL and DEREF_AND_FREE_I_KNOW_ITS_NONNULL. The UB-based optimizations mean that I don't take the performance hit for my laziness.
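
A sketch of how that plays out (the macro body follows the description above; the surrounding function is my own made-up illustration):

  #include <stdlib.h>

  #define DEREF_AND_FREE(x) do { if (x) free(*(x)); } while (0)

  /* hypothetical caller: buf has already been dereferenced, so it cannot be null here */
  void reset(char **buf) {
      **buf = '\0';          /* the compiler records that buf was dereferenced... */
      DEREF_AND_FREE(buf);   /* ...so it may delete the if (buf) test inside the macro */
  }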


There is nothing to prevent the compiler from deducing that x is non-null without UB. What UB permits is for the compiler to deduce that x is non-null in the absence of information.

So: if(x)f(x); ...code...; if(x)g(x) doesn't need UB to drop the second check, but in y = x->n; ...code...; if(x)g(x) the compiler uses UB to remove the check. This will be especially horrible when LTO becomes more prevalent.


Disabling them is trivial: compile with -O0 in clang or gcc and your code will be compiled in as straightforward a way as possible.


UB is also a source of portability errors...


Amen. IMO the root of all evil is compiler "optimization" of code. If you really, really need the wee bit of extra performance through optimization, hire an expert at ass'y language and optimize the bit that matters. Rarely is this the case, especially with business apps.

I've seen my fair share of bugs from the compiler incorrectly hoisting variables from loops, mis-using registers, etc. I prefer to turn off compiler optimization as one of the first steps in a new project, but that's me.


> If you really, really need the wee bit of extra performance through optimization, hire an expert at ass'y language and optimize the bit that matters. Rarely is this the case, especially with business apps.

If execution efficiency is not a concern and you're using C, then you're most likely using the wrong language. Especially for "business apps".


> If execution efficiency is not a concern and you're using C

Two things.

Thing one: Execution efficiency on superscalar processors is very unlike what you assume it to be. For instance, you assume alignment makes your program fast, when in fact it makes it slower due to cache pressure.

Thing two: On lower end processors what's important is not speed but code size.

Thing three: (off by one error natch) The amount of critical C code is tiny tiny tiny compared to the amount of C code where execution speed just does not matter because the code is almost never executed. The bleeding hot sections are often rewritten in assembly (encryption cough decryption)


> Execution efficiency on superscalar processors is very unlike what you assume it to be.

It's not. What do you think I "assume execution efficiency to be"?

> On lower end processors what's important is not speed but code size.

That's why I said "most likely", not "always". Besides, those tiny microprocessors typically don't run the "business apps" which bokglobule was talking about.

> The amount of critical C code is tiny tiny tiny compared to the amount of C code where execution speed just does not matter because the code is almost never executed.

Not true for most software. Sure, most applications spend most of their execution time in a small part of the code, but it's typically not so small that you could easily rewrite it in assembly.

> The bleeding hot sections are often rewritten in assembly (encryption cough decryption)

Cryptography kernels are written in assembly to ensure that they have a constant execution time to prevent timing attacks. Not so much for performance.


> I "assume execution efficiency to be"?

You assume that modern high-performance processors execute instructions. They do not; they analyze, optimize and emit their own internal instruction stream and then execute that. That extra step is what makes fiddly UB-type 'optimizations' worthless. And since you don't know what processor is going to execute the code and how it's optimized, the speed gains are basically 'noise'.

> Besides, those tiny microprocessors typically don't run the "business apps" which bokglobule was talking about.

"Business apps" is usually network and io bound more than anything.

> Not true for most software.

Dan Bernstein says you're wrong.

> Cryptography kernels are written in assembly to ensure that they have a constant execution time to prevent timing attacks. Not so much for performance.

Dan Bernstein says you're wrong here too. Performance is everything with encryption. Consider video streaming over an encrypted connection. Yeah, that.


> You assume that modern high performance processors execute instructions.

Yeah, stop making stuff up about me.

> they analyze, optimize and emit their own internal instruction stream and then execute that.

I know how an out-of-order, superscalar processor works. I also know that they can't do magic, because their optimizations are severely limited by time and scope constraints (although they do have access to some information that isn't available at compile time).

> That extra step is what makes fiddly UB type 'optimizations' worthless.

I'm not arguing for optimizations based on UB. This subthread is about compiler optimizations in general, which bokglobule claimed to be nearly always unnecessary.

> Dan Bernstein says you're wrong.

Do you have a study you can cite? Do you think just dropping a name will convince anyone?

> Performance is everything with encryption.

I never claimed it wasn't. I claimed that performance isn't the reason why cryptography kernels are rewritten in assembly, and that's because C + optimizing compiler is already fast enough and the small performance gain alone doesn't justify the switch to assembly.


You can care enough about efficiency to use C, and still not care about the last 0.1% of efficiency. You can especially not care about the last 0.1% of efficiency in 99% of your code.


If you're compiling C with -O0 (as OP implies)... it's not just the last 0.1% of efficiency you're missing out on. Modern C compilers generate really, really crappy code at -O0, and you're looking at a 10x, 20x slowdown by forgoing optimizations. At those slowdowns, using an interpreted language that lacks a JIT starts to look competitive in performance.


Which is why it's so important to have ultra-clear guidelines (have you looked at the clang documentation or the GCC manpages?) on how to disable optimizations that can be dangerous/unpredictable (strict aliasing comes to mind) "within" -O1 or -O2 or -Os.

One shouldn't need to throw the baby out with the bathwater (-O0) in order to get some semblance of semantics that won't pull the rug under your feet when you're not looking.


In practice, what these requests tend to boil down to is requests for the compiler to read the programmer's mind (strict aliasing is pretty much the biggest exception). What optimizations do you think are "dangerous/unpredictable"? Demanding that things like traps happen predictably means that you heavily constrain the ability to do dead-code elimination (can't eliminate code that could trap!) or loop-invariant code motion, two of the biggest performance wins, especially for things that could trap such as memory loads and stores, which are the code you most want to avoid whenever possible for performance.

Undefined behavior essentially says that compilers don't have to care about what happens in the cases that would constitute undefined behavior. This doesn't manifest in the compiler as if (undefined_behavior()) { destroy_users_code(); }, contrary to popular opinion. It instead tends to manifest as logic like "along this control-flow path, this condition is true, so we can now thread the jump from block A to block C since you're redundantly checking a known-true condition" and only after unwrapping several layers of computed assumptions do you find the "we assumed overflow cannot occur" at the bottom.
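
A classic illustration of that kind of chain (my own sketch, not from the parent):

  #include <stddef.h>

  /* hypothetical example: individually reasonable passes combine into a surprise */
  int first_field(int *p) {
      int v = *p;        /* value propagation records that p was dereferenced here */
      if (p == NULL)     /* so along this path the condition is known false... */
          return -1;     /* ...and dead-code elimination removes the check entirely */
      return v;
  }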


Exactly this.

There seems to be some idea that there is some "evil pass" in the compilers which looks for undefined behavior and then maliciously optimizes surrounding code to do something the programmer didn't expect.

Not at all. Even all the examples of UB leading to unexpected optimizations usually involve a chain of events, with very straightforward and necessary optimizations being involved, like value propagation, inlining, dead-code elimination, etc - and these aren't inherently related to "exploiting UB". You could (in some compilers) disable one of the things in the chain and perhaps avoid the problem for that example, but you'd also hurt a lot of other code that relied on the optimization.

That's one of the problems with people asking for specific small snippets of code where the UB-related transformation produced a big gain: you can certainly find these (but they'll often be picked apart) - but the bigger problem is not with the specific examples: it's that whatever optimization you disable to make the small example work as you'd expect might then produce worse code across your application.

So people who want to disable the optimization that does "that" are often incorrectly assuming there is a small simple optimization which leads to "that" in the first place.

Still, I definitely agree that the situation regarding UB is depressing in many respects. Many of the decisions made by the C committee in the past haven't aged well. If you take a look at the low-level optimizations afforded by the largely-deterministic Java specification, they are mostly at the same level as C's - but Java had the benefit of coming along a couple of decades later, when many questions that were still open in C's day, such as integer overflow behavior, integer sizes, shift behavior, and pointer models, had largely been resolved. Platforms that don't conform to the JVM's model of an ideal machine will just have to generate slow code in some cases.


I'm not requesting the compiler to read my mind, I'm just asking for dead obvious and simple guidelines that allow me to perform a cost-benefit analysis and tune the compiler's behavior to what I consider acceptable.

Examples:

I don't care at all about optimizations that are a result of treating signed integer overflow as undefined behavior. I'll go for predictable, deterministic behavior every single time.

Same for strict aliasing rules. I find it absolutely insane and mind-boggling that -O2 enables strict aliasing amongst who knows what else (50+ other flags). Why can't the impact of these optimization flags be easier to deduce? Why do I feel like I need to be a compiler developer just to get some measure of confidence in what the optimizers are doing? It's insane that there are people who treat this sort of unwarranted, dangerous complexity as a rite of passage and don't push for something better. Most importantly, it's terrifying that a significant chunk of C programmers _are not even aware of these issues_.
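To make the strict-aliasing complaint concrete (my own sketch, not the commenter's code): both GCC and Clang do at least provide -fno-strict-aliasing and -fwrapv as opt-outs for these two particular cases.

    #include <string.h>

    /* Reading a float's bits through an unsigned* pointer violates the aliasing
     * rules, so at -O2 (which assumes strict aliasing) the compiler may cache or
     * reorder these accesses in surprising ways. Assumes 32-bit float/unsigned,
     * as on typical platforms. */
    unsigned bits_of_float_unsafe(float f) {
        return *(unsigned *)&f;          /* undefined behavior under strict aliasing */
    }

    /* The sanctioned alternative: copy through the object representation.
     * Optimizers typically turn this into the same single register move. */
    unsigned bits_of_float_safe(float f) {
        unsigned u;
        memcpy(&u, &f, sizeof u);
        return u;
    }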

C will stick around for decades and will continue to be picked up by newcomers. It doesn't have to stay as dangerous and uncompromising as it is now.

EDIT: There is some progress with things like ubsan and asan but alas they're fairly limited platform-wise. What's worrying is that there hasn't been a clear shift in the mentality of those who control the language as the OP indicates in his report.


> I'm not requesting the compiler to read my mind, I'm just asking for dead obvious and simple guidelines that allow me to perform a cost-benefit analysis and tune the compiler's behavior to what I consider acceptable.

And how can I, as a compiler writer, know what you, the compiler user, consider acceptable without reading your mind?

It would be nice if we had some sort of contract. I, the compiler writer, will say that, so long as you write code that conforms to this contract, I will faithfully execute that code. And you, the programmer, will say that you will only write code that conforms to this contract, and that I don't need to worry about you writing code that fails to conform to this contract. Well, somebody already wrote this contract: it's the C specification.

What you're really saying is that you don't like the contract you have, which is a totally valid position to take (I myself would love to see strict aliasing go die in a fire). But almost no one who objects to the C contract is willing to actually negotiate a new contract. Instead, it's almost invariably a complaint that amounts to "you fucking compiler writers are ruining my code with your stupid optimizations." Well, no, your code was already broken per the contract; we were just too dumb to realize it earlier. And for the big undefined behaviors where well-behaved semantics are feasible (such as strict aliasing or signed integer overflow), we provide options to let you say "I don't agree to the C contract, I want to agree to a not-quite-C contract instead."

Many people right now don't realize that undefined behavior is merely a contract under which programmers agree not to write such code, true. But that's why we, as compiler writers, try to educate people about why undefined behavior exists and what it means for programmers, and provide tools that make it much easier for programmers to find where it occurs (hence things like asan and ubsan).


> C will stick around for decades and will continue to be picked up by newcomers. It doesn't have to stay as dangerous and uncompromising as it is now.

Exactly. Even if most of us don't touch C directly, we all rely on systems that make heavy use of it (including Objective-C and C++), so improving C would already be an improvement to our whole computing stack.


Every once in a while I have to compile a kernel without optimizations. The performance penalty you pay is far from 0.1%; it is definitely noticeable at first glance. There might be projects where even that does not matter, but I would not think those are the majority.


An order of magnitude slower on non-trivial, computation-heavy code (i.e., not just doing a lot of IO or calls into libraries/the kernel) is a pretty good rule of thumb. Sometimes better, sometimes much worse.

The more layers of abstraction, the worse the penalty for not optimizing, so C++ is usually more heavily affected than C, for example: many of the so called "zero-cost" abstractions in C++ rely on a good optimizer.


C is nowadays foremost a system programming language. Try compiling your operating system with -O0 and see if you still don’t need those optimizations. You’d be surprised.


There are tenuous, difficult-to-reason-about optimizations, and then there's removing all of the completely useless loads and stores of intermediate results that occur in the default simplistic translation.

I might be with you on the former, but that first pass really cuts down on executable (icache) size, nets a huge performance win, and actually makes the generated code a lot easier to read.
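A tiny example of what that pass buys you (my sketch; compile with gcc -S at -O0 and at -O1 and compare the output):

    /* At -O0, a, b, and each intermediate result typically get their own stack
     * slots, with a store after every operation and a load before the next one.
     * At -O1 the whole function usually collapses to a handful of register ops. */
    int poly(int x) {
        int a = x * x;
        int b = a * x;
        return a + b + x;
    }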

Some of those analyses can get pretty involved, but I don't think we should be discouraging people from investigating them - or from providing nice safe defaults and flags to turn them on.

One other speed-independent factor here is that, like you, I usually start without any optimizations. And usually when I turn them on I find a few bugs in my own code right away. That's pretty helpful.


If the project will eventually be compiled with optimizations in production, it's probably better to hit those problems earlier than later.


How much of the UB in C is carried over to C++? Does the C++ standards committee carry a lot of the UB over into C++, or try to fix the issue? If C++ doesn't have the same UB, maybe the solution is to compile C code with a C++ compiler.


The intent of C++ is that the subset that is valid C code ends up being largely semantically identical in C++ and C. There are some differences, particularly related to the type system (for example, in C a ternary expression is an rvalue, whereas it can be an lvalue expression in C++). Most, perhaps all, of the undefined behavior is retained in C++.

There is potentially a case where C eliminated undefined behavior that C++ retained: the "union trick" for getting at the bits of a float was made legal in C99, but the C++ wording (even after being modified to support unrestricted unions) suggests that it is still undefined behavior in C++.
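For reference, the "union trick" in question looks like this (my sketch; C99 and later explicitly bless reading a union member other than the one last written, while C++ never adopted equivalent wording):

    #include <stdint.h>

    /* Type punning through a union: store the float member, read the uint32_t
     * member. Sanctioned in C99/C11 via the wording (and footnote) on union
     * member access; C++ has no equivalent blessing, so the idiom is widely
     * treated as UB there. Assumes float and uint32_t are both 32 bits, as on
     * typical targets. */
    static uint32_t float_bits(float f) {
        union { float f; uint32_t u; } pun;
        pun.f = f;
        return pun.u;
    }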


Strict aliasing needs to go out the window. That's for starters.


That may be. It sure is broken. What they did:

Things of similar base type (so ignoring "signed", "const", etc.) may alias. Everything may alias with "char".

What would make more sense:

Any unsigned integer type may alias with any type of the same size; this is not transitive (so a floating-point type may not alias with a pointer type, but there likely are unsigned integer types that may alias with both). Structs may alias other structs from the beginning up to the point at which the types of their contents diverge.

Even with that better default, you'd still want easy ways to override it in both directions.


Naive question:

Are there any compiler optimizations that -Ox makes possible which couldn't have been written originally as readable C code?


For sure.

Consider idiom recognition, where a compiler takes a series of operations that implement some operation that isn't expressible as a language primitive and turns it into a single machine instruction that performs that operation. Rotate built up from a couple of shifts for example.

Consider auto-vectorization for which the language has no direct support.

Consider inlining and value propagation and all the simplification that can occur when you combine them.

Basically, compile any non-trivial C program without optimization (or without some interesting optimization option) and with optimization, and look at the assembly. In many cases you wouldn't be able to write C code at all that reproduces the optimized version under the non-optimizing compiler. In other cases you could, but at the cost of code duplication or other things that would reduce the quality of your code.
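For instance, the rotate idiom mentioned above usually looks like this (my sketch of the common pattern):

    #include <stdint.h>

    /* C has no rotate operator, so you build it from two shifts and an OR. With
     * optimizations on, gcc and clang commonly recognize this pattern and emit a
     * single rotate instruction (e.g. ROL on x86); at -O0 you get the raw shifts.
     * The "& 31" keeps the shift counts in range, avoiding UB when n == 0. */
    static uint32_t rotl32(uint32_t x, unsigned n) {
        n &= 31;
        return (x << n) | (x >> ((32 - n) & 31));
    }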


I didn't know the C std was still being modified/enhanced. Interesting. Does sound scary.


All ANSI standards are required to be maintained and updated periodically, or they lapse. That can be as simple as rubber stamping the last standard, but in practice, there are usually a few tweaks needed.


It hasn't been updated in any meaningful sense beyond C89, though.


inttypes was pretty big, right? I know it's just one header, but it's an important thing to have in the standard for a systems language.


C'mon it even has some kind of primitive generics support nowadays, which other modern languages keep ignoring.


C11 generic selection is basically not worth using and almost worse than not having "generics" at all. It exists so tgmath.h can be written without implementation magic but tgmath.h is kind of a garbage fire itself.
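For context, C11 generic selection is just compile-time dispatch on the static type of an expression, roughly like this (a minimal sketch):

    #include <stdio.h>

    /* _Generic picks one expression based on the (unqualified) type of its
     * controlling operand at compile time. This is how tgmath.h-style macros can
     * dispatch to sqrtf/sqrt/sqrtl without compiler magic; there is no real
     * polymorphism or monomorphization involved. */
    #define type_name(x) _Generic((x), \
            int:     "int",            \
            float:   "float",          \
            double:  "double",         \
            default: "something else")

    int main(void) {
        printf("%s %s %s\n", type_name(1), type_name(1.0f), type_name(1.0));
        return 0;
    }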


Faintly?!


We need a -Osafe flag.


https://github.com/ziglang/zig and Jonathan Blow's JAI language seem to be contenders for C replacement.


Arguably the Rust language is a replacement candidate for 'C' (and C++).


> Arguably the Rust language is a replacement candidate for 'C' (and C++).

Or is it C and 'C++'?

As it happens, I disagree, because I think a lot of programmers who can otherwise handle pointers and manual memory allocation are going to revolt if we make them use the explicit ownership-management notation Rust demands. It's too different from anything done in any other language, and the push-back that it's "too high-level" and "inefficient" and "bloated" will kill it for sure.


I'm studying Rust, though wouldn't claim I know it yet. I believe Rust is an effort to create a language in which you can write code that's as fast and portable as C; and yet, for which, the compiler can help you coordinate a "single owner permitted to write" strategy for thread safety -- at the expense of increased bother and fuss coaxing your program into that pattern. I wouldn't call it a C replacement; while I'm glad to have Rust available, I prefer concurrency (when I need it at all) via processes, rather than threads, with no shared memory.


These guarantees are useful for more than threads, too: http://manishearth.github.io/blog/2015/05/17/the-problem-wit...


> while I'm glad to have Rust available, I prefer concurrency (when I need it at all) via processes, rather than threads, with no shared memory.

Are you implying that Rust can't do multi-processing through processes?


I don't know yet.

I've heard about at least two languages that adore threads and have issues with multi-processing. Perl 6 has threads in its runtime helping its GC, and calling fork kills its programs. I don't know yet whether Rust has threads _required_ in a fashion that makes it crash if you try to fork().

Perl 6 cannot fork, and Go behaves badly if you fork in a program that uses goroutines; it ends up unhappy, since shared-library manipulation is somewhat dependent on the process loader. Oops.

I'm tempted to hope that the Rust language devs have kept use of threads out of its required runtime. To answer your question, I didn't mean to imply that Rust cannot call fork(2), and I hope it isn't true. I meant to say that I wouldn't call Rust a C replacement; it's an effort to solve a problem (threads with a shared heap or stack) that C flat-out doesn't attempt to consider. I love C, and feel obliged to learn Rust.


Rust has no more runtime than C. Threads are implemented in the standard library, but they'll only be used if you explicitly create one (or a library you are using does). Fork works fine :)

Rust has a very different design philosophy from C, and in that respect you are right that it isn't a complete replacement. That said, Rust is good for far more than just multithreading: it ensures memory safety as well as thread safety.


> Fork works fine.

It can work but it’s quite unsafe. The standard library doesn’t make any guarantees about fork safety. The language itself doesn’t understand what fork is, and so can’t guarantee anything.


FWIW, Perl 6 does have `Proc` (to run an external process) and `Proc::Async` (to run an external process asynchronously). But `fork`, no: it was decided that `fork` was a unixism that was not supportable on OS's like Windows. And it was also decided to not make the same mistake that Perl 5 made with regards to trying to mimic `fork` on Windows, and use that idea later to try to mimic threads on non-Windows OS's.


Thank you! That's great to hear. I'm sorry that they chose that direction; I'd much, much rather not support Windows than not support Unix's fundamental primitive for multiprocessing (and its coordinated IPC). But I do realize that not everyone can act on such preferences. Fortunately, perl5 is still well and strong.


I'm very excited about Jai and I'm eager to test it out when it's released!


And, of course, DasBetterC (D as a Better C), which is a D subset designed to only require the existence of the C runtime library.


Zig and Jai were designed to not require the C runtime library. You can't replace C if you're building on top of it, can you?


You'll need something to set up an entry point, stack, somewhere for assert() failures to go, etc. The C library is minimal enough and fine for that. If you know what you're doing, it's easy enough to take care of those services yourself with DasBetterC.


That something can be written in machine code or the alternative language though. Could you provide a link for DasBetterC? Googling didn't really lead to anything.


The name is a bit of a pun. I wrote a presentation at DConf "D as Better C" and noticed what I'd unintentionally written :-)

https://dlang.org/spec/betterc.html

http://dconf.org/2018/talks/bright.html


Wow... I remember a simple, elegant language. The C I knew had close to a 1-to-1 mapping to assembly. The compiler didn't overthink things.

How can you muck this up!

I thought it was C++'s job to turn something simple/elegant into an indecipherable mess!



