Depressing and faintly terrifying days for the C standard [pdf] (yodaiken.com)
209 points by signa11 on May 30, 2018 | 270 comments



At Microsoft the compiler team (Visual C++) and the Windows team are joined at the hip. I'm sure the same was true at Sun. This can lead to good decisions about undefined behavior that I hope would make Linus smile.

I recently learned of one such good engineering decision (I hope I'm remembering it correctly). Let's say you have a struct with an int32 and a byte in it. That's 5 bytes, right? But the platform alignment is a multiple of 4 bytes, so there's 3 bytes of padding (sizeof the struct is 8 bytes). If we stack-allocate an array of 11 of these and zero-initialize with = { 0 }, what would you expect to see in memory after initialization?

It turns out the answer was that the first element of the array would have its 5 bytes zeroed, but the 3 bytes of padding would be left uninitialized. Then, the remaining 10 elements of the array would be zeroed with a memset that actually zeroed all 80 remaining bytes. It sounds weird but this is a legal thing to do from the standard's perspective. All they're obligated to zero out are the non-padding bytes. This UB was leading to disclosure of little bits of kernel memory back into user mode because Windows engineers assumed that = { 0 } was the same as leaving the variable uninitialized and then memsetting the whole thing to zero. Nope!
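
To make that concrete, here's a rough sketch of the kind of code involved (struct and field names are made up, not the actual Windows code):

  #include <stdint.h>

  struct item {
      int32_t value;   /* 4 bytes */
      uint8_t flag;    /* 1 byte, followed by 3 bytes of padding to reach 8 */
  };

  void example(void) {
      struct item items[11] = { 0 };
      /* Every named member of every element is zeroed, but the standard
         does not require the 3 padding bytes per element to be zeroed,
         so stale stack contents can survive in them. */
  }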

The compiler team fixed this by always zeroing out padding too. Problem solved. There are some cases where it's not quite as fast. But it's the right engineering decision by the compiler team for their customers, both internal and external.


Minor nitpick, the zeroing (or lack thereof) of the padding is not undefined behavior, it's unspecified behavior. Undefined behavior and unspecified behavior often look and perhaps behave the same to the programmer, but have semantic differences. In the face of undefined behavior, the compiler is allowed to do pretty much anything it wants (including formatting your hard drive and/or launching the nukes). With unspecified behavior, the compiler implementer must make a conscious decision on what the behavior will be and document the behavior it will follow.


> With unspecified behavior, the compiler implementer must make a conscious decision on what the behavior will be and document the behavior it will follow.

No, what you described is implementation-defined behavior.

It may be confusing, but here's the breakdown of different kinds of behavior in the C standard:

* Well-defined: there is a set of semantics that is defined by the C abstract machine that every implementation must (appear to) execute exactly. Example: the result of a[b].

* Implementation-defined: the compiler has a choice of what it may implement for semantics, and it must document the choice it makes. Example: the size (in bits and chars) of 'int', the signedness of 'char'.

* Unspecified: the compiler has a choice of what it may implement for semantics, but the compiler is not required to document the choice, nor is it required to make the same choice in all circumstances. Example: the order of evaluation of a + b.

* Undefined: the compiler is not required to maintain any observable semantics of a program that executes undefined behavior (key point: undefined behavior is a dynamic property related to an execution trace, not a static property of the source code). Example: dereferencing a null pointer.
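
To make the distinctions concrete, a tiny sketch (f and g are made-up functions):

  #include <limits.h>

  int f(void);
  int g(void);

  void categories(void) {
      int a[4] = { 1, 2, 3, 4 };
      int b = 1;
      int ok    = a[b];       /* well-defined: always 2                              */
      int bits  = CHAR_BIT;   /* implementation-defined: documented by each compiler */
      int order = f() + g();  /* unspecified: f and g may run in either order        */
      int *p    = 0;
      int boom  = *p;         /* undefined: null dereference, no requirements at all */
      (void)ok; (void)bits; (void)order; (void)boom;
  }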


Nice comment! Here are the excerpts from n1570.pdf[1] with some punctuation added by me to compensate for the limited formatting support on this forum:

§3.4.0: behavior: external appearance or action

§3.4.1: implementation-defined behavior: unspecified behavior where each implementation documents how the choice is made. EXAMPLE: An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.

§3.4.2: locale-specific behavior: behavior that depends on local conventions of nationality, culture, and language that each implementation documents. EXAMPLE: An example of locale-specific behavior is whether the islower function returns true for characters other than the 26 lowercase Latin letters.

§3.4.3: undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements. NOTE: Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). EXAMPLE: An example of undefined behavior is the behavior on integer overflow.

§3.4.4: unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance. EXAMPLE: An example of unspecified behavior is the order in which the arguments to a function are evaluated.

[1]: WG14 working paper for the C11 standard: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf


Thank you kind sir, you win the prize for most coherent and informative technical snippet in the world. On this day anyway.


Accessing uninitialised memory is undefined. As in Nasal Demon undefined. If you don't zero out the padding, I'm sure there's a clever way to access those bytes in ways that invoke undefined behaviour.


The padding has unspecified values, which is distinct from being uninitialised.

If it were otherwise, you couldn't `memcpy()` structures around.


I believe there's an exception for the likes of `memcpy()`. Something along the lines of "type punning and reading indeterminate values is undefined except when we're reading through a `char*`", or something.

I'll check the unbelievably thick book that tries to specify C11 (I've printed it, it's over 2 pounds).


The only way to access the padding without otherwise falling into undefined behaviour is by using a char * anyway (including indirectly using a char *, through memcpy()).


You say that as if it’s entirely the responsibility of the programmer to avoid these bear traps that have been left lying around everywhere.

Why not just have the compiler zero the memory, and thereby remove the trap? Seems very sensible to me. Do you think it’s a bad idea, and if so, why?


There is a concern for performance. But that's no reason. Zero initing could be default behavior that can be declared away. E.g.

As a type qualifier keyword:

  { int x, y; /* x and y are zero */ }

  { int noinit x, y; /* x is indeterminate, y is zero */ }
Or as a declaration specifier:

  { noinit x, y; /* both x, y indeterminately-valued */ }
Or a special constant for suppressing zero initialization:

  { int x, y = noinit; /* x zero, y indeterminate */ }
Similarly, unspecified order of evaluation could be supported by explicit request:

  decl (unspec_order) { /* comma-separated list of decl items */
     a[i] = i++; /* UB */
  }

  a[i] = i++; /* well-defined */


Zero initing is already the default behavior for objects with static storage duration (and for members left out of a struct initializer).

    static int x, y; /* x and y are zero */


Good idea!


> You say that as if it’s entirely the responsibility of the programmer to avoid these bear traps that have been left lying around everywhere.

Oh no no no, I was picking on hermitdev's characterisation of the behaviour. Sure, what the compiler does with the padding bytes is unspecified. But it can still lead to undefined behaviour, if some unwary user ever reads them. A misinterpretation like that, and poof you get a security vulnerability. The C standard is insane.


Aha, sorry, I misunderstood! Glad to hear you’re on the side of sanity. :)


It's more than a logical or performance issue, it's a cultural one. In C culture, there's a sense that the programmer has direct control over the hardware. They can literally write or read to any memory address as they see fit, and have very fine-grained control over what the machine is doing.

Modern processors, with their out-of-order execution, complex caching algorithms, deep pipelines and multi-threaded hardware often render this sense of control more illusory than factual, but the C culture remains wedded to the idea that the programmer is in control.

Accordingly, it's pretty provocative to suggest that a compiler or runtime would zero out memory without you specifically saying so. Any C programmer can swap malloc (which doesn't zero memory) for calloc (which does). Whether it's a good or bad idea to do so is up to the programmer. The overall idea is: don't do anything unless I say so.


Sane compilers should do that. The standard should eventually specify it. But until it does, you cannot write portable code that expects it (though hopefully, once enough compilers are sane but before the standard is updated, you can write code that targets only those compilers and not give a fuck about the other broken garbage that tries to trap the world).


if you dereference a null pointer then this behavior is 'undefined' from the perspective of the C compiler - however it is very well defined by the operating system (if you are writing a user-space application).

No nukes get involved here - only core dumps.


If you are working on top of an operating system that launches nukes upon null pointer access, then you should consider to switch vendors.


> and document the behavior it will follow.

So it's unspecified in terms of the standard, but specified by the implementation.


Right! Basically it's up to the compiler programmer to pick a path and follow it... assuming you're talking about implementation-defined.


The problem with this kind of approach is over time it removes the ability to use any other implementation. You are no longer using C, you are using <Implementation>C.

This becomes a problem when a different implementation adds amazing tools for finding bugs (the sanitizer suite, for instance), and you can't use them because your code doesn't build with any other implementation.


Large and complex pieces of software, and operating systems in particular, tend to be tightly tied to their compilers. It is never easy and in some cases practically impossible to port to a different compiler. I expect that Microsoft has come to terms with the fact that Windows will only support being compiled by their compiler.

When a different toolchain introduces a new feature for finding bugs that would be useful for Windows, the Microsoft compiler team can add that functionality to their own tools instead of porting Windows. An advantage of this is that they can customize the feature for exactly their use case. Yes, this is the definition of NIH syndrome, but that’s how large companies work.


Large pieces of software have been able to switch some platform-specific code to other compilers (Chrome for Windows comes to mind).

This is probably way smaller than the whole of Windows, but I would not be surprised if some MS devs are already internally compiling some of their components with clang for their own dev/testing (even if just for extra warnings, etc.)

And a major part of the work of the MSVC team today seems to be about standard compliance.

But yes, I do not really expect them to switch, and actually they probably don't even have the beginning of a serious reason to do so. This is not even a case of NIH. Their compiler derives from an ancient codebase and has been continuously maintained for several decades. They "invented" it. The only modern serious competition (that cares enough about Windows compat and some of their specific techs) was started way later... They probably also have all kinds of patents and whatnot on some security mitigations that are implemented by collaboration between the generated code and (the most modern versions of) low-level parts of the Windows platform.


> You are no longer using C, you are using <Implementation>C.

If you go deep and gnarly enough with your system, this is always the case, I'm afraid. There's a famous talk at Stanford by a Coverity... founder? consultant? about this.


> You are no longer using C, you are using <Implementation>C.

This is always the case because the C standard is only a partial spec. There's no such thing as "a program written in C(++)", there's only "a program written in <Implementation> C(++)". If you compile your program with a different implementation, then it's a different program. It may work, but it may not.

What I'd really like to see is a modified version of the C/C++ standards which keeps the language the same "in spirit" but removes all undefined and implementation-dependent behaviour. This would give compiler writers a stationary (or at least slower-moving) target to aim for and make it possible for C to be portable in theory as well as (sorta) in practice.


However in this case 'Standard C' is broken and <Implementation C> is not. The proper thing is to fix the standard to require padding bytes be zero'd.


It's easy to see both sides here. The overwhelming majority of code is not written to run in a kernel at the kernel-user security boundary, so for most code paying something to initialize padding might not be a great tradeoff. That is, the vast majority of code is running within a single security domain and doesn't need to protect itself from itself.

Still, avoiding initializing padding is probably not a great example of a performance win through standards exploitation: in the example given it's not clear why you'd not just do one 8-byte zeroing write to cover the whole structure, rather than apparently splitting it into a 4-byte and a 1-byte write. Perhaps this was 32-bit code, where 8-byte writes are slightly trickier, but even two 4-byte writes are likely to perform just as well or better. Probably it's just bad codegen to treat the initialization of the struct[11] in two parts: a single struct and a 10-member array.


Luckily clang does a good job at being GCC compatible, so I don't worry too much about using GCC extensions. It's quite unlikely one of them will go away anytime soon, and they pretty much cover all architectures/platforms that have ever existed.


Usually true, unless you want to target embedded, mainframes or some industrial OSes.


It doesn't support everything; for example, Clang doesn't do nested functions.


This is pretty much what writing C code across K&R, ANSI C89 and subsets (e.g. Small-C) used to look like.


I fail to see how implementing an unspecified part of the standard in a way which _doesn't_ leak kernel memory could ever be a problem.


It's not a problem for the compiler, of course.

The problem is the C code that relies on it: effectively you are using a dialect of C which gives stronger guarantees, so you lose the ability to use any other implementation which doesn't provide those guarantees.


Yeah, I (somehow...) missed the part where the GP explained that the kernel dev's fix was to just rely on the now-updated unspecified behavior. I assumed they changed the compiler, but also memset the structure before sending.


>The problem is the C code that relies on it: effectively you are using a dialect of C which gives stronger guarantees, so you lose the ability to use any other implementation which doesn't provide those guarantees.

How do you lose it in this case? You shouldn't be reading values from those padding bytes anyway...


Tell that to a hacker. The problem here isn't that the code is functionally dependent on padding bytes, it's that when you copy those padding bytes around you are leaking information that you probably never meant to copy.

This can be problematic if you are copying kernel space memory to a user space process, for example. Let's say there's a call into the kernel that returns a copy of this 4+1 struct with three more bytes of padding. Maybe what was on the stack before the space was assigned to those last padding bytes are some information the kernel definitely shouldn't leak to user space, like some bytes of a password, and now any user space process could potentially read them simply by calling some unrelated kernel function.


It’s great that the compiler team could implement safer behaviour. But if the programmers’ intent was to zero all bits in the array they should express this explicitly in the code with a memset(). Otherwise a change in the compiler later could throw up this vulnerability again. The code should express the semantics as clearly as possible.


Then other language lawyers will come around and tell you why you should use {0} instead of memset (e.g. because for some combinations of type and architecture the zero value isn't full of zero bytes).

This example also shows how "the semantics" is a fiddly concept. The reason the standard allows leaving bytes unzeroed is because they are not "semantically" important. But they actually do matter.

The problem with the mentality that it's always the programmer's fault for not following "the rules" is that you eventually get to the point where the rules allow for no good solutions at all.


I didn't say it was always the programmers' fault. But I believe the programmer should express their intent as clearly as possible. And memset( (void*)buf, 0, buflen ) says fill-a-contiguous-array-of-unsigned-chars-of-value-zero, which is semantically different from initializing an array of structs that may have padding, and better matches the programmers' intent. It doesn't matter if the zero value is all 0 bits or not - the important thing is that the whole contiguous memory region is zeroed.

I believe C99 says chars and unsigned chars have no padding.

https://stackoverflow.com/questions/13929462/can-the-unsigne...


Another, better, response to your argument. Initializing an array of structs with = {0} does NOT tell the compiler that zeroing the entire contiguous chunk of memory matters - only that each struct should be zeroed. While memset( (void*)buf, 0, len) does tell the compiler that the entire contiguous memory chunk must be zeroed, which is what is intended.
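
A quick sketch of the difference (struct layout borrowed from the example upthread):

  #include <string.h>

  struct item { int value; char flag; };   /* 3 padding bytes per element */

  void demo(void) {
      struct item buf1[11] = { 0 };        /* members zeroed; padding bytes unspecified */
      struct item buf2[11];
      memset(buf2, 0, sizeof buf2);        /* every byte of the region zeroed, padding included */
      (void)buf1;
  }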

Language lawyers need not apply.


It may be the right thing for the moment, but what happens to that code when new versions of the compiler come out? Someday it'll start leaking information again. Relying on non-standard behavior always ends in tears.


>Relying on non-standard behavior always ends in tears.

Who's relying on unspecified behavior? Do you mean some theoretical programmer? Well of course that's not a good thing, but I don't see what it has to do with the situation in the story and the MS engineers deciding to change how they implement that unspecified part of the spec.


The kernel programmer did, in this story.

An example of such an information leak is:

  struct foo {
    int a;
    char b;
  };

  void send_foo_01(void) {
    struct foo x = {0, 1};
    write(fd, &x, sizeof(struct foo));
  }
which sends 3 bytes of the contents of the stack memory over the network, for almost every compiler except the one in the story. Running in a kernel, that could contain secrets.


Sorry, my mistake; yes, the kernel dev is now relying on the compiler to zero out those three bytes. I understand the decision to change this in the compiler, but I think the "fix" would have been to memset in the kernel code. I'd be surprised if they didn't do that, but maybe they can reasonably assume they'll never use another compiler to build the Windows kernel.


Signed integer overflow. It’s undefined because some platforms don’t use two’s complement.

Checking it after the fact is non-portable.
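
A rough sketch of what that means in practice (the after-the-fact check is the one a compiler is allowed to delete):

  #include <limits.h>
  #include <stdbool.h>

  /* Non-portable: if a + b overflows, the addition itself is already UB,
     so the compiler may remove this "check after the fact" entirely. */
  bool overflows_after(int a, int b) {
      return a + b < a;   /* assumes wraparound, which is not guaranteed */
  }

  /* Portable: compare against the limits before doing the addition. */
  bool overflows_before(int a, int b) {
      return (b > 0) ? (a > INT_MAX - b) : (a < INT_MIN - b);
  }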


But not in sane compilers, they are better than the standard.

gcc, clang, icc all assume two's complement and happily overflow via -fwrapv. There's no such compiler on Unisys anymore; they rather emulate two's complement. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm


Thankfully, not for very much longer...

http://wg21.link/P0907r2

C++, but apparently something similar is being considered for C as well.

https://twitter.com/jfbastien/status/989242576598327296?lang...


Isn't that mostly theoretical? I can't seem to find any processor that doesn't use 2's complement.


There are some old architectures that used one's complement and there are probably some still in service, e.g.

https://en.m.wikipedia.org/wiki/UNIVAC_1100/2200_series


I had the pleasure of programming on these. Having both positive and negative zeros is special and that they don't compare equal is extra special. And yet -- ones complement arithmetic is not even the oddest thing about Univac 1100s. I think the Univac corporate values statement must have included both "Dare to be different" and "Remember your past".


Is anybody writing code for them? Probably not porting modern C anyway.

I assert this 'it might not be two's-complement' excuse is nonsense.


I seriously doubt it.

I do agree with you, it's not a big deal for most devs. The only place I'd worry about it is systems where it might be expected to be in service for many years or intended to be ported frequently. In that case, you never know what sort of crazy architecture you may end up with in the future.

There's always something a bit icky with not sticking to the spec, but in the grand scheme of things relying on two's complement is not a big deal.


from the proposal http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm

"Nowadays Unisys emulates their old architecture using x86 CPUs with attached FPGAs for customers who have legacy applications which they’ve been unable to migrate. These applications are unlikely to be well served by modern C++, signed integers are the least of their problem. Post-modern C++ should focus on serving its existing users well, and incoming users should be blissfully unaware of integer esoterica."


Presumably the compiler team wrote a test for that behavior so there won’t be a regression. I assume that Windows isn’t planning on switching compilers, so they can rely on that behavior as long as the Microsoft compiler passes all its tests.


Well, the MSVC team is the last one who can propose something, since they don't even support C11 completely today, and weren't able to add C99 support for almost two decades.


They still don't support C99 fully.


They do, to the extent required by ANSI C++14; ANSI C++17 upgraded the requirements to C11.

They are pretty open that the future of systems programming on Windows is C++ and .NET Native.

For C devs, they have helped clang to work on Windows and there is WSL for UNIX like software.

Just like they have improved Visual Studio integration with clang and gcc.

As for Visual C++, the name says it all.


Now that Microsoft has a new-found love for C again, whether via Checked C research, the UDMF rewrite in C, WSL, or the upcoming Azure Sphere product, it would be interesting if they could contribute to improving the whole security story in C2X.


I don't think they are really that much in love with C aside from a few niche areas. The compiler itself still generates worse code when in C mode as it only exists for back-compat. When I worked there it was explicitly said that the new "universal" (or whatever they are called now) APIs were explicitly not targeting C. C++ on the other hand seemed to be undergoing a new renaissance. They have added some newer C features but only those required by the C++ standard.


Quite true, when speaking about Visual C++ and I have also stated the same thing multiple times here.

Their C love on the projects I mentioned, is via clang and gcc.


> Problem solved.

This "solution" also prevents code analysis tools from detecting reads from padding bytes as "non-initialized memory reads".

When a routine leaves sensitive data on the stack, the entire area used up by that sensitive data must be wiped before the routine returns (and even then, wiping the padding bytes would not be required). How about the part of the stack space not covered by that array of 11 elements?


The kernel has a separate stack, inaccessible to user-space.

Otherwise, you'd be right: a shared stack would be a giant source of information leakage from kernel space to user-space unless it was very carefully managed (probably at a significant performance cost). Hence separate stacks (which also have the advantage of not needing to make assumptions about how user-mode programs use their stack, e.g., if they are transiently using "unallocated" stack above rsp, etc.).

Probably what happened here is that this structure was copied back to user space (e.g., as the result of a system call) exposing the kernel data.


At Sun, Solaris and Studio were quite separate organizationally (and business-wise too). Mind you, the Solaris engineering group had early access to Studio and also, of course, access to the Studio team, so... not quite joined at the hip, but not too separated either.


In C11, the spec was changed to make zeroing out padding mandatory.


> This UB was leading to disclosure of little bits of kernel memory back into user mode because Windows engineers assumed that = { 0 } was the same as leaving the variable uninitialized and then memsetting the whole thing to zero

But what on earth were they doing with the padding bits?


They aren't doing anything with the padding bytes. But what probably happened was that they copied the array into userland memory, which potentially allows a malicious sandboxed program to read the padding bytes that contain bits of kernel memory.


Total guess, but they could have been including them in a hash function or struct equality check (equality with memcmp, hash by just grabbing bytes, etc.).

That would not have gone well with uninitialized padding :-)


It's not uncommon for a function to use a supplied output buffer as scratch space. So the padding could have contained pretty much anything.


I'm skeptical that it was implemented that way because of kernel memory leaking concerns.

More likely it was done to prevent Windows API calls from panicking when they accessed unset parameter structures.

As a former Windows programmer, that was one of the largest sources of errors back in the Win32 days.


But you don't access the padding; it's generally implicitly added by the compiler to comply with the ABI rules for the platform. That argument might make sense if this was about zeroing _all_ stack objects not explicitly initialised, but it's explicitly talking about the padding.


> This UB was leading to disclosure of little bits of kernel memory back into user mode

If you write inline assembler, you can access this stuff anyway. So I'm not seeing what the value is in zeroing it by the caller. The kernel callee should zero its stack frame before returning.


How so? The kernel presumably has a separate stack which is not accessible to user-space, but here information was disclosed because a structure copied back to user space was created on the stack, initialized to {0} and then member-wise assigned, with some padding bytes never being touched and thus containing whatever previous values happened to be on the kernel stack. So far this is all in kernel space, so nothing has been exposed yet.

Then, however, if this structure is copied back to user-space, e.g., as an output parameter of a kernel call, the padding bytes with the exposed data will be copied along with it (unless you get lucky and the copy routine happens to make the exact same decision with regard to padding handling).

If the kernel stack _itself_ was visible to user-space, you'd have a whole separate set of problems: you'd have to zero the whole stack (or at least the extent of the stack that could have been touched) on every kernel call.


Yes, you're right.


Like phkahler, I don't want C to grow any more. I've been a C programmer for a long time, the vast majority of my day-to-day work is still in a C codebase, and I expect to continue working in C for a while yet. Nonetheless, its time has passed. It will be around for a while, just like FORTRAN and COBOL are, but there's no good reason for new code to be written at that poor level of abstraction. Even for systems software - what I write - there are always better choices that provide higher-level data and control structures. They variously use garbage collection, reference counting, ownership rules, or whatever you call that hot mess C++ has. Writing safe, secure C code is certainly possible, but it's too much unnecessary work - especially in the concurrent and/or parallel world that any non-trivial code has to live in nowadays.

That said, I really wish proponents of other languages would get their stuff together about creating libraries that can be used from other languages. A library written in C can be used by anyone else. Many other languages are avid consumers of this functionality, many advertise it as a key feature, but very few return the favor by producing reusable code. The industry doesn't need such Balkanization. If you're one of the very many people who look down their noses at C and want to get rid of it, do your part.


> I really wish proponents of other languages

Your wish is my command! That's the purpose of D in betterC mode. You can draw an arbitrary line through your C project, and implement one side in DasBetterC, and it'll work just fine.

In fact, I've been considering reimplementing the C standard library for the Digital Mars C compiler in D!


Thank you, Walter. For everything.


welcs!


> That said, I really wish proponents of other languages would get their stuff together about creating libraries that can be used from other languages. A library written in C can be used by anyone else. Many other languages are avid consumers of this functionality, many advertise it as a key feature, but very few return the favor by producing reusable code.

I kind of agree with you on this. I have a bidirectional binding process in my hobbes project at Morgan Stanley (https://github.com/Morgan-Stanley/hobbes) such that C/C++ functions can be viewed as hobbes functions and hobbes functions can be viewed as C/C++ functions (without translation).

This works as far as C/C++ types are adequate for your programming language, which obviously does cover a huge space, but there are types that don't translate well and there are staging considerations that don't translate either (e.g. there is a logical type that can be decided, but it's not decided until what might be regarded as "run time").

There are interesting problems to be considered in this area though, happy to discuss further out of band if anyone is interested (discussions on hacker news tend to be short-lived and shallow IME).


>A library written in C can be used by anyone else.

This is because of how much of the unix clone ecosystem has been built around C workflows, and this wasn't true on Windows until linux compatibility was developed on it.

>If you're one of the very many people who look down their noses at C and want to get rid of it, do your part.

convincing linus and sysadmin greybeards to modernize linux is no small task, until then we'll all still just be scripting over archaic C apis

"science progresses one funeral at a time"


> This is because of how much of the unix clone ecosystem has been built around C workflows

Absolutely correct.

> convincing linus and sysadmin greybeards to modernize linux

That's not what I'm suggesting. Anything that's already written in C can and probably should continue to be so. What I'm suggesting is that people who prefer to work in other languages should have an easy way to make their work available beyond their own language community. I'm not talking about interpreted/scripting languages here. I'm talking about compiled/systems languages. Stuff that gets linked together, or that should be able to use some sort of dlopen/FFI back and forth fluidly despite multiple languages being involved. There's some work to be done there, but everyone seems to prefer hiding in their own language bunker instead of reaching out to others.


The problem is that other languages necessarily place more restrictions on how their data can be used in order to gain all their nice features. Since you can't control the caller you lose all those guarantees. Because of that it will never be easy to interop across higher level languages. C works well here because its low level enough that it expects few guarantees. Just keep the stack aligned and balanced and it will mostly be happy.


There are solutions, but only at platform level.

JVM, CLR, COM, UWP, DEX, TIMI, ILE are all possible approaches with various degrees of success and most of them also have C implementations available.

It is almost impossible to get some kind of universal ABI between OSes and languages without an extra level of indirection.


>>> This is because of how much of the unix clone ecosystem has been built around C workflows, and this wasn't true on Windows until linux compatibility was developed on it.

Windows is almost entirely coded in C. The Windows C API is one of the most depended upon API with the largest codebase and the longest documentation in existence.


Kind of.

After Longhorn's failure, they started to focus on COM to be the future of Windows APIs, leading to UWP.

And if masochists can implement COM in C, they need to be quite pain resistant for the UWP update.

In the meantime, the new C runtime was rewritten in C++, using extern "C" entry points.

Windows drivers can be written in C++ since Windows 8, and since then they have been migrating the code to be compilable as the C subset from C++ as well.


I meant the base Windows API. The MFC and COM were built to abstract it but never really took off.

The low level API have been fixed for a very very long time. It doesn't really need to change, opening a file is not different now than a decade ago. It gets no love and no popularity, but it runs the world.

Most development is done in higher languages nowadays, typically C#, java or python and all these languages rely on the Windows API to run. They are abstractions of C.

P.S. Drivers could already be written in C++ since Windows XP. There were a lot of constraints though because of running in the kernel.


> The MFC and COM were built to abstract it but never really took off.

MFC was the way to write real Windows applications until .NET came around.

Visual Basic was mostly used by not-so-skilled developers, creating applications that IT departments had to take care of, sometimes rewriting them into MFC ones.

Delphi and C++ Builder were the only solid alternatives outside the Microsoft world, but thanks to the way they drove prices upwards and the identity crisis at Borland, most developers went over to Microsoft products, as the platform vendor is always a safer bet for development tools.

As I mentioned, since Windows Vista all new API are COM, and UWP, which is the future of the platform is COM as well.

UWP is what .NET was supposed to be initially, COM+ Runtime, just that UWP uses .NET metadata instead of COM type libraries and allows for real instances, not only interfaces.

COM is a first class type in .NET given its original design, many of the .NET APIs are COM objects underneath, including the whole CLR native APIs.

Drivers were written in C++ on XP only by adventurous coders with their own compilers or hacks, as Visual C++ only supports kernel mode since Windows 8.

> It doesn't really need to change, opening a file is not different now than a decade ago

Actually it has changed.

OpenFile() has been deprecated and replaced by CreateFile(), which was superseded by CreateFile2().

And on Windows 10, CreateFileFromApp() and CreateFile2FromApp() should be used instead, otherwise the application won't run from the store.

And if you want to use any of the goodies from Windows 8 onwards, they are only available as UWP APIs.


> on Windows 10, CreateFileFromApp() and CreateFile2FromApp() should be used instead, otherwise the application won't run from the store.

So?

If you dislike the app store monopolies, it's stupid to herd your users into the store. Tell your users that your app is too sophisticated to run in dummy mode. Bonus: Sell to all the Win7 holdouts.

If you love app stores (monopoly rents be damned), you rank the store ecosystems and then never write a Windows version.


You missed the part that, given how things turned out, thanks to Project Centennial there is an ongoing process to bring store containers to the whole user space, MSIX.

So regardless of Win32 or UWP, everything will eventually be sandboxed.

Windows 7 is the new XP.


> Windows 7 is the new XP

Win7 is a focal point of some importance: It's where developers who don't want to be pushed into app store indentured servitude join up with the Windows compatible systems, Wine and ReactOS.

Everything before Win7 is not a focal point because it's too old. Everything after Win7 is not a focal point because nobody can handle the update treadmill. By elimination, it's Win7.

Win7 is not the new XP; it's the future of stability-oriented Windows compatible computing.


CreateFile() is not deprecated. https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...

OpenFile() could be considered deprecated but the official documentation doesn't mark it as deprecated.

You missed the point that these APIs have been stable for decades and software will continue to work without any change.


> CreateFile() is not deprecated. https://msdn.microsoft.com/en-us/library/windows/desktop/aa3....

"OpenFile() has been deprecated and replaced by CreateFile(), which was superseded by CreateFile2()."

I never said CreateFile() was deprecated. Superseded is not the same thing.

> OpenFile() could be considered deprecated but the official documentation doesn't mark it as deprecated.

"Note This function has limited capabilities and is not recommended. For new application development, use the CreateFile function."

Looks pretty much like a euphemism for deprecated.

> You missed the point that these API have been stable for decades and software will continue to work without any change.

Not when Windows puts a container around what they can see.


You don't need anybody's permission to do this. Linux has a language-agnostic system call interface. Everything that happens on the computer is accomplished through that interface, and to use it all you need to do is put some values in specific registers and issue a special instruction. This isn't the domain of C; the JIT compiler could have a special system call generation feature. You could get rid of GNU and write your own user space in Lisp if you wanted to.
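
For example, here's a rough sketch of calling write(2) directly on x86-64 Linux with no libc involved (GCC/Clang inline asm assumed):

  #include <stddef.h>

  static long raw_write(int fd, const void *buf, size_t len) {
      long ret;
      asm volatile ("syscall"
                    : "=a"(ret)
                    : "a"(1L /* __NR_write on x86-64 */),
                      "D"((long)fd), "S"(buf), "d"(len)
                    : "rcx", "r11", "memory");
      return ret;
  }

  int main(void) {
      raw_write(1, "hello from a raw syscall\n", 25);
      return 0;
  }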



> This is because of how much of the unix clone ecosystem has been built around C workflows, and this wasn't true on Windows until linux compatibility was developed on it.

Because C (and its close relatives like C++) is still the only language suitable for systems development.

Just booting any other language on bare metal is enough of an achievement that people loudly brag when they get it to just barely work. If you get it to work in C then nobody is surprised, because C is engineered for systems. Other languages aren't.

> convincing linus and sysadmin greybeards to modernize linux and is no small task, until then we'll all still be just be scripting over archaic C apis

I'm not sure replacing C with a worse language constitutes "modernizing".


If there were better options, you and others would use them. You don't use anything else so C is the best you've got.

Most of your last paragraph only amplifies what I just said. If there was another language that bested C, people would use it, but they don't.


I also think that the committee is out of touch. C99 was an awesome improvement on the language, and since then it went downhill. We don't need the extra weird C11 syntax things or the duplication of existing libraries; we want tools that scope C better, or extensions that have been proven helpful and stable (one such example is the switch(x) { case A...B: } range syntax from gcc!).

I want strict boundary checking, I want an array base type that can't be cast to a pointer. I want some sort of scoping mechanism (i.e., blocks), I want a bit of standardisation of memory barriers and such. I want #pragma once FFS -- it has been proven a good idea for 25 years.

Basically there's tons of stuff that could help make the language better -- C99 did that; C99 is a masterpiece, for example, in how you can statically initialise extremely complex data into a single block, without having to use code. It's used all over the Linux kernel (amongst other things; for example my own simavr is heavily based on that feature [0]).
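
For anyone who hasn't used it, a tiny made-up illustration of that designated-initializer style (not actual simavr code):

  struct opcode {
      const char *name;
      int cycles;
  };

  static const struct opcode opcodes[256] = {
      [0x00] = { .name = "nop", .cycles = 1 },
      [0x0c] = { .name = "add", .cycles = 1 },
      [0x95] = { .name = "ret", .cycles = 4 },
      /* every entry not listed is zero-initialized */
  };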

* Standardise the stupid bit order in bitfield declaration FFS. I've been wanting to use that feature for 30 years and I can't because they 'forgot' to make up their mind!

* Coroutine standardisation would be awesome (stack swap primitives, with boundary checks etc)....

* gcc 'sub functions' (or a derivative) would be awesome if improved to make them safe.

* Reference counted allocator (basically, get libtalloc and roll it in [1])

There are so many things that could be improved, without diverging into weird stuff nobody needs (complex math anyone??!?!).

* In fact I want SIMD. I don't need these complex types.

[0]: https://github.com/buserror/simavr/blob/master/simavr/cores/...

[1]: https://talloc.samba.org/talloc/doc/html/index.html


I can’t say much for most of your wish list, but I believe C11’s thread model/atomic operations include a standard memory barrier ( http://en.cppreference.com/w/c/atomic/atomic_thread_fence ).
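
For reference, a minimal sketch of those C11 fences in use (names are made up):

  #include <stdatomic.h>

  static int payload;
  static atomic_int ready;

  void producer(void) {
      payload = 42;                                      /* plain store        */
      atomic_thread_fence(memory_order_release);         /* standard barrier   */
      atomic_store_explicit(&ready, 1, memory_order_relaxed);
  }

  int consumer(void) {
      if (atomic_load_explicit(&ready, memory_order_relaxed)) {
          atomic_thread_fence(memory_order_acquire);     /* pairs with release */
          return payload;
      }
      return -1;
  }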


Honest question. You keep expecting all of these to be readily available for you in C (as part of the standard). Why don't you, instead, just use a language/ecosystem which already offers all (or most) of these for you today? (e.g. D/Go/Rust/Nim)

> "* Reference counted allocator (basically, get libtalloc and roll it in [1])"

Why and how should that be standardised exactly? Memory allocation is platform-dependent, hardware-dependent and generally case-specific. malloc() and free() are the lowest common denominators the standard can assume, anything beyond that is simply restrictive. If you need a "reference counted allocator", why not just find/implement one that simply suits your needs?

> "* Coroutine standardisation would be awesome (stack swap primitives, with boundary checks etc)...."

Again, what makes you think this can be standardised across the infinite span of platforms and compilation-targets, where C is often used?

> "* In fact I want SIMD. I don't need these complex types."

I'm not following your point. You're simply asking for a better abstraction for SIMD. Also, as I'm sure you're well aware, SIMD is not available everywhere. Wherever available, you have clear instruction-set APIs/ABIs you need to follow to make it work. What else is missing?


I don't see your point. There's tons of stuff in C11, for example, that is not applicable to the vast majority of places where C is used. Even in C99, basic stuff such as floating point or malloc is not available on much hardware; that doesn't stop there being a standard way of using them /when applicable/.

I know there are traps to fall into -- when I see people writing floating point code on an 8 bit AVR, I cringe, but well, 'it works'.

As far as changing language goes, you just answered your own question by mentioning 4 of the myriad of them that aren't ported to as many platforms as C, require runtimes of unknown quality, and also require a body of developers that... doesn't exist.

I've had a long enough time in the industry to have seen quite a few times a whole bunch of software done by someone who was following the fancy trendy language of the day, and required a complete rewrite in... C to be able to move on from it.

Heck, I've done similarly as well: done 20+ years of C++, gradually trying to scope down the subset of what I was using, to then realize I just might be better off with plain C -- and magic happened -- stuff still compiles/works years after it was made... And anyone/everyone can just dive in and use the codebase.


Looking at C2X list, I bet you aren't getting any of those wishes.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/PreBrno2018....

I am with you on strict boundary checking and array base types.


> weird stuff nobody needs (complex math anyone??!?!).

in my work (signal processing) I use complex math in C every day, and native language support for complex numbers is the single most important thing in my choice of C over other languages
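
For those who haven't touched it, a tiny sketch of what that native support looks like:

  #include <complex.h>
  #include <stdio.h>

  int main(void) {
      double complex w = cexp(I * 3.141592653589793 / 4.0);  /* e^(i*pi/4)                */
      double complex z = (1.0 + 2.0 * I) * w;                /* rotate 1+2i by 45 degrees */
      printf("%f %+fi\n", creal(z), cimag(z));
      return 0;
  }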

> I want an array base type that can't be cast as a pointer

just out of curiosity? why would you like that?

I mostly agree with the spirit of your comments (apart from neglecting the importance of complex math). I would add the following:

* better stack support (e.g. a standard way know whether a VLA fits)

* closures (being able to return a pointer to a local function)


> in my work (signal processing) I use complex math in C every day, and native language support for complex numbers is the single most important thing in my choice of C over other languages

Fair enough -- but I think you are in a fairly corner market -- personally I think I used complex math /twice/ over the last 30+ years... I used SIMD extensively tho, from Altivec onward!

>> I want an array base type that can't be cast as a pointer

> just out of curiosity? why would you like that?

For bounds checking. In C right now if you declare a char blob[4] and pass it as a parameter to anything, it's mostly passed down as a char * -- and that function can clobber whatever it likes. An array type would propagate the size down so bounds checking could be done for the whole lifetime of the array.
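
A quick sketch of the decay problem, and the closest thing C99 offers today, which still isn't enforced (the function names are made up):

  #include <stddef.h>

  void clobber(char *p);                     /* blob decays to char *, its size is lost     */

  void use(void) {
      char blob[4];
      clobber(blob);                         /* callee can write past 4 bytes, unchecked    */
  }

  void safer(size_t n, char buf[static n]);  /* C99: size is stated, but still not enforced */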

> * better stack support (e.g. a standard way know whether a VLA fits)

Yes, anything involving stack requires serious dirty hacking -- even for basic things as 'what are my high/low water marks'. Basically your stack is just a problem waiting to explode in your face, especially on smaller devices.

> * closures (being able to return a pointer to a local function)

That's mostly what I meant by sub-functions -- I used them quite a bit, until I had to crossport to llvm (they refused to implement them). Apple "blocks" look pretty good, but I know there are a couple alternative implementations...


> Fair enough -- but I think you are in a fairly corner market -- personally I think I used complex math /twice/ over the last 30+ years... I used SIMD extensively tho, from Altivec onward!

One of the best ways to implement SIMD support for C would be to add native quaternion and octonion support to the language!


For addition and subtraction, that sounds like one of the most evil-assembly-hack tricks I've heard for a high (-er than assembly) level language, but how would you do ... pretty much everything but addition and subtraction?


there are a lot of operators in the C language that could be overloaded as SIMD operations when acting over quaternionic and octonionic values. This would be useful for the most common operations. For the rest, there would be functions with well-thought-out names (not the current avx monstrosities), defined in #include <octonion.h>


I found the title a little misleading: there isn’t much about where the standard is heading or even where it’s been.

I would vote for the title “a rant on undefined behavior in C.”

——

Simple example: the article complains that unsigned integer overflow is defined in C while signed integer overflow is not. There is very little in the article about this except for the claim that the performance for incrementing a signed int should match the performance for incrementing an unsigned int. The writer refuses to believe otherwise, even though he accepts that undefined behavior “supposedly” allows the compiler to omit overflow checks.

It’s the “supposedly” that makes this a rant. The article’s sources mention that Clang does omit overflow checks and that the Clang team believes this makes loops up to 20% faster (“up to” because the optimization can’t be applied to all loops, and the performance increase will depend on how tight the loop is, i.e., how much overhead there is in incrementing and testing the loop variable in comparison to the loop body).


(Technically, the problem in that case is not really an overflow check, but more typically the need to extend an index from 32 to 64 bits, because 64-bit instruction sets do not support indexing with a 32-bit index.)

If you want perf in a critical tight loop that has been identified by some profiling, you can easily optimize it yourself (and yes, typically bumping the counter type to size_t / ptrdiff_t is enough). The advantage is that you can actually check that this transformation is sound according to the intent of the original code, whereas the compiler doesn't even try to check that itself; it merely blindly assumes that there is absolutely no UB ever, and to hell with it if there actually was.
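
A rough sketch of that manual fix (names are made up):

  #include <stddef.h>

  void scale(double *dst, const double *src, size_t n) {
      for (size_t i = 0; i < n; i++) {   /* pointer-sized counter: no per-iteration sign extension needed */
          dst[i] = 2.0 * src[i];
      }
  }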

But anyway, we have since invented sane languages in which we have BOTH safety and the capability to apply the kind of transformations that have a small but positive perf impact. In some cases it is actually far easier to optimize from those safe languages than from mostly unchecked ones whose culture is full of type punning and other kinds of insanity (like C is), and this is not even a recent discovery: there is a reason number crunching stuck to Fortran. So C should be considered something of a legacy programming language at this point. A very important one for historical reasons, but one should think twice before writing new critical infrastructure with it...


> the article complains that unsigned integer overflow is defined in C while signed integer overflow is not.

The article complains that signed integer overflow is undefined, while unsigned integer overflow is not. There's a difference.

> the Clang team believes this makes loops up to 20% faster

I'd love to see the benchmarks they conducted. Anyway, undefined behaviour was the wrong way to solve the problem. The C language should have grown a `for` loop that's more than a thin veneer of syntax sugar over `while`. Seriously, how hard would the following be?

  for (int i = start; end) {
      // body
  }
No explicit comparison, evaluate `start` and `end` at the start of the loop, maybe cast them to the type of i, and increment implicitly. While loops can handle the weird cases. There you have your 20%.


> > the article complains that unsigned integer overflow is defined in C while signed integer overflow is not.

> The article complains that signed integer overflow is undefined, while unsigned integer overflow is not. There's a difference.

You are correct.

> > the Clang team believes this makes loops up to 20% faster

> I'd love to see the benchmarks they conducted.

It wouldn’t surprise me if they have micro benchmarks. I convinced myself they were telling the truth based on crude instruction counting: a loop must be converted into something like:

- loop body (assume it contains at least one machine instruction)

- increment loop variable

- (for unsigned): clear overflow flag

- test loop variable, exit loop if appropriate

- goto beginning of loop

I believe they don’t have to actually check the overflow flag, it’s OK to let the overflow happen. But they do have to clear the flag to avoid a spurious error if the flag gets looked at later.

I’m no expert, so it’s possible this oversimplifies things, but removing the instruction to clear the flag does remove a big chunk of this loop. But it’s a big chunk only because it’s a tight loop.

For the record, I happily use the foreach loop constructs in C++, D, Java, C#, Python, Perl, etc. but I originally avoided them until I saw a comment by Walter Bright that there is no performance penalty in D (the compiler rewrites the loop appropriately; there may be a penalty in Java because the feature might be defined in terms of their relatively heavy iterators).


Microbenchmarks can be very misleading compared to real impact in real programs. Still, the gains allowed by UB of signed overflow (when you are lucky enough that the transformation is actually correct in the context of what the original programmer had in mind...) are positive and probably measurable even in real programs; or if hardly measurable, maybe they at least permit a few percent of whole-system perf improvement when using SMT processors. But they are more suited to other programming languages than C, and actually yes, in C++ (and probably in most languages at this point) it is better both for code readability (most important!) and performance (nice to have, but very secondary compared to code readability) to use for-each constructs rather than maintaining an index yourself.

Technically there is no overflow flag to reset; it is just that some CPU instruction sets do not support indexing with a 32-bit register when using 64-bit addressing, so you have to insert an extra sign-extend instruction if you want to support 2's-complement signed overflow on 32-bit indexes. So you typically already don't have any cost if your indexes are already size_t/ptrdiff_t, but ptrdiff_t signed overflow is still UB according to the C standard, which is also a shame, because it allows for far less interesting "optimizations" at this point (maybe a + w >= a --> true if w is positive, but that's actually typically dangerous, because that was historically what was used to check for overflow at source level, and now the compiler is suppressing all the checks!)

So all of that really is only a trade-off, and in the modern age (with e.g. a security picture that is kind of worrying) some people argue that it was a terrible idea to use this approach so carelessly. Most experts now think that no non-trivial codebase exists without potential UB in it, so it is not just rants all around; some are even working on a mathematical model of the LLVM optimizer to make it actually sound (for now, it seems that it is not, even internally -- so unfortunately, with this approach to optimisation, there is for now no mathematical justification for why the optimizations performed are actually correct even under the hypothesis of strict conformance to the C standard, and I let you imagine what happens in practice when almost no program is actually conforming...)


If there are microbenchmarks, I didn’t write them. And I’ll acknowledge that my instruction-counting approach has limits, especially since I don’t really know the details of the platform. And my approach also doesn’t account for pipelining.

But I would expect someone complaining about this optimization to do more than simply hand wave with a “supposedly.” They could instead say that the optimization can be applied when the compiler can prove x < x + 1, which it can show when both the beginning and end of the loop are known at compile time. In fact, I think it’s better to say “omit the pessimization that applies when the compiler has to allow for overflow.”

But going no farther than labeling it a “supposed optimization” turns the complaint into a standard rant.


I've done benchmarks and see no performance improvement with a small loop body. If there is a minor improvement on a trivial loop - that's not much gain.

Try it yourself.


I forgot to do this last night. I think the compiler is able to apply the optimization in many cases even if the loop variable is unsigned, so it’s more accurate to say that a pessimization is added when the compiler has to account for overflow.

If the rant had said the pessimization wasn’t all that common, or that a decent programmer should ask “how can I go through this loop fewer times?” before asking “how can I speed up each time through the loop?” I wouldn’t consider it a rant. As it is, the writer acknowledges that the other side has an argument, and then he dismisses it with a hand wave and a “supposedly.”


So "rant" is an unnecessary pejorative. Try to make your case without yelling. And I don't get your point. You have just agreed that the compiler can optimize these loops without UB. The pessimization is minor and programmer choice. So what is the argument on the other side ? How do you justify a code transformation that, e.g. produces an infinite loop by removing a programmer check for a condition that actually can occur?


If I claim somebody is ranting and raving, I’m not claiming anything about whether I’m yelling, but whether they are.

Of course, “rant” has more than one meaning. I did not mean “the writer should be held in a mental institution as a threat to themself or others,” or even “writer should feel embarrassed and ashamed for publishing this.” I meant “article makes statements that generally aren’t supported by valid arguments.”

I then gave an example that (1) was not the only case of a supporting argument not actually supporting the claim, and (2) perhaps wasn’t completely fair. But first, a story:

Recently, one of my sons had diarrhea. He complained that his sister made him sick, since she had just recovered from a cold. He never understood why we rejected that argument (yes, people with colds can give other people colds, but he didn’t have a cold), and then told him he didn’t have anything to support his claim. As far as he was concerned, he had made something that looked like a valid argument: his sister had a cold, now he was sick, q.e.d.

So when I said that the statement "'supposed' optimization" isn't really an argument, but a statement that "there are people who disagree with me, but they're wrong" (which isn't a supporting argument; if anything, it's a placeholder for one), I forgot that the essay does advance one reason to reject the "'supposed' optimization." I rejected that reason (whether it's slower to increment a value that has defined overflow than one that does not isn't a question of whether the two values are the same size), and then said the statement had no support.

It is true that at this point, I can provide other reasons to reject the compiler writers’ argument, but then the whole exercise feels like a psychic reading (writer: “I feel like blue will be important for my next argument, can you tell me why?” reader: “wow, that’s uncanny; blue is important ...”).


I don’t think that saying “the article is a rant because it doesn’t make a serious effort to respond to counterarguments” qualifies as yelling.

As for my point: in my opinion, the title promised something interesting, but the article didn’t deliver. It acknowledged that the standard committee and compiler writers disagree with its premise, but then simply asserts that the committee and compiler writers are wrong. I put it at DH3 ( http://www.paulgraham.com/disagree.html ).



You are invited to bring forward concrete proposals for changes to the C standard in the relevant committee.

http://www.open-std.org/jtc1/sc22/wg14/

Since UB allows the compiler to do anything in those situations, we can reduce the amount of UB without breaking existing “legal” code. By actually defining more behavior.

Whether a proposal is reasonable needs to be discussed in the standardization committee. All other discussions are a nice hobby. But eventually moot.


> You are invited to bring forward concrete proposals for changes to the C standard in the relevant committee.

My proposal: make the future versions of the C standard freely available (not only drafts).


Those kind of discussions have various effects, some of which I believe to be far from moot.

First, they let some people even take notice of this situation. Few developers read the standard, and even fewer write it, follow the discussions to change it (are those even open?), or write a compiler for it. The rationales are not even tracked [1]. It would actually be insanely hard to get a good understanding of these subjects by, e.g., just reading the standard, without having this kind of discussion on forums typically used by more devs than just a few dozen compiler writers...

[1]: but while I'm thinking about it, an impressive independent book has been written by Derek Jones: The New C Standard: An Economic and Cultural Commentary http://www.knosof.co.uk/cbook/cbook.html



If desired, I can comment your document over a private channel. In that case, please give a short heads-up when ready.

Note that I am not directly affiliated with WG14. So take my comments with a grain of salt.


i think you should be able to make private comments on that document. Otherwise victor.yodaiken@gmail.com


I will.


My understanding is that a lot of the undefinedness in the original C89 standard came from a desire to support non-2's-complement machines like Burroughs. Of course, in hindsight that was a total mistake. Yes, it would be a great idea to come up with a modern update to the C standard that is specified at, let's say, the level of Java.


There’s also the fact that registers may be larger than the numbers stored in them.

The best known example is that of the 80-bit floats of the 8087 FPU, but that’s relatively rare compared to loading integers into larger registers.

For example, if you compile

   if( foo + 10 > bar) goto baz
to assembly similar to

   move foo to R1
   move bar to R2
   add 10 to R1
   subtract R2 from R1
   branch-if-greater-than-zero baz
and foo and bar are 32-bits, that add can overflow if R1 and R2 also are 32-bits, but never overflows if R1 and R2 are 64-bit. That changes the result of the comparison if, for example

   foo = INT_MAX
   bar = INT_MAX
The designers of a CPU with 64-bit registers may not want to add variants of add (and subtract, multiply, etc.) that work on 32 bits (and 16 bits, and 8 bits).

They also won’t want to have their C compiler slowing down this kind of code, only because other (often relatively old and slow) CPUs exist.


But that's trivially solvable (on x86 at least). You do your arithmetic in rax and then compare the result in eax. You've got the "mask the high 32 bits" operation for free. At worst you'll need to do one bitwise and before using the value for cases where the high bits being poisoned matters. Or let me add annotations to arithmetic expressions that "this expression will definitely not need expensive cleaning up afterwards, promise", which I will need to use in maybe a dozen super-hot loops in the whole codebase. FWIW, we compile all our code with -fwrapv and don't notice any slowdown.


Of course it's trivially solvable in x86-64 because it has a full complement of 32-bit operations inherited from its x86 lineage. The GP is presumably talking about _other_ architectures when he mentions designs that might not want to add a full complement of 32-bit operations.

In any case, the problem doesn't really apply to x86 because all x86 compilers I'm aware of use the "expected" size for the various types rather than larger-than-needed-for-speed, exactly because the smaller operations are all generally available.


”But that's trivially solvable (on x86 at least)”

Even if that’s true (I don’t know enough x86 assembly to judge, but one thing I can think of is that it may require extra register-to-register moves), the fact that it isn’t an issue for one architecture won’t help a committee discussing the C standard.

Also, others in this thread describe what you call “let me annotate” as “forking C”.


As someone who was on X3J11 at the time C89 was ratified, I can tell you that supporting 1's-complement architectures was necessary. The standard would not have been approved if 2's complement had been mandated. Standards are all about compromise, even when you know that that will cause trouble later.


> I can tell you that supporting 1's-complement architectures was necessary

Can you elaborate on that point? Why was it necessary then, but not still necessary now?


Sorry, I didn't mean to imply that there are no 1's-complement systems still around. The comment I was replying to suggested it was a mistake to support 1's-complement architectures because that was the source of some of the Undefined Behavior (UB) that so troubles C. My point was that not supporting 1's complement wasn't an option then (and probably is not one now). After all, without 1's-complement support, how could I write new C apps for the Apollo Guidance Computer?

I would like to give an example of how supporting both 1's and 2's complement is the source of a specific UB, but I can't take the time to do that right now, regrettably. Similarly, supporting both Big Endian and Little Endian was necessary. As was supporting ASCII, EBCDIC, and probably Fieldata and five-level Baudot (looking at you, trigraphs). All of this generality made it hard to say anything useful in some areas, so you end up in some cases just calling it Undefined, since there was no consensus on how it should be defined.


> All of this generality made it hard to say anything useful in some areas, so you end up in some cases just calling it Undefined, since there was no consensus on how it should be defined.

This is a very valid justification of implementation defined. Maybe even unspecified. But undefined? How could you justify that?

Even if some platform does not support something at all, and goes bananas whenever you do it (say, signed overflow that traps), what would have stopped the standard from saying that whether the behavior is defined is platform dependent?

Perhaps the committee didn't anticipate how unreasonable compiler writers turned out to be?


I agree that you want as few UBs as possible. I found the text below from the Rationale useful for understanding these issues. An important feature of UB is that it doesn't require the compiler to catch certain things that can be very hard to catch, such as dereferencing NULL pointers. As noted, it also provides opportunities for enhancing the language in ways that don't break conforming programs. An implementation is allowed to treat Undefined Behavior as Implementation Defined, simply by saying what the implementation does in such cases. But it's not required to say what it does.

----

The terms unspecified behavior, undefined behavior, and implementation-defined behavior are used to categorize the result of writing programs whose properties the Standard does not, or cannot, completely describe. The goal of adopting this categorization is to allow a certain variety among implementations which permits quality of implementation to be an active force in the marketplace as well as to allow certain popular extensions, without removing the cachet of conformance to the Standard. Appendix F to the Standard catalogs those behaviors which fall into one of these three categories.

Unspecified behavior gives the implementor some latitude in translating programs. This latitude does not extend as far as failing to translate the program.

Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior.

Implementation-defined behavior gives an implementor the freedom to choose the appropriate approach, but requires that this choice be explained to the user. Behaviors designated as implementation-defined are generally those in which a user could make meaningful coding decisions based on the implementation definition. Implementors should bear in mind this criterion when deciding how extensive an implementation definition ought to be. As with unspecified behavior, simply failing to translate the source containing the implementation-defined behavior is not an adequate response.


If I remember correctly, one of the things that's likely changing in the next version of the C standard is that arithmetic will be required to be 2's complement. There just aren't enough machines around which use anything else anymore.


I suppose you both can be right. It could have been required to support 1's-complement systems at the time, but also an unfortunate mistake in retrospect.


When issues like this come up, I think the right outcome is to support the popular choice as the default, and support the unpopular choice, but with a penalty.

For example, forcing extra code to be inserted, that is, a check (or even a call out to a library), so one unified semantic is followed, and there is no undefined behaviour. Of course, you wouldn't mandate how the semantics are followed.

The problem is, it was very hard for us to accept this kind of solution when we grew up feeling like every cycle counted, so that's what I blame for a lot of the UB we see today in the standard.


Could have been (should, in some -most?- cases) implementation defined.


It's really annoying

It seems the committee is more interested in adding optimization gotchas (that nobody cares about for the most part) while making it impossible to write correct code without relying on convoluted code (which denies the benefit of "extra optimization") and "UBing everything"

C is broken, let's face it

- Builtin strings are a joke. stdlib functions are even worse.

- It was not developed with modern systems in mind.

- DIY memory management: malloc/free is like giving a kid a chainsaw to play with. Sure, every C programmer has had to work with this crap. But yeah, please complain about how my static void * should really be a volatile char *. Idiot. (Of course it's not wrong to complain, but that's like complaining about a faulty blinker in a car without brakes.) And of course the compiler is going to ignore the 'volatile' part of the pointer because f. you and item 7c of paragraph 3 of the spec lets us do that, even though it is blatantly stupid to do so.

Rust is a step in the right direction.

Making string and (memory) slices a fundamental part of the language helps. You can have null-terminated memory pointers for interoperability, but having it as a fundamental construct eliminates several problems. Also greenlets/threads/multiprocesses.


C was broken from the beginning, if you go search for what compiler researchers from the Algol school of languages had to say about it during the early 80's.

"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue....

Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels?

Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."

-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming

However, C and UNIX are symbiotic, so there is no way we can get rid of C, while keeping POSIX based OSes around us.

So, it will stay around for many decades until quantum computers or something else eventually takes off.


>C is broken, let's face it

Face what? Do you think any of this C criticism is novel or all that contentious (beyond the falling-sky attitude) in 2018?


I have the feeling that only a language fork from the GCC/Clang teams could have any chance of success. It looks like the C standard committee is not interested in changing its point of view.


GCC and Clang have already essentially forked the C language. Perhaps the most important C project—the Linux kernel—is written in the GNU dialect of C.


Although Google has managed to compile it with clang, as they removed gcc from Android.


Right, that's the point—a useful C compiler nowadays needs to be a GNU C compiler, not a standard C compiler.


To reinforce your point that is also how Microsoft decided to go on Azure Sphere.

https://www.mediatek.com/products/azureSphere/mt3620


And that was a project that took years: https://lwn.net/Articles/734071/


And is still not finished. The Linux kernel now requires the "asm goto" extension, which clang doesn't have (see https://lwn.net/Articles/748074/ and https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...).

If you check the GCC version defines (__GNUC__ and friends), you'll see that clang claims to be compatible with GCC 4.2.1, while "asm goto" was introduced later, with GCC 4.5.

The tracking bug for compiling the Linux kernel with clang is https://bugs.llvm.org/show_bug.cgi?id=4068, in case you want to follow their progress.


It is finished on Android's source tree.

https://source.android.com/setup/build/requirements

As of Android 8, only clang is used.


Woot, next up is assemble and link without binutils!


It's like the word "standard" has a meaning, i.e. "not changing". Not even for good reasons? Well, "good" is a subjective judgment. Objectively, only changes that maintain the standard would qualify, which is a contradiction. I guess adding to the standard library wouldn't be a problem, but I suspect you are rather leaning on security concerns and all that jazz that comes up when Rust is discussed. C is as open as can be, but you can't tack on a change of semantics without changing the language. Then what do you still need C for?


I want C to not grow any more. The problem with having groups to oversee things is that they usually feel the need to change or grow the thing they're supposed to be watching. Sometimes this is good, but often it's not. C is the foundation of a lot of stuff, and it'd be better to rewrite that stuff in a better language than to try and turn C into something it's not - that's what C++ was for.


I'm not too worried about C growing. It seems well and truly stuck in the 70s.

All the growth is from people doing whatever they want on top of C.


Well, you're in luck. A lot of people don't care to use anything beyond C89.


Oh please. I've been using C99 for almost 5 years now.


I would never go back. And the amount of campaigning I did at my last job to use C++11 was because all the new language features to allow compile-time checking helped us not do dumb things at run-time.

Except for the stupid TI compiler, which is still stuck on C++03.


Or C++98, depending on the chip, if some of their docs are still up to date. :\


I'm one of those. If maximum portability is desired C89 is still king.


Standard C grows at a rate of 1nm per decade... but in any case, I'm confident that your compiler will always let you select the version you like the most.


But it doesn't grow anymore; the C ecosystem is out of anyone's control. The committee is as irrelevant as it can be.


Not when you need to use certified compilers.


I can't be the only one that 99.9% of the time doesn't care _one iota_ about these mythical optimizations that compilers can introduce by exploiting undefined behavior. I just want to write straightforward, predictable code that tries its best to be safe but for one reason or another have to stick with C.

Is there a guide/reference on how to disable these optimizations in modern compilers? A list of GCC/Clang arguments that disable as much of this as possible would be greatly appreciated. I've seen a lot of posts and articles discussing C undefined behavior but almost nothing describing how to counter it.


Sure. Compile at -O0.

A huge number of seemingly-trivial optimizations depend on assuming that undefined behavior will never happen. The number of optimizations that don't depend on UB in any way is quite small. For example, if you want to get pedantic about it, even automatic promotion of local variables to registers is exploiting undefined behavior—who's to say you didn't have a pointer that just happened to point to one of them?


> Sure. Compile at -O0.

Hmm, nope. Sure, it will save you from much UB. But GCC and Clang have idiotic front ends. On purpose. They subscribe to the idea that the optimizer phase should clean up the front end mess. They're not entirely wrong, since it does simplify the front end, and does not really make the rest any more complex. It does make it slower, though.

Long story short, those compilers don't generate reasonable code by default. You need at least -O1.


I was recently surprised to find that GCC, at -O0, does some level of optimization. At least it does for multiplications. https://godbolt.org/g/ekTkBm

Edit: the irony is that if you do compile with optimizations, it does a different optimization.


What do you mean by "reasonable"? Can you give an example of code that is compiled in a more obvious, straightforward way with -O1 than -O0?


My knowledge of this is only second hand, so I don't have any specific example. I believe Chandler Carruth explains this well in one of his LLVM talks (that is, how and why the front end emits crappy intermediate code, before the optimiser kicks in).


Of course you can have a pointer to a local variable, and that variable can still spend portions of its lifetime in a register.

What won't necessarily work as expected is a shadily obtained pointer to a local variable, like by displacing &i to try to point to the same location as &j.

If &j doesn't occur anywhere (or does occur, but in a way that is optimized away), j may not even have a location.
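
A tiny sketch of the kind of "shadily obtained" pointer meant here (my own example, not from the thread):

  #include <stdio.h>

  int main(void) {
      int i = 0, j = 0;
      int *p = &i + 1;    /* guessing that j sits right after i on the stack */
      *p = 42;            /* UB: j may live in a register, or have no address at all */
      printf("%d\n", j);  /* the compiler is free to print 0 here */
      return 0;
  }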


How do you reflect the fact that pulling a pointer to a value "out of thin air" may not make any sense without either banning it entirely or defining some step of the process to be undefined behavior?


Most languages have no concept of undefined behavior and they still have compilers and runtimes that can do a huge amount of optimizations.


Because they define-away the scenarios where UB is an issue for most code. Sometimes they do this by having a GC, but Rust and Swift prove that isn't necessary.

Such languages typically do provide some form of "no promises, here there be dragons" in the form of unsafe blocks or functions. Restricting this region of unsafety is a huge benefit for programmers and compilers. Programmers because relatively few pieces of code need to be carefully reviewed. Compilers because most of the program contains little or no UB.

As a typical example, in C you can alias and type pun. Therefore to avoid all UB the compiler would need to be extremely careful in any function containing a pointer. You could return a reference to a local through a separate opaque function call, or receive multiple pointers all aliasing the same memory. To completely avoid all UB means inserting checks after every potentially mutable machine instruction, or emitting duplicate function bodies that take different paths when the pointers alias; that's assuming the compiler even has enough type information to answer the question!
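
A small sketch of what that means in practice (my own example): without a rule letting it assume the two pointers refer to different objects, the compiler must reload the count on every iteration, because any store through dst might have changed it. C's type-based aliasing rule makes that assumption legal precisely by making the aliasing store UB.

  /* hypothetical example: may dst and cnt refer to the same memory? */
  void scale(float *dst, const int *cnt) {
      for (int i = 0; i < *cnt; i++)  /* must *cnt be reloaded every time around? */
          dst[i] *= 2.0f;             /* a store through dst could alias *cnt */
  }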

With typical C#, Rust, Swift, etc you just can't cause those kinds of problems without deliberate subterfuge and use of Unsafe types or blocks.


Note that `unsafe` code blocks (or having some unsafe primitives) fundamentally result in some kind of UB in the language as a whole, and you totally can create tons of problems from them. The rest of the language has invariants it can't itself violate, but the unsafe code can (potentially much more easily than C, if there are more invariants) - that is the "UB", and the invariants that can be violated are what the compiler optimizes based on.

Lest we forget, any language with a C FFI capability must have some notion of UB, because the FFI effectively includes the C code in its own semantics (unless fully sandboxed, which may be too expensive to be done).


True, but they are quite easy to track down.

In systems like Unisys ClearPath, you can even configure the system such that only admins can execute applications with unsafe blocks.

In C any line of code, if care is not taken to use the correct compiler flags, can be a possible source of UB.


Indeed. Only wanted to point out that their optimizations also rely on UB, so they're bad examples for specifically "optimizing without relying on UB".


> In C any line of code, if care is not taken to use the correct compiler flags, can be a possible source of UB.

What do you mean?


You need to turn on all warnings as errors, static analysers and pedantic modes, depending on how each compiler exposes them.

ANSI C11 has 200 documented cases of UB, and each compiler might have additional cases, are you sure you can know all of them by heart while looking at a random line of C code?


Languages like Ada and Modula-2 were already proving it long time ago.

Or to go even further in the past, NEWP, still alive on Unisys ClearPath mainframes.


The languages that you describe must look nothing like C (other than, ironically, syntax), and must have no untyped direct memory access pointer feature at all, which usually means they rely on a GC instead for memory safety.


We can easily imagine a language that is exactly like C in every regard, but free of some gratuitous behaviors. All standard-conforming C programs work in this dialect, and a good many others also work and are portable.

For instance, we could have a dialect of C in which this is required to print "0123":

   int i = 0;
   printf("%d%d%d%d\n", i++, i++, i++, i++);
A behavior is gratuitously undefined if it is left that way for no good reason, such that programming language constructs can be undefined simply for having the wrong form. That is to say, the input values are well-defined (i has a good initial value, which we can increment four times), and the individual operations are defined also (i++ is fine by itself). But for no possible value of i is the above printf call correct.

An example of undefined behavior which is not gratuitous is overflow on integer addition. The expression i + j, where i and j are int, is not ipso facto undefined because of its form. Only for certain combinations of values of its operands is it undefined. We cannot simply banish that without banishing addition, and various ways of making overflow defined have drawbacks, like being expensive (e.g. the target machine has no native support for the particular behavior, so extra instructions have to be generated) or super-expensive, with complicated representation and memory management (switching to bignums).


It's interesting to see which things Rust has defined where C refused to, such as fixing a strict evaluation order (mostly post-order on the AST), requiring signed integers to be 2's complement or masking shift amounts (`x << n` being `x << (n % bitwidth)`).

Where does C actually gain anything nowadays? Signed integer UB is mostly useful for optimizing misuses of `int` for unsigned values (https://news.ycombinator.com/item?id=17191295), and is evaluation order even relevant anymore? (I don't think clang can pass that information down to LLVM, at all)

The only example I gave which has known drawbacks is the shift one, where modern platforms differ in the behavior, and LLVM will only optimize out the masking of the shift amount on the platforms that have that same behavior in their shift instructions (I think x86, but not ARM).
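
In C terms, the masking rule being described would look something like this (my own sketch, not from the thread):

  #include <stdint.h>

  /* shifting by >= the bit width is UB in C; masking the count makes it defined */
  uint32_t shl32(uint32_t x, unsigned n) {
      return x << (n % 32);   /* on x86 this matches what the shift instruction
                                 already does, so the mask usually costs nothing there */
  }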

However, even if you make all of these changes to C, there's still a lot of UB left, in the form of memory accesses, which is much harder to get rid of (see the other comments, some of which mention Rust as well).


> I don't think clang can pass that information down to LLVM, at all

I.e. you mean that the compiler front end has to choose an order without knowing which order will be good for LLVM, and there is no information to say "I don't actually need this specific order; pick another one that is better".


I can't think of a case in which changing the evaluation order between sequence points in a way that LLVM can't prove is safe on its own actually improves codegen.


Wasn't the ordering thing in C for stack push order in calling conventions? At least that's one theory I've heard.


Unspecified evaluation order helps dumb compilers produce better code. Whenever a particular evaluation order is convenient in order to fit some canned code generation pattern, like a function call or whatever, that desired evaluation order can just be blindly used, rather than following a forced evaluation order (using temporaries to hold the results).

Compilers for languages with strict evaluation order can still perturb evaluation order when it is safe to do so: like when expressions do not have side effects, or have effects that don't mutually interfere and are not external. They can thus avoid or eliminate the temporaries.

C is still being specified like it's 1982 and compilers have to run in a few kilobytes of RAM, in a single pass, and go straight from source to target machine code.


> and must have no untyped direct memory access pointer feature at all

Can you expand on why you feel this must be the case?

I'm thinking, for example, that dereference of a pointer could be defined to be equivalent to a platform-specific memory load instruction. It wouldn't be memory safe and would segfault like C, but it still wouldn't bring up nasal demons like C UB.


If every source-language load must result in a memory load, then you at the very least killed scalar replacement of aggregates (a critical optimization), and, depending on how you define it, you might have killed register allocation too.


You get register allocation back if you deny taking the address of variables declared with the `register` keyword - and we've gone full circle!


Taking the addresses of variables defined register is denied; that's the only modern meaning of the specifier.


Give me the rule "every source language load of a structure member, array or global variable, or anything loaded indirectly through a pointer results in an actual load" and I will still crank out fast code, by explicitly using local variables for caching the values coming from those places.


You'll also write unmaintainable code. To a large degree, optimizations exist to allow programmers to write maintainable code without sacrificing performance.


I will write absolutely clear code this way. I use that style anyway, quite often.

Optimization can work backwards in this regard.

For instance, let's consider CSE (common subexpression elimination). That allows some repetitive code to produce the same results as code that adheres to DRY.

E.g. stupid way to insert into a circular list:

   node->prev = prev;
   node->next = prev->next;
   prev->next->prev = node;
   prev->next = node;
The compiler has no idea whether node and prev are aliased or not. The assignment to node->next might be the same as prev->next, and so prev->next has to be reloaded even though it was just loaded in the previous line.

smart way: get a local variable for prev->next!

   node *next = prev->next;

   node->prev = prev;
   node->next = next;
   next->prev = node;
   prev->next = node;
Bonus: much more readable, lining up in neat columns. Just one arrow in each line. If this doesn't generate at most one load and four stores, the compiler is garbage.

Also note that we can reorder these four assignments in any of the 4! permutations and they produce the same result (unless there is aliasing, which would be unintentional and wrong regardless of the order).

Not the case in the original. For instance, it's important that prev->next is stored in node->next before the assignment to prev->next.

You can shoot yourself in the foot with local variables though. Caching is susceptible to staleness. You have to know when it is legitimate to keep using the cached value and when it must be reloaded or disused.

After our assignment prev->next = node, the next variable no longer represents the value of prev->next. In this case, since we are done, we don't care. If more code followed which still assumed that next is the original successor of prev (now the successor of node), that would be wrong.


Do you also, for example, look up the magic multiplication number in Hacker's Delight every time you want to divide by a constant?


That silliness is not comparable to effective use of local variables to both streamline the source code and get better output from the compiler.

By the way, the magic multiplication number can be worked out, because it's just fixed point math. We take the 32:32 fixed point representation of 1 and divide by 17 to get an approximation of 1/17 in 32:32 fixed: 0x100000000 / 17 = 0xF0F0F0F. That's our magic number for dividing 32 bits by 17 by doing a 64 bit multiplication. For instance 90/17 is 90 * 0xF0F0F0F = 0x54B4B4B46. The integer part of this 32:32 fixed point value is in the upper 32 bits which is 5.

There are some subtleties there, plus considerations of whether we want signed or unsigned division. Better let the compiler deal with it. Arithmetic reductions are safe optimizations. It's hard to imagine what you could do wrong so that an arithmetic reduction breaks your code, given that it produces the same result and doesn't interact with some memory aliasing where the compiler isn't informed about what you're doing.

And by the way, given this code:

  #include <stdio.h>

  struct node {
    struct node *next, *prev;
  };

  void ins_after_a(struct node *prev, struct node *node)
  {
    node->prev = prev;
    node->next = prev->next;
    prev->next->prev = node;
    prev->next = node;
  }

  void ins_after_b(struct node *prev, struct node *node)
  {
    struct node *next = prev->next;
    node->prev = prev;
    node->next = next;
    prev->next = node;
    next->prev = node;
  }
gcc 7.2.0 on Ubuntu 17 generates better code for the cleaner, streamlined second one with the local variable, for exactly the reason I gave. ins_after_a yields 6 movq instructions; ins_after_b yields 5.

Better source that is easier to reason about; better machine code: all round win.

  ins_after_a:
  .LFB23:
          .cfi_startproc
          movq  (%rdi), %rax
          movq  %rdi, 8(%rsi)
          movq  %rax, (%rsi)
          movq  (%rdi), %rax  <-- wasteful re-load of (%rdi) due to aliasing suspicion
          movq  %rsi, 8(%rax)
          movq  %rsi, (%rdi)
          ret   
          .cfi_endproc
  .LFE23:
          .size ins_after_a, .-ins_after_a
          .p2align 4,,15
          .globl        ins_after_b
          .type ins_after_b, @function
  ins_after_b:
  .LFB24:
          .cfi_startproc
          movq  (%rdi), %rax
          movq  %rdi, 8(%rsi)
          movq  %rax, (%rsi)
          movq  %rsi, (%rdi)
          movq  %rsi, 8(%rax)
          ret   
          .cfi_endproc
The language could be specified that way (accesses to structs are memory loads) for all I care and that could be helpful.


In other words, TBAA is fundamentally limited in its ability to capture the programmer's expectations about aliasing, because it must assume that any two pointers that happen to have the same type could alias – even if a person reading the code would consider that obviously unreasonable.

In this case, you could avoid the wasteful re-load by marking the function arguments as `restrict`. One of the reasons Rust's memory model is interesting is that the language statically prevents mutable aliasing in most cases, so "restrict" comes for free… well, kind of. (It's actually rather difficult to nail down the precise guarantees, especially when your compiler's backend was originally designed for C.)


Ah, but by using restrict I'm saying, "please screw my C program with C99 stuff that potentially introduces undefined behavior".

That's what restrict does: it makes some behaviors undefined and otherwise doesn't change the semantics of the program. It's completely against the grain of moving to a safer language.

Why would I do that, if I can instead beautify the source and machine code while sticking to what was available in C90.

I think restrict is mainly intended as a way of competing against Fortran. If you're processing arrays referenced by pointers, and can get the compiler to believe that they do not overlap, then the compiler can unroll the loops and rearrange the accesses and calculations in the unrolled body. Like it can load four elements from a source array, do four calculations, and then store four elements into a target, rather than interleaving. That can be done by vectorized instructions.

Here is where our technique falls short: if we write the loop body with our load-calculate-store style, it cannot be unrolled and vectorized. We have pinned down an exact behavior for all possible cases of aliasing, like self-overlap with a displacement. Vectorizing unrolls do not preserve that behavior.

Manual unrolling isn't attractive because it's a guessing game. A good amount of unrolling on one machine may give the instruction cache indigestion on another machine.


Pointers are a red herring; misuses of pointer dereferences are the manageable form of UB that programmers understand well, in reference to memory models that are straightforward. It's all the nasty gratuitous UB that wrecks the C language: behavior that could be well-defined without changing the character of the C language, or making incorrect any existing programs that are correct.

Like UB in the preprocessor. WTF? Processing a bunch of tokens at compile time should be totally safe. But no: if the ## token pasting operator glues together two tokens which do not look like one token, the behavior is undefined.

It worked differently on different compilers 35 years ago and was coded as undefined. It being undefined gave the compiler writers no incentive to fix their implementations to some common behavior (like diagnosis of an invalid token paste).


Right, because those languages are type- and memory-safe. C is not.


I'd like to see some actual evidence of some significant optimization that depends on UB.


See this sibling thread https://news.ycombinator.com/item?id=17189666 - typically anything touching memory needs invariants to be optimized, which in languages that can directly manipulate memory means there's also UB (code that can't be statically proven not to break those invariants).


you don't need UB for invariant optimization. What UB permits the compiler to do is INCORRECTLY assume invariance. This permits the compiler to "optimize" badly written C code in a way that silently changes its function - which I don't call an optimization.


I’m not sure you’ve thought this through.

Any C function that manipulates pointers (or arrays) can alias those pointers, type pun, and otherwise touch the same block of memory through different paths and/or treating the bits as different types.

Vast areas of optimization are completely closed to you if you want code to be resilient in the face of aliasing.

You should sit down with pen and paper to figure out how to optimize a simple function while preserving invariants but without UB. It would be very illuminating.


It's really irritating that people take this condescending approach. C is harder to optimize than language that e.g. do not have pointers. That is a tradeoff the language designers chose because they wanted C programmers to be able to type pun, for example. This is why C89's only limit on type punning was memory alignment. The intent of "restrict" was to enable some aliasing optimizations via opt-in. So, sure, things that are not invariant in C are not available for invariant optimization. C permits aliasing. So it calls for smarter compiler techniques. If you want a language that forbids aliasing, use one that does and then call out to C libraries. But breaking C to allow CS220 "clever optimizing tricks" to be easily used is bad engineering. Essentially, what you are saying is, "elimination of invariants is a good optimization, so pretending C has more invariants than it does is a good thing. "

So maybe come up with a real example and stop hand waving.


The UB is the result of operations which assume invariants, when they are not met. Invariants are useless if you can't assume them. Having no UB in C would require having no way to break those assumptions but C is memory/type-unsafe so that's outright impossible.

Recently I've been trying to imagine what Rust's safe abstractions that use `unsafe` code internally would look like on top of some advanced mix of dependent type theory and proofs about state-manipulating imperative programs.

At that point, the optimizer would have proofs of the invariants it can rely on and it could even potentially emit proofs that every single transformation it performed preserves the (safe) semantics of the code being optimized.

But we're not there yet. Today, we need at least a small subset of the libraries of a systems language to prevent UB "by hand". And in C (or C++, although not necessarily for the same reasons), that's even harder, as there is no notion of a "safe abstraction" (one which you cannot misuse to produce UB).


>Invariants are useless if you can't assume them.

I would think that what we want is invariants that have been validated, not assumed - especially not assumed incorrectly.


You won't be able to entirely validate most memory-related invariants yourself in a language which allows breaking memory safety; that's why I mentioned the hypothetical language which can encode proofs of invariants.

What you want is effectively automated proof-search (a hard problem) for an entirely-safe systems language (which doesn't even exist yet, AFAIK. at least not what I described).


Right. C is not designed to be memory safe. So certain invariants are harder to prove (or just not true). One problem with UB is that it allows compilers to assume false things.


Conversely, I'd like to see some actual evidence of some significant amount of UB that you could make defined without losing much optimization ability.

Everyone always rails against the bogeyman of asshole compiler writers maliciously rewriting your program to eke out 1ms on benchmarks, but nobody ever names which clang or llvm optimization pass, specifically, they think should be turned off even in -O2 or -O3.


That's a false choice. Obviously Linux code is still optimized significantly despite their extensive disabling of UB-based "optimizations".


Okay, so, specifically, which passes do you think clang/llvm should not run that it's running now in -O3 ?


Chandler Carruth's talk on the subject was really useful for my own understanding of the reasoning behind UB:

https://www.youtube.com/watch?v=yG1OZ69H_-o


Some of that is sadly C having bad defaults - like people ending up using signed 32-bit integers (i.e. `int`) to index arrays on 64-bit platforms, which keeps signed integer overflow UB relevant, for optimizing typical indexing C code.

Both C++ and Rust use pointer ranges for iteration, and Rust even forces indexing/counting to use pointer-sized unsigned integers. So Rust turned off the LLVM bit which says "signed overflow is UB" and did not really lose much from it (AFAIK, anyway).


It's an interesting talk, but, just for example: architecture-dependent behavior is not something anyone really wants to prohibit in C. It's hard to believe that DJB, who writes sophisticated C code that calls out directly to Intel vector operations, thinks that should be machine independent. And his API analogy is wrong. If you have an API that responds to out-of-contract parameters by, e.g., rewriting code, it is a bad API. People do fuzzing tests specifically to find badly implemented APIs that don't reject bad parameters. In particular, imagine an API that does one thing, the thing you expect, when called without optimizations, and breaks your program when "optimized"! I got to the node example and gave up. The compiler is not asked to check program correctness.


That is, given the question, a really moronic answer.

There are ways to apply the modern approach of optimizing beyond -O0 to C without importing all kinds of UB from the language level. You just have to actually prove the properties you want to rely on, instead of relying on wishful "UB => authorized by the standard => programmer's fault if anything goes bad" thinking.

And promoting local variables to registers CERTAINLY does NOT depend on language-level UB. It would be permitted by the general as-if rule even if something appeared to prevent it in the first place, which is not the case. You don't have to have an address if nobody wants it, and random pointers have never been required to allow access to all objects, especially those which might never have an address at all. Plus nobody ever expected that anyway. People expect 2's complement. Or at least something that cannot result in nasal demons, and given C's history, something that matches what the processor does. So 2's complement is at least not utterly stupid. So conflating the two is dishonest to the highest degree -- except maybe if the only intended audience of the C language is now experts who, e.g., write compilers. What a bright future that would be.

Hell, we dropped the hypothetical flat memory model, even without strict aliasing, for maybe 20 years (and probably 30, to be honest), and this NEVER caused the kind of issues we are talking about. So don't pretend it did, just to dismiss the real issues. OK, even then it was probably informal as hell and in some ways worse for experts, but the amount of exploited UB was also WAY smaller. Quantity matters in this area. And context too. Do you want secure OR fast embedded systems? I would prefer reasonably secure and reasonably fast. Certainly NOT fast to execute and exploit, or more probably fast to crash pathetically.

You know very well that compiling at -O0 is not going to happen in prod on tons of projects.

Don't dismiss real concerns with false "solutions", especially when mixed with proofs of your misunderstanding of the situation.


> You don't have to have an address if nobody wants it

The spec says otherwise:

> An object exists, has a constant address, and retains its last-stored value throughout its lifetime.

[C11 6.2.4.2]

But that's more of a pedantic detail. More pragmatically, you do have a point. It is indeed possible to build a C compiler that has no "undefined behavior", but only unspecified behavior, as long as the program doesn't actually violate memory safety by, say, writing to some random address it can't prove it has permission to write. For example, guessing the stack slot used for a variable and overwriting it pretty much has to be undefined behavior – it's hard to optimize anything if variables can randomly change their values without being referenced. But that's okay, because overwriting random memory is inherently unsafe without obtaining a guarantee of what that memory will be used for. On the other hand, reading from random stack memory could be unspecified. A particular stack slot might be used for a variable, a temporary expression, or nothing at all, so it's unspecified what you might find there. But the compiler will always generate a single, real load instruction, without making any assumptions about aliasing that it can't prove; thus, you'll never get logical impossibilities like "x + 1 > x".

And such a compiler could definitely produce code that's better optimized than -O0 – because -O0 is a very, very low bar (it doesn't even do register allocation, in the compilers I've seen). But I expect it would do substantially worse than a modern compiler's -O2 even on average code, with a lot of little missed optimizations that add up. (Though if you're using -fno-strict-aliasing, you're probably already eating a decent percentage of that penalty.) And in the worst cases, like tight loops that can be autovectorized only by taking advantage of undefined behavior, it might be only a fraction of the speed of the better-optimized version.

Still, it might be an interesting project, especially if you could formalize the "no undefined behavior" guarantee.


Here's an example where these optimizations are useful. Suppose I create a macro DEREF_AND_FREE(x) which expands to if(x)free(*x). Oftentimes, I'll lazily use it in places where I know x isn't null. It's more readable and maintainable than splitting the macro up into two separate macros, DEREF_AND_FREE_IF_NONNULL and DEREF_AND_FREE_I_KNOW_ITS_NONNULL. The UB-based optimizations mean that I don't take the performance hit for my laziness.
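
A sketch of how that plays out (the macro body follows the description above; the surrounding function is my own made-up illustration):

  #include <stdlib.h>

  #define DEREF_AND_FREE(x) do { if (x) free(*(x)); } while (0)

  /* hypothetical caller: buf has already been dereferenced, so it cannot be null here */
  void reset(char **buf) {
      **buf = '\0';          /* the compiler records that buf was dereferenced... */
      DEREF_AND_FREE(buf);   /* ...so it may delete the if (buf) test inside the macro */
  }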


There is nothing to prevent the compiler from deducing that x is non-null without UB. What UB permits is for the compiler to deduce that x is non-null in the absence of information.

So: if(x)f(x); ...code...; if(x)g(x) doesn't need UB to drop the second check, but in y = x->n; ...code...; if(x)g(x) the compiler uses UB to remove the check. This will be especially horrible when LTO becomes more prevalent.


Disabling them is trivial: compile with -O0 in clang or gcc and your code will be compiled in as straightforward a way as possible.


UB is also a source of portability errors...


Amen. IMO the root of all evil is compiler "optimization" of code. If you really, really need the wee bit of extra performance through optimization, hire an expert at ass'y language and optimize the bit that matters. Rarely is this the case, especially with business apps.

I've seen my fair share of bugs from the compiler incorrectly hoisting variables from loops, mis-using registers, etc. I prefer to turn off compiler optimization as one of the first steps in a new project, but that's me.


> If you really, really need the wee bit of extra performance through optimization, hire an expert at ass'y language and optimize the bit that matters. Rarely is this the case, especially with business apps.

If execution efficiency is not a concern and you're using C, then you're most likely using the wrong language. Especially for "business apps".


> If execution efficiency is not a concern and you're using C

Two things.

Thing one: Execution efficiency on superscalar processors is very unlike what you assume it to be. For instance, you assume alignment makes your program fast, when in fact it makes it slower due to cache pressure.

Thing two: On lower end processors what's important is not speed but code size.

Thing three: (off by one error natch) The amount of critical C code is tiny tiny tiny compared to the amount of C code where execution speed just does not matter because the code is almost never executed. The bleeding hot sections are often rewritten in assembly (encryption cough decryption)


> Execution efficiency on superscalar processors is very unlike what you assume it to be.

It's not. What do you think I "assume execution efficiency to be"?

> On lower end processors what's important is not speed but code size.

That's why I said "most likely", not "always". Besides, those tiny microprocessors typically don't run the "business apps" which bokglobule was talking about.

> The amount of critical C code is tiny tiny tiny compared to the amount of C code where execution speed just does not matter because the code is almost never executed.

Not true for most software. Sure, most applications spend most of their execution time in a small part of the code, but it's typically not so small that you could easily rewrite it in assembly.

> The bleeding hot sections are often rewritten in assembly (encryption cough decryption)

Cryptography kernels are written in assembly to ensure that they have a constant execution time to prevent timing attacks. Not so much for performance.


> I "assume execution efficiency to be"?

You assume that modern high-performance processors execute instructions. They do not; they analyze, optimize and emit their own internal instruction stream and then execute that. That extra step is what makes fiddly UB-type 'optimizations' worthless. And since you don't know what processor is going to execute the code and how it's optimized, the speed gains are basically 'noise'.

> Besides, those tiny microprocessors typically don't run the "business apps" which bokglobule was talking about.

"Business apps" is usually network and io bound more than anything.

> Not true for most software.

Dan Bernstein says you're wrong.

> Cryptography kernels are written in assembly to ensure that they have a constant execution time to prevent timing attacks. Not so much for performance.

Dan Bernstein says you're wrong here too. Performance is everything with encryption. Consider video streaming over an encrypted connection. Yeah, that.


> You assume that modern high performance processors execute instructions.

Yeah, stop making stuff up about me.

> they analyze, optimize and emit their own internal instruction stream and then execute that.

I know how an out-of-order, superscalar processor works. I also know that they can't do magic, because their optimizations are severely limited by time and scope constraints (although they do have access to some information that isn't available at compile time).

> That extra step is what makes fiddly UB type 'optimizations' worthless.

I'm not arguing for optimizations based on UB. This subthread is about compiler optimizations in general, which bokglobule claimed to be nearly always unnecessary.

> Dan Bernstein says you're wrong.

Do you have a study you can cite? Do you think just dropping a name will convince anyone?

> Performance is everything with encryption.

I never claimed it wasn't. I claimed that performance isn't the reason why cryptography kernels are rewritten in assembly, and that's because C + optimizing compiler is already fast enough and the small performance gain alone doesn't justify the switch to assembly.


You can care enough about efficiency to use C, and still not care about the last 0.1% of efficiency. You can especially not care about the last 0.1% of efficiency in 99% of your code.


If you're compiling C with -O0 (as OP implies)... it's not just the last 0.1% of efficiency you're missing out on. Modern C compilers generate really, really crappy code at -O0, and you're looking at a 10x, 20x slowdown by forgoing optimizations. At those slowdowns, using an interpreted language that lacks a JIT starts to look competitive in performance.


Which is why it's so important to have ultra-clear guidelines (have you looked at the clang documentation or the GCC manpages?) on how to disable optimizations that can be dangerous/unpredictable (strict aliasing comes to mind) "within" -O1 or -O2 or -Os.

One shouldn't need to throw the baby out with the bathwater (-O0) in order to get some semblance of semantics that won't pull the rug under your feet when you're not looking.


In practice, what these requests tend to boil down to is requests for the compiler to read the programmer's mind (strict aliasing is pretty much the biggest exception). What optimizations do you think are "dangerous/unpredictable"? Demanding that things like traps happen predictably means that you heavily constrain the ability to do dead-code elimination (can't eliminate code that could trap!) or loop-invariant code motion, two of the biggest performance wins, especially for things that could trap such as memory loads and stores, which are the code you most want to avoid whenever possible for performance.

Undefined behavior essentially says that compilers don't have to care about what happens in the cases that would constitute undefined behavior. This doesn't manifest in the compiler as if (undefined_behavior()) { destroy_users_code(); }, contrary to popular opinion. It instead tends to manifest as logic like "along this control-flow path, this condition is true, so we can now thread the jump from block A to block C since you're redundantly checking a known-true condition" and only after unwrapping several layers of computed assumptions do you find the "we assumed overflow cannot occur" at the bottom.
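
A classic illustration of that kind of chain (my own sketch, not from the parent):

  #include <stddef.h>

  /* hypothetical example: individually reasonable passes combine into a surprise */
  int first_field(int *p) {
      int v = *p;        /* value propagation records that p was dereferenced here */
      if (p == NULL)     /* so along this path the condition is known false... */
          return -1;     /* ...and dead-code elimination removes the check entirely */
      return v;
  }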


Exactly this.

There seems to be some idea that there is some "evil pass" in the compilers which looks for undefined behavior and then maliciously optimizes surrounding code to do something the programmer didn't expect.

Not at all. Even all the examples of UB leading to unexpected optimizations usually involve a chain of events, with very straightforward and necessary optimizations being involved, like value propagation, inlining, dead-code elimination, etc - and these aren't inherently related to "exploiting UB". You could (in some compilers) disable one of the things in the chain and perhaps avoid the problem for that example, but you'd also hurt a lot of other code that relied on the optimization.

That's one of the problems with people asking for specific small snippets of code where the UB-related transformation produced a big gain: you can certainly find these (but they'll often be picked apart) - but the bigger problem is not with the specific examples: it's that whatever optimization you disable to make the small example work as you'd expect might then produce worse code across your application.

So people who want to disable the optimization that does "that" are often incorrectly assuming there is a small simple optimization which leads to "that" in the first place.

Still, I definitely agree that the situation regarding UB is depressing in many respects. Many of the decisions made by the C committee in the past haven't aged well. If you take a look at the low-level optimizations afforded by the largely-deterministic Java specification, they are mostly at the same level as C's - but Java had the benefit of coming along a couple of decades later, when many questions that were still open in C's day, such as integer overflow behavior, integer sizes, shift behavior, and pointer models, had largely been resolved. Platforms that don't conform to the JVM's model of an ideal machine will just have to generate slow code in some cases.


I'm not requesting the compiler to read my mind, I'm just asking for dead obvious and simple guidelines that allow me to perform a cost-benefit analysis and tune the compiler's behavior to what I consider acceptable.

Examples:

I don't care at all about optimizations that are a result of treating signed integer overflow as undefined behavior. I'll go for predictable, deterministic behavior every single time.

Same for strict aliasing rules. I find it absolutely insane and mind-boggling that -O2 enables strict aliasing amongst who knows what else (50+ other flags). Why can't the impact of these optimization flags be easier to deduce? Why do I feel like I need to be a compiler developer just to get some measure of confidence in what the optimizers are doing? It's insane that there are people who treat this sort of unwarranted, dangerous complexity as a rite of passage and don't push for something better. Most importantly, it's terrifying that a significant chunk of C programmers _are not even aware of these issues_.
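To make the strict-aliasing complaint concrete (my own sketch, not the commenter's code): both GCC and Clang do at least provide -fno-strict-aliasing and -fwrapv as opt-outs for these two particular cases.

    #include <string.h>

    /* Reading a float's bits through an unsigned* pointer violates the aliasing
     * rules, so at -O2 (which assumes strict aliasing) the compiler may cache or
     * reorder these accesses in surprising ways. Assumes 32-bit float/unsigned,
     * as on typical platforms. */
    unsigned bits_of_float_unsafe(float f) {
        return *(unsigned *)&f;          /* undefined behavior under strict aliasing */
    }

    /* The sanctioned alternative: copy through the object representation.
     * Optimizers typically turn this into the same single register move. */
    unsigned bits_of_float_safe(float f) {
        unsigned u;
        memcpy(&u, &f, sizeof u);
        return u;
    }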

C will stick around for decades and will continue to be picked up by newcomers. It doesn't have to stay as dangerous and uncompromising as it is now.

EDIT: There is some progress with things like ubsan and asan but alas they're fairly limited platform-wise. What's worrying is that there hasn't been a clear shift in the mentality of those who control the language as the OP indicates in his report.


> I'm not requesting the compiler to read my mind, I'm just asking for dead obvious and simple guidelines that allow me to perform a cost-benefit analysis and tune the compiler's behavior to what I consider acceptable.

And how can I, as a compiler writer, know what you, the compiler user, consider acceptable without reading your mind?

It would be nice if we had some sort of contract. I, the compiler writer, will say that, so long as you write code that conforms to this contract, I will faithfully execute that code. And you, the programmer, will say that you will only write code that conforms to this contract, and that I don't need to worry about you writing code that fails to conform to this contract. Well, somebody already wrote this contract: it's the C specification.

What you're really saying is that you don't like the contract you have, which is a totally valid position to take (I myself would love to see strict aliasing go die in a fire). But almost no one who objects to the C contract is willing to actually negotiate a new contract. Instead, it's almost invariably a complaint that amounts to "you fucking compiler writers are ruining my code with your stupid optimizations." Well, no, your code was already broken per the contract; we were just too dumb to realize it earlier. And for the big undefined behaviors where well-behaved semantics are feasible (such as strict aliasing or signed integer overflow), we provide options to let you say "I don't agree to the C contract, I want to agree to a not-quite-C contract instead."

Many people right now don't realize that undefined behavior is merely a contract under which programmers agree not to write such code, true. But that's why we, as compiler writers, try to educate people about why undefined behavior exists and what it means for programmers, and provide tools that make it much easier for programmers to find where it occurs (hence things like asan and ubsan).


> C will stick around for decades and will continue to be picked up by newcomers. It doesn't have to stay as dangerous and uncompromising as it is now.

Exactly. Even if most of us don't touch C directly, we all rely on systems that make heavy use of it (including Objective-C and C++), so improving C would already be an improvement to our whole computing stack.


Every once in a while I have to compile a kernel without optimizations. The performance penalty you pay is far from 0.1%; it is definitely noticeable at first glance. There might be projects where even that does not matter, but I would not think those are the majority.


An order of magnitude slower on non-trivial, computation-heavy code (i.e., not just doing a lot of IO or calls into libraries/the kernel) is a pretty good rule of thumb. Sometimes better, sometimes much worse.

The more layers of abstraction, the worse the penalty for not optimizing, so C++ is usually more heavily affected than C, for example: many of the so called "zero-cost" abstractions in C++ rely on a good optimizer.


C is nowadays foremost a system programming language. Try compiling your operating system with -O0 and see if you still don’t need those optimizations. You’d be surprised.


There are tenuous, difficult-to-reason-about optimizations, and then there's removing all of the completely useless loads and stores of intermediate results that occur in the default simplistic translation.

I might be with you on the former, but that first pass really cuts down on executable (icache) size, nets a huge performance win, and actually makes the generated code a lot easier to read.
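A tiny example of what that pass buys you (my sketch; compile with gcc -S at -O0 and at -O1 and compare the output):

    /* At -O0, a, b, and each intermediate result typically get their own stack
     * slots, with a store after every operation and a load before the next one.
     * At -O1 the whole function usually collapses to a handful of register ops. */
    int poly(int x) {
        int a = x * x;
        int b = a * x;
        return a + b + x;
    }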

Some of those analyses can get pretty involved, but I don't think we should be discouraging people from investigating them - or from providing nice safe defaults and flags to turn them on.

One other speed-independent factor here is that, like you, I usually start without any optimizations. And usually when I turn them on I find a few bugs in my own code right away. That's pretty helpful.


If the project will eventually be compiled with optimizations in production, it's probably better to hit those problems earlier than later.


How much of the UB in C is carried over to C++? Does the C++ standards committee carry a lot of the UB over into C++, or try to fix the issue? If C++ doesn't have the same UB, maybe the solution is to compile C code with a C++ compiler.


The intent of C++ is that the subset that is valid C code ends up being largely semantically identical in C++ and C. There are some differences, particularly related to the type system (for example, in C a ternary expression is an rvalue, whereas it can be an lvalue expression in C++). Most, perhaps all, of the undefined behavior is retained in C++.

There is potentially a case where C eliminated undefined behavior that C++ retained: the "union trick" for getting at the bits of a float was made legal in C99, but the C++ wording (even after being modified to support unrestricted unions) suggests that it is still undefined behavior in C++.
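For reference, the "union trick" in question looks like this (my sketch; C99 and later explicitly bless reading a union member other than the one last written, while C++ never adopted equivalent wording):

    #include <stdint.h>

    /* Type punning through a union: store the float member, read the uint32_t
     * member. Sanctioned in C99/C11 via the wording (and footnote) on union
     * member access; C++ has no equivalent blessing, so the idiom is widely
     * treated as UB there. Assumes float and uint32_t are both 32 bits, as on
     * typical targets. */
    static uint32_t float_bits(float f) {
        union { float f; uint32_t u; } pun;
        pun.f = f;
        return pun.u;
    }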


Strict aliasing needs to go out the window. That's for starters.


That may be. It sure is broken. What they did:

Things of similar base type (so ignoring "signed", "const", etc.) may alias. Everything may alias with "char".

What would make more sense:

Any unsigned integer type may alias with any type of the same size; this is not transitive (so a floating-point type may not alias with a pointer type, but there likely are unsigned integer types that may alias with both). Structs may alias other structs from the beginning up to the point at which the types of their contents diverge.

Even with that better default, you'd still want easy ways to override it in both directions.


Naive question:

Are there any compiler optimizations that -Ox makes possible which couldn't have been written originally as readable C code?


For sure.

Consider idiom recognition, where a compiler takes a series of operations that implement some operation that isn't expressible as a language primitive and turns it into a single machine instruction that performs that operation. Rotate built up from a couple of shifts for example.

Consider auto-vectorization for which the language has no direct support.

Consider inlining and value propagation and all the simplification that can occur when you combine them.

Basically, compile any non-trivial C program without optimization (or without some interesting optimization option) and with optimization, and look at the assembly. In many cases you wouldn't be able to write C code at all that reproduces the optimized version under the non-optimizing compiler. In other cases you could, but at the cost of code duplication or other things that would reduce the quality of your code.
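For instance, the rotate idiom mentioned above usually looks like this (my sketch of the common pattern):

    #include <stdint.h>

    /* C has no rotate operator, so you build it from two shifts and an OR. With
     * optimizations on, gcc and clang commonly recognize this pattern and emit a
     * single rotate instruction (e.g. ROL on x86); at -O0 you get the raw shifts.
     * The "& 31" keeps the shift counts in range, avoiding UB when n == 0. */
    static uint32_t rotl32(uint32_t x, unsigned n) {
        n &= 31;
        return (x << n) | (x >> ((32 - n) & 31));
    }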


I didn't know the C std was still being modified/enhanced. Interesting. Does sound scary.


All ANSI standards are required to be maintained and updated periodically, or they lapse. That can be as simple as rubber stamping the last standard, but in practice, there are usually a few tweaks needed.


It hasn't been updated in any meaningful sense beyond C89, though.


inttypes was pretty big, right? I know it's just one header, but it's an important thing to have in the standard for a systems language.


C'mon it even has some kind of primitive generics support nowadays, which other modern languages keep ignoring.


C11 generic selection is basically not worth using and almost worse than not having "generics" at all. It exists so tgmath.h can be written without implementation magic but tgmath.h is kind of a garbage fire itself.
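For context, C11 generic selection is just compile-time dispatch on the static type of an expression, roughly like this (a minimal sketch):

    #include <stdio.h>

    /* _Generic picks one expression based on the (unqualified) type of its
     * controlling operand at compile time. This is how tgmath.h-style macros can
     * dispatch to sqrtf/sqrt/sqrtl without compiler magic; there is no real
     * polymorphism or monomorphization involved. */
    #define type_name(x) _Generic((x), \
            int:     "int",            \
            float:   "float",          \
            double:  "double",         \
            default: "something else")

    int main(void) {
        printf("%s %s %s\n", type_name(1), type_name(1.0f), type_name(1.0));
        return 0;
    }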


Faintly?!


We need a -Osafe flag.


https://github.com/ziglang/zig and Jonathan Blow's JAI language seem to be contenders for C replacement.


Arguably the Rust language is a replacement candidate for 'C' (and C++).


> Arguably the Rust language is a replacement candidate for 'C' (and C++).

Or is it C and 'C++'?

As it happens, I disagree, because I think a lot of programmers who can otherwise handle pointers and manual memory allocation are going to revolt if we make them use the explicit ownership-management notation Rust demands. It's too different from anything done in any other language, and the push-back that it's "too high-level" and "inefficient" and "bloated" will kill it for sure.


I'm studying Rust, though wouldn't claim I know it yet. I believe Rust is an effort to create a language in which you can write code that's as fast and portable as C; and yet, for which, the compiler can help you coordinate a "single owner permitted to write" strategy for thread safety -- at the expense of increased bother and fuss coaxing your program into that pattern. I wouldn't call it a C replacement; while I'm glad to have Rust available, I prefer concurrency (when I need it at all) via processes, rather than threads, with no shared memory.


These guarantees are useful for more than threads, too: http://manishearth.github.io/blog/2015/05/17/the-problem-wit...


> while I'm glad to have Rust available, I prefer concurrency (when I need it at all) via processes, rather than threads, with no shared memory.

Are you implying that Rust can't do multi-processing through processes?


I don't know yet.

I've heard about at least two languages that adore threads and have issues with multi-processing. Perl 6 has threads in its runtime helping its GC, and calling fork kills its programs. I don't know yet whether Rust has threads _required_ in a fashion that makes it crash if you try to fork().

Perl 6 cannot fork, and Go behaves badly if you fork in a program that uses goroutines; it ends up unhappy, since shared-library manipulation is somewhat dependent on the process loader. Oops.

I'm tempted to hope that the Rust language devs have kept use of threads out of its required runtime. To answer your question, I didn't mean to imply that Rust cannot call fork(2), and I hope it isn't true. I meant to say that I wouldn't call Rust a C replacement; it's an effort to solve a problem (threads with a shared heap or stack) that C flat-out doesn't attempt to consider. I love C, and feel obliged to learn Rust.


Rust has no more runtime than C. Threads are implemented in the standard library, but they'll only be used if you explicitly create one (or a library you are using does). Fork works fine :)

Rust has a very different design philosophy from C, and in that respect you are right that it isn't a complete replacement. That said, Rust is good for far more than just multithreading: it ensures memory safety as well as thread safety.


> Fork works fine.

It can work but it’s quite unsafe. The standard library doesn’t make any guarantees about fork safety. The language itself doesn’t understand what fork is, and so can’t guarantee anything.


FWIW, Perl 6 does have `Proc` (to run an external process) and `Proc::Async` (to run an external process asynchronously). But `fork`, no: it was decided that `fork` was a unixism that was not supportable on OS's like Windows. And it was also decided to not make the same mistake that Perl 5 made with regards to trying to mimic `fork` on Windows, and use that idea later to try to mimic threads on non-Windows OS's.


Thank you! That's great to hear. I'm sorry that they chose that direction; I'd much, much rather not support Windows than not support Unix's fundamental primitive for multiprocessing (and its coordinated IPC). But I do realize that not everyone can act on such preferences. Fortunately, perl5 is still well and strong.


I'm very excited about Jai and I'm eager to test it out when it's released!


And, of course, DasBetterC (D as a Better C), which is a D subset designed to only require the existence of the C runtime library.


Zig and Jai were designed to not require the C runtime library. You can't replace C if you're building on top of it, can you?


You'll need something to set up an entry point, stack, somewhere for assert() failures to go, etc. The C library is minimal enough and fine for that. If you know what you're doing, it's easy enough to take care of those services yourself with DasBetterC.


That something can be written in machine code or the alternative language though. Could you provide a link for DasBetterC? Googling didn't really lead to anything.


The name is a bit of a pun. I wrote a presentation at DConf "D as Better C" and noticed what I'd unintentionally written :-)

https://dlang.org/spec/betterc.html

http://dconf.org/2018/talks/bright.html


Wow... I remember a simple, elegant language. The C I knew had close to a 1-to-1 mapping to assembly. The compiler didn't overthink things.

How can you muck this up!

I thought it was C++'s job to turn something simple/elegant into an indecipherable mess!



