
It’s been a long time since I worked with C, but my recollection was that a) strict aliasing allows for optimizations that are actually worthwhile, and b) it’s really easy to type pun in a defined way using unions anyway.



Type punning with unions was actually forbidden by C89. You were only ever supposed to read the union member which was last written to. This may have been relaxed in C17; I can only find a draft online, but it allows for type punning in unions as long as the member being read is not longer in size than the member last written to.


What the standard says doesn't really matter. Only what major compilers do matters. GCC has decreed that type punning through unions is supported, therefore it might as well be standard.


IIRC it was allowed in C89, where it was described as implementation-defined; C99 changed the wording and now mentions union type punning explicitly.
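
For anyone who wants to see the two approaches side by side, here is a minimal sketch that is not taken from any comment above. The helper names float_bits_union and float_bits_memcpy are hypothetical, and the code assumes float is 32 bits wide. The union version relies on the compiler allowing a read of a member other than the one last written (GCC documents that it does); the memcpy version sidesteps the question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Hypothetical helper: write one union member, read the other. */
static uint32_t float_bits_union(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Hypothetical helper: copy the object representation byte by byte.
   This does not depend on how the union rules are interpreted. */
static uint32_t float_bits_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Both helpers should return the same bit pattern for any input; on a platform with IEEE 754 single-precision floats, 1.0f would come back as 0x3F800000.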


The optimizations can easily be gained back by "manual" loading and storing to temporary local variables.

The classical:

  int foo(float *f, int *x) {
    *x = 2;
    *f = 3.0f;
    return *x; // oh no, without type-based aliasing I have to load x again!
  }
Can obviously be rewritten to:

  int foo(float *f, int *x) {
    int z = 2;
    *x = z;
    *f = 3.0f;
    return z; // ah, thank you programmer, z has not had its address taken, it's obviously 2.
  }


That code does not violate the aliasing rules in any case.

The two functions you wrote are not the same; the first re-reads *x which may return a different value than 2 if *x was modified in between the first and third lines of the function by another thread (or hardware). However, since x is not marked volatile, the compiler will usually optimize the first function to behave the same as the second.


> strict aliasing allows for optimizations that are actually worthwhile

I don't think there are many sensible, real world examples.

A nice explanation of the optimizations the strict-aliasing rule allows: https://stackoverflow.com/a/99010/66088

The example given is:

    typedef struct Msg {
        unsigned int a;
        unsigned int b;
    } Msg;

    void SendWord(uint32_t);

    int main(void) {
        // Get a 32-bit buffer from the system
        uint32_t* buff = malloc(sizeof(Msg));

        // Alias that buffer through message
        Msg* msg = (Msg*)(buff);

        // Send a bunch of messages
        for (int i = 0; i < 10; ++i) {
            msg->a = i;
            msg->b = i+1;
            SendWord(buff[0]);
            SendWord(buff[1]);
        }
    }
The explanation is: with strict aliasing the compiler doesn't have to think about inserting instructions to reload the contents of buff every iteration of the loop.

The problem I have is that when we re-write the example to use a union, the generated code is the same regardless of whether we pass -fno-strict-aliasing or not. So this isn't a working example of an optimization enabled by strict aliasing. It makes no difference whether I build it with clang or gcc, for x86-64 or arm7. I don't think I did it wrong. We still have a memory load instruction in the loop. See https://godbolt.org/z/9xzq87d1r

Knowing whether a C compiler will make an optimization or not is all but impossible. The simplest and most reliable solution in this case is to do the loop hoisting optimization manually:

        uint32_t buff0 = buff[0];
        uint32_t buff1 = buff[1];
        for (int i = 0; i < 10; ++i) {
            msg->a = i;
            msg->b = i+1;
            SendWord(buff0);
            SendWord(buff1);
        }
Doing so removes the load instruction from the loop. See https://godbolt.org/z/ecGrvb3se

Note 1: The first thing that goes wrong for the Stack Overflow example is that the compiler spots that malloc returns uninitialized data, so it can omit the reloading of buff in the loop anyway. In fact it removes the malloc too. Here's clang 18 doing that https://godbolt.org/z/97a8K73ss. I had to replace malloc with an undefined GetBuff() function, so the compiler couldn't assume the returned data was uninitialized.

Note 2: Once we're calling GetBuff() instead of malloc(), the compiler has to assume that SendWord(buff[0]) could change buff, and therefore it has to reload it in the loop even with strict-aliasing enabled.


The strict aliasing stuff allows you to do "optimisations" across translation units that are otherwise unsound.

The compiler's alias analysis is much more effective than those rules permit within a translation unit, because it matters whether one int* aliases another int*.

And then we have link time optimisation, at which point the much better alias analysis runs across the whole program.

What remains therefore is a language semantically compromised to help primitive compilers that no longer exist to emit slightly better code.

This is a deeply annoying state of affairs.


Aliasing analysis is quite helpful for sophisticated compilers to generate good code.


Alias analysis is critical. Knowing which loads and stores can alias one another is a prerequisite for reordering them, hoisting operations out of loops and so forth. Therefore the compiler needs to do that work, but it needs to do it on values that have the same type as each other, not only on values whose types happen to differ.

Knowing that different types don't alias is a fast path in the analysis or a crutch for a lack of link time optimisation. The price is being unable to write code that does things like initialise an array using normal stores and then operates on it with atomic operations, implement some floating point operations, access network packets as structs, mmap hashtables from disk into C structs and so forth. An especially irritating one is the hostility to arrays that are sometimes a sequence of simd types and sometimes a sequence of uint64_ts.

Though C++ is slowly accumulating enough escape hatches to work around that (launder et al), C is distinctly lacking in the same.
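
For the packets-as-structs case specifically, one workaround often used in C is to copy the bytes into a properly typed object with memcpy instead of casting the buffer pointer. The sketch below is only an illustration under stated assumptions: the PacketHeader layout and the read_header name are hypothetical, the fields are left in host byte order, and the struct is assumed to have no padding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header, invented for this sketch. */
typedef struct PacketHeader {
    uint16_t type;
    uint16_t length;
    uint32_t seq;
} PacketHeader;

/* Assumption of this sketch: no padding between or after the fields. */
static_assert(sizeof(PacketHeader) == 8, "sketch assumes an 8-byte header");

/* Copy the header out of a raw byte buffer. The buffer may be
   unaligned, and no PacketHeader* ever points into it, so the
   type-based aliasing rules are not involved. */
static PacketHeader read_header(const unsigned char *bytes) {
    PacketHeader h;
    memcpy(&h, bytes, sizeof h);
    return h;
}
```

Compilers commonly turn a small fixed-size memcpy like this into plain loads, but as with everything else in this thread, that is worth checking in the generated code rather than assuming.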


Alias analysis is important. It's the C standard's type-based "strict aliasing" rules which are nonsense and should be disabled by default.

This is C. Here in these lands, we do things like cast float* to int* so that we can do evil bit level manipulation. The compiler is just gonna have to put that in its pipeline and compile it.


How does the version with buff0 and buff1 work? It looks like it always sends the same two values...


Hmmm, yes. I didn't understand what the code did.

Instead of creating those buff0 and buff1 variables before the loop, I should have done:

    for (int i = 0; i < 10; ++i) {
        unsigned a = i;
        unsigned b = i+1;
        msg->a = a;
        msg->b = b;
        SendWord(a);
        SendWord(b);
    }
That gets rid of the load from the loop. https://godbolt.org/z/xsqWfxKzd
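
To check the behavioural difference without reading assembly, here is a small self-contained sketch. It makes several assumptions that are not in the thread: SendWord is replaced by a stand-in that records what it is given, the function names send_hoisted and send_locals are hypothetical, and the hoisted variant reads its two words from msg itself rather than through an aliased uint32_t buffer.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Msg {
    unsigned int a;
    unsigned int b;
} Msg;

/* Stand-in for the thread's undefined SendWord(): records each word. */
static uint32_t sent[32];
static size_t sent_count = 0;

static void SendWord(uint32_t w) {
    if (sent_count < sizeof sent / sizeof sent[0])
        sent[sent_count++] = w;
}

/* Models the buff0/buff1 version: the words are read once before the
   loop, so every iteration sends the same pair. */
static void send_hoisted(Msg *msg) {
    uint32_t w0 = msg->a;
    uint32_t w1 = msg->b;
    for (int i = 0; i < 10; ++i) {
        msg->a = i;
        msg->b = i + 1;
        SendWord(w0);
        SendWord(w1);
    }
}

/* Models the corrected version: each iteration sends the values it
   just stored, taken from locals instead of being reloaded. */
static void send_locals(Msg *msg) {
    for (int i = 0; i < 10; ++i) {
        unsigned a = i;
        unsigned b = i + 1;
        msg->a = a;
        msg->b = b;
        SendWord(a);
        SendWord(b);
    }
}
```

Calling send_hoisted on a Msg records the message's initial a and b ten times over, while send_locals records the pairs (0, 1) through (9, 10), which is what the original loop was meant to send.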



