I don't think I *disagree*, but could you articulate how strict aliasing makes t...

ambrop7 · on Oct 27, 2018

Example explanation: https://stackoverflow.com/a/99010/1020667 (feel free to look for better ones).

ben0x539 · on Oct 27, 2018

That code looks somewhat wonky, but to me it seems either legal or illegal for non-aliasing reasons. In principle, taking a pointer to a blob of bytes and casting it to a pointer to your struct should be sound if you can assume that the blob of bytes was originally a valid object of that struct type. I can see how you might get it wrong if that's not in fact how the blob was originally formed or if you get some alignment details wrong, but I can't immediately think of any aliasing-specific problems.

ambrop7 · on Oct 27, 2018

The problem is that it is "potentially unsafe" depending on what OTHER code does. For example, if some other code happens to manipulate the same bytes as an array of uint16_t (but not char/uint8_t) it's wrong (like in that example). Similarly if some other code manipulates the same bytes as some other struct type, it's also wrong. Yes it is possible to do it this way and not have undefined behavior, but you are leaving trap doors open that you don't even know about.
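A minimal sketch of the hazard described here. It assumes a hypothetical `struct pair` standing in for "the same bytes". When a `uint16_t *` and a `struct pair *` refer to overlapping storage, the store through the `uint16_t` lvalue is undefined behavior, so an optimizing compiler may assume it cannot change `p->a`. Routing the same store through `unsigned char` is permitted, because character types may access any object.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct, used only for illustration. */
struct pair {
    uint32_t a;
    uint32_t b;
};

/* UNDEFINED if w points into *p: a uint16_t lvalue may not access a
 * uint32_t object, so the optimizer may assume the store to w[0]
 * leaves p->a unchanged and return the constant 1. */
uint32_t write_then_read_u16(struct pair *p, uint16_t *w) {
    p->a = 1;
    w[0] = 2;
    return p->a;
}

/* Well-defined even when bytes points into *p: character types may
 * access any object, so the compiler must reload p->a after the store. */
uint32_t write_then_read_bytes(struct pair *p, unsigned char *bytes) {
    p->a = 1;
    bytes[0] = 2;
    return p->a;
}

/* Aliases the struct's own bytes; the byte store must be observed. */
void demo_byte_alias(void) {
    struct pair p = { 0, 0 };
    uint32_t r = write_then_read_bytes(&p, (unsigned char *)&p);
    assert(r == p.a);
}
```

`write_then_read_u16` is only safe to call when `w` does not point into `*p`; whether a given compiler actually folds its return value to `1` depends on the optimization level and flags.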

int_19h · on Oct 27, 2018

C's strict aliasing rules don't allow you to cast a pointer to a char array to an arbitrary type T, regardless of whether it has correct alignment and contains a valid representation of T. They allow you to memcpy from a char array to a T variable (assuming a valid representation).
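To make the contrast concrete, a sketch of the memcpy approach, assuming a hypothetical two-field `struct header` decoded from a byte buffer (all names are illustrative). The commented-out cast is the pattern the rules forbid; `memcpy` copies the bytes into a genuine `struct header` object instead, which also sidesteps any alignment problem with `buf`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header, used only for illustration. */
struct header {
    uint16_t kind;
    uint16_t length;
};

/* Decode a header from raw bytes by copying them into a real object. */
struct header decode_header(const unsigned char *buf, size_t len) {
    struct header h;
    assert(len >= sizeof h);  /* caller must supply a full header */
    /* Not this: return *(const struct header *)buf;
     * That reads a char array through a struct lvalue (an aliasing
     * violation) and may also be misaligned. */
    memcpy(&h, buf, sizeof h);
    return h;
}
```

Mainstream optimizing compilers typically lower a small fixed-size `memcpy` like this to ordinary loads, so the well-defined version usually costs nothing extra.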

And yes, it does actually produce broken code at runtime when optimizations are enabled on some compilers, unless you explicitly opt out from optimizations that depend on strict aliasing.

clarry · on Oct 27, 2018

In fact the C aliasing rules don't say a single thing about casting. You are free to cast at will. What matters, as far as aliasing is concerned, is the effective type of an object for access through an lvalue. And the effective type is... well, it depends. But it is not the same as the type of the pointee of whatever pointer you might or might not have.

Note that arrays are not modifiable lvalues, so the effective type of an object is never an array of char or any other array type. You can, however, access any object with a character type, and that does not change the effective type of said object. The standard explicitly permits this!

That means I could allocate some memory, copy an object in there, make a pointer-to-array-of-char, pass the pointer onwards, and let the next guy in line cast this pointer-to-array-of-char into a pointer-to-the-type-of-the-object I previously copied, perfectly legal. And if I didn't copy an object, it is still perfectly legal. A random piece of memory has no effective type unless it has gained one through a previous access via a non-character type.
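A sketch of the sequence described above, assuming a hypothetical `struct record`. The `memcpy` into storage from `malloc` gives that storage the effective type `struct record` (C11 6.5p6), so the later access through a converted `struct record *` is valid even though the pointer travelled as `unsigned char *`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, used only for illustration. */
struct record {
    int id;
    double score;
};

/* "The next guy in line": receives only a byte pointer and converts it back. */
double score_of(unsigned char *bytes) {
    /* Alignment sanity check; taking a remainder of a uintptr_t is
     * implementation-defined but works on common platforms. */
    assert((uintptr_t)bytes % _Alignof(struct record) == 0);
    struct record *r = (struct record *)bytes;
    return r->score;  /* valid: the storage's effective type is struct record */
}

/* Copies *src into fresh allocated storage and hands it on as bytes.
 * Returns the score read back, or -1.0 if allocation fails. */
double roundtrip(const struct record *src) {
    unsigned char *mem = malloc(sizeof *src);  /* malloc memory is suitably aligned */
    if (mem == NULL)
        return -1.0;
    memcpy(mem, src, sizeof *src);  /* effective type becomes struct record */
    double s = score_of(mem);
    free(mem);
    return s;
}
```

The alignment `assert` reflects the separate concern raised earlier in the thread: a pointer returned by `malloc` is suitably aligned for any standard type, but an arbitrary offset into a byte buffer may not be.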

int_19h · on Oct 27, 2018

Yes, you're right. When I said "cast a pointer to an array", I meant it more colloquially - as in declaring a variable of type char[...], taking a pointer to the first element, and casting that. And yes, it's about access rather than the actual cast, but in practice a cast to anything but void* is virtually always the first step to accessing it as the cast-to type (excepting some legacy POSIX APIs that use char* for this, because they predate the existence of void*).

And yes, there's the exception to the usual rules that lets you access T via char*, and then there's the "common initial sequence" rule with structs. Suffice it to say that it's complicated, but that things that people usually think "just work", actually don't.
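A minimal sketch of the common initial sequence rule, using hypothetical shape types. Per C11 6.5.2.3, when a union holds one of several structs that begin with the same members, that shared prefix may be read through any of the structs, as long as the complete union type is visible.

```c
#include <assert.h>

/* Hypothetical tagged shapes; all begin with the same int member. */
struct tag_only { int kind; };
struct circle   { int kind; double radius; };
struct rect     { int kind; double w, h; };

union shape {
    struct tag_only tag;
    struct circle   circle;
    struct rect     rect;
};

enum { KIND_CIRCLE = 1, KIND_RECT = 2 };

/* The union may currently hold a circle or a rect, but the shared leading
 * "kind" member may be inspected through any of the structs because the
 * completed union type is visible here. */
int shape_kind(const union shape *s) {
    return s->tag.kind;
}

double shape_area_approx(const union shape *s) {
    switch (shape_kind(s)) {
    case KIND_CIRCLE: return 3.14159 * s->circle.radius * s->circle.radius;
    case KIND_RECT:   return s->rect.w * s->rect.h;
    default:          assert(0 && "unknown kind"); return 0.0;
    }
}
```

The guarantee is phrased in terms of the union. Casting between `struct circle *` and `struct tag_only *` outside a union, a common way people expect this to "just work", is exactly the kind of case where interpretations diverge.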

FWIW, I'm of the general opinion that the C (and C++) memory model is formulated in such vague terms that no-one really knows what it is. We have some sort of conceptual consensus, that kinda sorta works because everybody makes the same assumptions (that aren't really warranted by the standard, but are "common sense"). But once you start digging into things like lifetime and object type - in C++ especially - and coming up with weird corner cases, things break down pretty quickly.

ben0x539 · on Oct 28, 2018

Yeah, there's a certain level of hermeneutics involved in arguing about undefined behavior; it's a great source of unending entertainment because no one can prove you wrong *definitively*.

I've long had daydreams of a C or C++ implementation that went out of its way to dynamically track all those ephemeral distinctions described in the standard but not commonly made concrete in compiler diagnostics or emitted code. Maybe this implementation would take some liberties with regards to the prescribed space/time complexity of certain operations, but that'd be okay, since it's pedagogical and not for production use.

Wouldn't it be amazing to have every bit of undefined behavior at runtime (except for some of the more esoteric ones, I guess) result in diagnostic messages outlining the bad state that the participating objects are in and how they got there?

Like, at runtime, keep track of the dynamic type stored at each byte of memory, and then have debug logging output as the runtime evaluates the strict aliasing rules for each memory access until it finds a clause that makes it valid. Then we mere mortals can actually have arguments about these kinds of things and forward "falsifiable" theories!
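A toy sketch of the shadow-memory idea, written in C for consistency with the rest of the thread. It is hypothetical throughout: the tags and `shadow_*` functions are illustrative names, all memory is treated as allocated storage with no declared type, and real effective-type checks would also need to handle signed/unsigned variants, qualifiers, and aggregate or union members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-byte type tags (a tiny subset of C's types). */
enum tag { T_NONE, T_CHAR, T_INT, T_FLOAT };

#define SHADOW_SIZE 64
static enum tag shadow[SHADOW_SIZE];  /* zero-initialized, i.e. T_NONE */

static const char *tag_name(enum tag t) {
    switch (t) {
    case T_CHAR:  return "char";
    case T_INT:   return "int";
    case T_FLOAT: return "float";
    default:      return "none";
    }
}

/* Record a store of `size` bytes at `off` through an lvalue of type t.
 * Character stores leave the effective type alone; others set it. */
void shadow_store(size_t off, size_t size, enum tag t) {
    assert(off + size <= SHADOW_SIZE);
    if (t == T_CHAR)
        return;
    for (size_t i = off; i < off + size; i++)
        shadow[i] = t;
}

/* Check a load; returns 1 if allowed, else logs the offending byte and returns 0. */
int shadow_check_load(size_t off, size_t size, enum tag t) {
    assert(off + size <= SHADOW_SIZE);
    if (t == T_CHAR)
        return 1;  /* character access is always allowed */
    for (size_t i = off; i < off + size; i++) {
        if (shadow[i] != T_NONE && shadow[i] != t) {
            fprintf(stderr, "aliasing violation at byte %zu: effective type %s, accessed as %s\n",
                    i, tag_name(shadow[i]), tag_name(t));
            return 0;
        }
    }
    return 1;
}
```

An interpreter would call `shadow_store` on every store and `shadow_check_load` on every load, which is where the extra per-access cost would come from.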

I feel like setting this up in some sort of C interpreter can't be fundamentally nearly as hard as the inner workings of current-day optimizing compilers, so I keep thinking that someone must have done this already...

int_19h · on Oct 28, 2018

That's interesting - I had such thoughts as well, way back after my first experience writing significant quantities of C++ (C++03 back then) in the industry.

And yes, I think this would require an interpreter, and some kind of shadow memory as you describe (that keeps track of type and lifetime metadata associated with bytes). In fact, I think it would even have to be multi-level segmented memory, with a segment per object (including subobjects!), to fully enforce rules such as those on out-of-bounds array access and comparing pointers to different objects.
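One possible shape for the per-object segmentation, sketched with hypothetical names: the interpreter represents a pointer as an (object id, offset) pair instead of a raw address. This models only one level (whole objects, not subobjects), and it ignores negative offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interpreter-side pointer: which object, plus a byte offset.
 * Real C pointers carry no such metadata. */
struct ipointer {
    int object;     /* segment id, one per object */
    size_t offset;  /* byte offset within that object */
};

/* Hypothetical segment table recording each object's size. */
#define MAX_OBJECTS 16
static size_t object_size[MAX_OBJECTS];
static int object_count;

struct ipointer new_object(size_t size) {
    assert(object_count < MAX_OBJECTS);
    object_size[object_count] = size;
    struct ipointer p = { object_count, 0 };
    object_count++;
    return p;
}

/* Arithmetic may reach one past the end of the object, but no further. */
int ptr_add(struct ipointer p, size_t n, struct ipointer *out) {
    assert(p.object >= 0 && p.object < object_count);
    if (p.offset + n > object_size[p.object])
        return 0;  /* would be undefined: report instead of computing */
    out->object = p.object;
    out->offset = p.offset + n;
    return 1;
}

/* A dereference needs `size` bytes entirely inside the object. */
int ptr_can_access(struct ipointer p, size_t size) {
    assert(p.object >= 0 && p.object < object_count);
    return p.offset + size <= object_size[p.object];
}

/* Relational comparison is only defined within one object. */
int ptr_less(struct ipointer a, struct ipointer b, int *result) {
    if (a.object != b.object)
        return 0;  /* undefined: report instead of comparing */
    *result = a.offset < b.offset;
    return 1;
}
```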

It could be an interesting exercise. And if written in a verifiable language, the result could, in theory, be declared the specification for the memory model.

paavoova · on Oct 27, 2018

Nonetheless, it's quite commonly used in popular code. E.g. I was looking at some dynamic string libraries, and a popular one (sds) casts a struct header from a char * and modifies its values. This code is used in Redis.
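For illustration, a simplified, hypothetical version of the header-before-string pattern. The names and layout (`struct strhdr`, `mystr_*`) are invented here and are not the actual sds code, which differs in its details. In this version the header is only ever written and read through the struct type and sits at the start of the `malloc` block, which is one way to keep such code within the effective-type rules.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored just before the character data. */
struct strhdr {
    size_t len;
    size_t alloc;
};

/* Allocate header + bytes; return a pointer to the characters. */
char *mystr_new(const char *init) {
    size_t n = strlen(init);
    struct strhdr *h = malloc(sizeof *h + n + 1);
    if (h == NULL)
        return NULL;
    h->len = n;  /* stores through the struct give the header its effective type */
    h->alloc = n;
    char *s = (char *)(h + 1);
    memcpy(s, init, n + 1);
    return s;
}

/* Step back from the char pointer to the header, as sds-style code does. */
static struct strhdr *mystr_hdr(char *s) {
    return (struct strhdr *)s - 1;
}

size_t mystr_len(char *s) {
    assert(s != NULL);
    return mystr_hdr(s)->len;
}

void mystr_free(char *s) {
    if (s != NULL)
        free(mystr_hdr(s));
}
```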

int_19h · on Oct 27, 2018

And if you look at its build scripts, I bet there's -fno-strict-aliasing there.

All popular compilers let you opt out of these kinds of optimizations. But it's no longer standard C at that point - and it's specifically because of strict aliasing rules (which were being discussed in this thread).

The bugs you get when you break the rules without opting out of standard compliance are not theoretical, either. Just google for "strict aliasing bug" to see numerous horror stories.