I'm curious what exactly makes this undefined behavior.
And in particular, what about something like this?
struct Foo {
#ifdef __cplusplus
int bar() const { return bar_; }
private:
#endif
int bar_;
};
Or, taking this a step further:
#include <stdlib.h> // for malloc
struct _Foo;
typedef struct _Foo Foo;
// In C "struct _Foo" is never defined.
int Foo_bar(const Foo* foo) { return *(int*)foo; }
void Foo_setbar(Foo* foo, int bar) { *(int*)foo = bar; }
Foo* Foo_new() { return (Foo*)malloc(sizeof(int)); } // cast needed when compiled as C++
#ifdef __cplusplus
struct _Foo {
void set_bar(int bar) { bar_ = bar; }
int bar() const { return bar_; }
private:
int bar_;
};
#endif
The above isn't ideal but it does provide encapsulation in a way that doesn't seem to violate strict aliasing (the memory location is consistently read/written as "int").
I think this is plenty OK. For one thing, if a struct has a member of type T, it's OK to access that member through a pointer to T (and the address of the struct is guaranteed to be identical to the address of its first member). For another, you are using dynamically allocated memory, so the only thing that matters is the type of the pointer when the access is finally made. It doesn't matter that it was a Foo* before, if what you dereference is an int*.
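Here is a minimal sketch of those two points, assuming a struct whose only member is an int. The names struct Foo1, first_member_via_cast and new_int_storage are made up for illustration and do not come from the code above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical one-member struct, standing in for the C++ view of Foo. */
struct Foo1 { int bar_; };

/* A pointer to a struct, suitably converted, points to its first member,
   so reading the int through an int* is a plain int access. */
int first_member_via_cast(struct Foo1 *foo) {
    return *(int *)foo;
}

/* malloc'd storage has no declared type; storing an int through an int*
   is what gives the object its effective type. */
int *new_int_storage(int value) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = value;
    }
    return p;
}
```

Either way, every access that actually touches the memory is made through an lvalue of type int.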
This is different from pretending that the address of a struct s { int a; double b; } is the address of a struct t { int a; long long c; } and accessing it through a pointer to that. If you do that, C compilers will (given the opportunity) assume that the write-through-a-pointer-to-struct-t does not modify any object of type “struct s”. This is what the example st1 in the article illustrates.
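To make that concrete, here is a sketch in the spirit of that example (it is not a copy of the article's st1, and the function name write_both is made up). Called with two distinct objects, as below, it is well defined. Called as write_both(&x, (struct t *)&x), the behavior is undefined, and a compiler is entitled to return 1 without re-reading ps->a.

```c
#include <assert.h>

struct s { int a; double b; };
struct t { int a; long long c; };

/* The compiler may assume ps and pt never refer to the same object,
   because struct s and struct t are incompatible types. */
int write_both(struct s *ps, struct t *pt) {
    ps->a = 1;
    pt->a = 2;      /* assumed not to modify any struct s */
    return ps->a;   /* may be folded to the constant 1 */
}
```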
The latter is what I suspect plenty of socket implementations still do (because there are several types of sockets, represented by different struct types with a common prefix). It is possible to revise them carefully so that they do not break the rules, but I doubt this work has been done.
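One way such code could be revised, as a sketch only: copy the shared leading field out with memcpy instead of dereferencing a pointer to the "wrong" struct type. The structs addr_v4 and addr_v6 and the function addr_family are hypothetical stand-ins, not taken from any real socket API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical address types sharing a common prefix. */
struct addr_v4 { unsigned short family; unsigned short port; unsigned char host[4]; };
struct addr_v6 { unsigned short family; unsigned short port; unsigned char host[16]; };

/* Reads the leading field of either struct. The first member sits at
   offset 0, and memcpy copies bytes, so no lvalue of an incompatible
   struct type is ever used for the access. */
unsigned short addr_family(const void *addr) {
    unsigned short family;
    memcpy(&family, addr, sizeof family);
    return family;
}
```

Compilers typically turn a small fixed-size memcpy like this into a single load, so the cost is mostly in readability.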
The ability to use pointers to structures with a Common Initial Sequence goes back at least to 1974, before unions were invented. When C89 was written, it would have been plausible that an implementation could uphold the Common Initial Sequence guarantees for pointers without upholding them for unions, but rather less plausible that implementations could do the reverse. Thus, the Standard explicitly specified that the guarantee is usable for unions, but saw no need to redundantly specify that it also worked for pointers.
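For reference, the union form that the Standard does spell out looks roughly like this. It is a sketch; the union tag st and the function common_a are made-up names, and the structs are repeated so the snippet stands alone.

```c
#include <assert.h>

struct s { int a; double b; };
struct t { int a; long long c; };

/* Both structs start with "int a", so they share a common initial
   sequence. With the completed union type visible, the common part may
   be inspected through either member. */
union st { struct s s; struct t t; };

int common_a(const union st *u) {
    return u->t.a;   /* fine even if u->s was the member last stored */
}
```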
If compilers would recognize that an operation involving a pointer/lvalue that is freshly visibly based on another is an action that at least potentially involves the latter, that would be sufficient to make code that relies upon the CIS work. Unfortunately, some compilers are willfully blind to such things.