
I do optimal structure (and bit) packing without much thought because it is an old habit. As the article states, I have noticed that the only other people that do habitual careful structure optimization these days have been doing low-level and high-performance code as long as I have. Most programmers are oblivious to it.

The reasons you would do it today are different than a decade ago and the rules have changed because the processors have changed. To add two clarifying points to the original article:

- The main reason to do optimal structure packing today is to reduce cache line misses. Because cache line misses are so expensive it is a big net performance gain in many cases to have the code do a little more work if it reduces cache line fills; optimal structure packing is basically a "free" way of minimizing cache misses.

- On modern Intel microarchitectures, alignment matters much less for performance than it used to. One of the big changes starting with the i7 is that unaligned memory accesses have approximately the same cost as aligned memory accesses. This is a pretty radical change to the optimization assumptions for structure layout. Consequently, it is possible to do very tight memory packing without the severe performance penalty traditionally implied.

What constitutes "optimal" structure packing is architecture dependent. The original C structure rules were designed in part to allow the structures to be portable above all else. If you design highly optimized structures for a Haswell processor, code may run much more slowly or create a CPU exception and crash on other architectures, so keep these tradeoffs in mind. The article is discussing basic structure packing which typically has easily predictable behavior almost anywhere C compiles.
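
For illustration, a minimal sketch of the basic packing the article describes. The exact sizes and padding here assume a typical 64-bit (LP64) target; none of it is guaranteed by the standard:

  #include <stdio.h>

  /* Declaration order forces padding: the compiler typically inserts
     7 bytes after 'flag' and 4 after 'count' to keep 'ptr' and the
     overall struct size 8-byte aligned. */
  struct unpacked {
      char  flag;   /* 1 byte + 7 bytes padding */
      void *ptr;    /* 8 bytes */
      int   count;  /* 4 bytes + 4 bytes trailing padding */
  };

  /* Ordering fields largest-alignment-first removes the interior padding. */
  struct repacked {
      void *ptr;    /* 8 bytes */
      int   count;  /* 4 bytes */
      char  flag;   /* 1 byte + 3 bytes trailing padding */
  };

  int main(void) {
      printf("unpacked: %zu bytes\n", sizeof(struct unpacked)); /* typically 24 */
      printf("repacked: %zu bytes\n", sizeof(struct repacked)); /* typically 16 */
      return 0;
  }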




Why doesn't the compiler handle this for the programmer? Seems "free" in the no-tradeoffs sense.


Traditionally, C and C++ programs have been compiled by running the compiler on each .c file, and then linking together the results. If each invocation of the compiler could put the fields of a structure in arbitrary order, the different object files would not work together correctly. C structures do not include runtime type information and all offsets are calculated during the compile phase. The same issues come up with libraries. We need a stable application binary interface, or ABI.

Even if the ABI problem was solved, there are advantages to letting the programmer determine the order of fields in a structure. If you know that different threads are going to be using different parts of a structure, you will want to arrange things (or insert padding) so that the different threads are using different cache lines. This avoids so-called "false-sharing."
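
A sketch of that padding/alignment trick, assuming a 64-byte cache line (the usual size on current x86, but an assumption here) and a C11 compiler:

  #include <stdalign.h>  /* C11: alignas */

  #define CACHE_LINE 64  /* assumed cache line size */

  /* Two counters written by different threads. Without the alignment,
     both would likely land in one cache line, and every write by one
     thread would invalidate that line in the other thread's cache.
     Extended alignments like this are implementation-defined but
     widely supported. */
  struct counters {
      alignas(CACHE_LINE) long thread_a_hits;
      alignas(CACHE_LINE) long thread_b_hits;  /* starts a new cache line */
  };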


>If each invocation of the compiler could put the fields of a structure in arbitrary order, the different object files would not work together correctly.

That just requires that your packing algorithm be deterministic and there's no reason for it not to be.

I suppose it would fail in cases where someone defines a 2-element struct in one file that's a subset of a 3-element struct in another file, and then casts the 3-element struct to the 2-element one.


Which fails if I am using two different compilers and sending a structure over a network connection!

That is why I am familiar with struct packing, actually: for creating protocols it is essential that each end has the structure defined and laid out in exactly the same way.
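
One common, though compiler-specific, way to pin the layout down at both ends; this sketch assumes GCC/Clang (MSVC spells it #pragma pack):

  #include <stdint.h>

  /* Fixed-width types plus a packed attribute give a byte-for-byte
     identical layout on both ends. Multi-byte fields still need
     explicit byte-order conversion before sending. */
  struct __attribute__((packed)) wire_header {
      uint8_t  version;
      uint8_t  flags;
      uint16_t length;    /* htons() before transmission */
      uint32_t sequence;  /* htonl() before transmission */
  };

  _Static_assert(sizeof(struct wire_header) == 8, "unexpected wire_header layout");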


Erlang's type system, with its binary data type, is pretty awesome.

Example: https://groups.google.com/d/msg/erlang-programming/s39IQlk-d...

I wish more programming languages had baked-in support for bitfields, bit arrays and endian conversion, because building interoperable, efficient binary clients without icky code generation is currently a PITA.
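
For comparison, C's built-in bitfields look like the sketch below; the catch is that the bit allocation order within the storage unit is implementation-defined, which is part of why they don't solve the interoperability problem on their own:

  /* A header word carved into bitfields. Convenient to read and write,
     but not a portable wire format by itself, since the standard leaves
     the order of bits within the unit to the implementation. */
  struct header_word {
      unsigned version : 4;
      unsigned ihl     : 4;
      unsigned tos     : 8;
      unsigned length  : 16;
  };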


> That just requires that your packing algorithm be deterministic and there's no reason for it not to be.

The packing algorithm would also need to be stable across compilers and settings, since object code produced by different compilers, or by different compiler settings (e.g. -O2 or -O3), is still expected to link together correctly without error.

Some kind of attribute declaration in the struct definition itself could manage this, assuming that the algorithm it specifies is deterministic. But it can't be completely automatic, since that would differ from the current (deterministic) algorithm.


The offsets of code and data blocks are already undetermined before linking. Even if the algorithm is non-deterministic, what's stopping the optimisation from being deferred until link time?


sizeof (foo) would not be known until link time.

static_assert(sizeof(foo)==64, "you didn't align foo correctly") would not be possible.


Sometimes you need to access a structure from assembler code, so the structure layout has to be manually set up.


But why not let the programmer opt in to manually ordering the fields, and let the compiler do it the rest of the time? Similar to how the inline specifier works for functions: the compiler can inline things that it thinks will be faster if inlined, but if the programmer knows that can't be done, or must be done, they can say so.


The elements of a C struct must be sequentially ordered in memory as specified in the code; that's defined in the standard.


In turn, various C code is then written depending on this expectation, meaning a non-compliant compiler would break a significant number of codebases.


On the other hand, I for one would happily throw such brittle, unportable code under the bus in the name of automatically reducing cache line misses.

About the only safe usage of structure memory layout that I can think of is offsetof, and even then, I think you could make offsetof's behaviour definable in the ABI to make it deterministic.
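
For example, a sketch of the safe pattern: ask the compiler for the offsets rather than assuming them.

  #include <stddef.h>  /* offsetof */
  #include <stdio.h>

  struct record {
      int  id;
      char name[16];
      long timestamp;
  };

  int main(void) {
      /* offsetof reports whatever layout the ABI actually chose, so this
         stays correct even if padding differs between platforms. */
      printf("id at %zu, name at %zu, timestamp at %zu\n",
             offsetof(struct record, id),
             offsetof(struct record, name),
             offsetof(struct record, timestamp));
      return 0;
  }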


That unportable code includes CPython, Sun JVM, essentially anything GUI related in Windows and anything that uses BSD sockets.


> On the other hand, I for one would happily throw such brittle, unportable code under the bus in the name of automatically reducing cache line misses.

Spoken like someone who's never been forced to fix such code left behind by previous coders.

> About the only safe usage of structure memory layout that I can think of is offsetof, and even then, I think you could make offsetof's behaviour definable in the ABI to make it deterministic.

Sure, especially considering offsetof is defined by the standard and allowed to be implemented by the compiler, AFAIK. That said, you bring up a great point: you're talking about breaking decades-old ABIs if you were to enable such memory layout reordering by default. Good luck tracking down updated .libs, or original source to build .libs from, for every single one of your dependencies!


I'm not sure you know what "unportable" means. Every program compiled by a compliant compiler will work. "Unportable" doesn't apply to compilers which do the wrong thing; it applies to programs which assume something outside what is in the specification.


Actually, the C standard does not directly specify that. The only fact about structure layout that is explicitly required by the standard is that the address of the first field is the same as the address of the structure itself. Other constraints on structure layout can be deduced from requirements scattered through the standard, but nothing explicitly requires the in-memory field order to be the same as the declaration order.


"A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type"

I think "sequentially allocated" states that fact. What information do you mean it can be deduced from?


It seems that my memory does not serve me well :)

ISO C actually explicitly specifies that the order is the same, in §6.7.2.1.13.

The "deduced" remark was essentially intended to cover constraints that come from §6.2.6.1.2 ("...objects are composed of contiguous sequences of one or more bytes..."), i.e. fields cannot be (conceptually) outside &foo and ((char*)&foo)+sizeof(foo), which can be hard to implement on sufficiently weird architectures.


It would break things. If you have a structure that is part of a network transmission or file format, reordering the fields would read the wrong data. On top of that, C programmers are used to doing memory-hacking type things, like taking the address of fields, and in some cases may need to rely on the memory layout of a struct being exactly what they declared.


You can't use a struct in a portable way for sending directly on the network or writing/reading to disk. You must always properly serialize the data despite the fact that the _order_ of the elements is defined.
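
A minimal sketch of what "properly serialize" means here, using a hypothetical two-field message: write each field at an explicit offset, in an explicit byte order, instead of memcpy-ing the whole struct.

  #include <stddef.h>
  #include <stdint.h>

  struct msg {
      uint32_t id;
      uint16_t len;
  };

  static void put_u32be(uint8_t *p, uint32_t v) {
      p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
      p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
  }

  static void put_u16be(uint8_t *p, uint16_t v) {
      p[0] = (uint8_t)(v >> 8);  p[1] = (uint8_t)v;
  }

  /* The wire size (6 bytes) is fixed by the protocol, not by
     sizeof(struct msg), which may include padding. */
  size_t serialize_msg(uint8_t out[6], const struct msg *m) {
      put_u32be(out + 0, m->id);
      put_u16be(out + 4, m->len);
      return 6;
  }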


I would assume you can't safely do arithmetic within a structure, for instance. As an example, take a

  struct foo {
    char a;   /* offset 0 */
    char b;   /* offset 1 */
    char c;   /* offset 2 */
    char *p;  /* padded up to the next pointer-aligned offset */
    char d;
  };

A valgrind error that points at "0 bytes within an area of memory..." would signal an error originating somewhere in the struct, but who knows whether it's p or a.


I believe one of the reasons for adding "class" types in C++ was to break from C's restrictions on struct layout. But even then, the trouble of breaking binary compatibility was too much to pay for freeing programmers from minding their own data.


No, the same restrictions apply to class as to struct. The most relevant difference in C++ is that the compiler can reorder data members that are under different access specifiers (public, protected, private), but not within one of those blocks, to maintain compatibility with C.


Well you only have guaranteed compatibility with C if a class is standard-layout, which requires that all members have the same access control (public, protected, private). So if there is any rule in C++ that mandates ordering within a "block" when there are multiple such "blocks", it's not for compatibility with C.


This is what I said. Even though they could change the rules for the classes they did not because it's far from "free".


Funnily, this is one of the freedoms granted by the standard which none of gcc, clang, or MSVC actually make use of.


Oh, putting "class" in the first sentence was a mistake as people hit downvote without reading the rest. I do not say that "class" allows rearranging members (other than C++ access-based rearrangement that applies to structs as well). Really. Please read.


This is one of those classic things where we could actually improve C for performance. The standard is restrictive here: data that needs to be identical in all cases for network, file, etc. purposes should be marked with a special keyword, or, for legacy purposes, we mark the ones we want the compiler to order for us.

It's more complicated than just keeping the size down, though: you might want things to start on 64-, 128- or 256-bit boundaries as well, e.g. if your class contains aligned vector members and is itself aligned.
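
In C11 those larger boundaries can at least be requested explicitly. A sketch, assuming 256-bit (32-byte) vectors:

  #include <stdalign.h>

  /* Request 32-byte alignment so each array can be loaded as a single
     256-bit vector. Extended alignments like this are supported by the
     usual compilers but are technically implementation-defined. */
  struct particle_block {
      alignas(32) float position[8];
      alignas(32) float velocity[8];
      int count;
  };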


Because if the compiler reorders your fields under your feet, you won't be able to know how things are serialized. You lose control over the memory layout.


Starting with "the i7"? What microarchitecture do you mean?


The Nehalem microarchitecture made aligned and unaligned memory accesses have the same latency and throughput, except in weird edge cases like loads which span an L1 cache line boundary.


It used to be the case that unaligned memory accesses were two or more times slower because they literally required multiple bus cycles, but for x86 that hasn't been true since the Core 2 at least: http://lemire.me/blog/archives/2012/05/31/data-alignment-for...


Thanks.



