
I do optimal structure (and bit) packing without much thought because it is an old habit. As the article states, I have noticed that the only other people that do habitual careful structure optimization these days have been doing low-level and high-performance code as long as I have. Most programmers are oblivious to it.

The reasons you would do it today are different than a decade ago and the rules have changed because the processors have changed. To add two clarifying points to the original article:

- The main reason to do optimal structure packing today is to reduce cache line misses. Because cache line misses are so expensive it is a big net performance gain in many cases to have the code do a little more work if it reduces cache line fills; optimal structure packing is basically a "free" way of minimizing cache misses.

- On modern Intel microarchitectures, alignment matters much less for performance than it used to. One of the big changes starting with the i7 is that unaligned memory accesses have approximately the same cost as aligned memory accesses. This is a pretty radical change to the optimization assumptions for structure layout. Consequently, it is possible to do very tight memory packing without the severe performance penalty traditionally implied.

What constitutes "optimal" structure packing is architecture dependent. The original C structure rules were designed in part to allow the structures to be portable above all else. If you design highly optimized structures for a Haswell processor, code may run much more slowly or create a CPU exception and crash on other architectures, so keep these tradeoffs in mind. The article is discussing basic structure packing which typically has easily predictable behavior almost anywhere C compiles.
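
For illustration, a minimal sketch of the basic packing the article describes. The exact sizes and padding here assume a typical 64-bit (LP64) target; none of it is guaranteed by the standard:

  #include <stdio.h>

  /* Declaration order forces padding: the compiler typically inserts
     7 bytes after 'flag' and 4 after 'count' to keep 'ptr' and the
     overall struct size 8-byte aligned. */
  struct unpacked {
      char  flag;   /* 1 byte + 7 bytes padding */
      void *ptr;    /* 8 bytes */
      int   count;  /* 4 bytes + 4 bytes trailing padding */
  };

  /* Ordering fields largest-alignment-first removes the interior padding. */
  struct repacked {
      void *ptr;    /* 8 bytes */
      int   count;  /* 4 bytes */
      char  flag;   /* 1 byte + 3 bytes trailing padding */
  };

  int main(void) {
      printf("unpacked: %zu bytes\n", sizeof(struct unpacked)); /* typically 24 */
      printf("repacked: %zu bytes\n", sizeof(struct repacked)); /* typically 16 */
      return 0;
  }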




Why doesn't the compiler handle this for the programmer? Seems "free" in the no-tradeoffs sense.


Traditionally, C and C++ programs have been compiled by running the compiler on each .c file, and then linking together the results. If each invocation of the compiler could put the fields of a structure in arbitrary order, the different object files would not work together correctly. C structures do not include runtime type information and all offsets are calculated during the compile phase. The same issues come up with libraries. We need a stable application binary interface, or ABI.

Even if the ABI problem was solved, there are advantages to letting the programmer determine the order of fields in a structure. If you know that different threads are going to be using different parts of a structure, you will want to arrange things (or insert padding) so that the different threads are using different cache lines. This avoids so-called "false-sharing."
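
A sketch of that padding/alignment trick, assuming a 64-byte cache line (the usual size on current x86, but an assumption here) and a C11 compiler:

  #include <stdalign.h>  /* C11: alignas */

  #define CACHE_LINE 64  /* assumed cache line size */

  /* Two counters written by different threads. Without the alignment,
     both would likely land in one cache line, and every write by one
     thread would invalidate that line in the other thread's cache.
     Extended alignments like this are implementation-defined but
     widely supported. */
  struct counters {
      alignas(CACHE_LINE) long thread_a_hits;
      alignas(CACHE_LINE) long thread_b_hits;  /* starts a new cache line */
  };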


>If each invocation of the compiler could put the fields of a structure in arbitrary order, the different object files would not work together correctly.

That just requires that your packing algorithm be deterministic and there's no reason for it not to be.

I suppose it would fail in cases where someone defines a 2-element struct in one file that's a subset of a 3-element struct in another file, and then casts the 3-element struct to the 2-element one.


Which fails if I am using two different compilers and sending a structure over a network connection!

That is why I am familiar with struct packing, actually: for creating protocols it is essential that each end has the structure defined and laid out in exactly the same way.
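
One common, though compiler-specific, way to pin the layout down at both ends; this sketch assumes GCC/Clang (MSVC spells it #pragma pack):

  #include <stdint.h>

  /* Fixed-width types plus a packed attribute give a byte-for-byte
     identical layout on both ends. Multi-byte fields still need
     explicit byte-order conversion before sending. */
  struct __attribute__((packed)) wire_header {
      uint8_t  version;
      uint8_t  flags;
      uint16_t length;    /* htons() before transmission */
      uint32_t sequence;  /* htonl() before transmission */
  };

  _Static_assert(sizeof(struct wire_header) == 8, "unexpected wire_header layout");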


Erlang's type system, with its binary data type, is pretty awesome.

Example: https://groups.google.com/d/msg/erlang-programming/s39IQlk-d...

I wish more programming languages had baked-in support for bitfields, bit arrays and endian conversion, because building interoperable, efficient binary clients without icky code generation is currently a PITA.
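
For comparison, C's built-in bitfields look like the sketch below; the catch is that the bit allocation order within the storage unit is implementation-defined, which is part of why they don't solve the interoperability problem on their own:

  /* A header word carved into bitfields. Convenient to read and write,
     but not a portable wire format by itself, since the standard leaves
     the order of bits within the unit to the implementation. */
  struct header_word {
      unsigned version : 4;
      unsigned ihl     : 4;
      unsigned tos     : 8;
      unsigned length  : 16;
  };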


> That just requires that your packing algorithm be deterministic and there's no reason for it not to be.

The packing algorithm would also need to be stable across compilers and settings, since object code produced by different compilers, or by different compiler settings (e.g. -O2 or -O3), is still expected to link together correctly without error.

Some kind of attribute declaration in the struct definition itself could manage this, assuming that the algorithm it specifies is deterministic. But it can't be completely automatic, since that would differ from the current (deterministic) algorithm.


The offsets of code and data blocks are already undetermined before linking. Even if the algorithm is non-deterministic, what's stopping the optimisation from being deferred until link time?


sizeof (foo) would not be known until link time.

static_assert(sizeof(foo)==64, "you didn't align foo correctly") would not be possible.


Sometimes you need to access a structure from assembler code, so the structure layout has to be manually set up.


But why not let the programmer opt in to manually ordering the fields, and let the compiler do it the rest of the time? Similar to how the inline specifier works for functions: the compiler can inline things that it thinks will be faster if inlined, but if the programmer knows that can't be done, or must be done, they can say so.


The elements of a C struct must be sequentially ordered in memory as specified in the code; that's defined in the standard.


In turn, various C code is then written depending on this expectation, meaning a non-compliant compiler would break a significant number of codebases.


On the other hand, I for one would happily throw such brittle, unportable code under the bus in the name of automatically reducing cache line misses.

About the only safe usage of structure memory layout that I can think of is offsetof, and even then, I think you could make offsetof's behaviour definable in the ABI to make it deterministic.
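
For example, a sketch of the safe pattern: ask the compiler for the offsets rather than assuming them.

  #include <stddef.h>  /* offsetof */
  #include <stdio.h>

  struct record {
      int  id;
      char name[16];
      long timestamp;
  };

  int main(void) {
      /* offsetof reports whatever layout the ABI actually chose, so this
         stays correct even if padding differs between platforms. */
      printf("id at %zu, name at %zu, timestamp at %zu\n",
             offsetof(struct record, id),
             offsetof(struct record, name),
             offsetof(struct record, timestamp));
      return 0;
  }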


That unportable code includes CPython, Sun JVM, essentially anything GUI related in Windows and anything that uses BSD sockets.


> On the other hand, I for one would happily throw such brittle, unportable code under the bus in the name of automatically reducing cache line misses.

Spoken like someone who's never been forced to fix such code left behind by previous coders.

> About the only safe usage of structure memory layout that I can think of is offsetof, and even then, I think you could make offsetof's behaviour definable in the ABI to make it deterministic.

Sure, especially considering offsetof is defined by the standard and allowed to be implemented by the compiler, AFAIK. That said, you bring up a great point: you're talking about breaking decades-old ABIs if you were to enable such memory layout reordering by default. Good luck tracking down updated .libs, or original source to build .libs from, for every single one of your dependencies!


I'm not sure you know what "unportable" means. Every program compiled by a compliant compiler will work. "Unportable" doesn't apply to compilers which do the wrong thing; it applies to programs which assume something outside what is in the specification.


Actually, the C standard does not directly specify that. The only fact about structure layout that is explicitly required by the standard is that the address of the first field is the same as the address of the structure itself. Other constraints on structure layout can be deduced from requirements scattered through the standard, but nothing explicitly requires the in-memory field order to be the same as the declaration order.


"A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type"

I think "sequentially allocated" states that fact. What information do you mean it can be deduced from?


It seems that my memory does not serve me well :)

ISO C actually explicitly specifies that the order is the same, in §6.7.2.1.13.

The "deduced" remark was essentially intended to cover constraints that come from §6.2.6.1.2 ("...objects are composed of contiguous sequences of one or more bytes..."), i.e. fields cannot be (conceptually) outside &foo and ((char*)&foo)+sizeof(foo), which can be hard to implement on sufficiently weird architectures.


It would break things. If you have a structure that is part of a network transmission or file format, reordering the fields would read the wrong data. On top of that, C programmers are used to doing memory-hacking type things, like taking the address of fields, and in some cases may need to rely on the memory layout of a struct being exactly what they declared.


You can't use a struct in a portable way for sending directly on the network or writing/reading to disk. You must always properly serialize the data despite the fact that the _order_ of the elements is defined.
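
A minimal sketch of what "properly serialize" means here, using a hypothetical two-field message: write each field at an explicit offset, in an explicit byte order, instead of memcpy-ing the whole struct.

  #include <stddef.h>
  #include <stdint.h>

  struct msg {
      uint32_t id;
      uint16_t len;
  };

  static void put_u32be(uint8_t *p, uint32_t v) {
      p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
      p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
  }

  static void put_u16be(uint8_t *p, uint16_t v) {
      p[0] = (uint8_t)(v >> 8);  p[1] = (uint8_t)v;
  }

  /* The wire size (6 bytes) is fixed by the protocol, not by
     sizeof(struct msg), which may include padding. */
  size_t serialize_msg(uint8_t out[6], const struct msg *m) {
      put_u32be(out + 0, m->id);
      put_u16be(out + 4, m->len);
      return 6;
  }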


I would assume you can't safely do arithmetic within a structure, for instance. As an example, take a

  struct foo {
    char a;   /* offset 0 */
    char b;   /* offset 1 */
    char c;   /* offset 2 */
    char *p;  /* padded up to the next pointer-aligned offset */
    char d;
  };

A valgrind error that points at "0 bytes within an area of memory..." would signal an error originating somewhere in the struct, but who knows whether it's p or a.


I believe one of the reasons for adding "class" types in C++ was to break from C's restrictions on struct layout. But even then, the trouble of breaking binary compatibility was too much to pay for freeing programmers from minding their own data.


No, the same restrictions apply to class as to struct. The most relevant difference in C++ is that the compiler can reorder data members that are under different access specifiers (public, protected, private), but not within one of those blocks, to maintain compatibility with C.


Well you only have guaranteed compatibility with C if a class is standard-layout, which requires that all members have the same access control (public, protected, private). So if there is any rule in C++ that mandates ordering within a "block" when there are multiple such "blocks", it's not for compatibility with C.


This is what I said. Even though they could change the rules for the classes they did not because it's far from "free".


Funnily, this is one of the freedoms granted by the standard which none of gcc, clang, or MSVC actually make use of.


Oh, putting "class" in the first sentence was a mistake as people hit downvote without reading the rest. I do not say that "class" allows rearranging members (other than C++ access-based rearrangement that applies to structs as well). Really. Please read.


This is one of those classic things where we could actually improve C for performance. The standard is restrictive here: data that needs to be identical in all cases for network, file, etc. purposes should be marked with a special keyword, or, for legacy purposes, we mark the ones we want the compiler to order for us.

It's more complicated than just keeping the size down, though: you might want things to start on 64-, 128- or 256-bit boundaries as well, e.g. if your class contains aligned vector members and is itself aligned.
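
In C11 those larger boundaries can at least be requested explicitly. A sketch, assuming 256-bit (32-byte) vectors:

  #include <stdalign.h>

  /* Request 32-byte alignment so each array can be loaded as a single
     256-bit vector. Extended alignments like this are supported by the
     usual compilers but are technically implementation-defined. */
  struct particle_block {
      alignas(32) float position[8];
      alignas(32) float velocity[8];
      int count;
  };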


Because if the compiler reorders your fields under your feet, you won't be able to know how things are serialized. You lose control over the memory layout.


Starting with "the i7"? What microarchitecture do you mean?


The Nehalem microarchitecture made aligned and unaligned memory accesses have the same latency and throughput, except in weird edge cases like loads which span an L1 cache line boundary.


It used to be the case that unaligned memory accesses were two or more times slower because they literally required multiple bus cycles, but for x86 that hasn't been true since the Core 2 at least: http://lemire.me/blog/archives/2012/05/31/data-alignment-for...


Thanks.



