Hacker News new | past | comments | ask | show | jobs | submit login

One trick that's not mentioned is unbundling the struct. Suppose you have a struct with a pointer and a character in it, and a huge array of those structs. If you resent the padding tax, refactor your code to use and pass around two arrays instead, one of pointers and the other of chars.



Just a note, this is sometimes called SOA (Structure of Arrays) or AOS -> SOA refactoring and comes up a lot when trying to take advantage of vector operations, too.


It's interesting this is making a revival in C for efficiency. When I was first learning to code ('90s), this style of "parallel arrays" was seen as a scripting-language wart. When you semantically want an array-of-struct in a bash script, you resort to parallel arrays instead, because it's what you can do easily.


> When I was first learning to code ('90s), this style of "parallel arrays" was seen as a scripting-language wart.

When I first learned to program in the 80s, this was known as "Fortran order" (as opposed to Pascal or C order), and in the later nineties, I think that "column major" and "row major" became popular, and they still are.

This goes back to sometimes in the 60's, with APL and Fortran doing column major on purpose, BASIC by accident, and most of the rest doing Row major.


> this style of "parallel arrays" was seen as a scripting-language wart

It was only a wart because they didn't support compound types at all (and in some cases they did support compound types but people were not taught them when moving to a better language and kept their old techniques). It is compounded by the fact most scripting languages are untyped, so structure packing (the main justification for using arrays like that in more modern languages (for general memory use, stack size, or cache-hit optimisations)) isn't nearly as relevant.

Basically if you see parallel arrays used without a comment stating that it is done for packing reasons or similar, it is there either because the language does not support compound structures well or because the programmer knew no better. In a language where structure packing can be useful, it is a perfectly valid optimisation (though for small data not in tight loops, it may be an optimisation too far).


It's also worth noting that it comes at a cost of locality for an individual row, so isn't always an optimization.


In fact, in Java this is the only way to simulate structs as POD rather than paying for the overhead of an array of objects.


Sorry, POD?


plain old data


Ah, thanks.


Not the only way. You can interleave fields in a ByteBuffer.


It's also great for minimizing cache misses if you find yourself iterating over arrays of large structs and just operating on one field from each of them.


The ISPC[1] language has syntax for requesting the compiler perform this transformation for you, giving you the performance benefits for free without requiring manually writing out the code or even paying the price in syntactic clarity (you can still write out p[0].x+p[1].x instead of px[0]+px[1]).

[1] http://ispc.github.io/


Some lisp systems implemented pairs this way. A global array of CARs and a global array of CDRs


Working with unzipped structures (xs+ys instead of xys) can be a bit of a pain. It seems like a transformation that a compiler should be able to do easily, given enough information about structure and (ugh) aliasing. I wish C had real array types. :(


Happily, in Haskell the standard array library does the struct of arrays format for unboxed arrays. It's super handy!


I thought the array package was deprecated. Its mutable API is nicer than vector’s, though, if I recall correctly.


Vector does the SOA represetation too.


yup, Vector is the one i have mind. (Vector is also an array lib)




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: