One trick that's not mentioned is unbundling the struct. Suppose you have a stru...

tokenrove · on Jan 1, 2014

Just a note, this is sometimes called SOA (Structure of Arrays) or AOS -> SOA refactoring and comes up a lot when trying to take advantage of vector operations, too.

_delirium · on Jan 1, 2014

It's interesting this is making a revival in C for efficiency. When I was first learning to code ('90s), this style of "parallel arrays" was seen as a scripting-language wart. When you semantically want an array-of-struct in a bash script, you resort to parallel arrays instead, because it's what you can do easily.

beagle3 · on Jan 2, 2014

> When I was first learning to code ('90s), this style of "parallel arrays" was seen as a scripting-language wart.

When I first learned to program in the 80s, this was known as "Fortran order" (as opposed to Pascal or C order), and in the later nineties, I think that "column major" and "row major" became popular, and they still are.

This goes back to sometimes in the 60's, with APL and Fortran doing column major on purpose, BASIC by accident, and most of the rest doing Row major.

dspillett · on Jan 2, 2014

> this style of "parallel arrays" was seen as a scripting-language wart

It was only a wart because they didn't support compound types at all (and in some cases they did support compound types but people were not taught them when moving to a better language and kept their old techniques). It is compounded by the fact most scripting languages are untyped, so structure packing (the main justification for using arrays like that in more modern languages (for general memory use, stack size, or cache-hit optimisations)) isn't nearly as relevant.

Basically if you see parallel arrays used without a comment stating that it is done for packing reasons or similar, it is there either because the language does not support compound structures well or because the programmer knew no better. In a language where structure packing can be useful, it is a perfectly valid optimisation (though for small data not in tight loops, it may be an optimisation too far).

dllthomas · on Jan 2, 2014

It's also worth noting that it comes at a cost of locality for an individual row, so isn't always an optimization.

danieldk · on Jan 1, 2014

In fact, in Java this is the only way to simulate structs as POD rather than paying for the overhead of an array of objects.

wging · on Jan 1, 2014

Sorry, POD?

brendano · on Jan 1, 2014

plain old data

wging · on Jan 5, 2014

Ah, thanks.

finnw · on Jan 2, 2014

Not the only way. You can interleave fields in a ByteBuffer.

rcfox · on Jan 2, 2014

It's also great for minimizing cache misses if you find yourself iterating over arrays of large structs and just operating on one field from each of them.

slavak · on Jan 2, 2014

The ISPC[1] language has syntax for requesting the compiler perform this transformation for you, giving you the performance benefits for free without requiring manually writing out the code or even paying the price in syntactic clarity (you can still write out p[0].x+p[1].x instead of px[0]+px[1]).

[1] http://ispc.github.io/

PurplePanda · on Jan 2, 2014

Some lisp systems implemented pairs this way. A global array of CARs and a global array of CDRs

evincarofautumn · on Jan 1, 2014

Working with unzipped structures (xs+ys instead of xys) can be a bit of a pain. It seems like a transformation that a compiler should be able to do easily, given enough information about structure and (ugh) aliasing. I wish C had real array types. :(

carterschonwald · on Jan 1, 2014

Happily, in Haskell the standard array library does the struct of arrays format for unboxed arrays. It's super handy!

evincarofautumn · on Jan 2, 2014

I thought the array package was deprecated. Its mutable API is nicer than vector’s, though, if I recall correctly.

Kutta · on Jan 2, 2014

Vector does the SOA represetation too.

carterschonwald · on Jan 3, 2014

yup, Vector is the one i have mind. (Vector is also an array lib)