One trick that's not mentioned is unbundling the struct. Suppose you have a struct with a pointer and a character in it, and a huge array of those structs. If you resent the padding tax, refactor your code to use and pass around two arrays instead, one of pointers and the other of chars.
Just a note, this is sometimes called SOA (Structure of Arrays) or AOS -> SOA refactoring and comes up a lot when trying to take advantage of vector operations, too.
It's interesting this is making a revival in C for efficiency. When I was first learning to code ('90s), this style of "parallel arrays" was seen as a scripting-language wart. When you semantically want an array-of-struct in a bash script, you resort to parallel arrays instead, because it's what you can do easily.
> When I was first learning to code ('90s), this style of "parallel arrays" was seen as a scripting-language wart.
When I first learned to program in the 80s, this was known as "Fortran order" (as opposed to Pascal or C order), and in the later nineties, I think that "column major" and "row major" became popular, and they still are.
This goes back to sometimes in the 60's, with APL and Fortran doing column major on purpose, BASIC by accident, and most of the rest doing Row major.
> this style of "parallel arrays" was seen as a scripting-language wart
It was only a wart because they didn't support compound types at all (and in some cases they did support compound types but people were not taught them when moving to a better language and kept their old techniques). It is compounded by the fact most scripting languages are untyped, so structure packing (the main justification for using arrays like that in more modern languages (for general memory use, stack size, or cache-hit optimisations)) isn't nearly as relevant.
Basically if you see parallel arrays used without a comment stating that it is done for packing reasons or similar, it is there either because the language does not support compound structures well or because the programmer knew no better. In a language where structure packing can be useful, it is a perfectly valid optimisation (though for small data not in tight loops, it may be an optimisation too far).
It's also great for minimizing cache misses if you find yourself iterating over arrays of large structs and just operating on one field from each of them.
The ISPC[1] language has syntax for requesting the compiler perform this transformation for you, giving you the performance benefits for free without requiring manually writing out the code or even paying the price in syntactic clarity (you can still write out p[0].x+p[1].x instead of px[0]+px[1]).
Working with unzipped structures (xs+ys instead of xys) can be a bit of a pain. It seems like a transformation that a compiler should be able to do easily, given enough information about structure and (ugh) aliasing. I wish C had real array types. :(