You can get some pretty unbelievable performance gains out of a single writer and arrays of structs.
Bonus points if you figure out a way to have an array per type of struct pre-allocated with more elements than you will ever need. Even if you use a GC language you can almost eliminate collections with this approach.
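A minimal sketch of that idea, in Rust only because it is compact; the same shape carries over to a GC language with value-type structs, such as Go or C#. The `Order` struct, its fields, and the `Pool` type are hypothetical names made up for illustration. One pool per struct type is allocated once at its maximum size, and the single writer reuses the slots instead of allocating.

```rust
/// Hypothetical record type; the fields are placeholders.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
pub struct Order {
    pub id: u64,
    pub price: u32,
    pub qty: u32,
}

/// Fixed-capacity pool: one allocation up front, slots reused afterwards.
pub struct Pool<T> {
    slots: Vec<T>, // filled to capacity at construction, never grows
    len: usize,    // number of live slots
}

impl<T: Copy + Default> Pool<T> {
    /// Allocate every slot now, sized for the worst case.
    pub fn with_capacity(cap: usize) -> Self {
        Pool { slots: vec![T::default(); cap], len: 0 }
    }

    /// Called by the single writer only (enforced by `&mut self`).
    /// Returns the slot index, or None when the pool is full.
    pub fn push(&mut self, value: T) -> Option<usize> {
        if self.len == self.slots.len() {
            return None;
        }
        let idx = self.len;
        self.slots[idx] = value;
        self.len += 1;
        Some(idx)
    }

    /// Contiguous view of the live records.
    pub fn live(&self) -> &[T] {
        &self.slots[..self.len]
    }

    /// Make every slot reusable without freeing any memory.
    pub fn clear(&mut self) {
        self.len = 0;
    }

    pub fn capacity(&self) -> usize {
        self.slots.len()
    }
}
```

After construction nothing here allocates, so in a GC runtime the collector would have no new objects from this path to trace. Picking a capacity that is never exceeded is the hard part, and this sketch simply refuses the write when the pool is full.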
Even the array of structs is a non-ideal approach, as structs are usually viewed as a static collection of data.
But if you look at the hot loop, it usually boils down to a funnel - not unlike a furnace.
Lots of bulky raw material is gathered and passed through, to be condensed into a relatively small output.
So the ideal structure is a sort of union-struct that compresses the results down at each step of the algorithm, keeping it all in cache while keeping it slim.
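One way to read the union-struct idea, sketched in Rust with an enum (a tagged union). The stage names, the eight-sample input, and the `sum / max` score are all invented for illustration and are not from the comment above. Each slot starts as wide raw input and is overwritten in place by each smaller stage, so intermediate results never leave the slot.

```rust
/// Hypothetical three-stage funnel; every stage lives in the same slot.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Slot {
    Raw { samples: [f32; 8] },      // wide input
    Reduced { sum: f32, max: f32 }, // intermediate result
    Done { score: f32 },            // small final output
}

impl Slot {
    /// Advance this slot by one stage, overwriting it in place.
    pub fn step(&mut self) {
        *self = match *self {
            Slot::Raw { samples } => {
                let sum: f32 = samples.iter().sum();
                let max = samples.iter().copied().fold(f32::MIN, f32::max);
                Slot::Reduced { sum, max }
            }
            Slot::Reduced { sum, max } => Slot::Done { score: sum / max },
            done => done, // already finished, nothing to do
        };
    }
}

/// Hot loop: run each slot through every stage before touching the next,
/// so the slot being worked on stays in cache for the whole funnel.
pub fn run_funnel(slots: &mut [Slot]) {
    for s in slots.iter_mut() {
        while !matches!(*s, Slot::Done { .. }) {
            s.step();
        }
    }
}
```

Note that the slot stays as large as its widest variant, so this reuses memory rather than shrinking it. Getting the output physically slimmer would need a separate compaction pass into a smaller array, which this sketch leaves out.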