I've been bitten by reordering. In my case, the toolchain developers implemented the reordering step in the assembler as an extra optimization step (on by default of course), so I had to disassemble the binary to even find the problem. They had redefined the assembly language semantics to require "volatile" keywords wherever you needed ordering maintained. I turned that particular optimization off.