Interesting post, thanks. It certainly seems like llvm-mos' codegen is improving markedly, modulo some concerns raised about bloat. But I also note that a number of the performance recommendations (prefer globals, avoid passing structures, presumably because of the limited hardware stack) aren't exactly current programming convention; for that matter, they're exactly what you would do writing assembly by hand. These recommendations would hold for cc65 too, of course, but it still makes the point that using C to develop on the 6502 is just making the dog walk, not making it walk well.
The AtariAge results are fairly out of date; I think they're from even before we started doing whole-program zero-page allocation.
The only real 6502-specific C caveat left for llvm-mos is that you should strongly prefer structs of arrays to arrays of structs, and that's not even that 6502-specific. Otherwise, standard C gives fairly tight assembly.
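To make the struct-of-arrays point concrete, here is a minimal sketch of the two layouts. The names (`enemy`, `spawn`, `MAX_ENEMIES`, and so on) are made up for illustration and don't come from llvm-mos or any real project. The usual reasoning, stated as an assumption about the target rather than a claim about what any particular compiler emits: the 6502's index registers are 8 bits wide, so a byte array with fewer than 256 entries can in principle be read with a single absolute-indexed access, while an array of structs needs the index scaled by the struct size first.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ENEMIES 16 /* hypothetical count; small enough for an 8-bit index */

/* Array of structs: field address = base + i * sizeof(struct enemy) + offset. */
struct enemy {
    uint8_t x;
    uint8_t y;
    uint8_t hp;
};
static struct enemy enemies[MAX_ENEMIES];

/* Struct of arrays: field address = base + i, one byte array per field. */
static uint8_t enemy_x[MAX_ENEMIES];
static uint8_t enemy_y[MAX_ENEMIES];
static uint8_t enemy_hp[MAX_ENEMIES];

/* Store the same enemy in both layouts so the two can be compared. */
void spawn(uint8_t i, uint8_t x, uint8_t y, uint8_t hp) {
    assert(i < MAX_ENEMIES);
    enemies[i].x = x;
    enemies[i].y = y;
    enemies[i].hp = hp;
    enemy_x[i] = x;
    enemy_y[i] = y;
    enemy_hp[i] = hp;
}

/* Sum hit points by walking the array of structs (stride of 3 bytes). */
uint16_t total_hp_aos(void) {
    uint16_t total = 0;
    for (uint8_t i = 0; i < MAX_ENEMIES; ++i)
        total += enemies[i].hp;
    return total;
}

/* Sum hit points by walking one contiguous byte array (stride of 1 byte). */
uint16_t total_hp_soa(void) {
    uint16_t total = 0;
    for (uint8_t i = 0; i < MAX_ENEMIES; ++i)
        total += enemy_hp[i];
    return total;
}
```

Both functions return the same total; the only difference is the memory layout they walk. The same layout choice tends to help on larger machines too (cache lines, vectorization), which fits the remark that the advice isn't really 6502-specific.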
That being said, every couple hundred lines of generated assembly for any reasonably sized C program will contain at least one WTF, from a human point of view. Removing those WTFs one at a time is the long tail of a compiler engineer's work. Still, I'm not going anywhere!