Interesting.. I guess I should have mentioned gcc 4.5.2 on a Xeon X5670, 2.93GHz, which is 12M cache. Changing it to ('0' + j % 10) has no change in overall speed for me.
OK, I just tested with gcc 4.6.0, and unless I've screwed something up, it looks like (at -O4) it actually optimizes these into the exact same code. As in the generated ELFs are byte-for-byte identical. Impressive.
Well, if you really want to go apples-to-oranges and specialize as much as possible for the sequentially increasing case we're dealing with here, you could just do your arithmetic directly on the ASCII string itself, complete with a little ripple-carry loop...I bet that'd be fast. If I'm remembering correctly, I think x86 actually has dedicated instructions for ASCII arithmetic (though they're probably so unoptimized in a modern microarchitecture it'd probably be faster to avoid them).