> As someone who's worked on old Macs and has also done lots of 32 -> 64-bit porting, this is the sort of trick that works wonderfully...until it doesn't. And then you've got a nightmare on your hands.
That's why you hide the trick behind a zero-cost abstraction that checks at compile time whether the platform supports it.
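Roughly what that looks like, as a minimal sketch (the `TaggedPtr` name, the tag width, and the asserts are my own illustration, not something from this thread): a wrapper that only compiles when the pointee's alignment actually leaves the low bits free.

```cpp
#include <cassert>
#include <cstdint>

// Stores a small tag in the low bits of an aligned pointer.
// The static_assert is the compile-time "does this platform/type
// actually leave those bits free?" check.
template <typename T, unsigned TagBits = 2>
class TaggedPtr {
    static_assert(alignof(T) >= (1u << TagBits),
                  "alignment of T does not leave enough low bits for the tag");
    static constexpr std::uintptr_t kTagMask =
        (std::uintptr_t{1} << TagBits) - 1;

    std::uintptr_t bits_ = 0;

public:
    TaggedPtr(T* p, unsigned tag) {
        // Sanity checks in debug builds: the pointer really is aligned
        // and the tag really fits.
        assert((reinterpret_cast<std::uintptr_t>(p) & kTagMask) == 0);
        assert(tag <= kTagMask);
        bits_ = reinterpret_cast<std::uintptr_t>(p) | tag;
    }

    T*       ptr() const { return reinterpret_cast<T*>(bits_ & ~kTagMask); }
    unsigned tag() const { return static_cast<unsigned>(bits_ & kTagMask); }
};
```

On a target where alignof(T) leaves enough low bits, this is just an integer plus a couple of mask operations; on one where it doesn't, the static_assert fires instead of silently corrupting pointers.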
Modern optimizing C++ compilers (especially with Link Time Optimization enabled) are pretty amazing and can very often collapse that abstraction away entirely.
But, of course, always measure.
While it's true that modern compilers are wondrous things, checking whether they're clever enough to optimize away a particular construct - and to do so correctly, and to continue doing so in the next release - still takes time. If the same optimization can be done at a higher level, such that it will apply for any correct (but not necessarily clever) compiler, that's preferable. In my experience that's possible practically all the time. The best compiler optimizations IMO are the ones that can't be easily done at the source level.
Yeah, but an even more important systems programming trick is to measure what you're doing every time. Performance optimization at this level is never about just trusting tools. If you aren't willing to read the generated machine code and check perf counters, you aren't really going to get much benefit.
And if you're willing to check your optimizations by reading disassembly, tricks like stuffing tag bits into the bottom of aligned pointer values are pretty routine.
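For reference, the un-wrapped version of the trick is just a mask in each direction; when you do read the disassembly, each helper should show up as a single AND or OR on a typical x86-64 or AArch64 build. The `Node` type and helper names here are illustrative, not from this thread.

```cpp
#include <cstdint>

struct Node { alignas(8) int payload; };  // 8-byte alignment frees the low 3 bits

inline std::uintptr_t tag_ptr(Node* p, unsigned tag) {
    // OR the tag into the low bits that alignment guarantees are zero.
    return reinterpret_cast<std::uintptr_t>(p) | (tag & 0x7u);
}

inline Node* untag_ptr(std::uintptr_t v) {
    // AND the tag back out to recover the original pointer.
    return reinterpret_cast<Node*>(v & ~std::uintptr_t{0x7});
}
```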