Instead of repurposing top bits you can also repurpose the Bits beyond alignment...

eschneider · on Sept 20, 2019

As someone who's worked on old Macs and has also done lots of 32 -> 64-bit porting, this is the sort of trick that works wonderfully...until it doesn't. And then you've got a nightmare on your hands.

I'm not saying never do that (ok, maybe I am...) But definitely think long and hard about how long your code will be around before you do it.

jcelerier · on Sept 20, 2019

> As someone who's worked on old Macs and has also done lots of 32 -> 64-bit porting, this is the sort of trick that works wonderfully...until it doesn't. And then you've got a nightmare on your hands.

That's why you hide the trick behind a zero-cost abstraction which checks at compile-time if the platform supports this

notacoward · on Sept 20, 2019

One of my favorite system programming tricks is to never believe that a "zero cost abstraction" lives up to the name.

jjuhl · on Sept 20, 2019

Modern optimizing C++ compilers (especially with Link Time Optimization enabled) are pretty amazing and can very often actually achieve that abstraction collapsing.. But, of course, always measure.

notacoward · on Sept 20, 2019

While it's true that modern compilers are wondrous things, checking whether they're clever enough to optimize away a particular construct - and to do so correctly, and to continue doing so in the next release - still takes time. If the same optimization can be done at a higher level, such that it will apply for any correct (but not necessarily clever) compiler, that's preferable. In my experience that's practically all the time. The best compiler optimizations IMO are the ones that can't be easily done at the source level.

hinkley · on Sept 20, 2019

Sometimes programmer advice needs disclaimers like prescription medicines.

† Offer only valid in the contiguous United States. Offer cannot be used in combination with any other offers. See stores for details.

‡ When not in use, return Happy Fun Ball™ to its protective lead case.

ajross · on Sept 20, 2019

Yeah, but an even more important system programming trick is to measure what you're doing every time. Performance optimization at this level is never about just trusting tools. If you aren't willing to be reading generated machine code and checking perf counters, you aren't really going to get much benefit.

And if you're willing to check your optimizations by reading disassembly, tricks like stuffing tag bits into the bottom of aligned pointer values is pretty routine.

saagarjha · on Sept 20, 2019

The abstraction necessary to make tagged pointers platform-dependent is quite small and trivially removed by any optimizing compiler.

vardump · on Sept 20, 2019

Yeah, don't believe in dogmas, measure. Zero cost abstractions sometimes are anything but.

hinkley · on Sept 20, 2019

And sometimes the surprises can be in the other direction.

Cache thrashing is a thing. And perf analysis tools have bookkeeping errors.

BeeOnRope · on Sept 20, 2019

Those failures were all about top bits, not low bits though weren't they?

What problems did lower bit use cause? You always had to mask it away even on the old boxes, no?

astrobe_ · on Sept 20, 2019

I'm curious, when does it fail?

groby_b · on Sept 20, 2019

That does not hold for Intel x86 architecture chips, which are perfectly happy to have unaligned integers.

As for struct members, the alignment is (of course) "implementation defined", which is the fancy way of throwing up your hands and saying "whatever". (Since C++11 we actually have alignas(), which at least gives manual control)

tynorf · on Sept 20, 2019

Many (most? all?) allocators are guaranteed to give you back aligned pointers.

For instance, jemalloc(3) on amd64 platforms will give back 16-byte aligned pointers:

  [...] The allocated space is suitably aligned
  (after possible pointer coercion) for storage of any type of object.

astrobe_ · on Sept 20, 2019

One can construct an aligned pointer on a chunk of allocated memory by asking for the needed quantity plus the size of the alignment and the "nudge" the pointer you get to the next word boundary (and you can do something with the "wasted" byte before it, like storing the size of the object).

But that sort of trick is only worth it if you are writing a compiler and its runtime, an interpreter, a memory allocator (in particular GCs) or at the very last some sort of high performance library (you would return your "featured pointer" as an opaque type to the user).