Thanks! This gives a general picture. Can you also explain the bit twiddling bit...

to3m · on Nov 22, 2016

Roughly speaking, find index of most significant set bit. (So 255 gives you 7, 254 gives you 7, 256 gives you 8, etc.) At a guess, they are comparing things 8/16 bytes at a time, and getting a packed set of byte flags where bytes are set to 0xff to indicate inequality (so, imagining it 4 bytes at a time, comparing 0x12345678 and 0x12125656 would give you 0x00ff00ff) and they want to see where the first different byte is. In this example, the index is 24. (Then: 4-24/8=1. So the first different byte is at offset 1.)

As to how it works: you use the v&-v trick to clear all but the top bit of your comparison result. Because you've only got 1 bit set, this value is a power of two; if you multiply another value by this one, you're getting the equivalent of a left shift by that power (which is relevant - you need to be thinking in shifts rather than multiplies).

Because of the way the predicate results are formed, you only have 5 possible values you're going to get here: 2^31, 2^23, 2^15, 2^7, or 0. The outputs these correspond to:

    31 -> 0
    23 -> 1
    15 -> 2
    7  -> 3
    0  -> (special case - all identical)

So you need a value that when shifted left 31 bits can be easily turned into 0, when shifted left 23 can be turned into 1, and so on.

A suitable value for this is something like 3<<22|2<<14|1<<6. (If I've made a mistake there, which I probably have, hopefully the working will make it obvious how to fix it.) Recalling that multiplying by 2^N gives you a left shift by N:

When multiplied by 2^31, you get 0.

When multiplied by 2^23, you get 1<<29.

When multiplied by 2^15, you get 2<<29|1<<21.

When multiplied by 2^7, you get 3<<29|2<<21|1<<13.

In all cases, you can shift the result right 29 bits (unsigned shift, or signed shift and mask with 3 afterwards) and get the value you're looking for: 0, 1, 2, or 3.

Zero you can check for before doing the multiply; or you could pop a 4 in at the bottom or something so that the result would be 4 in that case. (This is what I figured you'd want originally and that's why the result starts at bit 29. But actually I guess you'd probably check first.)

This extends naturally to wider comparisons. I think they're doing it 64 bits at a time in this case.

Exercise for the reader: (maybe these are too easy?? - however they weren't instantly obvious to me)

- Why is x * 2^N equivalent to x<<N?

- What if you have more than 1 bit set? Like x * (2^N+2^M)?

- Figure out some rules for generating useful constants like these

al2o3cr · on Nov 22, 2016

Highly recommend the book "Hacker's Delight" by Henry S. Warren, Jr for more bit hax0ring like this.

naasking · on Nov 22, 2016

Or check this reference:

https://graphics.stanford.edu/~seander/bithacks.html