Hacker News
Bit Twiddling Hacks (2005) (stanford.edu)
136 points by memorable on Oct 27, 2022 | 25 comments



My favourite bit of bit twiddling doesn't seem to appear here - how to enumerate all the subsets of bits set. Like so many bit-twiddling hacks, I first saw this in chess engines:

https://www.chessprogramming.org/Traversing_Subsets_of_a_Set


And as expected, this one is covered by Knuth in his "Bitwise Tricks and Techniques" section of TAOCP (pages 133–202 of Volume 4A, which start with "Now comes the fun part"). Specifically, this one is equation (84) on page 150, or page 18 of the draft (pre-fascicle 1a) here: https://cs.stanford.edu/~knuth/fasc1a.ps.gz

Note that, as he mentions, this method actually walks through the subsets of a set in "proper" order (increasing numerical order): the operation (x-X)&X finds the smallest x' > x that is a subset of X.

I had to work out an example in detail to understand it better: Suppose we're running through subsets of {2, 3, 5, 7}, represented by the bitset X=0b10101100:

    (76543210)
    …10101100
Suppose we're currently at the subset {2, 5}, represented by x = 0b100100:

       (543210)
    x = 100100
Then we want to go to the next subset {3, 5}. We can do this by:

• Taking (x|X̅), in this case:

            (9876543210)
    (x|X̅) = …1101110111
• Adding 1 to it:

              (9876543210)
    (x|X̅)+1 = …1101111000
    
• Then keeping only the ones that are still compatible with X:

                 (543210)
    ((x|X̅)+1)&X = 101000
With some algebra we can see that we get the same result by doing (x-X)&X.

Basically, "the next subset" means "find the smallest element not in the current subset, add it to the current subset, and remove everything smaller than it" which is exactly what's going on with the +1 and carries.
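The whole enumeration loop is tiny; here is a minimal Python sketch of it (the function name is mine, and Python's arbitrary-precision ints conveniently behave like infinite two's complement):

```python
def subsets(mask: int):
    """Yield every subset of the set bits of `mask`, in increasing
    numerical order, using the recurrence x' = (x - mask) & mask."""
    x = 0
    while True:
        yield x
        # Subtracting mask and re-masking is the algebraic shortcut for
        # ((x | ~mask) + 1) & mask worked through above.
        x = (x - mask) & mask
        if x == 0:  # wrapped around past the full set: done
            return
```

For example, `list(subsets(0b101))` gives `[0b000, 0b001, 0b100, 0b101]`.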


Wow, that's beautiful. On modern hardware, I would probably have just used the pext hammer, which would perform fine, but not with anything approaching the elegance of that solution.
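For the curious: the instruction that maps a plain counter onto the mask's bit positions is pdep (pext's "deposit" sibling). A software model, just to illustrate the idea:

```python
def pdep(val: int, mask: int) -> int:
    """Software model of the BMI2 PDEP instruction: scatter the low
    bits of `val` into the set bit positions of `mask`."""
    out = 0
    bit = 1
    while mask:
        low = mask & -mask      # isolate the lowest set bit of mask
        if val & bit:
            out |= low
        mask ^= low             # clear that bit and move on
        bit <<= 1
    return out
```

With popcount(mask) = k, the 2^k subsets of mask are pdep(i, mask) for i in range(2**k), already in increasing order.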


I could swear this site shut down for lack of hosting resources in early 2019. All I can find now in reference to this is a Reddit thread saying it was DoS'd around that time. I remember when I could access the site intermittently, they were displaying a banner saying they were going to take it offline permanently.

Wonderful to see things didn't work out that way, thanks for posting.


Oh, that would be a tragedy. Honestly it is up there with c2.com for me in the hours I've spent just marvelling at people's knowledge!


I used this trick to land my first Silicon Valley job when I was asked to generate the power set in an interview. It seemed to impress my interviewer a lot.


I love these kinds of things and use them in GPU programming, among other things. Things have changed in a variety of ways: population count and count-trailing-zeros are generally available as fast instructions now. Multiply is also now just as fast as other operations, so is not to be avoided.

A couple examples. [1] computes the sum of the number of bytes used of four consecutive segments of a bezier path - each segment can be lineto, quadto, or curveto, can be the end of a path or not, and be i16 or f32. 4 tag bytes are packed into a 32 bit word, it computes all these, then sums them together.

[2] linearizes a recursive subdivision into an iterative loop. The stack depth is represented as the number of zeros in a word, so pushing the stack is a left shift, and popping is a right shift. It turns out you want to pop multiple levels at once, and the number of levels is computed by countTrailingZeros. ([2] is experimental Rust code, but I will adapt this into a compute shader)

[1]: https://github.com/linebender/piet-gpu/blob/main/piet-wgsl/s...

[2]: https://github.com/linebender/kurbo/blob/euler/src/euler.rs#...
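The multi-level pop can be illustrated in isolation. This is not the linked Rust/WGSL code, just a toy sketch of the idea in Python: when walking the leaves of a perfect binary tree iteratively, the number of stack levels to ascend after each leaf falls out of one count-trailing-zeros:

```python
def trailing_zeros(x: int) -> int:
    """Count trailing zero bits; a single tzcnt/ctz instruction in hardware."""
    return (x & -x).bit_length() - 1

def leaf_pops(depth: int):
    """For each leaf of a perfect binary tree of `depth`, yield how many
    stack levels are popped after visiting it -- all at once, via ctz."""
    for leaf in range(1 << depth):
        # Trailing 1-bits of `leaf` mark right subtrees we just finished;
        # adding 1 turns that run into trailing zeros, so one ctz counts
        # every level we ascend in a single step.
        yield trailing_zeros(leaf + 1)
```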


Multiply is not as cheap as other arithmetic operations such as addition yet, though it has certainly gotten a lot cheaper (and many of these older bithack guides target CPUs that may not have a multiply at all).

As an example, contemporary Intel CPUs can do an addition in a single cycle (latency) but multiplies take 3. They can do 4 independent additions every cycle, but only one multiply.


That's true on CPUs, but I think the parent was talking about GPUs. AFAIK (I have not been able to get very much detailed information on GPU performance characteristics), GPUs tend to do better at such things, in part due to their greater width and lower clocks (e.g. trig functions in just a few clock cycles).

AVX512 may not have killed the GPU market, but consider SIMD on CPUs for flavour: float FMA and add have identical throughput on Intel.


If you like this, you'll love the book "Hacker's Delight": https://en.wikipedia.org/wiki/Hacker%27s_Delight



That book is pretty amazing. I have it, and use it as a reference, but it is one of those things where you don't know if a trick exists until you've read about it, because many of the optimizations are completely non-intuitive. I've tried reading the book cover to cover, and it is not a casual read. I mean, it's not Knuth's "Art of Computer Programming" series, but it is for serious embedded enthusiasts.

EDIT: I was just comparing the book and this site and the site is basically a subset.


I read it from cover to cover (well, I'm sure I skimmed some details) and it paid off very quickly-- multiple times in the following months I encountered problems that I knew the book had solutions to.

E.g., pages 246-247 have a Newton's-method-based modular inverse mod 2^n.
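The trick in question (this is my reconstruction of the standard technique, not a transcription of the book's code) is that for odd a, the Newton/Hensel step x ← x·(2 − a·x) doubles the number of correct low bits per iteration:

```python
def inv_mod_pow2(a: int, n: int) -> int:
    """Inverse of odd `a` modulo 2**n via Newton's iteration."""
    assert a & 1, "a must be odd"
    mask = (1 << n) - 1
    x = a           # a*a ≡ 1 (mod 8), so a is its own inverse to 3 bits
    bits = 3
    while bits < n:
        x = (x * (2 - a * x)) & mask  # doubles the correct low bits
        bits *= 2
    return x & mask
```

For example, inv_mod_pow2(3, 8) returns 171, and 3 * 171 = 513 ≡ 1 (mod 256).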


Here are two more (most of these websites link to each other):

http://aggregate.org/MAGIC/

https://www.inwap.com/pdp10/hbaker/hakmem/hakmem.html

I think this trend of publishing bit twiddling tricks started when Dr. Dobb's Journal in its early days published an article about various such tricks. If someone has a link to that article, could you please share it? I think the article's title was "Bit Magic".

Edit:

There were two DDJ articles:

"Binary Magic Numbers - Some Applications and Algorithms" by Edwin E. Freed in Volume 8, Number 4, April, 1983; and

"More on Binary Magic Numbers" by Dale Wilson from Volume 9, Number 3, March, 1984.


There are scans of DDJ on archive.org.

"Binary Magic Numbers":

https://archive.org/details/dr_dobbs_journal_vol_08/page/176...

"More on Binary Magic Numbers":

https://archive.org/details/dr_dobbs_journal_vol_09/page/202...


Great page. A few years ago I saw it posted here on HN and bookmarked it, and would play around with a few of the (simpler) examples every now and then.

Last year I was in an interview and something around writing a program to find out if an integer is a power of 2 came up. I remembered the example from the doc above and used that (you basically AND your integer with itself minus one, which makes sense because any power of 2 is going to be 1 followed by some zeroes...see below). I got the job!

   10000
  & 1111
  ______
       1


Hah. Back in the stone ages, I posed this question in my column at Dr. Dobb's Journal and I have never forgotten the best reply. The above algorithm in Forth is

DUP DUP NEGATE AND =


For a while I experimented with giving easy problems like that during 'phone' screens (usually over google meet, with a shared text editor link), and whatever their first answer was I always asked if they could think of another way to implement it. I liked the simple "is this odd/even?" question, and could prod for alternates by asking to use or not use a loop/modulus. My favorite alternate answer was someone doing the bit-and moral equivalent in base 10 by converting the number to a string and checking the last character.

I felt like it gave good bang-for-buck with respect to https://sites.google.com/site/steveyegge2/five-essential-pho... in that it takes hardly any time, it's probably about as good as fizzbuzz in catching out applicants who for whatever reason literally can't seem to program (only encountered two of those but that was only after someone else had screened them and passed them on), and it's an opportunity to show awareness of the 'bit nature' of these things (not very important to the work and lacking it isn't a sufficient reason to reject, but all else equal, I'd rather take someone with that awareness than someone without).
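For concreteness, the two flavours of answer look like this (names mine, in Python):

```python
def is_odd_bit(n: int) -> bool:
    """The bit answer: test the lowest bit."""
    return bool(n & 1)

def is_odd_str(n: int) -> bool:
    """The base-10 'moral equivalent': check the last decimal digit."""
    return str(abs(n))[-1] in "13579"
```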


> 10000

> & 1111

> ______

> 1

Wouldn't this result in 0 and not 1?


Missing a 'not'.

You would use `bool( nn and not (nn&(nn-1)) )`

Let's try for 15

    start 0b1111
    sans1 0b1110
    anded 0b1110
    not   0b0000
    
Now 8

    start 0b1000
    sans1 0b0111
    anded 0b0000
    not   0b0001
It's basically a claim that, in binary, only powers of two won't have any overlapping bits between the initial number and that number minus one.

Doesn't work for 0 so you have to special case it.


0 is correct. AND(bit1,bit2) returns 1 if and only if both input bits are 1.

AND'ing each bit of the two numbers yields only 0 bits:

   10000
  &01111
  ______
   00000


For all interested in the topic, there's a renowned classic of a cookbook: "Matters Computational" by Jörg Arndt.

Available freely: http://www.jjj.de/fxt/#fxtbook


In the game of Connect-4, bit twiddling allows amazing efficiency in board representation, board hashing, and win detection [1].

[1] https://github.com/denkspuren/BitboardC4/blob/master/Bitboar...
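The win-detection part boils down to a handful of shift-and-AND checks. A hedged sketch, assuming the usual 7-bits-per-column layout (6 playable rows plus a sentinel row, as in the linked write-up):

```python
def has_won(bb: int) -> bool:
    """Detect four-in-a-row on a single player's Connect-4 bitboard."""
    # Shift distances for vertical, horizontal, and the two diagonals
    # under the 7-bits-per-column encoding.
    for shift in (1, 7, 6, 8):
        pairs = bb & (bb >> shift)          # two-in-a-row markers
        if pairs & (pairs >> (2 * shift)):  # two pairs -> four in a row
            return True
    return False
```

A vertical four in the first column is bits 0..3, so has_won(0b1111) is True, while three in a row (0b0111) is not.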



A classic, this page continues to serve as a top-notch reference on obscure, historical programmer tricks.



