Hacker News
Bit Twiddling Hacks (2005) (stanford.edu)
136 points by memorable on Oct 27, 2022 | 25 comments



My favourite bit of bit twiddling doesn't seem to appear here - how to enumerate all the subsets of bits set. Like so many bit-twiddling hacks, I first saw this in chess engines:

https://www.chessprogramming.org/Traversing_Subsets_of_a_Set


And as expected, this one is covered by Knuth in his "Bitwise Tricks and Techniques" section of TAOCP (pages 133–202 of Volume 4A, which start with "Now comes the fun part"). Specifically, this one is equation (84) on page 150, or page 18 of the draft (pre-fascicle 1a) here: https://cs.stanford.edu/~knuth/fasc1a.ps.gz

Note that, as he mentions, this method actually walks through the subsets of a set in "proper" order (increasing numerical order): the operation (x-X)&X finds the smallest x' > x that is a subset of X.

I had to work out an example in detail to understand it better: Suppose we're running through subsets of {2, 3, 5, 7}, represented by the bitset X=0b10101100:

    (76543210)
    …10101100
Suppose we're currently at the subset {2, 5}, represented by x = 0b100100:

       (543210)
    x = 100100
Then we want to go to the next subset {3, 5}. We can do this by:

• Taking (x|X̅), in this case:

            (9876543210)
    (x|X̅) = …1101110111
• Adding 1 to it:

              (9876543210)
    (x|X̅)+1 = …1101111000
    
• Then keeping only the ones that are still compatible with X:

                 (543210)
    ((x|X̅)+1)&X = 101000
With some algebra we can see that we get the same result by doing (x-X)&X.

Basically, "the next subset" means "find the smallest element not in the current subset, add it to the current subset, and remove everything smaller than it" which is exactly what's going on with the +1 and carries.
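The whole enumeration loop is tiny; here is a minimal Python sketch of it (the function name is mine, and Python's arbitrary-precision ints conveniently behave like infinite two's complement):

```python
def subsets(mask: int):
    """Yield every subset of the set bits of `mask`, in increasing
    numerical order, using the recurrence x' = (x - mask) & mask."""
    x = 0
    while True:
        yield x
        # Subtracting mask and re-masking is the algebraic shortcut for
        # ((x | ~mask) + 1) & mask worked through above.
        x = (x - mask) & mask
        if x == 0:  # wrapped around past the full set: done
            return
```

For example, `list(subsets(0b101))` gives `[0b000, 0b001, 0b100, 0b101]`.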


Wow, that's beautiful. On modern hardware, I would probably have just used the pext hammer, which would perform fine, but not with anything approaching the elegance of that solution.
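For the curious: the instruction that maps a plain counter onto the mask's bit positions is pdep (pext's "deposit" sibling). A software model, just to illustrate the idea:

```python
def pdep(val: int, mask: int) -> int:
    """Software model of the BMI2 PDEP instruction: scatter the low
    bits of `val` into the set bit positions of `mask`."""
    out = 0
    bit = 1
    while mask:
        low = mask & -mask      # isolate the lowest set bit of mask
        if val & bit:
            out |= low
        mask ^= low             # clear that bit and move on
        bit <<= 1
    return out
```

With popcount(mask) = k, the 2^k subsets of mask are pdep(i, mask) for i in range(2**k), already in increasing order.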


I could swear this site shut down for lack of hosting resources in early 2019. All I can find now in reference to this is a Reddit thread saying it was DoS'd around that time. I remember when I could access the site intermittently, they were displaying a banner saying they were going to take it offline permanently.

Wonderful to see things didn't work out that way, thanks for posting.


Oh, that would be a tragedy. Honestly it is up there with c2.com for me in the hours I've spent just marvelling at people's knowledge!


I used this trick to land my first Silicon Valley job when I was asked to generate the power set in an interview. It seemed to impress my interviewer a lot.


I love these kinds of things and use them in GPU programming, among other things. Things have changed in a variety of ways: population count and count-trailing-zeros are generally available as fast instructions now. Multiply is also now just as fast as other operations, so is not to be avoided.

A couple examples. [1] computes the sum of the number of bytes used of four consecutive segments of a bezier path - each segment can be lineto, quadto, or curveto, can be the end of a path or not, and be i16 or f32. 4 tag bytes are packed into a 32 bit word, it computes all these, then sums them together.

[2] linearizes a recursive subdivision into an iterative loop. The stack depth is represented as the number of zeros in a word, so pushing the stack is a left shift, and popping is a right shift. It turns out you want to pop multiple levels at once, and the number of levels is computed by countTrailingZeros. ([2] is experimental Rust code, but I will adapt this into a compute shader)

[1]: https://github.com/linebender/piet-gpu/blob/main/piet-wgsl/s...

[2]: https://github.com/linebender/kurbo/blob/euler/src/euler.rs#...
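The multi-level pop can be illustrated in isolation. This is not the linked Rust/WGSL code, just a toy sketch of the idea in Python: when walking the leaves of a perfect binary tree iteratively, the number of stack levels to ascend after each leaf falls out of one count-trailing-zeros:

```python
def trailing_zeros(x: int) -> int:
    """Count trailing zero bits; a single tzcnt/ctz instruction in hardware."""
    return (x & -x).bit_length() - 1

def leaf_pops(depth: int):
    """For each leaf of a perfect binary tree of `depth`, yield how many
    stack levels are popped after visiting it -- all at once, via ctz."""
    for leaf in range(1 << depth):
        # Trailing 1-bits of `leaf` mark right subtrees we just finished;
        # adding 1 turns that run into trailing zeros, so one ctz counts
        # every level we ascend in a single step.
        yield trailing_zeros(leaf + 1)
```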


Multiply is not as cheap as other arithmetic operations such as addition yet, though it has certainly gotten a lot cheaper (and many of these older bithack guides target CPUs that may not have a multiply at all).

As an example, contemporary Intel CPUs can do an addition in a single cycle (latency) but multiplies take 3. They can do 4 independent additions every cycle, but only one multiply.


That's true on CPUs, but I think the parent was talking about GPUs. AFAIK (I have not been able to get very much detailed information on GPU performance characteristics), GPUs tend to do better at such things, in part due to their greater width and lower clocks (e.g. trig functions in just a few clock cycles).

AVX512 may not have killed the GPU market, but consider SIMD on CPUs for flavour: float FMA and add have identical throughput on Intel.


If you like this, you'll love the book "Hacker's Delight": https://en.wikipedia.org/wiki/Hacker%27s_Delight



That book is pretty amazing. I have it, and use it as a reference, but it is one of those things where you don't know if a trick exists until you've read about it, because many of the optimizations are completely non-intuitive. I've tried reading the book cover to cover, and it is not a casual read. I mean, it's not Knuth's "Art of Computer Programming" series, but it is for serious embedded enthusiasts.

EDIT: I was just comparing the book and this site and the site is basically a subset.


I read it from cover to cover (well, I'm sure I skimmed some details) and it paid off very quickly-- multiple times in the following months I encountered problems that I knew the book had solutions to.

E.g., pages 246-247 have a Newton's-method-based modular inverse mod 2^n.
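The trick in question (this is my reconstruction of the standard technique, not a transcription of the book's code) is that for odd a, the Newton/Hensel step x ← x·(2 − a·x) doubles the number of correct low bits per iteration:

```python
def inv_mod_pow2(a: int, n: int) -> int:
    """Inverse of odd `a` modulo 2**n via Newton's iteration."""
    assert a & 1, "a must be odd"
    mask = (1 << n) - 1
    x = a           # a*a ≡ 1 (mod 8), so a is its own inverse to 3 bits
    bits = 3
    while bits < n:
        x = (x * (2 - a * x)) & mask  # doubles the correct low bits
        bits *= 2
    return x & mask
```

For example, inv_mod_pow2(3, 8) returns 171, and 3 * 171 = 513 ≡ 1 (mod 256).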


Here are two more (most of these websites link to each other):

http://aggregate.org/MAGIC/

https://www.inwap.com/pdp10/hbaker/hakmem/hakmem.html

I think this trend of publishing bit twiddling tricks started when Dr. Dobb's Journal in its early days published an article about various such tricks. If someone has a link to that article, could you please share it? I think the article's title was "Bit Magic".

Edit:

There were two DDJ articles:

"Binary Magic Numbers - Some Applications and Algorithms" by Edwin E. Freed in Volume 8, Number 4, April, 1983; and

"More on Binary Magic Numbers" by Dale Wilson from Volume 9, Number 3, March, 1984.


There are scans of DDJ on archive.org.

"Binary Magic Numbers":

https://archive.org/details/dr_dobbs_journal_vol_08/page/176...

"More on Binary Magic Numbers":

https://archive.org/details/dr_dobbs_journal_vol_09/page/202...


Great page. A few years ago I saw it posted here on HN and bookmarked it, and would play around with a few of the (simpler) examples every now and then.

Last year I was in an interview and something around writing a program to find out if an integer is a power of 2 came up. I remembered the example from the doc above and used that (you basically AND your integer with itself minus one, which makes sense because any power of 2 is going to be 1 followed by some zeroes...see below). I got the job!

   10000
  & 1111
  ______
       1


Hah. Back in the stone ages, I posed this question in my column at Dr. Dobb's Journal and I have never forgotten the best reply. The above algorithm in Forth is

DUP DUP NEGATE AND =


For a while I experimented with giving easy problems like that during 'phone' screens (usually over google meet, with a shared text editor link), and whatever their first answer was I always asked if they could think of another way to implement it. I liked the simple "is this odd/even?" question, and could prod for alternates by asking to use or not use a loop/modulus. My favorite alternate answer was someone doing the bit-and moral equivalent in base 10 by converting the number to a string and checking the last character.

I felt like it gave good bang-for-buck with respect to https://sites.google.com/site/steveyegge2/five-essential-pho... in that it takes hardly any time, it's probably about as good as fizzbuzz in catching out applicants who for whatever reason literally can't seem to program (only encountered two of those but that was only after someone else had screened them and passed them on), and it's an opportunity to show awareness of the 'bit nature' of these things (not very important to the work and lacking it isn't a sufficient reason to reject, but all else equal, I'd rather take someone with that awareness than someone without).
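For concreteness, the two flavours of answer look like this (names mine, in Python):

```python
def is_odd_bit(n: int) -> bool:
    """The bit answer: test the lowest bit."""
    return bool(n & 1)

def is_odd_str(n: int) -> bool:
    """The base-10 'moral equivalent': check the last decimal digit."""
    return str(abs(n))[-1] in "13579"
```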


> 10000

> & 1111

> ______

> 1

Wouldn't this result in 0 and not 1?


Missing a 'not'.

You would use `bool( nn and not (nn&(nn-1)) )`

Let's try for 15

    start 0b1111
    sans1 0b1110
    anded 0b1110
    not   0b0000
    
Now 8

    start 0b1000
    sans1 0b0111
    anded 0b0000
    not   0b0001
It's basically a claim that, in binary, only powers of two won't have any overlapping bits between the initial number and that number minus one.

Doesn't work for 0 so you have to special case it.


0 is correct. AND(bit1,bit2) returns 1 if and only if both input bits are 1.

AND'ing each bit of the two numbers yields only 0 bits:

   10000
  &01111
  ______
   00000


For all interested in the topic, there's a renowned classic of a cookbook: "Matters Computational" by Jörg Arndt.

Available freely: http://www.jjj.de/fxt/#fxtbook


In the game of Connect-4, bit twiddling allows amazing efficiency in board representation, board hashing, and win detection [1].

[1] https://github.com/denkspuren/BitboardC4/blob/master/Bitboar...
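The win-detection part boils down to a handful of shift-and-AND checks. A hedged sketch, assuming the usual 7-bits-per-column layout (6 playable rows plus a sentinel row, as in the linked write-up):

```python
def has_won(bb: int) -> bool:
    """Detect four-in-a-row on a single player's Connect-4 bitboard."""
    # Shift distances for vertical, horizontal, and the two diagonals
    # under the 7-bits-per-column encoding.
    for shift in (1, 7, 6, 8):
        pairs = bb & (bb >> shift)          # two-in-a-row markers
        if pairs & (pairs >> (2 * shift)):  # two pairs -> four in a row
            return True
    return False
```

A vertical four in the first column is bits 0..3, so has_won(0b1111) is True, while three in a row (0b0111) is not.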



A classic, this page continues to serve as a top-notch reference on obscure, historical programmer tricks.



