A discussion of systems programming optimization that doesn't start with single-writer ring buffers starts on the wrong foot.
Those other tricks are excellent, and I use all of them where they apply. But, e.g., seeking a way not to need a lock at all should come before discovering a low-contention locking protocol.
Readers should note that packing spare bits into the bottom bits of suitably aligned pointers is more portable than using high bits. Any page-aligned pointer has at least 12 bits free at the bottom, and any malloc'd pointer has at least 2, more often 4.
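[A minimal sketch of low-bit tagging, assuming a pointee type aligned to at least 4 bytes; the function names are illustrative:]

```c++
#include <cassert>
#include <cstdint>

// Stash a 2-bit tag in the low bits of a pointer whose type guarantees
// at least 4-byte alignment, so those bits are always zero to begin with.
template <typename T>
uintptr_t tag_ptr(T* p, unsigned tag) {
    static_assert(alignof(T) >= 4, "need two spare low bits");
    assert(tag < 4);
    return reinterpret_cast<uintptr_t>(p) | tag;
}

template <typename T>
T* untag_ptr(uintptr_t v) {
    return reinterpret_cast<T*>(v & ~uintptr_t{3});  // clear the tag bits
}

unsigned tag_of(uintptr_t v) { return unsigned(v & 3); }
```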
Counters into a power-of-2 sized ring buffer can be incremented without bound, enabling ordinary arithmetic on them, with the high bits masked off cheaply on each use. [But use 64 bits!]
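[A sketch of such a ring, single-writer and single-reader, with 64-bit counters that only ever increase; the names are mine:]

```c++
#include <atomic>
#include <cstddef>
#include <cstdint>

// Head and tail count forever; 64 bits will not wrap in practice.
// Because N is a power of 2, indexing is a single cheap mask, and
// full/empty tests are ordinary subtraction on the raw counters.
template <typename T, std::size_t N>
struct SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of 2");
    T slots[N];
    std::atomic<uint64_t> head{0};  // next slot to write (writer owns)
    std::atomic<uint64_t> tail{0};  // next slot to read (reader owns)

    bool push(const T& v) {  // called only by the writer thread
        uint64_t h = head.load(std::memory_order_relaxed);
        if (h - tail.load(std::memory_order_acquire) == N) return false;  // full
        slots[h & (N - 1)] = v;          // mask high bits only on use
        head.store(h + 1, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {  // called only by the reader thread
        uint64_t t = tail.load(std::memory_order_relaxed);
        if (head.load(std::memory_order_acquire) == t) return false;  // empty
        out = slots[t & (N - 1)];
        tail.store(t + 1, std::memory_order_release);
        return true;
    }
};
```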
Probably the most neglected primitive data structure is the bitmapped set. A `uint32_t` gives you a universe of 32 elements; a byte is enough for the days of the week. The native popcount instruction is very valuable here, usually spelled `__builtin_popcount` in source code. C++98's `std::bitset` provides Standard, portable access to it via `count()`, and C++20 offers `std::popcount` directly.
[I add here that storing things in the high bits of addresses is very likely to fail on systems with ASLR, and that I have learned that MSVC's `std::bitset` has a very slow popcount.]
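[To make the days-of-the-week example concrete, a tiny sketch using C++20's `std::popcount`; the bit assignments are arbitrary:]

```c++
#include <bit>      // std::popcount (C++20)
#include <cstdint>

enum : uint8_t { Mon = 1 << 0, Tue = 1 << 1, Wed = 1 << 2, Thu = 1 << 3,
                 Fri = 1 << 4, Sat = 1 << 5, Sun = 1 << 6 };

uint8_t open_days = Mon | Wed | Fri;             // build the set
bool open_wed    = (open_days & Wed) != 0;       // membership test
int  days_open   = std::popcount(open_days);     // cardinality: 3
```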
std::bitset is one. But you can just use an unsigned int, or an array of them. In C++ or C, set intersection is &, union is |, complement is ~. Cardinality is __builtin_popcount, or std::bitset<>::count(). The membership test for m is (s & (1u << m)). For a larger set, represented as an array of unsigned, it's (s[m>>5] & (1u << (m & 0x1f))). std::bitset encapsulates all this, optimally. There are similarly efficient implementations of many other useful operations: consult HAKMEM.
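[A sketch of the array-of-words representation just described, for a universe of 256 elements; `Set256` and its members are illustrative names, and `__builtin_popcount` assumes GCC or Clang:]

```c++
#include <cstdint>

struct Set256 {
    uint32_t w[8] = {};  // 8 x 32 bits; element m lives at word m>>5, bit m&0x1f

    void insert(unsigned m)         { w[m >> 5] |= 1u << (m & 0x1f); }
    bool contains(unsigned m) const { return w[m >> 5] & (1u << (m & 0x1f)); }

    Set256 operator&(const Set256& o) const {  // intersection, word by word
        Set256 r;
        for (int i = 0; i < 8; ++i) r.w[i] = w[i] & o.w[i];
        return r;
    }
    int count() const {  // cardinality
        int n = 0;
        for (uint32_t x : w) n += __builtin_popcount(x);
        return n;
    }
};
// std::bitset<256> packages the same operations, portably.
```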
Bitsets are useful in more places than you might guess, because they can filter out, typically with a single instruction, the majority of uninteresting cases in a search, leaving fewer cases to be examined with more precise and expensive operations. You record interesting properties of each element of a collection, at insertion time, as bits in a word stored alongside it (or, better, in an index), and then check those bits first during any search. It is easy to speed up an amazing variety of programs by an order of magnitude, sometimes much more, with a tiny change.
For example, you can store an int value for each of a (possibly very large) collection of lines of text, representing the set of less-common letters that appear in the line. When searching for lines that contain a given string, you first check each line's set against the string's set: any line that doesn't have all of the string's letters certainly doesn't contain the string.
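[A sketch of that filter, assuming for simplicity a bit per lowercase letter rather than only the less-common ones; all names are illustrative:]

```c++
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// One bit per letter 'a'..'z', computed once per line at insertion time.
uint32_t letter_set(std::string_view s) {
    uint32_t bits = 0;
    for (char c : s)
        if (c >= 'a' && c <= 'z') bits |= 1u << (c - 'a');
    return bits;
}

struct Line { std::string text; uint32_t bits; };  // bits = letter_set(text)

std::vector<const Line*> find_lines(const std::vector<Line>& lines,
                                    std::string_view needle) {
    std::vector<const Line*> hits;
    const uint32_t need = letter_set(needle);
    for (const Line& ln : lines)
        if ((ln.bits & need) == need &&                  // cheap filter first
            ln.text.find(needle) != std::string::npos)   // precise, costly check
            hits.push_back(&ln);
    return hits;
}
```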
Leibniz (inventor of calculus) used this method very heavily in his own work. Before computers--and even into the 1960s--it was the most important way of automating data processing. Back then, you used cards that had a row of either holes or notches along one edge, and selected matching cards from a stack by inserting a rod through the stack at a selected spot, and lifting.
>Leibniz (inventor of calculus) used this method very heavily in his own work. Before computers--and even into the 1960s--it was the most important way of automating data processing. Back then, you used cards that had a row of either holes or notches along one edge, and selected matching cards from a stack by inserting a rod through the stack at a selected spot, and lifting.
Hey, do you have a reference for that? I've been doing some research into Leibniz's calculators, and I've been finding few sources.
There is quite a lot about Leibniz's calculating machine designs on Wikipedia.
I think I found out about Leibniz's bitwise activities in Neal Stephenson's Quicksilver, but he invented the modern notions of both sets and digital logic, according to Wikipedia. He would have used the cards in cataloguing libraries.
Yup, I used bit sets to write a super duper fast Sudoku solver. For example, you can detect naked singles extremely fast by just and-ing the bitsets for a given row, column, and house, taking the complement, and checking whether exactly one bit is set.
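[A sketch of that test, assuming the masks track digits already placed, so a cell's candidates are the complement of their union (by De Morgan, the same as ANDing the complements):]

```c++
#include <cstdint>

constexpr uint32_t ALL = 0x1FF;  // bits 0..8 stand for digits 1..9

// Candidates for a cell, given the used-digit masks of its row, column, and box.
uint32_t candidates(uint32_t row_used, uint32_t col_used, uint32_t box_used) {
    return ~(row_used | col_used | box_used) & ALL;
}

// A naked single is a candidate set with exactly one bit set.
bool naked_single(uint32_t cands, int& digit) {
    if (cands != 0 && (cands & (cands - 1)) == 0) {  // power of two?
        digit = __builtin_ctz(cands) + 1;            // index of the lone bit
        return true;
    }
    return false;
}
```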
But bitsets used to be essential for automating chess. A 64-bit word represents a board, and a set of words represents the positions of each type of piece: one word for white pawns, etc. Another word shows the places threatened by the white pawns, and ORing the threat words for all the pieces shows every threatened place.
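[A sketch of the pawn-threat word, assuming one common mapping, bit 0 = a1 up through bit 63 = h8; the file masks stop shifts from wrapping between files:]

```c++
#include <cstdint>

constexpr uint64_t FILE_A = 0x0101010101010101ULL;
constexpr uint64_t FILE_H = 0x8080808080808080ULL;

// Squares attacked by white pawns: each pawn threatens up-left and up-right.
uint64_t white_pawn_attacks(uint64_t white_pawns) {
    return ((white_pawns << 7) & ~FILE_H)   // up-left; a-file wraps to h: mask it off
         | ((white_pawns << 9) & ~FILE_A);  // up-right; h-file wraps to a: mask it off
}
```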