Due to multilevel caches I doubt this would be so applicable to modern systems - the index-set representation takes several times more memory and naturally requires lots of random accesses, which are bad for caches. In some ways, it's no longer a time-space tradeoff: smaller is faster.
The 'magic' here is that you can iterate over the data quickly because it lives in 'dense'. In dense cases 'm' = 'n' and we lose. The apparently faster clear is just because the length ('n') is stored as an auxiliary value, which is not unique to this scheme. As pointed out before, the extra logic will break vectorization and result in poor performance.
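For anyone who hasn't read the article: here's a minimal sketch of the scheme being discussed (two arrays plus a length; the names `dense`, `sparse`, `n`, `m` follow the usual presentation, the class wrapper is mine):

```python
class SparseSet:
    """Briggs/Torczon-style sparse set over the universe 0..m-1."""

    def __init__(self, m):
        self.dense = [0] * m    # members, packed at the front
        self.sparse = [0] * m   # sparse[v] = index of v in dense (may be stale garbage)
        self.n = 0              # current number of members

    def contains(self, v):
        i = self.sparse[v]
        # A stale/garbage index fails this round-trip check, so uninitialised
        # contents of sparse[] never affect correctness.
        return i < self.n and self.dense[i] == v

    def add(self, v):
        if not self.contains(v):
            self.dense[self.n] = v
            self.sparse[v] = self.n
            self.n += 1

    def clear(self):
        self.n = 0              # O(1): just forget the length

    def __iter__(self):
        return iter(self.dense[:self.n])  # only live members, densely packed
```

Iteration touches only the first `n` slots of `dense`, which is where the win over a bit vector comes from when `n` is much smaller than `m`.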
You can't just assert "we win" or "we lose"; you need to measure the different possible implementations for your particular application. In the paper I wrote with Linda Torczon (cited by Russ Cox), we did exactly that. In that application (a graph-coloring register allocator), we were very sparse (n was significantly less than m) and we cleared the working set quite often - the win was pretty significant.
If your universe has 2^64 elements, you don't want to use bit vectors or this scheme due to giant memory consumption. Instead, you'll probably want to use a hash table.
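To put rough numbers on that: a per-universe array is simply not allocatable at 2^64, while a hash table's footprint scales with the members actually present (a small illustrative sketch):

```python
universe_bits = 64
bitvec_bytes = 2 ** universe_bits // 8   # ~2.3e18 bytes (~2 EiB): not allocatable
# The sparse-set scheme is even worse: two word-sized arrays over the universe.

# A hash table stores only what's actually in the set:
members = {2 ** 63 + 1, 42, 2 ** 40}
assert 2 ** 63 + 1 in members
assert 43 not in members
```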
One data structure can't be the best for every situation. It's why they pay engineers the big ducks [sic].
Python uses a similar trick in its internal allocator (pymalloc) [1]. Given a memory address, pymalloc determines whether it controls that address by computing what would be the base of the arena containing it. It then reads the arena's index (or trash, if the memory isn't pymalloc's) at a constant offset from that base, and uses the index to check against a vector of all the arenas it controls.
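A toy model of that check (the real thing is C in CPython's obmalloc.c; the arena size, the dict-as-address-space, and the "index at the base" layout here are all simplifications of mine). The point of similarity to the sparse set is that the index read may be garbage, and a round-trip check validates it:

```python
ARENA_SIZE = 256 * 1024          # illustrative; the key property is it's a power of two

memory = {}                      # toy address space: address -> stored word
arenas = []                      # index -> base address of each controlled arena

def new_arena(base):
    memory[base] = len(arenas)   # arena stores its own index at a known offset
    arenas.append(base)

def address_in_range(addr):
    base = addr & ~(ARENA_SIZE - 1)      # would-be base of the containing arena
    idx = memory.get(base, -1)           # may be trash if the memory isn't ours
    return 0 <= idx < len(arenas) and arenas[idx] == base
```

As with `sparse`/`dense`, a garbage index can't produce a false positive because it has to point back at an arena whose recorded base matches.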
This is designed for a fast clear, but doesn't actually have constant time clear, it's just O(n) clear with very small (amortized) constant. It's a bit more cache friendly, though, I think, and gave us some nice speedups in practice.
I think I have seen this method used in a data structure called something like a 'B.* set' or 'B.* index', but I can't remember the exact name. Does anyone have a clue as to what I am (mis)remembering?
Update: I remembered that the algorithm had something to do with checking if an item is not in a set, which quickly led me to Bloom Filter - not particularly similar, as it turns out.
I didn't read all that, but it seems like the issue there is that, if the compiler sees you are reading from uninitialised memory, it can effectively 'choose' a value for the contents of that memory to suit its optimisation purposes. Is that right?
If so, then this isn't actually a problem in the case of sparse sets. With sparse sets we don't care what is in the uninitialised memory, and if the compiler wants to get tricky and choose values for it (which I don't think actually applies in this context), that doesn't change the correctness of the data structure.
> “Xoring an uninitialized variable with whatever other source of entropy you already have cannot hurt”, the conventional thinking goes. Conventional thinking is wrong. Your typical modern compiler deletes the code that gathers the original entropy, since it is only going to be xored with an uninitialized variable.
The sparse array in the OP can't be optimized in this way at compile time.
Also, initialising the space doesn't take so long; REP STOS on x86 is fast since it can write an entire cache line at once: http://ptspts.blogspot.ca/2014/10/on-speed-of-memset.html