Using Uninitialized Memory for Fun and Profit (swtch.com)
85 points by l0stman on Feb 28, 2010 | 16 comments



That data structure can actually be used for a log-structured database, where the sparse vector is the log file and the dense vector is the record index.

The index of the dense vector is then the record identifier, and the dense vector contains the offset of the valid record in the log database.

The record identifier is a handle to the record data. It can be reused when the record is deleted but will remain the same for the lifetime of the record.

The dense vector is very compact and may be stored and retrieved efficiently, even if stored in the log itself. The record key index stores the association between the key and the record identifier.

The GC or crash recovery process can easily locate valid records by checking if their offset matches the one found in the dense index.

So this data structure is not just of historical interest. Thanks for the link.
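For readers who haven't seen the trick, here is a minimal sketch of the sparse/dense pairing in C (identifiers are mine, not from the article; note that reading uninitialized malloc'd memory is formally undefined behavior in standard C, so treat this as a sketch of the idea rather than portable code):

```c
#include <stdlib.h>

/* Sparse set over the universe 0..n-1. dense[0..count-1] lists the members;
 * sparse[v] gives v's slot in dense. sparse[] is deliberately left
 * uninitialized: a garbage entry always fails the two-way check below. */
typedef struct {
    size_t *sparse;
    size_t *dense;
    size_t count;
    size_t n;
} SparseSet;

SparseSet *ss_new(size_t n) {
    SparseSet *s = malloc(sizeof *s);
    s->sparse = malloc(n * sizeof *s->sparse);   /* no memset: that's the point */
    s->dense = malloc(n * sizeof *s->dense);
    s->count = 0;
    s->n = n;
    return s;
}

int ss_has(const SparseSet *s, size_t v) {
    size_t i = s->sparse[v];                  /* may read garbage...          */
    return i < s->count && s->dense[i] == v;  /* ...which fails this check    */
}

void ss_add(SparseSet *s, size_t v) {
    if (v < s->n && !ss_has(s, v)) {
        s->dense[s->count] = v;
        s->sparse[v] = s->count++;
    }
}

void ss_clear(SparseSet *s) {
    s->count = 0;                             /* O(1): no zeroing pass */
}
```

Iterating the members is just a walk over dense[0..count-1], and the "does the offset match the dense index" check described above is exactly what ss_has does.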


An interesting implementation of this trick can be found in the LZF compression lib: http://oldhome.schmorp.de/marc/liblzf.html

Basically, the hash table used to match the current input against already-seen input is left uninitialized, since any candidate match is compared in full anyway to make sure there really is a match.
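Sketched in C (this is the shape of the idea, not the actual liblzf code; names and the hash function are mine):

```c
#include <string.h>

#define HSIZE (1 << 14)

/* htab[] holds candidate positions of previously seen 3-byte sequences and
 * is never initialized. A garbage or stale entry only costs one failed
 * comparison, never a wrong answer. */
static size_t hash3(const unsigned char *p) {
    return ((p[0] << 16 | p[1] << 8 | p[2]) * 2654435761u >> 12) % HSIZE;
}

/* Look for an earlier occurrence of the 3 bytes at buf+pos (caller must
 * ensure pos+3 <= buffer length). Returns its position, or (size_t)-1. */
size_t find_match(const unsigned char *buf, size_t pos, size_t htab[HSIZE]) {
    size_t h = hash3(buf + pos);
    size_t cand = htab[h];   /* possibly uninitialized garbage */
    htab[h] = pos;           /* remember this position for later lookups */
    if (cand < pos && memcmp(buf + cand, buf + pos, 3) == 0)
        return cand;         /* verified by real comparison, so garbage is harmless */
    return (size_t)-1;
}
```

The `cand < pos` guard keeps even garbage values in bounds, and the memcmp does the actual verification.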


malloc + memset takes about .24 seconds per gig on my machine. So be sure that your dataset is really, really sparse before you try this.
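A rough way to reproduce that measurement yourself (numbers vary widely by machine, allocator, and whether the pages are already faulted in):

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Time malloc + memset of n bytes, returning elapsed CPU seconds,
 * or -1.0 if the allocation fails. */
double time_zeroing(size_t n) {
    unsigned char *p = malloc(n);
    if (!p)
        return -1.0;
    clock_t t0 = clock();
    memset(p, 0, n);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    free(p);
    return secs;
}
```

For example, `time_zeroing((size_t)1 << 30)` gives the per-GiB figure on your machine.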


I'm not sure if your comment suggests that .24 seconds is fast enough not to worry about it, or not fast enough.

In my time at IBM doing very performance oriented C development, saying that a server operation only took .24 seconds would get you laughed out of the development meeting as the operations were designed to take 12 microseconds.


The article is about latency; if you use its techniques and your dataset is not sparse, you will use more time and space overall. But, you'll have your first item in memory sooner.

There are few operations faster than writing a 0 to a memory address.


Writing 0 to a register is, and that is what this is trying to compete with in practice. It's an approach that scales beyond 32 or 64 members for certain kinds of operations.


Allocating and initializing a huge array of memory at start-up can take a second, and subsequent server operations take 12 microseconds. Nobody's talking about allocating a gig of memory every time a server gets a request.


One thing to consider is that we also weren't using small development boxes. These were Power7 zSeries servers with usually 16 cores (8 in hardware) and up to 32 GB of RAM. You would be surprised what these machines are capable of. Of course, I'm not saying I can allocate a gig in .24 seconds and set it to zero, but I'm not saying it's not possible either (I simply don't know, as I no longer have access to these machines).


Well, why not? I'm sure you could put a gig of space to use calculating an extended social network or what not. 0.2 seconds could be the difference between expanding use up to a couple of gig per request vs rewriting from scratch or not implementing the feature altogether.


Also for people new to C programming, don't forget about calloc!


It's worth noting that on modern OSes, memory freshly allocated from the OS is forcibly zeroed before your process sees it. So if you're allocating a really large sparse set (large enough that it won't come from existing free heap RAM) and don't intend to clear it a lot, sparse sets won't save any time, and calloc (which can skip the explicit zeroing when it's getting new pages from the OS) will probably be a better idea.
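The two idioms side by side (whether calloc actually skips the zeroing depends on the allocator and the OS; either way the caller sees zeroed memory):

```c
#include <stdlib.h>
#include <string.h>

/* Two ways to get n zeroed bytes. For a large n served by fresh pages from
 * the OS, calloc can often skip the explicit zeroing (the kernel hands back
 * pre-zeroed pages); malloc + memset always pays a full pass over the buffer. */
unsigned char *zeroed_via_calloc(size_t n) {
    return calloc(n, 1);
}

unsigned char *zeroed_via_memset(size_t n) {
    unsigned char *p = malloc(n);
    if (p)
        memset(p, 0, n);
    return p;
}
```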


.24 seconds isn't fast at all.


Per gig. How long will it take to do the rest of the processing to fill a 1G hashtable?


This is a bitset we're talking about, not a hash map of any kind.


Liveness analysis - the motivating example - is typically done with a set containing one element per local variable. Most practical routines have fewer than 32 or 64 variables, though, so in practice a single 32-bit or 64-bit word is better. Also, you often have to compute other set operations, such as union, intersection, and difference, rather than just membership and iteration.
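For that common small case, the whole set fits in one machine word and every operation is a single instruction or two (a sketch; names are mine, and v must be below 64):

```c
#include <stdint.h>

/* One 64-bit word as a set of variables 0..63. */
typedef uint64_t VarSet;

static inline VarSet vs_add(VarSet s, unsigned v)  { return s | (UINT64_C(1) << v); }
static inline int    vs_has(VarSet s, unsigned v)  { return (s >> v) & 1; }
static inline VarSet vs_union(VarSet a, VarSet b)  { return a | b; }   /* set union        */
static inline VarSet vs_inter(VarSet a, VarSet b)  { return a & b; }   /* set intersection */
static inline VarSet vs_diff(VarSet a, VarSet b)   { return a & ~b; }  /* set difference   */
```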


You still have to do something for the functions with many local variables. In those cases you have little choice but to allow all N variables into the set, even if comparatively few are in any particular set at one time.

Sparse sets are also a great way to implement NFA state sets, where again you have a large number of possible set members but most sets are small, and you don't want to pay the O(all possible states) cost over and over.



