Little-Known Awesome Algorithms: Fenwick Range Trees

chime · on Sept 16, 2012

Very clever but replacing O(1) with O(log n) for updates is probably worse than the benefit gained by O(log n) range-sum instead of O(n) when the updates are happening constantly, in real-time - think Twitter Firehose. In practice, it would make a lot more sense to pre-define the buckets and update those when doing insert/update/delete. Even 1m buckets is just 1-4MB of data, depending on how many counts do you want (2^8 - 2^32). That would give you O(1) update and O(k) range-sum where k = number of buckets. You want k to be smaller than n otherwise you're back to the naive implementation.

psykotic · on Sept 16, 2012

Fenwick trees were invented for context modeling with arithmetic coding. Reads and writes are perfectly balanced. For every symbol, there's one write to update the model and one read to determine the cumulative probability.

randartie · on Sept 16, 2012

You're right, this technique is useful when the ratio of reads to writes is just right!

joe_the_user · on Sept 16, 2012

Hmm...

    A Fenwick Binary Indexed Tree allows you to calculate 
    the sum of elements in an array for any given index 
    range in O(Log(N)) time

I can modify a B-tree so each branch node keeps track of the sum total of values of nodes under it and the absolute number of nodes under it. It's pretty easy to find the sum of elements within a range, the range of elements that equal a given sum and so-forth. All O(log(n)).

Why the weirdo data structure?

pjscott · on Sept 16, 2012

It's not just B-trees (and really, it would be simpler in a binary tree instead of a B-tree). In general, you can get those same asymptotic times with any balanced tree that stores at each node its number of children and their sum. The term for this kind of data structure is a monoid-cached tree, and it's a broadly useful pattern for doing fast incremental updates in a lot of different situations. Guy Steele has a few good examples in this talk (slides linked below the video):

http://vimeo.com/6624203

utopkara · on Sept 16, 2012

Obviously, B-trees have many other uses. But, i) Fenwick algorithm is more efficient when it comes to the hard numbers (the k is small), ii) it is very easy to implement.

randartie · on Sept 16, 2012

The implementation of that b-tree is actually pretty difficult (I've attempted in the past, but it depends on your skill of course) and the constant factor is much larger. (It's somewhat difficult to implement because when you remove a node, tricky things happen and you need to keep the sub-tree sizes consistent). Granted, whatever floats your boat!

digitalnalogika · on Sept 16, 2012

I used this structure (although I knew it by totally different name) a lot in time constrained contests, the code is very short and easy to implement.

Also:

  idx -= (1 << ctz(idx));

I used to write this as:

  idx -= idx&-idx;

joe_the_user · on Sept 16, 2012

Sure B-trees are hard to write from the ground up but there are plenty of good implementations already the Internet that can be modified.

The extra functionality is a simple enough addition to any binary-tree-type data structure that I was annoyed that Boost/Qt didn't have it already included.

zvrba · on Sept 16, 2012

> Sure B-trees are hard to write from the ground up but there are plenty of good implementations already the Internet that can be modified.

Care to name some, also having a non-restrictive license? (That excludes GPL).

joe_the_user · on Sept 16, 2012

Try

http://www.touc.org/btree.html

This c++ template is fairly simple and I have tested it a reasonable amount.

bo1024 · on Sept 16, 2012

As far as I can tell, this is almost the exact same idea, but makes some optimizations by flattening the tree into an array ahead of time (since we know every leaf will be present) and changing the bookkeeping slightly.

joe_the_user · on Sept 16, 2012

I would imagine...

But the method I give is pretty generic rather than special-purpose. Such a generic method have both the advantage that they can be extended relatively easily and the advantage that you can understand what's happening relatively easily, allowing easier debugging.

chengsun · on Sept 16, 2012

This useful data structure confusingly goes by several different names.

Binary indexed tree: https://community.topcoder.com/tc?module=Static&d1=tutor...

Cumulative frequency table (which interestingly enough Simon Tatham claims to have independently invented): http://www.chiark.greenend.org.uk/~sgtatham/algorithms/cumul...