Very clever but replacing O(1) with O(log n) for updates is probably worse than the benefit gained by O(log n) range-sum instead of O(n) when the updates are happening constantly, in real-time - think Twitter Firehose. In practice, it would make a lot more sense to pre-define the buckets and update those when doing insert/update/delete. Even 1m buckets is just 1-4MB of data, depending on how many counts do you want (2^8 - 2^32). That would give you O(1) update and O(k) range-sum where k = number of buckets. You want k to be smaller than n otherwise you're back to the naive implementation.
Fenwick trees were invented for context modeling with arithmetic coding. Reads and writes are perfectly balanced. For every symbol, there's one write to update the model and one read to determine the cumulative probability.
A Fenwick Binary Indexed Tree allows you to calculate
the sum of elements in an array for any given index
range in O(Log(N)) time
I can modify a B-tree so each branch node keeps track of the sum total of values of nodes under it and the absolute number of nodes under it. It's pretty easy to find the sum of elements within a range, the range of elements that equal a given sum and so-forth. All O(log(n)).
It's not just B-trees (and really, it would be simpler in a binary tree instead of a B-tree). In general, you can get those same asymptotic times with any balanced tree that stores at each node its number of children and their sum. The term for this kind of data structure is a monoid-cached tree, and it's a broadly useful pattern for doing fast incremental updates in a lot of different situations. Guy Steele has a few good examples in this talk (slides linked below the video):
Obviously, B-trees have many other uses. But, i) Fenwick algorithm is more efficient when it comes to the hard numbers (the k is small), ii) it is very easy to implement.
The implementation of that b-tree is actually pretty difficult (I've attempted in the past, but it depends on your skill of course) and the constant factor is much larger. (It's somewhat difficult to implement because when you remove a node, tricky things happen and you need to keep the sub-tree sizes consistent). Granted, whatever floats your boat!
Sure B-trees are hard to write from the ground up but there are plenty of good implementations already the Internet that can be modified.
The extra functionality is a simple enough addition to any binary-tree-type data structure that I was annoyed that Boost/Qt didn't have it already included.
As far as I can tell, this is almost the exact same idea, but makes some optimizations by flattening the tree into an array ahead of time (since we know every leaf will be present) and changing the bookkeeping slightly.
But the method I give is pretty generic rather than special-purpose. Such a generic method have both the advantage that they can be extended relatively easily and the advantage that you can understand what's happening relatively easily, allowing easier debugging.