Hacker News new | past | comments | ask | show | jobs | submit login

You can even send the code lengths with at worst one bit per symbol, by sending the number of internal nodes at each level of a complete binary tree structure, this can be a concatenation of unary numbers and there's n-1 internal nodes for a complete binary tree of n leaves.

Your example isn't a complete/perfect tree, one of the length 4 symbols could be 3, for level sequence 1,1,1,2. Converted to internal node counts (excluding root) this is 1,1,1 and could be encoded 0,0,0 (commas for illustration only). That's a little boring, another example is:

    A = 00
    B = 01
    C = 10
    D = 110
    E = 111
level sequence 0,3,2, internal node counts 2,1, encoded as 10,0.

I'm not sure if this is well known, I thought I'd discovered it but there's an equivalent coding in Narimani and Khosravifard 2008, "The supertree of the compact codes."

A. You can avoid encoding the number of leaves by ending the string with impossible sequence 0,11, that is 1 internal node branching into at least 3 at the next level.

B. Or, you can use this as a 1-to-1 mapping between binary strings and canonical trees. If you read 2m-1 1s after a row with m internal nodes, you know this row has 2m so the final 0 isn't needed. (e.g. 2,1 would just be 1,0 instead of 10,0). This can't be combined with A since there are no longer impossible sequences, but at most (on a tree with all symbols the same length) it only saves lg(n) bits, so if you want to save bits you're better off with A.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: