Hacker News new | past | comments | ask | show | jobs | submit login

Information theory to the rescue! Well, not really.

8 decimal digits takes 26.6 bits. An ordered list of 10^6 of these takes 3.17 MiB. The information contained in the ordering is lg(10^6!) ~= 2.20 MiB [0]. So as an unordered multiset, the information content is 0.96 MiB. It's at least theoretically possible to store the input in 1 MiB of working memory. But only just; in fact it's significant that the problem specifies 2^20 bytes, because for an SI megabyte (10^6 bytes), it wouldn't work.

I don't think it's actually possible though. The answers here don't do it. LZMA accomplishes nothing.

[0] Stirling's approximation in base 2: lg(n!) ~= n lg(n) - n/ln(2)




> The information contained in the ordering is lg(10^6!)

Can you point me to a resource that discusses the information theory behind this claim? I'm interested in learning more, but don't know what to search for.


You have N elements, so they have N! possible permutations. The ordering of the list is one such permutation; to describe one message out of a language of N! possible messages, it takes lg(N!) bits.


Interesting. Thanks for explaining. Are there ways to store (multi)sets that take advantage of this fact to reduce size? I can't envision a way to utilize this information meaningfully. It seems like any storage scheme would imply some ordering.


Yes, the other commenter ChuckMcM showed one

https://news.ycombinator.com/item?id=4680259

An ordering of a storage scheme doesn't always store information. E.g. if the list is sorted, the ordering is completely determined by the data -- it's redundant.

For storing integers, an idea to is store the pairwise differences between sorted elements. E.g. [30,10,20] -> [10-0,20-10,30-20] = [10,10,10]. If you have N integers of typical size X, the differences will be much smaller, typically on the scale of X/N. With variable-width integers, X/N takes lg(N) fewer bits to encode than X. So you save N lg(N) bits (asymptotically the same as lg(N!)), compared to storing the integers literally.


> Yes, the other commenter ChuckMcM showed one

I saw that comment. I'm not clear how that's not still storing ordered data, or rather how the order there is not storing information.

Obviously my information theory knowledge is weak.

> if the list is sorted, the ordering is completely determined by the data -- it's redundant.

I don't understand this. In what way is the information redundant? If there's 3.17 MB of data in the ordered list, and 2.2 MB of data in the ordering itself, the information stored in the ordering cannot be redundant, because that would mean >4.4MB of information is stored in the ordered list.


> So as an unordered multiset, the information content is 0.96 MiB

Could you please explain this more or point to other links?


The second answer does it.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: