
> What am I missing? Clearly this cannot work because otherwise everyone would do this as we accept O(NlogN) as the norm!

Let's assume a perfect hash that never collides or leaves gaps, the best case for this.

For a radix sort in some base with K digits, there are only base^K representable keys, so N distinct keys can only go so high before you're forced into duplicates. Specifically, base^K >= N

Take the log (in that base) of both sides: K >= log(N)

So with arbitrarily large numbers, the time taken to sort N distinct keys is actually O(NlogN).

With fixed-size numbers it's O(K*N), but K (the digit count, fixed by the key size) is typically larger than logN would be.
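To make the K passes concrete, here is a minimal LSD radix sort sketch (Python assumed; the function name and defaults are just for illustration):

    def radix_sort(keys, key_bits=64, digit_bits=8):
        """Sort non-negative ints with K = key_bits / digit_bits passes."""
        base = 1 << digit_bits           # 256 buckets per pass
        mask = base - 1
        passes = key_bits // digit_bits  # K is fixed by the key size, not by N
        for p in range(passes):
            shift = p * digit_bits
            buckets = [[] for _ in range(base)]
            for k in keys:               # O(N) work per pass
                buckets[(k >> shift) & mask].append(k)
            # Stable concatenation preserves order among equal digits
            keys = [k for b in buckets for k in b]
        return keys

Each of the K passes touches all N keys once, hence O(K*N) overall.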




I see. Then Radix Sort was never really faster than traditional sorts if we go by Big-O notation alone, right?

If I understand your post correctly, Radix sorts in O(log(maxsize) * N) instead of O(NlogN). So why do people say Radix Sort is O(N)?


People say that because it's true, for a bounded key size, and in practice many radix sort implementations have a large fixed key size, so the log factor just doesn't come up.

E.g., if you use a 64-bit key with a default implementation, you support up to 2^64 distinct keys with your O(n) sort. No one has a machine with more than 2^64 elements today, so the sort is in practical terms very much like O(n).

OTOH the O(n log n) of comparison-based sorts is very real: the log term is in n, the number of inputs, and it is very obvious when you plot sort performance.

What is interesting, though, is that when you start optimizing radix sort in certain ways, e.g., eliminating passes when the high bits are all zero (in an LSD sort), or using an MSD sort that stops once a bucket shrinks below some threshold, the log n factor actually reappears: you often need ~log n passes to complete the sort rather than a fixed number like log_256(2^64) = 8 passes for 64-bit keys with 8-bit digits. The base of the logarithm is larger, though (2^digit_bits, e.g. 256), so the hidden constant factor is lower than in a comparison sort, where the base is generally 2.
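A rough sketch of the pass-elimination idea (same hypothetical Python as above, using the simple "largest key decides" skip rule): since N distinct keys force max(keys) >= N - 1, the surviving pass count is roughly log_256(N), which is exactly where the log n creeps back in.

    def radix_sort_skip(keys, digit_bits=8):
        """LSD radix sort that skips passes where every key's digit is zero."""
        if not keys:
            return keys
        base = 1 << digit_bits
        mask = base - 1
        # Only as many passes as the largest key actually needs:
        # ceil(bit_length / digit_bits) ~= log_base(max_key) passes.
        passes = -(-max(keys).bit_length() // digit_bits)
        for p in range(passes):
            shift = p * digit_bits
            buckets = [[] for _ in range(base)]
            for k in keys:
                buckets[(k >> shift) & mask].append(k)
            keys = [k for b in buckets for k in b]
        return keys

With 8-bit digits, a million distinct keys needs at least ceil(log_256(10^6)) = 3 passes instead of the fixed 8, and the required pass count keeps growing with log n.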


One answer is that if you allow duplicates, then it can handle larger numbers of items without any slowdown.

Another answer is that people are silly and constant vs. log factors aren't intuitive.



