Hacker News new | past | comments | ask | show | jobs | submit login

Note that the range of integer values to consider is not specified. We don't know if they are signed and 32 or 64bit values.

The histogram is indeed the best algorithm and the one I would use. I call it the hash select. Note that it may be applied recursively, by narrowing the bins. Quick select is then in fact the hash select with only two bins.

Note also that since histogram filling is distributed, one needs to add up the histograms to find the one containing the median value. This is a good reason for not using histograms with 2^32 or 2^64 bins.

A 1024 bin histogram, would allow to find the median value of 32 bit integers in at most 4 iterations, and 64bit integers in at most 7 iterations.

The algorithm to identify the bin containing the median value is then very similar. One adds up the number of all lower values, until the bin containing the n/2 th value is found. This bin contains the median value.

Instead of adding up all histogram, which may be done pairwise (OlogN), one could do this for each bin, progressively, until the bin containing the median value is found. I guess there is a nice optimal and distributed algorithm to find out there. But we get the general picture.

Another optimization would be to adjust the histogram boundaries to fit the biggest and lowest value in the value set.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: