
Okay, that makes sense. As the author I intentionally write beginner material with some slop to convey the intuition rather than exact lemmas. This is not what you're looking for and that's fine.

I will still keep your criticism in the back of my head and be more wary of sweeping generalisations going forward. Thanks.

It would be nice if someone thought of all edge cases and wrote a formally correct treatment, though! (The statistician's version rather than the practitioner's version, I suppose.)




I'll just leave a final comment: if you restrict yourself to the arithmetic mean, then you can use Cantelli's inequality to make some claims about the distance between the expectation and the median of a random variable in a way that only depends on the variance/st.dev.

See: https://en.wikipedia.org/wiki/Chebyshev%27s_inequality#Cante...
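To make that concrete: one well-known consequence of Cantelli's inequality is that the mean and the median can be at most one standard deviation apart, i.e. |mean - median| <= sigma. The snippet below is only a sketch that checks this on a made-up, heavily skewed sample (the numbers are an assumption chosen for illustration). It treats the sample as its own distribution, so it uses the population standard deviation (`statistics.pstdev`), not the sample one.

```python
import statistics

# Hypothetical, deliberately skewed sample (illustration only).
data = [1.0, 2.0, 3.0, 4.0, 100.0]

mean = statistics.mean(data)
median = statistics.median(data)
# Population st.dev. of the empirical distribution (divides by n, not n - 1).
sigma = statistics.pstdev(data)

gap = abs(mean - median)
print(f"mean={mean}, median={median}, sigma={sigma:.2f}, |mean-median|={gap}")

# The bound says the gap cannot exceed one standard deviation.
bound_holds = gap <= sigma
print("bound holds:", bound_holds)
```

Note that the bound is loose here (a gap of 19 against a sigma of about 39), and it says nothing about any average other than the arithmetic mean.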

On the other hand, you do not actually know the (population) expectation or (population) variance: you can only estimate them, given some samples (and, quite often, they can be undefined/unbounded).

Also, as I was trying to demonstrate in my previous comment, most "averages" are poor estimators for the expectation of a random variable (compared to the arithmetic sample mean), the same way that min(data) or max(data) are poor estimators for the expectation of a random variable, so it seems a bit "dangerous" to make such a general broad claim (again, in my humble opinion).


I was not aware you were the author. I apologize if anything in my delivery came across as harsh.

I would just suggest considering whether the claim that "any (sample) average is a rough approximation of the (population) median" is necessary in your exposition (particularly as it is stated).

Given this is supposed to be "beginner material", it would seem important not to say something that can mislead beginners and give them an incorrect intuition about "averages" (in my humble opinion). Note that adding the "but only for 'stable' distributions" caveat doesn't really solve things, since that term is not clearly defined and beginners would certainly not know what it means a priori.

I know this may come across as pedantic or nitpicky, but I would really like you to understand why such a general statement, technically, cannot possibly be true (unless you really stretch the meaning of "roughly"). When I read what is written, I in fact see two claims (marked between curly braces):

> A statistic known as “average” is intentionally {designed to fall in the middle of the range}. {Roughly half of your measurements will be above average, and the other half below it}.

The first claim suggests that any average approximates the "midrange" (i.e., 0.5*(max(data)+min(data)), a point that minimizes the L_inf norm w.r.t. your data points). The second claim suggests that any average approximates the "median" (i.e., a point that minimizes the L_1 norm w.r.t. your data points).
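A small sketch of those two definitions, in case it helps: it brute-forces a grid of candidate "centers" over a made-up sample and picks the one with the smallest maximum absolute deviation (L_inf) and the one with the smallest sum of absolute deviations (L_1). The data and the half-unit grid are assumptions for illustration; the grid just happens to contain both answers exactly.

```python
import statistics

# Hypothetical sample (illustration only).
data = [1.0, 2.0, 3.0, 4.0, 100.0]

# Candidate centers from min(data) to max(data) in steps of 0.5.
candidates = [k / 2 for k in range(2, 201)]

def linf_cost(c):
    """Largest absolute deviation of the data from c."""
    return max(abs(x - c) for x in data)

def l1_cost(c):
    """Sum of absolute deviations of the data from c."""
    return sum(abs(x - c) for x in data)

best_linf = min(candidates, key=linf_cost)
best_l1 = min(candidates, key=l1_cost)

midrange = 0.5 * (max(data) + min(data))
median = statistics.median(data)

print("L_inf minimizer:", best_linf, "midrange:", midrange)
print("L_1 minimizer:  ", best_l1, "median:  ", median)
```

On this sample the two "middles" are 50.5 and 3, so the two claims in the quoted sentence cannot both describe the same statistic.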

The main problem here, as I see it, is that there is an infinite number of different possible means, densely covering the space between min(data) and max(data). Thus, unless you are ok claiming that both min(data) and max(data) are reasonable rough estimates of the median and the midrange, you should avoid such a strong and general claim (in my humble opinion).

Note: you can choose a "generalized mean" that is arbitrarily close to min(data) or arbitrarily close to max(data); for example, see https://en.wikipedia.org/wiki/Generalized_mean
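For anyone who wants to see this numerically, here is a minimal sketch of the generalized (power) mean for positive data. The function name `power_mean`, the sample values, and the exponents are my own choices for illustration. The p = 0 case is handled as the geometric mean, which is the usual limiting definition.

```python
import math

def power_mean(data, p):
    """Generalized (power) mean with exponent p, for strictly positive data."""
    n = len(data)
    if p == 0:
        # Limit p -> 0 is the geometric mean.
        return math.exp(sum(math.log(x) for x in data) / n)
    return (sum(x ** p for x in data) / n) ** (1.0 / p)

# Hypothetical sample (illustration only); its median is 3.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

# p = -1 is the harmonic mean, p = 1 the arithmetic mean,
# p = 2 the quadratic mean (RMS).
for p in (-50, -1, 0, 1, 2, 50):
    print(f"p={p:>4}: {power_mean(data, p):.3f}")
```

With p = -50 the result sits just above min(data), and with p = 50 it sits just below max(data). All of these are legitimate "averages" of the same five numbers.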

Either way... I lied... I did read some of the rest, and some of it was interesting (particularly the part about the magic constant), but the lack of formal correctness in a few claims did put me off from reading through all of it.

Once again, have a nice day, and please don't be discouraged by the harshness of my comments.


I really do appreciate the criticism. You're factually correct, of course!

I also see now that the statement about means comes off as more definitive than I meant it to be. When I find the time, I will try to soften the wording and make it clear that it's not strictly true.



