
> specific errors you'd reach by using a standard deviation

> ..."overweighing tail events."

That's it. Sigma puts too much weight on outliers, giving them the power to distort summary stats. MAD doesn't.




It's really pointless to argue about the "best" deviation algorithm, at least on the basis of how it responds to outliers. The process of identifying and ignoring/deprecating outliers isn't something that can or should be lumped in with a simple notion of deviation, be it RMS, MAD, SD, or whatever. Any simple algorithm that you come up with to represent one data set may fail badly with another for this reason (and others).

Outliers need to be removed, or at least understood, before performing any statistical calculations.


I don't think that answer helps. How do you decide what counts as "too much" weight on outliers, and what is the process for deciding the right amount of weight? Can you think of any concrete examples?


Obviously "too much" depends on the subject matter.

But the point is that,

1. STD gives a lot more weight to outliers than MAD.

2. People constantly hear STD and think it means MAD, for all the reasons the article mentions.

The argument isn't that STD always gives too much. It is that it gives a lot more than people expect.

The extreme example given in the article is that a statistical process can have infinite STD, but finite MAD. In other cases, say income, the STD might be double the MAD. That's bad if you think STD means MAD.
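To make the gap concrete, here is a minimal Python sketch (the Pareto distribution and its tail index are my choice of illustration, not from the article): a tail index between 1 and 2 gives a finite mean absolute deviation but an infinite variance, so the sample STD never settles down while the sample MAD does.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative choice: Pareto with tail index alpha in (1, 2) has a
    # finite mean (so a finite mean absolute deviation) but infinite
    # variance, hence an infinite standard deviation.
    alpha = 1.5

    for n in (10**3, 10**5, 10**7):
        x = 1 + rng.pareto(alpha, size=n)    # classical Pareto samples >= 1
        mad = np.mean(np.abs(x - x.mean()))  # sample mean absolute deviation
        std = x.std()                        # sample standard deviation
        print(f"n={n:>8}  MAD={mad:7.2f}  STD={std:12.2f}")

    # The MAD column settles near a finite value as n grows;
    # the STD column keeps drifting upward and never converges.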

Anyhow, this could be solved by educating people on what STD actually means, or by just using MAD. The article apparently thinks the latter is more practical, especially since the benefits of STD have decreased over time.


In this case what Taleb is concerned with is decision making. The right amount of weight is what allows human beings to make good decisions. He believes that MAD is much more intuitive to humans and therefore leads to better decisions.


In that case, I'm looking for a concrete example.


What's an example?


2,-2,2,-2,2,-2,2,-2,2,-2,2,-2,2,-2,2,-2,2,-2,-1000

MAD: 54.5 STDV: 229.4

edit: OK I get that you wanted an example of what "too much" weight is. If you're looking for "how much the next datapoint will deviate from the mean, on average", then the MAD will tell you that, not the STDV. Except in some specific fields (maths, physics), people are much more interested in the MAD than the STDV, but all they get to make decisions is the STDV.
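For reference, those two numbers follow from treating the listed values as the deviations themselves and applying the two recipes directly (a quick Python check):

    import numpy as np

    x = np.array([2, -2] * 9 + [-1000], dtype=float)

    mad = np.mean(np.abs(x))       # "remove the sign and average"
    stdv = np.sqrt(np.mean(x**2))  # "square, average, take the square root"

    print(mad, stdv)  # ~54.5  ~229.4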


In many cases outliers are extremely important. One that comes to mind is high spenders in mobile games.

Trust me, if analysis were as simple as getting rid of outliers, treating everything as Gaussian, and retrieving simple summary statistics, then good data scientists wouldn't be paid $150k+ :)


same thing with venture capital ;) sometimes averages are uninteresting, you just want one good outlier


The MAD of that data is not 54.5.

Here's how you calc MAD:

(1) Find the median of your data which is -2

(2) Generate the absolute deviations of your data from this median which is {4,0,4,0,4,0,4,0,4,0,4,0,4,0,4,0,4,0,998}

(3) Find the median of the absolute deviations which is 4.
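
In code, those three steps look like this (a minimal Python sketch):

    import numpy as np

    x = np.array([2, -2] * 9 + [-1000], dtype=float)

    med = np.median(x)         # (1) median of the data: -2.0
    abs_dev = np.abs(x - med)  # (2) absolute deviations from that median
    mad = np.median(abs_dev)   # (3) median of those deviations: 4.0

    print(med, mad)  # -2.0  4.0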

It's ironic that Taleb prefers a statistic that ignores extreme examples (i.e. black swans), but he never seems to make sense to me. I've found MAD useful in dealing with noisy data.


> It's ironic that Taleb prefers a statistic that ignores extreme examples (i.e. black swans)

No, that is incorrect on two fronts.

1) MAD does not "ignore" extreme examples, it just weights them the same as other examples. Nassim argues that the weighting of extreme examples in STD is excessive and makes STD less intuitive. I really don't know how you could say that MAD "ignores" extreme examples - they obviously do influence MAD.

2) The act of computing MAD or STD on a sample of observations has no relevance to Black Swan theory. In Black Swan, Nassim defines a black swan event as an unexpected event of large magnitude or consequence. Hence, by definition, an event that has already been observed cannot be a black swan event.

To put it another way, Nassim's main point in Black Swan is that using historical observations to estimate forward risk renders one fragile to Black Swan events - you could use any dispersion metric and this is still the case.


I went with Taleb's proposed definitions: "Do you take every observation: square it, average the total, then take the square root? Or do you remove the sign and calculate the average?"

edit: apparently this is consistent with https://en.wikipedia.org/wiki/Average_absolute_deviation so I'm not sure what you were referring to


In my experience MAD refers to either Median Absolute Deviation or the Mean Absolute Deviation. I was using the median version which is a pretty common "robust" statistic. Although I have occasionally seen the mean version it seems to be less common in practice.

https://en.wikipedia.org/wiki/Median_absolute_deviation

Take a look at the Wikipedia article you linked. No version of Average Absolute Deviation is consistent with Taleb's definition. No squaring, no square root. Sounds more like a geometric mean.

This is exactly what is so frustrating about Taleb. His ideas only partly make sense. He often seems to see the problem, but his solutions are poorly thought out. Of course, he thinks his solutions are perfect and everyone else is an idiot.


In what field do you work that the median absolute deviation is used at all, let alone more than the mean absolute deviation?

When he talked about the mean absolute deviation being sqrt(2/pi) sigma, did that not make it abundantly clear what he was discussing?
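The Gaussian relation is easy to verify numerically; here is a quick sketch, written in the equivalent form STD = sqrt(pi/2) * mean absolute deviation:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(loc=0.0, scale=3.0, size=10**6)  # Gaussian, sigma = 3

    mad = np.mean(np.abs(x - x.mean()))  # mean absolute deviation
    print(x.std() / mad)                 # ~1.2533
    print(np.sqrt(np.pi / 2))            # 1.2533...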

>No squaring, no square root. Sounds more like a geometric mean

Do you even know what the geometric mean is? (It has a root function so your statement just sounds stupid)

Dispersion functions are built off the distance function of the metric you want to use. Standard deviation uses the L2 metric, which implies a Euclidean distance function. (L2 corresponds to summing pow(u - x, 2) and then taking pow(sum, 1/2) as your functions.)

Mean absolute deviation takes the L1 metric, which implies exponents of 1 and 1/1. This becomes summing pow(abs(u - x), 1) and then taking pow(sum, 1/1), which, after dividing by n, is needless to say the same thing as averaging the absolute differences.

Hence the lack of any squaring or square rooting
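
Written out, the same construction yields both statistics from one function. A sketch (lp_dispersion is a made-up name, and the 1/n is placed inside the root so that p=2 reproduces the population standard deviation exactly):

    import numpy as np

    def lp_dispersion(x, p):
        # Power mean of absolute deviations: (mean(|x - u|^p))^(1/p)
        u = np.mean(x)
        return np.mean(np.abs(x - u) ** p) ** (1.0 / p)

    x = np.array([2, -2] * 9 + [-1000], dtype=float)

    print(lp_dispersion(x, 1))  # p=1: mean absolute deviation
    print(lp_dispersion(x, 2))  # p=2: population standard deviation (== x.std())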


It depends on whether the numbers provided are the actual data points themselves, or the deviation from median (the second is what the article provided).


Also mildly ironic for Taleb to ignore black swans after writing a book about black swans ;)


>Except in some specific fields (maths, physics), people are much more interested in the MAD than the STDV, but all they get to make decisions is the STDV.

c'mon guys, it all comes down to whether you like the rhombus or the circle more :) (the unit ball of L1 vs L2). Interesting that MOND (Modified Newtonian Dynamics), if true, would suggest that a circle at very large distances looks like a square (notice: not like a rhombus :), so physics may start to like it more.




