I confess that I found this article unhelpful. There are interesting tidbits in there, but I don't think it helped me identify any specific errors you'd reach by using a standard deviation rather than mean average deviation. The closest it came was:
"1) MAD is more accurate in sample measurements, and less volatile than STD since it is a natural weight whereas standard deviation uses the observation itself as its own weight, imparting large weights to large observations, thus overweighing tail events."
More accurate how? Less volatile, not overweighing tail events: what inference would I make incorrectly by using the standard deviation?
To be clear, I'm not arguing "for" standard deviation, I'm just saying that I wish this article had said more about why it's potentially misleading/less powerful.
I agree. For both the MAD and STD, we are trying to reduce information about the "spread" of a distribution to a single number. Any such reduction must lose information, so you should pick whichever quantity is suitable for your needs.
E.g., in the article they mention that the Pareto distribution has finite MAD but infinite variance. This is meant to be an argument against using the STD, but actually the infinite variance tells us something really important that the MAD does not: the classical central limit theorem /does not apply/ to the Pareto distribution, and the sample variance never settles down no matter how much data you collect!
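A quick way to see both points (rough sketch; numpy's pareto sampler is actually Lomax, so I shift by 1 to get a classical Pareto, and alpha = 1.5 is picked only because it has a finite mean, hence a finite MAD, but infinite variance):

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = 1.5  # finite mean (so finite MAD), infinite variance

    for n in (10_000, 100_000, 1_000_000):
        x = rng.pareto(alpha, size=n) + 1.0  # classical Pareto on [1, inf)
        mad = np.abs(x - x.mean()).mean()
        print(f"n={n:>9,}  sample MAD={mad:7.3f}  sample STD={x.std():10.3f}")

The sample MAD settles down as n grows; the sample STD never does, because there is no finite variance for it to converge to.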
I think the real message should be to avoid blindly applying techniques and tools (especially formal ones) without thinking about why or what they capture.
Taleb is exasperating. Pareto-Levy distributions are statistical nihilism the way Taleb talks about them.
Data is very often approximately normal. Or can be approximated with something like Student-T. That includes estimators for volatility in stock returns. If you assume your risk profile can be characterized with standard deviation, well, you're an asshole. It also can't be characterized with MAD.
Then you have stuff like this: "MAD is more accurate in sample measurements" - what does this even mean?
Thank you for saying this! I'm just a struggling armchair intellectual, but it seems to me like every half year Taleb comes up with something to loudly hand-wring about, something that nobody else gives a damn about because they're not in the attention whoring business.
No, he's completely legitimate actually. He's just so far ahead of people that they can't tell. There's a good Kahneman quote saying he's one of the top 100 intellectuals.
Bingo bingo bingo. Reducing the spread of a distribution to a single number is correct for very special distributions. Beyond these special distributions you have to do more study, take more measurements, do more simulations to understand what you have underlying your mean or median.
Standard deviation is not the best terminology to use because it sounds like it's referring to the mean absolute deviation (MAD) rather than the square root of the average of the squared deviations.
And when humans think of mean deviation, it's more intuitive to think of deviation in regular units relative to the mean rather than as the root of averaged squared deviations. The former more accurately reflects human intuition.
This is what Taleb is saying. MAD is more intuitive to humans, and we can see this in particular because experienced statisticians, when asked to describe what standard deviation "means", actually describe MAD.
I don't understand. The usual explanation I hear (and the one I think of) when explaining what an STD of x is falls somewhere along the lines of "most (about 2/3rds) of the data will be within +/- x of the average". Is this wrong?
If not, can you give me an example of the typical description people give for STD that actually describes MAD?
Yes, that is wrong. It sounds like you might be thinking about the standard deviation of normally distributed data. In this case, you can say something like, "the probability an observation will be within about [mean-2sd, mean+2sd] is 95%".
But that's assuming the distribution is normal. In other cases, this doesn't hold, but there are more general statements, like Chebyshev's inequality.
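For reference, the distribution-free statement is Chebyshev's inequality, which only requires a finite variance:

    P(|X - \mu| \ge k\sigma) \le 1/k^2

so at k = 2 you can only guarantee 75% of the probability mass within two standard deviations of the mean, versus the roughly 95% you get under normality.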
I have no idea when people would describe SD as MAD, but wouldn't be too surprised, since people first coming into statistics often seem to have trouble conceptualizing how a squared difference could be viewed. It would be surprising if a trained statistician mixed the two up, because SD and MAD arise from something they should be familiar with--Lebesgue spaces.
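Concretely, writing d_i = x_i - \bar{x} for the deviations, the two statistics are just differently normalized L_1 and L_2 norms of the deviation vector:

    \mathrm{MAD} = \frac{1}{n} \sum_i |d_i| = \frac{\lVert d \rVert_1}{n},
    \qquad
    \mathrm{SD} = \sqrt{\frac{1}{n} \sum_i d_i^2} = \frac{\lVert d \rVert_2}{\sqrt{n}}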
I've got a very non-statistics math background, but what you say suggests that there would be a nice way to visualize standard deviations two-dimensionally (since they arise from an L_2 norm), and that it's the one-dimensional "bell curve cross-section width" pictures that confuse people.
It's really pointless to argue about the "best" deviation algorithm, at least on the basis of how it responds to outliers. The process of identifying and ignoring/deprecating outliers isn't something that can or should be lumped in with a simple notion of deviation, be it RMS, MAD, SD, or whatever. Any simple algorithm that you come up with to represent one data set may fail badly with another for this reason (and others.)
Outliers need to be removed, or at least understood, before performing any statistical calculations.
I don't think that answer helps. How do you decide that "too much" weight is being assigned to outliers, and what's the process for deciding the right amount of weight? Can you think of any concrete examples?
Obviously "too much" depends on the subject matter.
But the point is that,
1. STD gives a lot more weight to outliers than MAD.
2. People constantly hear STD and think it means MAD, for all the reasons the article mentions.
The argument isn't that STD always gives too much. It is that it gives a lot more than people expect.
The extreme example given in the article is that a statistical process can have infinite STD, but finite MAD. In other cases, say income, the STD might be double the MAD. That's bad if you think STD means MAD.
Anyhow, this could be solved by educating people on what STD actually means, or by just using MAD. The article apparently thinks the latter is more practical, especially since the benefits of STD have decreased over time.
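To put rough numbers on the income example, here's a toy sketch on a right-skewed, income-like sample (the lognormal parameters are made up purely for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    income = rng.lognormal(mean=10.5, sigma=1.0, size=100_000)  # toy "incomes"

    std = income.std()
    mad = np.abs(income - income.mean()).mean()
    print(f"STD = {std:,.0f}   MAD = {mad:,.0f}   STD/MAD = {std / mad:.2f}")

On skewed data like this the STD comes out well above the MAD, so quoting the STD to someone who mentally hears "typical deviation from the mean" misleads them.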
In this case what Taleb is concerned with is decision making. The right amount of weight is what allows human beings to make good decisions. He believes that MAD is much more intuitive to humans and therefore leads to better decisions.
edit: OK I get that you wanted an example of what "too much" weight is. If you're looking for "how much the next datapoint will deviate from the mean, on average", then the MAD will tell you that, not the STDV. Except in some specific fields (maths, physics), people are much more interested in the MAD than the STDV, but all they get to make decisions is the STDV.
In many cases outliers are extremely important. One that comes to mind is high spenders in mobile games.
Trust me, if analysis was as simple as getting rid of outliers, treating everything as Gaussian, and retrieving simple summary statistics, then good data scientists wouldn't be paid $150k+ :)
(2) Generate the absolute deviations of your data from this median which is {4,0,4,0,4,0,4,0,4,0,4,0,4,0,4,0,4,0,998}
(3) Find the median of the absolute deviations which is 4.
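A minimal sketch of that recipe (step (1), taking the median of the raw data, isn't quoted above, so it's assumed here; this is the median-of-absolute-deviations version, not the mean version):

    from statistics import median

    def median_abs_deviation(data):
        m = median(data)                          # (1) median of the data
        deviations = [abs(x - m) for x in data]   # (2) absolute deviations from it
        return median(deviations)                 # (3) median of those deviations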
It's ironic that Taleb prefers a statistic that ignores extreme examples (i.e. black swans), but he never seems to make sense to me. I've found MAD useful in dealing with noisy data.
> It's ironic that Taleb prefers a statistic that ignores extreme examples (i.e. black swans)
No, that is incorrect on two fronts.
1) MAD does not "ignore" extreme examples, it just weights them the same as other examples. Nassim argues that the weighting of extreme examples in STD is excessive and makes STD less intuitive. I really don't know how you could say that MAD "ignores" extreme examples - they obviously do influence MAD.
2) The act of computing MAD or STD on a sample of observations has no relevance to Black Swan theory. In Black Swan, Nassim defines a black swan event as an unexpected event of large magnitude or consequence. Hence, by definition, an event that has already been observed cannot be a black swan event.
To put it another way, Nassim's main point in Black Swan is that using historical observations to estimate forward risk renders one fragile to Black Swan events - you could use any dispersion metric and this is still the case.
I went with Taleb's proposed definitions:
"Do you take every observation: square it, average the total, then take the square root? Or do you remove the sign and calculate the average?"
In my experience MAD refers to either Median Absolute Deviation or the Mean Absolute Deviation. I was using the median version which is a pretty common "robust" statistic. Although I have occasionally seen the mean version it seems to be less common in practice.
Take a look at the Wikipedia article you linked. No version of Average Absolute Deviation is consistent with Taleb's definition. No squaring, no square root. Sounds more like a geometric mean.
This is exactly what is so frustrating about Taleb. His ideas only partly make sense. He often seems to see the problem, but his solutions are poorly thought out. Of course, he thinks his solutions are perfect and everyone else is an idiot.
In what field do you work that the median absolute deviation is used at all, let alone more than the mean absolute deviation?
When he talked about the mean absolute deviation being sqrt(2/pi) sigma, did that not make it abundantly clear what he was discussing?
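For a normal distribution the two differ only by a fixed factor, which is exactly why it has to be the mean (not median) absolute deviation he's talking about:

    E|X - \mu| = \sqrt{2/\pi}\,\sigma \approx 0.798\,\sigma
    \iff
    \sigma = \sqrt{\pi/2}\,\mathrm{MAD} \approx 1.253\,\mathrm{MAD}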
>No squaring, no square root. Sounds more like a geometric mean
Do you even know what the geometric mean is? (It involves taking an nth root, so your statement just sounds stupid.)
Dispersion functions are built off the distance function under the metric you want to use. Standard deviation uses the L2 metric, which implies a Euclidean distance function: you sum pow(u-x, 2) over the observations and then take pow(sum, 1/2) (with a 1/n averaging folded in).
Mean absolute deviation takes the L1 metric, so the exponents are 1 and 1/1.
That becomes summing pow(abs(u-x), 1) and then pow(sum, 1), which, with the same 1/n averaging, is needless to say the same thing as averaging the absolute differences.
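As a generic sketch (hypothetical helper; the 1/n averaging is folded in so that p = 2 recovers the population standard deviation and p = 1 the mean absolute deviation):

    def lp_dispersion(data, p):
        # (mean of |x - u|^p) ** (1/p): an L_p-style dispersion around the mean
        u = sum(data) / len(data)
        return (sum(abs(x - u) ** p for x in data) / len(data)) ** (1 / p)

    # lp_dispersion(xs, 2) -> population standard deviation of xs
    # lp_dispersion(xs, 1) -> mean absolute deviation of xs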
It depends on whether the numbers provided are the actual data points themselves, or the deviations from the median (the second is what the article provided).
>Except in some specific fields (maths, physics), people are much more interested in the MAD than the STDV, but all they get to make decisions is the STDV.
C'mon guys, it all comes down to whether you like the rhombus or the circle more :) Interesting that MOND (modified Newtonian dynamics), if true, would suggest that a circle at very large distances looks like a square (notice: not like a rhombus :), so physics may start to like it more.
"1) MAD is more accurate in sample measurements, and less volatile than STD since it is a natural weight whereas standard deviation uses the observation itself as its own weight, imparting large weights to large observations, thus overweighing tail events."
More accurate how? Less volatile, not overweighing tail events: what inference would I make incorrectly by using the standard deviation?
To be clear, I'm not arguing "for" standard deviation, I'm just saying that I wish this article had said more about why it's potentially misleading/less powerful.