
Great writeup. I think this should go hand in hand with articles explaining why p <= 0.05 is not the be-all and end-all confirmation of your hypotheses/conclusions. Before I jumped to the software engineering world, I did biostats and was essentially a "bioinformatician". You quickly realize how many experts in the field misuse statistical tools entirely while using their results to prove a point, or worse, draw incorrect conclusions from their results. A big faux pas I saw a lot was using normality-assumed parametric tests on non-normal data where the skew was clearly significant (i.e., skewed enough that you couldn't get away with it, as you sometimes can with mildly non-normal distributions). Seriously, go take a look at some bioinformatics papers (or any biology papers for that matter); it's getting pretty bad.

However, when we learn about these mathematical tools (even in master's or Ph.D. programs), we are often not taught what the math means. I'm sure I'm not the only one to have been taught formulas as a means to prove your research rather than the theory/intuition behind them. Luckily there are articles such as this one that force you to step back and consult the maths again to learn what is really going on.

As an aside, I'm sure this sort of thing happens in all walks of life, not just maths/data science. Some programmers don't understand the intuition behind certain things they code, and when it's time to explain they will likely freeze, because they know how to write the code but don't really know why it works at a fundamental level.




Really?

> Do you take every observation: square it, average the total, then take the square root? Or do you remove the sign and calculate the average?

Neither of these is even an attempt to measure average daily temperature variation. (Assuming any reasonable definition...)

If you're talking about variation from day to day, then looking at differences in max and min, or at given points in time, or on the same day in different years, would be some approaches. Averaging absolute values makes no sense whatsoever -- if the observations were 20 and -20, the result would be the same as if they were 20 and 20. And calculating the standard deviation of the observations is calculating something else again (it might be standard deviation, or it might just be something dumb). Neither of these is a problem with standard deviation itself.

It's sad if newspapers or their readers don't know what standard deviation means, but they're pretty much innumerate across the board so it's not clear whether further muddying terminology is going to help anyone.


They used the word "change" in the paragraph before the one you quoted. I think by "observation" in this paragraph, they mean "x_i - M(x)" as in https://en.wikipedia.org/wiki/Average_absolute_deviation: the difference between the current value and some measure of central tendency (mean, median, mode). It's not explained well, but if you make this assumption, the article makes sense.
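To make that concrete, here's a minimal Python sketch (the observations are made up) of the two quantities the article contrasts, both taken as deviations from the mean:

    import numpy as np

    temps = np.array([18.0, 21.5, 19.0, 24.0, 17.5, 22.0])  # hypothetical observations
    dev = temps - temps.mean()                               # x_i - M(x)

    mad = np.abs(dev).mean()         # mean absolute deviation: drop the sign, average
    sd = np.sqrt((dev ** 2).mean())  # standard deviation: square, average, square root

    print(f"MAD = {mad:.2f}, SD = {sd:.2f}")  # SD >= MAD always (Cauchy-Schwarz)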


The averaging of absolute values in the article's context is happening around a mean of 0 (averaging the variations). In general, one would average the absolute values of the differences from the mean.

Yes, it would be silly to convert a -20 to 20 if the mean isn't 0.

The advantage with the STD is that it is easier to compute than the MAD in a situation like a random walk. For instance, for a one-dimensional random walk with equal-probability +1/-1 steps X_i, the mean is 0 and, with X = sum of the X_i, E(X²) = n is straightforward, whereas computing E(|X|) is not so easy.
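A quick simulation (parameters arbitrary) makes the contrast visible; the closed form for E(|X|) tends to sqrt(2n/pi) for large n, which is far less obvious than E(X²) = n:

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 1000, 100_000

    # X = sum of n fair +/-1 steps; equivalently 2*Binomial(n, 1/2) - n
    X = 2 * rng.binomial(n, 0.5, size=trials) - n

    print("E[X^2] ~", (X ** 2).mean())   # theory: exactly n = 1000
    print("E[|X|] ~", np.abs(X).mean())  # theory: ~ sqrt(2n/pi) ~ 25.2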


That's as clear as mud. What does a -20 vs. a 20 mean? That it was colder at noon than at midnight?

Assuming that there was some sensible definition (e.g. deviation from the past mean temperature, which is not at all what was stated) then "average deviation" has two obvious interpretations. Re-labeling "standard deviation" or "mean deviation" would not be helpful in this case, and the -20 and 20 values STILL don't help the case.


The article isn't clear, but the deviation is either from the sample mean or from the mean of the underlying random variable. You are trying to find the expected difference from the average, hence the absolute value (otherwise the positive and negative deviations cancel out).


Average max, min, or average average?


> A big faux pas I saw a lot was using normality-assumed parametric tests on non-normal data where the skew was clearly significant (i.e., skewed enough that you couldn't get away with it, as you sometimes can with mildly non-normal distributions)

Depends on how much data you have. With a couple hundred observations, you can have as much skew as you like and the normal approximation will still be pretty good. I only mention this because there are a lot of misconceptions about how statistics relies on normal data, when really it mostly just relies on the distribution of the mean being normal, which is pretty much a given because of the central limit theorem. There are much worse sins and abuses -- which yes, unfortunately you do see all the time in scientific papers.
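A rough illustration of that, using an exponential distribution as a stand-in for skewed data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # exponential data: skewness 2, nothing like a normal distribution
    raw = rng.exponential(size=100_000)
    # means of samples of 200 observations each
    sample_means = rng.exponential(size=(50_000, 200)).mean(axis=1)

    print("skewness of raw data:    ", stats.skew(raw))           # ~2
    print("skewness of sample means:", stats.skew(sample_means))  # ~0.14 (2/sqrt(200))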


It is a common misconception that real data always follows a normal distribution. It is true that if you sum many /independent/ random quantities, then the result is approximately normal (e.g., the central limit theorem and its generalizations). But real data tends not to be independent, and many real-world quantities follow extremely skewed distributions: e.g., Zipf's law, Korcak's law, Pareto's law.

For a concrete example, if you look at the distribution of the number of friends users have on social networks, you might expect that 95% of people have the mean number of friends +/- a few standard deviations (since this would be the case for a normal distribution). It would be virtually impossible for someone whose friend count is, say, thousands of standard deviations above the mean to exist, yet there are many such users in social networks (celebrities, bot networks, etc.). In reality, the empirical distribution in this case is extremely skewed.
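For instance, a quick sketch with a Zipf distribution standing in for friend counts (the exponent is arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)

    # Zipf-distributed "friend counts" as a crude model of a social graph
    friends = rng.zipf(a=2.1, size=1_000_000)

    z = (friends.max() - friends.mean()) / friends.std()
    print(f"mean={friends.mean():.1f}, sd={friends.std():.1f}, max={friends.max()}")
    print(f"the best-connected 'user' sits {z:.0f} standard deviations above the mean")

Under a normal distribution, a value even 10 standard deviations out would be essentially impossible; here outliers like this appear in every run.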


> It is true that if you sum many /independent/ random quantities, then the result is approximately normal...

Don't forget finite variances! The end result could more generally be Lévy alpha-stable (alpha < 2.0), as it is with many financial instruments... Normality requires finite variance.
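The classic demonstration is the Cauchy distribution (alpha = 1, variance undefined), where averaging more data doesn't concentrate the sample mean at all:

    import numpy as np

    rng = np.random.default_rng(3)

    # the mean of n standard Cauchy variables is itself standard Cauchy, for any n
    for n in (10, 1_000, 100_000):
        means = rng.standard_cauchy(size=(200, n)).mean(axis=1)
        q1, q3 = np.percentile(means, [25, 75])
        print(f"n={n:>6}: IQR of 200 sample means = {q3 - q1:.2f}  (theory: 2.0 for all n)")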


And yet a Pareto distribution still has a mean, and the sampling distribution of that mean is approximately normally distributed.

Of course I'm not claiming that you can just pretend that a Pareto distribution is a normal distribution, but statistical tests are generally concerned with differences in means (group A does on average 25% better than group B) so it's the sampling distribution we're interested in, not the parent distribution.

You make a good point about autocorrelation and dependent data, but that's a very different issue. To riff on your example about social networks, you'd have dependent data if you're trying to see what kind of news articles people like to read, if those preferences turn out to be mostly guided by what friends are reading.


"And yet a Pareto distribution still has a mean, and the sampling distribution of that mean is approximately normally distributed."

This is often wrong. The central limit theorem requires finite variance, and some common Pareto distributions (those with shape parameter alpha <= 2) have infinite variance.
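For example, a quick simulation (shape parameter chosen to have a finite mean but infinite variance) shows the sampling distribution of the mean staying skewed instead of becoming normal:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)

    # classical Pareto with shape a=1.5: mean = a/(a-1) = 3, variance infinite
    # (numpy's pareto() is the shifted Lomax form, hence the +1)
    a, n, reps = 1.5, 2_000, 10_000
    sample_means = (rng.pareto(a, size=(reps, n)) + 1).mean(axis=1)

    # if the CLT applied, this skewness would be near 0; instead it stays large
    print("skewness of the sampling distribution:", stats.skew(sample_means))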


Yeah, that's why I had the parenthetical saying it was clearly data that would affect their stats (think of studies where you can only extract data from 5 rats per group, etc.). It's also why I said normality-assumed parametric tests; there are many other parametric tests that aren't based on normality assumptions. But yes, as you said, you can absolutely use normality-assumed parametric tests on non-normal data within reason, and the conditions you listed are within reason. Of course this is my opinion, and my opinion could just as easily be entirely wrong, as decided by the expert community :D

And as always, your allowances depend on the test you are performing :P Also gotta love statistics for keeping a large list of customary exceptions determined by the community too.


>you can absolutely use normality-assumed parametric tests on non-normal data within reason and your conditions you listed are within reason

I would be careful throwing around this kind of rhetoric.

There are special cases where the parametric assumption can be relaxed; it is not a general rule. This is why the word 'parametric' is used.


A parametric test is making an assumption about the distribution of the random variable, not its mean.

Internally, many of these tests make use of CLT results about sample means, but that doesn't mean they don't still depend on the distributional assumptions.

Skewness is a major consideration and can lead to completely different inferences.

Let's say we wanted to find the median and our distribution was assumed to be normal. With no skewness, the sample mean would be a good approximation; with skewness, the sample mean would be a very bad approximation.

If the test in question is solely about the mean of the random variable, and nothing else about the distribution, then it's possible that the normality assumption only needs to extend as far as the sample mean (a la the t-test). But that's hardly a parametric test anymore, is it?
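To put numbers on the earlier point about skewness and the median, a small sketch with a lognormal standing in for the skewed distribution:

    import numpy as np

    rng = np.random.default_rng(5)

    # lognormal(0, 1): true median = 1.0, true mean = exp(0.5) ~ 1.65
    x = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)

    print("sample mean:  ", x.mean())      # ~1.65, a poor estimate of the median
    print("sample median:", np.median(x))  # ~1.0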


It's not clear to me what kinds of tests you are referring to. Ordinary least squares regression, for example, is all about estimating conditional means, and it is very much parametric. Just finding the best estimates for the parameters of a one-dimensional distribution is usually not particularly interesting, it's certainly not what statisticians spend most of their time on, and in any case nobody's suggesting that the population mean is always equal to the population median.


Why do you think we have the classification 'parametric' if the only thing that matters is the distribution of the sample mean? If it's all going to converge to normal as you say, why are there parametric and non-parametric tests?


Regression works like this: E[Y|X] = Xβ. It is parametric because you model the conditional mean as a weighted sum of various predictors, and these beta "weights" are your parameters. This is true of ordinary regression, Poisson regression, binomial (logistic) regression and so on. An example of nonparametric regression would be something like regression splines.
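A bare-bones sketch of that parametric form on synthetic data (all names and numbers made up):

    import numpy as np

    rng = np.random.default_rng(6)

    # synthetic data with E[Y|X] = 2 + 3x plus normal noise
    x = rng.uniform(0, 10, size=500)
    y = 2 + 3 * x + rng.normal(scale=2.0, size=500)

    # design matrix with an intercept column; beta holds the model's parameters
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("estimated beta:", beta)  # ~[2, 3]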

Why are there nonparametric tests? Because for small sample sizes you can't always trust the normal approximation, and as you state this might be due to something like skew. This takes nothing away from the fact that inferential statistics is almost always about comparisons of means. And yes, the t-test is a parametric test, of which the Mann-Whitney or Wilcoxon would be the nonparametric equivalents.
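As a toy sketch of that pairing (illustrative data only; with small skewed samples the two tests can genuinely disagree):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)

    # two small, skewed samples, the second shifted upward
    a = rng.exponential(scale=1.0, size=15)
    b = rng.exponential(scale=1.0, size=15) + 1.0

    print("t-test (parametric):          p =", stats.ttest_ind(a, b).pvalue)
    print("Mann-Whitney (nonparametric): p =", stats.mannwhitneyu(a, b).pvalue)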



