Hacker News
The danger of using averages alone in web analytics (highscalability.com)
35 points by pathdependent on May 23, 2012 | 12 comments



I agree with many of the points in the article; however, it conflates two unrelated things. Using simple averages in isolation is almost always a mistake. The real story in a system almost always lies in its variation, not in the average. There are many ways to look at that; one of them is a histogram, although I tend to prefer time series graphs with moving ranges. One of my favorites is the XmR chart, or any of a number of statistical process control charts. Using those you can get a much better understanding of systems over time.
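
For anyone who hasn't worked with an XmR chart, here's a rough sketch of how the limits are computed -- the 2.66 and 3.267 factors are the standard XmR constants, and the sample timings are invented for illustration:

  # XmR (individuals / moving range) limits for a series of response times.
  def xmr_limits(values):
      moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
      avg_mr = sum(moving_ranges) / len(moving_ranges)
      center = sum(values) / len(values)
      lower, upper = center - 2.66 * avg_mr, center + 2.66 * avg_mr
      mr_upper = 3.267 * avg_mr   # upper limit for the moving-range chart
      return center, lower, upper, mr_upper

  # Hypothetical page-load times in milliseconds, one sample per minute.
  load_times_ms = [420, 435, 410, 450, 980, 430, 445, 415, 440, 425]
  print(xmr_limits(load_times_ms))
  # The 980ms point sits above the upper limit: special-cause variation that
  # the plain average (~485ms) would smooth over.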

The transition to real user metrics is not at all related to using an average. All recording and instrumentation processes benefit from using more sophisticated and nuanced tools than a simple average. Real user metrics are an important data point, but they lack certain information you can get from external monitoring systems. We use both Pingdom and Catchpoint; by far my favorite is Catchpoint, because I can see things like which ISP is involved with a slow request, what geographic region, etc. I can also get scatter plots and nice statistical graphs around the median, geometric mean, and the 75th, 95th, and 99th percentiles.

So in short, the main points are good: simple averages are misleading, and capturing end user performance data is good. Dropping external monitoring, though, isn't a good idea, because there are a number of things that you can only identify if you have that insight.


While you're right that they are two separate topics, they are quite closely related. Both are about the search for truth and getting a clearer picture of what's actually happening on your site.

Looking at your 95th percentile with RUM data means something totally different from looking at your 95th percentile with synthetic data. With RUM you are looking at your actual visitors across every page on your site. With synthetic testing (Keynote, Catchpoint, etc.) you're looking at a small sample of pages from random nodes in random locations around the world. The problem with synthetic testing is that it makes all sorts of assumptions about your visitors, like their browser, geography, connection speed, state of browser cache, etc. This doesn't mean that synthetic testing isn't useful (it is!), but it's important to recognize the shortcomings of your methodology, whether that's from looking at an average or looking only at synthetic data.


Disclaimer: My company also does web performance analytics.

I did a talk a couple of years ago on the statistics of web performance, where I cover things like the median, arithmetic mean, geometric mean, margin of error, and the sample sizes needed to carry out proper data analysis. Slides available here: http://www.slideshare.net/bluesmoon/index-3441823

To zashapiro's point about the geometric mean... while it tends to be superior in the ideal case where the distribution is perfectly log-normal, in practice most distributions deviate from a perfect log-normal. The median gives you a slightly better measure of central tendency in that case.

Secondly, there's the problem of user perception of the geometric standard deviation (and consequently the margin of error). Unlike the arithmetic standard deviation, which is additive (+-), the geometric standard deviation is multiplicative (*/), which means it's not visually symmetric... humans have an easier time visualizing additive symmetry than multiplicative symmetry.
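
To make that multiplicative spread concrete, a quick sketch (the timings are made up):

  import math, statistics

  timings_ms = [180, 220, 250, 310, 400, 520, 900, 1400]
  logs = [math.log(t) for t in timings_ms]
  gm  = math.exp(statistics.mean(logs))    # geometric mean
  gsd = math.exp(statistics.stdev(logs))   # geometric standard deviation

  # The one-sigma band is gm/gsd .. gm*gsd -- here roughly gm/2 .. 2*gm,
  # not gm plus or minus a fixed amount.
  print(round(gm), round(gm / gsd), round(gm * gsd))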

At LogNormal.com, we track the median, arithmetic mean, geometric mean, a whole bunch of percentiles and margins of error, and a complete distribution curve.


Philip, thanks for sharing the link to those slides. I didn't think it made sense to go that deep on my article, but it's a nice reference for anyone who wants to really dig in.


True, it was hard enough explaining those concepts in person ;)


Or, put more simply.

The averages of [0,100,200] and [99,100,101] are both 100, and yet these two data sets are clearly different.

Measures of central tendency should always be supported by measures of dispersion (range, standard deviation, etc.), and not just in web analytics.
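
A quick check of that example:

  import statistics

  for data in ([0, 100, 200], [99, 100, 101]):
      print(statistics.mean(data),              # 100 for both
            round(statistics.pstdev(data), 2),  # 81.65 vs 0.82
            max(data) - min(data))              # 200 vs 2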


For a compelling example, see Anscombe's Quartet: https://en.wikipedia.org/wiki/Anscombe%27s_quartet


Well said.


I'm surprised you didn't mention geometric means. Seems like that would also be a relevant way to look at performance data.


Sometimes that's better.

I think the more important point is that you need to know the underlying distribution -- normal (Gaussian), log-normal, exponential, power law, Weibull, etc. The more skewed the distribution, the less relevant the mean becomes -- and it can lead you to some very bad inferences.
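
A quick illustration with a skewed sample -- the log-normal parameters here are arbitrary, chosen only to give a heavy right tail like typical response times:

  import random, statistics

  random.seed(1)
  samples = sorted(random.lognormvariate(6.0, 0.8) for _ in range(10000))

  print(round(statistics.mean(samples)))           # pulled up by the tail (~550)
  print(round(statistics.median(samples)))         # the typical request (~400)
  print(round(samples[int(0.95 * len(samples))]))  # slowest 5% start here (~1500)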


Great point. While geometric means are a little trickier for people to understand, they are definitely one of the better ways of looking at performance data, since they describe the central tendency of the data. I personally prefer using the 95th or 99th percentiles because they force you to keep the mindset that you need to be fast for everyone (sans outliers, of course).


Agreed, it's a great alternative to the mean for performance data - Keynote even has it as an optional aggregation function when viewing data. Heck, if you have your performance data in a MySQL database, it's as simple as 'select exp(avg(ln(myVal)))', and it doesn't require you to install a UDF, as you would if you wanted to find the median.
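
If you want to sanity-check that identity outside the database, the same trick in Python (values invented; statistics.geometric_mean needs 3.8+):

  import math, statistics

  vals = [120, 340, 560, 1200, 90]
  via_logs = math.exp(sum(math.log(v) for v in vals) / len(vals))
  print(round(via_logs, 3), round(statistics.geometric_mean(vals), 3))  # same value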



