
You know what is interesting about that article? The total lack of statistical insight.

"The average temperature of 51.1 degrees F was 8.6 degrees above the 20th century average for March and 0.5 degrees F warmer than the previous warmest March in 1910."

Without standard deviations, who cares?

And clicking through to the main article, I'm surprised by how many words can be written without saying anything of consequence. Even the graph showing Green Bay temperature looks like nothing more than a plot of noise.

Edit: Now that I think about it, the lack of basic statistical disclosures is pretty damning evidence that nothing significant has happened.




Did you just read the 6 sentence summary without clicking through and looking for the data? Here's a paper used in the analysis which you just read that talks about the uncertainty and standard deviations: ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2/monthly/algorithm-uncertainty/

> Edit: Now that I think about it, the lack of basic statistical disclosures is pretty damning evidence that nothing significant has happened.

No.


I happen to work with NOAA and NASA scientists on coupling Earth System Models. NOAA does understand elementary descriptive statistics (so does NASA). Sometimes average temperature distributions are skewed and do not have a finite standard deviation. This is the case for the distribution of annual change in global average temperature, which deviates from a normal distribution (skewed to the right). Sample estimates of standard deviations of global average annual temperature change are meaningless because the standard deviation isn't finite. Instead you'll see measures that are meaningful (mean, min, max). Don't assume that all distributions have finite variance.


I understand what you're saying, but it's also kind of begging the question. We can't calculate a standard deviation because temperatures are rising, but we also can't test if temperatures are unusually warm because we don't have a standard deviation?


I think the message is simple: standard deviations don't provide insight when your main concern is extreme values (max/min temperatures, in this case).


That's equally circular. How do you know it's an extreme value?


It's at either end of a set of values?


Set of values: [ 80.1, 80.2, 79.8, 80.1, 80.0, 79.9, 80.1, 80.1 ]

New value: 80.3. It's a record high, but is it an extreme value? i.e., are record highs extreme values by definition?
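One crude way to pose that question is to compare the new value to the spread of the earlier ones (a Python sketch using the toy set above; whether a z-score is even the right lens here is exactly what's being debated):

```python
import statistics

values = [80.1, 80.2, 79.8, 80.1, 80.0, 79.9, 80.1, 80.1]
new_value = 80.3

mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation
z = (new_value - mean) / sd
print(f"mean={mean:.4f} sd={sd:.4f} z={z:.2f}")
```

In this toy set the record high lands about two standard deviations above the mean, so "record" and "extreme" nearly coincide, but only because the spread is so tiny.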


I'm no expert with these data, but your example isn't a fair analogy. Let's go back to the actual problem.

From a casual glance at the graphs showing past hot years (each one shows 5), NOAA appears to have records for most cities going back to the 1950s (probably 100 years in some cases).

So, in a set of 63 (1950-2012) values, which are themselves average temperatures (and thus representative of some kind of trend, not just a single hot day), the procedure picks out the 5 highest values of average temperature.

It's reasonable to expect that the sequences corresponding to these 5 highest values will be unusual when compared to the other 58. Nothing weird about that.

Of course, if they had only 10 years of data, as in your example above, or if they weren't using month-long averages, it would be a different story.
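The selection procedure described above (pick the 5 warmest of roughly 63 annual values) is easy to sketch; the temperatures here are made up with random numbers, not NOAA data:

```python
import heapq
import random

random.seed(0)
# Hypothetical March average temperatures for 1950-2012; the real NOAA
# series is not reproduced here.
march_avg = {year: random.gauss(45.0, 3.0) for year in range(1950, 2013)}

# Pick the 5 warmest years, as the NOAA graphs do.
warmest = heapq.nlargest(5, march_avg, key=march_avg.get)
print(warmest)
```

By construction the 5 selected years look unusual compared with the other 58, which is the commenter's point: there is nothing surprising about the tails of a selected subset.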


I linked this elsewhere, but in case people didn't notice...

The raw temperature data is available at http://www.ncdc.noaa.gov/temp-and-precip/time-series/index.p...

The standard deviation is 2.89.


Standard deviations are one measure of a probability distribution. Order statistics (max and min) are another, and those are given in these graphs in the supplemental information and data page, linked to in the overview section.

http://www.ncdc.noaa.gov/special-reports/march-2012-heat-wav...


I don't think we're looking at the same graphs. The ones I see just plot YTD average temps for the five warmest years. Even the "min" charts are still for the same five years. Except it's different years for every city. There's no way I could get a good sense of the distribution from these charts.


I appreciate that you went ahead and clicked through to the data available, rather than critiquing a press release for not containing enough hard data. Here's why I think the plots did not show standard deviations --

Standard deviations are not an appropriate summary when talking about extreme values.

We're basically conditioning on extreme values ("plot temperatures of all cities having their hottest March on record in 2012"). So, while the standard deviation might be a useful summary of:

  P(temperature on day t),
a standard deviation would not be a useful summary of:

  P(temperature on day t | average temperature is highest recorded)
In general, after you condition in this way, the events ("sample paths" of the function "temperature at time t") don't look like the rest of the population.

They're oddballs, and by definition, there are only a few of them -- only one March can be the hottest on record. Maybe the best comparison would be the second-hottest March, so plot that one.

There would certainly, for example, be no reason to believe this population looks Gaussian in the tail. In fact, it would probably be misleading to make any assumption about what a "typical" extreme sample looks like. This is probably why they showed other sample paths (temperature as a function of t) for that location, rather than some summary statistic. You can't average oddballs.

There are probably locales for which standard deviation is not ever a useful summary. I'm thinking of places which are subject to strong unpredictable variations, like Santa Ana winds, that are more a 0/1 phenomenon. This would result in a multimodal distribution of temperature, which would make standard deviation misleading. That is, it could be really cold, or rather warm, but unlikely to be in-between.
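The conditioning argument above can be illustrated with a toy simulation (every parameter here is invented; nothing is NOAA data). Selecting the year with the highest average guarantees, by construction, a path that doesn't look like the rest of the population:

```python
import random
import statistics

random.seed(1)
# Simulate 63 hypothetical "years" of 31 daily March temperatures each.
years = [[random.gauss(45.0, 5.0) for _ in range(31)] for _ in range(63)]
avgs = [statistics.mean(y) for y in years]

hottest = max(avgs)
rest = [a for a in avgs if a != hottest]

# The conditioned path is an outlier relative to the rest by construction:
print(f"hottest year avg: {hottest:.2f}")
print(f"mean of the other 62: {statistics.mean(rest):.2f}")
```

This is the sense in which P(temperature | average is highest recorded) is a different object from P(temperature): the selected path sits in the tail no matter what the underlying distribution is.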


I think you're confused by the graphs too. They don't plot daily temps, and they aren't even necessarily for the hottest Marches on record. It's all YTD averages. January 1998 in most northeast cities really was an extreme value, but if I'm reading the graph right, March 1998 was possibly colder than average, with the YTD values buoyed up by January.

Look at DC. Until the last week in March, 1998 is flat or even trending down. That suggests the actual daily temps were possibly below average, no? Whatever; YTD charts may be pretty to look at, but I don't think they're good for interpretation. http://www1.ncdc.noaa.gov/pub/data/cmb/images/us/2012/mar/wa...

I just mention this because "temperature at time t" is not what we're looking at.


I don't think you are reading that correctly. By the dip after the spike, all of the spike's effect has already disappeared from the average. Also, as you move to the right it takes an ever larger difference to move the number. Think of it this way: after 10 samples it takes a value 11 above the average to raise the average by 1, but after 20 samples it takes a value 21 above the average to raise it by 1.
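The running-average arithmetic above can be checked directly with a minimal sketch:

```python
def ytd_averages(daily):
    """Running (year-to-date) averages of a daily temperature series."""
    total = 0.0
    out = []
    for i, t in enumerate(daily, start=1):
        total += t
        out.append(total / i)
    return out

# After 10 samples averaging 50.0, a single new value must exceed the
# average by 11 (i.e., n + 1) to raise the running average by 1:
avgs = ytd_averages([50.0] * 10 + [50.0 + 11])
print(avgs[-2], avgs[-1])  # 50.0 then 51.0
```

The same late-season value moves the YTD line less and less as the year goes on, which is why close YTD lines at the end of March imply a large gap in the underlying daily values.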

So, while the 1998 graph and the 2012 graph look close at the end, the actual daily values in 2012 needed to be significantly higher than the 1998 values to create that much separation when they had so recently crossed each other.

In fact, it's 1976 that had the most comparable March in the sample given. However, because those years were chosen for having Jan-March averages so far above the long-term average, the graph doesn't really show how abnormal 2012 was. Consider where the average is and how many colder years it takes to average out to that.


You're right, "temp. at time t" is not plotted. I don't know the reason for that -- presumably they're using YTD average as a running diagnostic to tell if the current year is especially hot.


Do you get standard deviations when you watch the summary of today's weather on the news? I think the statistical detail was appropriate.

If you are seriously proposing that standard deviation makes an average 8.6 degrees above normal insignificant then you are going to have to present some pretty compelling evidence. NOAA do generally know how to do some basic math.


A standard deviation greater than about 4.39 degrees would make 8.6 degrees above average insignificant, assuming a normal distribution (alpha = 0.05, two-sided Z-test).
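Under the same normality assumption, the break-even standard deviation follows directly; Python's statistics.NormalDist (3.8+) can check the arithmetic:

```python
from statistics import NormalDist

anomaly = 8.6  # degrees F above the long-term average
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided, alpha = 0.05 -> ~1.96

# The largest standard deviation at which an 8.6-degree anomaly would
# still be significant at this level:
sd_threshold = anomaly / z_crit
print(f"{sd_threshold:.2f}")  # ~4.39
```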


sigh

I do understand what standard deviation is, and I do understand that it is possible that an 8.6-degree difference would be significant. What I meant was that NOAA wouldn't be highlighting this unless it was.

Anyway, it's public data, and we can settle this pretty easily, right?

http://www.ncdc.noaa.gov/temp-and-precip/time-series/index.p... has the raw data in CSV form going back to 1895.

The standard deviation is 2.89 (assuming my calculations are correct).
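For anyone wanting to reproduce such a number, here is a sketch of the computation using a made-up excerpt (the real CSV's column layout may differ from this hypothetical year,march_avg_F shape):

```python
import csv
import io
import statistics

# Hypothetical excerpt; the real NOAA CSV goes back to 1895 and its
# column names are assumptions here.
raw = """year,march_avg_F
1895,40.8
1896,43.1
1897,42.5
1898,44.0
"""

temps = [float(row["march_avg_F"]) for row in csv.DictReader(io.StringIO(raw))]
print(f"n={len(temps)} sd={statistics.stdev(temps):.2f}")
```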


In other words, (again, if your calculations are correct) if this were a typical HN discussion about A/B testing a conversion goal, we'd all be ridiculing anybody who suggested a result almost 3 standard deviations from the mean _wasn't_ "statistically significant", yet climate change (sceptics|denialists) will no doubt _still_ argue the relevance of this…


Not quite. If this were a discussion about A/B testing, we'd be ridiculing anybody who said "I made $8.60 more with my new design."


The bigger problem (as others have alluded to) is that we don't really know what the tail of the distribution looks like. There's no obvious reason to think it's gaussian.


If the standard deviation is 10, then 8.6 degrees above the mean is insignificant. Providing a number like 8.6 without the s.d. is presenting zero evidence. That's like page two of my statistics textbook.


It's a press release. Most people don't know what a standard deviation is.


I was replying to the parent comment, not the story, and the claim that we could somehow determine the significance of 8.6 without knowing more. As for press releases, "It was hot" is probably a sufficient level of detail for most people. :)


We are talking about weather. A little common sense will let you decide if 8.6 degrees is significant.

http://news.ycombinator.com/item?id=3820795 for the calculations if you insist.


> Now that I think about it, the lack of basic statistical disclosures is pretty damning evidence that nothing significant has happened.

I agree with the rest of your post, but come on, this is just silly. Absence of evidence (on one particular website) is certainly not evidence of absence.



