
Standard deviations are one measure of a probability distribution. Order statistics (max and min) are another, and those are given in these graphs in the supplemental information and data page, linked to in the overview section.

http://www.ncdc.noaa.gov/special-reports/march-2012-heat-wav...




I don't think we're looking at the same graphs. The ones I see just plot YTD average temps for the five warmest years. Even the "min" charts are still for the same five years. Except it's different years for every city. There's no way I could get a good sense of the distribution from these charts.


I appreciate that you went ahead and clicked through to the data available, rather than critiquing a press release for not containing enough hard data. Here's why I think the plots did not show standard deviations --

Standard deviations are not an appropriate summary when talking about extreme values.

We're basically conditioning on extreme values ("plot temperatures of all cities having their hottest March on record in 2012"). So, while the standard deviation might be a useful summary of:

  P(temperature on day t),

a standard deviation would not be a useful summary of:

  P(temperature on day t | average temperature is highest recorded)

In general, after you condition in this way, the events ("sample paths" of the function "temperature at time t") don't look like the rest of the population.
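
A rough simulation makes the selection effect concrete. This is only a sketch under made-up assumptions: the 45-degree baseline, the year-to-year and day-to-day spreads, and the 100-year record length are all invented for illustration, not NOAA data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N_YEARS, N_DAYS = 100, 31  # hypothetical record length and days in March

# Each simulated March = baseline + a per-year offset + daily noise (all values invented).
years = []
for _ in range(N_YEARS):
    year_offset = random.gauss(0.0, 3.0)
    years.append([45.0 + year_offset + random.gauss(0.0, 8.0) for _ in range(N_DAYS)])

all_days = [temp for year in years for temp in year]
pop_mean = statistics.fmean(all_days)
pop_sd = statistics.stdev(all_days)

# Condition on "hottest March on record": keep only the year with the highest monthly mean.
record_year = max(years, key=statistics.fmean)
record_mean = statistics.fmean(record_year)

# Standard error a monthly mean would have if its days were independent pooled draws.
naive_se = pop_sd / N_DAYS ** 0.5

print(f"pooled mean {pop_mean:.1f}, pooled sd {pop_sd:.1f}")
print(f"record March mean {record_mean:.1f}, "
      f"{(record_mean - pop_mean) / naive_se:.1f} naive standard errors above the pooled mean")
```

The record year is picked precisely because it is extreme, so the pooled mean and standard deviation say little about what its days look like.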

They're oddballs, and by definition, there are only a few of them -- only one March can be the hottest on record. Maybe the best comparison would be the second-hottest March, so plot that one.

There would certainly, for example, be no reason to believe this population looks Gaussian in the tail. In fact, it would probably be misleading to make any assumption about what a "typical" extreme sample looks like. This is probably why only other sample paths (temperature as a function of t) for that location were given, rather than some summary statistic. You can't average oddballs.

There are probably locales for which standard deviation is not ever a useful summary. I'm thinking of places which are subject to strong unpredictable variations, like Santa Ana winds, that are more a 0/1 phenomenon. This would result in a multimodal distribution of temperature, which would make standard deviation misleading. That is, it could be really cold, or rather warm, but unlikely to be in-between.
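
To see how a two-regime climate defeats the standard deviation, here is a small sketch. The two regimes (around 40 and 80 degrees, each with a spread of 3) are hypothetical numbers chosen only to make the effect obvious.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical two-regime locale: half the days are cold, half are warm.
cold = [random.gauss(40.0, 3.0) for _ in range(5000)]
warm = [random.gauss(80.0, 3.0) for _ in range(5000)]
temps = cold + warm

mean = statistics.fmean(temps)
sd = statistics.stdev(temps)

# Fraction of days that land within 10 degrees of the overall mean.
near_mean = sum(1 for t in temps if abs(t - mean) <= 10.0) / len(temps)

print(f"mean {mean:.1f}, sd {sd:.1f}, fraction within 10 degrees of mean {near_mean:.4f}")
```

The mean comes out near 60 with a standard deviation near 20, yet almost no individual day is anywhere close to 60. A "mean plus or minus one standard deviation" summary would describe weather that essentially never happens there.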


I think you're confused by the graphs too. They don't plot daily temps, and they aren't even necessarily for the hottest Marches on record. It's all YTD averages. January 1998 in most northeastern cities really was an extreme value, but if I'm reading the graph right, March 1998 was possibly colder than average, with the YTD values buoyed up by January.

Look at DC. Until the last week in March, 1998 is flat or even trending down. That suggests the actual daily temps were possibly below average, no? Whatever, YTD averages may be pretty to look at, but I don't think they're good for interpretation. http://www1.ncdc.noaa.gov/pub/data/cmb/images/us/2012/mar/wa...

I just mention this because "temperature at time t" is not what we're looking at.


I don't think you are reading that correctly. If you look at the dip after the spike, all of the spike has already disappeared by then. Also, as you move to the right, it takes an ever larger difference to move the number. Think of it this way: after 10 samples it takes a number 11 above the average to increase the average by 1, but after 20 samples it takes a number 21 above the average to increase the average by 1.
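
The arithmetic is easy to check. With n samples averaging m, adding a value x gives a new average of m + (x - m) / (n + 1), so raising the average by 1 needs x to be n + 1 above m. A minimal sketch, with a made-up 50-degree baseline and hypothetical helper names:

```python
def running_mean(values):
    """Return the year-to-date style running average after each sample."""
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


def required_excess(n, bump=1.0):
    """How far above the current average of n samples the next sample
    must be in order to raise the running average by `bump`."""
    return bump * (n + 1)


# 10 days at 50 degrees, then one day 11 degrees above the average.
after_10 = running_mean([50.0] * 10 + [61.0])
# 20 days at 50 degrees, then one day 21 degrees above the average.
after_20 = running_mean([50.0] * 20 + [71.0])

print(required_excess(10), after_10[-1])  # 11.0 51.0
print(required_excess(20), after_20[-1])  # 21.0 51.0
```

By late March there are roughly 80 to 90 days in the running average, so a single day has to be enormously warm to move the line visibly.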

So, while the 1998 graph and the 2012 graph look close at the end, the actual daily values in 2012 needed to be significantly higher than the 1998 values to create that much separation when the two lines so recently crossed each other.

In fact, it's actually 1976 that had the most comparable March in the sample given. However, because those years were chosen because their Jan-March average was so far above the average, the chart does not really show how abnormal 2012 was. Consider where the average is and how many colder years it takes to average out to that.


You're right, "temp. at time t" is not plotted. I don't know the reason for that -- presumably they're using YTD average as a running diagnostic to tell if the current year is especially hot.



