
You're probably thinking of a predictive interval



It is a very common misconception and one of my technical crusades. I keep fighting, but I think I have lost. Not knowing what the "uncertainty interval" represents (is it, loosely speaking, an expectation about a mean/true value or about the distribution of unobserved values?) could be even more dangerous, in theory, than using no uncertainty interval at all.

I say in theory because, in my experience in the tech industry, with the usual exceptions, uncertainty intervals, for example on a graph, are interpreted by those making decisions as aesthetic components of the graph ("the gray bands look good here") and not as anything even marginally related to a prediction.


Agreed! I also think it's extremely important, as practitioners, to know what we're even trying to estimate. The expected value (i.e. least squares regression) is the usual first thing to reach for, but does that even matter? We're probably actually interested in something like an upper quantile for planning purposes. And then there's the whole model component of it: the interval being estimated is model-driven, and if the model is wrong, the interval is meaningless. There's a lot of space for super interesting and impactful work in this area IMO, once you (the practitioner) think more critically about the objective. And then don't even get me started on interventions and causal inference...


From a statistical point of view, I agree that there is a lot of interesting and impactful work to be done on estimating predictive intervals, more in ML than in traditional statistical modeling.

I have more doubts about the actions people take once a properly estimated predictive interval is in hand. Even I, who have a good knowledge of statistical modeling, after hearing "the median survival time for this disease is 5 years," do not stop to think that the median is calculated/estimated on an empirical distribution, so some people presumably die after 2 years and others after 8. Well, that depends on the variance.

But if I am so strongly drawn to a central estimate, is there any chance for others not so used to thinking about distributions?


> We're probably actually interested in something like an upper quantile for planning purposes.

True. But a conditional quantile is much harder to accurately estimate from data than a conditional expectation (particularly if you are talking about extreme quantiles).


Oh absolutely, so it's all the more important to be precise in what we're estimating and for what purpose, and to be honest about our ability to estimate it with appropriate uncertainty quantification (such as by using conformal prediction methods/bootstrapping).
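For concreteness, here is a minimal sketch of the split conformal idea mentioned above. The synthetic data, the scikit-learn linear model, and the 95% target coverage are all illustrative assumptions, not anything from the article:

  import numpy as np
  from sklearn.linear_model import LinearRegression

  rng = np.random.default_rng(0)

  # Synthetic data: noise the point forecast alone does not describe.
  X = rng.uniform(0, 10, size=(2000, 1))
  y = 2.0 * X[:, 0] + rng.normal(0, 3.0, size=2000)

  # Split: fit the model on one half, calibrate the interval on the other.
  X_fit, y_fit = X[:1000], y[:1000]
  X_cal, y_cal = X[1000:], y[1000:]
  model = LinearRegression().fit(X_fit, y_fit)

  # Calibration scores: absolute residuals on held-out data.
  residuals = np.abs(y_cal - model.predict(X_cal))

  # Conformal quantile for ~95% coverage (finite-sample correction).
  n = len(residuals)
  q = np.quantile(residuals, np.ceil(0.95 * (n + 1)) / n, method="higher")

  # Predictive interval for a new point: point estimate +/- q.
  x_new = np.array([[5.0]])
  pred = model.predict(x_new)[0]
  print(f"~95% predictive interval: [{pred - q:.2f}, {pred + q:.2f}]")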


> is it, loosely speaking, an expectation about a mean/true value or about the distribution of unobserved values

If you don't mind typing it out, what do you mean formally here?


I think they mean either E[x | y] (the standard regression point estimate) along with a confidence interval (this assumes that the mean is a meaningful quantity), or the interval such that F(x | y), the CDF of x given y, lies between .025 and .975 (the 95% predictive interval centered around .5). The point is that the width of the confidence interval around the point estimate of the mean converges to 0 as you add more data, because you have more information to estimate that point estimate, while the predictive interval does not: it converges to the interval determined by the aleatoric uncertainty of the data-generating distribution of x conditioned on the measured covariates y.
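A minimal numerical illustration of that last point, assuming i.i.d. Gaussian data (sigma = 2 is an arbitrary choice): the confidence half-width for the mean shrinks like 1/sqrt(n), while the predictive half-width converges to roughly 1.96 * sigma.

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(1)
  sigma = 2.0  # aleatoric spread of the data-generating distribution

  for n in (10, 100, 10_000):
      x = rng.normal(0.0, sigma, size=n)
      mean, s = x.mean(), x.std(ddof=1)
      t = stats.t.ppf(0.975, n - 1)

      # 95% confidence interval for the mean: half-width ~ 1/sqrt(n) -> 0.
      ci_half = t * s / np.sqrt(n)
      # 95% predictive interval for a new observation: half-width -> ~1.96 * sigma.
      pi_half = t * s * np.sqrt(1 + 1 / n)

      print(f"n={n:>6}  CI half-width={ci_half:.3f}  PI half-width={pi_half:.3f}")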


That's exactly what I was talking about. The nature of the uncertainty intervals becomes even more nebulous without formal notation, something I was guilty of in my comment, even if I used the word "loosely" for that reason.

If you think about linear regression, it makes sense, given its assumptions, that the confidence interval for E[x|y] is narrowest where y is at its sample mean (where the fitted line passes through the means of x and y).
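A small sketch of that property, with synthetic data and a hand-rolled OLS fit (the sample size and noise level are just illustrative): the half-width of the confidence band for E[x|y0] grows with the distance of y0 from the mean of y.

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(2)

  # Simple linear regression: x is the response, y the single covariate
  # (matching the E[x|y] notation above).
  n = 200
  y = rng.uniform(0, 10, size=n)
  x = 1.5 * y + rng.normal(0, 2.0, size=n)

  # OLS fit by hand.
  b1 = np.cov(y, x, ddof=1)[0, 1] / np.var(y, ddof=1)
  b0 = x.mean() - b1 * y.mean()
  resid = x - (b0 + b1 * y)
  s = np.sqrt(resid @ resid / (n - 2))  # residual standard error

  # Half-width of the 95% confidence band for E[x|y0]:
  # t_crit * s * sqrt(1/n + (y0 - ybar)^2 / sum((y - ybar)^2)),
  # smallest at y0 = ybar.
  t_crit = stats.t.ppf(0.975, n - 2)
  ssy = np.sum((y - y.mean()) ** 2)
  for y0 in (y.mean(), y.mean() + 3, y.mean() + 6):
      half = t_crit * s * np.sqrt(1 / n + (y0 - y.mean()) ** 2 / ssy)
      print(f"y0={y0:5.2f}  CI half-width for E[x|y0]={half:.3f}")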

If I had to choose between the two: in a forecasting context, confidence intervals are less useful for decision-making, while prediction intervals are, in my opinion, always needed.


Ah, that makes sense. The word expectation was really throwing me off, along with the fact that, in the kind of forecasting setting of this post, the mean and confidence interval (used in the correct sense) are not meaningful, while the quantile or 'predictive interval' are meaningful.


> Not knowing what the "uncertainty interval" represents (is it, loosely speaking, an expectation about a mean/true value or about the distribution of unobserved values?) could be even more dangerous, in theory, than using no uncertainty interval at all.

And, from what I understand, this is what is happening in this article.

The person is providing an uncertainty interval for their mean estimator and not for future observations (i.e., the error bars reflect the uncertainty of the mean estimator, not the uncertainty over observations).

Like you said: before adding error bars, it probably makes sense to think a bit about what type of uncertainty those error bars are supposed to represent.


Thanks, this finally clarifies for me what the article was actually doing!

And it's very different from what I expected, and it doesn't make a lot of sense to me. I guess if statisticians already believe your model, then they want to see the error bars on the model. But I would expect that if someone gives me a forecast with "error bars", those bars relate to how accurate they think the forecast will be.


Yes, that term captures what I'm talking about.



