Interpretable Machine Learning (christophm.github.io)
27 points by e2e4 on Nov 29, 2020 | 5 comments



I can't evaluate the entire book in one sitting, so I read through the section on linear regression. EDIT: Also read replies to this comment.

In section 4.1, Linear Regression, subsection "Normality":

> It is assumed that the target outcome given the features follows a normal distribution. If this assumption is violated, the estimated confidence intervals of the feature weights are invalid.

This is incorrect. Linear regression does not require the outcomes follow a normal distribution. Rather, it requires that the residual errors are normally distributed.
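A quick stdlib-only simulation (all parameters made up) illustrates the distinction being drawn here: the marginal distribution of y can be wildly non-normal (bimodal below, because x is bimodal) while the residuals from the fitted line are still normal with the error's standard deviation.

```python
import random
import statistics

random.seed(0)
n = 2000

# Bimodal feature: mixture of two well-separated Gaussians.
x = [random.gauss(-3 if random.random() < 0.5 else 3, 0.5) for _ in range(n)]
# True model: y = 2 + 3x + N(0, 1) noise -> marginal y is bimodal, not normal.
y = [2 + 3 * xi + random.gauss(0, 1) for xi in x]

# Ordinary least squares for simple regression.
mx, my = statistics.fmean(x), statistics.fmean(y)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
    (xi - mx) ** 2 for xi in x
)
intercept = my - slope * mx
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# The slope recovers the true coefficient and the residual spread matches
# the error sd, even though y's marginal distribution has two modes.
print(round(slope, 2), round(statistics.stdev(residuals), 2))
```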

The next subsection, "Homoscedasticity", is technically correct but incomplete. A more general assumption of linear regression is stationarity. Heteroskedasticity is one way, but not the only way, to violate stationarity. In my opinion this section should focus on stationarity more generally rather than on homoskedasticity alone, because constant variance is a necessary but not a sufficient condition.
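As a sketch of what a crude heteroskedasticity check can look like (a Goldfeld-Quandt-style split-sample comparison; all numbers here are invented), simulate errors whose standard deviation grows with x and compare residual variance across the two halves of the data:

```python
import random
import statistics

random.seed(1)
n = 2000

x = [random.random() for _ in range(n)]
# Heteroskedastic noise: the error standard deviation grows with x.
y = [1 + 2 * xi + random.gauss(0, 0.2 + 2 * xi) for xi in x]

# Ordinary least squares fit.
mx, my = statistics.fmean(x), statistics.fmean(y)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
    (xi - mx) ** 2 for xi in x
)
intercept = my - slope * mx
residuals = [(xi, yi - (intercept + slope * xi)) for xi, yi in zip(x, y)]

# Compare residual variance on the low-x and high-x halves of the sample.
low = [r for xi, r in residuals if xi < 0.5]
high = [r for xi, r in residuals if xi >= 0.5]
ratio = statistics.variance(high) / statistics.variance(low)
print(round(ratio, 1))  # far above 1 signals heteroskedasticity
```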

In the following subsection, "Independence", I don't see any error. I think this would be a good opportunity to talk about autocorrelation of the residuals and outcomes, because this is why linear regression cannot generally be used for time series data. Autocorrelation and correlation via an underlying temporal trend both violate independence.
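A sketch of why this bites for time series (hypothetical AR(1) errors on a linear trend): after fitting the trend, the residuals' lag-1 autocorrelation stays near the AR coefficient instead of near zero, which is exactly the independence violation.

```python
import random
import statistics

random.seed(2)
n = 2000

# AR(1) errors: e_t = 0.8 * e_{t-1} + white noise -> strongly autocorrelated.
errors = [random.gauss(0, 1)]
for _ in range(n - 1):
    errors.append(0.8 * errors[-1] + random.gauss(0, 1))

t = list(range(n))
y = [5 + 0.01 * ti + ei for ti, ei in zip(t, errors)]

# Ordinary least squares fit of the time trend.
mt, my = statistics.fmean(t), statistics.fmean(y)
slope = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / sum(
    (ti - mt) ** 2 for ti in t
)
intercept = my - slope * mt
res = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

# Lag-1 autocorrelation of the residuals; under independence it should be
# near zero, but here it stays near the AR coefficient (0.8).
m = statistics.fmean(res)
lag1 = sum((res[i] - m) * (res[i + 1] - m) for i in range(n - 1)) / sum(
    (r - m) ** 2 for r in res
)
print(round(lag1, 2))
```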

I generally like section 4.1.6. I think reasonable people could disagree on whether linear models are relatively explainable, but overall the technicals in this section are correct. Linear models are simpler to understand and very powerful, but they do break down when most of the relationship is nonlinear (e.g. monotonic but curved, or exponential) or when there are too many features to capture.


> This is incorrect. Linear regression does not require the outcomes follow a normal distribution. Rather, it requires that the residual errors are normally distributed.

No, the author's formulation and yours are mathematically equivalent statements, and specifying the conditional distribution of the outcomes is the version that generalizes to GLMs.


Ah, I didn't see what "given" was doing in that sentence. It looks like you (and therefore the author) are correct. I concede that point :)


Even then, linear regression doesn’t assume that. Particular ways of interpreting the coefficients assume it, but other interpretations are more general. The question of what the coefficients represent when the simplest assumptions break down is interesting and important to understand if you want to use linear regression in the real world, where assumptions are never really true but are often close enough.


Yes, the target does not need to follow a normal distribution, but the target conditional on the features (which is what the original authors wrote) follows a normal distribution if you make the assumption that the errors are normally distributed. The two statements are equivalent for the standard linear regression model.
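To make the equivalence concrete (a stdlib-only sketch with made-up parameters): condition on x in a narrow slice and the y values in that slice are approximately N(beta0 + beta1 * x, sigma), with the same sigma as the error term.

```python
import random
import statistics

random.seed(3)
n = 20000
beta0, beta1, sigma = 2.0, 3.0, 1.0

x = [random.random() for _ in range(n)]
# y | x ~ N(beta0 + beta1 * x, sigma)  <=>  y = beta0 + beta1 * x + N(0, sigma)
y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]

# Condition on x near 0.5 and inspect the distribution of y in that slice:
# its mean is close to beta0 + beta1 * 0.5 = 3.5 and its sd is close to sigma.
slice_y = [yi for xi, yi in zip(x, y) if 0.45 <= xi <= 0.55]
print(round(statistics.fmean(slice_y), 2), round(statistics.stdev(slice_y), 2))
```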



