This really ought to be better known. I think a large part of the reason it isn't is that most books covering linear models in general require a background in linear algebra, and very few people teach from that standpoint outside of the advanced undergraduate/beginning graduate level.
Another thing that I wish was more widely known is that a linear model is linear in its parameters, not the data. You can apply arbitrary transformations to the data and still have a linear model as long as what you're fitting is of the form E[y] = \beta_0 + \beta_1 f_1(X) + \beta_2 f_2(X) + \cdots + \beta_p f_p(X).
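For concreteness, here's a minimal sketch in Python (assuming numpy; the transformations log(x) and x^2 are arbitrary choices, nothing special about them):

    # Fit E[y] = b0 + b1*log(x) + b2*x^2 by ordinary least squares.
    # The fitted curve is nonlinear in x, but the model is linear in
    # the coefficients b0, b1, b2, so it's still a linear model.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(1.0, 10.0, size=200)
    y = 2.0 + 3.0 * np.log(x) - 0.1 * x**2 + rng.normal(scale=0.5, size=200)

    # Design matrix: each column is a fixed transformation f_j(x) of the data.
    X = np.column_stack([np.ones_like(x), np.log(x), x**2])

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta)  # roughly [2, 3, -0.1]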
The idea that linear models are linear in the parameters and not the data is a bit confusing. I know the effect of this is that you fit curves with "linear" models, but I don't feel like I fully understand this. Can you explain further or link to some good resources?
Each data point is a bunch of features x_1, x_2, ..., x_n.
You can make new features for your data points using whatever functions you like -- it doesn't matter if they're linear. Let's say we add two new features x_{n+1} = f(x_1, x_2) and x_{n+2} = g(x_2, x_3).
Now if we train a linear model on the new expanded set of features, it's linear in those features. It's not linear in the original data though, because of the new features that we introduced: x_{n+1} and x_{n+2}.
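Here's a quick sketch of that in Python (numpy assumed; the particular f and g below are arbitrary nonlinear choices, just for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 3))        # original features x_1, x_2, x_3

    f = lambda x1, x2: x1 * x2           # new feature x_{n+1} = f(x_1, x_2)
    g = lambda x2, x3: np.sin(x2 + x3)   # new feature x_{n+2} = g(x_2, x_3)

    X_expanded = np.column_stack([X, f(X[:, 0], X[:, 1]), g(X[:, 1], X[:, 2])])

    # Toy target that depends nonlinearly on the original features.
    y = 1.0 + X_expanded @ np.array([0.5, -1.0, 2.0, 3.0, -2.0]) \
        + rng.normal(scale=0.1, size=500)

    # Ordinary least squares on [1, expanded features]: linear in the
    # expanded features (and the coefficients), not in the original data.
    design = np.column_stack([np.ones(len(y)), X_expanded])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    print(beta)  # roughly [1, 0.5, -1, 2, 3, -2]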