Quote from Freeman Dyson:

In desperation I asked Fermi whether he was not impressed by the agreement between our calculated numbers and his measured numbers. He replied, “How many arbitrary parameters did you use for your calculations?” I thought for a moment about our cut-off procedures and said, “Four.” He said, “I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”
That anecdote was worth it just for 'Johnny von Neumann' ;-)
I've generally heard that most scientists of the day considered von Neumann the smartest man in science (he'd have to be - he single-handedly revolutionized several branches of CS, Physics, Math, Economics, etc.). Somehow hearing him called 'Johnny' makes him much less intimidating though ...
Funny, but not as good as the econ paper I once came across extolling the virtues of a method which allowed one to "sidestep the issues associated with negative degrees of freedom" (or something to that effect). In other words, fitting a line to one data point :-)
Why yes indeed, although it's more an issue of the ratio of parameters to data points. Consider, for example: if you had 20 data points and used 20 parameters to fit them, you would get a model that fits all 20 data points perfectly! But the model would probably be useless for any new data points, because it has absorbed all the irrelevant aspects of the original 20 (their "randomness").
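(A quick toy illustration of that effect, using a made-up degree-19 polynomial fit in numpy; the data below are synthetic and have nothing to do with the paper under discussion:)

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 noisy observations of a simple linear trend
x_train = np.linspace(0.0, 1.0, 20)
y_train = 2.0 * x_train + rng.normal(scale=0.3, size=20)

# 20 parameters for 20 points: a degree-19 polynomial interpolates the noise exactly
overfit = np.polynomial.Polynomial.fit(x_train, y_train, deg=19)
# 2 parameters: a straight line captures the trend and ignores the noise
simple = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)

# Fresh data from the same underlying process
x_new = rng.uniform(0.0, 1.0, size=200)
y_new = 2.0 * x_new + rng.normal(scale=0.3, size=200)

print("20-param fit, error on the original 20 points:", np.mean((overfit(x_train) - y_train) ** 2))  # essentially 0
print("20-param fit, error on new points:", np.mean((overfit(x_new) - y_new) ** 2))                  # typically enormous
print(" 2-param fit, error on new points:", np.mean((simple(x_new) - y_new) ** 2))                   # close to the noise level
```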
It is probably a case of overfitting, but it could also be the case that they are fitting an existing model that really has 17 parameters. Without more context, it's a little hard to judge. If I remember, I'll look at the full paper tomorrow. It might make a nice example.
Probably, but all I see is a graph and the abstract. It also depends on how the model is being used in the paper. Unless I'm mistaken, Nature puts papers through peer review, which, for me personally anyway, means I'd want to actually read the whole paper before reaching for the torches and pitchforks.
That's pretty outrageous if the claim in the paper is true - this is a really common problem that I ran into as an undergraduate doing modeling, fitting bacterial conjugation rates to differential equations for predator-prey dynamics.
In general, it can even be OK to use more parameters than data points, if you use regularization properly (weight decay, for example). Other times, even significantly fewer parameters than data points can be wrong.
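(A minimal sketch of that idea, using plain ridge regression as the L2 / weight-decay regularizer; the data, dimensions, and regularization strength below are made up for illustration:)

```python
import numpy as np

rng = np.random.default_rng(1)

# More parameters than observations: 15 data points, 50 features
n, p = 15, 50
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:3] = [1.5, -2.0, 0.5]            # only a few features actually matter
y = X @ true_w + rng.normal(scale=0.1, size=n)

# Ridge / weight-decay solution: minimize ||Xw - y||^2 + lam * ||w||^2
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Unregularized least squares is underdetermined here; lstsq returns the minimum-norm interpolant
w_plain = np.linalg.lstsq(X, y, rcond=None)[0]

# Compare on fresh data from the same process
X_test = rng.normal(size=(200, p))
y_test = X_test @ true_w + rng.normal(scale=0.1, size=200)
print("ridge test MSE:", np.mean((X_test @ w_ridge - y_test) ** 2))
print("plain test MSE:", np.mean((X_test @ w_plain - y_test) ** 2))
```

How much the shrinkage helps, and how to pick lam, depends entirely on the problem, which is part of the objection in the reply below.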
That sounds pretty ad-hoc. Sure, you can throw an L1 or L2 regularizer on your objective function, but it should be well-motivated.
Should probably just use Gaussian process regression if you want to do inference over the space of all[1] functions in a principled (i.e. Bayesian) manner.
1. (or the space of all polynomial functions or something. I forget)
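(For the curious, here is a bare-bones sketch of GP regression with a squared-exponential kernel in plain numpy; the kernel choice, length scale, and noise level are arbitrary assumptions, and in practice you'd tune them or reach for a library such as scikit-learn:)

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, signal_var=1.0):
    """Squared-exponential (RBF) covariance between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(2)
x_train = rng.uniform(0.0, 1.0, size=8)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=8)
noise_var = 0.1 ** 2

x_test = np.linspace(0.0, 1.0, 100)

# Condition the Gaussian-process prior on the observed points
K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
K_s = rbf_kernel(x_test, x_train)
K_ss = rbf_kernel(x_test, x_test)

mean = K_s @ np.linalg.solve(K, y_train)              # posterior mean at the test inputs
cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)          # posterior covariance
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))       # pointwise uncertainty

print("posterior mean (first 5):", np.round(mean[:5], 3))
print("posterior std  (first 5):", np.round(std[:5], 3))
```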
I don't think there's any consensus in the statistics community on which of the many ways to do inference over "nearly all" functions is the best one; nonparametric regression is basically a whole field, and it's been pretty in flux over the past 10 years.