"The real trick is to see how well your model extrapolates from the data you have out into the future."

That is the most common way to show the modeller is not shamelessly overfitting. :-| Another way is less common, though not vanishingly rare: the model may be so much simpler than the data it fits that overfitting is not a plausible explanation. (Roughly, there are too many bits of entropy in the match to the data to have been packed into the model, no matter how careless or dishonest you might have been about overfitting.)

E.g., quantum mechanics is fundamentally pretty simple --- I can't quantify it exactly, but I think 5 pages of LaTeX output, in a sort of telegraphic elevator-pitch cheat-sheet style, would suffice to explain it to 1903 Einstein or Planck well enough that they could quickly figure out how to do calculations. Indeed, one page might suffice. And there are only a few adjustable parameters (particle/nucleus masses, Planck's constant, and fewer than a dozen others). And it matches sizable tables of spectroscopic data to more than six significant figures. (Though admittedly I dunno whether the non-hydrogen calculations would have been practical in 1903.)

For the usual information-theoretic reasons, overfitting is not a real possibility: even if you don't check QM with spectroscopic measurements on previously unstudied substances, you can be pretty sure that QM is a good model. (Of course you still have to worry about it potentially breaking down in areas you haven't investigated yet, but at least it impressively captures regularities in the area you have investigated.)
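
To make the bit-counting concrete, here's a rough back-of-the-envelope sketch in Python. The specific counts (a dozen parameters, ten thousand spectral lines, six significant figures) are illustrative assumptions, not a careful census; the point is the gap in orders of magnitude:

    import math

    # Could the model's free parameters have "memorized" the data it fits?
    # All the counts below are illustrative assumptions, not a real census.

    n_parameters = 12           # adjustable constants (masses, Planck's constant, ...)
    bits_per_parameter = 40     # generous: ~12 significant decimal digits each
    model_capacity_bits = n_parameters * bits_per_parameter        # ~480 bits

    n_measurements = 10_000     # independent spectral lines matched
    sig_figs = 6                # agreement to ~6 significant figures
    bits_per_line = sig_figs * math.log2(10)                       # ~20 bits per line
    data_agreement_bits = n_measurements * bits_per_line           # ~200,000 bits

    print(f"model capacity : ~{model_capacity_bits} bits")
    print(f"data agreement : ~{data_agreement_bits:,.0f} bits")

A few hundred bits of adjustable parameters simply cannot encode a couple of hundred thousand bits of agreement, which is the sense in which overfitting stops being a plausible explanation.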




It's not just a question of how the model extrapolates from the input data itself. The actual input data may be in question as well, because there are always judgments involved in deciding how to measure, what "unreasonable" datapoints will be discarded, etc.

See, for example:

"It is indisputable that a theory that is inconsistent with empirical data is a poor theory. No theory should be accepted merely because of the beauty of its logic or because it leads to conclusions that are ideologically welcome or politically convenient. Yet it is naive in the extreme to suppose that facts – especially the facts of the social sciences – speak for themselves. Not only is it true that sound analysis is unavoidably a judgment-laden mix of rigorous reasoning (“theory”) with careful observation of the facts; it is also true that the facts themselves are in large part the product of theorizing. ..."

http://cafehayek.com/2015/04/theorizing-about-the-facts-ther...


While the general gist of your argument is right, I think there are some non-trivial ways to overfit. There are apparently some 25 constants in the Standard Model that describe the world around us to enormous precision. That is so little information that the trivial 'overfitting by encoding observations directly' will of course fail, but we could still be overfitting by having an excess number of variables: perhaps there's really some mechanism in neutrino physics that explains neutrino oscillation without needing extra constants to describe how it happens. Finding that mechanism might in turn tremendously boost our predictive precision for neutrino oscillation, up to the precision of the other, more fundamental variables in the model. But I think you're right that it's so little data that we have some strong information-theoretic guarantees that the model will at least have predictive power matching the precision of previous measurements.
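
For a toy sense of what an 'excess variable' looks like statistically, here's a sketch using a BIC-style penalty on synthetic data (the data and models are made up for illustration; nothing here is actual neutrino physics):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical data: y really is linear in x, plus a little noise.
    x = np.linspace(0.0, 1.0, 50)
    y = 2.0 * x + rng.normal(scale=0.05, size=x.size)

    def bic(y, y_hat, k):
        """Bayesian information criterion: fit quality penalized by parameter count k."""
        n = y.size
        rss = np.sum((y - y_hat) ** 2)
        return n * np.log(rss / n) + k * np.log(n)

    # Model A: slope + intercept.  Model B: adds a quadratic term it doesn't need.
    coef_a = np.polyfit(x, y, 1)
    coef_b = np.polyfit(x, y, 2)
    print("BIC linear   :", round(bic(y, np.polyval(coef_a, x), k=2), 1))
    print("BIC quadratic:", round(bic(y, np.polyval(coef_b, x), k=3), 1))

The extra coefficient improves the raw fit slightly but usually loses once it is charged the log(n) penalty, which is a small-scale analogue of asking whether a constant in the model is really earning its keep.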


Well - that's true apart from coincidence. You can have a very simple theory which says "x is directly caused by y", and there is a lot of good data, and a great fit. But it's just a coincidence and breaks down immediately.

Occam's razor is a rule of thumb and an aesthetic boon, but nothing more.

The real test is that you have a theory that is meaningful and has explanatory power. If it grants insight into the mechanisms that are driving the relationships or generating the data, and these make sense, you are pretty golden.

Another one is that the theory makes unexpected predictions that you can then test. This is a real winner, and why complex physics is so well regarded.


I think the information-theoretic approach to these modeling concerns actually implies "simpler is better" principles such as Occam's Razor. At least that's my take on [http://arxiv.org/abs/cond-mat/9601030], which derives a quantitative form of it.


I haven't read that paper, and the abstract makes my head spin! I'll have a look later and try to figure out the argument. I agree with you that things like the I-measure are based on the idea that simpler is good, and it works well in practice - both in Machine Learning and in the real world - which is why humans tend to prefer it. But (the paper you cite aside) I don't know of a deep reason why simplicity is preferred by nature.

Also, there is a deep cognitive bias here: perhaps we lack the machinery to understand the world as it really is!


> Occam's razor is a rule of thumb and an aesthetic boon, but nothing more.

Occam's razor is a bit more than that. It isn't just that given a theory X and a theory Y = X + ε, both of which fit the facts, you should prefer X because it's "cleaner" or more aesthetically pleasing or whatever. You should prefer X because you can prove it is more likely to be true.

https://en.wikipedia.org/wiki/Solomonoff%27s_theory_of_induc...
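
For a toy sense of why that is a theorem rather than a matter of taste: under a Solomonoff-style prior each hypothesis is weighted by 2^(-description length), so the ε in Y = X + ε costs prior probability before any data arrives. A sketch with made-up description lengths:

    # Toy Occam's razor under a 2^(-description length) prior.
    # The bit counts below are made-up illustrative numbers.

    len_x_bits = 200        # bits to write down theory X
    len_eps_bits = 30       # extra bits to write down the added assumption ε

    prior_x = 2.0 ** (-len_x_bits)
    prior_y = 2.0 ** (-(len_x_bits + len_eps_bits))

    print("prior(Y) / prior(X) =", prior_y / prior_x)   # 2^-30, about 1e-9

If X and Y fit the observations equally well, the likelihood terms cancel and that factor of ~2^-30 survives into the posterior, so X really is provably more probable, not just prettier.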


Do you happen to have those one to five pages of QM equations written up somewhere as a reference? I would be very interested in reading them.


No, it was a thought experiment I made up, not an exercise I've ever seen performed: how abbreviated a description of quantum mechanics could I get away with and still convey the idea to on-the-eve-of-QM scientists?

The QM equations are naturally very short; the stuff I would worry about expressing concisely is conceptual: what probability amplitude is, how it connects to prior-to-QM notions of probability, the interpretation of what it means to make an ideal measurement, and so on. I don't know of any bright concise formulation of that stuff, and I'm not sure how I'd do it. I am fairly sure, though, that 5 pages could get the job done well enough to connect to spectroscopic observations.
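
Just to gesture at how compressed the spectroscopy connection is once the level formula is in hand, here's a few-line sketch (using the standard hydrogen Rydberg constant; this is only the hydrogen case, nothing like the full cheat sheet):

    # Hydrogen line wavelengths from the energy-level formula E_n ∝ -1/n^2.
    R_H = 1.096776e7   # hydrogen Rydberg constant, in 1/m

    def vacuum_wavelength_nm(n_upper, n_lower):
        """Photon wavelength for the n_upper -> n_lower transition."""
        inverse_wavelength = R_H * (1.0 / n_lower**2 - 1.0 / n_upper**2)
        return 1e9 / inverse_wavelength

    # Balmer series (transitions down to n = 2), the visible hydrogen lines:
    for n in range(3, 7):
        print(f"n = {n} -> 2 : {vacuum_wavelength_nm(n, 2):.2f} nm")

H-alpha comes out around 656.5 nm in vacuum, in line with the spectroscopic tables, and the whole calculation fits in a handful of lines once you grant the formula.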

Note also in the original story it was intended to be given to Einstein and Planck, deeply knowledgeable in classical physics, so it'd be natural to use analogies that would be more meaningful to them than to the typical CS/EE-oriented HN reader. For example, I'd probably try to motivate the probability amplitude by detailed mathematical analogy to the wave amplitudes described by the classical wave PDEs that E. and P. knew backwards and forwards, and I don't think a concise version written that way would work as well for a typical member of the HN audience.


Scott Aaronson has some good motivation for "QM falls out naturally if you try to use a 2-norm for your probabilities instead of a 1-norm." See http://www.scottaaronson.com/democritus/lec9.html
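
A quick numerical way to see the norm point (my own toy illustration, not taken from his lecture): classical stochastic evolution preserves the 1-norm of a probability vector, unitary evolution preserves the 2-norm of an amplitude vector, and the Born rule squares the amplitudes back into probabilities.

    import numpy as np

    p = np.array([0.25, 0.75])              # classical probabilities (1-norm = 1)
    S = np.array([[0.9, 0.2],
                  [0.1, 0.8]])              # column-stochastic matrix
    print(p.sum(), (S @ p).sum())           # 1-norm preserved (both ~1.0)

    amp = np.array([1.0, 1.0j]) / np.sqrt(2)     # quantum amplitudes (2-norm = 1)
    H = np.array([[1.0, 1.0],
                  [1.0, -1.0]]) / np.sqrt(2)     # Hadamard gate, a unitary matrix
    print(np.linalg.norm(amp), np.linalg.norm(H @ amp))   # 2-norm preserved (both ~1.0)

    print(np.abs(H @ amp) ** 2)             # Born rule: outcome probabilities [0.5 0.5]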


I believe he is referring to the 'postulates of quantum mechanics'; you can find them written up in several forms with a quick Google search.

Dirac, 1929: "The fundamental laws necessary for the mathematical treatment of a large part of physics, and the whole of chemistry are thus completely known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved."


I think you can do it, but you'd probably want to start with density matrices, or use the Heisenberg picture to keep your wavefunction super-simple. If we're talking to geniuses then maybe we can include a one-off statement like 'if ρ² = ρ, so that ρ = ψ ψ† for some "column vector" ψ, then the squared magnitudes of ψ's components are the probabilities of being in each component's corresponding state' to get the gist of it.
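
Spelled out numerically (just a toy sketch of that one-off statement, nothing the 1903 audience would need):

    import numpy as np

    psi = np.array([1.0, 1.0j, 0.0]) / np.sqrt(2)   # example state "column vector"
    rho = np.outer(psi, psi.conj())                  # rho = psi psi†

    print(np.allclose(rho @ rho, rho))               # True: rank-1 projector, i.e. a pure state
    print(np.real(np.diag(rho)))                     # [0.5 0.5 0. ] -> outcome probabilities
    print(np.abs(psi) ** 2)                          # same numbers straight from psi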


This sounds like a fantastic exercise to assign to physics majors in some sort of capstone class. What a neat idea. I may have to try this.



