"The real trick is to see how well your model extrapolates from the data you have out into the future."

That is the most common way to show the modeller is not shamelessly overfitting. :-| Another way is less common, though not vanishingly rare: the model may be so much simpler than the data it fits that overfitting is not a plausible explanation. (Roughly, there are too many bits of entropy in the match to the data to have been packed into the model, no matter how careless or dishonest you might have been about overfitting.)

E.g., quantum mechanics is fundamentally pretty simple --- I can't quantify it exactly, but I think 5 pages of LaTeX output, in a sort of telegraphic elevator-pitch cheat-sheet style, would suffice to explain it to 1903 Einstein or Planck well enough that they could quickly figure out how to do calculations. Indeed, one page might suffice. And there are only a few adjustable parameters (particle/nucleus masses, Planck's constant, and fewer than a dozen others). And it matches sizable tables of spectroscopic data to more than six significant figures. (Though admittedly I dunno whether the non-hydrogen calculations would have been practical in 1903.)

For the usual information-theoretic reasons, overfitting is not a real possibility: even if you don't check QM with spectroscopic measurements on previously unstudied substances, you can be pretty sure that QM is a good model. (Of course you still have to worry about it potentially breaking down in areas you haven't investigated yet, but at least it impressively captures regularities in the area you have investigated.)
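
To make the bit-counting concrete, here's a rough back-of-the-envelope sketch in Python. The specific counts (a dozen parameters, ten thousand spectral lines, six significant figures) are illustrative assumptions, not a careful census; the point is the gap in orders of magnitude:

    import math

    # Could the model's free parameters have "memorized" the data it fits?
    # All the counts below are illustrative assumptions, not a real census.

    n_parameters = 12           # adjustable constants (masses, Planck's constant, ...)
    bits_per_parameter = 40     # generous: ~12 significant decimal digits each
    model_capacity_bits = n_parameters * bits_per_parameter        # ~480 bits

    n_measurements = 10_000     # independent spectral lines matched
    sig_figs = 6                # agreement to ~6 significant figures
    bits_per_line = sig_figs * math.log2(10)                       # ~20 bits per line
    data_agreement_bits = n_measurements * bits_per_line           # ~200,000 bits

    print(f"model capacity : ~{model_capacity_bits} bits")
    print(f"data agreement : ~{data_agreement_bits:,.0f} bits")

A few hundred bits of adjustable parameters simply cannot encode a couple of hundred thousand bits of agreement, which is the sense in which overfitting stops being a plausible explanation.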




It's not just a question of how the model extrapolates from the input data itself. The actual input data may be in question as well, because there are always judgments involved in deciding how to measure, what "unreasonable" datapoints will be discarded, etc.

See, for example:

"It is indisputable that a theory that is inconsistent with empirical data is a poor theory. No theory should be accepted merely because of the beauty of its logic or because it leads to conclusions that are ideologically welcome or politically convenient. Yet it is naive in the extreme to suppose that facts – especially the facts of the social sciences – speak for themselves. Not only is it true that sound analysis is unavoidably a judgment-laden mix of rigorous reasoning (“theory”) with careful observation of the facts; it is also true that the facts themselves are in large part the product of theorizing. ..."

http://cafehayek.com/2015/04/theorizing-about-the-facts-ther...


While the general gist of your argument is right, I think there are some non-trivial ways to overfit. There are apparently some 25 constants in the Standard Model that describe the world around us to enormous precision. That is so little information that the trivial 'overfitting by encoding observations directly' will of course fail, but we could still be overfitting by having an excess number of variables: perhaps there's really some mechanism in neutrino physics that explains neutrino oscillation without needing extra constants to describe how it happens. Finding that mechanism might in turn tremendously boost our predictive precision for neutrino oscillation, up to the precision of the other, more fundamental variables in the model. But I think you're right that it's so little data that we have some strong information-theoretic guarantees that the model will at least have predictive power matching the precision of previous measurements.
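
For a toy sense of what an 'excess variable' looks like statistically, here's a sketch using a BIC-style penalty on synthetic data (the data and models are made up for illustration; nothing here is actual neutrino physics):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical data: y really is linear in x, plus a little noise.
    x = np.linspace(0.0, 1.0, 50)
    y = 2.0 * x + rng.normal(scale=0.05, size=x.size)

    def bic(y, y_hat, k):
        """Bayesian information criterion: fit quality penalized by parameter count k."""
        n = y.size
        rss = np.sum((y - y_hat) ** 2)
        return n * np.log(rss / n) + k * np.log(n)

    # Model A: slope + intercept.  Model B: adds a quadratic term it doesn't need.
    coef_a = np.polyfit(x, y, 1)
    coef_b = np.polyfit(x, y, 2)
    print("BIC linear   :", round(bic(y, np.polyval(coef_a, x), k=2), 1))
    print("BIC quadratic:", round(bic(y, np.polyval(coef_b, x), k=3), 1))

The extra coefficient improves the raw fit slightly but usually loses once it is charged the log(n) penalty, which is a small-scale analogue of asking whether a constant in the model is really earning its keep.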


Well - that's true apart from coincidence. You can have a very simple theory which says "x is directly caused by y", and there is a lot of good data, and a great fit. But it's just a coincidence and breaks down immediately.

Occam's razor is a rule of thumb and an aesthetic boon, but nothing more.

The real test is that you have a theory that is meaningful and has explanatory power. If it grants insight into the mechanisms that are driving the relationships or generating the data, and these make sense, you are pretty golden.

Another one is that the theory makes unexpected predictions that you can then test. This is a real winner, and why complex physics is so well regarded.


I think the information-theoretic approach to these modeling concerns actually implies "simpler is better" principles such as Occam's Razor. At least that's my take on [http://arxiv.org/abs/cond-mat/9601030], which derives a quantitative form of it.


I haven't read that paper, and the abstract makes my head spin! I'll have a look later and try to figure out the argument. I agree with you that things like the I-measure are based on the idea that simpler is good, and it works well in practice - both in Machine Learning and in the real world - which is why humans tend to prefer it. But (the paper you cite aside) I don't know of a deep reason why simplicity is preferred by nature.

Also, there is a deep cognitive bias here: perhaps we lack the machinery to understand the world as it really is!


> Occam's razor is a rule of thumb and an aesthetic boon, but nothing more.

Occam's razor is a bit more than that. It isn't just that given a theory X and a theory Y = X + ε, both of which fit the facts, you should prefer X because it's "cleaner" or more aesthetically pleasing or whatever. You should prefer X because you can prove it is more likely to be true.

https://en.wikipedia.org/wiki/Solomonoff%27s_theory_of_induc...
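
For a toy sense of why that is a theorem rather than a matter of taste: under a Solomonoff-style prior each hypothesis is weighted by 2^(-description length), so the ε in Y = X + ε costs prior probability before any data arrives. A sketch with made-up description lengths:

    # Toy Occam's razor under a 2^(-description length) prior.
    # The bit counts below are made-up illustrative numbers.

    len_x_bits = 200        # bits to write down theory X
    len_eps_bits = 30       # extra bits to write down the added assumption ε

    prior_x = 2.0 ** (-len_x_bits)
    prior_y = 2.0 ** (-(len_x_bits + len_eps_bits))

    print("prior(Y) / prior(X) =", prior_y / prior_x)   # 2^-30, about 1e-9

If X and Y fit the observations equally well, the likelihood terms cancel and that factor of ~2^-30 survives into the posterior, so X really is provably more probable, not just prettier.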


Do you happen to have those one to five pages of QM equations written up somewhere as a reference? I would be very interested in reading them.


No, it was a thought experiment I made up, not an exercise I've ever seen performed: how abbreviated a description of quantum mechanics could I get away with and still convey the idea to on-the-eve-of-QM scientists?

The QM equations are naturally very short; the stuff I would worry about expressing concisely is conceptual: what probability amplitude is, how it connects to prior-to-QM notions of probability, the interpretation of what it means to make an ideal measurement, and so on. I don't know of any bright concise formulation of that stuff, and I'm not sure how I'd do it. I am fairly sure, though, that 5 pages could get the job done well enough to connect to spectroscopic observations.
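
Just to gesture at how compressed the spectroscopy connection is once the level formula is in hand, here's a few-line sketch (using the standard hydrogen Rydberg constant; this is only the hydrogen case, nothing like the full cheat sheet):

    # Hydrogen line wavelengths from the energy-level formula E_n ∝ -1/n^2.
    R_H = 1.096776e7   # hydrogen Rydberg constant, in 1/m

    def vacuum_wavelength_nm(n_upper, n_lower):
        """Photon wavelength for the n_upper -> n_lower transition."""
        inverse_wavelength = R_H * (1.0 / n_lower**2 - 1.0 / n_upper**2)
        return 1e9 / inverse_wavelength

    # Balmer series (transitions down to n = 2), the visible hydrogen lines:
    for n in range(3, 7):
        print(f"n = {n} -> 2 : {vacuum_wavelength_nm(n, 2):.2f} nm")

H-alpha comes out around 656.5 nm in vacuum, in line with the spectroscopic tables, and the whole calculation fits in a handful of lines once you grant the formula.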

Note also in the original story it was intended to be given to Einstein and Planck, deeply knowledgeable in classical physics, so it'd be natural to use analogies that would be more meaningful to them than to the typical CS/EE-oriented HN reader. For example, I'd probably try to motivate the probability amplitude by detailed mathematical analogy to the wave amplitudes described by the classical wave PDEs that E. and P. knew backwards and forwards, and I don't think a concise version written that way would work as well for a typical member of the HN audience.


Scott Aaronson has some good motivation for "QM falls out naturally if you try to use a 2-norm for your probabilities instead of a 1-norm." See http://www.scottaaronson.com/democritus/lec9.html
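
A quick numerical way to see the norm point (my own toy illustration, not taken from his lecture): classical stochastic evolution preserves the 1-norm of a probability vector, unitary evolution preserves the 2-norm of an amplitude vector, and the Born rule squares the amplitudes back into probabilities.

    import numpy as np

    p = np.array([0.25, 0.75])              # classical probabilities (1-norm = 1)
    S = np.array([[0.9, 0.2],
                  [0.1, 0.8]])              # column-stochastic matrix
    print(p.sum(), (S @ p).sum())           # 1-norm preserved (both ~1.0)

    amp = np.array([1.0, 1.0j]) / np.sqrt(2)     # quantum amplitudes (2-norm = 1)
    H = np.array([[1.0, 1.0],
                  [1.0, -1.0]]) / np.sqrt(2)     # Hadamard gate, a unitary matrix
    print(np.linalg.norm(amp), np.linalg.norm(H @ amp))   # 2-norm preserved (both ~1.0)

    print(np.abs(H @ amp) ** 2)             # Born rule: outcome probabilities [0.5 0.5]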


I believe he is referring to the 'postulates of quantum mechanics'; you can find them written up in several forms with a quick Google search.

Dirac, 1929: "The fundamental laws necessary for the mathematical treatment of a large part of physics, and the whole of chemistry are thus completely known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved."


I think you can do it, but you'd probably want to start with density matrices, or use the Heisenberg picture to keep your wavefunction super-simple. If we're talking to geniuses then maybe we can include a one-off statement like 'if ρ² = ρ, so that ρ = ψ ψ† for some "column vector" ψ, then the squared magnitudes of ψ's components are the probabilities of being in each component's corresponding state' to get the gist of it.
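
Spelled out numerically (just a toy sketch of that one-off statement, nothing the 1903 audience would need):

    import numpy as np

    psi = np.array([1.0, 1.0j, 0.0]) / np.sqrt(2)   # example state "column vector"
    rho = np.outer(psi, psi.conj())                  # rho = psi psi†

    print(np.allclose(rho @ rho, rho))               # True: rank-1 projector, i.e. a pure state
    print(np.real(np.diag(rho)))                     # [0.5 0.5 0. ] -> outcome probabilities
    print(np.abs(psi) ** 2)                          # same numbers straight from psi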


This sounds like a fantastic exercise to assign to physics majors in some sort of capstone class. What a neat idea. I may have to try this.



