
You are misunderstanding both BMA and Gaussian mixtures. The posterior on the world state is a distribution over Models x Model Params.

To make predictions about the world, you compute an expected value (average) over models. The explicit assumption of both GM and BMA is that only one Gaussian or only one model is correct - you just don't know which one, and therefore need to average over your uncertainty to take all the possibilities into account.
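To make that concrete, here's a toy sketch of the averaging step (numpy; the model weights and per-model predictions are made-up numbers, not fit to anything):

    import numpy as np

    # Posterior over models P(M_k | D): assumed weights for illustration.
    model_posterior = np.array([0.7, 0.2, 0.1])

    # Each model's predictive probability for some future event y,
    # P(y | M_k, D). Also assumed numbers.
    per_model_prediction = np.array([0.9, 0.5, 0.1])

    # BMA prediction: the expectation over models,
    # P(y | D) = sum_k P(y | M_k, D) * P(M_k | D)
    bma_prediction = float(np.dot(model_posterior, per_model_prediction))
    print(bma_prediction)  # 0.74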

Idea averaging (as described in the article) is about averaging over world states not models.




> The explicit assumption of both GM and BMA is that only one gaussian or only one model is correct - you just don't know which one, and therefore need to average over your uncertainty to take into account all possibilities.

This is so wrong I don't even know where to start. In fact, this is basically the notion of frequentism! One of the very most fundamental ideas of Bayesian reasoning is that there is no one true set of parameters nor is there one true model. There is only your state of knowledge about the space of all possible parameter sets or all possible models. I'm very surprised to see you, of all people, claiming this. Even a cursory Google search of BMA fundamentals disconfirms what you are saying, e.g. [1]

[1] http://www.stat.colostate.edu/~jah/papers/statsci.pdf

> Madigan and Raftery (1994) note that averaging over all the models in this fashion provides better average predictive ability, as measured by a logarithmic scoring rule, than using any single model M_j, conditional on M. Considerable empirical evidence now exists to support this theoretical claim...

and so on.
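If it helps, here's a toy sketch of the log-scoring comparison they describe (hypothetical coin models and simulated flips, so purely illustrative):

    import numpy as np
    from scipy.stats import binom

    # Two fixed candidate models for a coin: M1 says p(heads)=0.5, M2 says 0.7.
    # The simulated "true" coin has p=0.6, so neither model is exactly right.
    p_models = np.array([0.5, 0.7])
    rng = np.random.default_rng(0)
    train = rng.random(50) < 0.6     # observed flips (True = heads)
    test = rng.random(2000) < 0.6    # held-out flips

    # Posterior model weights from the observed flips (equal priors):
    # P(M_k | D) is proportional to P(D | M_k).
    heads = int(train.sum())
    lik = binom.pmf(heads, train.size, p_models)
    weights = lik / lik.sum()

    def avg_log_score(p_heads):
        # Mean log predictive probability of the held-out flips (higher is better).
        probs = np.where(test, p_heads, 1.0 - p_heads)
        return float(np.log(probs).mean())

    # The BMA predictive for a single flip is the weighted average of the
    # two models' head probabilities.
    p_bma = float(np.dot(weights, p_models))
    print("M1:", avg_log_score(0.5))
    print("M2:", avg_log_score(0.7))
    print("BMA:", avg_log_score(p_bma))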


> This is so wrong I don't even know where to start. In fact, this is basically the notion of frequentism!

Frequentism doesn't even allow you to represent your belief with a probability distribution.

> One of the very most fundamental ideas of Bayesian reasoning is that there is no one true set of parameters nor is there one true model. There is only your state of knowledge about the space of all possible parameter sets or all possible models.

Bayesian reasoning says there is one true set of parameters/model, you just don't know which one it is. The posterior distribution allows you to represent relative degrees of belief and figure out which model/parameter is more likely to be true.

Assuming you gather enough data, your posterior distribution will eventually approximate a delta function centered on that one true model. This is also what happens when you do BMA or Gaussian mixtures.
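Toy sketch of that concentration (two fixed candidate models and simulated data; the posterior weight on the data-generating model heads to 1 as n grows):

    import numpy as np
    from scipy.stats import norm

    # Data is really drawn from N(0.3, 1). Two fixed candidate models:
    # M1 = N(0, 1), M2 = N(0.3, 1), with equal prior probability.
    rng = np.random.default_rng(1)
    mus = np.array([0.0, 0.3])

    for n in (10, 100, 1000, 10000):
        x = rng.normal(0.3, 1.0, size=n)
        # Log-likelihood of the data under each model.
        loglik = np.array([norm.logpdf(x, mu, 1.0).sum() for mu in mus])
        # Posterior model probabilities (normalize in log space for stability).
        w = np.exp(loglik - loglik.max())
        w /= w.sum()
        print(n, np.round(w, 4))  # weight on the true model M2 approaches 1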

The fact that model averaging provides better predictive ability doesn't contradict what I said.


> Assuming you gather enough data, your posterior distribution will eventually approximate a delta function centered around that one true model. This is also what happens when you do BMA or gaussian mixtures.

What happens when your data is actually distributed according to a mixture of Gaussians (cf. Pearson's crabs)?


In a situation like that, you have a space of models M and a space of populations P. The state space is M x P, and each model in M generates the population P (probabilistically). In Pearson's crab example, some of the candidate models let an individual crab come from one of several Gaussians.

The generative model (first choose a Gaussian, then draw from it) is the specific model. As you gather a sufficiently large sample from the population P, you'll eventually converge to a delta function on the model space M.

So in Pearson's crab example, given enough data, you'll eventually converge either to the two-Gaussian (two crab species) model, to the three-Gaussian model, or to the single-species Weibull model (assuming you put all three of those models into your prior).
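Rough sketch of that convergence (simulated data standing in for the crab measurements, and BIC as a crude stand-in for a full posterior over models):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(2)

    for n in (50, 500, 5000):
        # Simulated stand-in for the crab measurements: a genuine
        # two-species mixture of Gaussians.
        x = np.concatenate([rng.normal(0.58, 0.02, n // 2),
                            rng.normal(0.66, 0.02, n // 2)]).reshape(-1, 1)
        # Lower BIC is better; with enough data the 2-component model
        # should win decisively over the 1- and 3-component alternatives.
        bics = [GaussianMixture(n_components=k, random_state=0).fit(x).bic(x)
                for k in (1, 2, 3)]
        print(n, [round(b) for b in bics])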



