
Most Monte Carlo methods used to calculate Bayesian posterior distributions are iterative: you can't parallelize the draws, because each draw depends on the previous one. Moreover, each draw depends on a function of all the data, so if your data is larger than what a single computer can store, things get complicated.
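
For concreteness, here's a minimal random-walk Metropolis sketch in Python (a toy normal-mean model; every name here is illustrative, not from the paper). It shows both pain points: each proposal is centered on the previous draw, and every accept/reject test evaluates the likelihood of the entire dataset.

    import numpy as np

    def log_posterior(theta, data):
        # Toy model: unknown mean, known unit variance, standard normal
        # prior. Note the sum over ALL the data -- every step needs it.
        return -0.5 * np.sum((data - theta) ** 2) - 0.5 * theta ** 2

    def metropolis(data, n_draws=10_000, step=0.5, seed=0):
        rng = np.random.default_rng(seed)
        theta = 0.0
        draws = np.empty(n_draws)
        for i in range(n_draws):
            # The proposal is centered on the PREVIOUS draw, so draw i
            # can't start before draw i-1 finishes: the chain is serial.
            proposal = theta + step * rng.standard_normal()
            log_alpha = (log_posterior(proposal, data)
                         - log_posterior(theta, data))
            if np.log(rng.uniform()) < log_alpha:
                theta = proposal
            draws[i] = theta
        return draws

    draws = metropolis(np.random.default_rng(1).normal(2.0, 1.0, size=500))
    print(draws[1_000:].mean())  # lands near the true mean of 2.0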

This paper doesn't look like it has a magic solution, though. It seems to propose an ad hoc sharding approach and then shows that it sometimes works.


MCMC isn't strictly serial; you can run multiple chains in parallel and get decent results (provided your burn-in period isn't too long), and this is commonly done on multicore machines.
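
A sketch of the multi-chain approach (same toy model as above; n_draws, burn_in, etc. are arbitrary choices, not from the paper). Chains share no state, so they map one-to-one onto cores:

    from multiprocessing import Pool

    import numpy as np

    # Toy dataset; regenerated deterministically in each worker process.
    DATA = np.random.default_rng(0).normal(1.0, 1.0, size=1_000)

    def log_posterior(theta):
        # Normal likelihood with known unit variance, standard normal prior.
        return -0.5 * np.sum((DATA - theta) ** 2) - 0.5 * theta ** 2

    def run_chain(seed, n_draws=20_000, burn_in=5_000, step=0.1):
        # One independent Metropolis chain per worker, seeded differently.
        rng = np.random.default_rng(seed)
        theta, draws = 0.0, np.empty(n_draws)
        for i in range(n_draws):
            proposal = theta + step * rng.standard_normal()
            if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
                theta = proposal
            draws[i] = theta
        # Throw away the burn-in before pooling draws across chains.
        return draws[burn_in:]

    if __name__ == "__main__":
        with Pool(4) as pool:
            chains = pool.map(run_chain, [1, 2, 3, 4])
        pooled = np.concatenate(chains)
        print(pooled.mean(), pooled.std())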

The "all the data" problem is interesting. It seems solvable if each node summarizes the likelihood function for part of the data. I'll have to read the paper and see if that's what they did.


All of it is definitely solvable; it's just harder than the many Monte Carlo problems that are embarrassingly parallel.
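
For contrast, a textbook embarrassingly parallel Monte Carlo problem -- plain pi estimation, where every draw is independent and worker results combine by simple averaging:

    import numpy as np

    def estimate_pi(n, seed):
        # No draw depends on any other draw, so the work splits freely
        # across workers with no coordination at all.
        rng = np.random.default_rng(seed)
        x, y = rng.uniform(size=n), rng.uniform(size=n)
        return 4.0 * np.mean(x ** 2 + y ** 2 < 1.0)

    # Each call could run on a different machine; results just average.
    print(np.mean([estimate_pi(1_000_000, s) for s in range(8)]))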


Thanks. This helps. Can you give a specific example of when someone would use this calculation, and where it would go wrong? Or what kind of problem requires the draws to be a function of all the data?


This is the core calculation in all of Bayesian statistics, so it's used pretty much everywhere. Almost any time you try to compute p(unknown|data) you run into computational issues: it's a high-dimensional integration problem with no closed-form solution. There are models that can be simplified, but they're the minority.
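
Written out (standard Bayes' rule, nothing specific to the paper):

    p(\theta \mid \mathrm{data})
      = \frac{p(\mathrm{data} \mid \theta)\, p(\theta)}
             {\int p(\mathrm{data} \mid \theta')\, p(\theta')\, \mathrm{d}\theta'}

The denominator is the high-dimensional integral with no closed form; MCMC sidesteps it by drawing samples whose stationary distribution is the left-hand side, which only requires the numerator up to a constant.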