I've read several pieces on Bayesian stats, and I've done some nontrivial statistics before. It still confuses me that p(data) != 1. I kinda wish the author had gone into detail about how to calculate the probability of an already-observed event.
You're confusing p(data) with p(data|data), which is trivially equal to 1.
p(data) is better formulated as p(data|F) where F codifies your assumptions about the possible generative probability models that you're building your likelihood function from. Or, similarly, F codifies your understanding of the world and the possible things that could occur within it.
This makes p(data|F) exactly the normalizing constant for the numerator of Bayes' theorem: the numerator implies a choice of one specific model in the family F, while p(data|F) averages over all possible models/worlds/parameter choices contained in F, i.e. p(data|F) = ∫ p(data|θ, F) p(θ|F) dθ.
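To make that averaging concrete, here's a minimal sketch (my own toy example, not from the thread above): a coin with unknown bias θ, a Uniform(0, 1) prior, and observed data of 7 heads in 10 flips. The likelihood p(data|θ) can be near 1 for a well-matched θ, but p(data|F), averaged over every θ the prior allows, comes out well below 1.

```python
import math
import random

def likelihood(theta, heads=7, flips=10):
    """Binomial likelihood p(data | theta) for one specific coin bias."""
    return math.comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

def marginal_likelihood(n_samples=200_000, seed=0):
    """Monte Carlo estimate of p(data | F) = integral of
    p(data | theta) over the Uniform(0, 1) prior on theta."""
    rng = random.Random(seed)
    return sum(likelihood(rng.random()) for _ in range(n_samples)) / n_samples

# With a uniform prior this integral has a closed form: 1 / (flips + 1).
# So the estimate should land near 1/11 ~= 0.0909 -- nowhere near 1.
print(marginal_likelihood())
```

The point of the toy: p(data|θ=0.7) alone is about 0.27, but once you average over every world the prior admits (including θ near 0 or 1, where 7 heads in 10 is nearly impossible), the mass spreads out and p(data|F) drops to roughly 0.09.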