I've read several pieces on Bayesian stats, and I've done some nontrivial statistics before. It still confuses me that p(data) != 1. I kinda wish the author had gone into detail about how to calculate the probability of an already-observed event.
You're confusing p(data) with p(data|data), which is trivially equal to 1.
p(data) is better formulated as p(data|F) where F codifies your assumptions about the possible generative probability models that you're building your likelihood function from. Or, similarly, F codifies your understanding of the world and the possible things that could occur within it.
This makes p(data|F) exactly the normalizing constant for the numerator of Bayes' theorem: the numerator implies a choice of one specific model in the family F, while p(data|F) averages over all possible models/worlds/parameter choices contained in F, i.e. p(data|F) = ∫ p(data|θ, F) p(θ|F) dθ.
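To make that averaging concrete, here's a minimal sketch (my own toy example, not from the thread above): a coin with unknown bias θ, a Uniform(0, 1) prior, and observed data of 7 heads in 10 flips. The likelihood p(data|θ) can be near 1 for a well-matched θ, but p(data|F), averaged over every θ the prior allows, comes out well below 1.

```python
import math
import random

def likelihood(theta, heads=7, flips=10):
    """Binomial likelihood p(data | theta) for one specific coin bias."""
    return math.comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

def marginal_likelihood(n_samples=200_000, seed=0):
    """Monte Carlo estimate of p(data | F) = integral of
    p(data | theta) over the Uniform(0, 1) prior on theta."""
    rng = random.Random(seed)
    return sum(likelihood(rng.random()) for _ in range(n_samples)) / n_samples

# With a uniform prior this integral has a closed form: 1 / (flips + 1).
# So the estimate should land near 1/11 ~= 0.0909 -- nowhere near 1.
print(marginal_likelihood())
```

The point of the toy: p(data|θ=0.7) alone is about 0.27, but once you average over every world the prior admits (including θ near 0 or 1, where 7 heads in 10 is nearly impossible), the mass spreads out and p(data|F) drops to roughly 0.09.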