Much commercially viable music is indeed pretty formulaic... so much so that I wonder why we would even need something as sophisticated as "AI" to autogenerate it.
This is a total hot take, but my explanation is that music - even pop music - needs to do something new or slightly different from other songs you've heard before to make it worth listening to. And "swagger" / something unique / an interesting point of view isn't something procedural generation (or diffusion-based generation in general) seems to be that good at inventing. Look at Minecraft. Every map is randomly generated, but they all look kind of the same. It's really obvious in Stable Diffusion - people drawn by Stable Diffusion always have a slightly soulless look, where the quirks and charisma of the subject have been sanded down to the point of being unmemorable.
I think the best music will soon be made by humans working together with AIs (and the same goes for the most productive artists). Chess went through a period like this, where the best chess games were played by humans collaborating with AIs. But I think it'll be a few years yet before you get better art without humans being involved at all.