GANs are pretty good at not averaging, since averaged outputs are fairly easy for the discriminator to spot. I would assume the training set is biased toward more attractive people (who make better photo subjects), so the model would generate unattractive people with lower probability than they occur in real life. To evaluate this properly you'd have to compare against real examples pulled from the model's training set, and I didn't see where they did that.
GANs are better than most VAE models at not averaging, but they may still fail to cover the entire data distribution (partial mode collapse). If the samples are underdispersed, the modes that get dropped are probably the atypical ones, so what's left skews toward the average. We don't have great metrics for measuring how complete that coverage is.
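One rough way people have tried to quantify coverage is a nearest-neighbor test: count how many real samples have at least one generated sample inside their k-NN ball (in the spirit of the "density and coverage" metrics proposed by Naeem et al.). This is a minimal sketch with synthetic 2D data, not anyone's actual evaluation pipeline; the function name and data are illustrative.

```python
import numpy as np

def coverage(real, fake, k=5):
    """Fraction of real points whose k-NN ball contains at least one fake point."""
    # Pairwise distances among real points, to get each point's k-NN radius.
    d_real = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    # k-th smallest nonzero distance (index k, since index 0 is the self-distance 0).
    radii = np.sort(d_real, axis=1)[:, k]
    # Distance from each real point to its nearest fake point.
    d_cross = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)
    return float(np.mean(d_cross.min(axis=1) <= radii))

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 2))
# A well-matched generator vs. a "mode-collapsed" one stuck in a corner of the data.
fake_good = rng.normal(size=(200, 2))
fake_collapsed = rng.normal(loc=1.5, scale=0.2, size=(200, 2))
print(coverage(real, fake_good))       # near 1: most real points are covered
print(coverage(real, fake_collapsed))  # low: much of the distribution is missed
```

A mode-collapsed generator scores low here even if every individual sample looks sharp and realistic, which is exactly the failure mode that per-sample quality checks can't see.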