I think the hand running through the wheat (?) looks quite good, and the object permanence is reasonable, especially given the GAN architecture. GANs are good at grounded generation; that's why the original GigaGAN work is still in use at a number of top image labs. Inferring object permanence and object dynamics is impressive for this kind of structure.
Plus, a rather small data set: REDS and Vimeo-90k aren't massive in comparison to what people speculate Sora was trained on.