The real comparison would have been between CIFAR-10 (or ImageNet) and a downsampled version thereof, not an upsampled one, and I actually know for sure that this harms performance. So ideal-world and real-world training are definitely not the same!
This is the exact comparison we make in the paper.
We have subsampled ImageNet experiments as well; see Figure 3 in the full paper: https://arxiv.org/abs/2010.08127