I think the point of the article is that there aren't 2^(32x32x3x8) meaningful images that can be encoded; there are far fewer than that, because so many of the possible arrangements of the data are just meaningless noise, or millions of versions of the same thing with slightly altered color values, etc.
The article is saying that neural networks are cool because part of what they do is find which images actually contain meaningful visual information.
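For scale, here's a quick back-of-the-envelope in Python; nothing beyond the arithmetic already in the comment above:

    # Total number of distinct 32x32 RGB images at 8 bits per channel:
    # 32*32*3 = 3072 channel values, each with 2^8 = 256 possible states.
    n_images = 2 ** (32 * 32 * 3 * 8)   # equivalently 256 ** 3072
    print(len(str(n_images)))           # 7399 -- a roughly 7,400-digit number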
But they don't; they often conflate noise with actual features, as long as the noise has some statistical bias, which, given enough random generations, it will. Try it yourself: generate random images using a uniform distribution (not Gaussian) and run a SOTA classifier on them (see the sketch below). Eventually you'll hit some minor false positives.
It errs in the same way we often see objects in clouds or constellations.
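Here's roughly what I mean, as a minimal sketch. I'm assuming torchvision's pretrained ResNet-50 as the "SOTA" stand-in (any pretrained classifier works), and the 0.5 confidence cutoff is arbitrary:

    # Feed uniform-random "images" to a pretrained classifier and watch
    # for confident predictions on pure noise (false positives).
    import torch
    import torchvision.models as models

    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.eval()

    # ImageNet channel statistics, since the model expects normalized input
    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

    with torch.no_grad():
        for _ in range(100):                     # more batches, more chances
            noise = torch.rand(64, 3, 224, 224)  # uniform in [0, 1), not gaussian
            probs = model((noise - mean) / std).softmax(dim=1)
            conf, label = probs.max(dim=1)
            hits = conf > 0.5                    # arbitrary "confident" cutoff
            for c, l in zip(conf[hits], label[hits]):
                print(f"noise classified as class {l.item()} at {c.item():.2f}")

Whether you see hits depends on the model and the cutoff, but run it long enough and some noise batch with the right statistical bias slips through.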
I got the point quickly enough (from the title and the initial table of snapshots?), but the author lost me as a reader by saying too little with too many words before elaborating on that point.