One could say that you need to understand something about the artifact you are compressing, but, to be clear, you can compress text without understanding anything about its semantic content, and this is what gzip does. The only understanding needed for that level of compression is that the thing to be compressed is a string in a binary alphabet.
Of course, which is why gzip is a good baseline for "better" compressors that do have semantic understanding.
The whole idea of an autoencoder is conceptual compression. You take a concept (say: human faces) and create a compressor that is so overfit to that concept that when given complete goobldygook (random seed data) it decompresses that to something with semantic meaning!