Pedantic nit-pick that doesn't detract from your main point: Turing completeness doesn't imply that a RUN statement can never be idempotent. In fact we'd expect a particular Turing machine given a particular tape to reliably produce the same output, and similarly any Turing-complete program given the exact same input should produce the exact same output. Turing machines are single-threaded, non-networked machines with no concept of time.
The main reason the RUN statement isn't idempotent, in my opinion, is that it can reach out to the network and e.g. fetch the latest packages from a repository, like you mention. (Other things like the clock, race conditions in multi-threaded programs, etc. can also cause RUN statements to do different things on multiple runs, but I'd argue those are relatively uncommon and unimportant.)
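To make the distinction concrete, here is a minimal sketch (not how Docker works internally). `pure_step`, `impure_step`, and `fake_repo_latest` are hypothetical names invented for illustration; the fake repository stands in for a network call that returns "the latest package".

```python
import hashlib
import itertools


def pure_step(data: bytes) -> str:
    """Depends only on its input, like a Turing machine reading a fixed tape."""
    return hashlib.sha256(data).hexdigest()


def impure_step(data: bytes, fetch_latest) -> str:
    """Also depends on whatever the outside world returns at call time."""
    return hashlib.sha256(data + fetch_latest()).hexdigest()


# Hypothetical stand-in for a package repository whose "latest" version moves
# between builds. A real RUN step would hit the network here.
_versions = itertools.count(1)


def fake_repo_latest() -> bytes:
    return f"pkg-1.0.{next(_versions)}".encode()


if __name__ == "__main__":
    # Same input, same output, every time.
    print(pure_step(b"layer") == pure_step(b"layer"))
    # Same input, different output, because the repository moved on.
    print(impure_step(b"layer", fake_repo_latest)
          == impure_step(b"layer", fake_repo_latest))
```

The non-determinism comes entirely from the injected `fetch_latest`, not from anything Turing-complete about the function body.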
If it's Turing Complete you can't prove that it's hermetic. You can't even prove whether it'll ever finish running (the halting problem).
Maybe someday someone will invent a sort of eBPF for containers, but usually the first batch of RUN commands and the last are calling package managers, and in between you're doing things like creating users and setting permissions using common Unix shell commands, which have the same problems.
With the possible exception of the Unix package managers, we have no hope of those ever being rewritten inside of a proof system, because the package manager is often a form of dogfooding for the language community. The Node package manager is going to be written in Node and use common networking and archive libraries. Same for Ruby, Rust, you name it.
> If it's Turing Complete you can't prove that it's hermetic
The Turing machine model doesn't cover I/O, so that's a meaningless statement.
Bazel and Nix achieve this by sandboxing I/O; no Turing-incompleteness needed (though Bazel is still purposely non-Turing-complete, but for a totally different reason than hermeticity).
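One way to picture the idea, as a rough sketch only: sandboxed build tools typically allow a network fetch only when its content hash is declared up front, so the step either yields the pinned bytes or fails. The names `pinned_fetch` and `HermeticityError` below are hypothetical and are not part of Bazel or Nix.

```python
import hashlib


class HermeticityError(Exception):
    """Raised when fetched content does not match the declared hash."""


def pinned_fetch(fetch, expected_sha256: str) -> bytes:
    """Run an arbitrary fetch function, but accept only the pinned content.

    `fetch` can be any code at all (Turing-complete or not); the check is on
    what crosses the I/O boundary, not on the program doing the fetching.
    """
    data = fetch()
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise HermeticityError(f"expected {expected_sha256}, got {actual}")
    return data


if __name__ == "__main__":
    pinned = b"pkg-1.0.0"
    pin = hashlib.sha256(pinned).hexdigest()
    print(pinned_fetch(lambda: pinned, pin) == pinned)
    try:
        # The upstream moved to a newer version: the build fails loudly
        # instead of silently producing a different result.
        pinned_fetch(lambda: b"pkg-1.0.1", pin)
    except HermeticityError:
        print("rejected unpinned content")
```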