I was sort-of agreeing with you until this point. In my experience "write a Dockerfile" usually means "run a bunch of fragile, non-deterministic nonsense, like `apt-get update && apt-get -y install`, `yum install`, `pip install`, `npm install`, etc.; often all at the same time!"
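For concreteness, here's a sketch of the sort of Dockerfile I mean (the base image, packages and app layout are made up for illustration):

```dockerfile
# A sketch of the fragile pattern described above (image tag, packages and app
# layout are hypothetical). Nothing is pinned, so the result depends on whatever
# the package mirrors and registries happen to serve on the day of the build.
FROM python:3
RUN apt-get update && apt-get -y install nodejs npm
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt && npm install
CMD ["python3", "main.py"]
```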
Docker is a way of taking fragile, non-deterministic things like package installations and making them less fragile, in very practical ways.
Compared to the old way of doing these things (which would likely be a shell script that would run those exact same apt-get/pip/yum/npm install commands), Dockerfiles give you:
- a much more predictable starting state, which lets you remove a bunch of complexity from your installation script. The old style of installation script might need to detect which OS it's running on, which version of Python is installed, whether you're using GCC or clang, whether libgomp or lzma or whatever is installed, whether the script is running as root, and whether there are existing sites configured in the local nginx install. With a Dockerfile built on a slow-moving base image, you know exactly where everything is, which versions are installed, and whether any conflicting applications are present (see the sketch after this list).
- limited side effects (as a user, you can be fairly sure that a `docker build` isn't going to clobber your nginx configs)
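For example, a minimal sketch of that "predictable starting state" (the tag and packages are illustrative, and in practice you'd pin the base image by digest rather than by tag):

```dockerfile
# The base image fixes which OS, compiler and Python the script sees, so the
# install steps don't have to probe for them. Tag and packages are examples;
# pinning by digest (FROM debian@sha256:...) is stricter still.
FROM debian:12.5
RUN apt-get update \
 && apt-get -y install --no-install-recommends python3 gcc \
 && rm -rf /var/lib/apt/lists/*
```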
Containers have a whole pile of downsides, but making builds _more_ fragile is not really one of them.
> the old way of doing these things (which would likely be a shell script that would run those exact same apt-get/pip/yum/npm install commands)
Shell scripts are certainly an "old way" of doing things, dating back to the 1970s. However, since the 1990s there's been a safer, more declarative way to solve this problem: package managers!
> detect which OS it's running on, which version of Python is installed, whether you're using GCC or clang, whether libgomp or lzma or whatever is installed... isn't going to clobber your nginx configs
These are trivially solved by making a package (a rough sketch follows this list):
- To specify the desired OS, just depend on e.g. ubuntu-minimal-1.417 (or equivalent)
- To depend on Python, libgomp, lzma, etc. we just declare those as dependencies (specifying exact, known-good versions)
- To use gcc but not clang, specify one as a dependency and the other as a conflict (or vice versa)
- Packages cannot "clobber" existing files (like an Nginx config): if multiple packages try writing to the same path, the package manager will abort and roll back the transaction
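As a rough sketch of what that looks like (the package name, version and exact dependency set here are hypothetical, and a real Debian package needs a few more files than this), the control metadata is where all of these constraints get declared:

```
Package: mytool
Version: 1.0-1
Architecture: amd64
Maintainer: Example Maintainer <maintainer@example.org>
Depends: ubuntu-minimal, python3 (>= 3.11), libgomp1, liblzma5, gcc
Conflicts: clang
Description: Hypothetical package illustrating declared dependencies
 The package manager resolves (or refuses) this whole set atomically,
 instead of a script probing the system at install time.
```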
In contrast, writing `apt-get install` in a script (whether it's inside a container or not) is just flat-out wrong: it's using a package manager incorrectly.
> Dockerfiles give you... a much more predictable starting state
"Much more predictable" than what? Established tools like `debootstrap` start from an empty folder; I can't imagine anything more predictable.
> Containers have a whole pile of downsides, but making builds _more_ fragile is not really one of them.
I never said containers make builds "more fragile"; I actually think the exact opposite! Containers are great; in the above example, after debootstrap is finished we can run `tar` on the result to get a standard OCI container image.
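Something along these lines (names are illustrative; `docker import` is just one way to wrap a root-filesystem tarball into an image):

```sh
# Tar up the debootstrap output and import it as a base image.
sudo tar -C ./rootfs -c . | docker import - my-debian-base:bookworm
```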
My complaint is about Docker, and in particular Dockerfiles, which encourage fragility and non-determinism in two ways:
- For some reason, there's a strange belief that Dockerfiles are somehow related to building/packaging software (e.g. projects providing their software in container images, defined using Dockerfiles). That's a bait-and-switch: Dockerfiles are basically just scripts; we still need to choose what commands to run in order to actually build/package our software. If we believe that Docker is solving our building/packaging problem, then we're less inclined to run "proper" build/packaging tools in those containers (e.g. dpkg, rpm, nix, etc.); we're more likely to write crappy imperative shell scripts instead, since that's all that Docker itself provides.
- Container images produced by Dockerfiles are essentially a cache of their output. This avoids having to deal with the fragility and non-determinism of our scripts up-front, until we need to rebuild. As long as it "worked for me" at some point, we can keep using the resulting binary blob, even if it's completely unreproducible and bears no relation to its supposed "source code" (e.g. `apt-get update && apt-get -y install` just copies whatever files happen to be on the server that day). We can even "push" those blobs to a "registry", so others can avoid having to run our crappy scripts.
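As a hypothetical illustration of that last point: build the same unpinned Dockerfile twice and compare the results; nothing ties the output to its supposed source, so the images generally differ.

```sh
# Two builds of the same unpinned Dockerfile (bypassing the layer cache);
# `apt-get update && apt-get -y install` fetches whatever the mirror serves
# at that moment, so the resulting image IDs generally differ.
docker build --no-cache -t myapp:build1 .
docker build --no-cache -t myapp:build2 .
docker image inspect --format '{{.Id}}' myapp:build1 myapp:build2
```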
(Note that the above focuses on binary package managers, since the example was Dockerfiles running `apt-get` and `yum`. Personally I switched to using Nix about a decade ago, which is source-based and even more deterministic than those tools; e.g. apt-get and yum use version numbers and constraint solvers, whilst Nix uses hashes)
> In my experience "write a Dockerfile" usually means "run a bunch of fragile, non-deterministic nonsense, like `apt-get update && apt-get -y install`, `yum install`, `pip install`, `npm install`, etc.; often all at the same time!"

(AWS are a serial offender IMHO; e.g. their documentation recommends exactly this sort of crap: https://docs.aws.amazon.com/lambda/latest/dg/images-create.h... https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-dat... etc.)