This is amazing, thanks for sharing it! A very useful feature that I always missed but never actually thought about.
I had a hard time understanding the layers in the beginning. This will be very useful for troubleshooting bad images, and also while optimizing image sizes.
It can probably also help with understanding problems like the old bug where the image size got almost a 2x increase after a CHOWN, because the layer is added again with a different owner (I think ;) ). That would be so much easier to see with this. (Ref for the old bug: https://github.com/moby/moby/issues/5505)
We added a library to a Java image and the size went up by about 4x the library's size. I spent a few hours trying to track it down, to no avail. Hopefully this will help pin down the issue.
Same here. I'm relatively new to Docker. I've taken a public jessie image, tweaked it, and committed it locally. I can see that layers exist when I pull images, but just what they are has been a mystery.
The latter can be easily done in a few hours by inspecting the image layer metadata.
Every layer records the Dockerfile command that created it. Just run `docker history` and look at the "CREATED BY" field for human-readable output of the layer metadata, or, depending on your graph driver, look in /var/lib/docker/image/overlay2/imagedb/content/sha256. From there you can reverse-engineer a Dockerfile.
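For instance (a minimal sketch; the image tag is arbitrary):

```sh
# Show each layer's size and the full command that created it.
docker history --no-trunc ubuntu:latest

# Or pull out just the "CREATED BY" column:
docker history --format '{{.CreatedBy}}' ubuntu:latest
```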
For layers that were not built using `docker build` (e.g. `docker commit`, OCI-compatible image builders), re-creating the exact command that generated that layer is much harder to do. The only information most tools will give you might just be the diff itself.
Can someone well versed in Docker explain in their own words what this does and why it is useful? Thanks in advance! I'm definitely not a Docker power user, beyond building, starting, and stopping containers.
If two Dockerfiles use debian:stretch-slim as their base, docker won't download the base image twice. Each command in the Dockerfile creates a new layer (an intermediate image). If you copy a Dockerfile, change just the last line, and build both in order, the second build will run a lot faster, and you'll see in the output that it uses cached layers, since it has already run those commands (`docker build --no-cache` forces a full rebuild).
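A minimal sketch of that caching behavior (file contents and tags are made up):

```sh
cat > Dockerfile <<'EOF'
FROM debian:stretch-slim
RUN apt-get update && apt-get install -y --no-install-recommends curl
RUN echo "version 1" > /etc/build-id
EOF
docker build -t demo:one .    # every step executes

# Change only the last instruction...
sed -i 's/version 1/version 2/' Dockerfile
docker build -t demo:two .    # the FROM and apt-get layers print "Using cache"
```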
If you start making Dockerfiles and playing around with it, it will start to make more sense.
It's declarative in the sense that the instructions are set in text. But what goes inside a Dockerfile might be imperative, like running apt-get to install packages (i.e., running steps in order).
> But what goes inside a Dockerfile might be imperative, like running apt-get to install packages (i.e., running steps in order)
But...that too is done via the Dockerfile: the RUN instruction executes commands inside the container at build time and bakes the result into a layer, while ENTRYPOINT (or CMD) only records what to execute when the container starts.
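A tiny sketch of the distinction (contents are illustrative):

```sh
cat > Dockerfile <<'EOF'
FROM debian:stretch-slim
# RUN executes at *build* time; its result is baked into a layer.
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates
# ENTRYPOINT/CMD only record what to execute at *container start* time.
ENTRYPOINT ["curl"]
CMD ["--help"]
EOF
docker build -t curl-demo .
docker run --rm curl-demo https://example.com   # args replace the CMD
```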
I have been looking for something like this for like 2 years. This will be most useful for experienced devs new to Docker that don't quite understand what's going on and haven't had time to read the specs on how docker works.
Layers are not delta transfers, in my opinion; they are just new copies of files, or removed files. It's like saying sftp is delta transfer because you don't have to send your entire disk image every time. I love layers and Docker's very nice reuse of base layers and such, but I feel like delta transfers would dramatically improve my workflow and upload times.
> It's like saying sftp is delta transfer because you don't have to send your entire disk image every time
I see your point, but the comparison to sftp seems really off. Overlay2 [1] indeed works on whole file differences, just as sftp would do.
Additionally, I don't think making actual byte-level deltas would yield you much improvement on container size. It would also increase access time unless you keep a cache that doubles storage requirements per image. Dockerfiles primarily add files rather than change them, and actual changed files are often small.
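You can see the whole-file granularity directly; a sketch (names and sizes are arbitrary):

```sh
cat > Dockerfile <<'EOF'
FROM debian:stretch-slim
RUN dd if=/dev/zero of=/big.bin bs=1M count=100   # ~100 MB file in one layer
RUN echo x >> /big.bin                            # append a single byte
EOF
docker build -t layerdemo .
docker history layerdemo
# The top layer is ~100 MB, not 1 byte: modifying the file copies
# the entire file up into the new layer.
```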
> I feel like delta transfers would dramatically improve my workflow and upload times
If you think delta transfers would help, you're saying you have a bunch of data that doesn't change between image builds, but you have to re-upload all that data every time.
This actually sounds like a perfect use-case for image layers. If you can rearrange your Dockerfile to put the steps where files change more rarely before steps that change frequently, docker will automatically cache the earlier steps and not reupload if that layer already exists.
Image layers are at their heart already an implementation of a cache + delta transfer system. (Compared to some other delta systems it trades storage efficiency for computational efficiency.) You can get small delta transfers, but you have to work with docker via your Dockerfile to make it happen.
For example, that could mean: moving steps up or down the Dockerfile so more frequently changed layers come later; splitting a step into two steps to exploit the previous point; making a step deterministic so you don't have to force a rebuild; or using multi-stage builds to make all of these easier to implement. A sketch of the reordering idea follows.
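This is just an illustration; a Node.js app and these file names are assumptions, but the pattern applies to any build:

```sh
cat > Dockerfile <<'EOF'
FROM node:10-slim
WORKDIR /app
# Rarely-changing inputs first: this layer stays cached (and is never
# re-uploaded) until the package files themselves change.
COPY package.json package-lock.json ./
RUN npm ci
# Frequently-changing source last: editing code only rebuilds
# (and re-uploads) the layers from here down.
COPY . .
CMD ["node", "server.js"]
EOF
docker build -t app:latest .
```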
My understanding is the on-disk storage might not be delta (layers are stored as tar files), but the transfer on the wire is... could be wrong, but I think they intentionally made delta transfer a Docker registry network concern? I suppose if you wanted delta storage on disk, that again could be an implementation concern: knowing where to pull and reassemble the bits into a standard image. Oh, and it uses gzip compression on the wire and possibly elsewhere, though that's general compression rather than a true delta.
It does two things for you:
a) you get to visualize and explore the image layer by layer, and b) it will analyze and score the image with a % efficiency, listing all of the potential "inefficient" files.
In this way it's not just showing you a score... the score alone doesn't help you make the image any better. That's why letting the user explore the layers is good: it helps discover and explain why there is an inefficiency, not just that there is one.
There's no tool I'm aware of, but with experimental features enabled in the Docker CLI you can use `docker manifest inspect` on each image manifest and diff the content. e.g. `docker manifest inspect ubuntu:latest`.
This requires that you have logged in to the registry, pushed the image to the registry and have `docker pull` rights for the image. You could also run a registry locally, push your image there and inspect your registry's storage db. There's just no CLI command to do that.
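A sketch of that comparison (assumes the experimental CLI feature is enabled and you have pull access; the tags are arbitrary):

```sh
export DOCKER_CLI_EXPERIMENTAL=enabled
docker manifest inspect ubuntu:16.04 > a.json
docker manifest inspect ubuntu:18.04 > b.json
diff a.json b.json    # layer digests that differ show up here
```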
Beyond Compare for Docker/Moby seems like an ideal use case for an inspection tool such as this. While space is cheap, finding DBs which could be consolidated could have some tangible benefits for just about everyone.
Off topic, but has anyone else noticed a lot of issues with Docker Hub recently? Things like image tags disappearing :( and pushes randomly failing or pulls taking a very long time.