Hacker News
Dive – A tool for exploring each layer in a Docker image (github.com/wagoodman)
684 points by boyter on Nov 25, 2018 | 42 comments



This is amazing, thanks for sharing it! A very useful feature that I always missed but never actually thought about.

I've had a hard time understanding layers from the beginning. This will be very useful for troubleshooting bad images and also for optimizing image sizes.

It can probably help with problems like the old bug where image size almost doubled after a chown, because the layer gets added again with a different owner (I think); that would be much easier to see with this.

Ref for the old bug https://github.com/moby/moby/issues/5505


Newer Docker versions have a --chown flag for COPY/ADD exactly to address that.
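A minimal Dockerfile sketch of the flag (image name, user, and paths here are illustrative, not from the thread):

```dockerfile
FROM debian:stretch-slim
# Without --chown you'd need a separate RUN chown step, which would
# duplicate the copied files in a new layer. --chown sets the ownership
# inside the COPY layer itself, avoiding the size blowup.
COPY --chown=appuser:appgroup ./app /opt/app
```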


We added a lib to a Java image and the size went up by about 4x the library size; I spent a few hours trying to track it down, to no avail. Hopefully this will help pin down the issue.


Let us know if it’s a relatable issue. Might save someone else a headache in the future.


Same here. I'm relatively new to Docker. I've taken a public jessie image and tweaked and committed it locally. I can see that layers exist when I pull images, but just what they are has been a mystery.


Project idea: a Google for docker images.

Search for files by hash within all public docker images.

Find images that contain a certain piece of code.

Reverse engineering Dockerfiles for images that were built without one.


The latter can easily be done in a few hours by inspecting the image layer metadata.

Every layer understands the command it was run in the Dockerfile to create itself. Just look at `docker history` and have a look at the "CREATED BY" field for human-readable output of the layer metadata, or depending on your graph driver have a look in /var/lib/docker/image/overlay2/imagedb/content/sha256. From there you can reverse-engineer a Dockerfile.
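For example (image name is illustrative, and this requires the image to be present locally):

```shell
# Show the full, untruncated command that created each layer,
# newest layer first.
docker history --no-trunc --format '{{.CreatedBy}}' ubuntu:latest
```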

For layers that were not built using `docker build` (e.g. `docker commit`, OCI-compatible image builders), re-creating the exact command that generated that layer is much harder to do. The only information most tools will give you might just be the diff itself.


> Every layer understands the command it was run in the Dockerfile to create itself

How reliable is this? Can it be modified after creation by a malicious party?

That is, if I get a wild docker image, can I trust the results of `docker history`?


Cool!

New project idea... crawl a large set of popular Docker images lacking Dockerfiles and attempt to recreate the Dockerfile with this technique.


DockerSlim will reverse engineer / auto-generate a Dockerfile for you :)


A GitHub for Docker images might be the better analogy.

Docker the company could, I imagine, add deep search and browsing as enhancements to their existing Docker Hub.


Can someone well versed in Docker explain in their own words what this does and why it is useful? Thanks in advance! I'm definitely not a Docker power user... I don't do much other than build, start, and stop.


A docker image is built in layers that stack on top of each other. Think of each layer as a commit in a git repository.

This project takes an image and shows the list of layers (=commits), and for each layer (=commit) allows you to see what was changed (=the diff).

A bit like a git log for a docker image.
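Concretely, the analogy maps onto commands like these (image name is illustrative; both need the image pulled locally):

```shell
docker history ubuntu:latest   # roughly: git log    (one line per layer)
dive ubuntu:latest             # roughly: git log -p (browse each layer's diff)
```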


If OP looks up some Dockerfiles on docker hub, this will make more sense. Look at this mysql file for instance:

https://github.com/docker-library/mysql/blob/696fc899126ae00...

If two Dockerfiles use debian:stretch-slim as their base, docker won't download the base image twice. Each command in the Dockerfile creates an image layer. If you copy a Dockerfile, change only the last line, and build both in order, the second build will run a lot faster, and you'll see in the output that it uses cached images since it has already run those commands (docker build --no-cache always forces a full build).
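A sketch of the caching behavior described above (package name is illustrative): on a rebuild, every layer up to the first changed instruction is reused from cache.

```dockerfile
FROM debian:stretch-slim
# Cached across builds as long as this instruction is unchanged
RUN apt-get update && apt-get install -y --no-install-recommends curl
# Editing only this last line and rebuilding reuses the cached layers
# above, so the second build is much faster.
CMD ["curl", "--version"]
```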

If you start making Dockerfiles and playing around with it, it will start to make more sense.


As others have noted here, this is why we use Nix https://nixos.org/


This is great! Docker feels imperative to me, and using this could help mitigate that.

Also looking into Nix to help make my container workflow more declarative.


But a Dockerfile is imperative. It just lists, in order, the steps needed to build an image.


It's declarative in the sense that the instructions are set in text. But what goes inside a Dockerfile might be imperative, like running apt-get to install packages (i.e., running steps in order).


> But what goes inside a Dockerfile might be imperative, like running apt-get to install packages (i.e., running steps in order)

But... that too is done via the Dockerfile, via the RUN instruction, which executes commands inside the container at build time. Or alternatively using the ENTRYPOINT instruction, which runs when the container starts.


I have been looking for something like this for about two years. This will be most useful for experienced devs who are new to Docker, don't quite understand what's going on, and haven't had time to read the specs on how Docker works.


Is there a smarter way to make more layers than splitting everything up into modules and ADDing each one? Why doesn't Docker have delta transfers yet?


Layers are delta transfers. Maybe not as smart as they could be, but there is a lot of under-appreciated complexity hiding here.


Layers are not delta transfers, in my opinion; they are just new copies or removals of files. It's like saying sftp is delta transfer because you don't have to send your entire disk image every time. I love layers and Docker's very nice reuse of base layers and such, but I feel like delta transfers would dramatically improve my workflow and upload times.


> its like saying sftp is delta transfer because you don't have to send your entire disk image every time

I see your point, but the comparison to sftp seems really off. Overlay2 [1] indeed works on whole-file differences, just as sftp would.

Additionally, I don't think byte-level deltas would yield much improvement in container size. They would also increase access time, unless you keep a cache that doubles storage requirements per image. Dockerfiles primarily add files rather than change them, and the files that do change are often small.

[1] https://docs.docker.com/storage/storagedriver/overlayfs-driv...


> i feel like delta transfers would dramatically improve my workflow and upload times

If you think delta transfers would help, you're saying you have a bunch of data that doesn't change between image builds but that you have to re-upload every time.

This actually sounds like a perfect use-case for image layers. If you can rearrange your Dockerfile to put the steps where files change more rarely before steps that change frequently, docker will automatically cache the earlier steps and not reupload if that layer already exists.

Image layers are at their heart already an implementation of a cache + delta transfer system. (Compared to some other delta systems it trades storage efficiency for computational efficiency.) You can get small delta transfers, but you have to work with docker via your Dockerfile to make it happen.

For example, that could mean: moving steps up or down the Dockerfile so that more frequently changed layers come later; splitting a step into two to exploit the previous point; making a step deterministic so you don't have to force a rebuild; or using multi-stage builds to make all of these easier to implement.
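For instance, putting a rarely-changing dependency step before the frequently-changing source copy means only the final layers get rebuilt and re-uploaded (file names and base image here are illustrative):

```dockerfile
FROM python:3.6-slim
# Rarely changes: this layer stays cached and is not re-pushed
# on most builds
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
# Changes on every code edit: only this layer (and anything after it)
# is rebuilt and uploaded
COPY . /app
CMD ["python", "/app/main.py"]
```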


My understanding is that the on-disk storage might not be delta-based (layers are stored as tar files), but the transfer on the wire is... could be wrong, but I think they intentionally made delta transfer a Docker registry network concern: the client only pulls the layers it doesn't already have. If you wanted delta storage on disk, that again could be an implementation concern, knowing where to pull and reassemble the bits into a standard image. Oh, and it uses gzip compression on the wire and possibly elsewhere, though that's general-purpose compression rather than true delta compression.


Nice!! I have been waiting for something like this for a while. Gonna try this for sure.


Just curious, how do you intend to use this? I mean, what problem does exploring each layer in a Docker image solve for you?


"why does this small change increase the size of my image by 500mb"


A few people pointed me to this dive tool after I wrote up these notes on shrinking Docker images: https://simonwillison.net/2018/Nov/19/smaller-python-docker-...


It does two things for you: a) you get to visualize and explore the image layer by layer, and b) it will analyze and score the image with a % efficiency, listing all of the potential "inefficient" files.

In this way it's not just showing you a score; the score alone doesn't help you make the image any better. This is why letting the user explore the layers is good: it helps discover and explain why there is an inefficiency, not just that there is one.


This looks spectacular. This could solve a large chunk of my challenges with Docker in one swoop. Really excited to use this!


Is there a tool to compare 2 or more images to check which layers they have in common?


There's no tool I'm aware of, but with experimental features enabled in the Docker CLI you can run `docker manifest inspect` on each image and diff the output, e.g. `docker manifest inspect ubuntu:latest`.

This requires that you have logged in to the registry, pushed the image there, and have `docker pull` rights for the image. You could also run a registry locally, push your images there, and inspect the registry's storage backend; there's just no CLI command to do that.
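A sketch of the diffing approach (requires the experimental CLI features and registry access mentioned above; image names are illustrative):

```shell
docker manifest inspect ubuntu:latest > a.json
docker manifest inspect debian:stretch-slim > b.json
# Each layer is listed with its sha256 digest; layers the two images
# share appear as identical digest lines in both files.
diff <(grep '"digest"' a.json) <(grep '"digest"' b.json)
```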


Thanks!


There used to be a tool to do this, but it hasn't been kept up to date with changes to image creation and storage: https://imagelayers.io


A Beyond Compare for Docker/Moby seems like an ideal use case for an inspection tool such as this. While space is cheap, finding duplicates that could be consolidated could have tangible benefits for just about everyone.


Very, very useful. Thank you for sharing this!


dude this is sick


This is amazing!


Good tool!


Off topic, but has anyone else noticed a lot of issues with Docker Hub recently? Such as image tags disappearing :( or pushes randomly failing, or pulls taking a very long time.



