
If you only care about the final image size, then `docker build --squash` squashes the layers for you as well.



Definitely, although it's worth noting that while the image will be smaller, squashing gives up the benefits of sharing base layers. Having fewer redundant layers lets you save most of the space without losing any of the benefits of sharing layers. I think that is the main reason why squashing is not usually done.


--squash is still experimental; I believe multi-stage builds are the new best practice here.


In case anyone doesn't know what that means, it's basically this kind of Dockerfile:

  FROM the_source_image as builder
  RUN build.sh

  FROM the_source_image
  COPY --from=builder /app/artifacts /app/
  CMD ....
I'm not sure you can really call it the new best practice though; it's been the default for ... a very long time at this point.


Typically I wind up using a different source image for the builder, one that ideally has (most of) the toolchain bits needed but shares the same runtime base as the final image. (For Go, golang:alpine and alpine work well. I'm aware alpine/musl is not technically supported by Go, but I have yet to hit issues in prod with it, so I guess I'll keep taking that gamble.)
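
A rough sketch of that split, assuming the official golang:alpine image as the builder and plain alpine as the runtime base. The paths and the binary name "app" are hypothetical, and it assumes the build context holds a Go module with its main package at the root and no cgo dependencies:

```dockerfile
# Build stage: carries the Go toolchain, never shipped.
FROM golang:alpine AS builder
WORKDIR /src
COPY . .
# CGO_ENABLED=0 asks for a statically linked binary (assumes no cgo dependencies).
RUN CGO_ENABLED=0 go build -o /out/app .

# Runtime stage: same Alpine base, but without the toolchain.
FROM alpine
# Only the built artifact crosses over from the builder stage.
COPY --from=builder /out/app /usr/local/bin/app
CMD ["/usr/local/bin/app"]
```

Only the layers of the last stage end up in the final image, so the toolchain in the builder stage costs nothing at runtime.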


I take advantage of multi-stage builds, but I still think the layer system could use some nice improvements.

For example, say I have my own Ubuntu image that is based on one of the official ones, but adds a bit of common configuration or tools and so on, on which I then build my own Java image using the package manager (not unlike what Bitnami do with their minideb, on which they then base their PostgreSQL and most other container images).

So I might have something like the following in the Ubuntu image Dockerfile:

  RUN apt-get update && apt-get install -y \
    curl wget \
    net-tools inetutils-ping dnsutils \
    supervisor \
    && apt-get clean && rm -rf /var/lib/apt/lists /var/cache/apt/*
But then, if I want to install additional software, I need to fetch the package list anew downstream:

  FROM my-own-repo/ubuntu
  
  RUN apt-get update && apt-get install -y \
    openjdk-17-jdk-headless \
    && apt-get clean && rm -rf /var/lib/apt/lists /var/cache/apt/*
As opposed to being able to just leave the cache files in the previous layers/images, then remove them in a later layer and just do something like:

  docker build -t my_optimized_java_image -f java.Dockerfile --purge-deleted-files .
  
  or maybe
  
  docker build -t my_regular_java_image -f java.Dockerfile .
  purge-deleted-files -t my_regular_java_image -o my_optimized_java_image
Which would then work backwards from the last layer and create copies of all of the layers where files have been removed/masked (in the later layers) to use instead of the originals. Thus if I had 10 different images that need to use apt to install stuff while building them, I could leave the cache in my own Ubuntu image and then just remove it for whatever I want to consider the "final" images that I'll ship, which would then alter the contents of the included layers to purge deleted files.

There's little reason why these optimized layers couldn't be shared across all 10 of those "final" images either: "Hey, there's these optimized Ubuntu image layers without the package caches, so we'll use it for our .NET, Java, Node and other images" as opposed to --squash which would put everything in a single large layer, thus removing the benefits from the shared layers of the base Ubuntu image and so on.

Who knows, maybe someone will write a tool like that some day.


You will be happy to hear that this already exists. Read up on Docker BuildKit and the RUN --mount option.
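
For reference, a minimal sketch of what that might look like for the Java image above, assuming BuildKit is enabled and reusing my-own-repo/ubuntu from the parent comment. It is not the hypothetical --purge-deleted-files flow. Instead, a cache mount keeps the apt cache outside the image layers altogether, so there is nothing to purge afterwards:

```dockerfile
# syntax=docker/dockerfile:1
FROM my-own-repo/ubuntu

# Debian/Ubuntu base images ship a docker-clean hook that deletes downloaded
# packages; disable it so the cache mount actually has something to keep.
RUN rm -f /etc/apt/apt.conf.d/docker-clean; \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache

# The mounted directories live in the builder's cache, not in the image layer,
# so no "apt-get clean && rm -rf ..." step is needed here.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y openjdk-17-jdk-headless
```

The cache is shared between builds on the same builder, so the other images that install packages with apt can reuse it without the package lists ever landing in a shipped layer.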



