Yep, true; baking caches into build images is generally a bit fragile though: the more caches and different types of jobs you have, the larger the set of image combinations you need to cover them. It often ends up easier to decouple the build image from the caches and, ideally, calculate a cache key based on a yarn.lock or similar (not supported by GitLab).
Also, image pull performance is generally pretty poor from most registries; you will get much higher speeds out of a straight S3 download than pulling an image via Docker or containerd, even though ECR, for example, is backed by S3.
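For context, the decoupled approach can look roughly like this: a job that keys an archive of node_modules on a hash of yarn.lock and talks to S3 directly. This is only a sketch, not anything GitLab ships; the bucket name is made up, and it assumes the job image has awscli, zstd and S3 credentials available.

```yaml
# Hand-rolled S3 cache keyed on the lockfile; bucket/paths are placeholders.
build:
  image: node:20
  script:
    # derive a cache key from the lockfile contents
    - CACHE_KEY=$(sha256sum yarn.lock | cut -c1-16)
    # try to restore a matching archive; ignore a cache miss
    - aws s3 cp "s3://my-ci-cache/node_modules-${CACHE_KEY}.tar.zst" - | tar --zstd -xf - || true
    - yarn install --frozen-lockfile
    - yarn build
    # upload the archive only if this key doesn't exist yet
    - |
      if ! aws s3 ls "s3://my-ci-cache/node_modules-${CACHE_KEY}.tar.zst" > /dev/null 2>&1; then
        tar --zstd -cf - node_modules | aws s3 cp - "s3://my-ci-cache/node_modules-${CACHE_KEY}.tar.zst"
      fi
```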
I haven't actually used ootb GitLab caching in quite a long time because of these limitations, but it would be nice for it to work great ootb!
In some projects, you can just do something like the following (a Java-centric example, but it applies to other stacks as well):
- have Nexus (or another solution) running with a Docker registry next to your GitLab Runner nodes, on an internal network
- set up Nexus to proxy all of the Maven/NuGet/whatever packages as well, in addition to the Docker images in the registry
- set up GitLab CI to build the base container images that you need on a weekly schedule (e.g. Maven with whatever you need on top of it for builds, another base JDK image for running, etc.); this includes not just the images referenced in your Dockerfiles, but also those that you'll use for Runners (assuming the Docker executor); see the sketch after this list
- set up GitLab CI to build an intermediate base image for your project with all of the dependencies every morning (assuming that your project has lots of dependencies that don't change often)
- base your actual builds on that image, which is in turn based on the Maven image; this essentially cuts out the need to pull everything from the Nexus Maven repo on each build
- this still allows you to add new dependencies during the day (e.g. a new library), which will be pulled from the Internet and then cached both in the Docker layer cache for your current pom.xml file and in your Nexus proxy repo
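A rough sketch of what those scheduled image builds could look like in .gitlab-ci.yml; the registry host, Dockerfile paths and the SCHEDULE_KIND variable are all made up for illustration, and it assumes a runner that can run `docker build` (shell executor or Docker-in-Docker):

```yaml
stages:
  - images

build-base-maven:
  stage: images
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule" && $SCHEDULE_KIND == "weekly"'
  script:
    - docker build -f docker/base-maven.Dockerfile -t nexus.internal:5000/ci/base-maven:latest .
    - docker push nexus.internal:5000/ci/base-maven:latest

build-project-deps:
  stage: images
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule" && $SCHEDULE_KIND == "daily"'
  script:
    # this Dockerfile starts FROM the base Maven image, COPYs the pom.xml files
    # and runs something like `mvn dependency:go-offline`, so the resolved
    # dependencies end up baked into a layer
    - docker build -f docker/project-deps.Dockerfile -t nexus.internal:5000/ci/project-deps:latest .
    - docker push nexus.internal:5000/ci/project-deps:latest
```

The actual build jobs then just declare `image: nexus.internal:5000/ci/project-deps:latest` (or whatever tag you use), so they start with both the toolchain and the dependency cache already in place.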
You don't even need the intermediate dependency images, as long as you have control over which Runners your project uses; with dedicated ones, the Docker cache should be sufficient on its own. These days I no longer have to fear Docker Hub rate limits, and most of my builds don't hit the Internet at all.
I applied some of those principles at a large enterprise project and the builds went from about 7 minutes to 3, with no drawbacks to speak of. Of course, the next step would be incremental code compilation and not wasting time there; however, seeing as even some of the above is overkill for many scenarios, that's probably situational.
Ah OK, that sounds like a good setup. I could never get past the slow image pull times for systems like that, though; maybe I missed a setting. containerd especially was quite slow at pulling images: an image of ~2 GB could take a couple of minutes to come down, an order of magnitude more than a straight S3 download. Once you've got your images it's fine, but if your build nodes are elastic you can easily hit cold/new nodes and pay the penalty.
> Once you've got your images it's fine, but if your build nodes are elastic you can easily hit cold/new nodes and pay the penalty.
If you have your own Nexus or Artifactory, or even just a registry that's configured as a pull-through cache (https://docs.docker.com/registry/recipes/mirror/), then it shouldn't be a problem.
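For reference, a minimal pull-through cache along the lines of that recipe can be as small as this docker-compose sketch; hostnames, ports and paths are placeholders:

```yaml
version: "3"
services:
  registry-mirror:
    image: registry:2
    restart: always
    ports:
      - "5000:5000"
    environment:
      # proxy (and cache) images requested from Docker Hub
      REGISTRY_PROXY_REMOTEURL: "https://registry-1.docker.io"
    volumes:
      - ./registry-data:/var/lib/registry
```

Then the Docker daemon on each runner host gets pointed at it via the registry-mirrors option in daemon.json, and plain `docker pull` requests for Docker Hub images go through the local cache.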
If those two servers are on an internal network, even better; otherwise just get a VPS with a good port speed. Honestly, even 100 Mbps moves 2 GB (roughly 16 Gbit) in under 3 minutes, and with 1 Gbps that drops to around 16 seconds: https://techinternets.com/copy_calc
The difference here would be that you're using your own software on your own servers and therefore can deal with the load that you generate yourself with the full allotment of resources that you have, vs having to rely on public registries that are used by tens of thousands of other developers/processes at the same time.
That's not that different from S3 either (apart from maybe more capacity being available to you on private buckets, depending on the vendor), since you can also use something like MinIO or Zenko on your own servers instead of relying on the cloud.
Depending on the languages and environments involved, builder images might be needed to reduce the complexity of installing build tools. For example, C++ with GCC, where you want to control the exact version being used (major versions may introduce ABI-breaking changes; I've seen it with armhf and gcc6 a while ago). Builder images also reduce the possibility for users to make mistakes in CI/CD before_script/script sections. An optimized pipeline may just include a remote CI/CD template, with the magic happening in centrally maintained projects that use the builder images.
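To illustrate, such a pipeline can boil down to little more than an include of a shared template; the project path, ref, file, job names and variables below are hypothetical:

```yaml
# .gitlab-ci.yml in the application repo; everything else lives in the
# centrally maintained template project (all names are made up)
include:
  - project: platform/ci-templates
    ref: v1.4.0
    file: /templates/cpp-build.yml

build:
  extends: .cpp-build           # hidden job defined in the included template
  variables:
    BUILDER_IMAGE_TAG: gcc-10   # pin the builder image / GCC major version
```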
I believe you can specify a Docker image to be used for running a job, so you can bake things like node_modules into the image.
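Right, each job can declare the image it runs in; a minimal sketch, with the registry path made up and the dependency image assumed to be prebuilt elsewhere:

```yaml
# the image already contains node_modules, so the job skips the install step
test:
  image: registry.example.com/frontend/deps:latest
  script:
    - yarn test
```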