Hey HN! I'm Jacob, one of the founders of Depot (https://depot.dev), a build service for Docker images, and I'm excited to show what we’ve been working on for the past few months: running GitHub Actions jobs in AWS, orchestrated by Depot!
Here's a video demo: https://www.youtube.com/watch?v=VX5Z-k1mGc8, and here’s our blog post: https://depot.dev/blog/depot-github-actions-runners.
GitHub Actions is one of the most widely used CI providers, but it is slow for a few reasons: GitHub uses underpowered CPUs, network throughput for cache and the internet at large is capped at 1 Gbps, and total cache storage is limited to 10 GB per repo. It is also rather expensive for runners with more than 2 CPUs, and larger runners frequently take a long time to start running jobs.
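To put the 1 Gbps cap in perspective, here is some back-of-the-envelope arithmetic in Python. It assumes decimal units and ignores protocol overhead, so the results are lower bounds. Restoring a full 10 GB cache takes at least 80 seconds at 1 Gbps, compared with about 6.4 seconds at 12.5 Gbps (the EC2 figure mentioned further down).

```python
def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Lower bound on transfer time: decimal units, no protocol overhead."""
    bits = size_gb * 1e9 * 8          # gigabytes -> bits
    return bits / (link_gbps * 1e9)   # bits / (bits per second)


# Restoring a full 10 GB repo cache over the 1 Gbps cap vs. a 12.5 Gbps link
print(transfer_seconds(10, 1.0))   # 80.0
print(transfer_seconds(10, 12.5))  # 6.4
```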
Depot-managed runners solve this! Rather than your CI jobs running on GitHub's slow compute, Depot routes those same jobs to fast EC2 instances. And not only is this faster, it’s also 1/2 the cost of GitHub Actions!
We do this by launching a dedicated instance for each job, registering that instance as a self-hosted Actions runner in your GitHub organization, then terminating the instance when the job is finished. Using AWS as the compute provider has a few advantages:
- CPUs are typically 30%+ faster than the alternatives (we use the m7a instance type).
- Each instance is hosted in us-east-1 and has high-throughput networking of up to 12.5 Gbps, so interacting with artifacts, cache, container registries, or the internet at large is quick.
- Each instance has a public IPv4 address, so it does not share rate limits with anyone else.
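For readers curious what "registering that instance as a self-hosted Actions runner" can involve, here is a minimal Python sketch of one plausible approach. It is not necessarily how Depot does it. The idea is to render an EC2 user-data script that configures the stock actions/runner with `--ephemeral`, so the runner takes exactly one job and then unregisters, and then powers the machine off. Several details are assumptions:

- the runner directory
- the `job-<id>` naming scheme
- relying on the instance's shutdown behavior being set to terminate

The registration token itself would come from GitHub's `POST /orgs/{org}/actions/runners/registration-token` REST endpoint.

```python
import shlex


def runner_user_data(org_url: str, registration_token: str, job_id: str,
                     labels: list[str],
                     runner_dir: str = "/opt/actions-runner") -> str:
    """Render an EC2 user-data script that turns a fresh instance into a
    single-use (ephemeral) self-hosted GitHub Actions runner.

    Assumptions (hypothetical): the actions/runner release is already unpacked
    in `runner_dir`, and the instance's shutdown behavior is 'terminate'.
    """
    q = shlex.quote  # guard against shell metacharacters in any argument
    return "\n".join([
        "#!/bin/bash",
        "set -euo pipefail",
        # Power off on exit, even if configuration fails, so no instance lingers.
        "trap 'shutdown -h now' EXIT",
        # config.sh refuses to run as root unless this is set; user data runs as root.
        "export RUNNER_ALLOW_RUNASROOT=1",
        f"cd {q(runner_dir)}",
        # --ephemeral: the runner accepts exactly one job, then unregisters itself.
        f"./config.sh --unattended --ephemeral --url {q(org_url)}"
        f" --token {q(registration_token)} --name {q('job-' + job_id)}"
        f" --labels {q(','.join(labels))}",
        # run.sh returns after that single job has finished.
        "./run.sh",
    ])


print(runner_user_data("https://github.com/example-org", "EXAMPLE_TOKEN",
                       "42", ["example-label"]))
```

The `trap ... EXIT` line makes shutdown unconditional: whether the job succeeds, fails, or the runner never registers, the instance does not keep running (and billing) afterwards.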
We integrated the runners with the distributed cache system (backed by S3 and Ceph) that we use for Docker build cache, so jobs automatically save and restore their cache there, at speeds of up to 1 GB/s and without the default 10 GB per-repo limit.
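Depot's cache internals aren't described here, but the Actions cache matching rules are public. A job asks for an exact `key` and falls back to an ordered list of `restore-keys` prefixes, taking the most recently created match. Below is a simplified Python sketch of that lookup, roughly as a compatible cache backend might resolve it. Branch scoping and cache versioning are omitted, and the `CacheEntry` type and its `location` field are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    key: str
    created_at: float  # Unix timestamp of when the entry was saved
    location: str      # hypothetical: where the archive lives in the object store


def resolve_cache(entries: list[CacheEntry], key: str,
                  restore_keys: list[str]) -> Optional[CacheEntry]:
    """Pick the entry to restore: an exact hit on `key`, otherwise the newest
    entry matching the first restore-key prefix that matches anything."""
    for entry in entries:
        if entry.key == key:
            return entry  # exact hit
    for prefix in restore_keys:  # restore keys are tried in the order given
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created_at)  # newest partial match
    return None  # cache miss


entries = [
    CacheEntry("npm-linux-aaa", 100.0, "archives/aaa.tar.zst"),
    CacheEntry("npm-linux-bbb", 200.0, "archives/bbb.tar.zst"),
]
hit = resolve_cache(entries, "npm-linux-ccc", ["npm-linux-", "npm-"])
print(hit.location)  # archives/bbb.tar.zst
```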
Building this was a fun challenge: some matrix workflows start 40+ jobs at once, which requires 40+ EC2 instances to launch at the same time.
We’ve gotten very good at starting EC2 instances quickly with a "warm pool" system. We prepare many EC2 instances to run a job and stop them, then resize and start them when an actual job request arrives, which keeps job queue times around 5 seconds. We use a homegrown orchestration system, as alternatives like Auto Scaling groups or Kubernetes weren't fast or secure enough.
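The warm-pool idea can be sketched in a few lines of Python. This is a hypothetical illustration, not Depot's orchestrator. Instances are assumed to be launched, provisioned, and stopped ahead of time, so serving a job is just a resize followed by a start. (EC2 only lets you change the instance type while an instance is stopped.) The client method names and keyword arguments mirror boto3's EC2 client (`modify_instance_attribute`, `start_instances`), and a fake client stands in so the example runs offline.

```python
from collections import deque
from typing import Protocol


class EC2Like(Protocol):
    # Method names and keyword arguments mirror boto3's EC2 client.
    def modify_instance_attribute(self, **kwargs) -> dict: ...
    def start_instances(self, **kwargs) -> dict: ...


class WarmPool:
    """Hypothetical warm pool of pre-provisioned, *stopped* instances:
    claiming one is a resize plus a start instead of a full cold launch."""

    def __init__(self, ec2: EC2Like, stopped_instance_ids: list[str]):
        self.ec2 = ec2
        self.stopped = deque(stopped_instance_ids)

    def claim(self, instance_type: str) -> str:
        if not self.stopped:
            raise RuntimeError("warm pool is empty; fall back to a cold launch")
        instance_id = self.stopped.popleft()
        # EC2 only allows changing the instance type while the instance is stopped.
        self.ec2.modify_instance_attribute(
            InstanceId=instance_id, InstanceType={"Value": instance_type})
        self.ec2.start_instances(InstanceIds=[instance_id])
        return instance_id


class FakeEC2:
    """Records calls instead of hitting AWS, so the example runs offline."""

    def __init__(self):
        self.calls = []

    def modify_instance_attribute(self, **kwargs) -> dict:
        self.calls.append(("modify_instance_attribute", kwargs))
        return {}

    def start_instances(self, **kwargs) -> dict:
        self.calls.append(("start_instances", kwargs))
        return {}


ec2 = FakeEC2()
pool = WarmPool(ec2, ["i-0aaa", "i-0bbb"])
print(pool.claim("m7a.2xlarge"))  # i-0aaa
```

With a real client you would pass `boto3.client("ec2")` instead of `FakeEC2()`. A production version would also need to:

- wait for the instance to reach `running`
- replenish the pool in the background
- handle start failures, such as insufficient capacity for the requested type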
There are currently three alternatives to our managed runners:
1. GitHub offers larger runners: these have more CPUs, but still have slow network and cache. Depot runners are also 1/2 the cost per minute of GitHub's runners.
2. You can self-host the Actions runner on your own compute: this requires ongoing maintenance, and it can be difficult to ensure that the runner image or container matches GitHub's.
3. There are other companies offering hosted GitHub Actions runners, though they frequently use cheaper compute hosting providers that are bottlenecked on network throughput or geography.
Any feedback is very welcome! You can sign up at https://depot.dev/sign-up for a free trial if you'd like to try it out on your own workflows. We aren't able to offer a trial without a signup gate, both because using it requires installing a GitHub App and because we're offering build compute, so we need some way to keep out the cryptominers :)
Overall it’s a pretty simple Terraform setup plus a couple of Dockerfiles. And we get to run in the same region as the rest of our infra, which is close to most of our devs (us-west-2).
ECS might sound more complicated than “just use EC2”, but we don’t have to screw around with Lambdas, and the Terraform is pretty simple, much simpler than the philips-labs one. It’s about 1400 lines of Terraform across 2 files, since ECS has so much built in and integrates well with Auto Scaling groups.