It's been a while since I read such a long and detailed blog post (a 6 month effort according to OP).
> With the default limits, Lambda scales to 3 000 containers in a couple of seconds. That means that with default limits, we get 30 TB of memory and 18 000 vCPUs in a couple of seconds.
> With a legitimate workload and increased limits, as we just saw, we are now living in a world where we can instantly get 150 TB of RAM and 90 000 vCPUs for our apps
Going from 0 to 150TB instantly is powerful, but I don't think there is anybody who needs that.
We're nowhere near these volumes, but at work we recently saw traffic spike roughly 35-fold within 30 seconds, hold for a few minutes, then fade over a few hours, and this happened regularly. Our scaling was too slow to be of any use for such a pattern. There are legitimate needs for things that scale as fast as possible.
E-commerce flash sales, DDoS/spam prevention (where you're looking at the content, not just blocking traffic), data analytics. And not only for the e-commerce site itself but for downstream systems like payment processors.
There's downstream infrastructure, too. If your website scales, you now have a bunch more log and metric data--what do you do with that?
A lot of the time data analytics "can be slow" in theory, but in practice a human is sitting around waiting for the process to complete. A boost in speed usually means a boost in productivity.
There's also IoT. I think around 2012 GM was dumping terabytes of engine sensor data per airplane flight. Being able to process that quickly could mean being able to adjust engine fuel efficiency between flights, for instance.
I think telecom/voice/audio processing have pretty tight latencies. Even "second" latencies are probably too slow for those types of systems.
In addition, reducing scaling time potentially means reducing scaling cost, since otherwise you're paying for compute you aren't necessarily using (depending on the billing model).
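To make that concrete, here's a rough back-of-envelope sketch of the trade-off (the workload, spike pattern, and instance price are made up for illustration): either you keep capacity provisioned for the peak all month because scaling can't keep up, or you scale fast enough to only pay for the peak while the spike lasts.

    # Hypothetical spiky workload: baseline of 10 instances, 35x spikes
    # lasting ~10 minutes, 4 times a day. All numbers are illustrative only.
    HOURS_PER_MONTH = 730
    INSTANCE_HOURLY = 0.192                     # e.g. an m6i.xlarge on-demand rate

    baseline = 10
    peak = baseline * 35
    spike_hours = (10 / 60) * 4 * 30            # ~20 hours of peak per month

    # Option 1: scaling is too slow, so you stay provisioned for the peak 24/7.
    always_peak = peak * HOURS_PER_MONTH * INSTANCE_HOURLY

    # Option 2: scaling keeps up, so you only pay for the extra capacity during spikes.
    fast_scaling = (baseline * HOURS_PER_MONTH
                    + (peak - baseline) * spike_hours) * INSTANCE_HOURLY

    print(f"provisioned for peak 24/7: ${always_peak:,.0f}/month")
    print(f"scaling within seconds:    ${fast_scaling:,.0f}/month")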
We have a compute job scheduling system that often sees swings like that (bigger, actually!), where it goes from idle to having a ton of newly queued jobs ready to run. But we use EC2 spot instances because they're much cheaper and the resource limits are much higher. It wouldn't matter if Fargate scaled out arbitrarily high, arbitrarily fast -- that's not really the problem with Fargate for our type of usage.
We do orchestrate the EC2 spot instances ourselves, because ASGs and Spot Fleets are way too slow. Spot Fleet especially is like calling your grandma to have her click around in the AWS Console every time you want to scale. AWS is way too "polite" to their own infrastructure but when you do it yourself, you can slam right up against your rate limits.
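For what it's worth, here's a minimal sketch of the kind of direct call I mean (not our actual orchestrator; the AMI, subnet, and counts are placeholders): request spot capacity straight through the EC2 RunInstances API with boto3 instead of going through an ASG or Spot Fleet.

    # Minimal sketch: ask EC2 for spot capacity directly via RunInstances,
    # skipping ASGs and Spot Fleet. AMI, subnet and counts are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",       # placeholder AMI
        InstanceType="m6i.xlarge",
        MinCount=1,
        MaxCount=50,                           # take whatever capacity is available right now
        SubnetId="subnet-0123456789abcdef0",   # placeholder subnet
        InstanceMarketOptions={
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        },
    )
    launched = [i["InstanceId"] for i in resp["Instances"]]
    print(f"launched {len(launched)} spot instances")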
I can think of an example; the gambling industry. When a large event (such as a big football match, the Grand National, etc) finishes, all the bets will need settling. It's difficult to do big swathes at once, since different accounts might qualify (or not) for promotions or similar.
Any other large-scale individual action driven by a particular event happening could also fall under this description; concert tickets going on sale for example.
Of course many workloads might not need to go from zero to 3,000 containers that quickly - but it's useful to know that the underlying infrastructure can do this so that you know it ought to handle far smaller demand spikes without much concern.
Using Fargate for on-demand provisioning remains slow for us (> 60sec).
Without Docker image caching support (1) and working within a VPC (2), it easily takes 2+ minutes.
This blog post doesn't account for an extremely important detail.
Right off the bat it mentions "Fargate is now faster than EC2" but this is not the case for practical usage on pre-sized clusters, at least not on EKS.
As of today you can't avoid a ~30 second hard delay when starting a container on EKS Fargate. That's the time it takes for Amazon to register that you want to run a workload and provision the Fargate-backed compute into your cluster so it's available to run it.
Then on top of that, you will wait however long it takes to download your image from a registry because images aren't cached anywhere.
In practice this usually means 40-60 seconds just to start a container on EKS Fargate. If you run jobs before your main app starts, such as a database migration, you pay this penalty twice (even if the migration is a no-op, since it spawns a pod to run the command). So it's really a ~90-120 second delay every single time you deploy your application, and you wait for it for each replica too.
Compare that to an EC2-based cluster where the resources are provisioned ahead of time. Your containers start as fast as your process can start: there's zero waiting for compute resources to become available, and since it's just an EC2 node, images are cached on disk so you only download the layers that changed (often very small for app code changes).
In practice the difference is night and day. I'm so happy we moved away from Fargate to managed EC2 nodes on EKS. Things are SO much faster, everything from developing and testing on EKS to every single release of all of our services.
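If you want to measure the gap yourself, here's a rough sketch using the official kubernetes Python client (namespace and image are placeholders): it times how long a pod takes to go from created to Running. Run it against a Fargate-profiled namespace and against an EC2-backed one and compare.

    # Rough sketch: measure time from pod creation to the Running phase.
    # Point it at a Fargate-profiled namespace vs. an EC2-backed one.
    import time
    from kubernetes import client, config, watch

    config.load_kube_config()
    v1 = client.CoreV1Api()

    name, namespace = "startup-probe", "default"   # placeholders
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[client.V1Container(
                name="probe",
                image="public.ecr.aws/docker/library/busybox:latest",
                command=["sleep", "30"],
            )],
        ),
    )

    start = time.time()
    v1.create_namespaced_pod(namespace, pod)

    for event in watch.Watch().stream(
        v1.list_namespaced_pod,
        namespace=namespace,
        field_selector=f"metadata.name={name}",
        timeout_seconds=600,
    ):
        if event["object"].status.phase == "Running":
            print(f"pod Running after {time.time() - start:.1f}s")
            break

    v1.delete_namespaced_pod(name, namespace)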
> That means keeping unused or partially used EC2s up which means $$$.
It does, but a pre-sized cluster that's overprovisioned can still end up cheaper than Fargate because Fargate is really expensive. And that's before even accounting for spot instances on EC2.
Of course your mileage may vary, but as long as you're not dealing with massively irregular spikes that spin hundreds or thousands of containers up and down based on load, you can very easily run a few [dozen] replicas of your apps on a pre-sized cluster with enough headroom to let Kubernetes horizontally scale your pods as needed.
For scaling the cluster itself there's Karpenter, which is mentioned in the post. It can give you new resources quite quickly, in a Fargate-like manner, without using Fargate. The benefit of Karpenter is that it's faster than auto-scaling groups and it can spin up instances sized specifically for your workload.
Let's say you plan to run 5 web apps with 3 replicas each, and they all need 4 GB of memory.
Fargate demands you reserve 2 vCPUs for 4 GB even if your app only needs 20% of 1 CPU, so you're wasting 1.8 vCPUs per replica of each app. Oftentimes web apps use a lot more memory than CPU; even with Ruby or Python your web app might be humming along at 5% CPU load while using potentially gigs of memory.
That leaves you paying:
5 apps * 3 replicas * 4 GB = 60 GB of memory = $195.00
5 apps * 3 replicas * 2 vCPUs = 30 vCPUs = $886.50
Total = $1,081.50
You're wasting a ton of CPU here because of AWS' rules on EKS Fargate memory / cpu size combos.
To run your workload on EC2 you can grab four m6i.xlarge instances, which gives you 16 vCPUs and 64 GB of memory of total capacity, for $560.64 (grand total for all 4).
In this case Fargate is double the cost, and to me that is a lot. You do have less CPU capacity on the EC2 setup, but it doesn't matter; your apps aren't coming close to that in CPU usage.
If your workload supported it (stateless web apps that can finish a request within the 2-minute interruption notice), you could use EC2 spot instances too, which would be $122.35 total instead of $560.64.
Both the regular and spot instance prices were taken from AWS' official pricing pages; I just multiplied their hourly rates by 730, which is the number of hours AWS uses to calculate one month.
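For anyone who wants to reproduce the arithmetic, here's the same calculation spelled out (the hourly rates are the us-east-1 prices I used at the time; treat them as a snapshot and check the current pricing pages):

    # Reproduce the cost comparison above. Rates are a us-east-1 snapshot;
    # 730 is the number of hours AWS uses for one month.
    HOURS = 730

    # Workload: 5 apps x 3 replicas, each sized 2 vCPU / 4 GB on Fargate.
    replicas = 5 * 3
    fargate_vcpu_hr, fargate_gb_hr = 0.04048, 0.004445
    fargate = replicas * (2 * fargate_vcpu_hr + 4 * fargate_gb_hr) * HOURS

    # Same workload on 4x m6i.xlarge (4 vCPUs / 16 GB each).
    m6i_xlarge_hr, m6i_xlarge_spot_hr = 0.192, 0.0419   # spot price fluctuates
    ec2_on_demand = 4 * m6i_xlarge_hr * HOURS
    ec2_spot = 4 * m6i_xlarge_spot_hr * HOURS

    print(f"Fargate:       ${fargate:,.2f}/month")        # ~$1,081
    print(f"EC2 on-demand: ${ec2_on_demand:,.2f}/month")  # ~$561
    print(f"EC2 spot:      ${ec2_spot:,.2f}/month")       # ~$122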