> "Compare that with AWS EKS pricing where the Kubernetes cluster alone is going to cost you $144 a month. A cluster isn’t terribly useful without something running on it, but just wanted to highlight that Fargate doesn’t come with a cost overhead."
That $144 a month is a deal considering it's just the price of a few instances you'd normally provision anyway if you were rolling your own k8s cluster, and it takes control plane administration off your hands. The moment your hand-rolled k8s cluster goes the way of so many horror stories [1], that savings evaporates into hours of expensive engineering labor.
Service IP configurability is a very common ask, and as you’ve linked, is on our roadmap along with a slew of other control plane configuration options.
You can delete the AWS VPC CNI DaemonSet and install any CNI plugin you’d like.
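For concreteness, a minimal sketch of that swap, assuming kubectl is already pointed at the EKS cluster (the replacement manifest filename is a placeholder, not an official URL):

    import subprocess

    # Remove the DaemonSet that implements the AWS VPC CNI.
    subprocess.run(
        ["kubectl", "delete", "daemonset", "aws-node", "-n", "kube-system"],
        check=True,
    )

    # Install the replacement CNI from its manifest (placeholder filename).
    subprocess.run(["kubectl", "apply", "-f", "calico.yaml"], check=True)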
EKS regularly backs up etcd and has automatic restore in the case of a failure. Manually restoring to an old snapshot would be quite disruptive. What is your use case, and what would be the interface you’d like to see?
Indeed, the networking layer that EKS forces upon you is quite strange. For example, you can only run 17 pods on a t3.medium instance, because pod IPs come straight out of your VPC via the node's ENIs (and because various mandatory DaemonSets, like aws-node, take up pod slots on every machine). Most people will likely discover this very low limit not in normal operations, but when one AZ blows up and pods are rescheduled onto the other AZs: you will hit the pod limit, things will be down, and nothing new will schedule. And Amazon provides no meaningful monitoring for health at the control plane level; you need to bring all of that yourself.
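For anyone wondering where 17 comes from: each pod gets a secondary IP on one of the node's ENIs, so capacity is capped by the instance type's ENI and per-ENI IP limits. A quick sketch of the published formula, using the t3.medium's limits:

    def max_pods(enis: int, ips_per_eni: int) -> int:
        # Each ENI's primary IP is reserved for the node itself (hence -1);
        # +2 covers the host-networked aws-node and kube-proxy pods.
        return enis * (ips_per_eni - 1) + 2

    # t3.medium supports 3 ENIs with 6 IPv4 addresses each.
    print(max_pods(3, 6))  # -> 17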
Managing worker nodes is also a huge pain. They recently released a tool that manages some of it, but it doesn't work with older clusters. To do something simple like patch a vulnerability in the Linux kernel, you have to create a new CloudFormation stack (which involves cutting and pasting a bunch of random values: the template ID, the AMI ID, etc.), edit several security groups, start the new nodes, make sure they join, drain the old nodes, delete the old stack, and edit the security groups again. Upgrading the Kubernetes version of the cluster is a similar story, especially if you skipped an intermediate version. (That said, incidental changes like adding more nodes are easy; they're just a lot rarer than upgrading Linux or the k8s version.)
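The drain step alone looks something like this (node names are hypothetical, and this assumes the replacement nodes have already joined):

    import subprocess

    old_nodes = [
        "ip-10-0-1-23.ec2.internal",  # hypothetical old workers
        "ip-10-0-2-47.ec2.internal",
    ]

    for node in old_nodes:
        # Stop new pods from landing on the old node...
        subprocess.run(["kubectl", "cordon", node], check=True)
        # ...then evict its workloads so they reschedule onto the new nodes.
        subprocess.run(
            ["kubectl", "drain", node, "--ignore-daemonsets"], check=True
        )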
Their load balancers other than Classic don't work very well either. I would prefer to use an NLB for all incoming traffic (terminated by Envoy in the cluster, which does the complicated routing), but that apparently results in kube-proxy not working right... and every time the set of pods matching the NLB service's selector changes, all the IP addresses in DNS change and traffic simply stops arriving until the DNS TTL expires. It is very broken, and the docs should say "never even think about using this" instead of "it's beta and here are some weird caveats that probably won't dissuade you from using it".
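For reference, the setup I'm describing is just a LoadBalancer Service annotated for NLB in front of an Envoy deployment, roughly like this sketch (names, ports, and labels are hypothetical):

    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    # A LoadBalancer Service that asks the in-tree AWS provider for an NLB,
    # fronting an Envoy deployment that does the real routing.
    svc = client.V1Service(
        api_version="v1",
        kind="Service",
        metadata=client.V1ObjectMeta(
            name="envoy-ingress",  # hypothetical name
            annotations={
                "service.beta.kubernetes.io/aws-load-balancer-type": "nlb",
            },
        ),
        spec=client.V1ServiceSpec(
            type="LoadBalancer",
            selector={"app": "envoy"},  # hypothetical label
            ports=[client.V1ServicePort(port=443, target_port=8443)],
        ),
    )
    v1.create_namespaced_service(namespace="default", body=svc)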
Anyway, my overall impression of EKS is that Jeff Bezos walked into someone's office, said "you're all fired if we don't have some half-ass Kubernetes service in the next 3 months" and walked out. The result is EKS. It's wayyyyyy better than being locked into Amazon's crazy stuff (CloudFormation, etc.) but it's not as good as what other managed k8s providers offer.
If I were to set up k8s again, I would just buy my own servers and self-host everything. Dealing with Things That Can Go Wrong With Servers is much easier than dealing with Things That Can Go Wrong With AWS. (But I already have a datacenter, can get the machines a 100Gbps connection to the Internet for free, and have a team of network engineers to tell me which CNI to use. If I didn't have all that, I would just use GCP.)
In theory, you could replace the CNI on worker nodes, but is that practically useful (given it can't be done on the master nodes in EKS), and is it supported? How would the kube-apiserver, for example, communicate with the metrics-server if it is not connected to the Calico network?
You are correct that the API server is only aware of the VPC network, not any overlays. One solution for metrics-server or other webhook targets is to run them in host-networking mode so the API server has connectivity to them.
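A minimal sketch of that workaround, assuming metrics-server is installed as a Deployment in kube-system:

    from kubernetes import client, config

    config.load_kube_config()
    apps = client.AppsV1Api()

    # Strategic-merge patch putting metrics-server on the host network so
    # the API server (which only knows the VPC) can reach it.
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "hostNetwork": True,
                    # Keep cluster DNS working for host-networked pods.
                    "dnsPolicy": "ClusterFirstWithHostNet",
                }
            }
        }
    }
    apps.patch_namespaced_deployment(
        name="metrics-server", namespace="kube-system", body=patch
    )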
Absolutely, but it's worth mentioning that it's not zero-cost for those providers. (I have no idea what their actual cost is, and I assume they take steps to minimize it...)
EKS is priced competitively with Amazon's other offerings. I was part of the chorus of voices saying "wtf Amazon, control planes ought not to cost anything, nobody else is charging for them", but I think this is fairly priced for what it is... it only takes four reserved m4.large instances to eclipse the cost of an EKS cluster, and you will likely want more than that if you are aiming for real High Availability and trying to build it on your own EC2 nodes.
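Back-of-the-envelope, with assumed prices ($0.20/hr for the EKS control plane and roughly $0.055/hr effective for a reserved m4.large; both figures are approximations for illustration):

    EKS_HOURLY = 0.20          # assumed EKS control-plane price
    M4_LARGE_RESERVED = 0.055  # assumed effective 1-yr reserved rate
    HOURS_PER_MONTH = 720

    eks_monthly = EKS_HOURLY * HOURS_PER_MONTH                    # ~$144
    four_nodes_monthly = 4 * M4_LARGE_RESERVED * HOURS_PER_MONTH  # ~$158

    print(round(eks_monthly), round(four_nodes_monthly))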
The default configuration of EKS is fully HA, and AIUI it is built to be resilient to faults like the failure of an entire Availability Zone.
If you don't want to pay for the Kubernetes API, the interoperability and access to outside support it gives you, and the complexity required to support it and keep pace with its release cadence, then there is also Fargate, which is cheaper at low scale. (Or, my preference: go with the competition, who have all agreed to undercut AWS; it's clearly favorable to run at least some of those workloads elsewhere, so why not take advantage?)
And hey, there's even Virtual Kubelet on Fargate if you want the best of both worlds, right? Of course, that one comes with a big warning: DO NOT run any Production...
[1] https://k8s.af