If anyone has any tips on keeping up with control plane upgrades, please share them. We're having trouble keeping up with EKS upgrades, but I think it's self-inflicted, and we've got a lot of work to do to remove the knives that keep us from moving faster.
Things on my team's todo list (aka: correct the sins that occurred before therealfiona was hired):
- Change manifest files over to Helm. (Managing thousands of lines of YAML sucks, don't do it; use Helm or a similar tool we haven't discovered yet.)
- Set up Renovate to help keep Helm chart versions up to date.
- Continue improving our process because there was none as of 2 years ago.
One technique is to never upgrade clusters. Instead, create a new cluster, apply manifests, then point your DNS or load balancers to the new one.
That technique won't work with every kind of architecture, but it works with those that are designed with the "immutable infrastructure" approach in mind.
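Roughly, the flow looks like this, assuming eksctl and a Route 53 weighted record; the cluster name, region, zone ID, and file names here are all made up:

```
# Stand up a replacement cluster on the target Kubernetes version
eksctl create cluster --name prod-blue --version 1.28 --region us-east-1

# Point kubectl at the new cluster and apply the same manifests
aws eks update-kubeconfig --name prod-blue --region us-east-1
kubectl apply -f manifests/

# Once the new load balancer is healthy, shift traffic over
# (a weighted Route 53 record lets you do a gradual cutover)
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123EXAMPLE \
  --change-batch file://switch-to-blue.json
```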
There's a good comment in this thread about not having your essential services, like Vault, inside of Kubernetes.
Any state that a container uses, such as databases or static assets, should be mapped to something outside k8s, no? I thought container orchestration was only for the app layer.
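Pretty much. One common trick is an ExternalName Service, so pods talk to a stable in-cluster DNS name while the actual database lives outside the cluster; the hostname below is hypothetical:

```
# Create a Service that is just a DNS alias (CNAME) to an external
# database; apps connect to "postgres" and never hardcode the RDS host
kubectl create service externalname postgres \
  --external-name prod-db.abc123.us-east-1.rds.amazonaws.com
```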
I never understood GitOps. You introduce a whole new class of problem: syncing desired state from one place to another.
Kubernetes is a perfectly good place to keep your desired state. I think it would be in most people’s best interests to learn to maintain, failover, and recover Kubernetes clusters, so they can trust the API, rather than trying to use tools to orchestrate their orchestration
How do you deploy workloads to your clusters then? `kubectl apply -f`? Another form of CI/CD?
Assuming you have some sort of build pipeline that also pushes to your cluster, Flux does the same thing whilst ensuring whatever was pushed never diverges.
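For anyone who hasn't used it, the setup is roughly this (org and repo names are made up); after bootstrap, Flux keeps reconciling the cluster against the repo instead of relying on a one-shot push:

```
# Install Flux in the cluster and wire it to a Git repo
flux bootstrap github \
  --owner=my-org \
  --repository=k8s-config \
  --path=clusters/prod

# Anything merged under clusters/prod gets applied automatically,
# and manual drift is reverted on the next reconciliation
flux reconcile kustomization flux-system --with-source
```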
We either install a new version of a Helm chart, or we roll back. We have rollback jobs to roll back, and our CI/CD pipelines or our maintenance jobs do the install of the new version, depending on whether it's our app or a dependency.
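In Helm terms that's roughly the following (release and chart names made up):

```
# Install or upgrade to the new chart version; --atomic rolls the
# release back automatically if it fails to come up healthy
helm upgrade --install myapp myrepo/myapp --version 1.4.0 --atomic

# Manual rollback to the previous revision if a bad release slips through
helm rollback myapp
```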
It's not the EKS upgrade part that's a pain, it's the deprecated K8S resources that you mention. Layers of terraform, fluxcd, helm charts getting sifted through and upgraded before the EKS upgrade. You get all your clusters safely upgraded, and in the blink of an eye you have to do it all over again.
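One thing that takes some of the sting out is scanning for deprecated apiVersions before the control plane upgrade, e.g. with Fairwinds' pluto (the path and target version here are just examples):

```
# Report manifests using apiVersions that are deprecated or removed
# in the Kubernetes version you're about to upgrade to
pluto detect-files -d ./manifests --target-versions k8s=v1.28.0
```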
We address this by not using helm, and not using terraform for anything in the cluster. Kustomize doesn't do everything you'd want from a DRY perspective, but at least the output is pure YAML with no surprises.
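The shape of that setup, for anyone curious (the paths and patch file are hypothetical):

```
# A per-environment overlay patches a shared base; the rendered
# output is plain YAML you can read and diff before applying
cat > overlays/prod/kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: replica-count.yaml
EOF

# Render and review, then apply
kubectl kustomize overlays/prod | kubectl apply -f -
```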
We upgrade everything once a quarter. Usually takes about four hours per cluster. Occasionally we run into something that's deprecated and we lose another day, but not more than once a year.
Such a pity that Helm makes this so awful. I suppose one could keep using it to package up complex deployments and tweak them with a values.yaml, as long as you just use that to write out plain YAML for Kustomize and install that.
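That pattern does work in practice: render once with `helm template`, commit the plain YAML, and let Kustomize handle the tweaks from there (chart name and paths are made up):

```
# Render the chart to static YAML; values.yaml is the only Helm input
helm template myapp myrepo/myapp --values values.yaml > base/rendered.yaml

# From here it's ordinary Kustomize: overlays patch the rendered base,
# and a chart upgrade is just a re-render plus a reviewable diff
kubectl kustomize overlays/prod | kubectl apply -f -
```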
Go get a cluster manager like Rafay or Spectro Cloud. There are a lot of footguns in cluster management: cert management, ingress controllers, IaC (TF versioning is a pain), etc.
A cluster manager isn't cheap, but it sounds like you are getting buried. If you're on 1.23 or up, though, you at least have a year now to fix it.