
If anyone has any tips on keeping up with control plane upgrades, please share them. We're having trouble keeping up with EKS upgrades, but I think it's self-inflicted, and we've got a lot of work to do to remove the knives that keep us from moving faster.

Things on my team's todo list (aka: correct the sins that occurred before therealfiona was hired):

- Change manifest files over to Helm. (Managing thousands of lines of YAML sucks; don't do it. Use Helm or something similar we haven't discovered yet.) See the sketch below.

- Set up Renovate to help keep Helm chart versions up to date.

- Continue improving our process, because there was none as of two years ago.
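A rough sketch of what that Helm consolidation can look like (names, image, and values here are hypothetical): one templated Deployment plus a per-environment values.yaml instead of near-duplicate manifests.

    # values.yaml (hypothetical app)
    replicaCount: 3
    image:
      repository: registry.example.com/web
      tag: "1.42.0"

    # templates/deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: {{ .Release.Name }}-web
    spec:
      replicas: {{ .Values.replicaCount }}
      selector:
        matchLabels:
          app: {{ .Release.Name }}-web
      template:
        metadata:
          labels:
            app: {{ .Release.Name }}-web
        spec:
          containers:
            - name: web
              image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
              ports:
                - containerPort: 8080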




This is definitely a hard problem.

One technique is to never upgrade clusters. Instead, create a new cluster, apply manifests, then point your DNS or load balancers to the new one.

That technique won't work with every kind of architecture, but it works with those that are designed with the "immutable infrastructure" approach in mind.

There's a good comment in this thread about not having essential services like Vault inside of Kubernetes.
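One hedged sketch of the cutover piece, assuming external-dns with the AWS provider and Route 53 weighted records (hostname, weights, and identifiers are hypothetical): the same Service exists in both clusters with different set-identifiers, and you shift the weight toward the new cluster before retiring the old one.

    apiVersion: v1
    kind: Service
    metadata:
      name: web
      annotations:
        external-dns.alpha.kubernetes.io/hostname: app.example.com
        external-dns.alpha.kubernetes.io/set-identifier: cluster-green   # old cluster uses e.g. cluster-blue
        external-dns.alpha.kubernetes.io/aws-weight: "10"                # ramp toward 100, then retire the old record
    spec:
      type: LoadBalancer
      selector:
        app: web
      ports:
        - port: 443
          targetPort: 8443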


This indeed seems like The Way but I have no idea how it works when storage is involved. How do Rook or any other storage providers deal with this?

If Kubernetes is only for stateless services, well, that's much less useful for the org to invest in.


Any state that a container uses, such as databases or static assets, should be mapped to something outside k8s, no? I thought container orchestration was only for the app layer.


In the early days that was true. K8s has had many options for stateful containers for a while though.

https://kubernetes.io/docs/concepts/storage/
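For example, the usual pattern from those docs is a StatefulSet with a volumeClaimTemplate, so each replica gets its own PersistentVolume (names, image, and storage class here are hypothetical):

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: db
    spec:
      serviceName: db
      replicas: 3
      selector:
        matchLabels:
          app: db
      template:
        metadata:
          labels:
            app: db
        spec:
          containers:
            - name: db
              image: postgres:16                 # example image
              volumeMounts:
                - name: data
                  mountPath: /var/lib/postgresql/data
      volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: gp3                # assumes an EBS-backed storage class
            resources:
              requests:
                storage: 20Gi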


We are talking in a context where you would spin up a new cluster whenever you want to upgrade the Kubernetes version.

In that case you don't want to migrate application/user data, so you are kind of forced to keep DBs and filesystems outside.
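A minimal sketch of that separation (the endpoint is hypothetical): apps resolve a stable in-cluster name, the data lives in a managed database outside the cluster, and the replacement cluster just gets the same Service applied to it.

    apiVersion: v1
    kind: Service
    metadata:
      name: orders-db
    spec:
      type: ExternalName
      externalName: orders.abc123.us-east-1.rds.amazonaws.com   # managed DB outside the cluster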


Adopt a gitops tool (Argo or Flux):

- create a dev cluster

- create a test cluster

- manage dev & test clusters with gitops tool

- copy payloads and objects from your prod cluster into the gitops tool

- compare test to prod. try new things out in dev. tear down dev cluster at will

- create a new prod cluster based off of test cluster

- migrate things over slowly
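A minimal sketch of the gitops piece, assuming Argo CD (repo URL, path, and names are hypothetical): one Application per cluster, all pointed at the same repo, so dev, test, and the replacement prod cluster can be compared and rebuilt from git.

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: platform-test
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/example/platform-manifests   # hypothetical repo
        targetRevision: main
        path: clusters/test
      destination:
        server: https://kubernetes.default.svc
        namespace: default
      syncPolicy:
        automated:
          prune: true
          selfHeal: true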


I never understood gitops. You introduce a whole new class of problem: syncing desired state from one place to another.

Kubernetes is a perfectly good place to keep your desired state. I think it would be in most people’s best interests to learn to maintain, failover, and recover Kubernetes clusters, so they can trust the API, rather than trying to use tools to orchestrate their orchestration


How do you deploy workloads to your clusters then? `kubectl apply -f`? Another form of CI/CD?

Assuming you have some sort of build pipeline that also pushes to your cluster, Flux does the same thing whilst ensuring whatever was pushed never diverges.
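For example, a minimal Flux setup looks roughly like this (repo URL and path are hypothetical); Flux keeps reconciling the cluster back to whatever is in git on an interval, which is the "never diverges" part:

    apiVersion: source.toolkit.fluxcd.io/v1
    kind: GitRepository
    metadata:
      name: platform
      namespace: flux-system
    spec:
      interval: 1m
      url: https://github.com/example/platform-manifests   # hypothetical repo
      ref:
        branch: main
    ---
    apiVersion: kustomize.toolkit.fluxcd.io/v1
    kind: Kustomization
    metadata:
      name: apps
      namespace: flux-system
    spec:
      interval: 10m
      sourceRef:
        kind: GitRepository
        name: platform
      path: ./clusters/prod
      prune: true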


Really? So how would you version control changes to anything?

> learn to maintain, failover, and recover Kubernetes clusters

Kubernetes administration is a completely different topic from gitops; the two complement each other - they don't replace each other.


We use Helm, which gives us all of the version control we've ever needed.


So how do you manage the versions of helm charts you have installed? Or are you manually running helm install from a local machine?


We either install a new version of a helm chart, or we roll back. We have rollback jobs to roll back, and our CI/CD pipelines or our maintenance jobs do the install of the new version, depending on whether it's our app or a dependency.
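A minimal sketch of that kind of pipeline job, assuming GitHub Actions with helm and cluster credentials already available on the runner (chart name, repo, and version are placeholders):

    name: upgrade-dependency
    on:
      workflow_dispatch:          # triggered as a maintenance job
    jobs:
      upgrade:
        runs-on: ubuntu-latest
        steps:
          - name: Upgrade release to a pinned chart version
            run: |
              helm repo add example https://charts.example.com   # hypothetical chart repo
              helm upgrade --install my-release example/my-chart \
                --version 1.2.3 \
                --namespace my-app --create-namespace \
                --atomic --timeout 10m
          # rollback is the same idea in reverse, e.g.:
          #   helm rollback my-release <revision> --namespace my-app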


IME, EKS version upgrades are pretty painless - AWS even has a tool that tells you if any of your resources would be affected by an upcoming change.


It's not the EKS upgrade part that's a pain, it's the deprecated K8s resources that you mention. Layers of Terraform, fluxcd, and Helm charts get sifted through and upgraded before the EKS upgrade. You get all your clusters safely upgraded, and in the blink of an eye you have to do it all over again.


We address this by not using helm, and not using terraform for anything in the cluster. Kustomize doesn't do everything you'd want from a DRY perspective, but at least the output is pure YAML with no surprises.

We upgrade everything once a quarter. Usually takes about four hours per cluster. Occasionally we run into something that's deprecated and we lose another day, but not more than once a year.
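A rough sketch of the layout (names and paths are hypothetical): a base of plain manifests plus a small per-cluster overlay, with `kustomize build` producing exactly the YAML that gets applied.

    # overlays/prod/kustomization.yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - ../../base               # plain Deployment/Service/etc. manifests
    patches:
      - path: replicas.yaml      # small strategic-merge patch just for prod
        target:
          kind: Deployment
          name: web
    images:
      - name: registry.example.com/web
        newTag: "1.42.0"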


Such a pity that Helm makes this so awful. I suppose one could keep using it to package up complex deployments and tweak them with a values.yaml, as long as you just use that to write out plain YAML for Kustomize and install that.
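One way to do roughly that is kustomize's built-in chart inflation: it renders the chart with your values, and the output can then be patched like any other kustomize resource. This assumes the standalone kustomize CLI and `kustomize build --enable-helm`; the chart and version below are placeholders.

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    helmCharts:
      - name: ingress-nginx
        repo: https://kubernetes.github.io/ingress-nginx
        version: 4.10.0          # placeholder version
        releaseName: ingress-nginx
        namespace: ingress-nginx
        valuesInline:
          controller:
            replicaCount: 2
    # the rendered manifests can then be patched/transformed like any other kustomize resources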


Go get a cluster manager like Rafay or Spectro Cloud. There are a lot of footguns in cluster management: cert management, ingress controllers, IaC (TF versioning is a pain), etc.

A cluster manager isn't cheap, but it sounds like you are getting buried. If you're on 1.23 or up, though, you at least have a year now to fix it.



