If anyone has any tips on keeping up with control plane upgrades, please share them. We're having trouble keeping up with EKS upgrades, but I think it's self-inflicted, and we've got a lot of work to do to remove the knives that keep us from moving faster.
Things on my team's todo list (aka: correct the sins that occurred before therealfiona was hired):
- Change manifest files over to Helm. (Managing thousands of lines of YAML sucks, don't do it; use Helm or a similar tool we haven't discovered yet.)
- Set up Renovate to help keep Helm chart versions up to date.
- Continue improving our process because there was none as of 2 years ago.
One technique is to never upgrade clusters. Instead, create a new cluster, apply manifests, then point your DNS or load balancers to the new one.
That technique won't work with every kind of architecture, but it works with those that are designed with the "immutable infrastructure" approach in mind.
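Roughly, the flow looks like this, assuming eksctl and a Route 53 weighted record; the cluster name, region, zone ID, and file names here are all made up:

```
# Stand up a replacement cluster on the target Kubernetes version
eksctl create cluster --name prod-blue --version 1.28 --region us-east-1

# Point kubectl at the new cluster and apply the same manifests
aws eks update-kubeconfig --name prod-blue --region us-east-1
kubectl apply -f manifests/

# Once the new load balancer is healthy, shift traffic over
# (a weighted Route 53 record lets you do a gradual cutover)
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123EXAMPLE \
  --change-batch file://switch-to-blue.json
```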
There's a good comment in this thread about not having your essential services, like Vault, inside of Kubernetes.
Any state that a container uses, such as databases or static assets, should be mapped to something outside k8s, no? I thought container orchestration was only for the app layer.
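Pretty much. One common trick is an ExternalName Service, so pods talk to a stable in-cluster DNS name while the actual database lives outside the cluster; the hostname below is hypothetical:

```
# Create a Service that is just a DNS alias (CNAME) to an external
# database; apps connect to "postgres" and never hardcode the RDS host
kubectl create service externalname postgres \
  --external-name prod-db.abc123.us-east-1.rds.amazonaws.com
```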
I never understood GitOps. You introduce a whole new class of problem: syncing desired state from one place to another.
Kubernetes is a perfectly good place to keep your desired state. I think it would be in most people’s best interests to learn to maintain, failover, and recover Kubernetes clusters, so they can trust the API, rather than trying to use tools to orchestrate their orchestration
How do you deploy workloads to your clusters then? `kubectl apply -f`? Another form of CI/CD?
Assuming you have some sort of build pipeline that also pushes to your cluster, Flux does the same thing whilst ensuring whatever was pushed never diverges.
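For anyone who hasn't used it, the setup is roughly this (org and repo names are made up); after bootstrap, Flux keeps reconciling the cluster against the repo instead of relying on a one-shot push:

```
# Install Flux in the cluster and wire it to a Git repo
flux bootstrap github \
  --owner=my-org \
  --repository=k8s-config \
  --path=clusters/prod

# Anything merged under clusters/prod gets applied automatically,
# and manual drift is reverted on the next reconciliation
flux reconcile kustomization flux-system --with-source
```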
We either install a new version of a Helm chart, or we roll back. We have rollback jobs to roll back, and our CI/CD pipelines or our maintenance jobs do the install of the new version, depending on whether it's our app or a dependency.
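In Helm terms that's roughly the following (release and chart names made up):

```
# Install or upgrade to the new chart version; --atomic rolls the
# release back automatically if it fails to come up healthy
helm upgrade --install myapp myrepo/myapp --version 1.4.0 --atomic

# Manual rollback to the previous revision if a bad release slips through
helm rollback myapp
```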
It's not the EKS upgrade part that's a pain, it's the deprecated K8S resources that you mention. Layers of terraform, fluxcd, helm charts getting sifted through and upgraded before the EKS upgrade. You get all your clusters safely upgraded, and in the blink of an eye you have to do it all over again.
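One thing that takes some of the sting out is scanning for deprecated apiVersions before the control plane upgrade, e.g. with Fairwinds' pluto (the path and target version here are just examples):

```
# Report manifests using apiVersions that are deprecated or removed
# in the Kubernetes version you're about to upgrade to
pluto detect-files -d ./manifests --target-versions k8s=v1.28.0
```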
We address this by not using helm, and not using terraform for anything in the cluster. Kustomize doesn't do everything you'd want from a DRY perspective, but at least the output is pure YAML with no surprises.
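The shape of that setup, for anyone curious (the paths and patch file are hypothetical):

```
# A per-environment overlay patches a shared base; the rendered
# output is plain YAML you can read and diff before applying
cat > overlays/prod/kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: replica-count.yaml
EOF

# Render and review, then apply
kubectl kustomize overlays/prod | kubectl apply -f -
```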
We upgrade everything once a quarter. Usually takes about four hours per cluster. Occasionally we run into something that's deprecated and we lose another day, but not more than once a year.
Such a pity that Helm makes this so awful. I suppose one could keep using it to package up complex deployments and tweak them with a values.yaml, as long as you just use that to write out plain YAML for Kustomize and install that.
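That pattern does work in practice: render once with `helm template`, commit the plain YAML, and let Kustomize handle the tweaks from there (chart name and paths are made up):

```
# Render the chart to static YAML; values.yaml is the only Helm input
helm template myapp myrepo/myapp --values values.yaml > base/rendered.yaml

# From here it's ordinary Kustomize: overlays patch the rendered base,
# and a chart upgrade is just a re-render plus a reviewable diff
kubectl kustomize overlays/prod | kubectl apply -f -
```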
Go get a cluster manager like Rafay or Spectro Cloud. There are a lot of footguns in cluster management: cert management, ingress controllers, IaC (TF versioning is a pain), etc.
A cluster manager isn't cheap, but it sounds like you are getting buried. If you're on 1.23 or up, though, you at least have a year now to fix it.