If you can't log in to it, then it is not good for development. If it is not good for development, it is not good for production, because ideally your dev and production environments should be the same.
I’ve been writing and managing development platforms on Kubernetes for 6-7 years.
In this time, I remember having to SSH into a host node exactly once. This was me, the platform engineer - not an application developer. Even then, "having to" is a strong word: I could just as well have made do with a privileged container with host access.
Application developers have nothing to do on the host. As in, they gain nothing from it, and could potentially make everything worse for themselves and the other applications and teams on the platform.
You should not be logging in to production unless something has gone seriously wrong. I've not seen a company where developers (minus a handful of "blessed" staff) even know how to access prod, let alone log in.
During development you will need to log in to the pod to review settings, directory contents, and so on. If the OS running in the pod does not allow you to do that - during development - then that's severely limiting.
Talos is the OS running on the host. You can run whatever you need in the pod.
Also, you shouldn't really use the pod's OS to debug it. Kubernetes supports debug containers: you launch a separate container (presumably with a convenient debug environment) and mount the target container's root filesystem inside it, so you can inspect it as needed. This also helps when the target container is broken and you can't just exec into it.
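Concretely (a sketch; the pod and container names are placeholders), an ephemeral debug container can share the target's process namespace, which also gives you a view of its root filesystem through /proc:

```
# Attach a throwaway debug container to a running pod; --target shares
# the process namespace with the named app container.
kubectl debug -it my-app-pod --image=busybox:1.36 --target=app

# Inside the debug shell: the target's main process is PID 1 of the shared
# namespace, so its root filesystem is reachable via /proc (this needs the
# debug container to run as root or as the same user as the target).
ps
ls /proc/1/root/
```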
There's a recommendation to remove everything from the container that's not necessary for running a given program; that reduces the attack surface.
You can still exec yourself into the pods. No one said you cannot.
There is no shell or SSH on the hosts for you to log in to, but if you absolutely must, you can still create a privileged container and mount /. The whole point is that you shouldn't.
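For the record, `kubectl debug` can do this for you (the node name is a placeholder); it spins up a pod on the node with the host's root filesystem mounted at /host:

```
# Start a debugging pod pinned to the node; kubectl mounts the node's
# root filesystem at /host inside it.
kubectl debug node/worker-3 -it --image=busybox:1.36

# Inspect the host filesystem from inside the pod.
ls /host
df -h
```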
> If it is not good for development, it is not good for production, because ideally your dev and production environments should be the same.
Correct: your dev environment should also not let you do stuff on the host machines. In a k8s environment, you run everything in pods. Don't compromise on security and operational concerns just because it's a dev environment.
> If you can't log in to it, then it is not good for development.
You develop inside pods, and you are more than welcome to install any shell and other programs you want inside containers. (Or, for working at the k8s level: you `kubectl apply` or run helm against the k8s API, and it doesn't matter what's happening on the host.)
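For instance (names are placeholders, and apk assumes an Alpine-based image; use apt-get or dnf for other bases), pulling tools into a running dev container takes seconds:

```
# Open a shell in the dev pod's container.
kubectl exec -it my-dev-pod -c app -- sh

# Inside the container: install whatever tooling you need for the session.
apk add --no-cache curl bind-tools vim
```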
Some customers of an internal Kubernetes platform complained that their pods kept getting evicted because their nodes kept running out of disk space. The platform maintainers' first instinct was that the customers' pods must be writing to ephemeral storage, e.g., to log files on a filesystem not mounted from external storage. But that turned out not to be the case: the customers' pods did not write anything to disk at all. So why were the nodes running out of disk space?

Prometheus metrics show which partitions use how much disk space, but cannot go into more detail. The team wanted to inspect the node's filesystem to figure out what exactly was using so much space. The first thing they tried was to run a management pod that contains standard tools such as 'df' and that mounts the host's filesystem. Unfortunately, the act of scheduling a pod on that node caused it to experience disk pressure, and so the management pod got evicted.
So, being dogmatic about "the host should not have any tools installed" is good and all, but how do you debug this scenario without tools on the host?
We eventually figured it out. By logging into the host OS and using the shell tools there.
> So, being dogmatic about "the host should not have any tools installed" is good and all
Less dogma, more the lived experience that letting people log into hosts ends badly. Though I grant there's a cost/benefit both ways and perhaps there could be edge cases.
> but how do you debug this scenario without tools on the host?
Cordon the node, evict any one pod to free up just enough room, and then schedule your debug pod with a toleration so it ignores the error condition? I confess I've never had to do this but it seems workable.
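Something along these lines, I imagine (purely a sketch; the node, namespace, and pod names are placeholders):

```
# 1. Keep new workloads off the node while investigating.
kubectl cordon worker-3

# 2. Evict one pod by hand to claw back a little disk.
kubectl delete pod noisy-app-7d4f9c -n team-a

# 3. Run a debug pod that tolerates the disk-pressure taint (and the
#    cordon) and mounts the host filesystem read-only.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: node-debug
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-3
  tolerations:
  - key: node.kubernetes.io/disk-pressure
    operator: Exists
    effect: NoSchedule
  - key: node.kubernetes.io/unschedulable
    operator: Exists
    effect: NoSchedule
  containers:
  - name: tools
    image: busybox:1.36
    command: ["sleep", "3600"]
    volumeMounts:
    - name: host-root
      mountPath: /host
      readOnly: true
  volumes:
  - name: host-root
    hostPath:
      path: /
EOF

# 4. See which host directories are eating the space.
kubectl exec -it node-debug -- sh -c 'du -x -s /host/* | sort -n | tail'
```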
Actually, no, the host OS was Amazon Bottlerocket, an OS built specifically for running containers.
The cause was indeed images being too big. Images count towards ephemeral storage too: not only the raw images, but also their extracted contents on the filesystem. In their case they couldn't even control the size of the images, because those are supplied by a vendor.
The solution was to increase the node's disk space.
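For what it's worth, you can at least confirm the image angle without touching the host (a sketch; the node name is a placeholder), since the node's status reports the images it has pulled and their sizes:

```
# List the images cached on the node, largest first (sizes are in bytes).
kubectl get node worker-3 \
  -o jsonpath='{range .status.images[*]}{.sizeBytes}{"\t"}{.names[0]}{"\n"}{end}' \
  | sort -rn | head
```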
Interesting, I use Bottlerocket on my work clusters too. I think we had issues like this using some ridiculous data tool images that take up gigabytes, so we just upped the EBS size. Easily done.
It's designed for running Kubernetes. You would log into the containers running on it if you need to; there's no need to log into the underlying host. Managed Kubernetes clusters already work like this.