If you can't log in to it, then it is not good for development. If it is not good for development, it is not good for production, because ideally your dev and production environments should be the same.
I’ve been writing and managing development platforms on Kubernetes for 6-7 years.
In this time, I remember having to SSH into a host node exactly once. This was me, the platform engineer - not an application developer. Even then, "having to" is a strong word: I could just as well have made do with a privileged container with host access.
Application developers have nothing to do on the host. As in, they gain nothing from it, and could potentially make everything worse for themselves and the other applications and teams on the platform.
You should not be logging in to production unless something has gone seriously wrong. I've not seen a company where developers (minus a handful of "blessed" staff) even know how to access prod, let alone log in.
During development you will need to log in to the pod to review settings, directory contents, and so on. If the OS running in the pod does not allow you to do that - during development - then that's severely limiting.
Talos is the OS running on the host. You can run whatever you need in the pod.
Also, you shouldn't really use the pod's OS to debug it. Kubernetes supports debug containers: you launch a separate container (presumably with a convenient debug environment) and mount the target container's root filesystem inside it, so you can inspect it as needed. This also helps when the target container is broken and you can't just exec into it.
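Concretely (a sketch; the pod and container names are placeholders), an ephemeral debug container can share the target's process namespace, which also gives you a view of its root filesystem through /proc:

```
# Attach a throwaway debug container to a running pod; --target shares
# the process namespace with the named app container.
kubectl debug -it my-app-pod --image=busybox:1.36 --target=app

# Inside the debug shell: the target's main process is PID 1 of the shared
# namespace, so its root filesystem is reachable via /proc (this needs the
# debug container to run as root or as the same user as the target).
ps
ls /proc/1/root/
```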
There's a recommendation to remove everything from the container that's not necessary for running a given program; that reduces the attack surface.
You can still exec yourself into the pods. No one said you cannot.
There is no shell or SSH on the hosts for you to log in to, but if you absolutely must, you can still create a privileged container and mount /. The whole point is that you shouldn't.
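For the record, `kubectl debug` can do this for you (the node name is a placeholder); it spins up a pod on the node with the host's root filesystem mounted at /host:

```
# Start a debugging pod pinned to the node; kubectl mounts the node's
# root filesystem at /host inside it.
kubectl debug node/worker-3 -it --image=busybox:1.36

# Inspect the host filesystem from inside the pod.
ls /host
df -h
```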
> If it is not good for development, it is not good for production, because ideally your dev and production environments should be the same.
Correct: your dev environment should also not let you do stuff on the host machines. In a k8s environment, you run everything in pods. Don't compromise on security and operational concerns just because it's a dev environment.
> If you can't log in to it, then it is not good for development.
You develop inside pods, and you are more than welcome to install any shell and other programs you want inside containers. (Or, for working at the k8s level: you `kubectl apply` or run helm against the k8s API, and it doesn't matter what's happening on the host.)
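For instance (names are placeholders, and apk assumes an Alpine-based image; use apt-get or dnf for other bases), pulling tools into a running dev container takes seconds:

```
# Open a shell in the dev pod's container.
kubectl exec -it my-dev-pod -c app -- sh

# Inside the container: install whatever tooling you need for the session.
apk add --no-cache curl bind-tools vim
```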
Some customers of an internal Kubernetes platform complained that their pods kept getting evicted because their nodes kept running out of disk space. The platform maintainers' first instinct was that the customers' pods must be writing to ephemeral storage, e.g., to log files on a filesystem not mounted from external storage. But that turned out not to be the case: the customers' pods did not write anything to disk at all. So why were the nodes running out of disk space?

Prometheus metrics show which partitions use how much disk space, but cannot go into more detail. The team wanted to inspect the node's filesystem to figure out what exactly was using so much space. The first thing they tried was to run a management pod that contains standard tools such as 'df' and that mounts the host's filesystem. Unfortunately, the act of scheduling a pod on that node caused it to experience disk pressure, and so the management pod got evicted.
So, being dogmatic about "the host should not have any tools installed" is good and all, but how do you debug this scenario without tools on the host?
We eventually figured it out. By logging into the host OS and using the shell tools there.
> So, being dogmatic about "the host should not have any tools installed" is good and all
Less dogma, more the lived experience that letting people log into hosts ends badly. Though I grant there's a cost/benefit both ways and perhaps there could be edge cases.
> but how do you debug this scenario without tools on the host?
Cordon the node, evict any one pod to free up just enough room, and then schedule your debug pod with a toleration so it ignores the error condition? I confess I've never had to do this but it seems workable.
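Something along these lines, I imagine (purely a sketch; the node, namespace, and pod names are placeholders):

```
# 1. Keep new workloads off the node while investigating.
kubectl cordon worker-3

# 2. Evict one pod by hand to claw back a little disk.
kubectl delete pod noisy-app-7d4f9c -n team-a

# 3. Run a debug pod that tolerates the disk-pressure taint (and the
#    cordon) and mounts the host filesystem read-only.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: node-debug
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-3
  tolerations:
  - key: node.kubernetes.io/disk-pressure
    operator: Exists
    effect: NoSchedule
  - key: node.kubernetes.io/unschedulable
    operator: Exists
    effect: NoSchedule
  containers:
  - name: tools
    image: busybox:1.36
    command: ["sleep", "3600"]
    volumeMounts:
    - name: host-root
      mountPath: /host
      readOnly: true
  volumes:
  - name: host-root
    hostPath:
      path: /
EOF

# 4. See which host directories are eating the space.
kubectl exec -it node-debug -- sh -c 'du -x -s /host/* | sort -n | tail'
```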
Actually, no, the host OS was Amazon Bottlerocket, an OS built specifically for running containers.
The cause was indeed images being too big. Images count towards ephemeral storage too: not only the raw images, but also their extracted contents on the filesystem. In their case they couldn't even control the size of the images, because those are supplied by a vendor.
The solution was to increase the node's disk space.
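For what it's worth, you can at least confirm the image angle without touching the host (a sketch; the node name is a placeholder), since the node's status reports the images it has pulled and their sizes:

```
# List the images cached on the node, largest first (sizes are in bytes).
kubectl get node worker-3 \
  -o jsonpath='{range .status.images[*]}{.sizeBytes}{"\t"}{.names[0]}{"\n"}{end}' \
  | sort -rn | head
```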
Interesting, I use Bottlerocket on my work clusters too. I think we had issues like this using some ridiculous data tool images that take up gigabytes, so we just upped the EBS size. Easily done.
It's designed for running Kubernetes. You would log into the containers running on it if you need to; there's no need to log into the underlying host. Managed Kubernetes clusters already work like this.