Actually, no, the host OS was Amazon Bottlerocket, a specifically container-focused OS.
The cause was indeed images being too big. Images (not only the raw images, but also their extracted contents on the filesystem) count towards ephemeral storage too. In their case, they can't even control the size of the images because those are supplied by a vendor.
The solution was to increase the node's disk space.
Interesting, I use Bottlerocket on my work clusters too. I think we had issues like this using some ridiculous data tool images that take up gigabytes, so we just upped the EBS size. Easily done.