This is a lot of work for something that works out of the box with Podman. Of course, using Podman introduces its own idiosyncrasies, and as someone else noted, the benefit of the approach in the article is that all users share an image cache.
Source: I use Podman on a workstation where I SSH in as a bunch of different non-root users, and I've never had to think about it working.
I've used rootless podman for development for several months now.
I had a few issues in the beginning, but in the end the solutions were rather trivial. I had to:
- Delete config files from previous podman versions (pre 4)
- Enable the Docker-compatible socket for my user (see the sketch after this list)
- Use Docker Compose v2 rather than "podman compose" or the older docker-compose shipped with the distro
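For reference, on a systemd distro the socket step looked roughly like this for me (paths can differ per distro):

    # expose Podman's Docker-compatible API socket for your user
    systemctl --user enable --now podman.socket

    # point the docker CLI / Compose v2 at it
    export DOCKER_HOST=unix:///run/user/$(id -u)/podman/podman.sock
    docker compose up -d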
We mostly use docker-compose files for our dev setups, so I can't say if I'd run into issues with more elaborate setups. But I must say that it works extremely well for me.
It also lets you generate those manifests from existing containers or pods. There's no need to learn the Compose spec, and you get less friction between dev and prod without stop-gap measures like Kompose.
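For anyone who hasn't seen it, the round trip looks roughly like this (the pod name and file are placeholders):

    # dump Kubernetes YAML from an existing pod or container
    podman generate kube mypod > mypod.yaml

    # recreate it later, here or on another machine
    podman play kube mypod.yaml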
However, if your company is already heavily invested in Docker, the fact that something works OOTB in some other piece of software isn't enough to convince anyone to switch.
On top of that, our DevOps team is pretty happy with Docker :)
Anyway, surprised that so many comments wonder about the usefulness of docker rootless in a shared environment. It is my main approach for separation of concerns in my homelab. I always use docker rootless to share resources with many isolated apps. I wrote a blog post about how to host Mastodon with docker rootless [1].
Possibly, the web server was down due to a large number of requests. I have a lot of stuff on the server running in containers, but limited my httpd instance to 1 CPU core and 1 GB of RAM. Better upgrade it, I guess.
I still don't get why multi-user. I haven't seen a multi-user scenario that makes any sense for any Linux deploy in ages. Is this a shared prod server? Is it a shared dev server? Why would those be multi-user? For prod, why isn't it cattle where you don't ever SSH onto the server?
For dev, if you can't just run it all on your local machine, why not do something like shipyard.build?
Not docker specific, but there are still lots and lots of servers out there running multiple services. Especially in my home-lab! Each service runs under a service-specific user, sometimes with extra hardening applied by systemd. It's a tried-and-true method for gaining some semblance of security boundaries on a shared server without the additional administration overhead of kubernetes or similar.
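A minimal sketch of that kind of unit (the service name and binary are made up; the hardening directives are standard systemd options):

    # /etc/systemd/system/myservice.service -- hypothetical unit
    [Unit]
    Description=Hardened home-lab service

    [Service]
    # dedicated, unprivileged account just for this service
    User=myservice
    # a few of systemd's sandboxing knobs
    NoNewPrivileges=yes
    ProtectSystem=strict
    ProtectHome=yes
    PrivateTmp=yes
    ExecStart=/usr/local/bin/myservice

    [Install]
    WantedBy=multi-user.target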
A more relevant use case to industry might be a CI machine that you want to get better utilization out of. Easy, just start multiple CI runners under different users. "Just use Kubernetes" I hear someone screaming? Well, sometimes you need CI on macOS.
I run a multi-user Linux server. It's IBM POWER9 hardware, and several of my friends in the open-source community who care about testing on something other than x86, or who are interested in different architectures or in playing with its unique capabilities, such as quad-precision floating point implemented in hardware, appreciate having shell access to it.
Let's say you have some centralized monitoring of various host metrics. The ingestion process needs some amount of privileged access, but you don't want to give it full root. Meanwhile, your actual service probably needs no privileged access, save for some secrets which should also be inaccessible to the monitoring agent process. You may want to run both of these as containers.
I guess you may be using SELinux, but even so, users and groups are a natural part of expressing and enforcing such constraints.
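A rough sketch of how that could look with two rootless accounts (image names and paths are placeholders):

    # as the 'metrics' user: the agent gets a read-only view of host /proc and nothing else
    docker run -d --name agent -v /proc:/host/proc:ro metrics-agent:latest

    # as the 'app' user: the service gets its secret, but no privileged host access
    docker run -d --name web -v /home/app/secret.env:/run/secret.env:ro myservice:latest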
>> Why would those be multi-user? For prod, why isn't it cattle where you don't ever SSH onto the server?
I hate the cattle-not-pets analogy. Do you know how well taken care of most cattle are?
If you're operating at google/fb/apple scale then yes, you can take this approach. There are lots of businesses that DON'T need, want, or have to scale to this level. There are lots of systems where this approach breaks down.
Docker is great if you're deploying some sort of mess of software. Your average node/python/ruby setup, with its runtime and packages plus custom code, makes sense to get shoved into a container and deployed. Docker turns these sorts of deployments into App Store-like experiences.
That doesn't mean it's the only, or even a good, approach... if your dev env looks like your production env, and if you're deploying binaries, then pushing out as a cgroup or even an old-school "user" is not only viable, it has a lot less complexity.
As a bonus you can SSH in, you have all the tools available on the machine, and you can tweak a box to add perf/monitoring/debug/testing tools to get to the source of those "it only happens here" bugs...
> If you're operating at google/fb/apple scale then yes, you can take this approach. There are lots of businesses that DON'T need, want, or have to scale to this level. There are lots of systems where this approach breaks down.
I used to work at Yandex which is not Google but had hundreds of thousands of servers in runtime nevertheless. So definitely cattle.
Still the CTO of search repeatedly said things like "It's your production? Then ssh into a random instance, attach with gdb and look at the stacks. Is it busy doing what you think it should be doing?"
Dealing with cattle means spending a lot of time in the barn not in the books.
We have over a dozen production servers. Both our DevOps engineers and developers have access to these servers in case something needs to be fixed or configured.
If it works as I understand it, I can see an advantage at an architectural level in this setup: with Podman, container images are stored on a per-user basis, while here they would be shared between users, thus using much less disk space (if using the same base images). Besides this, I actually have the same question.
I think OP is referring to the "unprivileged user namespaces" [1] feature of Linux, which caused numerous security incidents in the past. AFAIK, this is mainly because with this feature enabled, unprivileged users can create environments/namespaces which allow them to exploit kernel bugs much more easily. Most of them revolve around broken permission checks (read: root inside the container but not outside, yet feature X falsely checks the permissions _inside_). [2] has a nice list of CVEs caused by unprivileged user namespaces.
Given that rootful docker e.g. is also prone to causing security issues, it's ultimately an attacker model / pick-your-poison situation though.
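If you want to check or restrict the feature on a host, the usual knobs are these (the second one only exists on Debian/Ubuntu-patched kernels, as far as I know), keeping in mind that turning it off also breaks rootless Docker/Podman:

    # upstream kernels: cap the number of user namespaces (0 disables them)
    sysctl user.max_user_namespaces
    sudo sysctl -w user.max_user_namespaces=0

    # Debian/Ubuntu kernels carry an extra toggle
    sysctl kernel.unprivileged_userns_clone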
Ok, but here the OP is doing something a bit different than just rootless Docker, which is to use a "centralised" rootless Docker running as a single, non-privileged user... or am I missing something?
We've used it in a single-user (a docker user) and multi-user (user for each dev) environment.
Most, if not all, containers work fine; there are some, like mailcow, which don't work well with it.
If you have multiple IPs on the one machine, there is a longstanding bug that means you can't bind the same port on different IPs, e.g. IP1:80 and IP2:80. The workaround is a separate rootless Docker user + runtime for each container that shares a port, which is nasty.
In a multi-user environment we simply set up rootless Docker under each dev's user, so they have their own runtime and their own containers, isolated from other devs. This works really well.
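The symptom looks roughly like this (documentation IPs; the exact error text varies):

    # under one rootless daemon, the second bind fails even though the IPs differ
    docker run -d --name site-a -p 192.0.2.10:80:80 nginx
    docker run -d --name site-b -p 192.0.2.11:80:80 nginx   # "port is already allocated"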
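Concretely, the per-dev setup is roughly this, run as each developer's own user (assuming docker-ce-rootless-extras is installed):

    # one-time setup per user
    dockerd-rootless-setuptool.sh install
    sudo loginctl enable-linger $USER        # keep the per-user daemon running after logout

    # per shell (or put it in the user's profile)
    export DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock
    docker ps                                # sees only this user's containers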
Related: If you are into this kind of thing and the extra fun that is GPUs + pydata, we have a 1-2 month project around adding rootless to our GPU graph AI containers & Packer flow. Ping build@graphistry.com.
Niche, but a project we have been wanting for a while. Base containers get OSS'd, etc. This stuff is twisty!
What's the difference between this, and adding users to the docker group? As long as the docker daemon is running, you should be able to spin up containers without needing root.
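For reference, the group route is just something like this (the user name is a placeholder):

    # root-ful daemon stays as-is; alice joins the docker group
    sudo usermod -aG docker alice
    # after re-login, alice can talk to the daemon without sudo
    docker run --rm hello-world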
Your claims here are inaccurate. You can pass flags or define environment variables to get the behavior you want. Please spend some more time hitting the man pages and the guide.
> It indeed does not enforce (or even permit) robust isolation between the containers and the host, leaving large portions exposed. … More in detail, directories such as the /home folder, /tmp, /proc, /sys, and /dev are all shared with the host, environment variables are exported as they are set on the host, the PID namespace is not created from scratch, and the network and sockets are shared with the host as well. Moreover, Singularity maps the user outside the container to the same user inside it, meaning that every time a container is run the user UID (and name) can change inside it, making it very hard to handle permissions.
I actually went into every single line of the manuals and even discussed the matter on the official Singularity Slack.
In that blog post I wrote that it does not enforce isolation. It is true that you can achieve some level of isolation by setting certain flags and environment variables explicitly, but this is (was?) quite hard to get working; moreover, the user mapping inside the container is always host-dependent, and there is just no network isolation.
To achieve something close to the behaviour "I wanted", I had to use a combination of the command-line flags you mentioned (in particular --cleanenv, --containall and --pid) together with custom-made, ad-hoc runtime sandboxing for directories which required write access (such as /tmp and /home).
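Roughly, the invocation ended up looking something like this (the image name and bind paths are placeholders):

    # opt in to isolation explicitly: clean env, contained namespaces, private PID namespace,
    # plus an ad-hoc writable scratch dir bound over /tmp
    singularity run --cleanenv --containall --pid \
        --bind /scratch/$USER/tmp:/tmp \
        myimage.sif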
However, this is not the default behaviour and it is not how Singularity is used in practice by its users. But yes, I was able to achieve something close to the behaviour I wanted [1].
This said, if I am missing something, or if the project has evolved to allow for a better level of isolation by default, please let me know. That blog post is dated 2022 after all.
I agree to a certain extent. However, it's hard to ensure dependencies work the right way without isolation. These two support tickets are a showcase of the essence of the problem: "Same container, different results" [1] and "python3 script fails in singularity container on one machine, but works in same container on another" [2]. In my experience with Singularity, there were many issues like these.
I am not sure why they had to call it a "containerization" solution. It gets a bit philosophical, but IMO containers are meant to "contain", not to just package. To me, Singularity is more a "virtual environment on steroids", and it works great in that sense. But it doesn't "contain".
The hard truth is that Singularity was designed more to address a cultural problem in the HPC space (adoption friction and pushback against new, "foreign" technologies) than to engineer a proper solution to the dependency hell problem.
HPC clusters still use Linux users and shell access, meaning that it is up to the user to run the container: there is just no container orchestration. This means that the user has to issue a command like "singularity run" or "docker run". And until not long ago, letting users do a "docker run" meant having them in the docker group, which is a near-root access group. Just not doable.
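(For context, this is why the docker group is near-root: any member can do something like

    docker run --rm -it -v /:/host alpine chroot /host /bin/sh

and get a root shell on the host.)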
Singularity also works more or less out of the box with MPI in order to run parallel workloads, either locally or across multiple nodes. However, this comes at a huge price, as it relies on doing an "mpirun ... singularity run" and requires having the same MPI version inside and outside the container. To me, this is more a hacky shortcut than a reliable solution.
I believe that the definitive solution in the HPC world will be to let HPC queuing systems run and orchestrate containers on behalf of the users (including running MPI workloads), thus allowing the use of any container engine or runtime, including Docker. I did some trials and it works well, almost completely solving the dependency hell problem and greatly improving scientific reproducibility. A solution like the one presented in the OP contributes to the discussion towards this goal, and I personally welcome it.
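The hybrid pattern is roughly this (rank count and binary path are placeholders), with the container's MPI having to match the host's:

    # the host MPI launches the ranks; each rank execs inside the container
    mpirun -np 4 singularity exec mpi_app.sif /opt/app/bin/solver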
With respect to Singularity, I think they should just have named the project "singularity environments" rather than "singularity containers" and everything would have been much clearer.
At the risk of showcasing my bubble: Lack of exposure; how many people even know what singularity is or how to use it? I know it's used in scientific HPC, but I don't see evidence of wider adoption.
I think it is a few things together. The rootless and daemonless design leads to UX differences with Docker. Because of the differences, it isn't just a drop-in replacement; porting applications can be a pain if they do anything weird (no network isolation, --containall isn't the default and still behaves a bit differently when on, etc.). And Docker has a ton of momentum and usage.