It’s astonishing to me that people would argue that Docker is somehow the simpler solution. Some would happily maintain a container registry, constantly worry about the order of operations in their Dockerfile, create a build pipeline in a format incompatible with non-Docker use cases, and spend time understanding how Docker's bridge interacts with multiple network controllers, rather than learn how their operating system works.
There's docker and there's docker. Docker as a local container builder/runner is simple. Docker as a deployment / networking / scheduling service is not. Not everyone is using Docker for more than "run this script in an environment I want".
The issue with curl|bash is not just the security (you can run curl|bash as an unprivileged user).
It's also the fact that it generally doesn't do anything to integrate well with your system: it just pulls all its dependencies into a folder and never updates them afterward.
I'm astonished too. The section on use cases doesn't help to justify why we need all this complexity. Maybe they are trying to compensate for the limitations of the Unix permission system (having only user, group and world is lacking in a lot of situations by today's standards).
The other solution is to implement a totally different mechanism (ACLs, capabilities, and so on), but then you lose backward compatibility.
> having only user, group and world is lacking in a lot of situations by today's standards
A process has supplemental group membership. If you were only limited to the above, then, yes, you would need to share resources through world or utilize a special broker if you wanted multiple privilege domains for a service.
But with supplemental groups you can be as fine-grained as you want. Just create a group for each domain with a unique membership set and make that a supplemental group for each user with access to that domain. When you want to share a file, just chown and chmod appropriately.
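Roughly, on Linux (group, user, and file names here are made up):

    # as root: one group per privilege domain
    groupadd projdata
    usermod -aG projdata alice    # projdata becomes one of alice's supplemental groups
    # share a file with that domain only
    chgrp projdata report.csv
    chmod 0660 report.csv         # owner and group read/write; world gets nothing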
To see how this works in practice, look at BSDAuth (OpenBSD's alternative to PAM). Rather than using dynamically loadable modules for authentication methods--which effectively means that any program that wants to use authentication services needs root permissions--BSDAuth uses setuid and setgid helpers in /usr/libexec/auth.
$ ls -lhd /usr/libexec/auth/
drwxr-x--- 2 root auth 512B Mar 24 13:23 /usr/libexec/auth/
$ doas ls -lh /usr/libexec/auth/
total 372
-r-xr-sr-x 4 root _token 21.3K Mar 24 13:23 login_activ
-r-sr-xr-x 1 root auth 9.0K Mar 24 13:23 login_chpass
-r-xr-sr-x 4 root _token 21.3K Mar 24 13:23 login_crypto
-r-sr-xr-x 1 root auth 17.2K Mar 24 13:23 login_lchpass
-r-sr-xr-x 1 root auth 9.0K Mar 24 13:23 login_passwd
-r-xr-sr-x 1 root _radius 17.1K Mar 24 13:23 login_radius
-r-xr-xr-x 1 root auth 9.0K Mar 24 13:23 login_reject
-r-xr-sr-x 1 root auth 9.0K Mar 24 13:23 login_skey
-r-xr-sr-x 4 root _token 21.3K Mar 24 13:23 login_snk
-r-xr-sr-x 4 root _token 21.3K Mar 24 13:23 login_token
-r-xr-sr-x 1 root auth 21.0K Mar 24 13:23 login_yubikey
In this scheme, for any user whom you want to grant permission to test basic system authentication, you add the auth group to their supplemental groups. Et voilà, they can now use the framework, but without root permissions. For mechanisms that you may want to grant separately, you use a separate group (e.g. _radius).
Like many bad ideas, Docker basically won because you can use it to isolate and package pre-existing, broken software. And because it's easier to find examples of how to [abuse] Docker for sharing files across security domains than on how to use supplemental groups to do this.
I just recalled an interesting difference between BSD and SysV (including Linux[1]) regarding group ownership.
On BSD when you create a file the group owner is set to the group owner of the directory. From the BSD open(2) manual page:
    When a new file is created it is given the group of the directory
    which contains it.
But on Linux the [default] behavior is to set the group owner to the creating process' effective group. From the Linux open(2) man page:
    The group ownership (group ID) is set either to the effective group
    ID of the process or to the group ID of the parent directory
    (depending on file system type and mount options, and the mode of
    the parent directory, see the mount options bsdgroups and sysvgroups
    described in mount(8)).
The BSD behavior is more convenient when it comes to using shared supplemental groups, because you can often invoke existing programs with a umask of 0002 or 0007 and the files they create (without subsequently modifying permissions) will usually have the desired permissions--specifically, writable by any process which has the containing directory's group among its supplemental groups. For example, with BSD semantics Git technically wouldn't have needed a special core.sharedRepository setting. But because of SysV semantics on Linux, Git had to be modified to explicitly read the group of the parent directory and explicitly change the group owner of newly created files.
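For what it's worth, on Linux you can opt into the BSD behavior per directory with the setgid bit, which is part of what core.sharedRepository arranges for you. A sketch with a made-up group:

    mkdir /srv/shared
    chgrp devs /srv/shared
    chmod 2775 /srv/shared    # leading 2 = setgid: new files inherit the 'devs' group, BSD-style
    umask 0002                # so files are created group-writable without any chown/chmod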
To me, one of the most valuable parts of Docker is almost never mentioned: the fact that a Dockerfile exactly describes the steps necessary to get from a fresh OS image to a working instance of whatever you're containerizing. That is one of the main selling points.
To me that's what other technologies like rkt seem to miss or at least choose not to focus on.
They describe how to do it in an impure non-reproducible way.
They're not much different from nix expressions, or brew formulas or such.
Unlike Nix expressions (or RPM specfiles or various other technologies), the Dockerfile format never concerned itself with clearly defining the hashes of inputs and dependencies, reproducibility, working without network access on build farms, and so on.
Sure, it ended up being popular and usable, but at the cost of taking the art of describing how software is installed back in time.
It's not a coincidence that most Docker containers are built using Debian- or Red Hat-derived images. It's because those distributions have proper packaging systems which are consistently applied, which is what allows them to build and maintain tens of thousands of projects in a reusable format. The sets of "correct" and "usable" overlap substantially; certainly much more than "incorrect" and "usable" do.
> Anecdotally, I've tried to learn Nix for several years.
Out of interest, what did you find difficult? I ask because I spent last weekend setting up NixOS, and the only things I found difficult were discoverability and documentation (I spent a lot of time just reading through the source code).
Edit: Thinking about it, it's very possible that this is just a case of me not knowing what I don't know, since I'm not currently trying to do anything particularly complex.
Not sure if this is exactly what you meant by discoverability, but this page helps with at least seeing what configuration options are available: https://nixos.org/nixos/options.html# -- no other distro has anything similar as far as I know.
The language is not too bad, but how it should be used is the missing part. Like a cookbook or a properly populated Stack Overflow site.
When I was on Linux, I liked to distro-hop a few times a year. Whenever I came back to NixOS, I felt that I had to relearn everything, and that there was nothing to hold on to. I always take that as a bad smell.
Oh yeah. Systemd and containers. Two technologies certain militant technical users love to complain about :-)
Do we really need a non-OCI image format and toolchain here?! I get the immutability benefits, distribution benefits, and isolation, but having a completely separate toolchain makes no sense to me...
Containers are going to fill this role either way, but fragmentation will only make that process slower and more painful.
I for one salute our future k8s kubelet overlords.
> Do we really need a non-OCI image format and toolchain here?
This doesn't use a new competing image format, but rather uses disk images or tarballs, which have existed longer than OCI has.
The toolchain also predates OCI, since the underlying technology is basically nspawn (and various other systemd.service options), which have been part of systemd since before OCI stabilized. This is simply a new coat of paint on something that has been there for a while.
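For reference, the new coat of paint looks roughly like this (image name made up; portablectl ships with systemd 239):

    # attach a portable service image; its unit files become visible to the host
    portablectl attach /var/lib/portables/myapp_1.raw
    systemctl enable --now myapp.service
    # stop it and detach the image again
    systemctl disable --now myapp.service
    portablectl detach /var/lib/portables/myapp_1.raw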
> Containers are going to fill this role either way, but fragmentation will only make that process slower and more painful.
Sure, so we shouldn't have let docker do anything and instead used LXC, since LXC already did containers, already had its own format and toolchain, etc.
That's the exact same argument you're making now.
OCI is also not really a good standard. It has exactly one usable implementation (and that one is in Go, not even C, which would be easy to link against and use from other languages), and since systemd has already defined various things for services (such as seccomp filters) which overlap with what OCI does, it makes almost no sense for systemd to begin using it.
I actually think more people should have adopted LXC. The engineering behind LXC is beyond remarkable and it's a shame that it got such a bad rap in the early days of Docker. I've worked with some of the LXC folks and they're all incredibly sharp and have a much wider breadth of work than just LXC.
With regards to image formats, I actually think that we do need to improve the image format significantly and not just stick to tried and tested stuff. Tarballs are (quite frankly) simply the most awful format to use for container images (disk images are a close second). It's a shame that everyone uses them. I am working on a blog post to better explain what I mean by this.
And yeah, runc being written in Go is probably one of the most frustrating things about working on it. I often say that one of the worst things you could decide to write in Go is a container runtime -- it's a little bit odd that people keep doing exactly that. runc actually isn't entirely written in Go -- it has a fairly substantial amount of C, because the Go runtime doesn't play well with the delicate dance required to set up a container properly.
All of that being said, containers really did need to be standardised otherwise the container wars wouldn't have ended as nicely as they did. But that's not to say that the OCI doesn't have issues. It definitely does, but I'm hoping they can be fixed over time.
[ I am one of the maintainers of the "exactly one usable" OCI implementation, and have been working on OCI stuff since its inception. ]
Something to note about LXC developers: they don't just provide a wide array of usable interfaces to containers (daemon with a RESTful API, command line tools, and a library with many language bindings). They also are some of the most active contributors to the underlying container technology in the kernel, including namespaces and cgroups.
> They also are some of the most active contributors to the underlying container technology in the kernel, including namespaces and cgroups.
This is what I was referring to when talking about their breadth of work. I've collaborated with them quite a bit, and it's always amazing working with them on a hard problem.
Treating people who don't agree with broken software and systems engineering practices like children obviously won't help. Systemd and Docker have legitimately bad engineering practices.
Yet another tool for containerization technology :-)
While this is interesting, the containers I'd appreciate are ones that let each user install multiple versions of programming languages & packages without affecting the system globally. Every language has its own separate answer (virtualenv/venv, rbenv, node version manager, etc.).
Just today I struggled with "brew install python3" which complained with "Error: python 2.7.14_2 is already installed" and the only option offered was to upgrade. Whereas what I wanted was parallel installations of python 2.7 and python3. :-/
That's exactly what Nix does! Nix is wonderful tech that unfortunately seems doomed to low adoption due to poor usability. My belief is that this stems from the fact that maintaining the package tree is obscenely costly of contributor time, leaving almost no time or energy to make progress on "niceties" like improving the core CLI experience and the continuous integration system.
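For the python2/python3 case above it's a one-liner (nixpkgs attribute names may differ between channels):

    # ad-hoc shell with both interpreters; nothing global is touched
    nix-shell -p python27 python3
    # inside that shell:
    python2 --version
    python3 --version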
The Nix team (in particular grahamc with https://github.com/NixOS/ofborg) is working hard on making CI better. But it's true that the on-boarding experience and CLI could be much improved.
It's quite frustrating to me, because after using Nix & NixOS for about 3 years now, there's no way I'd go back to the insanity out there. I never had this amount of stability, control, and predictability on any other platform: configuring my OS, editors, WM, and shell, random ricing, or building VMs, servers, Docker containers, production deploys, and development environments -- there's really not much Nix can't handle.
But getting familiar with Nix took some serious effort I'm sure not many people can afford. I begin to understand how frustrated people in the Lisp and Smalltalk communities must feel while people slowly reinvent everything they've enjoyed for decades (and Nix is still in its teens).
Teaching people about the value of having just a single package manager, no matter what OS or language they're using, is mostly futile because there's so much value attributed to being "mainstream" that now we have hundreds of mainstream package managers with varying degrees of sophistication, security, predictability, and ability to _not_ try and take over your system.
I use it to build and develop Go, Ruby, JS, Crystal, Elm, Mint, Haskell, Perl, Bash, VimL, Elisp, Guile, and whatever else comes my way.
All I have to do to get a working and isolated dev env is go into that directory and let nix-shell do the rest. On NixOS you also get nixos-container with all its benefits, so I can test whole networks or just spin up some DBs, all behaving exactly the same they'll be once deployed by simply reusing that configuration.
Of course it also manages Systemd, so I guess we'll just write yet another function to slap a checksum on those containers and control them as well, it's just a shame that humanity loses so much time reinventing the wheel every few months.
On the other hand, there's still a lot of opportunities for start-ups in that space, like https://nixcloud.io, https://www.tweag.io (they just hired Eelco Dolstra so he can work full-time on the Nix core), https://www.packet.net (they sponsor almost all our new CI infrastructure), or https://vpsfree.org/ (first one to offer their own NixOS based distro specifically made for VPSs), which are all doing well.
I spent a lot of time trying to make Nix the 'standard build environment' for an open source project. It was going really, really well, even on top of standard distros like Ubuntu, Red Hat, etc.
The problem is OpenGL and Nvidia. Nix doesn't want to deal with Nvidia's closed-source drivers, and I can't blame them. I got so depressed working on the issue that I gave up looking into it. There is no good solution; there are only kludges, and kludges of kludges.
I once started trying to learn about Nix. All the up-to-date documentation I could find, however, was about NixOS.
I would like to use Nix just to manage my local development environment on Ubuntu, not to replace my entire operating system. Is that something the developers are still interested in? Is there documentation?
Have a look here: https://nixos.org/nix/manual/
There's a dropdown in the top-left corner to switch between nix-related projects. (NixOS, NixOps, etc.)
There shouldn't be a need for containers just to use different versions of programming toolkits. I'm a Java developer, and in our world it's extremely easy: a JDK is just a directory with a bunch of files. I can have (and actually do have) around 10 versions of the JDK, from Java 1.1 to Java 10. There's no need to install anything; just extract the archive. To use a particular Java, setting the JAVA_HOME environment variable is usually enough. That works identically on every operating system, from Linux to Windows and macOS. That's why I never use the distro's Java -- it's a mess; just download the Oracle JDK and start working.
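For example (paths made up):

    # switch JDKs for the current shell only
    export JAVA_HOME="$HOME/jdks/jdk-10"
    export PATH="$JAVA_HOME/bin:$PATH"
    java -version    # now reports the selected JDK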
I use asdf [0] for this. It lets you install and run different versions of any language. I keep a sensible global default if I'm just experimenting with stuff, but for any serious project I'd create a .tool-versions file in its root to specify the exact language version I'm developing and deploying against.
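Something like this (versions are just examples):

    $ cat .tool-versions
    python 3.6.5
    nodejs 10.1.0
    $ asdf install    # installs every tool/version listed above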
> multiple versions of programming languages & packages to be installed by each user without affecting the system
What language runtime do you have in mind? I haven't seen one that doesn't support this. A normal user should not be allowed to affect the system, so installing language runtimes under your home directory is normal and expected.
Most languages use an environment variable to point to the packages being used, so it should be possible to have multiple versions installed under your home. This is how the language developers work. They need to have many development versions installed for testing.
Containers are useful for a lot of things, but a much more complicated set up than just installing a couple of language runtimes. You should have no problems doing that the normal way.
Then there is something wrong with either brew or macOS or both, because we can install Python 2 and 3 in parallel on any Linux with the system package manager.
Not quite utter insanity - isn’t it quite weird that every programming language has its own tool for installing multiple different versions of itself? nvm, pyenv, rbenv, rustup, etc.
Maybe it’s nice to think of a common solution to this!
This is cool. I'm betting hard on k8s at $dayjob, but I often think of how much less I could be happy with. A simple daemon that used some gossip/broadcast protocol to work out a list of systemd service states would be juuust enough.
Have you checked out https://www.habitat.sh/? It has a gossip-based discovery/coordination layer. I've been meaning to give it a try --- I too have Docker fatigue.
I just spent a few days trying to navigate all this. Perhaps someone here could enlighten me.
If I just want to download some semi-trusted source code and configure; make; run it in an environment where it can't access anything unless I explicitly whitelist it (in particular, it should not be allowed to access the internet or any network resources, but I should be able to send HTTP requests to it)...
Which variant of all this container stuff would make things simplest to set up?
Docker seems way overkill (and seems to have the wrong defaults for this need anyway), but handcrafting things with iptables or ip netns seems a bit too low-level. (And, well, not "contained"... I would prefer something declarative with automatic setup/teardown.)
Ubuntu snaps look like an interesting middle ground, but they also look completely dead.
Any tips?
Edit: should probably rtfa before asking. it actually looks like a good fit ;)
I think the systemd containerization features are worth exploring -- both portable services and nspawn. They will be around on most Linux systems "out of the box", which is a big benefit. It will be like bash: maybe not the best shell, but the one that is around and that you can count on.
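For the grandparent's use case, a minimal nspawn sketch (the directory path is made up; debootstrap is one way to get a root filesystem):

    # build a minimal Debian tree, then boot it as a container
    debootstrap stable /var/lib/machines/sandbox
    systemd-nspawn -bD /var/lib/machines/sandbox --network-veth
    # --network-veth gives the container its own network namespace with
    # only a veth link to the host: you can talk to it from the host, but
    # it has no route to the internet unless the host is configured to
    # forward/NAT for it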
Not a container, but IF you manage to get Qubes running, it's really easy to create a temporary VM based on Fedora or Debian that cannot access your other stuff. You can even swap the distro out from under your homedir later.
I don't know. But if it does the service would need to reap zombies or systemd would need to spawn yet another instance of itself inside the container (like Rocket does).
I’ll take s6 and some sort of log-structured FS for local buffering of structured messages logged to stderr, TYVM. No pid files, no lock files, no log files to rotate awkwardly, and no unstructured log files.
PS: log events shouldn't be perceived as lines of text but as structured data messages, so that they don't need special parsing.
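journald already treats them that way; a small illustration (the identifier is made up):

    # piped lines become journal entries with indexed metadata
    echo "something happened" | systemd-cat -t myapp -p warning
    # read them back as structured records instead of text lines
    journalctl -t myapp -o json-pretty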
There are limited grounds on which this is not a disingenuous argument. Systemd is not well described as an init system. It is a service management system, which handles transitions in system state, including (but not limited to) init and shutdown. The reason systemd is considered a large change is that sysvinit had fundamental defects in its ability to manage processes. Cgroups were, as I recall, only one of several attempts to introduce process-management improvements into the Linux kernel, following the example of other Unixes. It's not quite correct to say that systemd was designed without regard to backwards compatibility, but it is intended to be a relatively clean break. One might say that systemd is a modern "init" system with some backwards-compatible elements, and OpenRC is more of a backwards-compatible system with some modern features.
I've generally had a positive impression of the systemd developers' ability to design systems, and to communicate those designs. In most cases, I do not disagree with the choices they have made. There have been a few errors, and there is a great deal of bad blood. I think both that the software is generally good, and that there's nothing wrong with informed criticism.
"Because the Portable Service concepts introduces zero new metadata and just builds on existing security and resource bundling features of systemd it's implemented in a set of distinct tools, relatively disconnected from the rest of systemd. "
clearly, this is a messy situation. lots of different pieces doing different things in different ways. in order to harmonize this, and simplify it, I believe we need to have systemdd, a daemon for systemd, so that all of its various pieces can be in one, neat, centralized location.
I can't speak for the GP, but I believe their comment is parody. Keeping tools decoupled is, of course, very good -- systemd is the antithesis of that norm.
That is no problem. We had our comments scattered out into separate pieces, each doing their own thing in a non-standard comment idiom. In order to simplify and organize things better, I have created commentd which will combine them all into one comment. To save valuable disk space I have also compressed them and put them in binary format.
begin 644 -
M0F%S:6PZ($QI<W1E;BP@9&]N)W0@;65N=&EO;B!T:&4@=V%R(2!)(&UE;G1I
M;VYE9"!I="!O;F-E+"!B=70@22!T:&EN:R!)(&=O="!A=V%Y('=I=&@@:70@
)86QR:6=H="X*
`
end
And init systems are, unfortunately, not a common feature of containers so far. I really wouldn't mind seeing both of them in the same project.
Both init systems and container systems are concerned with dependencies, logging, and managing the lifetime of processes, and systemd is way better at the last two than Docker, so I'm optimistic.
A QR code generator is not a common feature of an init system, yet systemd includes one. A modern attempt at OS packaging (a.k.a. containerization) is not a weird thing in such a neighbourhood at all.
The systemd init process doesn't have a QR code generator. This is typical anti-systemd FUD: conflating the systemd init process with the other processes available as part of the systemd project.
The separate journald log server, available as part of the systemd project, can generate QR codes. journald can cryptographically sign logs so that future log tampering can be detected. This requires two keys, a sealing key and a verification key. The verification key must be stored off-server. When the keys are generated, journald can display a QR code to allow easy recording of the verification key.
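Concretely (the key setup is a one-time operation per machine):

    # generate the sealing key (stays on the host) and the verification key
    # (shown once, also as a QR code on capable terminals, to store elsewhere)
    journalctl --setup-keys
    # later, check the sealed journal for tampering
    journalctl --verify --verify-key=<verification-key>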
I mostly like systemd, but judging the project as an init system with a bunch of other functionality tacked on is completely fair. The init daemon requires the logging daemon and vice versa, so they aren't really separate.
Ah yes, this totally separate journald that you can replace with something else -- except that you cannot; you can only run it in forwarding mode alongside another logging daemon. Very modular. Totally not a part of an init system.
In all honesty I've never yet administered a system where the addition of systemd solved a problem I actually had. I would gladly go back to Upstart (or hell, even sysv) in a heartbeat.
Put another way, as a couple of anecdotes from $day_job: the boot-time benefits (the one thing people always seem to bring up in defense of this ever-growing behemoth) were eaten up months ago in time spent troubleshooting, transitioning, and working around its corner cases.
More humorously, the words "fucking" and "goddamn" used to be followed by the name of some internal program known for being clunky. After getting up to Ubuntu 16 and Cent 7 in the whole environment, those words tend to be followed by "systemd".
Counter-anecdote - since switching to operating systems using systemd, I haven’t had a single bug or issue with the init system or writing startup scripts, because finally they use incredibly obvious files that I can write myself.
For me it’s basically magic that I can now write a simple declarative file that describes how to run an application, and from there it works with all of the expected features.
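Something on the order of this (the name, paths, and options are made up):

    # /etc/systemd/system/myapp.service
    [Unit]
    Description=My application
    After=network.target

    [Service]
    ExecStart=/usr/local/bin/myapp --config /etc/myapp.conf
    Restart=on-failure
    User=myapp

    [Install]
    WantedBy=multi-user.target

Then systemctl enable --now myapp, and you get supervision, restarts, dependency ordering, and logging without writing a single line of shell.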
Yes. However, systemd certainly did this part right --- the service files and their dependencies are much easier to understand compared to, e.g., their counterpart in Upstart.
I have found that systemd solves many problems. I couldn't care less about boot times, especially on servers. The service startup with dependencies is very nice and easy compared to sysv. You write like 10 lines and it just works. You get status output of the service without having to grep through the log file. Sometimes you don't need a logfile, which makes this even better, as you don't need to use screen/tmux workarounds. You see if a service is running or has exited.
Same for timers. Debugging not running cronjobs is a pain in comparison.
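A made-up example, assuming a matching backup.service unit exists:

    # /etc/systemd/system/backup.timer
    [Timer]
    OnCalendar=daily
    Persistent=true

    [Install]
    WantedBy=timers.target

systemctl enable --now backup.timer, then systemctl list-timers shows when it last ran and will next fire, and journalctl -u backup.service shows why it failed, if it did. Try getting that out of cron.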
Writing sysv init scripts is so fucking shit that people started to dump everything in /etc/rc.local, especially on earlier Debian images for the Raspberry Pi.
I can't fathom why it's so popular for Java programs to ship a slew of poorly-tested shell scripts that read environment variables and config files, and transform these into -D arguments to the java command. Why on earth don't they just do that in Java!?
Indeed, I have referred to your web site many times over the years while unpicking various horrific daemon control shell scripts in order to run them under sane init systems... :)
We used Upstart until just a couple of years ago, and it was still losing service processes regularly (as in, you start a service, then you stop it, but the process keeps running, despite Upstart telling you it's stopped).
I don't know if systemd is better (I've never managed a server fleet with it), but Upstart certainly wasn't good enough.
It seems some people on HN love to hate systemd, even though they can't come up with an alternative that's better.
I hated systemd before it was cool to hate systemd. I never had a problem with sysvinit as the root cause, but I've had many problems with systemd as the root cause.
Hint: both sysvinit and BSD init are better. They are more mature, have fewer moving parts, are easier to debug without specialized tools, and are better understood. Plus, the decoupled nature of init systems reaps bonuses for system stability (unlike systemd).
Having a QR code generator as part of something that's tangentially related to the init system in concept, but deeply embedded into systemd in practice, is exactly what the post you were replying to is complaining about. It's unnecessarily complex.
Not sure what else you're doing with a QR library, except maybe building tetris (which seems like a bad idea for something purported to be an init system).
Sure it is. An init system should, first and foremost, be stable and well understood. By virtue of its size and youth, systemd is not (at least not by devs and end users).
But all those words are meaningless. "Stable" and "well understood" could also be said of systemd. There is no metric to it. By that logic we could go back to using software from 20 years ago, because it is "stable".
I have noticed that systemd is "stable" and "well understood" because I have no problems with it. But would that convince someone else?
To be honest, @inferiorhuman keeps providing you with well supported reasoning for his position, whilst your only contribution appears to be "that doesn't meet my personal standards", which is devoid of any meaningful content for the rest of us. I am a committed systemd argument aficionado, and would really like to see a more supported set of arguments from your side. So far, @inferiorhuman is winning this debate.
It's typical in a systemd discussion that a link dump to broken stuff appears. Whether or not these things are indeed broken is not explained. FWIW, at least there are bug reports. The conclusion that sysvinit is better because of those reports is false. It could just be that nobody cares about sysvinit anymore, so fewer bug reports happen.
> It's typical in a systemd discussion that a link dump to broken stuff appears
Please help me understand this line of reasoning. Someone has a list of reasons as to why they believe systemd is not a good fit for them, and something else works better. They get told "prove it isn't a good fit for you". They prove this with some line of argumentation. They get told "that is your opinion only, and that doesn't count". They provide links to show many other users have the same issues, and they feel these issues have not been given due consideration. They get told "typical systemd haters, they always provide a link dump with broken stuff, it doesn't mean anything"
This has pretty much become the default template for the systemd pro/con arguments, and it is unwinnable. FWIW, my problem with systemd is that it broke the concept of free choice in my environment. It was harshly shoved down everyone's throats, through politics rather than the usual meritocratic methods. I appreciate that this was a commercially advantageous position for the various distro maintainers, which led me to cancel all my (thousands of) distro support agreements.
It is the political approach that was taken that gives rise to serious suspicions for me. If the system cannot stand on its own feet from a meritocratic perspective, and needs to resort to all kinds of political games to gain a substantive foothold, my default position is not one of trust.
The "la la la i-can't-hear-you" systemd fanboy troupe response whenever someone attempts to make a well-supported argument against systemd only fuels my distrust of the whole systemd story.
As I mentioned, I vote with my feet and wallet. I use Alpine wherever possible.
> It could just be that nobody cares about sysvinit anymore, so fewer bug reports happen.
Waving your hands and suggesting everything is going to have bugs isn't much of an argument. It's unlikely that sysvinit would suffer from any of these bugs, as it simply presents a much smaller attack surface than systemd, by design. You could knock me over with a feather if arbitrary network traffic could compromise your system via sysvinit. The other class of bugs are ones where sysvinit exhibits the defined behavior and systemd simply deviates in unexpected, unintuitive, and undocumented ways.
What would have been ideal is for a docker-compatible systemd to run inside the container.
IMHO the maintainers are bent on creating a competing standard (like this one) and don't want to build anything that brings the advantages of systemd to the Docker ecosystem.
Poettering writes:
> To say this explicitly: we are really interested in making sure that systemd runs out-of-the-box in containers, and docker is just one implementation of that.
3. Running systemd as pid1 in docker (privileged or not) works, so clearly the maintainers didn't do a good job of preventing it from happening. (just google, there's plenty of info out there on how to do this)
I'm sure there are many other examples too, but this is with a quick glance around.
I'd be interested in your reference before you spread such strange FUD.
I would recommend speaking to someone from the LXC team. I only know of their systemd pains second-hand, but there are many many many things that systemd has done throughout its history that made it a royal pain to run in a container. I also have my own battle-scars from systemd but they are mostly related to systemd on the host rather than inside a container.
Here's one piece of evidence though[1]. Effectively systemd enabled a security feature for the host's /dev/console and when we said it wasn't necessary for containers (because /dev/console is not a real console) and it actually broke runc, Lennart said that we should fix runc.
That discussion does not match your summary of it.
This wasn't a security feature that got enabled. This was an expectation of the long-standing semantics of /dev/console, which as M. Poettering pointed out long pre-date systemd, being broken by a container manager.
So yes, it's right that the container manager be fixed so that it is possible for every open file descriptor for /dev/console to be closed and then the device re-opened again. Such semantics have been around longer than Linux itself has; systemd is written to expect them. /dev/console is not supposed to magically vanish/become inoperable once all currently open file descriptors for it have been closed. Quite a lot of other softwares, including everything that uses openlog() with LOG_CONS, expect this of /dev/console too.
Indeed, /dev/console is one of the very few device files mandated to exist by the Single UNIX Specification (XBD part 10). The container manager was actually setting up an execution environment that is not POSIX conformant.
And I observe that indeed said container manager did get fixed.
GP was arguing that the idea that systemd would refuse to make a change that would benefit running inside containers was ridiculous and required evidence. I have provided that evidence -- it is an example of a change in systemd being rejected in favour of fixing it in the container runtime.
Of course we fixed it, and of course you can argue that Lennart was correct (I still think that having SAK protections in a container is nonsensical but that's all water under the bridge). I obviously agree that our /dev/console handling was incorrect. In our defense, /dev/console doesn't actually make much sense in a container since the purpose of /dev/console is to access the physical console not the current PTY -- so anything we put there would still be "wrong" from the standpoint of POSIX. But you can't just ignore /dev/console because then a bunch of programs don't work. I could also go on about how it was also a Go stdlib issue because we'd assumed io.Copy "did the right thing" but it turns out it really doesn't handle any form of interruptions properly. But I'm sure you're not interested in that discussion.
The point is that I agree it was fixed, and I agree that fixing it in the container runtime was overall correct. But that wasn't the point I was making -- it was that there have been examples where systemd has made a change that broke running inside a container and they were not willing to make concessions for container runtimes. Which is what GP was arguing about.
I do have plenty of other examples (cgroups are particularly fruitful for systemd bugs that won't die), but they aren't really related to running inside a container.
[ I'm not a hater of systemd, or Lennart. I actually really like having a declarative service manager. My frustration comes from having to deal with it when developing system tools that don't want to be tightly coupled with it. That's where systemd really starts to get ugly to deal with. ]
> M. Poettering
What does the 'M' stand for? His first name is Lennart.
> We had support for this in systemd in the initial versions, but since nobody was using and testing this, and the semantics were different from the system version in subtle ways we removed support for it.
> Also note that systemd requires kernel 3.7 as minimal version right now. You cannot run it on older kernels, and hence really old operating systems anyway.
> Sorry, but this is nothing we want or can support!
I admit that it is 2 years old, and if things have changed, that would be nice. But there is an explicit reply there, and I'm not spreading FUD. There's nothing I would love more than to replace all the supervisord stuff in Docker with systemd.
Your original comment was about running docker in systemd.
That bug doesn't appear to be closely related.
You wrote: "... it would be great if there was a way we could use systemd+journald in older OS to run applications".
That's what Lennart replied to. He didn't dismiss using it in containers as pid1, he dismissed running it on arbitrary older linuxes as pid != 1... those are completely different scenarios.
I do not find any support in that link for your original comment, so unless you have more evidence or can explain how I'm misunderstanding your bug, I still consider you to be spreading FUD.
Because I have explicitly mentioned this in the bug. Quote unquote:
> Docker is only making things even worse - the recommended way of using Docker is shifting gradually to be using poorly-suited PID 1 processes like supervisord.
Additionally, there are other places where I mention Docker as a use case. Granted that English is not my first language, but I do not think there was confusion about this being around Docker. In addition, the bug title is "Standalone version of systemd to act as a process control system", which was as generic as I could make it.
Running an arbitrary job-runner on older systems is an implicit use case... but by explicit clarification, I brought in the Docker use case as well. I also explicitly mentioned that we are prepared to use newer kernels within Docker if that is important.
P.S. A lot of us already run upgraded kernels because of Docker, and this is already acceptable.
I WANT to use systemd -- but I stand by my claim that this bug is sufficient illustration that there is no intent to support the Docker ecosystem. I do not wish to start a flamewar, but I categorically reject your claim that I am spreading FUD, let alone deliberately and maliciously.
You replied with those lines after the bug was already resolved, and they don't match with your primary bug report well. In fact, they contradict your primary report where you wrote "I do not believe we will upgrade older computers to newer OS" and "it would be great if there was a way we could use systemd+journald in older OS" and "I do understand that systemd needs to be pid 1, but I wonder if a subset of functionality can still be exposed when it is not running as pid 1".
Again, all of those things are entirely unrelated to it running as a pid1 in a container, and those parts of your original bug report are what Poettering responded to.
You're reading way too much into that one bug. In fact, you could already run systemd in docker before you posted that bug if you go back to my first link to a mailing list thread about the very subject from 2014.
Your main bug report made it sound like you were asking to run systemd as pid != 1, which is unrelated to running systemd in docker.
Your clarification here only convinces me that you hold a grudge against systemd and wish to interpret benign comments in a bad light because, no matter how many times I read that bug, I cannot understand your claim that the original posting and Poettering's rejection of it is in any way related to running systemd in docker.