Rethinking the D-Bus Message Bus

Animats · on Aug 23, 2017

We rather consider a bus a set of distinct peers with no global state.

If they've gone that far, they may as well implement QNX messaging, which is known to work well. QNX has an entire POSIX implementation based on QNX's messaging system, so it's known to work. Plus it does hard real time.

The basic primitives work like a subroutine call. There's MsgSend (send and wait for reply), MsgReceive (wait for a request), and MsgReply (reply to a request). There's also MsgSendPulse (send a message, no reply, no wait) but it's seldom used. Messages are just arrays of bytes; the messaging system has no interest in content. Receivers can tell the process ID of the sender, so they can do security checks. All I/O is done through this mechanism; when you call "write()", the library does a MsgSend.

Services can give their endpoint a pathname, so callers can find them.

The call/reply approach makes the hard cases work right. If the receiver isn't there or has exited, the sender gets an error return. There's a timeout mechanism for sending; in QNX, anything that blocks can have a timeout. If a sender exits while waiting for a reply, that doesn't hurt the receiver. So the "cancellation" problem is solved. If you wan to do something else in a process while waiting for a reply, you can use more threads in the sender. On the receive side, you can have multiple threads taking requests via MsgReceive, handling the requests, and replying via MsgReply, so the system scales.

CPU scheduling is integrated with messaging. On a MsgSend, CPU control is usually transferred from sender to receiver immediately, without a pass through the scheduler. The sending thread blocks and the receiving thread unblocks.

With unidirectional messaging (Mach, etc.) and async systems, it's usually necessary to build some protocol on top of messaging to handle errors. It's easy to get stall situations. ("He didn't call back! He said he'd call back! He promised he'd call back!") There's also a scheduling problem - A sends to B but doesn't block, B unblocks, A waits on a pipe/queue for B and blocks, B sends to A and doesn't block, A unblocks. This usually results in several trips through the scheduler and bad scheduling behavior when there's heavy traffic.

There's years (decades, even) of success behind QNX messaging, yet people keep re-inventing the wheel and coming up with inferior designs.

tomegun · on Aug 23, 2017

Please note that this is still an implementation of the D-Bus specification, but trying to adhere to the principle of distinct peers. As is explained, this is not entirely possible when implementing D-Bus, so it is nothing more than a guiding principle.

AceJohnny2 · on Aug 23, 2017

So, SIMPL?

Synchronous Interprocess Messaging Project for LINUX (SIMPL) is a free and open-source project that allows QNX-style synchronous message passing by adding a Linux library using user space techniques like shared memory and Unix pipes[3] to implement SendMssg/ReceiveMssg/ReplyMssg inter-process messaging mechanisms.

https://en.wikipedia.org/wiki/SIMPL

http://icanprogram.com/simpl/

Animats · on Aug 23, 2017

If you do it via pipes, the performance will be terrible. If you do it via shared memory, there's a good chance that one side crashing will take down the other side.

QNX itself implements pipes via messaging.

AceJohnny2 · on Aug 23, 2017

This project implements QNX messaging as a kernel module:

https://github.com/martinhaefner/qnxcomm

Can't immediately find any more information about it though, so don't know the maturity.

CyberDildonics · on Aug 24, 2017

> If you do it via shared memory, there's a good chance that one side crashing will take down the other side.

Having done a fair amount of IPC through shared memory, you'll have to explain this one. One process crashing doesn't destroy a memory mapped file on Linux, OS X, or Windows.

transpute · on Aug 23, 2017

Since QNX is proprietary, QNX messaging is likely patent encumbered.

What's a good open alternative, or starting point for new innovation, given current investments in microservice architecture?

  - ZeroMQ (and nanomsg)
  - gRPC (Google)
  - Apache Thrift (Facebook)
  - Finagle (Twitter)
  - L4/seL4 IPC

sbierwagen · on Aug 23, 2017

Remember, patents don't last forever. The first release of QNX was 35 years ago. Patents only last 20 years.

sametmax · on Aug 23, 2017

The wamp protocol is interesting imo. It's independant a serialisation and transport despite the fact it uses websocket + json by default. It does routed rpc and pub/sub out of the box. But no low level router yet.

marqis · on Aug 23, 2017

Check out crossbar.io, it's a broker from the authors of wamp.

AceJohnny2 · on Aug 23, 2017

> Since QNX is proprietary, QNX messaging is likely patent encumbered.

Not necessarily. QNX messaging is old enough that patents related to the interface may have expired.

transpute · on Aug 23, 2017

Here is one from 2005: https://www.google.com/patents/US20060182137

"An asynchronous message passing mechanism that allows for multiple messages to be batched for delivery between processes, while allowing for full memory protection during data transfers and a lockless mechanism for speeding up queue operation and queuing and delivering messages simultaneously."

Maybe there are older patents that have expired?

copperred · on Aug 24, 2017

Agree. As a QNX user and later QNX employee, I really missed the messaging system when I had to go back to Linux. In fact, I spent time writing a QNX-like messaging library now used extensively in our application (at my current job). Although not optimal from a performance perspective, most of the messaging we have is small packets and infrequent sends. I feel that it has really accelerated our app development, and forced a similar design pattern for each thread or "service" within the application. The same result could be had with a library such as Nanomsg.

oconnor663 · on Aug 23, 2017

> Receivers can tell the process ID of the sender, so they can do security checks.

Does this run into security corner cases around pid reuse? (Or race conditions like a process dropping it's privileges after sending?) I remember the kdbus authors talking about making a lot of security metadata attach to the message itself, rather than indirecting through a pid, maybe for these reasons?

teeuwen · on Aug 24, 2017

> CPU scheduling is integrated with messaging.

So QNX messaging is implemented in kernel space?

I never really understood why kdbus was rejected from Linux, it seems to only have advantages compared to a user space message bus. The only disadvantage I can come up with is security.

sethrin · on Aug 24, 2017

Well, security is a pretty big concern, to be fair. The rejection of kdbus was probably political as much as anything else. However, I think that to the degree that there was any kind of consensus, it was not necessarily a rejection of the idea of a kernel-based message bus entirely, but that this particular effort was not up to the mark. This seems to be a fairly relevant bit on LKML:

https://lkml.org/lkml/2015/6/23/22

twoodfin · on Aug 24, 2017

Receivers can tell the process ID of the sender, so they can do security checks.

How do they implement this securely? I can't immediately think of a POSIX-y way for Process A to prove its pid to Process B without involving the kernel.

wmf · on Aug 24, 2017

Messaging should be implemented by the kernel.

CyberDildonics · on Aug 24, 2017

"All problems of scale become a network problem." Sean Parent

jnwatson · on Aug 24, 2017

As QNX has a microkernel, literally the entire OS, kernel and user space, is designed to support this.

eschaton · on Aug 23, 2017

This is basically Mach.

Animats · on Aug 23, 2017

No, Mach used unidirectional messaging. There was an RPC system, but it came with some formatting and marshalling stuff. Not sure about MacOS. Apple has an explanation of the five or so IPC mechanisms they ended up with.[1]

[1] https://developer.apple.com/library/content/documentation/Da...

onli · on Aug 23, 2017

That sounds reasonable. I'm very surprised. Disabling remote targets, ignoring SELinux, focusing on reliability.

DBus is the one part of the modern linux desktop I would like to/have to install to get the applications I want running, even though I dislike it a lot (pulseaudio and systemd one can just not install). One example is the password remember function of steam. Having a more reasonable implementation could help with this a lot.

tomegun · on Aug 23, 2017

For the record: dbus-broker has full SELinux support.

onli · on Aug 23, 2017

Well, I don't mind either way. https://github.com/bus1/dbus-broker/wiki#using-dbus-broker says it has to be disabled.

tomegun · on Aug 23, 2017

Indeed, thanks for the pointer! Removed that now (it was left-over from before we got SELinux support).

pinpeliponni · on Aug 23, 2017

Hmm. If it doesn't have SELinux policy that actually does something, it probably shouldn't run as a daemon. Let's see how I can remove it... :(

tomegun · on Aug 23, 2017

Not sure what you mean here. dbus-broker(1) supports SELinux exactly to the same extent as dbus-daemon(1) does. Also; remove what?

atemerev · on Aug 23, 2017

Dbus is bloated hell. Whoever came with the idea "let's cram all communications from all sources into the single unified data stream, and let the clients fish what they need out of it" had the strange mapping of mental processes, to say the least. Most other forms of IPC are better (more scalable, more elegant, more comprehensible) — "everything is a file" is better, actor model is better, and I nearly think that even plain shared memory is better than a common bus.

There is a reason there is no "shared bus" in Internet communications.

hp · on Aug 23, 2017

> There is a reason there is no "shared bus" in Internet communications.

Yes, but dbus isn't _for_ Internet communications. It was designed to wire together the multiple processes that act more-or-less as a whole to implement a desktop environment.

"Better" is contextual. The main problems dbus solves aren't "IPC" at all - they are things like lifecycle tracking, service discovery, and getting events across the system/user-session security boundary.

dbus-broker looks interesting!

coldtea · on Aug 23, 2017

>Dbus is bloated hell. Whoever came with the idea "let's cram all communications from all sources into the single unified data stream, and let the clients fish what they need out of it" had the strange mapping of mental processes, to say the least.

Yeah, god forbid anyone attempts to unify similar concerns and do away with the mess of ad-hoc solutions that is POSIX/Linux.

vidarh · on Aug 24, 2017

The concern is more that they seem to have ignored literally decades of prior art when designing it, and in doing so in an overly complicated way have added to the mess rather than reduce it.

atemerev · on Aug 26, 2017

You have just described every project initiated by Pottering ever.

pjc50 · on Aug 23, 2017

Well, there's multicast, but nobody uses that. Some things use link-local broadcast.

Personally I think the Windows messaging system would actually be a pretty good model to follow, especially if you could give it an actual payload and not just two words. It would certainly solve the actual problems DBus was built to address - media change notifications and things like that.

fusiongyro · on Aug 23, 2017

We actually rely on multicast here at the NRAO for monitor and control of the VLA. I admit it's the only place I've heard of it being used.

ZeroMQ is getting used more for those kinds of purposes; the Greenbank Telescope uses it for one of their instrument backends and we are now using it for VLITE and REALfast. The new archive system I'm helping build uses AMQP.

jpm_sd · on Aug 23, 2017

I have also seen ZeroMQ used for mobile robot control; it was considered for ROS 2.0 [1] before they settled on DDS [2]

[1] http://design.ros2.org/articles/ros_with_zeromq.html [2] http://design.ros2.org/articles/ros_on_dds.html

jcurbo · on Aug 23, 2017

This is fascinating (as an amateur astronomer), have you written anything else on the systems design of these types of instruments, or have any good pointers? I imagine for large instruments there is a massive amount of data involved and lots of stuff to control. I know it's bad enough with just my one telescope setup :)

fusiongyro · on Aug 23, 2017

There is quite a bit of technical documentation about our systems, and the code is supposed to be open-source (though a lot of it isn't publicly accessible as it should be). But I haven't written anything meant to be a high-level overview of how it works. I think I will try and do that in a few days and send you an email.

I have been joking about how we should build a "total sh*t array" out of old Dish/DirecTV antennas, so that we could explore the systems design without worrying too much about whether anything could be done with the data collected. This hasn't interested my coworkers that much :) There is an amateur radio astronomy society, and there are plans for how to build various levels of radio telescope, starting from ~$50 and an old Dish receiver and going up. And our open-skies policy means that you do not have to be a professional astronomer to use our instruments, although I only know of one or two amateurs that have proposed for time. (They did get it, though).

As a rough back-of-the-hand deal, we allow anybody who gets time to access about 25 MB/s of data. The correlator we have currently (WIDAR) can certainly output much more than that, up to gigabytes per second, but the rest of the infrastructure certainly can't keep up at that rate sustained. It's not unusual for an observation to top a few TB in size. ALMA data files are probably even larger on average.

We are already in the early design stage for a next-generation VLA, which will increase the number of antennas to about 300. At that point, we probably won't be able to keep correlated but unprocessed raw data, just because of the sheer size of it.

jcurbo · on Aug 24, 2017

Thanks, this is interesting! I work at JHU/APL where there are people that do similar stuff, although I don't personally. The APL Astronomy Club is actually in the midst of trying to get some of our unused scopes & equipment in better shape and possibly put to use so I'm thinking about ways to automate things. Plus I'm not happy with the state of data processing for amateurs, it seems like everything is bespoke and old freeware or expensive software packages. That is what got me thinking about how the pros do things. Looking forward to your email!

benjaminjackman · on Aug 23, 2017

Multicast is used pretty heavily in the finance space for dissemination of market data.

jnordwick · on Aug 23, 2017

Intranet only though and usually with only a couple hops away within the same colo'ed building.

bitwize · on Aug 23, 2017

Read the actual rationale behind dbus by the author before you make such comments on his sanity:

https://news.ycombinator.com/item?id=8648995

Dbus solves problems that the IPC methods you discussed do not. If there were a better solution, it would probably have been adopted by now.

atemerev · on Aug 23, 2017

The actual reasoning was "we need to get something to make desktop guys happy. Something is better than nothing. Dbus is something". And it is a valid reasoning, right. However, it left some legacy and can now be rethought.

FooBarWidget · on Aug 23, 2017

You make it sound like D-Bus is a first attempt and even a hastily put together attempt to solve its target problem. It's not.[1] D-Bus is heavily influenced by (and intended to replace) DCOP, the communication system used by KDE 1 and KDE 2. DCOP was widely lauded as an extremely well-designed system.

[1] https://en.wikipedia.org/wiki/D-Bus#History_and_adoption

hp · on Aug 23, 2017

yep. as noted in https://news.ycombinator.com/item?id=8648995 it wasn't even just dcop - KDE had tried CORBA before switching to DCOP, and GNOME of course tried CORBA (two different ORBs), then tried Bonobo-on-top-of-CORBA, and SOAP. There were tons of documented protocols on top of X11 (many still in use today). And that's ignoring the countless ad-hoc solutions that various apps used...

Linux desktops are implemented as process swarms and communication among processes is one of the central things they have to deal with.

atemerev · on Aug 23, 2017

CORBA and SOAP are things that must not be named... The horror is too real. Of course, Dbus is better than CORBA and SOAP, almost anything is.

How old is CORBA, again? And how crazy is SOAP?

hp · on Aug 24, 2017

It isn't better than those categorically. But it's definitely better than them for building a multi-process desktop.

"better" depends on what you are trying to do.

Of course CORBA and SOAP are considered old and horrible now, 15 years later. But currently-popular stuff is in many ways equally unsuited to coordination of local desktop processes, because it's not designed for that.

cjhanks · on Aug 23, 2017

It largely depends on the use case. dbus is functionally just a transport layer - it can very easily be used to implement an actor model if you choose to do-so (services can dynamically register and unregister channels).

There are quite a few cases where reliable 1-to-many and many-to-many communications need to occur. This is particularly the case when you have many loosely affiliated independent applications with optional communication paths. d-bus, for all of its flaws... does that well enough that I rarely notice it's running on my system.

arca_vorago · on Aug 23, 2017

"Linux Only"

I like this approach more and more these days. For example, I run murmur(mumble) servers sometimes, and they deprecated d-bus support for ZeroC ICE (gplv2 or proprietary), but it seems almost as bloated if not more so. The reasoning was mostly around the portability bindings...

Recently though, I have been refusing to support Windows and OSX as a concious decision. One thing I've found is that the constant want/need to target every platform adds an ever-increasing amount of complexity, which really seems to go against the unix philosophy. So I applaud others willing to buck the trend and narrow scope down.

In the end, I think the main problem with the many eyes theory is that code has gotten so complex that there simply aren't enough eyes, and therefore I think the future of software is going to be in reduction of complexity. For example, loc isn't the best measure, but the Minix 3 kernel is at ~20kloc, while the Linux kernel is now at, what ~11mloc!? Not even redhat can audit that shit properly. (another reason we need a Hurd microkernel, but I digress)

microcolonel · on Aug 23, 2017

> the Minix 3 kernel is at ~20kloc, while the Linux kernel is now at, what ~11mloc!? Not even redhat can audit that shit properly. (another reason we need a Hurd microkernel, but I digress)

Well, I don't think Hurd is going anywhere. They missed a crucial opportunity to move from Mach to L4 and they simply didn't have the manpower. What we might focus on is migrating facilities (drivers, core services like TCP, system services) from Linux, OpenBSD, Illumos, and Minix3(especially that daemon that can restart things even when the filesystem daemon goes kaput) to a well-designed L4 like seL4. At that point, at least we have some hope of taming the beast.

The great opportunity here is that you don't need to care too much which license each driver or service is under, since they're all running in user space. You can have your (yuck) CDDL processes, where you keep your OpenZFS instance. You can have your GPLv2+ processes, where you keep your (maybe a bit dirty, but at least they exist!) Linux drivers.

Also, the major difference in line counts is precisely because of the number of facilities offered by the Linux kernel (most of which you can disable, or would never be enabled in the first place!). Minix3 in its "equivalent" form (containing sufficient drivers for the machine running standard daemons) vs Linux with the same subset of drivers and services would be a much fairer comparison.

chris_wot · on Aug 24, 2017

> For example, loc isn't the best measure, but the Minix 3 kernel is at ~20kloc, while the Linux kernel is now at, what ~11mloc!? Not even redhat can audit that shit properly. (another reason we need a Hurd microkernel, but I digress)

It's not like that's 11mloc in one monolithic system. The Linux kernel has a variety of different subsystems, and is maintained by a lot of people. Each subsystem is auditable, so I don't see you have a valid objection here.

rleigh · on Aug 23, 2017

Ice is doing a lot more stuff than dbus, it's mainly features rather than bloat.

arca_vorago · on Aug 23, 2017

I def see a lot more features that actually work in ice, that's for sure.

zokier · on Aug 23, 2017

Just noticed that this lives under bus1 github organization; does that imply that eventually it will be using bus1?

Btw, whats happening at bus1, haven't heard about it lately?

tomegun · on Aug 23, 2017

> Just noticed that this lives under bus1 github organization; does that imply that eventually it will be using bus1?

That is something we intend to explore. The idea would be to let bus1 be used under the hood by dbus libraries to do peer-to-peer communication where possible (circumventing the broker) but still stay compatible to the D-Bus semantics.

> Btw, whats happening at bus1, haven't heard about it lately?

We spent half a year working on dbus-broker ;)

ploxiln · on Aug 23, 2017

When kdbus was first proposed for merging, Linus basically said something like "dbus is way more inefficient than just kernel api limitations would require, fix that first before proposing a new kernel feature". So that seems to be what they're looking into now.

ref: https://lkml.org/lkml/2015/6/23/657

baybal2 · on Aug 23, 2017

As I remember from more than a decade ago, the selling point of DBUS was that they were not trying to design a high performance message bus with sophisticated work mechanisms in spirit of Corba and Bonobo, but a small, flexible, and utilitarian one.

Things like implicit message buffering were deliberate design decisions.

ajross · on Aug 23, 2017

IMHO the problem with D-Bus was that it was never small and utilitarian. They decided (correctly) to ignore all the engineering effort involved in performance and scalability, and put all that overengineering into the API instead.

D-Bus code is basically unreadable, as not only are the bus names heavily scoped (java-style) to avoid collisions, but also the interface and method names. A tiny python (or whatever) script to invoke a single method on a well-known object should be a one-liner but in practice lives over 6-7 lines just due to verbosity.

D-Bus types are inexplicably low-level for a "utilitarian" IPC mechanism, leading to a bunch of type conversion to do simple things, and a ton of marshalling code in the core. Javascript has shown us how far you can get with just IEEE doubles and UTF-8 strings, yet D-Bus suffers with a type model that looks more like C.

fiddlerwoaroof · on Aug 23, 2017

Yeah, I used to have a whole bunch of shell scripts to automate KDE 3 apps via dcop and then when dcop was dropped in favor of dbus, the complexity of the latter system discouraged me from porting the scripts.

Whatever technical limitations dcop may have had, its command line was amazing: space separated words and a emphasis on discoverability made it a joy to use

simcop2387 · on Aug 23, 2017

qdbus is an almost adequate replacement for that. It's still more verbose and a bit more difficult to pass some arguments (this is all from memory) than dcop was, but it's servicable.

lima · on Aug 23, 2017

pydbus nicely abstracts away much of that: https://github.com/LEW21/pydbus

hp · on Aug 23, 2017

right, nobody was ever intended to use the raw protocol details directly... those details were intended to support an API that looked like using in-process objects (well, perhaps in-process objects with methods declared as 'async').

vidarh · on Aug 23, 2017

I wish they'd just build something simple, like AREXX. There's a reason AREXX ports were in almost every AmigaOS app: It was trivially simple to get started. The network effects were huge - pretty much any app was simple to automate. People to some extent built their apps as message pumps where input events (mouse, keyboard) triggered the same command hooks as AREXX messages, so every little piece of the app was scriptable.

If you wanted a more advanced API than AREXX could reasonably accommodate, it was easy enough to layer the more complex bits next to it.

The threshold for people to take full advantage of DBus is still too high. Maybe there's a need for something that complex for inter-application communication, but if so we'd also benefit from something simpler.

Maybe it's just a documentation failure... I don't know.

revelation · on Aug 23, 2017

Ahh yes, we know this all too well, the Linux desktop trap:

iterative work is lame, the old solution is so bad it's not even wrong, here is my idea for a rewrite, look it's even still compatible (for another few minutes).

asveikau · on Aug 23, 2017

I disagree with this characterization, because they explicitly state that compatibility with the original is a goal. When people have the more careless "re-write and throw-away" attitude, they often abandon the old API too.

So this reads more of a strategic re-write, or re-do of implementation while keeping API, which I think is often a smart way to do it.

jzl · on Aug 23, 2017

aka the "CADT" model. :)

https://www.jwz.org/doc/cadt.html

But, not saying that characterization applies here specifically. The article was quite well-reasoned in explaining the proposed changes, as far as I could tell. Disclaimer: I barely know anything about D-Bus.

chme · on Aug 23, 2017

So is the dbus-broker the latest project from the kdbus/bus1 guys.

Since from the text dbus-broker does not use the bus1 kernel module, does that mean the bus1 project is dead?

tomegun · on Aug 23, 2017

bus1 is very much not dead. We intend to work on the next RFC soon.

oconnor663 · on Aug 23, 2017

If bus1 magically landed tomorrow, what would that mean for dbus-broker? Are the projects related at all, or mostly doing different things?

tomegun · on Aug 23, 2017

At the moment dbus-broker does not have code to take advantage of bus1, but we intend to explore adding bus1 support to dbus-broker, so that peers (if their libraries support it), would seamlessly communicate peer-to-peer (circumventing the broker) when possible.

chme · on Aug 23, 2017

Great to hear! Keep up the good work Tom and friends!

throw7 · on Aug 23, 2017

are you still required to reboot the system if you upgrade "dbus-broker"?

tomegun · on Aug 23, 2017

Yes, for the time being we do not support reexecution.

JdeBP · on Aug 24, 2017

One thing that I wonder about this is how it deals with the D-Bus Death Rattle.

* https://jdebp.eu/FGA/dbus-death-rattle.html

tomegun · on Aug 24, 2017

This is an application issue, not a brokre/daemon issue. The broker will (as dbus-daemon(1)) does, deliver all signals that clients subscribe to. If they subscribe to things they don't care about, that is something that should be fixed in the clients. During the kdbus times, quite some time was spent on fixing clients to avoid too broad subscriptions exactly to fix the issue described in that blog post.

JdeBP · on Aug 24, 2017

Not every page on the WWW is a blog post, nor even every WWW site a web log to have blog posts on.

tomegun · on Aug 25, 2017

My apologies, it read like a blog. Seeing the parent page I see that it is not. Either way, my comment stands.

j_s · on Aug 23, 2017

I'm sure systemd would be happy to take over responsibility for this functionality. (Sorry, couldn't resist!)

JdeBP · on Aug 26, 2017

Actually, it has been given the responsibility.

A quick review of the code reveals that dbus-broker-launch relies upon systemd entirely for bus-activation. To activate a dbus server on demand, it sends a message to systemd using a systemd-specific protocol. It has no way to demand-activate services on a non-systemd operating system.

The dbus-daemon that this purports to be compatible with at least can be persuaded, via its launch helper, to demand-activate services in a generic fashion using whichever of initctl, systemctl, service, or system-control is appropriate.

* http://jdebp.eu./Softwares/nosh/avoid-dbus-bus-activation.ht...

* http://jdebp.eu./Softwares/nosh/guide/per-user-dbus-demand-s...

Rjevski · on Aug 23, 2017

If systemd does it right (just like it did with making service management) then I see no issues with this.

work_account · on Aug 23, 2017

The problem with systemd usurping basic Linux functionality is that it makes it really difficult for non-systemd distros to keep up.

It's not like you can run 'systemd-udevd' standalone, for example. Instead there are massive "porting" efforts like eudev and elogind, just to extract the functionality BACK from systemd. And then you have obsolete-but-necessary components such as ConsoleKit and PolicyKit that are stuck on ancient pre-systemd versions with no current replacement.

I started using systemd back before they even took over "udev". Back then systemd was a breath of fresh air. Now I'm using a different service manager and observing systemd gobbling up various critical parts of the Linux desktop like some damn Katamari is like watching a train accident in slow motion.

kuschku · on Aug 23, 2017

Have you ever considered why it is so much effort to extract this functionality?

systemd can iterate quicker, and work faster and better, because they can share more code between projects.

Code that should have been in the stdlib, provided by the distro, but which no one does. So it ends up in systemd.

You see the issue even in GNU yes, which implements its own version of a buffered output, or in cat, which does the same, but slightly different.

All these things should be in the stdlib, and because they’re not, those projects that can use premade solutions iterate a lot quicker, and can get better, faster.

CyberDildonics · on Aug 24, 2017

If what they are doing is so fundamental, why isn't it more modular?

TazeTSchnitzel · on Aug 23, 2017

When has that ever happened?

Rjevski · on Aug 23, 2017

Since systemd became mainstream? If only you've seen the time, money and sanity I saved when I switched to systemd instead of suffering with prehistoric initscripts.

_lqaf · on Aug 23, 2017

You mean, when they started creating bizarre arbitrary username rules for handing out root(!), breaking DNS resolution, time sync, and so on?

Perhaps it is best considered a time/money/sanity redistribution scheme, because I've certainly spent plenty on the above.

JoshTriplett · on Aug 23, 2017

The bugs generate noise and make the news, the successes rarely do.

(And what's wrong with time sync, exactly? It seems to work perfectly with zero configuration required, for a large number of people, myself included.)

_lqaf · on Aug 23, 2017

The question above was, "when has Systemd done it right?", followed by an assertion that they've been doing it right "since systemd became mainstream." I offered several examples, such as the '0day' username thing that was not only an example of a bug, but an example of very clearly Doing It Wrong on a design level.

I'd suggest you search for timesyncd issues on Google, but you preemptively announced bugs don't matter to you and declared "works for me!", so I don't know why you'd ask. So perhaps just stop and consider for a moment why, exactly, it is that your init system is expanding to replicate existing, functional, standards-compliant userspace daemons with limited, buggy, noncompliant "replacements".

JoshTriplett · on Aug 23, 2017

> The question above was, "when has Systemd done it right?", followed by an assertion that they've been doing it right "since systemd became mainstream."

Yes, they've been doing a great many things right. That doesn't make them bug-free by any means, nor does it mean that every single thing they've done is right; it does mean they built something incredibly useful and working for a large number of people. People don't seem to talk about those as often; outrage carries so much louder.

> I offered several examples, such as the '0day' username thing that was not only an example of a bug, but an example of very clearly Doing It Wrong on a design level.

Yes, and that was broken. And it has since been fixed, but that didn't get nearly as widely reported. systemd now checks for that issue and reports it rather than running the unit in question.

(You could argue about its parsing of such fields, and that discussion is ongoing, but that's separate from the issue of running the unit as root.)

> I'd suggest you search for timesyncd issues on Google, but you preemptively announced bugs don't matter to you and declared "works for me!"

No, I asked the question of what you considered problematic about timesyncd, especially since you seemed to be talking about what you considered fundamental design issues. I keep a close eye on the large community of Debian folks running systemd, and read the bugs reported, and I had not seen anything notable related to timesyncd, especially not anything that would suggest a design issue.

I never said bugs don't matter to me, nor was I attempting to generalize my own experiences to suggest that it must necessarily work for everyone. You seem to be actively seeking out and assuming hostility where none exists; I would not be surprised if you find it, but I'm not looking to supply any.

_lqaf · on Aug 23, 2017

You appear to want to have a particular discussion that I'm only marginally interested in, and I don't really have much time today, so I'll have to curtail this. I apologize for that.

Sounds like we're agreeing, sort of. A nasty bug was caused by an inexplicably weird design decision, demonstrating that they haven't been "doing it right" since it became mainstream. That is what I was responding to.

Backing into my interest in the discussion: it is that inexplicably weird design decision, made far worse by the authors' repeated habit of reflexively trying to make their problem someone else's, that explains why I simply don't trust them or their code. This is a pattern that has been repeated over several several iterations across years - the first one I was aware of was when they were crashing the kernel in debug mode, leading to the famous "fuck systemd" patch-showdown. The repeated replay of that pattern shows they haven't learned anything. That combination of arrogance and incompetence is annoying in college grad-hotshots, but they can be kept in line until they grow up; bluntly, it has no place in my systems, and makes me wonder why RHEL wants to burn trust and goodwill like this.

Moving on: briefly, timesyncd is sntp, not ntp, is client-only, doesn't track jitter, only jumps forward, and makes other mistakes. I've seen reports of the sntp implementation being wrong, or perhaps simply having interop problems, but haven't bothered to look because I don't use it. Those things probably don't matter for a gaming client or such, but it simply isn't an ntpd replacement.

And it still leaves the question of why your init system, produced by an erstwhile-enterprise vendor, is replacing unrelated daemons with reimplementations that look more like college-assignment toys than production software.

JoshTriplett · on Aug 23, 2017

> A nasty bug was caused by an inexplicably weird design decision, demonstrating that they haven't been "doing it right" since it became mainstream.

As I understand it, two separate design decisions interacted there. One was "some fields should be ignored if not supported or if they use new syntax that isn't supported, so that a unit written for new systemd won't break on old systemd"; if they'd done that differently, it'd have generated many problems as people wrote units for the latest bleeding-edge version. The other was "parse and validate usernames and see if they look sane"; the ideal solution there would be "check if they exist and do no other validation if they do", but NSS turns out to not be viable for that in the context of an init system. Usernames should never have been an "ignore if not supported" field, but it's at least understandable how the issue could occur.

If your standard for "doing it right" is "every single thing they do is always correct", very little software will meet that standard.

> Moving on: briefly, timesyncd is sntp, not ntp, is client-only, doesn't track jitter, only jumps forward, and makes other mistakes. I've seen reports of the sntp implementation being wrong, or perhaps simply having interop problems, but haven't bothered to look because I don't use it. Those things probably don't matter for a gaming client or such, but it simply isn't an ntpd replacement.

timesyncd doesn't claim to be a replacement for all of ntpd; it claims to be a simple implementation of the common case of "I want my time to be correct". A client-only SNTP implementation is what they set out to build.

rndgermandude · on Aug 23, 2017

It's understandable that they made that mistake, it's not understandable that Lennart kept insisting it's not a bug until it blew up in the news. Remember that he actually got cranky that somebody assigned CVEs to the issue...

I agree with _jal that this kind of arrogant, dismissive behavior and repeat behavior doesn't instill trust and does a huge disservice to the project's reputation.

JoshTriplett · on Aug 23, 2017

I do agree that the messaging and tone was painfully bad, yes.

I can sympathize with people who spend all day dealing with unwarranted rants and flames letting some of that leak out into their responses to everything, but yes, that should have been handled better.

mixedCase · on Aug 23, 2017

> prehistoric initscripts

That's the only thing I've seen people praise systemd for, and I happen to agree.

Nothing else systemd does do I think it does better.

If systemd had remained an init system and nothing else, it would've been a clear improvement worth breaking what it does. What systemd is today should be called the Fedora userspace suite, not a Linux init system.

bkor · on Aug 23, 2017

systemd is far more than an init system. I saw somewhere the description of the "basic building block of Linux".

IMO a well working together basic building block is great. The huge amount of pointless differences needed to be reduced ages ago.

koffiezet · on Aug 24, 2017

> IMO a well working together basic building block is great. The huge amount of pointless differences needed to be reduced ages ago.

Indeed, who needs competition, the bazaar philosophy got us nowhere right? /s

atemerev · on Aug 23, 2017

There were many other modern init replacements which did the job better than init scripts and better than systemd.

I like my systems elegant, transparent and bloat-free. Systemd is none of that.

acdha · on Aug 23, 2017

… and yet, oddly, most of the distribution maintainers who reviewed them picked systemd. That doesn't mean it's perfect but it does suggest this conversation would be far more productive if you reviewed the historical discussions to see the reasoning behind those decisions.

atemerev · on Aug 23, 2017

The reasoning was hammering it down to everybody's throats by Redhat. It is easier to adapt to proposed solution (even inferior) where there are some guarantees of support and traction, than to reinvent everything and be left as sole maintainers.

It doesn't relate in any way to the quality of the solution.

JoshTriplett · on Aug 23, 2017

You are giving far too little credit and good faith to the distribution maintainers who spent the better part of a year carefully evaluating the available options. I watched those discussions take place, and contributed to many of them. They included detailed evaluations of the service management capabilities, verbosity, transparency, debuggability, maintainability, tooling, backward compatibility, any numerous other factors.

That you do not like the outcome does not mean the process was inherently flawed or incomplete.

atemerev · on Aug 23, 2017

I certainly do not blame distribution maintainers. I blame Redhat, with their not-so-subtle hints of "we might not have any resources to support anything that comes from us for non-systemd systems". Which at some point extended to GNOME and other vital sine qua nons.

JoshTriplett · on Aug 23, 2017

> not-so-subtle hints of "we might not have any resources to support anything that comes from us for non-systemd systems".

That honestly doesn't seem unreasonable to me. If you build a tool to make things easier to maintain, you lose much of the benefit of that if you still have to support other alternatives where you have to do everything manually. (For instance, maintaining a 100-line init script in addition to a 10-line unit file.) Asking people who care about that to do the work to maintain it seems perfectly reasonable.

> Which at some point extended to GNOME and other vital sine qua nons.

There are far more people complaining about the lack of alternatives, and far fewer people willing to actually write and maintain alternatives. It doesn't help that many of the people complaining take the attitude of "you don't really need that anyway".

Xylakant · on Aug 23, 2017

AFAIR the gnome people were desperately looking for someone who'd maintain console kit and the systemd people stepped up and provided a working alternative, so gnome switched. After the switch, systemd basically was a hard dependency for every system using a recent gnome version. If the time spent complaining about systemd were spent towards supporting alternatives, things might look different today. It doesn't require coding an alternative, but organizing support, funding a patreon etc. would go a long way.

JoshTriplett · on Aug 23, 2017

It's also worth noting that you can still, today, run systemd-logind (and several other systemd components) with the compatibility layer of systemd-shim on top of another init system. But even that doesn't have enough people willing to maintain it.

atemerev · on Aug 23, 2017

People who are unhappy with systemd are usually so for reasons other than compatibility.

Twirrim · on Aug 23, 2017

I really don't understand Redhat's position here.

RHEL5 came with one init system. RHEL6 came with another. RHEL7 comes along with yet another init system replacement.

Each of which has required software vendors who build software to run on those platforms to do non-trivial porting work. I know it's annoying the software vendors no end.

I've heard from so many end users that they can't upgrade to RHEL7 (or derivatives) because the software they need to support doesn't work with systemd yet. Stuff that makes them frustrated with both RedHat and the software vendor. Annoying your customers hardly seems the sanest business practices.

Luckily with Debian being on board, and thus Ubuntu, at least there's some incentive for vendors to work at it.

acdha · on Aug 24, 2017

They should be frustrated with the vendor if an upgrade is being held up by the trivial Upstart to systemd conversion for a dozen lines of code. Supporting both init systems combined is an order of magnitude less work than SysV init, and it allows you to rip out a good bit of darmonization code which tends to have details less portable across Unix variants.

It's far more likely that the real cause is one of the many major dependency updates (6 to 7 is like a half decade jump) and the systemd mention is either an excuse or axe-grinding.

_lqaf · on Aug 23, 2017

There seems to be an effort to rewrite history to "well, everyone adopted it, so it must be good, right?" That is very much not the way it happened. Go read the discussions on the Debian lists, for instance - RHEL was indeed throwing its weight around.

Add to that the games they're playing with interfaces, and at some point, it starts smelling like a miniature, farcical version of Microsoft in the 90s.

aseipp · on Aug 23, 2017

I don't even know what planet you're living on. In particular, Russ Allbery of the Debian technical committee gave some of the most detailed evaluation of systemd vs alternatives that I've seen from anyone from a user and maintainer POV. (He noticeably said, beforehand, he didn't think it would be a huge deal at first, but it turned out it was a big difference.) Many people did not like their decision, but suggesting that the Debian committee in particular didn't take the issue seriously and were strong-armed by Red Hat -- makes me think you have no idea what you're talking about, honestly.

I'd like to read this part about RHEL throwing their weight around on the Debian lists. Given your implication is that Debian was clearly "forced" to adopting it, I imagine the relevant evidence shouldn't be hard to find.

(Alternatively, you could actually ask other Debian maintainers yourself, like Josh, in this thread how it went. But you already did that and it didn't seem his narrative aligned with yours, so...)

JdeBP · on Aug 24, 2017

Everyone seems very focussed upon Debian in this thread, even though the thread started out discussing all distributions.

It might be worth everyone's while remembering that Debian is not the be-all and end-all here, even though it did have a massive hoo-hah. The processes in other distributions were markedly different.

* https://news.ycombinator.com/item?id=11834348

* http://jdebp.eu./FGA/debian-systemd-packaging-hoo-hah.html

pas · on Aug 26, 2017

The first link doesn't seem to support redhat strongarming.

Arch rc maintainer decided to drop rc for systemd.

The linked evil-poettering intermezzo seems to be irrelevant and strawman-ish too.

bkor · on Aug 23, 2017

I participated in almost all of Debian discussions. I don't see how you can claim what you claim, at all. Care to explain? E.g. RHEL was throwing its weight around?

The process at Debian started years later than any other distribution. It involved various votes, at least a year of discussion, etc.

Then in 2017 someone ignores history and summarizes this into "RHEL was throwing its weight around"... ?!?

atemerev · on Aug 23, 2017

One question: did the ability (or easiness) to stay compatible with Gnome 3 influence the decision? How about the great udev scare? Also, we all remember that 4:4 vote, and all the drama later.

acdha · on Aug 23, 2017

Your first sentence is directly contradicted by the second and certainly doesn't fit the history of the distributions. Debian, et al. have a long history of not following Red Hat when they see value – e.g. think of the patches which Debian developers maintain to make configuration easier and safer compared to Red Hat — and reason why they couldn't have done the same here other than that the people who actually do the work didn't think it was worthwhile.

It's simply disingenuous to describe Red Hat contributing a lot of engineering time for free as “hammering it down to everybody's throats”, any more than Linux was hammered down our throats over Hurd. I don't think systemd is perfect but I think anyone's standing to complain about open source software is bounded by your willingness to commit to supporting alternatives.

(And, lest you think I'm some sort of die-hard Red Hat fanboy I should note that I started using Debian in the Bo/Hamm era and have never found a compelling value to using RH)

atemerev · on Aug 23, 2017

On the contrary, I mostly use Fedora. Systemd won, it is sad, but I accepted it. However, I want to outline that the reasons for its proliferation were less than objective merit.

Etzos · on Aug 23, 2017

I'm not sure this really makes sense. It sounds like RedHat didn't hammer it down throats, but instead created a solution that had some guarantees of support and traction which distro maintainers then decided was better than the current alternatives. No one was holding a gun to distro maintainer's heads tell them that they needed to move away from the existing solutions or else, or at least I haven't seen any evidence of that.

So if they weren't forced into it and RedHat did indeed provide some guarantees, it does actually suggest what the parent says.

Xylakant · on Aug 23, 2017

Even if what you say is true and systemd is the inferior system and everybody only picked it because Redhat promised to support it, that's fine with me. Support is indeed a major feature. Anything that is not supported is lacking a feature. I'll take the good-enough-but-supported version of a thing over the perfect-but-left-out-in-the-cold version any time of the day for anything that I'd put close to production.

edoceo · on Aug 23, 2017

I <3 OpenRC

Nothing broke, init scripts still plain simple.

throw2016 · on Aug 23, 2017

This is repeated a lot but without specific examples lacks substance.

Given scripts are used in nearly all linux software branding them them as 'prehistoric' comes off sounding misinformed.

brainfire · on Aug 23, 2017

I think it is the specific scripts that are prehistoric, not the idea of scripting.

tyfon · on Aug 23, 2017

I have to agree with this, I never got the major systemd hate.

And I started using linux in 1994-95ish and still use it and different bsds.

The only thing it doesn't do properly out of the box is logging. I'm not the biggest fan of non text log files.

api · on Aug 23, 2017

I haven't had a lot of stability problems with systemd, but I hate it. For me the hate comes from its extremely arcane interface and configuration layout. It's as bad as git in that you have to Google to learn how to do anything at all and the way to do simple things is obtuse and non-memorable, but unlike git I don't use it constantly so I never remember. Doing anything with systemd involves searching for how to do it since the command structure, layout, etc. is bizarre and counter-intuitive.

There seem to be some Unix/Linux developers who have a mysterious affinity for obtuse arcane time-wasting cognitive-load-increasing design. It's like they see the ability to master crufty badly designed systems as a badge of honor or something, or maybe it comes from a drive for job security or consulting hours.

Example: "systemctl list-units"

What's wrong with "systemctl ls"? What would have been wrong with a shorter command that's easier to type like "sys ls"? It's a core aspect of the system so a name like "sys" would have been appropriate, easy to type, and easy to remember.

Even worse the output of list-units has overly long lines and manages to be simultaneously hard for humans to read and hard for machines to parse. It uses white space as both a delimiter and within identifiers, making shell script parsing with "cut" etc. impossible.

The entire design is like this: obtuse, verbose, clunky, hard to type, hard to remember.

I absolutely loathe this stuff. Using it inspires fantasies of causing physical pain to its designers. A little bit of thought could have resulted in a clean, sparse, intuitive, and discoverable design with memorable commands and a straightforward configuration structure.

Don't get me started on abominations like Debian packaging or Windows drivers, though those are somewhat forgivable as their ugliness can be explained by their age and the need for backward compatibility. Systemd was a green field design from the 21st century so it has no excuse.

dmix · on Aug 23, 2017

I have to agree on systemctl's naming scheme is terrible inefficient. This is something I use all the time when I'm on Archlinux and while I've come to memorize systemctl's interface for the most part it's still very poorly thought out, favouring verbosity over simplicity at every level.

I've made enough aliases to make it useful and without having to type such a long name. It should definitely be a shorter name for such a critical piece of software.

One example is using user services vs system services and the various non-intuitive locations and name schemes of the various .service files.

michaelmrose · on Aug 23, 2017

I'm not a systemd fan, quite the opposite however I have often found package manager, service management clis less than friendly so I regularly write a little wrapper script just to provide a short cuts to my most used functions.

The package management system regardless of distro is pkg the the service management is serv. This would also be helpful if you are using more than one system and needn't be complicated.

pmoriarty · on Aug 23, 2017

"What's wrong with "systemctl ls"? What would have been wrong with a shorter command that's easier to type like "sys ls"? It's a core aspect of the system so a name like "sys" would have been appropriate, easy to type, and easy to remember."

"Even worse the output of list-units has overly long lines and manages to be simultaneously hard for humans to read and hard for machines to parse. It uses white space as both a delimiter and within identifiers, making shell script parsing with "cut" etc. impossible."

It's the Microsoftization of Linux. This gives me bad flashbacks of Powershell where exactly what you describe is the case, where the simple is made difficult and the difficult made impossible.

The overengineered, bloated monstrosity that was CORBA also springs to mind. Systemd has been one of the worst things that have happened to Linux in recent years.

vidarh · on Aug 24, 2017

While I can agree about some of the commands, they're still far better than what they replaced for the most part, and easy enough to wrap with aliases or little helpers if you prefer.

That said, the problem with "systemctl ls" would be that systemctl has commands to list multiple things things: units, socket, timers, dependencies, unit files, virtual machines, jobs.

I agree it's annoying that the output isn't formatted in more parsing friendly ways, though. Especially because there e.g. is an option to format the journal output in systemctl status as json -- why couldn't they do that for the main output too (indeed, I wish all tools had an option for that)

jjoonathan · on Aug 23, 2017

"Clean, sparse, intuitive, and discoverable"? I have yet to meet a single non-broken production init script that merits a single one of those adjectives.

api · on Aug 23, 2017

That's exactly my point. Systemd had a chance to replace the arcane cruft of sysvinit with something clean and well designed, but instead they developed something almost as arcane and unusable.

It's pointless arcana to boot. There is no reason whatsoever that a green field implementation of something so straightforward needs to be so obtuse.

If the complexity of interface exceeds the complexity of the information it needs to take and provide, it's bad design. If the command or UI structure uses arcane terms when straightforward terms exist, it's bad design.

e12e · on Aug 23, 2017

I've used plenty of init-scripts that are named after the daemon in question, support start/stop/reload/status and work fine. What more do you generally need?

vidarh · on Aug 24, 2017

The "work fine" part tends to break down horribly the moment something unexpected happens. E.g. pid files being left lying around and breaking restarts.

0xbadcafebee · on Aug 23, 2017

If you had that much trouble with init scripts, the problem wasn't the scripts.

jjoonathan · on Aug 23, 2017

Then why do all production init scripts for the simplest conceptual tasks always seem to be full of hacks (magic comments and files to declare dependencies, polling, cross-script communication, special cases, tons of comments referencing bugfixes to explain the crazy shit the script has to do, etc)?

0xbadcafebee · on Aug 23, 2017

Some of those are just things you find in any code (comments aren't bad), some of it refers to specific programs (there will always be 'special cases' if a program supports more modes of operation than "start" and "stop"), some of it is good (bugfixes are not bad), and sometimes init scripts are written by morons.

Sometimes the code is bad. But the model isn't bad. People sometimes use "init script" to refer to the tools that run and maintain network services, and those are completely different.

An init script boots your computer. A service runner is responsible for interfacing with an application and performing complex operations. They should not be confused, but often are, as Systemd conflates the two.

bkor · on Aug 23, 2017

The model of how the init scripts worked is terrible. There's no cross distribution bug fixing of those scripts. It's something added by the distribution. If there's an upstream script, it's often buggy. Or it didn't integrate with the 'speed init scripts up hacks' that various distributions were doing. Huge time sink for no particular gain.

bkor · on Aug 23, 2017

Those scripts were different per distribution. Sometimes shared, but then had to be copied basically manually from one distribution to the other. With systemd it's just a configuration file shipped by upstream. Any fixes automatically get reused by all distributions.

The init scripts differed way too much across distributions. This including pointless differences in where various files are located. All of that has become way more standard, thankfully.

I did have various issues with a few init scripts of my distribution. It didn't happen often, but nowadays you usually can do with a config file, way easier.

tdumitrescu · on Aug 23, 2017

Who doesn't love cleaning out stale pidfiles? That's why we get those big SV $$$.

Rjevski · on Aug 23, 2017

Yes sure, the problem is me for wanting to spend more time with my friends & family instead of pulling my hair out until 3am to make those damn initscripts work.

digi_owl · on Aug 23, 2017

And i see the PR team is already out in force to sell this and defend what has already been sold.

We should really just move to BSD already and let them sink this ship.