DBus and Systemd (uyha.github.io)
73 points by thunderbong 4 months ago | hide | past | favorite | 80 comments



We need another variation of the rule: "Any sufficiently advanced desktop operating system contains an informal, buggy implementation of half of Smalltalk"

DBus, COM, various UNIX IPC mechanisms & co. are a lot of bureaucracy to do a very simple job: connecting two things and letting them communicate, without caring about the semantics of messaging, or protocols.

DBus is crucial to making a fully-featured desktop, but that's a lot of work to send some typed data to another process. Despite the complexity and ceremony, more and more things depend on DBus, because communication among disparate actors is such a useful concept.

We have spent the last 60 years in this field focused on honing methods of computation, while communication is something that's tacked on at the end. Now, deep in the Internet age, with fast-enough multicore networked machines, we need a lot of boilerplate to do something all our software depends on. [1] It should be the other way around: communication should be as easy, for the programmer, as adding two integers.

(I am working on this space, as I have been obsessed with this question for a long while.)

---

1: even a number-crunching monolith written in high-performance C++ needs to communicate its result to the user or some other remote service. How many lines of code to do that?
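To make the footnote concrete, here is a minimal sketch in Python (names and framing invented for illustration) of what "just send one result to another process" already requires: picking a transport, a framing scheme, and a serialization format before a single byte moves.

```python
import json
import socket

# Hypothetical minimal IPC: length-prefixed JSON over a socket pair.
# Even this toy version forces three design decisions (transport, framing,
# serialization) that "adding two integers" never would.
def send_result(sock, result):
    payload = json.dumps(result).encode()
    sock.sendall(len(payload).to_bytes(4, "big") + payload)  # 4-byte size prefix

def recv_result(sock):
    size = int.from_bytes(sock.recv(4), "big")
    return json.loads(sock.recv(size).decode())

if __name__ == "__main__":
    a, b = socket.socketpair()  # stands in for a connection to another process
    send_result(a, {"answer": 42})
    print(recv_result(b))  # {'answer': 42}
```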


I quote Fowler's ancient First Law of Distributed Object Design: "don't distribute your objects".

Any distribution results in masked complexity and many hard-to-reason-about scenarios, even if the language or platform supports it transparently.

Ergo, just because you can doesn't mean you should, unless you want to discover those scenarios one by one, usually at an inconvenient point in the future, when everything is broken and cash is bleeding out of your org somewhere.


> don't distribute your objects

I guess people writing microservices or services connected via the Internet didn't get the memo.


The author of that rule explains that (well-crafted) microservices don't disobey that rule: https://martinfowler.com/articles/distributed-objects-micros...


I spend all day every day working with those.

I have yet to find a single microservices architecture that could not be solved cheaper and simpler in some other way, with fewer side effects. Plus, there is some disparity between APIs and object distribution.

(Note I only work on non-google scale stuff where it is probably appropriate)

Edit: a good example from recently: I saw one where a complex distributed reservation pattern was implemented across microservices, handling about 100 transactions a day. They could have done it fine in one process, with proper transaction isolation in a DB, for about 1/100th of the total cost.
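For illustration, the single-process version of such a reservation collapses into one transaction. A sketch with SQLite (schema and names invented, not the actual system described):

```python
import sqlite3

# One process, one database: the "distributed reservation" becomes a single
# atomic check-and-set under normal transaction isolation. Schema is invented.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE seats (id INTEGER PRIMARY KEY, held_by TEXT)")
db.executemany("INSERT INTO seats (id, held_by) VALUES (?, NULL)",
               [(i,) for i in range(1, 4)])

def reserve(conn, seat_id, user):
    with conn:  # BEGIN ... COMMIT/ROLLBACK around the whole operation
        cur = conn.execute(
            "UPDATE seats SET held_by = ? WHERE id = ? AND held_by IS NULL",
            (user, seat_id),
        )
        return cur.rowcount == 1  # False if someone else already holds it

print(reserve(db, 1, "alice"))  # True
print(reserve(db, 1, "bob"))    # False: already reserved, no saga needed
```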


Honestly the rule I put on it nowadays is, if it doesn't have a logical boundary within the organization then it doesn't need to be a separate service.

If you have 6 microservices and they're all being maintained by the same team of developers, you don't need 6 microservices. At most, I would grant maybe 2 - and that's simply because usually once something like payments and credit card handling get involved, for compliance reasons you want to really not touch it unless you have to - but that's still an organization boundary, just a self-imposed/PCI compliance based one.


Sounds a bit like Conway's Law


Conway's law


Microservices aren't about distributed objects, they're about small services which communicate via explicit IPC (such as HTTP or gRPC).


I think there are many problems with D-Bus, and there are better ways to send data to other processes. It is a useful feature, but I have different ideas about how it would be done, which I will describe below.

There are no bus names, interface names, or object paths. Capabilities are used instead, and a program refers to the capabilities it has access to in order to do any kind of I/O (and the capabilities can be used for security, too).

The kernel does not know about data types; each message is a sequence of bytes and/or capabilities.

However, there are common conventions for data types (which are the same for all computer types, so that programs on different computers, and programs being emulated, can still communicate with each other).

It does not use Unicode, so there are no Unicode types (the main character code is Extended TRON Code, although 8-bit characters are also possible, including strings of arbitrary bytes/capabilities which can be used for any purpose).

It does not need the marshaling, authentication, etc. that D-Bus uses. Other features of D-Bus are also not needed, and can be made simpler and faster.

The Command, Automation, and Query Language would make communication almost as easy as adding numbers together. (It can never be quite as easy (unless adding numbers is difficult), because communication is just a thing that is not as easy, anyways; but, it can be made almost easy.)

Actually, it is like an actor model in some ways.

(The above ideas are a part of an operating system design, which does not yet have a name, at this time.)

> (I am working on this space, as I have been obsessed with this question for a long while.)

What is your work on this space?

> even a number-crunching monolith written in high-performance C++ needs to communicate its result to the user or some other remote service. How many lines of code to do that?

I would think it depends what you need to communicate. Sometimes, I would expect, printf would do.


> communication should be as easy, for the programmer, as adding two integers

I agree wholeheartedly. But the real issue is that Linux doesn't have rich communication primitives (compared to, say, the NT kernel), so we've had to implement those in userspace, and so here we are with systemd and dbus being intertwined with each other and the rest of the desktop. It could have been better if the kernel provided a message bus or at least some primitives.


Even with richer kernel primitives, Apple, Google and Microsoft have learned that user space with process separation for IPC and OS extensions is a much saner way than expecting developers to write code that doesn't randomly crash their hosts, including the kernel.


You might want to look at how Apple's platform does it. If you use the high level API it's pretty straightforward.

https://developer.apple.com/library/archive/documentation/Ma...


(Not to be confused with the OG Mach messages!)

https://dennisbabkin.com/blog/?t=interprocess-communication-...


You mean OS/2 with Visual Age for Smalltalk and SOM?

It was great, SOM even supported meta-classes, but sadly we all know how it went down.


> even a number-crunching monolith written in high-performance C++ needs to communicate its result to the user or some other remote service. How many lines of code to do that?

I usually output json if the output is hierarchical, or some simple size+block format if it is just fixed size column data.
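A minimal sketch of the "size + block" approach for fixed-size column data (format invented here for illustration, not this commenter's actual one):

```python
import struct

# A minimal "size + block" framing: a 4-byte length prefix followed by
# fixed-size records (here int64 + float64 per row, 16 bytes each).
def pack_rows(rows):
    body = b"".join(struct.pack("<qd", i, v) for i, v in rows)
    return struct.pack("<I", len(body)) + body

def unpack_rows(buf):
    (size,) = struct.unpack_from("<I", buf)
    body = buf[4:4 + size]
    return [struct.unpack_from("<qd", body, off) for off in range(0, size, 16)]

data = [(1, 3.5), (2, -0.25)]
print(unpack_rows(pack_rows(data)))  # [(1, 3.5), (2, -0.25)]
```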


> We need another variation of the rule: "Any sufficiently advanced desktop operating system contains an informal, buggy implementation of half of Smalltalk"

How about: any sufficiently advanced desktop OS eventually turns into a buggy microservices system.


Frankly, I completely disagree. There was a lot of research into distributed objects back in the 80s/90s; languages which allow objects to seamlessly move between nodes. And there's a very good reason IMO why that never took off. Communication between processes is always fallible and subject to blocking indefinitely, and when such communication happens implicitly and all over the place, the result is a disaster. IPC must be explicit, not implicit.

IPC must also be language-agnostic, which means serialization and deserialization, which means there will always be some manual "impedance matching" between your program's domain model and the IPC protocol's messages.
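That impedance-matching layer is small but unavoidable. A sketch in Python (type and field names invented):

```python
import json
from dataclasses import dataclass

# The "impedance matching" layer: a domain type on one side, a wire message
# on the other, and explicit code translating between them in both directions.
@dataclass
class Job:
    id: int
    priority: int

def to_wire(job):   # domain model -> IPC message
    return json.dumps({"job_id": job.id, "prio": job.priority})

def from_wire(raw):  # IPC message -> domain model, validating as we go
    msg = json.loads(raw)
    return Job(id=int(msg["job_id"]), priority=int(msg["prio"]))

print(from_wire(to_wire(Job(7, 2))))  # Job(id=7, priority=2)
```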

Could things be better? Sure, yeah, I guess. Languages with a single implicit event loop and with native support for JSON-like trees of values (such as JavaScript) will have an easier time than languages with neither (such as C++). But I don't think there's really some magical 10x better solution out there if only we had all "seen the light" and used Smalltalk.


> Communication between processes is always fallible and subject to blocking indefinitely

You speak as if you have never heard of Carl Hewitt's research, Erlang or Elixir before.


Or Java RMI, Jini, .NET Remoting, Tcl agents, or many others with similar approaches.


You mentioned Smalltalk, not Erlang.

I wasn't aware that Erlang did IPC implicitly.


Erlang puts communication across actors/processes at its core mechanism. Sending a message is as simple as adding two numbers. It is closer to Smalltalk than a lot of other programming paradigms.

The real issue is that Erlang is its own world. But you could also say IPC is its own world (does not work natively on the Internet, does not work on non-UNIX machines, etc.)
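To illustrate the "as simple as adding two numbers" claim, here is a toy actor in Python (a sketch of the idea only: no supervision, no distribution, none of Erlang's actual semantics), where `send` is the entire IPC API:

```python
import queue
import threading

# A toy actor: a mailbox plus a thread draining it. From the sender's side,
# "send" is one call with no visible transport, framing, or serialization.
class Actor:
    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:  # poison pill stops the actor
                break
            self.handler(msg)

    def send(self, msg):  # the whole communication API
        self.mailbox.put(msg)

results = queue.Queue()
adder = Actor(lambda pair: results.put(pair[0] + pair[1]))
adder.send((2, 3))
print(results.get(timeout=1))  # 5
```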


Erlang is being sneaky, because an Erlang "process" is not the same thing as what the rest of us call a "process". As soon as Erlang needs to talk across real processes, it runs into all the same problems. Now, admittedly, Erlang has a nice supervisor toolkit to help with this, but it's still a real problem, even for Erlang.


"Real processes".

The point is UNIX is too low-level for communication primitives. Maybe it's time to research new paradigms? It's been 60 years.


Okay, but then you're talking about designing a completely new kind of kernel which has a completely different concept of what a "process" is, which is quite far away from "we should've just used Smalltalk".

Besides, no matter how you design the system, there will be a categorical difference between communication which is expected to be fallible and communication which is expected to be infallible. The "communication" between a caller and a callee is expected to be infallible, for example. Communication between threads in a process (in the UNIX and Windows model) is expected to be infallible.

Communication between an app and a system service is expected to be fallible; at the very least, the system service has to expect that the app might misbehave in any number of ways, even if you argue that the app should expect the system service to be perfect.


Depends pretty much how those threads were started.

In Windows there are fun things like COM apartments, owner module scopes, local RPC, wrapping in-proc COM into DCOM remoting.

Apple and Google have similar local RPC features coupled with multithreading.


I don't understand what you think depends on how threads were started.


An HN comment is not big enough to explain OS specific threading models across component systems, and dynamic libraries.


I wish DBus used a pseudo-filesystem approach with plaintext commands in a Plan9 manner - so you could call DBus object methods just by opening a file and writing/reading from it.

There are probably reasons why DBus is implemented the way it is, but it just doesn't feel elegant at all.


Except Linux hasn't really done this for a while. Most things in Linux are managed via some type of message passing these days anyway - i.e. you don't really configure devices through /dev/ entries; instead you use the /dev/ entry as the target of an ioctl call, which basically does whatever it wants. It's not really file-like at all.

Then there's networking, which is configured via netlink, which is another message-passing protocol.

It doesn't seem like an accident that we keep reinventing socket-based messaging protocols to manage systems, and just "forgetting" to do it the Plan9 way.


From the client's perspective, socket I/O is isomorphic to file I/O - you get a file descriptor and you read/recv and write/send to it. The only difference is how servers handle it.

The ioctl() is different, mostly because it's used to share memory between the kernel and userspace, but the same communication pattern could be implemented with file I/O too, probably with a performance penalty. Just send the serialized structures, and receive them back.

But the goal of DBus was never performance, it was to provide a unified interface to global system services. File I/O here seems like an obvious choice - it's the most fundamental interface that almost every program in the world supports.
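The isomorphism is easy to demonstrate: the same read/write code works on a pipe and on a socket, because both are just file descriptors (sketch, in-process stand-ins for real endpoints):

```python
import os
import socket

# The same os.read/os.write loop works whether the fd came from a pipe or a
# socket: from the client's side both are interchangeable byte streams.
def echo_through(fd_read, fd_write, data):
    os.write(fd_write, data)
    return os.read(fd_read, len(data))

r, w = os.pipe()  # file-like endpoint
print(echo_through(r, w, b"hello"))  # b'hello'

a, b = socket.socketpair()  # socket endpoint, same code path
print(echo_through(a.fileno(), b.fileno(), b"hello"))  # b'hello'
```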


Plan9 doesn't use plain text commands really. Look at how various services work and you'll find there's usually a C library wrapping the underlying pseudo-file. Plan9 was IMHO to a large extent a failed experiment; the filesystem API wasn't extended enough to be able to express enough operations (e.g. no transactions).

The other problem with Plan9-style designs is the concept of filesystem servers. This kills off a lot more expressiveness still; e.g. you can't move device files around to organize them better, because they aren't really files and the layout is fixed by the server. A dbus that exported stuff via the FS would have the same problem. It'd be full of "files" that you can't rename, move, delete, add xattrs to, read, etc. Most of the operations you can do on files just wouldn't work, so what's the point.


Would you prefer DBus to be implemented in the kernel then, or would you think that the DBus daemon should expose that pseudo-filesystem using FUSE? Neither of those things seems completely optimal to me to be honest.

And it somehow feels very non-Linux that an open syscall or a readdir syscall should be able to block indefinitely while the desktop environment shows a permission prompt...


I wouldn't personally put dbus in-kernel, though if you were going to then a filesystem does strike me as the least-bad interface. But why not FUSE? It shouldn't be performance-sensitive, which is the usual problem.


> Would you prefer DBus to be implemented in the kernel

Wasn't Greg KH himself trying to explore this? I think it was called Bus1.


kdbus was rejected 10 years ago by Linus as a penalty for misbehaviour of the main developer. Here's the exact Linus rant: https://lore.kernel.org/lkml/CA+55aFzCGQ-jk8ar4tiQEHCUoOPQzr...


My understanding is that kdbus was mostly a new transport mechanism, so that the dbus daemon didn't have to implement all of dbus's features on top of unix sockets? I don't think there was an attempt to put the whole daemon in kernel space, and there certainly wasn't talk of exposing DBus as a VFS AFAIK

Though I haven't read about kdbus in a long time, so I might be misremembering or have misunderstood something back when I read about it.

Regardless, I was asking what bheadmaster would've preferred, and if the answer is "the dbus daemon should have been implemented in kernel space and that's what kdbus was trying to do" then that's fine.


kdbus etc. were all about making a "faster" dbus back when the original dbus-daemon implementation was "too slow (for our corporate sponsors [1])", so there was an unresearched push to put dbus into the kernel. It involved special things, including even new syscalls (some, I think, got refactored out and actually became part of the ABI).

Ultimately it got nuked because the code was considered problematic, and performance turned out to not require kernel code.

[1] AFAIK mainly automotive companies that were porting, for some reason, QNX IPC over to D-Bus instead of updating a pre-existing "QNX IPC on Linux" code, and I think Binder didn't surface fast enough for this porting effort.


No, performance wasn't the main reason (although that was a big part), it was mainly about race conditions. If userspace implements the bus, then only the half of userspace that comes up after the daemon can use it. Having a reliable IPC mechanism that is available from the moment PID1 is started was the goal.

Sadly that was rejected, and Bus1 was too, because apparently it's really bad to have IPC primitives in the kernel, can't have that. Then of course 6 months later Google showed up with a few wheelbarrows of cash and Binder, and strangely all such objections evaporated and it was merged without a peep. Funny how these things go.


Binder predated (significantly![1] Depending on how you count, it predated D-Bus itself[2]) both Bus1 and kdbus projects, and unlike either of them included some extra features for more efficient IPC (like mentioned by me elsewhere ability to pass CPU quanta from caller to callee and back).

Making it available from before init formed properly was, quite possibly, a considerable part of why it never got merged. The kernel team had spent a significant amount of work pushing things out to userland for cases like configuration et al. There's a possible issue of applications failing when a non-kernel-managed resource goes away, but honestly that's going to be a problem even for handling loss of service on the other end of the bus, too.

Funny thing: IIRC, ultimately both kdbus and bus1 had the same userland requirements as binder does, which is a userland component to set it up and, IIRC, provide management information too.

[1] Binder was mainlined (as experimental, but present in mainline) code in 2012, and made it into stable in 2015. kdbus started unannounced development the same year, and was proposed for inclusion in 2014

[2] OpenBinder which became Android Binder had 1.0 release in 2005, a year before D-Bus' initial release. And that's not counting original Binder, which shipped for the first time to public in 1995


By "making it available" I don't necessarily mean a fully-working, configured broker, but only having enough working primitives that you can bring up userspace without races, with requests that can immediately be accepted, even if not answered until a bit later. The stuff that we currently just cannot do with D-Bus, and that we have to reinvent protocols for.

> Binder predated both Bus1 and kdbus projects

It's not about when they started, it's about when they got merged/rejected. One got rejected because "ipc in kernel is bad", and some months later the other was merged because "ipc in kernel is good". One rule for thee...


That's my point: Binder got merged before kdbus started development.

Also, I still don't see what kind of races you're getting, other than having an init system too stupid to handle ordering for applications too stupid to handle graceful connections. (Looking at the last version of the kdbus docs, the main coherent argument about races is relevant only to how Unix sockets can't pass authenticated metadata along with a message.)


> That's my point: Binder got merged before kdbus started development.

It didn't, it was merged in 2015, a year after or so. No, staging doesn't count, it's a dumping ground for all sort of things that do not see the light of day.

> Also, I still don't see what kind of races

Just because you don't see it, it doesn't mean it's not there. Do some research and you'll see it.


> Just because you don't see it, it doesn't mean it's not there. Do some research and you'll see it.

You're the one claiming there are races when starting services, so it's your responsibility to justify that claim.

You've mentioned before that "If userspace implements the bus, then only the half of userspace that comes up after the daemon can use it", but any service manager that supports service dependencies could be configured to run the bus first, and only run the rest of the userspace after the bus is up.


What a genius idea, how bizarre that nobody ever thought about that before! We just need to get the bus dependencies up first and then... oh. Oh wait. Oh no.


> It didn't, it was merged in 2015, a year after or so. No, staging doesn't count, it's a dumping ground for all sort of things that do not see the light of day.

kdbus didn't even make it into staging. The project that mainlined Binder put in some serious work to get where it got.

And quite probably part of it going better was not insisting on becoming mandatory solution for everyone. If anything, it might have been less "wheelbarrows of cash" and more conflicts involving kdbus principal developer and Linus.

> Just because you don't see it, it doesn't mean it's not there. Do some research and you'll see it.

The kdbus docs could do a better job declaring why it's needed, then.


> kdbus didn't even make it into staging. Project mainline put in some serious work to get where it got.
>
> And quite probably part of it going better was not insisting on becoming mandatory solution for everyone. If anything, it might have been less "wheelbarrows of cash" and more conflicts involving kdbus principal developer and Linus.

You mean, because it would actually be _used_ somewhere in open source distributions (that is, not just deep inside some de-facto proprietary, half-closed-source fork owned by a single mega-corp)? Yeah, when things are hidden away in some cupboard in the basement it's much easier not to ruffle feathers. The LKML is bad today, but back then it was a veritable open-air cesspool.

> The kdbus docs could do better job declaring why it's needed then.

Well, it's dead, so what's the point...


I mean it would be forced down people's throats, which wasn't making its proponents any friends when their other actions culminated in Red Hat being temporarily banned from having any code merged into the kernel.

As for documentation...

Maybe that's part of why it's dead and buried.


I guess it would be interesting to implement this for DBUS using FUSE.


It's the unix way /s

COM (dbus) + Registry & services (systemd)


While you are joking, COM was born out of the UNIX way, more precisely the Distributed Computing Environment.

https://en.wikipedia.org/wiki/Distributed_Computing_Environm...

Then we had that great experience called Taligent

https://en.wikipedia.org/wiki/Taligent

And naturally CORBA,

https://en.wikipedia.org/wiki/Common_Object_Request_Broker_A...

NeXTSTEP's Portable Distributed Objects

https://en.wikipedia.org/wiki/Portable_Distributed_Objects

Which inspired Sun's Distributed Objects Everywhere written in Objective-C, followed by its reboot in Java as Enterprise JavaBeans.

https://en.wikipedia.org/wiki/Distributed_Objects_Everywhere


Worth pointing out that there's now a second bus in systemd, varlink. https://www.freedesktop.org/software/systemd/man/latest/varl... https://varlink.org/

Varlink is a bit simpler, I'd say. It's more about individual services opening their own listening sockets, although there is an optional broker too. This simplifies the addressing model somewhat. Messages are in JSON.

More and more of the systemd internals are being written or rewritten to rely on varlink for IPC, and the scuttlebutt is that eventually there's a goal to be able to run dbus free.
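For a feel of the varlink wire format (JSON objects terminated by a NUL byte over a unix socket), here is a sketch of one call/reply exchange. The interface and method names are invented; a socketpair stands in for a real service socket, which would live under a path like /run/systemd/:

```python
import json
import socket

# Varlink-style framing: each message is one JSON object followed by a
# single NUL byte. Calls carry "method" and "parameters"; replies carry
# "parameters". This is a sketch, not a full client.
def send_msg(sock, obj):
    sock.sendall(json.dumps(obj).encode() + b"\0")

def recv_msg(sock):
    buf = b""
    while not buf.endswith(b"\0"):
        buf += sock.recv(1)
    return json.loads(buf[:-1])

client, server = socket.socketpair()
send_msg(client, {"method": "org.example.Ping", "parameters": {"msg": "hi"}})
request = recv_msg(server)
send_msg(server, {"parameters": {"echo": request["parameters"]["msg"]}})
print(recv_msg(client))  # {'parameters': {'echo': 'hi'}}
```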


D-Bus will never be removed. Varlink was added because the kernel refused to provide usable IPC primitives, like other OSes have, so a brokered IPC is simply unfeasible in early userspace. So now we have two IPCs that will work in parallel, forever, because such is life on Linux.


It would be hella interesting to me to see internal uses of dbus vs varlink. I don't think the proposition was that dbus is removed or goes away (it's the standard FreeDesktop IPC after all), but the scuttlebutt I ran into was that there was an intent to be able to run without it. That a large part of the systemd mono-repo ought to be able to do what it does without it, especially critical and especially boot-critical systems. It would be interesting to see and weigh over time which parts support which IPC, to see whether varlink is indeed growing or converging.

You are 100% dead on, btw; thank you for saying! The overwhelming reason for varlink adoption is early boot, where dbus isn't up yet. That's a crucial fact here.


I think that using JSON is rather inefficient and restrictive compared with using a binary format.


Using complex custom binary encodings is an inefficient use of humans' time.

It's error-prone. JSON's constraints are well known and optimized for. A binary protocol will also have people implementing their own versions, poorly. Whereas one can get piping-hot SIMD-optimized JSON encoders/decoders off the shelf.

I'm not 100% JSON all the time. But this response strikes me as crufty neckbeardy shit that ignores the obvious and good, and uses negativity to step on a pretty clear and straightforward path. Having used d-feet and busctl dozens of times in my life, it's miserable. The encoding scheme is terrible and unpleasant. JSON is an obvious counter-reaction, an attempt to make some damned sense. Given that these bus capabilities are more control plane than data plane (though it's not a hard line), it's OK.


A binary format does not need to be complicated, and does not need to be the same as D-Bus's (which has many problems). Furthermore, it can avoid some of the shortcomings of JSON, e.g. requiring escaping, requiring Unicode, not properly supporting 64-bit integers, and not being able to embed binary data directly.
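For illustration, a minimal tagged binary encoding along those lines (a sketch invented here, not any particular format), which round-trips full-range 64-bit integers and raw bytes with no escaping:

```python
import struct

# A simple tagged binary encoding: 'i' + int64, or 'b' + length + raw bytes.
# It directly handles two things JSON struggles with: exact 64-bit integers
# and embedded binary data.
def encode(items):
    out = b""
    for item in items:
        if isinstance(item, int):
            out += b"i" + struct.pack("<q", item)
        else:
            out += b"b" + struct.pack("<I", len(item)) + item
    return out

def decode(buf):
    items, off = [], 0
    while off < len(buf):
        tag, off = buf[off:off + 1], off + 1
        if tag == b"i":
            items.append(struct.unpack_from("<q", buf, off)[0]); off += 8
        else:
            (n,) = struct.unpack_from("<I", buf, off); off += 4
            items.append(buf[off:off + n]); off += n
    return items

vals = [2**63 - 1, b"\x00\xff raw bytes"]
print(decode(encode(vals)))  # [9223372036854775807, b'\x00\xff raw bytes']
```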


Sure yeah dude, in far off fantasy land it can all be perfect. There's no tradeoffs up in the clouds.

Strikes me as a super bad attitude, walking around with your nose way up in the air, leaving these farts behind you as you go. These comments aren't really couched in reality as far as I can see. They don't point anywhere specific, they're just broad sideswipes.

Here in the real world we keep reusing maybe so-so protocols because they are scrutable and well implemented, and we humans have lots of experience and tools for using them.


I learned a bit about dbus when I was scripting the controller of my home battery (victron energy, I use their "venus os" linux on a raspberry pi). I like the discoverability of dbus. One can just query the bus and talk with all registered services, using any client.

But it also seems to have significant flaws. One is that there seems not to be any valid timeout/keepalive concept for dbus over TCP. If a registered component is disconnected without gracefully signing off, the resource stays registered for a long time and cannot easily be replaced if the client reconnects after a while.


This always feels like DBus=DCOM, systemd=Service manager, journald=Event log.

And that makes me feel a little dirty.


Hating something just because Microsoft did something like it at some point is silly.

IMO all of those are absolutely excellent ideas. Like journald is much better than what came before it, and DBus is much better than poking processes with signals, or having to talk some special protocol to each particular service.


It's not the vendor. They were incredibly stateful and opaque which caused numerous problems.


If it makes you feel cleaner, you can use instead:

DBus=CORBA (remember Bonobo and ORBit), systemd=Service Management Facility, journald=rsyslogd.


I wonder what happened to Bus1 [1] and other kdbus-type proposals [2]. It would be useful to have an Android Binder-like IPC subsystem.

1. https://lwn.net/Articles/697191/
2. https://lwn.net/Articles/580194/


Binder is honestly much better than D-Bus, but for various reasons I don't think we're going to see it much on desktop (though you can, in fact, use it there - and it originates from exactly such use).

The kdbus finally died when Linus sat down and proved that:

1) then-current userspace d-bus code was simply horrible

which led to kdbus being mainly about speed/latency, but...

2) Linus spent a weekend and got a userspace dbus server 80/20 solution that achieved better performance than kdbus

and

3) kdbus introduced no real new benefits that weren't ultimately triggered by "dbus is too slow", unlike Binder, which involves fancy scheduling tricks to pass CPU quanta from one process to another along with the RPC call.

Thus the final nail in the coffin of kdbus (which was previously hampered by being considered too problematic to merge) was hammered in.


kdbus/bus1 were about making IPC primitives available race-free to userspace since the very first moment it is started. Perf was icing on the cake. Binder is Google-only, so nobody else can really use it, given they might kill it at any moment.

The end result of not having bus1 is that we have to forget about brokered IPC and go back to 1:1 bespoke protocols over af_unix - i.e., varlink


Binder has been mainlined since a bit before the kdbus project started, and can in fact be utilized with alternate implementations, as the kernel interface is explicitly not married to any specific content of the IPC.


Linux did get memfd, which was originally a component of kdbus.

kdbus seems to have failed because of arguments over attaching process capabilities to messages: https://lwn.net/Articles/641275/ And most if not all of the performance improvements were later realized with an improved user space D-Bus implementation (sd-bus?).

bus1 seems to have followed a similar trajectory--what remains of the effort is a better D-Bus implementation in user space: https://github.com/bus1/dbus-broker


Linux lacks any equivalent of Apple's code signing and entitlements system, and as the kernel devs point out that leads to problems with making a proper kernel IPC system. The right fix would have been to do what Apple did and work out a namespaced and extensible system for capabilities/entitlements, but they never did it.


What about 'standalone' dbus, without systemd?

Is it even possible anymore to build dbus without systemd as a dependency?


A lot of Linux distributions have moved away from the original dbus implementation to the `dbus-broker` project, which has an optional dependency on systemd.

So the answer is "yes", alternative projects exist.


There is an alternative implementation called dbus-broker.

Even found an issue on running it without systemd https://github.com/bus1/dbus-broker/issues/183


One of the ideas behind dbus (and one of the ideas behind systemd! They're sorta linked in this way) is that services should be able to activate on request, like httpd but for local IPC. You could make a stand-alone dbus, but you'd have to either give up on socket activation, or re-implement half of the systemd service manager to make the dbus server responsible for spawning socket activated daemons.

Both of those are totally possible and would probably be interesting for non-Linux systems.
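The activation handoff follows a small convention (the sd_listen_fds protocol): the manager pre-opens the listening socket, passes it as fd 3, and announces it via the LISTEN_FDS/LISTEN_PID environment variables. A hedged sketch of the service side, which is the part a stand-alone dbus would still need a manager to drive:

```python
import os
import socket

# Sketch of the sd_listen_fds convention from the service's point of view:
# if a manager activated us, adopt the pre-opened socket at fd 3; otherwise
# fall back to opening our own (the "start services manually" mode).
SD_LISTEN_FDS_START = 3  # first passed fd, per the systemd convention

def get_listen_socket():
    if (os.environ.get("LISTEN_PID") == str(os.getpid())
            and os.environ.get("LISTEN_FDS")):
        # Activated: the service manager already bound and listened for us.
        return socket.socket(fileno=SD_LISTEN_FDS_START)
    # Not activated: open our own unix socket (unbound here for the sketch).
    return socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)

print(get_listen_socket().family == socket.AF_UNIX)  # True when self-opened
```

The point of the convention is that the socket exists before the daemon does, so clients can connect immediately and the manager spawns the daemon on first traffic.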


It's also the case that there exist non-systemd based linuxes.

I mean, sure, the self-start is an important feature, but I'd rather have a dbus with which I need to start services manually than have no dbus at all :(

Also, why the fuck can't systemd run in chroot.


Probably because chroot does not expose some low-level knob that systemd relies on to work. They had to make their own implementation of that in systemd-nspawn; have a look at that.


Yes, but to use systemd-nspawn, your host must be running systemd to begin with.


Because of that deep integration with systemd-boot, of course. /s

(Yes, I know systemd at this point is just an umbrella brand; I'm no systemd hater.)


I use dbus on Alpine Linux which doesn't have systemd, so empirically yes it's possible to do that.


There is also ubus from OpenWrt for inter-process communication.

https://openwrt.org/docs/techref/ubus


Every application made with dbus in mind seems to break when I run it remotely.

For example, "eog" (eye of gnome) always either breaks or is incredibly slow.



