I wish systemd logged information about the source of “transactions” (utcc.utoronto.ca)
127 points by zdw on Nov 24, 2021 | 77 comments



A peer of mine used to exit their SSH sessions with 'exit'. One time they had apparently already typed 'systemctl', probably intending to check the status of a service, changed their mind, and later tried to close the session by typing 'exit', actually executing 'systemctl exit'. This translated into a shutdown of the machine in question.

After piecing together what happened from the machine's logs and the bash history, I recommended simply exiting all programs/sessions with Ctrl+D. It works almost everywhere and would have prevented this exact issue.


I can do one better for exit. The Solaris kernel debugger KDB can be used at runtime for inspecting some stats and also to change some global configurable variables.

For whatever reason, if you type a variable/symbol, it assigns it the value 0. If you type 'exit', nothing happens immediately, but as soon as the next process exits, usually a few seconds to tens of seconds later, the entire system kernel panics on a null pointer dereference.

It kernel-panicked our production ZFS filer twice before I cottoned on. Newer releases special-cased “exit” not to do that.


That sounds like the most amazing UX I have ever encountered :)

For those familiar with Solaris, is there any reason they did it this way?

How can you possibly make assigning any symbol a value of 0 the default behaviour?


Solaris is built on some idiosyncrasies (speaking as someone who, as a Linux person, has to sysadmin Solaris systems).

For example, on our Solaris machines, after install, "reboot" does not do a clean reboot, it's a hard reset. If you want a clean reboot you need "init 6". Same story for "shutdown" and "init 5".

"killall" also kills all processes. Not the one you specified. Or more specifically it SIGKILLs all processes that have open files (ssh session and server go byebye). If you type "reboot" or "shutdown", this is in fact the binary that gets called do that.

Sadly, Solaris is also one of the few systems that support NFSv4 ACLs (Linux supports NFSv4 but not the ACL extension; TrueNAS has a patch for that).


Doesn't Ganesha support NFSv4 ACLs?


On Linux, only for non-filesystem backends (Ceph, Gluster, etc.).

The only fork of Ganesha that supports it on a proper filesystem is the TrueNAS fork, and that only on their ZFS fork that brings NFSv4 ACLs to Linux.

So really, no. You can't use NFSv4 ACLs with Ganesha on Linux outside of forks or scale-out data stores.


That's of course due to the principle of maximal astonishment, a time-honoured software design law.


Hey, it’s better than what C does for its variables ;)


because adb(1) did it that way.


Remember how on SPARC machines, if you turned off your laptop with the console cable connected, it powered the server down too?

That was a fun lesson…


Years ago I had the habit of shutting down my laptop with "sudo shutdown -h now" at the end of the workday. Until one day I did that accidentally in a live SSH session.

Since then I always shut down my machine using the GUI and I have Tmux configured with different colors for SSH sessions.


One time, when I was using my laptop to remote desktop into a computer hundreds of km away, I discovered that if I hit the power key on the laptop (yes, the laptop had a key on the keyboard for power, not a separate power button) while the remote session window was focused, it sent the signal over the wire and put the remote host to sleep instead. Whoops.

After that I looked up how to enable wake-on-lan and open up a port to be able to do that remotely.
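For the record, the usual incantations look roughly like this (a sketch; the interface name, the MAC address, and the wakeonlan tool are assumptions about the setup):

    # on the machine you want to wake: enable magic-packet wake-up on the NIC
    ethtool -s eth0 wol g
    # from another machine on the LAN (or via the forwarded port): send the packet
    wakeonlan aa:bb:cc:dd:ee:ff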


I don't have the best experience with Wake-on-LAN; it stopped working after a BIOS update or something. So now my desktop is set to turn on when power comes back, and it's connected to a WiFi-switchable power socket. I bought one with the usual app/cloud rubbish and flashed Tasmota on it.


I have some smart plugs I was planning to hack the network protocol of. I looked up Tasmota, but I couldn't figure out how to tell whether it would work with the random crap I bought on AliExpress, and if so, how to do it. Any advice?


Well, I went the other way: I bought known compatible devices. In my case the unofficial name is "OBI socket 2", from the (German) OBI home improvement store. It's about 10€ apiece.

That said, if your device is based on some kind of Espressif ESP32 module, you might be able to find the right four pins on the circuit board and find or cobble together a configuration to talk to the I/O ports. Hardware required is a (usually USB) RS232 interface at 3.3 volts, some medium-thin cables, screwdriver, soldering iron, probably multimeter to check things. The firmware flashing and WiFi setup are fairly independent of the I/O port configuration, so you can flash something that can bring up the WiFi connection and web interface and experiment from there.
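If it turns out to be one of the common ESP8266-based plugs (typically 1 MB flash), the flashing step itself is usually just esptool over that serial adapter; roughly this sketch, where the serial port and firmware file name are whatever your setup uses:

    # back up the stock firmware first, then write the Tasmota image at offset 0x0
    esptool.py --port /dev/ttyUSB0 read_flash 0x0 0x100000 backup.bin
    esptool.py --port /dev/ttyUSB0 write_flash 0x0 tasmota.bin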


There's a package called molly-guard which can help with that.


It's a must on my systems. After once accidentally rebooting the wrong server, I now force myself to go through the pain of confirming the hostname of the machine.

Also, it's possible to circumvent it when you have scripts that need to reboot the machine without interaction by issuing `reboot </dev/null` in the script.


I've done the opposite, back in the day. I used server A's keyboard & monitor to SSH to admin server B. When I was done with the system upgrade on server B, I rebooted it with control-alt-del. Except I rebooted server A.


Happened to me years ago. Fortunately that server had a power-saving bug that most of the time rebooted it instead of powering it down.


Almost entirely unrelated, but lately I've been working with old photos that had missing, incomplete or wrong EXIF data. While trying to assess and automate fixing the collection via scripts, I used the command line utility "exif" a lot. You can't imagine how many times I typed "exit somefile.jpg" and flinched when the terminal window just closed. Guess I should just have created an alias, but that feels like resigning yourself to your own stupidity. ;-)


Also, spam Ctrl+C if you're gonna issue a new command after being AFK for a while.


I recommend not using root privileges unnecessarily. `systemctl status` and most other querying commands don't need root, while the dangerous things like `systemctl exit` do.


Sure, but on some systems you can power off without being root. Has happened to me, don't remember the details. Maybe a polkit thing, because users are supposed to shut down their own laptop?


Yeah true, if the user is logged on via a physical tty or local X session (i.e. the policykit subject.local attribute == true) then in some distros they will get permission to shutdown or reboot.

They won’t have the permission if connected remotely though.


I can't say what they had in mind when typing `systemctl`. Even they couldn't. Because of the delay while the shutdown cleanly stopped the services, they had already forgotten that they had just exited an SSH session, so the connection between the machine being dead and typing `exit` was not obvious.

Maybe it was `systemctl status`. Maybe it was intended to be a `reload` (which would require elevated privileges).


GP's point was more that you shouldn't be able to casually do "systemctl exit" in a standard shell session. All privileged operations should require a sudo. One might be tempted to just do a full "sudo shell" to perform all systemctl operations, but GP's point is that many of the "observation" actions don't require sudo in the first place!

In the end, the user being able to accidentally run "systemctl exit" may be indicative of a policy issue (ie don't allow root logins).


All sudo actions on my machines are logged to a remote syslog server (and locally to /var/log/auth.log, which is rotated, compressed, and kept far longer than other logs). That certainly used to be standard. You can't log on as root, you have to log on as your own user and elevate to root (even if that's all you do with sudo), so there's a trail there.
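(The forwarding side is just standard syslog configuration; with rsyslog it's roughly the sketch below, assuming a reachable log host named "loghost":)

    # forward auth/authpriv messages (which include sudo's log lines) over TCP
    echo 'auth,authpriv.* @@loghost:514' | sudo tee /etc/rsyslog.d/90-remote-auth.conf
    sudo systemctl restart rsyslog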

This article suggests there are ways for programs to issue unaudited commands with elevated privileges to systemd.

"Systemd has a D-Bus interface that people can use, there's hardware events that may trigger a reboot, there are various programs that may decide to ask systemd to reboot the system, and under some circumstances systemd itself can decide that a particular, harmless looking process failure or 'systemctl' transaction actually will trigger a reboot through some weird chain of dependencies and systemd unit settings"

Complaining about systemd is as old as systemd, and borders on a religious war: a proxy fight between 'modern Linux' and 'old-school Unix' methods.


But it may not be. Maybe the policy is fine, and changing it based on one machine shutdown is not worth the costs.


I did not even know that systemctl exit exists. The man page says it's equivalent to poweroff (for the system manager not running in a container, i.e. the most common case).

Having 2 alternative commands for the same functionality is not a good design decision IMHO. But not the most central design decision for systemd.


there are already multiple commands for turning the computer off without systemd anyway


On average, systems are poo. The median modes around.

I prefer OpenRC, and I'll wave a flag or whatever, but to each their own.

WinNT4 was goat.


... eyeing you suspiciously and backing away ...


I'll just get off your lawn.


Ctrl+D is great, but it will still run a command if one has been typed out, so I always Ctrl+C and then Ctrl+D.


It does not for me. If the current command line is not completely empty it will not do anything (both with bash and fish). So if the terminal does not close after pressing Ctrl+D I will know that something is wrong and check more carefully.


What's wrong with just closing the putty window?


Maybe they're not using a GUI ssh client, and wanted to return to their local shell.


Alt-F4 is slightly more effort (requires hand contortions) compared to writing out a word and pressing Enter. Also, Ctrl+D is easier.


Ctrl+D/exit doesn't close the window in all cases, e.g. if you're SSHed into a remote machine it returns you to the local shell.


Similarly I discovered yesterday that the systemd service definition for auditd includes the `RefuseManualStop` option for this exact reason. When stopping (and thus also when restarting) the service via systemd, auditd is unable to log who shut it down, so it just disallows being stopped. (https://linux-audit.redhat.narkive.com/3weoVaZE/rational-beh...)

The workaround is to use the service command instead. Manually I usually do that anyway, muscle memory etc. But Ansible's service module will default to systemctl if it finds systemd. So there I had to add a "use: service".
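In shell terms the difference looks roughly like this (a sketch):

    # the unit advertises that it refuses manual stops
    systemctl show -p RefuseManualStop auditd
    systemctl stop auditd     # rejected because of RefuseManualStop=yes
    service auditd stop       # goes through the legacy init action instead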


It seems odd to me to go to the effort to tweet & blog about 'wishing' for something like this, but not to open a GitHub issue requesting it: https://github.com/systemd/systemd/issues

(At least, I can't find it searching there, and OP doesn't link one, so I assume there isn't one.)


I'm probably going to get a lot of hate for saying this, because anything anti-systemd tends to attract weird people who claim that you want bash scripts back, but: opening issues on systemd's GitHub page is often met with stoicism and reluctance on the part of the systemd developers.

There are countless examples of the systemd maintainers refusing to fix bugs or acknowledge that bugs exist.

Making a public statement could be more effective.


Maybe, personally though (if that were my experience and I was OP) I'd try an issue first, and then if it was shot down I could tweet & blog 'unfortunately, doesn't look like it's going to happen' or whatever, referencing my attempt.

It just seems like they went to about the same amount of effort, but on something with a much lower (IMO) probability of being actioned. (Not least because if it were, it'd probably go through roughly the same process anyway, such as through the issue now opened by a sibling comment to yours. Maybe a PR without an issue, but OP could (skills permitting) also have done that.)


Good point, created one https://github.com/systemd/systemd/issues/21497 (I'm not the author though)


Aside from the maintenance and uptime issues, OP is actually raising a very real security concern; often an attacker will reboot the machine to restart all of the logging processes or to load in an LKM. Knowing why and when (and which process) just forced a reboot is a very real requirement.

Systemd also seemingly randomly attacks my processes, and it's almost impossible to actually figure out why. (At least the kernel OOM killer actually logs "Out of memory, killed this process."[0])

0. https://linuxwheel.com/oom-killer-explained/
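When it does kill something, the evidence is usually findable, just scattered; roughly this (with a placeholder unit name):

    # messages about timeouts, watchdogs or the unit's resource limits land here
    journalctl -b -u mydaemon.service
    # and the unit state records how the main process ended
    systemctl show -p Result -p ExecMainStatus -p ExecMainCode mydaemon.service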


While the systemd documentation is generally pretty good (start by reading the original blog series as a primer; later, search for "systemd directives" using your favourite search engine), I have always found the transactions concept underdocumented. Does anyone have a good link?

Edit: Are transactions and jobs the same thing? Both are mentioned in the documentation here and there, without having a page or chapter of their own, AFAIK.


“Documentation” is “pretty good”?

“Start with” the “blog”?

Clearly, the two do not connect well … for systemd.


> ... as a primer

It's worked very well for me. Not seeing your disconnect here. Blog for concepts, man pages for specifics


I wish more developers wrote enough into the man page so you don't need to google things. More of bash's man page and less of i3's.


Don't forget all the stuff in GNU's info pages.


I'm of the opinion that "info" is for long form guides and "man" is for short references.

(replace "info" with some better reader than the default one though)


For instance, I wanted to know why my custom systemd unit file is causing my custom daemon to restart whenever the Ethernet cord gets unplugged or its netdev goes offline.
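In that situation the usual suspects are device/network dependencies pulled into the unit; something like this sketch (with a hypothetical unit and interface name) shows what the unit is actually tied to:

    # dependency properties that make a unit follow a netdev up and down
    systemctl show -p BindsTo -p PartOf -p Requires -p After mydaemon.service
    # and what depends on the device unit itself
    systemctl list-dependencies --reverse sys-subsystem-net-devices-eth0.device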


> Clearly, the two do not connect well … for systemd

It's linked from the man page. https://man7.org/linux/man-pages/man1/systemd.1.html

Unfortunately, the reading we still have to do ourselves. Not trying to be snarky here; it happens to me all the time that I haven't read something and then complain that it's hard to understand.


I think what OP means is that having good documentation should not require you to read a blog of any sort, even if it is linked from the man page. Either the documentation is good or you have to read the blog; you cannot have it both ways.


I disagree. For instance, in a man page I don't want to read the rationale for something (which the blog provides), just the raw "if you do this, then that will happen".

That would have its place in a GNU info book instead.


Plenty of man pages have a dedicated RATIONALE section. You don't have to read it, but the stuff is documented.


It's open source, you get what you pay for. Actually, you get much more, but you need to fill any gaps with your own efforts. Navigating the spread-out documentation is one of them.

While they are not comparable entities, I'd say the systemd documentation is in better shape than the Linux kernel documentation. (Not to negate the efforts of those who do work on kernel documentation, but to stress the huge areas with no documentation, or pretty obsolete documentation.)

Of course you can buy Linux (incl. user space) from the commercial players. I have only worked on one small project in my career that did, but I did not notice any better documentation that would have been worth paying for.


Paying for something doesn't guarantee good documentation. Example: I read many people complaining about Apple. [1]

My take is that pay vs free software and good vs bad documentation are orthogonal and you can be in any of the four quadrants.

[1] https://www.reddit.com/r/swift/comments/ljl6bq/we_were_so_fr...


> My take is that pay vs free software and good vs bad documentation are orthogonal and you can be in any of the four quadrants.

It's definitely not independent. Few people enjoy writing documentation, and even then, it takes a lot of time. Plus, writing good documentation is something you need to train for. Very few OS people want to spend their free time writing code and then spend just as much again on (usually boring) documentation and support tasks.

Companies have exactly the same problem, but they have the option to throw money at the problem. Sure, there are some OS projects with good documentation (usually sponsored by a company) and a lot of proprietary stuff without, but proprietary software usually has more financial backing and that's directly related to good documentation.


I generally agree with you but let me enumerate the FOSS projects that I use and that have good documentation.

Not explicitly company backed: Ruby, Python.

Backed by multiple companies: PostgreSQL, JavaScript.

Backed by one company: Ruby on Rails, Elixir and Phoenix, Nginx.

Don't know: Django, Apache Httpd.

Of course I might be wrong about the categories.


> paying for something doesn't guarantee good documentation.

Isn't that what I said when referring to commercial Linux distros? Of course it would be easy to continue enumerating.

But morally it entitles you much more to complain if you pay and it's poor quality than if you just get it for free with no promises.


The idea that good documentation requires payment is defeated by the huge amount of open source projects with excellent documentation.


I did not say that every open source project produces insufficient documentation. But some central ones that are hard to avoid do. There, complaining doesn't help; you just need to invest your own effort. Ideally you could contribute better documentation, but at the very least you have to make the effort to learn it for yourself.


I'm not familiar with the term transactions, but it doesn't sound like it's the same as jobs.

Jobs are externally visible. You can see them easily (e.g. systemctl list-jobs), and systemd provides an interface for them over D-Bus[1]. There's no similar interface for anything called a transaction.

From the documentation that does mention transactions, it sounds like transactions are internal to systemd. When systemd starts a unit, it works out the dependency graph and spins up a job for each unit that needs to be started before the originally requested unit can start. That would all be considered a single transaction, but it might spin out into dozens of separate jobs that get queued up.

As an example, when systemd starts on boot, all it really wants to do is successfully reach some target (e.g. multi-user.target). systemd starts from there and works backwards, building a dependency graph with every single unit that needs to start up as part of the boot sequence to reach that target. You could probably consider that a single transaction, but the full dependency graph would probably pull in hundreds of jobs.
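You can watch that happen with something like this sketch (the target name is made up):

    # queue a unit without waiting for it, then look at the jobs the
    # resulting transaction put on the queue
    systemctl --no-block start some-slow.target
    systemctl list-jobs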

I don't work on systemd or anything so this isn't canon.

[1] https://www.freedesktop.org/wiki/Software/systemd/dbus/


Is there a good resource for people who are used to non-systemd systems (something like a gotcha list)? I keep running into weird situations where I end up finding out systemd is somehow responsible for my woes. The last time that happened was when I changed /etc/fstab but somehow the old mounts kept being remounted; I wasted an hour before I found out I had to reload some systemd service.
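(The step that's usually missing in that situation: systemd turns fstab entries into .mount units via a generator, so it keeps using the old definitions until you tell it to regenerate them. Roughly:)

    # after editing /etc/fstab
    systemctl daemon-reload
    mount -a    # or restart the specific foo.mount unit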


Related incident: "Systemd killing processes each minute at second 27" (German):

https://blog.uberspace.de/systemd-anzuenden/


Ran into a similar problem, trying to work out why systemd was stopping my service.


systemd just has so many reasons to kill processes, and invents new ones with new releases: timeouts for things it thinks should be short-lived, resource limits, service isolation and sandboxing settings, etc. They are mostly documented, but you need to know where to look. While I have come to like some aspects of systemd, debugging why things die for apparently no reason after systemd was upgraded has eaten many of my workdays.


A new one as of this morning is killing STOPped processes it doesn't like. I have no idea why, or how to prevent it. All I did was update my Debian 'sid' and here we go.


> All I did was update my 'sid' debian and here goes

As it turns out it's called "unstable" for a reason


Hate to tell you, but I've been using sid for well over 20 years, and it is usually more 'stable' than most distros out there.


systemd is why I love OpenBSD.


The acronym POLA (principle of least astonishment) translates to peace, where systemd creates havoc and globally burns many hours having people investigate what the heck is going on in this almost-binary blob of spaghetti.


Does OpenBSD log what source caused the machine to reboot?



A very slimmed-down fork would possibly not be that bad.


Don't forget Artix, Void, MX Linux, FreeBSD, Alpine, Gentoo/Funtoo..



