Real-time notifications from systemd to Slack

lima · on March 13, 2017

Why is everyone okay with relying on Slack for sensitive communication or even business critical stuff like monitoring?

It lacks everything a good alerting system has (acknowledgements, fine-grained notifications...).

Even those addresses in the stacktraces could be sensitive in other situations since they contain ASLR offsets.

notheguyouthink · on March 13, 2017

What would you suggest? We're in need of an alerting platform and making use of Slack, not because it's flawless or anything, but because we're already paying for it and it's convenient for the mass junk. I.e., not strictly mission-critical alerts.

What medium might be better than Slack?

(to be clear, i'm asking, not saying Slack is good for this)

amyjess · on March 13, 2017

Zabbix. It's open-source and pretty flexible. I maintain my company's Zabbix installation, and I've come to appreciate it (and 3.x finally has a non-ugly UI!).

lima · on March 13, 2017

We use Nagios + CheckMK, but I believe neither this nor Zabbix solve the alerting part.

Our solution is a custom notification broker that decides whom to alert and then waits for an acknowledgement. It uses different backends including our company chat.

Not complicated at all, just 100 lines of Python code that contain the business logic.

Anything that relies on a single medium is unsuitable for anything but unimportant alerts. What if Slack goes down for 2 hours? Unlikely, but definitely possible.

This ensures that every alert is explicitly acknowledged by someone, and that unimportant alerts are quickly forgotten without wondering whether someone handled them or not.

We have different applications sending alerts, not just Nagios (because Nagios sucks at processing events as opposed to states), and it would quickly become unmanageable without some sort of middleware.
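A broker along these lines might look roughly like the sketch below. It is not lima's code, which was not published in this thread; the `Broker` class, the `on_call` escalation list, and the injected `wait_for_ack` callback are all hypothetical names chosen for illustration. It shows the two ideas described above: fan out over several backends so one medium being down does not lose the alert, and keep escalating until someone acknowledges.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A backend is any callable that delivers a message to one recipient,
# e.g. company chat, SMS, e-mail. (Hypothetical interface.)
Backend = Callable[[str, str], None]  # (recipient, message) -> None


@dataclass
class Broker:
    backends: List[Backend]
    on_call: List[str]  # escalation order (assumption: a simple ordered list)
    acked: Dict[str, str] = field(default_factory=dict)  # alert id -> who acked

    def notify(self, alert_id: str, message: str, level: int = 0) -> str:
        """Send the alert to the person at this escalation level via every backend."""
        recipient = self.on_call[min(level, len(self.on_call) - 1)]
        for send in self.backends:
            try:
                send(recipient, f"[{alert_id}] {message}")
            except Exception:
                # One medium being down must not block the others.
                continue
        return recipient

    def acknowledge(self, alert_id: str, who: str) -> None:
        self.acked[alert_id] = who

    def escalate_until_acked(
        self, alert_id: str, message: str, wait_for_ack: Callable[[str], None]
    ) -> bool:
        """Walk the on-call list until someone acknowledges; return whether anyone did."""
        for level in range(len(self.on_call)):
            self.notify(alert_id, message, level)
            wait_for_ack(alert_id)  # e.g. sleep or poll; injected so it can be swapped out
            if alert_id in self.acked:
                return True
        return False
```

In a real deployment `wait_for_ack` would presumably block for a timeout while an acknowledgement arrives through one of the backends; here it is left to the caller.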

daveguy · on March 13, 2017

Is it something where the business logic part could be easily abstracted / separated? It sounds like an interesting and useful yet simple tool. The open source community can always use more of those.

Edit: or maybe something like a blog post to describe the structural details.

lima · on March 13, 2017

Would need some clean-up but sure, why not. Ask again in a month or so :)

deathanatos · on March 13, 2017

Actually, the PagerDuty integration for Slack allows in-Slack acknowledgement (and/or resolution) of alerts.

(we used this in addition to, not in lieu of, the PagerDuty dashboard itself, as well as mobile notification via PagerDuty, and Nagios's web UI.)

> fine-grained notifications

what do you mean by this, out of curiosity?

lima · on March 13, 2017

With PagerDuty on top it's fine from an operational point of view (you'll still have to trust both PagerDuty and Slack to keep your data secure).

> fine-grained notifications

A common approach is having an "alerts" channel that grabs everyone's attention, even though most alerts are only relevant for a small subset of people. A company-wide #general channel is bad enough, and an #alerts channel only makes it worse.

You can totally do this with Slack by having something like PagerDuty in-between. We use a custom-written alerting broker that makes it easier to correctly handle some of the more complicated cases but it's pretty much the same idea.

pavel_lishin · on March 13, 2017

If your company is large enough to have alerts that aren't universally applicable, it makes sense to create different channels for different alerts, grouped however you want.

lima · on March 13, 2017

Totally agree, it's exactly what we do. The broker sits in between and decides which alert goes where.
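The "which alert goes where" decision can be as small as a pattern table. A minimal sketch, assuming alerts carry a source name; the patterns and channel names below are made up for illustration and do not come from anyone's actual setup:

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: the first matching pattern wins.
ROUTES = [
    ("db-*", "#alerts-database"),
    ("web-*", "#alerts-frontend"),
]
DEFAULT_CHANNEL = "#alerts-misc"


def route(source: str) -> str:
    """Return the channel an alert from `source` should be posted to."""
    for pattern, channel in ROUTES:
        if fnmatchcase(source, pattern):
            return channel
    return DEFAULT_CHANNEL
```

Keeping the table as data rather than code means adding a team or a channel is a one-line change.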

Xylakant · on March 13, 2017

For a previous setup we built a small script that attaches to dbus and just listens to the notifications and pushes interesting events to our logging service. Once you understand how dbus works, that's actually pretty neat and catches crashes as well as intentional status changes.

arjie · on March 13, 2017

Can anyone who's used systemd share what it's like to have it for process management? We use monit, and it's all right, but I occasionally wonder.

Does anyone run it also as a separate user to manage certain applications? So that you could have certain people log on and operate some things and others operate other things?

geocar · on March 13, 2017

Yes.

I used to use runit/daemontools/inittab. My favourite thing about systemd is that it is increasingly available, and while it has its faults, it has instances (macros you can use to kick off a fleet of services easily) and pretty good isolation features.

It also has a "systemctl-over-ssh" feature which is quite nice, and which allows you to use an .ssh/authorized_keys file instead of sudo to allow access to certain administrative tasks.

arjie · on March 13, 2017

Interesting. Thank you! Would you mind sharing some of the faults? By instances, are you referring to template unit files like this[0]? Do you use it with the `--user` option or do you just access it with `sudo systemctl`?

0: https://fedoramagazine.org/systemd-template-unit-files/

geocar · on March 13, 2017

Yes, I do use --user where possible. Because resource limits are per-user on UNIX, I tend to run every program and every instance of a service as a separate user. cgroups/containers make this less necessary, but old habits die hard.

The biggest systemd fault is one of tooling, and that just comes from a project ambitious enough to try and own its own ecosystem.

With runit/daemontools, you "debug" a service by typing:

    /path/to/service/run

into the terminal. Everything is sensible: grepping the output just works. You can use familiar tools like strace/dtruss to "look inside". A unix user can easily leverage their experiences to administer a small unix site, and grow their experiences into larger sites.

With systemd, you run a unit file by copying it into a directory and running some (magic) commands. You need training/internet search to learn those commands. If your unit file doesn't work, you need more training/internet search, but systemd is still so new that your best bet may be to read the systemd source code, or insert a hacky "sleep 30" at the top of your start script and try to race and strace it in another window. Stuff like that.

Want to upgrade your systemd unit? You can't run it along-side an existing version of itself unless you give it a new name, which changes how journalctl can pick up the results. Versioning in the unit name feels wrong, and nobody does this yet which still currently breaks live upgrades where the unit changes.

Eventually the tooling will get better, but then we'll have a way to read files, and a way to read systemd files; we'll have a way to run programs, and we'll have a way to run systemd programs; we'll have a way to "test" units, and a side-by-side mode, and so on.

arjie · on March 13, 2017

Ah, hmm. Thank you very much for sharing. Looks like it's got some way to go before I can use it instead of `monit`. This helps a lot.

thwarted · on March 13, 2017

> allows you to use an .ssh/authorized_keys file instead of sudo to allow access to certain administrative tasks.

Another way to do this is pam_ssh_agent_auth. Been using it to authenticate to sudo for years on systems that only maintain keys and no passwords.

hornetblack · on March 13, 2017

systemd has a --user option, which makes it a service manager for a particular user, if that's what you mean. I used it for some time to run an emacs daemon (until I found that it closed between logouts and changed to using the system manager instead).

tophercyll · on March 13, 2017

You can make user level systemd instances run at startup and stick around regardless of login/logout.

  loginctl enable-linger <username>

This enables some nice deployment strategies that don't require root.

hornetblack · on March 13, 2017

Thanks. I figured this was possible. I just never got around to looking at loginctl's options.

sandGorgon · on March 13, 2017

systemd is far superior to anything else out there. It matured at roughly the same time as docker.

However it doesn't support being the CMD in a Dockerfile. Which is why it's not very common in software deployment scenarios in the post-container world.

For older deployments, it may not be worth switching to systemd because the base OS may not be compatible.

So it's kind of a catch-22.

If you are on bare metal, systemd is much preferable to runit/supervisord.

chimeracoder · on March 13, 2017

> However it doesn't support being the CMD in a Dockerfile. Which is why it's not very common in software deployment scenarios in the post-container world.

Er, that's only half-true. systemd isn't great for running as PID 1 inside a Dockerfile, but that's because Docker already monitors PID 1[0], and systemd can be used to monitor your container itself.

In other words, think of containers as individual applications that you want to monitor, and systemd can be used either to monitor them or even to run the containers directly. (Yes, systemd can even run Docker containers directly, without Docker![1])

[0] you are using exec mode, right?

[1] https://chimeracoder.github.io/docker-without-docker/#1

sandGorgon · on March 14, 2017

Actually I'm not sure we are referring to the same thing here.

I'm referring to pid1 inside the docker container. systemd does not run inside the container as pid1 very easily.

Take a look at this - https://github.com/docker/docker/pull/13525

I think your presentation was about replicating docker functionality using systemd-nspawn... which is pretty cool... but it's not the same as what I'm talking about.

I'm referring more generally to production decisions with docker. Also read this https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zomb...

chimeracoder · on March 14, 2017

> Actually I'm not sure we are referring to the same thing here. I'm referring to pid1 inside the docker container. systemd does not run inside the container as pid1 very easily.

We are - I'm saying that you don't actually want to run systemd as PID 1 inside a Docker container; the Docker model is built around the container being an application unit, not a system unit.

But if you want to have isolated system(d) units, you can use systemd to get that behavior inside containers. In that case, you'll want to use systemd to run your containers instead of Docker, because systemd's tooling is container-aware (ie, you can have integration between units that run on your host and units that run inside a machine - 'machine' being the systemd term for 'container', in this case).

sandGorgon · on March 14, 2017

That is a simplification - even Facebook runs sshd inside its containers.

I know what you are saying - that an atomic unit of work is the program itself. But we run stuff under supervisord even if it is a single program. It helps us to make quick debugging changes to scripts, etc. and "restart" them without restarting the container.

In theory it seems the same - in practice it is not. This is the reason for the existence of tons of different init tools for docker.

BTW, I had trouble understanding what you meant because you are constantly moving from docker-as-an-application-unit concept (which is reasonably true) to systemd-nspawn-is-better-than-docker (which is something I am not generally opinionated about).

stephenr · on March 13, 2017

> It matured at roughly the same time as docker.

Docker, "matured", in past tense? Proof of time travel right there!

bradmwalker · on March 13, 2017

Systemd works inside LXC and systemd-nspawn.

skarap · on March 13, 2017

What if slack is down/unreachable? Won't it keep the service from starting?

lima · on March 13, 2017

If it makes slacktee fail with an error code, yes, it would prevent startup. The fix is a - sign:

    ExecStartPre=-/usr/bin/foo

With the prefix, a failure of that command is recorded but otherwise treated as success, so it wouldn't prevent the service from starting.

orf · on March 13, 2017

What about using sentry and getting better stack traces with local variables as well as slack notifications?

Stack traces are fine and all, but without locals it's often hard to track down the issue.

hendzen · on March 13, 2017

This will also notify if you intentionally shut down the service.

If you want to only see failures, you can use an OnFailure directive.

qznc · on March 13, 2017

I'm currently building something similar for matrix instead of slack. My idea was to clone the mail(1) interface, but tee(1) is also a nice idea!

In the process, I also discovered sendxmpp, which provides mail(1) for XMPP. It does not support encryption and is written in Perl, so I am building my own.

praveshjain · on March 13, 2017

I don't understand. How is it better than using HTTP POST to post to the slack channel yourself?

jon-wood · on March 13, 2017

It doesn't require changing the underlying systems that are being managed, giving it a lot more flexibility. So long as the service can log to STDOUT it can be integrated with Slack.
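To make the tee(1)-style idea concrete, here is a rough Python sketch. It is not how slacktee itself is implemented; `post_to_webhook` and `tee` are hypothetical helpers, and the only Slack-specific assumption is that an incoming webhook accepts a JSON body with a `text` field. The service keeps writing to STDOUT as usual, and each line is also forwarded.

```python
import json
import urllib.request
from typing import Callable, Iterable, TextIO


def post_to_webhook(url: str, text: str) -> None:
    """POST one message to a Slack incoming webhook as a JSON payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=5).close()


def tee(lines: Iterable[str], out: TextIO, send: Callable[[str], None]) -> int:
    """Copy each line to `out` and forward it to `send`; return the line count."""
    count = 0
    for line in lines:
        out.write(line)
        try:
            send(line.rstrip("\n"))
        except OSError:
            # Chat being unreachable should not interrupt the service's output.
            # (urllib's URLError and HTTPError are OSError subclasses.)
            pass
        count += 1
    return count
```

Wired to a pipe, this would be called as `tee(sys.stdin, sys.stdout, lambda t: post_to_webhook(url, t))`, with `url` being whatever webhook URL your workspace issued.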

praveshjain · on March 13, 2017

Oh I get it. Thanks.

Curious42 · on March 13, 2017

Is Systemd a nohup equivalent? If it is, then what's the benefit of using one over the other?

fnord123 · on March 13, 2017

nohup just runs a process that will ignore SIGHUP.

systemd can also run processes that ignore SIGHUP. But systemd does a lot of things that nohup doesn't do. Please don't attempt to use nohup as a daemon management system for anything but the noddiest of tasks.
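For comparison, the core of what nohup does fits in a few lines. This is a POSIX-only sketch with a hypothetical helper name, `run_ignoring_sighup`; the real nohup also redirects output to nohup.out when attached to a terminal, which is left out here. Note what is missing: nothing watches the child or restarts it.

```python
import signal
import subprocess
from typing import List


def _ignore_sighup() -> None:
    # Ignored signals stay ignored across exec, so the program started
    # below inherits this disposition.
    signal.signal(signal.SIGHUP, signal.SIG_IGN)


def run_ignoring_sighup(argv: List[str], **popen_kwargs) -> subprocess.Popen:
    """Start argv with SIGHUP ignored, roughly what `nohup argv...` does."""
    # preexec_fn runs in the child between fork and exec (POSIX only).
    return subprocess.Popen(argv, preexec_fn=_ignore_sighup, **popen_kwargs)
```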

If you're going to use anything, you're probably best choosing from this list:

https://en.wikipedia.org/wiki/Operating_system_service_manag...

JdeBP · on March 13, 2017

... which misses out quite a lot, from s6 through perp to initng.

koolba · on March 13, 2017

While nohup is used to daemonize a single process, systemd does that and a ton of other stuff. It's a replacement init system that includes everything from daemonization, log handling, auto restarts, and by this time next year will probably make coffee for you as well.

bandrami · on March 13, 2017

This is what I've never understood about systemd. Somebody will ask "doesn't inetd already do that?" or "doesn't nohup already do that?" or "doesn't runit already do that?" and the answer is always "yes, but systemd does a bunch of other things something else already did too".

shrug

Its setup is way, way too brittle for me, but I guess the QR codes are kind of neat.

deathanatos · on March 13, 2017

My requirements (for server side daemons) are approximately: start a daemon as a child, and monitor it; if it dies, restart it. Ideally, give me a command-line interface to the managing daemon to gracefully start/stop the child. nohup doesn't do this, making it simply not the right tool for the job. I've never used runit, but it looks closer.
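The "start a child, monitor it, restart it if it dies" part of those requirements can be sketched as a loop. This is only an illustration with a hypothetical `supervise` function; it has no command-line interface for graceful start/stop, and real supervisors (systemd, runit, and so on) handle many more cases such as signals, logging, and rate limiting.

```python
import subprocess
import sys
import time
from typing import List, Optional


def supervise(
    argv: List[str], max_restarts: Optional[int] = None, backoff: float = 1.0
) -> int:
    """Run argv as a child and restart it whenever it exits.

    Returns the number of launches once max_restarts is exceeded;
    with max_restarts=None it loops forever.
    """
    launches = 0
    while True:
        child = subprocess.Popen(argv)
        launches += 1
        code = child.wait()  # blocks until the child exits
        print(f"child exited with status {code}", file=sys.stderr)
        if max_restarts is not None and launches > max_restarts:
            return launches
        time.sleep(backoff)  # avoid a tight restart loop if the child dies at once
```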

bandrami · on March 13, 2017

Take a look at PIES[1]. I've switched most of my production systems to it. I especially like that it's inittab-compatible.

[1]: http://www.gnu.org.ua/software/pies/

nodesocket · on March 13, 2017

Interesting service scaledrone. How are you different than pusher, pubnub, or hydna.com?

raresp · on March 13, 2017

At first glance I can say it's cheaper. And this is important for startups and small/medium businesses.

They also provide examples for a lot of languages, I guess I'm going to try their service.

Good job, Scaledrone!

SEMW · on March 13, 2017

Shameless plug: if you want something almost as cheap but without having to drop the ability to subscribe from all client libs (not just js), queryable history, presence, connection state recovery, webhooks, stats, firehose to queues, etc, then have a look at https://ably.io. (disclaimer: I work there)

lorenzhs · on March 13, 2017

Not to be overly nitpicky here but it's spelled "systemd", not "SystemD". From its website: Yes, it is written systemd, not system D or System D, or even SystemD. And it isn't system d either. Why? Because it's a system daemon, and under Unix/Linux those are in lower case, and get suffixed with a lower case d. And since systemd manages the system, it's called systemd. It's that simple. https://www.freedesktop.org/wiki/Software/systemd/

maccard · on March 13, 2017

if you're not being overly nitpicky, what are you being? if you'd clicked on the article you would have seen that the mistake was only in the HN submission title, and not in the article's title.

Raidok · on March 13, 2017

Both of them are fixed now.

tannhaeuser · on March 13, 2017

Insisting on Unix naming conventions is odd when SystemD tramples over more important Unix system concepts left and right.