Dustin Sallings: Running Processes

mojombo · on Feb 28, 2010

As the author of god, I'll agree that if your only need is to ensure that a process is running, and that process isn't doing a lot of fancy daemonization or forking, then init is the best solution. In fact, on all our boxes at GitHub we use init to run god, because init is incredibly reliable; who better to ensure that god is actually running?

But if your needs extend to ensuring that your processes are well behaved with respect to memory usage, CPU usage, response to HTTP requests, etc., or any custom metric of your choice, then you have to go a bit further than the suggestions in this article. God is all about making it easy to keep everything running, no matter how complicated the setup or metrics may be.

dlsspy · on Feb 28, 2010

I've used god quite a bit (that's actually how I found github in the first place, I believe) and found many of its facilities useful.

I was mostly aiming for the stuff your system already does (and daemontools for when it doesn't do what it should already do). As this was becoming a rather large work of writing, I didn't want to spend some extra time to, you know, praise god.

Some of these things do go into the basics, though. At the very least, memory usage. Using rlimit for memory usage and death-of-child events for signaling restart gives really quick turnarounds for almost free.
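A minimal sketch of the rlimit-plus-death-of-child idea, assuming a POSIX system and Python. The names `limit_memory` and `supervise` are hypothetical, and RLIMIT_AS (address space) is used here as one plausible memory limit; the comment above does not say which rlimit was meant.

```python
import resource
import subprocess
import sys


def limit_memory(max_bytes):
    """Return a callable that caps the address space of the calling process."""
    def apply():
        # Assumption: RLIMIT_AS stands in for "memory usage" in this sketch.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    return apply


def supervise(argv, max_bytes, max_restarts=3):
    """Run argv under a memory cap, restarting it when it exits non-zero.

    Returns the list of exit statuses that were observed.
    """
    statuses = []
    for _ in range(max_restarts + 1):
        # preexec_fn runs in the child after fork and before exec,
        # so the limit applies only to the supervised process.
        child = subprocess.Popen(argv, preexec_fn=limit_memory(max_bytes))
        status = child.wait()  # blocks until the child dies; no polling
        statuses.append(status)
        if status == 0:
            break
    return statuses


if __name__ == "__main__":
    # Hypothetical usage: cap a command at 1 GiB of address space.
    print(supervise(sys.argv[1:], 1024 ** 3))
```

Because the parent simply blocks in `wait()`, a child that hits its limit and dies is noticed immediately, which is where the quick turnaround comes from.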

But rlimit for cpu utilization is less useful for long-lived processes. And for things that are entirely outside of the scope (the most common one for me is, "is my log still growing"), it's just not helpful at all. These are the types of process monitoring where god helps a lot.
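The "is my log still growing" check can be sketched roughly as below, assuming Python. The function name `log_is_growing` and its parameters are hypothetical, and this is not how god implements such a condition; it only compares the file size before and after a wait.

```python
import os
import time


def log_is_growing(path, interval=1.0, sleep=time.sleep):
    """Return True if the file at path got larger during the interval.

    The sleep parameter exists only so the wait can be replaced in tests.
    """
    before = os.stat(path).st_size
    sleep(interval)
    after = os.stat(path).st_size
    return after > before
```

A log that is rotated during the interval would shrink and be reported as not growing, so a real monitor would probably want to handle rotation separately.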

strlen · on Feb 28, 2010

Upvoting, because it's a great article.

My own personal preference (on systems where I don't have smf, upstart or launchd available, e.g., production RHEL/CentOS machines) is daemontools. My only quibble with it is that the logging system it comes with has a fairly insane feature of calling fsync() on every write (which, if you run a service that spews a lot of logs to STDOUT, can seriously degrade performance). I'd really like to investigate runit as an alternative: http://smarden.org/runit (the site is down at the moment, but here's Google's archive: http://74.125.155.132/search?q=cache:ZPOjz5Z7k9IJ:smarden.or...), but haven't yet.

dlsspy · on Feb 28, 2010

Yes, I meant to look at runit some last night, but it was unavailable. It does look pretty good.

silentbicycle · on Feb 28, 2010

I'm looking into runit, too. freedt (http://offog.org/code/freedt.html) is a similar alternative.

bensummers · on March 1, 2010

The Solaris section is very short, but I suppose that's because SMF is very good.

Although if you do start using it, make sure your daemons fork, contrary to the advice at the beginning of the article. I've encountered some race conditions and other bugs when you use 'transient' processes which don't fork. Solaris doesn't need to poll the process list with forking daemons because it uses Contracts which keep track of a set of forking processes.

dlsspy · on March 1, 2010

I have limited experience with smf (mostly, I work with people who think it's really awesome and who would punch me if I didn't at least mention it). It seems to do a lot of things, which, in turn, makes the descriptors a bit more verbose than launchd's.

If you have a good resource that introduces writing one of these, I'd love to link to it. In particular, the idea of forking seems to conflict with classic process monitoring.

I haven't been a solaris sysadmin in over a decade. I'm hopefully going to be taking care of some modern solaris boxes real soon now, though, so I'm looking forward to what all's changed.

bensummers · on March 4, 2010

Here's a good SMF resource, by the ever reliable c0t0d0s0!

http://www.c0t0d0s0.org/archives/4144-Solaris-Features-Servi...

lmz · on March 1, 2010

One of the positives of the shell-script init.d/rc.d scripts is that you can write your own start/stop steps, e.g., sending a shutdown command to a socket. Not all programs react well to a SIGTERM.

dlsspy · on March 1, 2010

I don't see that as a huge advantage for a few reasons:

1. We're mostly talking about startup (restart, etc...), not shutdown. You can still write your own start script and that's all I've found myself caring about most of the time.

2. Writing a shutdown script is still plenty possible, though in the worst case, you may need an adaptor (and only one of these needs to exist for all apps).

3. Processes that don't handle TERM signals properly should have bugs filed against them to do so.
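For point 3, handling TERM properly can be as small as the following sketch, assuming a single-threaded Python process on a POSIX system. `GracefulShutdown` and `run` are hypothetical names, not part of any tool discussed here.

```python
import os
import signal


class GracefulShutdown:
    """Record a TERM request so the main loop can stop at a safe point."""

    def __init__(self):
        self.requested = False
        # Must be installed from the main thread of the process.
        signal.signal(signal.SIGTERM, self._on_term)

    def _on_term(self, signum, frame):
        # Only set a flag here; the real cleanup happens in the main loop.
        self.requested = True


def run(work, shutdown):
    """Call work() repeatedly until a TERM signal has been received.

    Returns the number of completed work() calls.
    """
    done = 0
    while not shutdown.requested:
        work()
        done += 1
    return done
```

The handler never interrupts a unit of work halfway; the loop finishes the current call and then exits, which is usually what a supervisor sending TERM expects.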

To be honest, the software that's the biggest pain for me to deal with right now is one that requires a gentle shutdown. I can't always give it one, and my system sometimes crashes and can require a painful recovery.

I'm becoming more of a fan of crash-only software every day for this reason.

mojombo · on March 1, 2010

God can also use any kind of script you like to perform start/stop/restart. You can execute an external script or even write a Ruby block to interact with your managed process however you like.