Yes, for tracking processes and reliable resource control. Prior to cgroups, in Google's Borg cluster management daemon the best strategy I was able to come up with for reliably and efficiently tracking all the processes in a job was:
- assign each job a supplementary group id from a range reserved for Borg, and tag any processes that were forked into that job with that group id (first sketch after this list)
- use a kernel netlink connector socket to follow PROC_EVENT_FORK events to find new processes/threads, and assign them to a job based on the parent process; if the parent process wasn't found for some reason, query the process's groups in /proc to find the Borg-added group id and determine which job it belongs to (second sketch below)
- if the state gets out of sync (due to a netlink queue overflow, or a daemon restart), do a full scan of /proc (generally avoided, since the overhead of continually scanning /proc got really high on a busy machine; third sketch below)
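The tagging step works because supplementary groups are inherited across fork() and execve(), and an unprivileged process can't call setgroups() to shed them. A minimal sketch of what the spawn path could look like (the gid value is hypothetical, and a real daemon would append the tag to the job user's normal groups rather than replacing them):

    #include <grp.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define BORG_JOB_GID 60042  /* hypothetical: one gid per job from a reserved range */

    /* Fork a job's first process and stamp it with the job's marker gid.
     * Needs CAP_SETGID; descendants can't drop the supplementary group,
     * so the tag sticks to the whole process tree. */
    static pid_t spawn_tagged(char *const argv[]) {
        pid_t pid = fork();
        if (pid == 0) {
            gid_t groups[] = { BORG_JOB_GID };
            if (setgroups(1, groups) != 0) {
                perror("setgroups");
                _exit(127);
            }
            execv(argv[0], argv);
            perror("execv");
            _exit(127);
        }
        return pid;
    }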
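The fork-event side used the kernel's proc connector. Roughly like this minimal standalone listener (assuming root/CAP_NET_ADMIN, with error handling trimmed; this is the standard proc connector pattern, not Borg's actual code):

    #include <sys/socket.h>
    #include <linux/netlink.h>
    #include <linux/connector.h>
    #include <linux/cn_proc.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        int sock = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
        struct sockaddr_nl addr = { .nl_family = AF_NETLINK,
                                    .nl_groups = CN_IDX_PROC,
                                    .nl_pid = getpid() };
        bind(sock, (struct sockaddr *)&addr, sizeof(addr));

        /* Subscribe to process events. */
        struct {
            struct nlmsghdr hdr;
            struct cn_msg msg;
            enum proc_cn_mcast_op op;
        } __attribute__((packed)) req;
        memset(&req, 0, sizeof(req));
        req.hdr.nlmsg_len = sizeof(req);
        req.hdr.nlmsg_type = NLMSG_DONE;
        req.hdr.nlmsg_pid = getpid();
        req.msg.id.idx = CN_IDX_PROC;
        req.msg.id.val = CN_VAL_PROC;
        req.msg.len = sizeof(enum proc_cn_mcast_op);
        req.op = PROC_CN_MCAST_LISTEN;
        send(sock, &req, sizeof(req), 0);

        char buf[4096];
        for (;;) {
            ssize_t n = recv(sock, buf, sizeof(buf), 0);
            if (n < 0)  /* ENOBUFS here means a queue overflow: time to rescan /proc */
                break;
            for (struct nlmsghdr *h = (struct nlmsghdr *)buf;
                 NLMSG_OK(h, n); h = NLMSG_NEXT(h, n)) {
                struct cn_msg *cn = NLMSG_DATA(h);
                struct proc_event *ev = (struct proc_event *)cn->data;
                if (ev->what == PROC_EVENT_FORK)
                    printf("fork: parent %d -> child %d\n",
                           ev->event_data.fork.parent_tgid,
                           ev->event_data.fork.child_pid);
            }
        }
        return 0;
    }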
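And the rescan fallback is just a walk over /proc, recovering each pid's job from the Groups: line in its status file (same hypothetical gid range as above):

    #include <ctype.h>
    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define BORG_GID_MIN 60000  /* hypothetical reserved range */
    #define BORG_GID_MAX 60999

    /* Walk /proc once and report which job (marker gid) each pid belongs to. */
    static void rescan_proc(void) {
        DIR *d = opendir("/proc");
        if (!d)
            return;
        struct dirent *de;
        while ((de = readdir(d)) != NULL) {
            if (!isdigit((unsigned char)de->d_name[0]))
                continue;                        /* not a pid directory */
            char path[288];
            snprintf(path, sizeof(path), "/proc/%s/status", de->d_name);
            FILE *f = fopen(path, "r");
            if (!f)
                continue;                        /* process exited mid-scan */
            char line[1024];
            while (fgets(line, sizeof(line), f)) {
                if (strncmp(line, "Groups:", 7) != 0)
                    continue;
                /* Any gid inside the reserved range identifies the job. */
                char *p = line + 7;
                for (;;) {
                    char *end;
                    long gid = strtol(p, &end, 10);
                    if (end == p)
                        break;                   /* no more numbers */
                    if (gid >= BORG_GID_MIN && gid <= BORG_GID_MAX)
                        printf("pid %s -> job gid %ld\n", de->d_name, gid);
                    p = end;
                }
                break;                           /* only one Groups: line */
            }
            fclose(f);
        }
        closedir(d);
    }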
That way we always had the full list of pids for a given group. To kill a job, nuke all the known processes and mark the group id as invalid, so any racy forks show up with a stale Borg group id and get killed immediately.
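In pseudocode-ish C, the kill path and the fork handler cooperate something like this (the names and the flat pid array are illustrative, not the real data structures):

    #include <signal.h>
    #include <stdbool.h>
    #include <sys/types.h>

    #define MAX_PIDS 4096

    /* Illustrative per-job state; the real daemon's bookkeeping was fancier. */
    struct job {
        gid_t gid;
        bool  gid_stale;        /* set before killing; checked on fork events */
        pid_t pids[MAX_PIDS];
        int   npids;
    };

    /* Invalidate the gid first, then nuke every known pid. A fork that
     * races with the kill produces a child whose tag is already stale. */
    static void kill_job(struct job *job) {
        job->gid_stale = true;
        for (int i = 0; i < job->npids; i++)
            kill(job->pids[i], SIGKILL);
        job->npids = 0;
    }

    /* Called from the PROC_EVENT_FORK handler once the child's job is known. */
    static void on_fork(struct job *job, pid_t child) {
        if (job->gid_stale) {
            kill(child, SIGKILL);   /* straggler from a racy fork */
            return;
        }
        if (job->npids < MAX_PIDS)
            job->pids[job->npids++] = child;
    }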
This approach might have had trouble keeping up with a really energetic fork bomb, but fortunately Borg didn't generally have to deal with actively malicious jobs, just greedy/misconfigured ones.
Once we'd developed cgroups, this got a lot simpler.
cgroups was extremely useful for a system I built that ran on Borg, Exacycle, which needed to reliably "kill all child processes, recursively, below this process". I remember comparing the old /proc scanner with the new cgroups way of getting the list of pids below a process, and realizing, belatedly, that UNIX had never really made this easy.
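With cgroups the whole dance collapses to reading one file. Something like this (the path is illustrative; and on cgroup v2, since Linux 5.14, writing "1" to cgroup.kill does the whole thing in the kernel):

    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Kill everything in a cgroup, rereading cgroup.procs until it's empty:
     * a racing fork lands in the same cgroup, so it shows up on the next pass. */
    static int kill_cgroup(const char *cgroup_dir) {
        char path[512];
        snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir);
        for (;;) {
            FILE *f = fopen(path, "r");
            if (!f)
                return -1;
            int killed = 0;
            pid_t pid;
            while (fscanf(f, "%d", &pid) == 1) {
                kill(pid, SIGKILL);
                killed++;
            }
            fclose(f);
            if (killed == 0)
                return 0;    /* cgroup drained */
        }
    }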
No, because multiple jobs being run by the same end-user could share data files on the machine, in which case they needed to share the same uid. (Alternatively we could have used the extra-gid trick to give shared group access to files, but that would have involved more on-disk state and hence been harder to change, versus the job tracking, which was more ephemeral.) It's been a while now, but I have a hazy memory that when a job was the only one with that uid running on a particular machine, we could make use of that and avoid needing to check the extra groups.