
I've encountered a ton of these at my previous and current job, in Linux.

The primary problem is that the forms of ps output that show the full command line (from /proc/$pid/cmdline) have to read memory from the target process. That requires, at least, a read lock on the process's memory map semaphore (mmap_sem), and plenty of other things want mmap_sem too: memory allocation that changes the address space (write lock), page faults (read lock, so the kernel can figure out what to fault in), etc. In particular, if the process is in the middle of mapping or faulting in a mapped page from slow storage - such as NFS or a network-backed block device provided by a hypervisor - then it can sit holding mmap_sem for arbitrarily long.

Usually the process taking a read lock on its own mmap_sem, or someone else taking a read lock, is harmless, since it's a reader-writer lock and there can be multiple readers. But as soon as a writer declares an intent to take a write lock, further readers are blocked to avoid writer starvation, which means a single slow reader plus one waiting writer will block all further readers. See http://blog.nelhage.com/post/rwlock-contention/ for some excitement there.
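To make the reader/writer interaction concrete, here is a toy, single-threaded model of a writer-preferring lock. It is only a sketch of the policy described above, not the kernel's actual rwsem code; the class and method names are made up for illustration.

```python
class WriterPreferringRWLock:
    """Toy model: methods report whether a request would be granted right now."""

    def __init__(self):
        self.readers = 0            # readers currently holding the lock
        self.writer_active = False  # True while a writer holds the lock
        self.writers_waiting = 0    # writers queued behind current readers

    def try_read(self):
        # New readers are refused while a writer holds OR is waiting for the lock.
        if self.writer_active or self.writers_waiting:
            return False
        self.readers += 1
        return True

    def request_write(self):
        # Granted at once only if the lock is completely free; otherwise queue.
        if self.readers == 0 and not self.writer_active:
            self.writer_active = True
            return True
        self.writers_waiting += 1
        return False

    def release_read(self):
        self.readers -= 1
        self._promote()

    def release_write(self):
        self.writer_active = False
        self._promote()

    def _promote(self):
        # Hand the lock to a queued writer once the last holder is gone.
        if self.readers == 0 and not self.writer_active and self.writers_waiting:
            self.writers_waiting -= 1
            self.writer_active = True


if __name__ == "__main__":
    lock = WriterPreferringRWLock()
    print("slow reader (page fault on NFS):", lock.try_read())
    print("writer (mmap) arrives:", lock.request_write())
    print("ps wants a read lock:", lock.try_read())
```

In this model the third request is refused even though only a reader holds the lock, which is the situation ps ends up in.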

You can generally read /proc/$pid/comm (short command name) and /proc/$pid/status, which both just reference info in the kernel's task_struct, and don't require taking a lock on the userspace memory map. You can also read /proc/$pid/syscall, which will tell you what syscall it's in and the numeric arguments, and usually you can read /proc/$pid/stack, which tells you the kernel stack of the process. (Though I have recently found that that one also takes a lock, but fortunately one that's much less frequently contended.) If you're trying to make sense of why a system is stuck, and ps aux is unresponsive, my go-to is grep 'disk sleep' /proc/*/status, followed by reading the corresponding /proc/$pid/stack. If you're lucky, you'll see which module is slow (filesystem / block I/O? networked filesystem? FUSE? etc.) and can try to address that. Or perhaps you'll see several processes trying to get a lock on something and one that looks like it's holding a lock and stuck doing work; if you can address (perhaps kill) that process, the system might make progress.
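The grep-then-read-stack routine can be sketched in Python roughly as follows. The function names and the proc_root parameter are my own (the parameter is only there so you can point it at a fake tree for testing), and reading /proc/$pid/stack usually needs root, so the sketch treats an unreadable stack as None.

```python
import os


def parse_state(status_text):
    """Return the one-letter state code from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def read_text(path):
    """Read a small proc file; return None if it is missing or unreadable."""
    try:
        with open(path, "r", errors="replace") as f:
            return f.read()
    except OSError:
        return None


def find_disk_sleepers(proc_root="/proc"):
    """Return [(pid, comm, stack_or_None)] for processes in state D (disk sleep)."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []
    results = []
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        status = read_text(os.path.join(base, "status"))
        if status is None or parse_state(status) != "D":
            continue
        comm = (read_text(os.path.join(base, "comm")) or "").strip()
        stack = read_text(os.path.join(base, "stack"))  # often root-only
        results.append((int(entry), comm, stack))
    return sorted(results, key=lambda r: r[0])


if __name__ == "__main__":
    for pid, comm, stack in find_disk_sleepers():
        print(pid, comm)
        print(stack if stack is not None else "  (stack unreadable, try as root)")
```

This only touches status, comm, and stack, so it stays away from cmdline and the mmap_sem contention that hangs ps.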

lsof likes to read /proc/$pid/maps, the list of mapped files, which of course requires an mmap_sem read lock. It does this so that it can list files that are mapped but no longer have a file descriptor (e.g., shared libraries get opened, mmapped, and closed). If you know that you're only interested in files with file descriptors - e.g., you're looking for a socket, or something - you can do this with less contention by looking at /proc/$pid/fd/, which is a directory of magical nodes that show up as symlinks to open files. (They're not really symlinks; for instance, they'll work even if the actual file is deleted. But you can ls -l them as if they were symlinks, so ls -l /proc/*/fd/* | grep is a pretty decent alternative to lsof.)
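If you'd rather script the fd walk than pipe ls into grep, something like this works as a rough equivalent. The open_files name and its parameters are hypothetical, and without root you'll only see the fd directories of your own processes; anything unreadable is silently skipped.

```python
import os


def open_files(proc_root="/proc", needle=None):
    """Yield (pid, fd, target) for each readable /proc/<pid>/fd/<n> entry.

    If needle is given, only entries whose link target contains it are yielded.
    """
    try:
        pids = [e for e in os.listdir(proc_root) if e.isdigit()]
    except OSError:
        return
    for pid in sorted(pids, key=int):
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = [e for e in os.listdir(fd_dir) if e.isdigit()]
        except OSError:
            continue  # process exited, or we lack permission
        for fd in sorted(fds, key=int):
            try:
                # readlink only asks for the link text; it does not open the file
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed between listdir and readlink
            if needle is None or needle in target:
                yield int(pid), int(fd), target


if __name__ == "__main__":
    # Example: list every visible socket, like `ls -l /proc/*/fd/* | grep socket:`
    for pid, fd, target in open_files(needle="socket:"):
        print(pid, fd, target)
```

Like the ls version, this never looks at /proc/$pid/maps, so it misses files that are mapped but no longer have a descriptor.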

[sorry about the formatting, HN is really enthusiastic about asterisks]



