Small, somewhat nit-picky critique: the man pages for system calls are in section 2. If you want to see the docs for the "read()" syscall, and not the bash builtin "read", saying "man read" may not (see follow-up) do what you expect. Instead, you should say "man 2 read".
Different Unix systems take different syntax for specifying the section. Long ago I learned to say "man -a read" and just get _all_ the "read" pages, scanning for the ones I want (and sometimes learning about things I didn't know were there).
I'm going to hijack this comment to ask a question that I haven't been able to google:
What is the number after the command called? For example, when I look up 'man sed', I find a manpage for 'sed(1)'[0]. When I look up 'man kill', I find manpages for 'kill(1)' and 'kill(2)'[2].
Can somebody tell me what that number is called so I can look it up? Thanks..
That's the "section" I'm talking about. "sed(1)" means that the man page for that sed is in section 1. "kill(1)" versus "kill(2)" likewise: the former is the command; the latter is the syscall. To distinguish between them, you need to tell man(1) which section to look in for the desired context.
That number refers to the section of the man pages that the command is in. See http://en.wikipedia.org/wiki/Man_page#Manual_sections for an example list; however, it seems the list is not the same between systems.
There's a manpage bash-builtins in section 7, but I've never seen a system that had manpages for the individual builtins, let alone having them in section 1. "man read" on every system I've used opens the read manpage in section 2.
I can't remember a unix system for at least the last decade that hasn't given me the section 1 bash "read" instead of the section 2 system "read" on a "man read".
This is why the documentation for COHERENT was changed from "man" pages to what we called a Lexicon, removing the concept of sections. Each entry would have the description of the type of the entry, such as system call, shell command, and so on. Steve Ness has made this available online at http://www.nesssoftware.com/home/mwc/manual.php
You can change the section search order with the SECTION directive in /etc/man_db.conf, or you can override that global setting with the MANSECT environment variable. I just learned this from "man man". :) I'm personally tempted to put "2 3" ahead of the rest, because most of the time I'm just trying to refresh my memory on function parameters.
Don't forget its userspace equivalent, ltrace (strace covers syscalls). It traces all the library calls a process makes.
Under Windows, strace is an SSL/TLS monitoring tool (also hella useful). It shows payloads passed to CryptoAPI/CNG libs so you can easily troubleshoot explicitly encrypted protocols like ldaps. Especially useful if you use client-authenticated TLS, where it is not possible to use a TLS mitm proxy to snoop the layer 7 data.
Shameless plug: if you want to trace Windows applications you can take a look at my company's products SpyStudio[1] and Deviare[2]. Before downvoting me, try them to see how powerful and unique they are in the Windows ecosystem.
VMware uses SpyStudio for creating and troubleshooting application virtualization packages. For example, here is a Twitter post from a VMware escalation engineer: https://twitter.com/DooDleWilk/status/428562701313662977
Mac OS X has a suite of tools built on a similar package called dtrace, including opensnoop and execsnoop. They give really nice real-time lists of all files opened on the system and all binaries executed, respectively.
Thanks for those, very cool and useful! Every time I see blog posts like the above, and comments like yours and others, it reminds me just how much stuff I either don't know or should learn more about.
He can ostensibly remember the exact outcome, though. I read the whole post waiting for resolution, and realized that I'm going to do this in my next post as well. Start with a story, interrupt it, finish it at the end.
Thanks for the writeup! strace should definitely be in your toolbox. There is also systemtap, which I like a lot as well. It has some problems on Linux though; in particular, it is only widely supported on Linux > 3.5 unless the distro you are using ships the necessary patches. Custom userspace probes are a real strong point.
To clarify, systemtap needs UTRACE or UPROBES kernel support to trace user processes. Without those you can still inspect kernel functions. IIRC UTRACE was mainly supported by Red Hat, and available in their kernels for quite a few years. In 3.5 the UPROBES functionality was merged into mainline. Other tools, like perf, can use UPROBES support as well.
Yep, that's the detailed story. A lot of Linuxes (especially current Ubuntu LTS) ship without those patches, though, which makes the whole exercise of compiling the kernel yourself (and maybe a debug image to go along with it) a tedious one, and probably not fit for a production environment.
That's why I decided to focus on Linux > 3.5, which will be available with Ubuntu 14.04 LTS, where installing gets much easier if you know the right packages. I definitely wanted to make sure that people can start playing around with it in a few minutes.
Also, UPROBES are in my opinion one of the most interesting features to use with systemtap. They allow you to easily combine detailed kernel-level tracing with tracing of your application.
(Not that I want to suggest that compiling your own kernel is hard to do, it just takes the fun out of "let's trace!")
You can use Process Monitor
http://technet.microsoft.com/en-us/sysinternals/bb896645.asp...
to see a similar overview of low-level activity. You won't see all the system calls and you can't pipe the output directly, but there is a UI and you don't have to look up file descriptors.
I find that strace (or dtruss) is more useful when you know less about what exactly you're trying to find. A log of syscalls is often exactly the right granularity to find out the gist of what a process you didn't write is trying (and failing) to do.
So I think the comparison, despite the superficial similarity and similar mechanism of action, isn't really fair.
That's useful and all, but what if you want to instrument arbitrary parts of a program, not just the syscall interface? By function or instruction? Either in userspace or in kernel? With statistical functions? And speculative tracing? And extensive control flow (except loops, which would break certain safety guarantees DTrace makes)? And a lot more.
Don't be fooled by the single-letter change: strace is to DTrace what edlin is to emacs. Or something else ridiculously extreme. They're barely comparable.
>What is this "DTrace" thing? It stands for "Dynamic Tracing",
>a way you can attach "probes" to a running system
>and peek inside as to what it is doing.
It's like awk, except that you match entry/exit of syscalls, function calls, method invocations (in ObjC/Java), and give code to execute with access to arguments, return values, stack trace, etc.
It can be used to write tools like strace (see "dtruss" on OSX), iotop, topsyscall, etc.
Has anyone heard of a program that will take strace (or dtrace) output and create a pretty diagram showing which commands call which commands and which files they read or create?
We've got a fairly complicated bioinformatics pipeline that calls about 100 other programs, and creates or reads about 100 different files. I'd love a way to create a picture of what's going on. Which files each program uses, etc.
If such a program doesn't exist, would that be worth building? Could it be something I could potentially sell?
Sounded like an interesting script, so I just wrote it in about a half hour. You're welcome to sell it if you want... (also, perl -w warns about deep recursion once walkpid nests 100 levels deep; unroll the recursion in walkpid if your process tree goes deeper). Collect logs with 'strace -o pids.log -e trace=process -f [specify your process here]', then run with 'perl printpids.pl < pids.log'.
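#!/usr/bin/perl -w
# Print an indented process tree from an "strace -f -e trace=process" log.
$|=1;
use strict;
my (%pidmap, @order);
while ( <> ) {
    chomp;
    # Each log line looks like: PID SYSCALL(ARGS) = RESULT
    if ( /^(\d+)\s+(\w+)(.*)$/ ) {
        my ($pid, $syscall, $args) = ($1, $2, $3);
        # A successful clone/fork/vfork returns the child's pid to the parent.
        if ( $syscall =~ /(^clone$|fork$)/ and $args =~ / = (\d+)$/ and $1 > 0 ) {
            my $clonepid = $1;
            $pidmap{$clonepid} = { -parent => $pid };
            push(@order, $clonepid);
        }
        # A successful exec returns 0; record the program path for this pid.
        elsif ( $syscall =~ /^exec/ and $args =~ / = (\d+)$/ and $1 == 0 ) {
            my $exec = $args;
            @order = ($pid) if !@order;   # the first exec seen is the root process
            $exec =~ s/^\("([^"]+?)",.*$/$1/g;
            push( @{ $pidmap{$pid}->{-exec} } , $exec );
        }
    }
}
# Print every program each pid exec'd, indented by its depth in the tree.
foreach my $pid ( @order ) {
    my $spaces = walkpid($pid);
    # A pid that was forked but never exec'd has no -exec list, hence "|| []".
    print " " x $spaces . join("\n" . (" " x $spaces), map { $_ . " ($pid)" } @{ $pidmap{$pid}->{-exec} || [] } ) . "\n";
}
# Return the depth of $pid, i.e. how many ancestors it has in %pidmap.
sub walkpid {
    my $pid = shift;
    my $c = shift || 0;
    if ( exists $pidmap{$pid}->{-parent} ) {
        return walkpid($pidmap{$pid}->{-parent}, $c+1);
    }
    return $c;
}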
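The script above covers the process tree. For the other half of the question (which files each program reads or writes), here is a rough Python sketch that emits Graphviz DOT text. It assumes a log collected with 'strace -f -o trace.log ...', where every line starts with a pid. The helper name strace_to_dot and the sample lines are made up for illustration, and it skips "<unfinished ...>" / "resumed" lines, so treat it as a starting point rather than a finished tool.

```python
import re

# One "strace -f -o LOG" line: PID SYSCALL(ARGS) = RESULT
LINE = re.compile(r'^(?P<pid>\d+)\s+(?P<call>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)')
QUOTED = re.compile(r'"([^"]*)"')

def strace_to_dot(lines):
    """Turn strace log lines into Graphviz DOT text (hypothetical helper)."""
    program = {}                      # pid -> path of the last program it exec'd
    forks, reads, writes = [], [], []
    for line in lines:
        m = LINE.match(line)
        if not m or int(m["ret"]) < 0:
            continue                  # skip unparsed lines and failed calls
        pid, call, args = m["pid"], m["call"], m["args"]
        path = QUOTED.search(args)    # the first quoted argument is the path
        if call in ("clone", "fork", "vfork") and int(m["ret"]) > 0:
            forks.append((pid, m["ret"]))          # return value is the child pid
        elif call == "execve" and path:
            program[pid] = path.group(1)
        elif call in ("open", "openat") and path:
            writing = "O_WRONLY" in args or "O_RDWR" in args
            (writes if writing else reads).append((pid, path.group(1)))

    def node(pid):
        return f'"{program.get(pid, "?")} ({pid})"'

    out = ["digraph pipeline {"]
    for parent, child in forks:
        out.append(f"  {node(parent)} -> {node(child)};")
    for pid, path in reads:
        out.append(f'  "{path}" -> {node(pid)};')   # file feeds the process
    for pid, path in writes:
        out.append(f'  {node(pid)} -> "{path}";')   # process produces the file
    out.append("}")
    return "\n".join(out)

# Made-up sample log, for illustration only.
SAMPLE = [
    '100 execve("/usr/bin/make", ["make"], 0x7ffd1234 /* 20 vars */) = 0',
    '100 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|SIGCHLD) = 101',
    '101 execve("/usr/bin/sort", ["sort", "in.txt"], 0x7ffd5678 /* 20 vars */) = 0',
    '101 openat(AT_FDCWD, "in.txt", O_RDONLY) = 3',
    '101 openat(AT_FDCWD, "out.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4',
    '101 openat(AT_FDCWD, "missing.txt", O_RDONLY) = -1 ENOENT (No such file or directory)',
]

print(strace_to_dot(SAMPLE))
```

Feed the printed text to Graphviz (for example "dot -Tpng") to get the picture. For a real pipeline you would probably want to filter out shared libraries and /proc entries before drawing anything.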
Valgrind's callgrind tool will profile all calls a program makes. You can then feed the output to kcachegrind (or qcachegrind for the Qt version), which will nicely visualize the profiling run.
I use strace all the time doing ops at Crittercism. Some of the random things it's helped with/taught me:
- allowed exploring forking behavior of daemons, in particular the nitty-gritty of gunicorn's prefork behavior, and understanding the rationale behind single- and double-fork daemons generally (very important to understand for job control e.g. writing upstart/init.d jobs)
- isolated hot reads to memcache in situ, by identifying the socket associated with the memcache connection, and finding which key was read the most by a process (we built better logging after the fact, but sometimes there's no substitute for instrumenting prod during tough perf/stress problems)
- let me explore the behavior of node.js's several threads, and find one of them sending "X" over a socket to the other (still not quite sure what this is, some kind of heartbeat/clock tick?)
- helped me understand "primordial processes" and the exact details of how forking/reparenting work on Linux
It's a great tool and one that every ops/infrastructure engineer should be familiar with.
>let me explore the behavior of node.js's several threads, and find one of them sending "X" over a socket to the other (still not quite sure what this is, some kind of heartbeat/clock tick?)
I don't know about node.js specifically, but this is a common pattern to wake another thread that uses a select()-style event loop.
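A minimal sketch of that wake-up pattern, in Python rather than anything node.js actually does internally (run_demo and waker are hypothetical names): the event-loop side blocks in select() on the read end of a pipe, and another thread writes a single byte to make it return. Under strace, the waker side shows up as a one-byte write, much like the "X" described above.

```python
import os
import select
import threading

def run_demo(timeout=5.0):
    """Block in select() until another thread writes one byte to a pipe."""
    read_fd, write_fd = os.pipe()

    def waker():
        # Any single byte will do: the payload is irrelevant, only the fact
        # that the pipe becomes readable matters to the sleeping loop.
        os.write(write_fd, b"X")

    t = threading.Thread(target=waker)
    t.start()

    # The "event loop" side: sleep until the pipe becomes readable or we time out.
    readable, _, _ = select.select([read_fd], [], [], timeout)
    woke = read_fd in readable
    # Drain the byte so the next select() call would block again.
    payload = os.read(read_fd, 1) if woke else b""

    t.join()
    os.close(read_fd)
    os.close(write_fd)
    return woke, payload

if __name__ == "__main__":
    print(run_demo())
```

A real event loop would keep the pipe open and include read_fd alongside its sockets in every select() call. This sketch assumes a Unix-like system, since select() on pipes is not supported on Windows.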
Quick strace command that I use all the time to see what files a process is opening:
strace -f <command> 2>&1 | grep ^open
Really useful to see what config files something is reading (and the order) or to see what PHP (or similar) files are being included.
There are normally other ways to do this (e.g. using a debugger), but sending strace's stderr to stdout and piping through grep is useful in so many cases that it's become a command I use every day or two.
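One caveat with the one-liner: with -f, strace prefixes lines from child processes with "[pid N]", so "grep ^open" only catches the parent's calls. A small Python sketch that tolerates the prefix and keeps only successful open()/openat() calls, in order (opened_paths is a hypothetical helper and the sample lines are made up):

```python
import re

OPEN = re.compile(
    r'^(?:\[pid\s+\d+\]\s+)?'            # prefix strace -f adds for child processes
    r'open(?:at)?\(.*?"(?P<path>[^"]*)"'  # first quoted argument is the path
    r'.*\)\s+=\s+(?P<fd>\d+)'             # a non-negative result means success
)

def opened_paths(lines):
    """Return the paths of successful open()/openat() calls, in log order."""
    return [m.group("path") for m in map(OPEN.match, lines) if m]

# Made-up sample output, for illustration only.
SAMPLE = [
    'open("/etc/php.ini", O_RDONLY) = 3',
    '[pid  4321] openat(AT_FDCWD, "/var/www/index.php", O_RDONLY) = 4',
    'open("/etc/php.d/extra.ini", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'read(3, "[PHP]\\n", 4096) = 6',
]

for path in opened_paths(SAMPLE):
    print(path)
```

In practice you would read the lines from a log written with strace's -o option instead of a hard-coded list.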
Also check out ltrace... Shows the calls to other libraries the process is making...
I'd also like to point out that a key to using strace successfully is the result column... Programs that fail often make system calls that fail right before they exit... You can often tell what the program is trying and failing to accomplish...
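Scanning the result column is easy to automate. A rough sketch, assuming the usual "name(args) = -1 ERRNO (message)" line shape; failed_calls is a hypothetical helper and the sample lines are made up for illustration:

```python
import re

# Matches lines such as:
#   access("/etc/myapp.conf", R_OK) = -1 ENOENT (No such file or directory)
# An optional "[pid N]" or bare pid prefix (as produced with -f) is tolerated.
FAILED = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'
    r'(?P<syscall>\w+)\((?P<args>.*)\)\s+=\s+-1\s+'
    r'(?P<errno>E[A-Z0-9]+)'
)

def failed_calls(lines):
    """Return (syscall, errno, args) for every call that returned -1."""
    out = []
    for line in lines:
        m = FAILED.match(line)
        if m:
            out.append((m.group("syscall"), m.group("errno"), m.group("args")))
    return out

# Made-up sample output, for illustration only.
SAMPLE = [
    'open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    'access("/etc/myapp.conf", R_OK) = -1 ENOENT (No such file or directory)',
    '[pid  1234] connect(3, {sa_family=AF_INET, sin_port=htons(11211), '
    'sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection refused)',
    'exit_group(1) = ?',
]

for syscall, errno_name, args in failed_calls(SAMPLE):
    print(f"{syscall} failed with {errno_name}: {args}")
```

The last few entries before the exit are usually the interesting ones. Plenty of failures earlier in a trace are harmless, such as probing for optional config files.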
I’ve used strace before to help diagnose issues with buggy software I was using and I thought this was a great article.
I just thought I’d let people know that it can be a lot easier to read strace’s output if you read the output log file using Vim, as it contains a syntax file which can highlight PIDs, function names, constants, strings, etc. Alternatively, if you don’t want to create an strace log file, you could pipe the output to Vim and it will automatically detect it as being strace output, e.g. "strace <command> 2>&1 | vim -".
Totally agree that strace is an awesome tool. I've even used it with Java apps that were behaving weirdly: just attach and see what the app is saying to the kernel.
Just a few hours ago, a newly minted Ubuntu binary was crashing due to a library version mismatch. I thought I had updated the shared libraries to point to the new versions. But definitely something was still hooked to the old version. I just couldn't figure out how/where. ldd wasn't of much help because everything was OK according to it. "If only I can get a bit more info when the binary is running and spit out everything before the crash."
Tried my luck with gdb. Sure enough...there was libQt5DBus pointing to the old libs leading to the crash. If you are feeling particularly adventurous, you can step one instruction at a time after starting. Even without debug symbols, there is quite a lot of info that can be used while troubleshooting.
Oh we used those too, but in this case there were also native libraries. I was a regular user of jad, even sometimes recompiling and replacing stuff (ooooweeee) in production.
It's instructive to see how much simpler the strace output for a simple program is when the program is statically linked. Especially if you use an alternative libc like musl (http://musl-libc.org/).
Don't forget that sometimes strace is overkill, and similar more easily parsed things can be used instead, for example, /usr/bin/time (vs bash time) has been coming in more and more handy for me.
This is a tool called ptrace, which does everything that strace does and a lot more. You have working binaries in there, and most of the source. I haven't extricated the full build dependencies so that it all builds, but this includes extra facilities like reporting summaries of process trees, showing only connections or files, and shlib injection into a target process.
If people are interested in more on this, contact me at CrispEditor-a.t-gmail.c-o-m
I can understand being twitchy, but it is so easy to get a domain name that it would literally be no barrier at all to an attacker. Not wanting a binary (especially one that will run with root privs) to come from a domain you don't recognise and trust is understandable. Even then, delivery over https and/or a signed package are the first steps to security.
Since IP address ranges are allocated in blocks to ISPs, you can do an IP lookup and discover that this fellow is in the UK using a Virgin Cable connection.
I might be wrong, but I'm reasonably sure that you link to the system call's man page. Which is probably what the tool in question uses, but .. not the same thing.
Elsewhere in this discussion: there's a difference between man page sections 1 and 2, and read was quoted as an example of a potentially ambiguous result if you invoke "man read" (it opens man 1 read here, when man 2 read was the syscall I might want to look at after running strace).
Otherwise, great writeup. Thanks for sharing!