Show HN: Sysdig, a tool for Linux system exploration

brendangregg · on April 3, 2014

Impressive. Easy to get going, low overhead, powerful one-liners.

I like the filter syntax - would be nice for perf_events to pick this up. Although, if it did, I hope that the stable filter fields API can be extended with unstable arbitrary expressions as needed, for when dynamic probes are used.

What perf_events really lacks is a way for custom processing of data in kernel context, to reduce the overheads of enablings. Eg, let's say I want a histogram of disk I/O latency. sysdig has chisels, which look like they do what I want, but from the Chisels User Guide: "Usually, with dtrace-like tools you write your scripts using a domain-specific language that gets compiled into bytecode and injected in the kernel. Draios uses a different approach: events are efficiently brought to user-level, enriched with context, and then scripts can be applied to them." Oh no, not user-level!
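
To make the histogram example concrete, here is a minimal Python sketch of what user-level aggregation of I/O latency could look like. The event tuples and the names `log2_bucket` and `latency_histogram` are hypothetical, made up for illustration; this is not how sysdig chisels (which are Lua scripts) or DTrace aggregations are actually implemented.

```python
from collections import Counter


def log2_bucket(latency_us: int) -> int:
    """Return the lower bound of the power-of-two bucket holding latency_us."""
    if latency_us < 1:
        return 0
    return 1 << (latency_us.bit_length() - 1)


def latency_histogram(events):
    """Pair each 'issue' with its 'complete' by request id and bucket the latency."""
    start = {}        # request id -> issue timestamp (microseconds)
    hist = Counter()  # bucket lower bound -> number of I/Os
    for ts_us, kind, req_id in events:
        if kind == "issue":
            start[req_id] = ts_us
        elif kind == "complete" and req_id in start:
            hist[log2_bucket(ts_us - start.pop(req_id))] += 1
    return hist


# Hypothetical event stream: (timestamp_us, kind, request_id)
events = [
    (0, "issue", 1),
    (10, "issue", 2),
    (130, "complete", 1),   # 130 us latency
    (910, "complete", 2),   # 900 us latency
    (1000, "issue", 3),
    (1100, "complete", 3),  # 100 us latency
]

hist = latency_histogram(events)
for bucket in sorted(hist):
    print(f"{bucket:>6} us  {'#' * hist[bucket]}")
```

The point of the sketch is the cost model: every event has to cross into user space before it is counted, whereas an in-kernel aggregation only ships the finished buckets.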

I tested this quickly, expecting DTrace's approach (which is the same as SystemTap and ktap) to blow sysdig out of the water. But the results were surprising (take these quick tests with a grain of salt). Here's my target command, along with sysdig and DTrace enablings, and strace for comparison:

  Target: dd if=/dev/zero of=/dev/null bs=1k count=1000k
  sysdig: sysdig -c topfiles_bytes
  DTrace: dtrace -n 'syscall:::entry /execname == "dd"/ { @[probefunc] = count(); }'
  strace: strace -c dd ...

sysdig slowed the target by about 4x. DTrace, between 2.5 and 2.7x. strace (for comparison), over 200x. This is a worst-case test, and if I'm willing to slow a target by 2x then taking that to 4x doesn't make much difference. With what I normally trace, the overheads are 1/100th of that, so DTrace is negligible. The take-away here is that the overheads are closer to the "negligible" end of the spectrum than strace's "violent" end. Which I found surprising for user-level aggregation.

The Sysdig Examples could do with some sanity checking. Eg:

"See the top processes in terms of disk bandwidth usage sysdig -c topprocs_file"

I saw:

  Bytes     Process
  ------------------------------
  134.65M   dd
  4.82KB    snmp-pass
  603B      snmpd
  332B      sshd
  220B      bash
  107B      sysdig

That's while my dd between /dev/zero and /dev/null was running. No "disk bandwidth"! :)

edit: formatting

degio · on April 3, 2014

Brendan, thanks for the feedback. It's really cool to hear comments like this from someone like you. We really respect your work in the field.

Good catch on topprocs_file, we'll have to find a better name for it.

In terms of overhead, we put a lot of effort into it and, as you pointed out, we're already extremely optimized. But we think we can do even better. For example, we don't have any kind of kernel-level filtering yet. Coming soon! :)

SEJeff · on April 4, 2014

Any chance you are working on getting this upstream? I noticed Greg KH as one of the contributors

degio · on April 4, 2014

I guess it's early to tell, but if the kernel folks don't object we would be happy to work on including our driver in the kernel.

fche · on April 4, 2014

A 1 gigabyte no-IO dd is an unusual microbenchmark: it stresses only syscall rates while wasting time with memcpy. (A loop on getpid() or equivalent would have worked just as well.)
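
A rough Python sketch of the getpid() loop alternative is below. The function name `syscall_rate` and the iteration count are arbitrary choices for illustration. Two caveats apply: interpreter overhead will dominate the measurement, and whether each `os.getpid()` call actually enters the kernel depends on the C library (older glibc versions cached the PID), so a C loop would be the more faithful benchmark.

```python
import os
import time


def syscall_rate(n: int = 200_000) -> float:
    """Call os.getpid() n times and return the observed calls per second."""
    t0 = time.perf_counter()
    for _ in range(n):
        os.getpid()
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - t0, 1e-9)
    return n / elapsed


if __name__ == "__main__":
    print(f"{syscall_rate():,.0f} getpid() calls/sec")
```

Running it once bare and once under a tracer gives the slowdown ratio directly, without the memcpy work that dd adds.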

On my workstation the plain dd runs in 40 ms. A systemtap one-liner instrumenting/aggregating those 1 million write(2) syscalls in the kernel extended that tiny runtime to about 50 ms, a 1.25x slowdown. But such small numbers are hardly meaningful.

I'm curious to what extent userspace perf script postprocessing is deemed a technological equivalent to this; or why a new kernel module was deemed necessary versus the perf_event_open(2) ring-buffer abi.

otterley · on April 3, 2014

I had the privilege of early access to sysdig thanks to the developers. It's not as powerful as SystemTap or DTrace but it is very useful and easy to use. Think of it as strace(8) with global dump capability (not just per-process), more powerful filters, replayable logging à la tcpdump(8), and Lua plugin support.

Plus the packaging is top-notch; its kernel modules are rebuilt automatically on kernel upgrade via DKMS (which I wish other vendors like FusionIO would do).

0xbadcafebee · on April 3, 2014

I like that you link to the github, where the README is a link to your more-slick website, which has nothing but a couple of examples and an install page, all of which is really linkbait for your company Draios. It almost seemed like you were just sharing a useful tool. The tool might be really useful, but at this point I'm still clicking through links trying to figure out what it does and how.

edit: Nevermind, I found it. It's a kernel module and user app that uses Lua scripts for interpreting data. Sorry about my harsh tone before, but jesus I hate it when there's more gloss than content.

degio · on April 3, 2014

Thanks.

To answer the question "what it does and how", sysdig captures system calls and other system level events using a linux kernel facility called tracepoints, which means much less overhead than strace.

It then "packetizes" this information, so that you can save it into trace files and filter it, a bit like you would do with tcpdump. This makes it very flexible to explore what processes are doing.

We also pack it with a set of scripts that make it easier to extract useful information and do troubleshooting.

0xbadcafebee · on April 3, 2014

See, that is a really good description that would be useful in a README. Right away I know what it is, what it does and whether I should use it.

degio · on April 3, 2014

As you suggested, we've updated the README with the content above.

zokier · on April 4, 2014

I feel like some introductory article about the different instrumentation facilities available for Linux systems would be welcome. Just checking wikipedia and google, I found the following items: SystemTap, Dprobes, LTTng, DTrace, strace, ltrace (and latrace), ktap, utrace, ftrace, kprobes, jprobes. And now we have sysdig too.

zokier · on April 4, 2014

Replying to myself; found this http://www.brendangregg.com/linuxperf.html page which does at least some sort of summary of the tools

shubb · on April 3, 2014

Looks very useful. Some things you can do with it:

Dump system activity to file, so that sysdig can be used to process it later.

* sysdig -w trace.scap

Print process name and connection details for each incoming connection not served by apache.

* sysdig -p "%proc.name %fd.name" "evt.type=accept and proc.name!=httpd"

See the files where apache spends the most time doing I/O.

* sysdig -c topfiles_time proc.name=httpd

Show the network data that apache exchanged with 192.168.0.1.

* sysdig -A -c echo_fds fd.sip=192.168.0.1 and proc.name=httpd

Show every time a file is opened under /etc.

* sysdig evt.type=open and fd.name contains /etc

degio · on April 3, 2014

Thanks! A full list of examples can be found here: https://github.com/draios/sysdig/wiki/Sysdig%20Examples

joshbaptiste · on April 3, 2014

I would like to know what's going on at a lower level. Ktap gives a good breakdown of how it differs from SystemTap: dynamically typed, byte-code design... etc

http://www.ktap.org/doc/tutorial.html#faq

Is Sysdig design similar?

degio · on April 3, 2014

The design is actually quite different.

From the architectural point of view, sysdig is closer to tcpdump/wireshark than to systemtap/ktap.

systemtap/ktap work similarly to dtrace:
- a script is loaded into a user level process
- the process compiles the script and dispatches it to a kernel module
- the kernel module hooks the script into specific places in the kernel
- the kernel module sends the results back to userspace where the user can see them

sysdig works this way:
- the kernel module hooks into specific places in the kernel (using tracepoints), captures everything, and puts it into a shared memory buffer
- the buffer is accessed from the user-level sysdig process that reconstructs state (so it knows that fd 23 means /etc/passwd)
- filtering is applied
- scripting in Lua is applied
- the whole thing is optionally saved to disk so you can analyze later
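
As a toy illustration of the user-level half of that pipeline, the Python sketch below rebuilds an fd table from a stream of events and then applies a filter to the enriched events. The event dictionaries, field names, and the functions `reconstruct` and `apply_filter` are all invented for this example; the real sysdig event format and filter engine are different, and its scripting layer is Lua.

```python
def reconstruct(events):
    """Track open/close events so every event can be enriched with a file name."""
    fd_table = {}  # (pid, fd) -> path
    for ev in events:
        key = (ev["pid"], ev.get("fd"))
        if ev["type"] == "open":
            fd_table[key] = ev["path"]
        # Enrich before handling close, so the close event still carries the name.
        enriched = dict(ev, name=fd_table.get(key, "<unknown>"))
        if ev["type"] == "close":
            fd_table.pop(key, None)
        yield enriched


def apply_filter(events, predicate):
    """Lazily keep only the events for which predicate(event) is true."""
    return (ev for ev in events if predicate(ev))


# Hypothetical raw events as they might arrive from the capture buffer.
raw = [
    {"pid": 100, "type": "open", "fd": 23, "path": "/etc/passwd"},
    {"pid": 100, "type": "read", "fd": 23, "bytes": 512},
    {"pid": 200, "type": "read", "fd": 5, "bytes": 64},  # fd opened before capture
    {"pid": 100, "type": "close", "fd": 23},
]

reads_under_etc = list(apply_filter(
    reconstruct(raw),
    lambda ev: ev["type"] == "read" and ev["name"].startswith("/etc"),
))
for ev in reads_under_etc:
    print(ev["pid"], ev["name"], ev["bytes"])
```

Note that the read on fd 5 cannot be named, because its open happened before the capture started. Because filtering runs on the enriched stream, the same code would work whether the events come live from the buffer or from a saved trace.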

Both approaches have pros and cons. We think that the sysdig approach creates a more natural workflow, ideal for troubleshooting and system administration tasks. Plus, writing scripts in Lua, with access to its rich libraries, is quite fun. :)

I want to give more details in a future blog post, so stay tuned.

jovi · on April 8, 2014

It's good to see this come out. (I was thinking about implementing this kind of tool; lua and nodejs were my two choices. Now sysdig uses luajit, great.)

Interestingly enough, the ktap vm is also based on luajit. If you want to add kernel filter scripting functionality to sysdig in the future (without needing GCC), then just integrate ktap into your solution. :)

annulen · on April 9, 2014

"Reconstruction" step looks like a source of unjustified inefficiency: why reconstruct state by traversing proc and doing lots of system calls instead of capturing all data in the first place where it's much cheaper to do?

degio · on April 11, 2014

Of course, we create and update the state by inspecting the incoming stream of system calls. We traverse proc only once, when you start a capture, and the reason to do that is collecting info for the PIDs/FDs that existed before we start the system call collection. That way, you can for example create a filter on the IP address of a socket even if that socket was created before sysdig started.
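
A minimal Python sketch of that one-time /proc pass is shown below, assuming a Linux system with procfs mounted at /proc. The function name `snapshot_fds` and its parameters are hypothetical; sysdig does this in its own native code and collects far more than fd targets.

```python
import os


def snapshot_fds(pid="self", proc_root="/proc"):
    """Return {fd: target} for one process by reading /proc/<pid>/fd once."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    table = {}
    try:
        entries = os.listdir(fd_dir)
    except OSError:
        return table  # no procfs, process gone, or not enough privileges
    for name in entries:
        try:
            table[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except (OSError, ValueError):
            continue  # fd was closed between listdir() and readlink()
    return table


if __name__ == "__main__":
    for fd, target in sorted(snapshot_fds().items()):
        print(fd, target)
```

A table seeded like this can then be kept current purely from the incoming open/close events, which is why the scan only has to happen once per capture.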

zobzu · on April 3, 2014

"The definitive tool" they name it, yet it's not as powerful as dtrace. So, it's not definitive.

Looks nice otherwise. Too bad it needs a kernel module.

degio · on April 3, 2014

dtrace and any other instrumentation tool for linux require a kernel module as well. Ours has the advantage of being very simple and (in theory!) more stable, because decoding, filtering and scripting run at user level, so you have fewer chances to crash the kernel.

I would also like to point out that the sysdig workflow is quite different from the dtrace one. In addition to supporting real-time investigation, sysdig lets you take a rich "snapshot" of the machine activity that you can analyze later. From this point of view, I don't think sysdig is less powerful than dtrace. Quite the opposite. But we're eager to know what you think.

prakashsurya · on April 3, 2014

Is dtrace available on Linux? I know there's been work towards that goal, but I haven't paid much attention to it recently.

gregkh · on April 3, 2014

It's "out of tree" due to licensing issues (i.e. Oracle is not releasing it under a GPLv2 compatible license for various reasons...)

yxhuvud · on April 3, 2014

Ah, the good ol' pipe through sudo bash installation instructions. I wish there was a more structured platform independent way of distributing stuff before the stuff is packaged by distros.

peterwaller · on April 3, 2014

That's what shell armour is for ;-)

http://drj11.wordpress.com/2014/03/19/piping-into-shell-may-...

degio · on April 3, 2014

We decided to offer bash-piping to make it simpler, but it's actually nothing more than a clean deb/rpm package.

And you have the option to install it manually if you want https://github.com/draios/sysdig/wiki/How%20to%20Install%20S....

e12e · on April 4, 2014

Are you maintaining the debian build-scripts in some other repo? I was hoping for a /debian directory and being able to simply dpkg-buildpackage from the git repo, but I can't find any way to build debs (or rpms)?

Does appear to build fine with cmake/make, though.

gighi · on April 4, 2014

We use CPack for the moment, so you can just run "make package" inside the CMake build directory and it will generate RPM/DEB.

yourad_io · on April 3, 2014

pipe sudo bash is one thing, delivery of said bash script over http:// is quite another.

An https link should be the default IMHO.

edit: the tool looks really useful though.

degio · on April 3, 2014

You're right, you can use this:

https://s3.amazonaws.com/download.draios.com/stable/install-...

We'll update the page in a second

simonebrunozzi · on April 3, 2014

Well done Loris. Agree https would be a good default.

simonebrunozzi · on April 3, 2014

Wow, this is really great. From the creator of Wireshark, no less :)

dfc · on April 3, 2014

Wrong.

Gerald Combs is the creator of Ethereal/Wireshark: https://www.wireshark.org/about.html

kristopolous · on April 4, 2014

He's on the team.

nodata · on April 4, 2014

> Wrong.

Less Dwight please.

dfc · on April 4, 2014

krakensden · on April 4, 2014

Given that it involves a kernel module, I was kind of skeptical- but Greg KH seems to have looked it over and fixed it up, which I'd call a compelling seal of approval:

https://github.com/draios/sysdig/commits/master/driver

perryh2 · on April 3, 2014

This tool is very similar to what I had created last summer as an intern (strace/lsof analysis), but it seems to be a lot more rich in features. I analyzed system calls as well as application tracing (New Relic) to find/fix performance bottlenecks.

mh- · on April 3, 2014

is the strace analysis stuff open source? have been messing around with creating something reusable in my spare time, based on hacky methods I've long been using. would be interested to see.

mesuutt · on April 4, 2014

I am getting error during compiling on Arch linux:

https://github.com/draios/sysdig/issues/39

Has anyone encountered this error before? Any help would be appreciated.

neuronsourcing · on April 4, 2014

After installing sysdig, when I try to run it I get the following error:

# sysdig fd.type=ipv4

error creating the process list

Has anyone seen this one before? Any help would be appreciated.

degio · on April 4, 2014

It looks like you don't have enough privileges to read /proc. Are you using this as root?

digitalyatri · on April 4, 2014

Some observations

sudo sysdig -w file1.log

file1.log contains lots of junk characters (fix this) ^@^@^@^@^@^@^@^@^@^@^@^@^

Better alternative

sudo sysdig > file2.log

file has proper logs

gighi · on April 4, 2014

That's the wrong way to use it.

The "sysdig -w" switch will generate a binary dump (in a pcap format) containing the "raw events" coming from the kernel (plus a snapshot of information gathered from /proc), so it's not supposed to be human-readable. You have to use "sysdig -r" on the dump file to get the output.

If you're used to tcpdump, it's the same thing.

digitalyatri · on April 4, 2014

My bad, works well with -r

krakensden · on April 4, 2014

`less -R`?

pinturic · on April 4, 2014

It is amazing how easy it seems to collect such information with this tool

wesleyac · on April 4, 2014

Just looked at the website, and had a very "small world" feeling:

They're located in my town O.o