50 Shades of System Calls

carlsborg · on Feb 24, 2016

Brilliant. (I note that the founder of this product co-authored the WinPcap packet capture library on Windows back in the day, and therefore made Ethereal/Wireshark on Windows possible.)

truncate · on Feb 23, 2016

Nice work! I sometimes use strace/ltrace and this program would certainly be a nice addition to my toolbox!

One thing I find useful sometimes for debugging purposes is to actually see the contents of each system call. Do I get to see that when I click on individual boxes here?

degio · on Feb 23, 2016

Blog post author here.

Yes, when you drill down using the mouse you will get the relevant system calls, including the buffers if they are I/O reads and writes.

Also, sysdig and csysdig have pretty advanced system call capture and filtering functionality. See this link for an introduction: https://github.com/draios/sysdig/wiki/Sysdig%20User%20Guide

truncate · on Feb 23, 2016

Thanks for answering. Small suggestion: it's kind of hard to find the GitHub link. Perhaps make it more visible.

raldu · on Feb 23, 2016

It is fun to observe that everyone might be busy tinkering with the mentioned features instead of commenting.

TickleSteve · on Feb 24, 2016

This guy talks of 'latencies' when he really means 'durations'.

The latency is the time between stimulation and response, not the overall duration.

In this case the latency of a syscall would be the time between a user program performing a syscall and it starting to operate, not the entire time taken.

viraptor · on Feb 24, 2016

The way people talk about syscalls depends on your context/point of view. For example doing `open()` for the system is duration, because it does the work of opening a file. For the app, it may as well be latency - how long you're stopped before the file is opened.

Since the app doesn't actually execute the open code, it is latency -> from stimulation (called), to response (returned) is the latency of file access. For the system code latency would be waiting for the disk controller. For the disk controller latency would be waiting for the heads/platters.
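To make the app-side view concrete, here is a minimal Python sketch that times a single `open()` from the caller's perspective. The helper name `timed_open` and the temporary file are hypothetical, made up for illustration. The number it prints is simply "how long was I stopped", whichever word you prefer for it.

```python
import os
import tempfile
import time


def timed_open(path):
    """Return (fd, elapsed_seconds) for one os.open() call, as seen by the caller."""
    start = time.perf_counter()
    fd = os.open(path, os.O_RDONLY)  # the app is blocked here until the kernel returns
    elapsed = time.perf_counter() - start
    return fd, elapsed


# Create a throwaway file so the example has something to open.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.txt")
    with open(path, "w") as f:
        f.write("hello")

    fd, elapsed = timed_open(path)
    os.close(fd)
    print(f"open() blocked the caller for {elapsed * 1e6:.1f} us")
```

This only measures the wall-clock time around the call. It cannot say how much of that time was spent waiting versus doing work inside the kernel, which is the part the two views disagree about naming.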

TickleSteve · on Feb 24, 2016

I would disagree with that interpretation. I would talk about the duration of the open() call, in the same way as when you're optimising code, you talk about the duration of a function call, not its latency.

Sadly, misuse of the term is rife in software circles.

Latency is correctly used when talking about how long an ISR takes to start running when provoked by an external stimulus, or how long it takes for a task to be scheduled when made ready.

Latency is a very precise term. A syscall, as its name suggests, is a 'call', and 'calls' are not considered to be instantaneous.

(Edit: instantaneous, not atomic)

Duration is the correct term for syscalls.

viraptor · on Feb 24, 2016

> and 'calls' are not considered to be atomic in any way.

A lot of syscalls should be considered atomic with regards to resources. open() has at least two options this applies to. What kind of atomic do you mean?

TickleSteve · on Feb 24, 2016

Sorry, didn't mean atomic.

Was intending to say that calls are never considered to be instantaneous from either user or sys point of view, they always have duration.

Latency on the other hand is almost always about scheduling, whether OS-level scheduling or hardware (ISR) scheduling.

Another case; network latency is the time between a packet being sent and it being received.... not the time taken to process that packet. That is a direct analogy to what we're talking about here.
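The distinction being drawn here can be sketched in Python, under the assumption that "latency" means the gap between submitting a request and work starting, and "duration" means the processing time itself. The `worker` and `measure` names and the 20 ms sleep are hypothetical stand-ins, not anything from sysdig.

```python
import queue
import threading
import time


def worker(requests, results):
    """Handle one request and record when work started and finished."""
    submitted = requests.get()       # blocks until the stimulus arrives
    started = time.perf_counter()    # work begins here
    time.sleep(0.02)                 # stand-in for the actual processing
    finished = time.perf_counter()
    results.put((submitted, started, finished))


def measure():
    """Return (latency, duration) in seconds for one request."""
    requests, results = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(requests, results))
    t.start()
    requests.put(time.perf_counter())  # the "stimulus", timestamped by the sender
    t.join()
    submitted, started, finished = results.get()
    latency = started - submitted      # delay before the work begins
    duration = finished - started      # time spent doing the work
    return latency, duration


latency, duration = measure()
print(f"latency:  {latency * 1e3:.3f} ms")
print(f"duration: {duration * 1e3:.3f} ms")
```

From the caller's side the two numbers add up to one wait, which is why the other reading of "latency" in this thread covers the whole interval.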

viraptor · on Feb 24, 2016

I still think you need to specify what latency we are talking about. Scheduling latency is about scheduling. Network interface latency is about putting data in the buffer and then on the wire. Network latency is about actually delivering the packet. Web page latency is about time-to-render. The first few pages of Google results about various types of latency always qualify it with some other word, so it's not clear anymore what people mean if they just say "latency".

These are all both durations in one context and latencies in another. For the app, syscalls are "how long did I have to wait for that call to give me a result", so a latency, and a duration as well (just because it's a period of time).

TickleSteve · on Feb 24, 2016

Regarding Web Page latency.... I would expect web developers to be more slack on these types of definitions as they are so far removed from the hardware. Still, having said that: http://blog.iweb.com/en/2014/02/understanding-analyzing-redu...

This indicates that 'web page' latency is nothing to do with redraw-duration, but more akin to network-latency, but for the page as a whole. In comparison to redraw-duration (which is entirely client-side), web-page latency is the time to transit the network... i.e. latency.

In all the above cases, 'latency' is the delay between the stimulus and response, not the processing time.

I'll leave you with the top Google result for 'latency':

"Latency is a time interval between the stimulation and response, or, from a more general point of view, as a time delay between the cause and the effect of some physical change in the system being observed."

jolynch · on Feb 24, 2016

Sysdig makes me giddy like dtrace used to.

I also sometimes feel a bit ... challenged ... by translating the questions I have (e.g. why did this arbitrary program start using a lot of memory and then OOM) into actual sysdig chisel invocations, but I'm learning slowly but surely. This command line spectrogram looks like a really nice addition to the existing toolset!

perlgeek · on Feb 24, 2016

Wow, that looks really impressive.

I wonder if the visual representation should somehow emphasize the slow calls more. The example on the page immediately draws the attention to that cluster of many fast calls, when the interesting part for optimizations is likely in the 100ms and slower range.

SEJeff · on Feb 24, 2016

Sysdig is absolutely incredible software, but when are you going to work on getting it upstream in the Linux kernel? That will massively lower the barrier to entry and make sysdig "win" so to speak.