Brilliant. (I note that the founder of this product co-authored the Winpcap packet capture library on Windows back in the day, and therefore made Ethereal/Wireshark on Windows possible.)
Nice work! I sometimes use strace/ltrace and this program would certainly be a nice addition to my toolbox!
One thing I find useful sometimes for debugging purposes is to actually see the contents of each system call. Do I get to see that when I click on individual boxes here?
This guy talks of 'latencies' when he really means 'durations'.
The latency is the time between stimulation and response, not the overall duration.
In this case the latency of a syscall would be the time between a user program performing a syscall and it starting to operate, not the entire time taken.
The way people talk about syscalls depends on context and point of view. For example, for the system `open()` is a duration, because the system does the work of opening a file. For the app, it may as well be latency: how long you're stopped before the file is opened.
Since the app doesn't actually execute the open code, it is latency: the time from stimulation (called) to response (returned) is the latency of file access. For the system code, latency would be waiting for the disk controller. For the disk controller, latency would be waiting for the heads/platters.
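To make the app's point of view concrete, here is a minimal sketch of what "called to returned" looks like from user space. The helper name `timed_open` is made up for illustration, and the number it reports also includes Python's own call overhead, so treat it as a rough upper bound rather than what a kernel-side tracer would report.

```python
import os
import tempfile
import time

def timed_open(path):
    """Return (fd, elapsed_ns) for one os.open() call, as seen by the caller."""
    start = time.perf_counter_ns()               # just before the call is issued
    fd = os.open(path, os.O_RDONLY)              # the app is stopped here
    elapsed_ns = time.perf_counter_ns() - start  # just after the call returns
    return fd, elapsed_ns

# Create a scratch file so the example does not depend on any existing path.
scratch_fd, scratch_path = tempfile.mkstemp()
os.close(scratch_fd)
try:
    fd, elapsed_ns = timed_open(scratch_path)
    os.close(fd)
    print(f"open() kept the caller waiting for {elapsed_ns} ns")
finally:
    os.unlink(scratch_path)
```

Whether you call that number a latency or a duration is exactly the disagreement in this thread; the measurement itself is the same either way.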
I would disagree with that interpretation. I would talk about the duration of the open() call, in the same way as when you're optimising code, you talk about the duration of a function call, not its latency.
Sadly, misuse of the term is rife in software circles.
Latency is correctly used when talking about how long an ISR takes to start running when provoked by an external stimulus, or how long it takes for a task to be scheduled when made ready.
Latency is a very precise term. A syscall, as its name suggests, is a 'call', and 'calls' are not considered to be instantaneous.
> and 'calls' are not considered to be atomic in any way.
A lot of syscalls should be considered atomic with regard to resources. open() has at least two options this applies to. What kind of atomic do you mean?
I was intending to say that calls are never considered to be instantaneous from either the user or the system point of view; they always have duration.
Latency on the other hand is almost always about scheduling, whether OS-level scheduling or hardware (ISR) scheduling.
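A rough sketch of that distinction, using a thread that is made ready by an event: the time from the stimulus to the thread starting to run is the latency, and the time it then spends working is the duration. The function name `run_and_measure` and the 50 ms of simulated work are assumptions for illustration only, and Python threads are only a stand-in for real OS or ISR scheduling.

```python
import threading
import time

def run_and_measure(work_seconds=0.05):
    """Return (latency, duration) in seconds for one wake-up of a worker thread."""
    stamps = {}
    ready = threading.Event()

    def worker():
        ready.wait()                             # blocked until made ready
        stamps["started"] = time.perf_counter()  # the response begins here
        time.sleep(work_seconds)                 # stand-in for the actual work
        stamps["finished"] = time.perf_counter()

    thread = threading.Thread(target=worker)
    thread.start()
    time.sleep(0.01)                             # let the worker block on the event
    stamps["stimulus"] = time.perf_counter()     # taken just before the wake-up
    ready.set()
    thread.join()

    latency = stamps["started"] - stamps["stimulus"]   # stimulus -> starts running
    duration = stamps["finished"] - stamps["started"]  # time spent doing the work
    return latency, duration

latency, duration = run_and_measure()
print(f"latency:  {latency * 1e6:.0f} us")
print(f"duration: {duration * 1e3:.1f} ms")
```

On an idle machine the first number should be far smaller than the second, which is the gap the two words are meant to separate.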
Another case: network latency is the time between a packet being sent and it being received, not the time taken to process that packet.
That is a direct analogy to what we're talking about here.
I still think you need to specify which latency we are talking about. Scheduling latency is about scheduling. Network interface latency is about putting data in the buffer and then on the wire. Network latency is about actually delivering the packet. Web page latency is about time-to-render. The first few pages of Google results about various types of latency always qualify it with some other word, so it's no longer clear what people mean if they just say "latency".
These are all both durations in one context and latencies in another. For the app, a syscall is "how long did I have to wait for that call to give me a result", so it is a latency and a duration as well (just because it's a period of time).
This indicates that 'web page' latency has nothing to do with redraw-duration, but is more akin to network-latency, only for the page as a whole. In comparison to redraw-duration (which is entirely client-side), web-page latency is the time to transit the network, i.e. latency.
In all the above cases, 'latency' is the delay between the stimulus and response, not the processing time.
I'll leave you with the top Google result for 'latency':
"Latency is a time interval between the stimulation and response, or, from a more general point of view, as a time delay between the cause and the effect of some physical change in the system being observed."
I also sometimes feel a bit ... challenged ... by translating the questions I have (e.g. why did this arbitrary program start using a lot of memory and then OOM) into actual sysdig chisel invocations, but I'm learning slowly but surely. This command line spectrogram looks like a really nice addition to the existing toolset!
I wonder if the visual representation should somehow emphasize the slow calls more. The example on the page immediately draws the attention to that cluster of many fast calls, when the interesting part for optimizations is likely in the 100ms and slower range.
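One way to picture that kind of emphasis is to weight each bucket by total time spent rather than by number of calls. This is only a sketch with made-up numbers, not a description of how the spectrogram actually colours its cells; `bucket_by_decade` is a hypothetical helper, and the sample durations are invented.

```python
from collections import defaultdict

def bucket_by_decade(durations_ns):
    """Group positive integer durations by power of ten.

    Returns {exponent: (call_count, total_ns)}, where exponent 3 means
    "between 1 us and 10 us", exponent 8 means "between 100 ms and 1 s".
    """
    buckets = defaultdict(lambda: [0, 0])
    for d in durations_ns:
        exponent = len(str(d)) - 1   # exact floor(log10(d)) for positive ints
        buckets[exponent][0] += 1
        buckets[exponent][1] += d
    return {e: tuple(v) for e, v in sorted(buckets.items())}

# Invented sample: many fast calls and a handful of slow ones.
sample = [2_000] * 1000 + [150_000_000] * 3
result = bucket_by_decade(sample)

for exponent, (count, total_ns) in result.items():
    print(f"10^{exponent} ns bucket: {count:5d} calls, {total_ns / 1e6:8.1f} ms total")
```

By call count the fast bucket dominates (1000 calls against 3), but by total time the slow bucket does (450 ms against 2 ms), so a display weighted by time would pull the eye toward the slow calls.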
Sysdig is absolutely incredible software, but when are you going to work on getting it upstream in the Linux kernel? That will massively lower the barrier to entry and make sysdig "win" so to speak.