I'm of a similar mind. However, somewhat recently I came across this article which helped provide a framework to think about all these things. Turns out that it's not just a flat space of competing tools:
- Some of those are not in-tree, like LTTng and SystemTap.
- Tracepoints, kprobes, events, and uprobes are all event libraries used by perf or ftrace, just like DTrace had multiple providers (fbt, pid, etc).
The real fragmentation is perf and ftrace, since both are in-tree front ends. That's not too bad, and they both have different strengths.
eBPF is weird in that it's neither an event library nor a front end; it provides programmatic capabilities. We're mostly using an out-of-tree project, bcc, to run it.
What's fragmented about it? Almost everything he showed there was a script that uses ftrace, kprobes, or BPF to measure something specific. Since those are all available in the kernel at the same time, you can certainly think of them as a single API.
I think lttng has kernel tracing. I don't know why the fragmentation you describe is "bad", though; it really depends on the tools themselves.
If there's just a variety of tools for the same task, then that's healthy competition and how you get better software.
If no single tool can fulfill all your tracing needs, that's still not necessarily a condemnation of the tools. It's entirely possible that each tool can complete a subset of tasks, but is significantly simpler to use as a result, so SUM(effort to learn tools you need) may still be comparable to the effort of a theoretical omni-tool.
He didn't mention this in this snippet, but the BCC (BPF Compiler Collection) intends to make this much simpler[1]. In particular it lets you write a tracer in Python (with the BPF program written in C) that attaches the BPF program to whatever types of probe points you like. So while internally there might be all this fragmentation a user shouldn't have to deal with it as much.
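To make that concrete, here's a minimal sketch of what a BCC-style tracer looks like: the BPF program is written in C inside a Python string, and the Python side compiles and attaches it. The example (counting `openat` syscalls per PID) is my own, not from the talk, and actually running it assumes a Linux box with bcc installed and root privileges:

```python
# Sketch of a BCC tracer: BPF program in C, driver in Python.
# Running attach_and_trace() requires bcc installed and root on Linux.
bpf_text = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(counts, u64);

int count_open(struct pt_regs *ctx) {
    u64 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);   // bcc map helper: per-PID counter
    return 0;
}
"""

def attach_and_trace(seconds=10):
    from bcc import BPF  # imported here so the sketch parses without bcc
    import time
    b = BPF(text=bpf_text)
    # Attach the C function above to a kprobe; uprobes and tracepoints
    # are attached the same way from the same program.
    b.attach_kprobe(event=b.get_syscall_fnname("openat"),
                    fn_name="count_open")
    time.sleep(seconds)
    for pid, count in b["counts"].items():
        print(pid.value, count.value)
```

The point is that the user writes one small script and lets bcc deal with which kernel mechanism (kprobe, uprobe, tracepoint) sits underneath.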
Brendan used to be Mr. DTrace User. (Not Mr. DTrace -- that was bmc, ahl, and mws.) But the world isn't using Solaris or FreeBSD, so I guess he moved on like most of the rest of us Solaris diaspora. Still, every time I see one of Brendan's blogs I know, deep down, he must miss DTrace; I sure do. This video doesn't help me feel at home with Linux, but it's a resource for when I need to trace something. Mostly though, when I have to debug something on Linux, I do it the pre-DTrace way, which is to say: the hard way.
The good (and bad) thing about all of the technologies that make up "containers" on Linux is that they can be used by separate projects. Chromium uses seccomp, systemd uses namespaces and cgroups, a bunch of tools use AppArmor/SELinux.
But ultimately the reason that this is the current state is because of how Linux is developed. Trying to push something like Jails or Zones is an exercise in futility because the patchset would be too large, would touch everything, and the infrastructure would likely not be reusable by other people.
It's like playing a video game, I'd imagine. The sound effects are the bane of everyone's existence but your own. Seriously though, you don't play your _game boy_ in public without headphones; let alone present while playing one! I'm amazed -- so, so tempted to turn off the preso even though I was fascinated by the content.
BPF really seems nice. The question for me, though, is: if I were willing to pay a few percent overhead on all my production instances, what would I be able to monitor 24/7 that would return the investment? I haven't found much writing in that area.
Seems like there could be a lot of opportunity, hopefully I'll get a chance to dive in and find out myself.
It should be something like: look at your bottlenecks and utilization, look at your costs, look at (cost effective) ways to reduce or remove those bottlenecks or that utilization. Pick the cheapest place to have a bottleneck. Using SSD at an extra $30 a month lets you use half the CPU and RAM, saving $60 a month? Go for it.
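The arithmetic in that trade-off can be sketched in a couple of lines. The $30/$60 figures come from the comment above; the baseline numbers are hypothetical placeholders:

```python
# Toy cost comparison: pay $30/month extra for SSD, halve the CPU/RAM bill.
def monthly_cost(cpu_ram, storage):
    return cpu_ram + storage

baseline = monthly_cost(cpu_ram=120, storage=10)      # hypothetical baseline
with_ssd = monthly_cost(cpu_ram=60, storage=10 + 30)  # half CPU/RAM, +$30 SSD

print(baseline - with_ssd)  # net savings per month: 30
```

The point is just to compare total monthly cost per configuration, not per component: a more expensive part is fine if it makes the whole cheaper.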
I feel like despite all this progress, sysdig is still the most accessible solution at the moment. It even includes a slowish, but super simple way of tracing user space (you can even write traces from bash scripts). I wish there was a built-in Linux equivalent.
perf with source annotation is pretty nice if you're profiling for individual hotspots. But I have not found any solution that lets me spot Amdahl bottlenecks, which get drowned out in raw cycles spent by the parallel parts. In Java this is trivial with thread utilization timelines that incorporate sampling.
Maybe this could be solved by weighting samples by the inverse of number of running threads at the time
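That weighting idea can be sketched in a few lines. The sample data here is invented; a real implementation would take stacks and runnable-thread counts from the profiler:

```python
# Sketch: weight each stack sample by 1 / (runnable threads at sample time),
# so serial phases stand out even when parallel phases dominate raw counts.
from collections import defaultdict

samples = [
    # (stack frame, runnable threads at the moment of the sample)
    ("parallel_work", 8),
    ("parallel_work", 8),
    ("parallel_work", 8),
    ("parallel_work", 8),
    ("serial_merge", 1),
]

raw = defaultdict(int)
weighted = defaultdict(float)
for frame, runnable in samples:
    raw[frame] += 1
    weighted[frame] += 1.0 / runnable

print(dict(raw))       # parallel_work dominates by raw sample count (4 vs 1)
print(dict(weighted))  # serial_merge dominates once weighted (1.0 vs 0.5)
```

A weighted profile like this approximates "wall-clock time attributable to this frame when it was on the critical path", which is exactly what raw cycle counts hide.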
BPF was used to do the aggregation and calculation in-kernel. You still need ftrace to actually run the BPF program in that context. You can read the cover page for the patch that added this in 2015[1].
Right; plus there's some capabilities where ftrace is (and maybe always will be) better. E.g., function counting: ftrace can count all kernel functions instantly (try my perf-tools funccount tool), whereas the BPF method involves setting a kprobe on everything, which takes much longer (setup and tear down). And function graph tracing from ftrace will likely be better than anything we can do in BPF (as it also relies on tracing all functions).
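For reference, the ftrace function profiler mentioned here is driven entirely through tracefs knobs. A rough sketch of the knob sequence (requires root; the tracefs mount point and the `tcp_*` filter are assumptions for illustration):

```shell
# Count kernel function calls with ftrace's built-in function profiler.
TRACEFS=/sys/kernel/tracing               # tracefs mount point (assumed)
echo nop > $TRACEFS/current_tracer        # reset any active tracer
echo 'tcp_*' > $TRACEFS/set_ftrace_filter # limit to functions of interest
echo 1 > $TRACEFS/function_profile_enabled
sleep 10                                  # let it count for a while
echo 0 > $TRACEFS/function_profile_enabled
head $TRACEFS/trace_stat/function0        # per-CPU hit counts and timings
```

Because ftrace instruments every filtered function through its static mcount hooks, enabling this is near-instant; there is no per-function kprobe registration step.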
Five years ago I tried to make some sense of this by researching all of the existing technologies. In the kernel I found:
Now apparently we can add more to that list, and in user-space we have more still: this talk seems to add a bunch of other fragmented user-space tools. I don't mean to put down anybody's work, but this stuff will never be user-friendly as long as it remains so fragmented, IMHO.