I’m curious which part of these tenets would have prevented the bug demonstrated, beyond “oh, we tried harder”? I don’t see any of them that seem unique to DTrace, other than limiting where probes can be placed.
Well, we didn't merely "try harder" -- we treated safety as a constraint which informed every aspect of the design. And yes, treating safety as a constraint rather than merely an objective results in different implementation decisions. From the article:
This working model significantly increases the attack surface of the kernel, since it allows executing arbitrary code at a high privilege level. Because of this risk, programs have to be verified before they can be loaded. This ensures that all eBPF security assumptions are met. The verifier, which consists of complex code, is responsible for this task.
Given how difficult the task of validating that a program is safe to execute is, there have been many vulnerabilities found within the eBPF verifier. When one of these vulnerabilities is exploited, the result is usually a local privilege escalation exploit (or container escape in containerized environments). While the verifier’s code has been audited extensively, this task also becomes harder as new features are added to eBPF and the complexity of the verifier grows.
DTrace was developed over 20 years ago; there have not been "many vulnerabilities" found in the verifier -- and we have not grown the complexity of the verifier over time. You can dismiss these as implementation details, but these details reflect different views of the problem and its constraints.
No, like, the bug that was demonstrated seems fairly fundamental to running any sort of bytecode in the kernel: they need to verify all branches, and that's potentially slow, so they optimize it (which is where the bug is). What are you doing differently? It seems to me that either you optimize this (and risk exactly this kind of bug) or you don't (and eat the cost).
The DTrace instruction set is more limited than that of the eBPF VM; eBPF is essentially a fully functional ISA, whereas DTrace's was (if I'm remembering this right) designed around the D scripting language. An eBPF program is often just a clang-compiled C program, and you're trusting the kernel verifier to reject it if it can't be proven safe. Further: eBPF programs are JIT'd to actual machine code; once you've loaded and verified an eBPF program, it conceptually has all the same power as, say, shellcode you managed to load into the kernel via an LPE.
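To make that concrete, here's a minimal sketch of that workflow, assuming libbpf conventions; the probe target and names here are mine, for illustration. It's an ordinary C file, compiled with something like "clang -O2 -target bpf -c probe.c -o probe.o", and the verifier is the only thing standing between it and execution in the kernel:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* An ordinary clang-compiled C program. The kernel will JIT this to
     * native machine code, but only after the verifier has proven every
     * path through it safe. */
    SEC("kprobe/do_sys_openat2")
    int trace_open(void *ctx)
    {
        bpf_printk("openat entered");
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";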
That's not to say that security researchers couldn't find DTrace vulnerabilities if they, for instance, built DIF/DOF fuzzers of 2023 levels of sophistication for them. I don't know that anyone's doing that, because DTrace is more or less a dead letter.
For those reading this thread: DTrace is in use on Solaris and on Illumos, and many of us who run Illumos for production use cases (as Oxide does) still very much use DTrace.
I appreciate the rest of tptacek's comment, which is informative. I also acknowledge that there may be no publicly disclosed fuzzers.
Oh, sorry, totally fair call-out. There's like a huge implicit "on Linux" thing in my brain about all this stuff.
I'd also be open to an argument that the code quality in DTrace is higher! I spent a week trying to unwind the eBPF verifier so I could port a facsimile of it to userland. It is a lot. My point about fuzzers and such isn't that I'm concerned DTrace is full of bugs; I'd be surprised if it was. My thing is just that everything written in memory-unsafe kernel code eventually falls to Google Project Zero-grade vulnerability research.
That's true of the rest of the kernel, too! So from a threat perspective, maybe it doesn't matter. I think my bias here --- and that's all it is --- is that neither of these instrumentation schemes is something I'd want to expose to a shared-kernel cotenant.
From what I recall, DTrace's bytecode is far more restricted (a rough sketch of the resulting style follows this list):
- it cannot branch backwards (this is also true of eBPF)
- it can only do ternary operator branches
- it cannot define functions
- functions it can call are limited to some builtin ones
- it can only scribble on the one pre-allocated probe buffer
- it can only access the probe's defined parameters
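To make the branching restriction concrete: D's expression syntax is deliberately C-like, so this sketch is written in plain C, and the function and values are made up. A probe action is limited to straight-line code whose only conditionals are ternary expressions:

    /* Illustrative only: no if/else blocks, no loops, no backward
     * branches -- just forward-converging conditional expressions,
     * which are trivially bounded. */
    int bucket(int size)
    {
        return size > 4096 ? 2 : (size > 512 ? 1 : 0);
    }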
If the verifier can prove to itself that a loop is bounded, it'll accept it. A good starting place for eBPF itself: if a normal ARM program could do it, eBPF can do it. It's a fully functional ISA.
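For instance (an untested sketch, again assuming libbpf conventions; the bound and section name are mine), a loop with a compile-time constant bound, which recent verifiers will accept because every path provably terminates:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("tracepoint/syscalls/sys_enter_openat")
    int bounded_work(void *ctx)
    {
        int sum = 0;

        /* Bounded: the verifier can prove this terminates after 16
         * iterations. A data-dependent bound it can't reason about --
         * or a plain for(;;) -- would be rejected at load time. */
        for (int i = 0; i < 16; i++)
            sum += i;

        bpf_printk("sum=%d", sum);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";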
It depends on what you're using it for. If you want to expose this to untrusted code, yes, but I wouldn't be comfortable doing that with DTrace either.
There are two untrusted-code cases here: untrusted DTrace scripts/users, and untrusted targets of inspection. The latter has to be possible to examine, so the observability tools (like DTrace) have to be secure for that purpose. This means you want to make it difficult to overflow buffers in the observability tools.
There's also a need to make sure that even trusted users don't accidentally cause too much observability load. That's why DTrace has a circular probe buffer pool, it's why it drops probes under load, it's why it pre-allocates each probe's buffer by computing how much the probe's actions will write to it, it's why it doesn't allow looping (since that would make the probe's effect less predictable), etc.
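A conceptual sketch (my own, not DTrace's actual code) of why banning loops makes that pre-allocation possible: if every action writes a statically known number of bytes, the per-probe buffer size is just a sum that can be computed before the probe ever fires:

    #include <stddef.h>

    /* Hypothetical: with loops banned, each action's record size is a
     * static constant, so the exact buffer requirement is knowable at
     * load time rather than at probe-firing time. */
    size_t probe_buffer_size(const size_t *rec_sizes, int n_actions)
    {
        size_t total = 0;
        for (int i = 0; i < n_actions; i++)
            total += rec_sizes[i];
        return total;
    }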
Bryan, Adam, and Mike designed it this way two plus decades ago, and Linux still hasn't caught up.
Linux has a different design than DTrace; eBPF is more capable as a trusted tool, and less capable for untrusted tools. It doesn't make sense to say one approach has "caught up" to the other, unless you really believe the verifier will reach a state where nobody's going to find verifier bugs --- at which point eBPF will be strictly superior. Beyond that, it's a matter of taste. What seems clearly true is that eBPF is wildly more popular.
It's really hard to bring a host to its knees using DTrace, yet it's quite powerful for observability. In my opinion it's better to start with that, then add extra power where it's needed.
I understand the argument, but it's clear which one succeeded in the market. Meanwhile: we take pretty good advantage of the extra power eBPF gives us over what DTrace would, so I'm happy to be on the golden path for the platform here. Like I said, though: this is a matter of taste.
And I should say that DTrace probe actions can dereference pointers, but NULL dereferences do not cause crashes, and rich type data is generally available.