Thank you for your feedback. > The real issue with debuggers is that ptrace is a...

saagarjha · on Oct 23, 2020

I'm speaking from a mostly Linux perspective (and macOS, but that API is crippled, so even though it's likely more similar I won't mention it much). While I'll take your word for it that NetBSD has a better API, I am still curious whether the various edge cases are handled. Are there multiple stop kinds that are somewhat difficult to distinguish between and keep track of? How do you handle a child dying while stopped, or is this not possible? I don't think NetBSD has the same "tasks" model that Linux does, so how do you distinguish between requests targeting threads and requests targeting the whole process? How do the rest of the OS APIs interact with a ptrace-stopped process?

krytarowski · on Oct 24, 2020

> Are there multiple stop kinds that are somewhat difficult to distinguish between and keep track of?

Every type of event has a dedicated pair of SIGNAL + SI_CODE (in siginfo_t).

The types of events are as follows: regular signals (usually not interesting for a debugger; NetBSD can mask them with PT_SET_SIGPASS), crashes (SIGSEGV, SIGFPE, SIGILL, SIGBUS) and debugger-related events (SIGTRAP). Each debugger-related event is then distinguishable by checking si_code inside siginfo_t (TRAP_TRACE, TRAP_BRKPT, TRAP_CHLD, TRAP_LWP, TRAP_DBREG, TRAP_SCE, TRAP_SCX), and in a few more cases by the additional ptrace(2) call PT_GET_PROCESS_STATE, which can query further information (spawned/exited thread; forked/vforked/spawned process).

Thus we always have the exact thread + type of event.
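As a rough illustration of that signal + si_code dispatch, here is a minimal C sketch. The names `enum stop_kind` and `classify_stop` are hypothetical, and the `#ifndef` fallbacks are arbitrary placeholder values (not NetBSD's real ones) that exist only so the sketch compiles on systems lacking those si_code constants. It also simplifies by treating every SIGSEGV/SIGFPE/SIGILL/SIGBUS as a crash, without checking whether the signal was sent by another process.

```c
#include <assert.h>
#include <signal.h>

/* Placeholder fallbacks (assumption): arbitrary values so that the
 * sketch compiles where <signal.h> lacks these si_code constants.
 * On NetBSD the real definitions from <signal.h> are used instead. */
#ifndef TRAP_TRACE
#define TRAP_TRACE 0x7001
#endif
#ifndef TRAP_BRKPT
#define TRAP_BRKPT 0x7002
#endif
#ifndef TRAP_CHLD
#define TRAP_CHLD 0x7003
#endif
#ifndef TRAP_LWP
#define TRAP_LWP 0x7004
#endif
#ifndef TRAP_DBREG
#define TRAP_DBREG 0x7005
#endif
#ifndef TRAP_SCE
#define TRAP_SCE 0x7006
#endif
#ifndef TRAP_SCX
#define TRAP_SCX 0x7007
#endif

/* Hypothetical event categories a debugger might dispatch on. */
enum stop_kind {
    STOP_SIGNAL,        /* regular signal */
    STOP_CRASH,         /* SIGSEGV, SIGFPE, SIGILL, SIGBUS */
    STOP_SINGLE_STEP,   /* TRAP_TRACE */
    STOP_BREAKPOINT,    /* TRAP_BRKPT */
    STOP_CHILD,         /* TRAP_CHLD: fork/vfork/spawn */
    STOP_LWP,           /* TRAP_LWP: thread created or exited */
    STOP_WATCHPOINT,    /* TRAP_DBREG: hardware debug register */
    STOP_SYSCALL_ENTRY, /* TRAP_SCE */
    STOP_SYSCALL_EXIT,  /* TRAP_SCX */
    STOP_OTHER_TRAP     /* SIGTRAP with an si_code not listed above */
};

/* Map a (signal, si_code) pair, as found in siginfo_t, to a category. */
enum stop_kind classify_stop(int signo, int code)
{
    assert(signo > 0);

    switch (signo) {
    case SIGSEGV:
    case SIGFPE:
    case SIGILL:
    case SIGBUS:
        return STOP_CRASH;
    case SIGTRAP:
        break;              /* refine by si_code below */
    default:
        return STOP_SIGNAL;
    }

    switch (code) {
    case TRAP_TRACE: return STOP_SINGLE_STEP;
    case TRAP_BRKPT: return STOP_BREAKPOINT;
    case TRAP_CHLD:  return STOP_CHILD;
    case TRAP_LWP:   return STOP_LWP;
    case TRAP_DBREG: return STOP_WATCHPOINT;
    case TRAP_SCE:   return STOP_SYSCALL_ENTRY;
    case TRAP_SCX:   return STOP_SYSCALL_EXIT;
    default:         return STOP_OTHER_TRAP;
    }
}
```

In a real debugger the two inputs would come from the siginfo_t reported for the stopped thread, and the STOP_CHILD and STOP_LWP cases would be followed by a PT_GET_PROCESS_STATE query for the details.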

There is one tricky case that is harder to code. It's related to hardware-assisted watchpoints, especially in multi-threaded processes with concurrent events of all kinds. We need to diligently handle the context of the x86 Debug Registers, which deliver the additional information about the hardware-assisted watchpoint/breakpoint that fired.

> How do you handle a child dying while stopped, or is this not possible?

A stopped and traced child cannot just die, but it could be killed with SIGKILL. Then further ptrace(2) calls fail on it.

> I don't think NetBSD has the same "tasks" model that Linux does, how do you distinguish between requests targeting threads and requests targeting the whole process?

Generally, we have a pair of PID (process) + LWP (thread), and there are both per-process and per-thread ptrace(2) operations. Whenever a thread is meaningful, as in the management of register contexts, we pass the LWP as the 4th argument of the ptrace(2) call or embed it in a structure transmitted from/to the kernel.

In Linux, the ptrace(2) call is per-thread only, which allows some flexibility (the GDB non-stop mode) but makes management more complex. The NetBSD kernel serializes the events inside the kernel and stops all the threads before returning to the debugger.

> How do the rest of the OS APIs interact with a ptrace stopped process?

It's an internal detail whether a process is stopped by a debugger, stopped by the terminal, or actively running. This is orthogonal to other system APIs. Generally we try to make being traced transparent to other applications; for example, we fake the parent PID (after the reparenting that happens on attach). There are some corner cases and real bugs in applications that are exposed under a debugger, such as missing EINTR handling.
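To make the EINTR point concrete, here is a minimal C sketch of the usual retry pattern for a blocking read(2). The helper name `read_retry` is hypothetical; the assumption is that an application without such a loop treats an interrupted call as a hard failure, which is the kind of bug that tends to surface once a debugger starts stopping and resuming the process.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read(2) that restarts when interrupted by a
 * signal instead of reporting EINTR to the caller. Returns what
 * read(2) returns for any other outcome. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);

    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);

    return n;
}
```

The same loop shape applies to other interruptible calls such as write(2) or waitpid(2).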

Over the past few years, the NetBSD Project has improved significantly in the domain of debuggers (GDB, LLDB) and developer-oriented tooling (sanitizers, compilers). Still, there is room for improvement!

jcranmer · on Oct 23, 2020

> Please note that this article focuses on NetBSD and FreeBSD first.

There's a reason I prefaced my comment with Linux debuggers. I haven't played enough with BSD kernels to know how problematic debuggers are there.