Thank you for your feedback.

> The real issue with debuggers is that ptrace is a pretty broken API.

Please note that this article focuses on NetBSD and FreeBSD first.

Your comment describes only one OS (Linux), so please do not generalize from it; the claims do not hold everywhere.

The NetBSD ptrace(2) API design and implementation are free from all of the difficulties you mentioned in your post.

> Supporting things like spawning threads, forking processes, fork+exec, etc. is difficult, and full of race conditions that are difficult to code correctly.

The difficulty of catching LWP creation events:

ptrace_event_t event = {0};
event.pe_set_event = PTRACE_LWP_CREATE;
ptrace(PT_SET_EVENT_MASK, child, &event, sizeof(event));

Then, whenever the debuggee creates a new thread (LWP), the whole process is stopped (the so-called all-stop mode in GDB terms) and the event is reported to the debugger as a signal that it collects with wait(2).

Then inspect the event by checking the reported signal (SIGTRAP) and the siginfo_t, which contains the new thread identifier.

Then, you can resume the whole process with a single PT_CONTINUE.
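The steps above could be sketched roughly like this. It is only a sketch under assumptions: the helper name wait_for_lwp_create is hypothetical, the tracee is assumed to be already traced and stopped, and fetching the new LWP id through PT_GET_PROCESS_STATE is my reading of the NetBSD ptrace(2) manual rather than something stated above. On other systems the helper just fails with ENOSYS so the file still compiles.

```c
#include <errno.h>
#include <sys/types.h>
#if defined(__NetBSD__)
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <signal.h>
#endif

/*
 * Hypothetical helper: ask for LWP creation events, resume the tracee,
 * and wait for the next thread creation.
 * Returns the new LWP id, or -1 on error or on any other kind of stop.
 * The tracee is left stopped; resume it with PT_CONTINUE.
 */
int wait_for_lwp_create(pid_t pid)
{
#if defined(__NetBSD__)
    ptrace_event_t event = { .pe_set_event = PTRACE_LWP_CREATE };
    struct ptrace_siginfo psi;
    ptrace_state_t state;
    int status;

    if (ptrace(PT_SET_EVENT_MASK, pid, &event, sizeof(event)) == -1)
        return -1;
    /* (void *)1 means "continue from where the tracee stopped". */
    if (ptrace(PT_CONTINUE, pid, (void *)1, 0) == -1)
        return -1;

    /* One PID to poll; the kernel stops every thread before reporting. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return -1;

    /* SIGTRAP + si_code tells us which debugger event this is. */
    if (ptrace(PT_GET_SIGINFO, pid, &psi, sizeof(psi)) == -1)
        return -1;
    if (psi.psi_siginfo.si_code != TRAP_LWP)
        return -1;

    /* Assumption: the created LWP id is read from the process state. */
    if (ptrace(PT_GET_PROCESS_STATE, pid, &state, sizeof(state)) == -1)
        return -1;
    if (state.pe_report_event != PTRACE_LWP_CREATE)
        return -1;
    return (int)state.pe_lwp;
#else
    (void)pid;
    errno = ENOSYS; /* NetBSD-only API; placeholder elsewhere */
    return -1;
#endif
}
```

A debugger would call this in its event loop and then resume everything with a single ptrace(PT_CONTINUE, pid, (void *)1, 0).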

> forking processes

Same for forking: use PT_SET_EVENT_MASK with PTRACE_FORK. Fork events are reported for both the forking parent and the forked child. Since you poll for events on a single PID only (that covers all events for all threads within a process), the reporting order is deterministic: the forking parent is always reported first, and then you poll for the forked child (you know its PID from the SIGTRAP + siginfo_t reported for the parent).
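The parent-first, child-second ordering might look like this in code. Again a sketch under assumptions: wait_for_fork is a hypothetical name, the parent is assumed to be already traced and stopped, and the child PID is read with PT_GET_PROCESS_STATE (pe_other_pid). Outside NetBSD the helper only reports ENOSYS.

```c
#include <errno.h>
#include <sys/types.h>
#if defined(__NetBSD__)
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <signal.h>
#endif

/*
 * Hypothetical helper: enable fork events on `parent`, resume it, and
 * collect the fork report for the parent first and the child second.
 * Returns the child's PID, or -1 on error or on an unexpected stop.
 * Both processes are left stopped; resume each with PT_CONTINUE.
 */
pid_t wait_for_fork(pid_t parent)
{
#if defined(__NetBSD__)
    ptrace_event_t event = { .pe_set_event = PTRACE_FORK };
    ptrace_state_t state;
    pid_t child;
    int status;

    if (ptrace(PT_SET_EVENT_MASK, parent, &event, sizeof(event)) == -1)
        return -1;
    if (ptrace(PT_CONTINUE, parent, (void *)1, 0) == -1)
        return -1;

    /* Step 1: the forking parent is always reported first. */
    if (waitpid(parent, &status, 0) == -1)
        return -1;
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return -1;
    if (ptrace(PT_GET_PROCESS_STATE, parent, &state, sizeof(state)) == -1)
        return -1;
    if (state.pe_report_event != PTRACE_FORK)
        return -1;
    child = state.pe_other_pid;

    /* Step 2: now poll for the forked child, whose PID we just learned. */
    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return -1;
    return child;
#else
    (void)parent;
    errno = ENOSYS; /* NetBSD-only API; placeholder elsewhere */
    return -1;
#endif
}
```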

> fork+exec

This is a matter of catching EXEC and FORK events separately. All exec() events are reported as SIGTRAP + siginfo_t specifying TRAP_EXEC. No big deal.

> is difficult, and full of race conditions that are difficult to code correctly

I leave this claim to the free market of readers' opinions.

> Attaching to running multithreaded processes is another challenge.

It's always a one-liner:

ptrace(PT_ATTACH, pid, NULL, 0);

No matter whether this is a single-threaded or multi-threaded process.
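For completeness, a debugger normally waits for the attach stop before doing anything else with the process. A minimal sketch, with attach_and_wait as a hypothetical helper name; it also compiles on systems whose <sys/ptrace.h> provides PT_ATTACH as an alias (glibc does), though the semantics described here are the NetBSD ones.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: attach to `pid` and wait until the stop is
 * reported. Returns the stop signal number, or -1 on failure.
 */
int attach_and_wait(pid_t pid)
{
    int status;

    /* The same call for single-threaded and multi-threaded processes. */
    if (ptrace(PT_ATTACH, pid, NULL, 0) == -1)
        return -1;

    /* The attach is complete once the stop has been collected. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFSTOPPED(status) ? WSTOPSIG(status) : -1;
}
```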

> Writing a debugger that can correctly handle multithreaded applications is challenging,

Again, I defer this question to the free market of opinions.

> the documentation gives you zero insight into what the potential pitfalls are,

Please list the pitfalls so we can improve the documentation!

> and almost all examples are similarly uninformative, being too complex for their use case.

There are a few hundred ptrace test programs in NetBSD, each exercising one small feature in minimal code. They are part of the regression test framework (ATF). This code can be reused (good license + simple) in 3rd party software.

For external examples, I recommend a minimal debugger-style event tracer that I wrote:

https://github.com/krytarowski/picotrace

In particular, you can trace all events possible in all types of programs (at least in the current version of ptrace(2)) in around 300 LOC, as noted here:

https://github.com/krytarowski/picotrace/blob/master/common/...

FreeBSD has a distinct ptrace(2) API, but it is not far from NetBSD's; the two are comparable, and code ports quickly from one BSD to the other.

If you have got any more questions or comments, do not hesitate to ask!




I'm speaking from a mostly Linux perspective (and macOS, but that API is crippled, so even though it's likely more similar I won't mention it much). While I'll take your word for it that NetBSD has a better API, I am still curious if the various edge cases are handled. Are there multiple stop kinds that are somewhat difficult to distinguish against and keep track of? How you handle a child dying while stopped, or is this not possible? I don't think NetBSD has the same "tasks" model that Linux does, how do you distinguish between requests targeting threads and requests targeting the whole process? How do the rest of the OS APIs interact with a ptrace stopped process?


> Are there multiple stop kinds that are somewhat difficult to distinguish against and keep track of?

Every type of event has a dedicated pair of signal + si_code (in siginfo_t).

The types of events are as follows: regular signals (usually not interesting for a debugger; NetBSD can mask them with PT_SET_SIGPASS), crashes (SIGSEGV, SIGFPE, SIGILL, SIGBUS) and debugger-related events (SIGTRAP). Each debugger-related event is then distinguishable by checking si_code inside siginfo_t (TRAP_TRACE, TRAP_BRKPT, TRAP_CHLD, TRAP_LWP, TRAP_DBREG, TRAP_SCE, TRAP_SCX) and, in a few more cases, with the additional ptrace(2) call PT_GET_PROCESS_STATE, which returns extra information (spawned/exited thread; forked/vforked/spawned process).

Thus we always have the exact thread + type of event.
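The classification could be sketched as a pair of small functions. The names classify_stop and trap_name and the enum are hypothetical, and the short descriptions in the strings are my own summary. The si_code cases are guarded with #ifdef because most of these constants are NetBSD-specific, so the file also compiles where only some of them exist.

```c
#include <signal.h>

/* Hypothetical classification of a reported stop signal. */
enum stop_class {
    STOP_REGULAR_SIGNAL,  /* usually passed through to the tracee */
    STOP_CRASH,           /* SIGSEGV, SIGFPE, SIGILL, SIGBUS */
    STOP_DEBUGGER_EVENT   /* SIGTRAP: inspect si_code next */
};

enum stop_class classify_stop(int signo)
{
    switch (signo) {
    case SIGSEGV:
    case SIGFPE:
    case SIGILL:
    case SIGBUS:
        return STOP_CRASH;
    case SIGTRAP:
        return STOP_DEBUGGER_EVENT;
    default:
        return STOP_REGULAR_SIGNAL;
    }
}

/* For SIGTRAP stops: map si_code to a readable name. */
const char *trap_name(int si_code)
{
    switch (si_code) {
#ifdef TRAP_TRACE
    case TRAP_TRACE: return "TRAP_TRACE (single step)";
#endif
#ifdef TRAP_BRKPT
    case TRAP_BRKPT: return "TRAP_BRKPT (breakpoint)";
#endif
#ifdef TRAP_EXEC
    case TRAP_EXEC:  return "TRAP_EXEC (exec)";
#endif
#ifdef TRAP_CHLD
    case TRAP_CHLD:  return "TRAP_CHLD (child process event)";
#endif
#ifdef TRAP_LWP
    case TRAP_LWP:   return "TRAP_LWP (thread event)";
#endif
#ifdef TRAP_DBREG
    case TRAP_DBREG: return "TRAP_DBREG (debug register)";
#endif
#ifdef TRAP_SCE
    case TRAP_SCE:   return "TRAP_SCE (syscall entry)";
#endif
#ifdef TRAP_SCX
    case TRAP_SCX:   return "TRAP_SCX (syscall exit)";
#endif
    default:         return "unknown si_code";
    }
}
```

For TRAP_CHLD and TRAP_LWP the debugger would follow up with PT_GET_PROCESS_STATE to learn which process or thread the event refers to.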

There is one tricky case that is harder to code. It's related to hardware-assisted watchpoints, especially in multi-threaded processes with concurrent events of all kinds. We need to diligently handle the context of the x86 Debug Registers, which deliver the additional information about the hardware-assisted watchpoint/breakpoint that fired.

> How you handle a child dying while stopped, or is this not possible?

A stopped and traced child cannot just die, but it could be killed with SIGKILL. Then further ptrace(2) calls fail on it.

> I don't think NetBSD has the same "tasks" model that Linux does, how do you distinguish between requests targeting threads and requests targeting the whole process?

Generally, we have a pair of PID (process) + LWP (thread). We have per-process and per-thread ptrace(2) operations. Whenever a thread is meaningful, like in the management of register contexts, we pass the LWP as the 4th argument of the ptrace(2) call or embed it in a structure transmitted from/to the kernel.

In Linux, the ptrace(2) call is per-thread only, which allows some flexibility (the GDB non-stop mode) but adds management complexity. The NetBSD kernel serializes the events inside the kernel and stops all the threads before returning to the debugger.

> How do the rest of the OS APIs interact with a ptrace stopped process?

It's an internal detail whether a process is stopped by a debugger, stopped by the terminal, or actively running. This is orthogonal to other system APIs. Generally we try to make being traced transparent to other applications; for example, we fake the parent PID (after the reparenting that happens on attach). There are some corner cases and real bugs in applications that are exposed under a debugger, such as missing EINTR handling.
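As an illustration of the EINTR point: a blocking call can return early with EINTR when a signal arrives, and an application that does not retry misbehaves. A minimal sketch of the usual retry loop, with read_retry as a hypothetical helper name:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: like read(2), but restarts the call when it is
 * interrupted by a signal before any data was transferred.
 */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);

    return n;
}
```

Code that calls read(2) directly and treats every -1 as fatal is the kind of application bug that tends to show up only under a debugger.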

Over the past few years the NetBSD Project has improved significantly in the domain of debuggers (GDB, LLDB) and developer-oriented tooling (sanitizers, compilers). Still, there is room for improvement!


> Please note that this article focuses on NetBSD and FreeBSD first.

There's a reason I prefaced my comment with Linux debuggers. I haven't played much with BSD kernels to know how problematic debuggers are there.



