Writing a debugger from scratch: Breakpoints

whartung · on Sept 27, 2023

I’ll share this anecdote told by a friend of mine.

He was on a team building a Modula-2 compiler for OS/2, and his group was working on the debugger.

At some point a debugger becomes feature complete enough that you can use the debugger to ... debug the debugger.

But this was OS/2, which has true multiple processes (unlike its contemporary Windows 3.1). So you could, naturally, run the debugger in one process and attach it to another process which just so happens to be another instance of the debugger.

As with all things, while doing this they encountered bugs in the debugger that, well, needed to be debugged.

He said there was a certain epiphany when they realized, because of the multi process nature of OS/2, that they could debug the debugger debugging the debugger.

I would imagine this took a bit of focus. Turn away for a moment and it probably really messes with your head.

timmisiak · on Sept 27, 2023

I think my record when I was on the WinDbg team was 5 debuggers deep.

I honestly think one of the best parts of writing a debugger is being your own recursive customer. I think that's something you only get to do for a few things. Debuggers, languages/compilers, and operating systems. And probably a few others.

lzybkr · on Sept 27, 2023

Not 5 levels, but I once wrote debug visualizers for a compiler using funceval (the visualizer uses the debugger to run code in the target process).

I think I once had to debug the debugger debugging the compiler compiling itself which felt like another really weird kind of recursion.

robertlagrant · on Sept 27, 2023

Font authors?

cbsks · on Sept 27, 2023

I worked at a company that made JTAG probes. When you wanted to debug the firmware on the probe you’d attach another probe to it. And if you encountered a bug while debugging that probe, then you’d attach another probe…

sroussey · on Sept 27, 2023

Focus is still an issue today. We had a version of Firebug that would let you debug Firebug. It was great! A bit buggy though, so you can see where this is going…

That said, even today when debugging Chrome DevTools with Chrome DevTools, window placement is key!!! Ideally, different screens. That keeps the mind clear.

meepmorp · on Sept 27, 2023

It's bugs all the way down.

mywittyname · on Sept 27, 2023

I want to hear stories about debugger-on-debugger heisenbugs.

gpderetta · on Sept 27, 2023

Surely you would then attach the debugger being debugged to the original debugger to debug it and stop the recursion (and of course instantly deadlock).

ekidd · on Sept 27, 2023

This is a great series!

I noticed that the author was using https://github.com/hydro-project/rust-sitter as a parser. Which is based on https://tree-sitter.github.io/tree-sitter/. I've been hearing about Tree-sitter a lot recently, so I dug into it.

Tree-sitter is a tool for generating fast, incremental parsers. In particular, the algorithm is suited towards writing "language servers" for IDEs, which re-parse code incrementally as the user works. These kinds of incremental parsers have historically been a huge problem. It looks like Tree-sitter is an enormous practical advance in this area.

And discovering that there's a way to use Tree-sitter from Rust is fantastic. From the post:

    #[rust_sitter::language]
    pub enum EvalExpr {
        Number(
            #[rust_sitter::leaf(
                pattern = r"(\d+|0x[0-9a-fA-F]+)",
                transform = parse_int
            )]
            u64
        ),
        Symbol(
            #[rust_sitter::leaf(
                pattern = r"(([a-zA-Z0-9_@#.]+!)?[a-zA-Z0-9_@#.]+)",
                transform = parse_sym
            )]
            String
        ),
        // ...

Getting easy access to fast, incremental parsing is a huge win. And Tree-Sitter has support for being used from a huge list of languages, not just Rust.

junon · on Sept 27, 2023

Tree-sitter also has a bunch of deficiencies that make it less than ideal for a number of use cases, and it can act bizarrely in some edge cases. Just evaluate tools like this cautiously, of course. But I like what it's done for the ecosystem as a whole!

a1o · on Sept 27, 2023

I still don't know how to deal with forward declarations when using tree-sitter. :/

jamra · on Sept 27, 2023

Because tree-sitter lexes as it parses, you may have to use an external scanner in order to deal with this kind of stuff. Where are you stuck trying to deal with forward declarations?

a1o · on Sept 27, 2023

It's a simple parser that was originally made to be used in Atom, and I would like to repurpose it elsewhere.

https://github.com/edmundito/tree-sitter-ags-script/issues/1

If this could be solved, we could port this AGS Script parser to the AGS Editor. Today, for features like auto-complete and its very simple refactoring tools, Adventure Game Studio uses a custom handmade parser built in C#. I think if we could leverage tree-sitter we could speed things up and repurpose it to build things like an LSP for AGS Script.

jamra · on Sept 27, 2023

It looks like a bug in the grammar. I’ll bookmark this and see if I can make time for it later. Probably won’t be able to. I recently built a grammar from scratch so I’m okay at tree-sitter.

a1o · on Sept 28, 2023

Oh, but if you do find time I would be eternally grateful! :) The Tree-sitter generated parser is amazing for being super fast and also for being able to tolerate partially written code. This working would mean a lot for the AGS community.

jamra · on Sept 27, 2023

The part that makes tree-sitter useful for this kind of thing is the error recovery. It's hard to do error recovery correctly. Tree-sitter gives you the ability to continue parsing your code which makes it useful for authoring tools.

timmisiak · on Sept 27, 2023

Absolutely, rust sitter is fantastic. I haven't used any other parsers in Rust so I don't have much of a comparison point, but it's probably hard to get much more clear and concise, which I think really helps.

danparsonson · on Sept 27, 2023

Great article, thanks - one question I couldn't see answered there is, what do you do when you want to set more than four breakpoints at once?
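For context on where "four" comes from: x86 has four debug address registers, DR0 to DR3, and a control register, DR7, that holds per-slot enable bits plus a 2-bit condition (R/W) and 2-bit length (LEN) field for each slot. A minimal sketch of building a DR7 value, assuming only the local-enable bits are used; actually loading the registers (for example via `SetThreadContext` on Windows or `PTRACE_POKEUSER` into the debug-register area on Linux) is left out, and `enable_slot` is a hypothetical helper.

```rust
/// R/W field of DR7 (2 bits per slot): what kind of access triggers the slot.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Condition {
    Execute = 0b00,
    Write = 0b01,
    ReadWrite = 0b11,
}

/// LEN field of DR7 (2 bits per slot). Execute breakpoints must use `One`.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Len {
    One = 0b00,
    Two = 0b01,
    Eight = 0b10, // 64-bit mode only
    Four = 0b11,
}

/// Return `dr7` with address slot `slot` (DR0..DR3) locally enabled.
/// The breakpoint address itself goes in DR0..DR3, hence the limit of four.
fn enable_slot(dr7: u64, slot: usize, cond: Condition, len: Len) -> u64 {
    assert!(slot < 4, "x86 only has four breakpoint address registers");
    let local_enable = 1u64 << (slot * 2); // L0, L1, L2, L3 = bits 0, 2, 4, 6
    let shift = 16 + slot * 4; // R/Wn at bits 16+4n, LENn at bits 18+4n
    let field = (((len as u64) << 2) | cond as u64) << shift;
    (dr7 & !(0b1111u64 << shift)) | field | local_enable
}

fn main() {
    let mut dr7 = 0u64;
    dr7 = enable_slot(dr7, 0, Condition::Execute, Len::One); // code breakpoint
    dr7 = enable_slot(dr7, 1, Condition::Write, Len::Four); // 4-byte write watchpoint
    println!("DR7 = {dr7:#x}"); // prints "DR7 = 0xd00005"
}
```

Asking for a fifth slot trips the assertion, which is the situation the question is about.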

ithkuil · on Sept 27, 2023

You use software breakpoints.

Basically you overwrite the instruction you want to break at with a breakpoint instruction (e.g. int 3 on x86). This will cause the process to trap, and the OS will then let the debugger process know about it somehow, e.g. via the SIGTRAP signal on Unix.

The debugger then replaces the int 3 opcode (which is a single byte conveniently) with the first byte of the original instruction so that the execution can continue.
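To make the bookkeeping concrete, here's a minimal sketch that models the debuggee's code as a plain byte vector. The `Target` type is hypothetical; a real debugger would go through something like `ReadProcessMemory`/`WriteProcessMemory` on Windows or `ptrace` with `PTRACE_PEEKTEXT`/`PTRACE_POKETEXT` on Linux. One detail worth knowing: on x86 Linux, the instruction pointer reported after the SIGTRAP is one byte past the `0xCC`, so the debugger subtracts one to find which breakpoint fired.

```rust
use std::collections::HashMap;

const INT3: u8 = 0xCC; // x86 one-byte breakpoint instruction

/// Hypothetical stand-in for the debuggee; `memory` models its code pages.
struct Target {
    memory: Vec<u8>,
    saved: HashMap<usize, u8>, // breakpoint address -> original byte
}

impl Target {
    fn new(code: &[u8]) -> Self {
        Target { memory: code.to_vec(), saved: HashMap::new() }
    }

    fn set_breakpoint(&mut self, addr: usize) {
        // Save the original byte only once; a second call would save our own 0xCC.
        if !self.saved.contains_key(&addr) {
            self.saved.insert(addr, self.memory[addr]);
            self.memory[addr] = INT3;
        }
    }

    fn clear_breakpoint(&mut self, addr: usize) {
        if let Some(orig) = self.saved.remove(&addr) {
            self.memory[addr] = orig;
        }
    }

    /// On x86 Linux the IP reported with the SIGTRAP is one past the 0xCC,
    /// so step back one byte and check whether that is one of ours.
    fn breakpoint_hit(&self, reported_ip: usize) -> Option<usize> {
        let addr = reported_ip.checked_sub(1)?;
        self.saved.contains_key(&addr).then_some(addr)
    }
}

fn main() {
    // mov eax, 1 ; ret
    let mut t = Target::new(&[0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3]);
    t.set_breakpoint(0);
    println!("patched:  {:02x?}", t.memory);
    println!("hit at:   {:?}", t.breakpoint_hit(1));
    t.clear_breakpoint(0);
    println!("restored: {:02x?}", t.memory);
}
```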

hinoki · on Sept 27, 2023

If you revert the int 3 to the original instruction’s byte, when do you put it back? The breakpoint could still be active.

In a trivial example, the breaking instruction could be a jump to itself, which you’d expect to immediately break into the debugger again.

I thought the debugger had to emulate the instruction instead, but it’s not like I’ve ever implemented one…

i_don_t_know · on Sept 27, 2023

I believe when you resume from the debugger, you can tell the process/thread to single-step over one instruction. So it's something like this:

1. Overwrite instruction with int 3.

2. When you hit the breakpoint, restore the original instruction.

3. Single-step over the original instruction by setting the trap flag (TF) in the thread's EFLAGS register (Intel).

4. Restore the breakpoint with int 3.

5. Resume normally.
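The steps above might look roughly like this. The `Debuggee` trait is hypothetical, standing in for whatever the OS provides (thread contexts on Windows, `ptrace` on Linux), and the mock just records each call so the ordering is visible. TF is bit 8 of EFLAGS. The sketch also assumes every other thread in the process stays suspended while the breakpoint byte is temporarily removed; otherwise they could run straight through the unarmed address.

```rust
const TRAP_FLAG: u64 = 1 << 8; // TF, bit 8 of x86 EFLAGS/RFLAGS
const INT3: u8 = 0xCC;

/// Hypothetical interface to one stopped thread; not a real crate's API.
trait Debuggee {
    fn write_byte(&mut self, addr: u64, byte: u8);
    fn flags(&self) -> u64;
    fn set_flags(&mut self, flags: u64);
    fn set_ip(&mut self, ip: u64);
    /// Let the thread run, and block until it stops again.
    fn resume_and_wait(&mut self);
}

/// Steps 2 to 4 for a software breakpoint at `addr` whose original byte is `orig`.
/// Assumes all other threads in the process are suspended meanwhile.
fn step_over_breakpoint<D: Debuggee>(d: &mut D, addr: u64, orig: u8) {
    d.write_byte(addr, orig); // 2. restore the original instruction
    d.set_ip(addr); //    x86 reports IP one past the INT3, so rewind it
    let f = d.flags();
    d.set_flags(f | TRAP_FLAG); // 3. arm single-step...
    d.resume_and_wait(); //    ...which traps after exactly one instruction
    let f = d.flags();
    d.set_flags(f & !TRAP_FLAG); //    make sure TF is off again
    d.write_byte(addr, INT3); // 4. re-insert the breakpoint
} // 5. the caller then resumes normally

/// Mock that logs each operation so the ordering is visible.
struct Recorder {
    log: Vec<String>,
    flags: u64,
}

impl Debuggee for Recorder {
    fn write_byte(&mut self, addr: u64, byte: u8) {
        self.log.push(format!("write {addr:#x} {byte:#04x}"));
    }
    fn flags(&self) -> u64 {
        self.flags
    }
    fn set_flags(&mut self, flags: u64) {
        self.flags = flags;
        self.log.push(format!("flags {flags:#x}"));
    }
    fn set_ip(&mut self, ip: u64) {
        self.log.push(format!("ip {ip:#x}"));
    }
    fn resume_and_wait(&mut self) {
        self.log.push("single-step".to_string());
    }
}

fn main() {
    let mut r = Recorder { log: Vec::new(), flags: 0x202 }; // IF plus reserved bit 1
    step_over_breakpoint(&mut r, 0x401000, 0x55);
    for line in &r.log {
        println!("{line}");
    }
}
```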

hinoki · on Sept 27, 2023

Wouldn’t that race against any other thread in the process? I guess you could stop all threads when you hit the breakpoint and start them again after you restore the breakpoint, but the synchronisation of that would be really tricky too.

Veserv · on Sept 27, 2023

Yes. And yes that is one of the ways to solve it.

You could also do something like have a clean mapping table (i.e. the code with no breakpoints installed) that you install for just the thread doing the step. You then revert back to the normal mapping table with the breakpoint after the step. As you are only modifying the executable section, as long as you are not using self-modifying code, there should be no data inconsistency from having multiple copies of the executable transiently.

ithkuil · on Sept 27, 2023

Emulation is an option, rotating hardware debug registers is another option, detecting self-jumps is another option.

I really only implemented a debugger for the esp8266 and it was just good enough for me and my team to get our job done so it didn't handle many edge cases like that

spc476 · on Sept 27, 2023

As i_don_t_know stated, if the CPU has the ability to single step an instruction, you use that. Otherwise:

* Restore the original instruction byte.

* Find the next instruction, and set a temporary software breakpoint there.

* Resume execution for just that one instruction.

* Restore the original instruction byte at the temporary software breakpoint.

* Set the software breakpoint in the original instruction

* Resume running
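To see why this gets messy, here's a sketch of software single-stepping on a made-up toy instruction set (the opcodes and the `Machine`/`Debugger` types are all hypothetical). The key step is working out every address the restored instruction could transfer control to: a straight-line instruction has one successor, but a conditional branch has two, so both get a temporary breakpoint.

```rust
use std::collections::HashMap;

// A made-up toy ISA, just enough to show the idea:
const DEC: u8 = 0x01; // acc -= 1                     (1 byte)
const JNZ: u8 = 0x02; // if acc != 0 { pc = target }  (2 bytes: opcode, target)
const HALT: u8 = 0x03; // stop                        (1 byte)
const BRK: u8 = 0xCC; // software breakpoint          (1 byte)

struct Machine {
    mem: Vec<u8>,
    pc: usize,
    acc: u8,
}

impl Machine {
    /// Run until BRK (returns its address; pc stays on it) or HALT (None).
    fn run(&mut self) -> Option<usize> {
        loop {
            match self.mem[self.pc] {
                DEC => {
                    self.acc = self.acc.wrapping_sub(1);
                    self.pc += 1;
                }
                JNZ => {
                    self.pc = if self.acc != 0 { self.mem[self.pc + 1] as usize } else { self.pc + 2 };
                }
                BRK => return Some(self.pc),
                HALT => return None,
                op => panic!("bad opcode {op:#x}"),
            }
        }
    }
}

/// Every address control can reach after executing the instruction at `pc`.
fn successors(mem: &[u8], pc: usize) -> Vec<usize> {
    match mem[pc] {
        JNZ => vec![pc + 2, mem[pc + 1] as usize], // fall-through and taken
        HALT => vec![],
        _ => vec![pc + 1],
    }
}

struct Debugger {
    breakpoints: HashMap<usize, u8>, // address -> original byte
}

impl Debugger {
    fn set(&mut self, m: &mut Machine, addr: usize) {
        if !self.breakpoints.contains_key(&addr) {
            self.breakpoints.insert(addr, m.mem[addr]);
            m.mem[addr] = BRK;
        }
    }

    /// Step over the user breakpoint at `m.pc` with no hardware single-step.
    /// Not handled: a branch back to the breakpoint address itself.
    fn step_over(&self, m: &mut Machine) {
        let bp = m.pc;
        m.mem[bp] = self.breakpoints[&bp]; // restore the original byte
        let mut next_addrs = successors(&m.mem, bp);
        next_addrs.sort_unstable();
        next_addrs.dedup(); // a branch to the next instruction yields one address twice
        let mut temps = Vec::new();
        for next in next_addrs {
            // Temporary breakpoints on every successor (user ones are already armed).
            if !self.breakpoints.contains_key(&next) {
                temps.push((next, m.mem[next]));
                m.mem[next] = BRK;
            }
        }
        let _ = m.run(); // executes one instruction, then stops on a successor
        for (addr, orig) in temps {
            m.mem[addr] = orig; // remove the temporaries
        }
        m.mem[bp] = BRK; // re-arm the user breakpoint
    }
}

fn main() {
    // 0: DEC   1: JNZ 0   3: HALT   (loops while acc != 0)
    let mut m = Machine { mem: vec![DEC, JNZ, 0, HALT], pc: 0, acc: 3 };
    let mut dbg = Debugger { breakpoints: HashMap::new() };
    dbg.set(&mut m, 1); // break on the JNZ
    let mut hits = 0;
    while m.run() == Some(1) {
        hits += 1;
        dbg.step_over(&mut m);
    }
    println!("hit {hits} times, acc = {}, mem = {:x?}", m.acc, m.mem);
}
```

Real instruction sets add variable-length decoding, indirect jumps and calls, and returns whose target is only known at run time, all of which make computing the successors harder.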

The other thing to keep in mind is dealing with JMP, CALL and conditional branch instructions. It can get pretty messy pretty quick, which is why I find low level debuggers on old 8-bit CPUs a marvel as they had to deal with only software breakpoints.

timmisiak · on Sept 27, 2023

Yes, software breakpoints are difficult to get correct (the main reason why I started with hardware breakpoints). It gets more complicated with kernel debugging, where a single step (trap flag) could get pre-empted by an interrupt handler. And you can't always single-step a CPU and leave all other CPUs frozen.

vinge · on Sept 27, 2023

Others have already mentioned software breakpoints where the instruction is replaced, another option is to run the code in an emulator that supports a virtually unlimited set of breakpoints. For example, using QEMU with its GDB stub.
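In an emulator the breakpoint check can live in the instruction loop itself, so breakpoints are just entries in a set and guest code is never patched: nothing to restore, nothing to re-insert. A minimal sketch with a hypothetical toy emulator where every instruction is one byte and simply advances the program counter; it shows the general idea, not how QEMU implements it internally.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Stop {
    Breakpoint(usize),
    Finished,
}

/// Hypothetical toy emulator: every "instruction" is one byte long and just
/// advances pc, which is enough to show where the breakpoint check lives.
struct Emulator {
    pc: usize,
    code_len: usize,
    breakpoints: HashSet<usize>,
}

impl Emulator {
    /// Run until pc reaches a breakpoint address or falls off the end.
    /// `resuming` skips the check for the instruction we're stopped on,
    /// so continuing from a breakpoint doesn't immediately re-trigger it.
    fn run(&mut self, resuming: bool) -> Stop {
        let mut skip = resuming;
        while self.pc < self.code_len {
            if !skip && self.breakpoints.contains(&self.pc) {
                return Stop::Breakpoint(self.pc);
            }
            skip = false;
            self.pc += 1; // "execute" the instruction at pc
        }
        Stop::Finished
    }
}

fn main() {
    // Ten breakpoints, well past the four hardware slots.
    let bps: HashSet<usize> = (0..100).step_by(10).collect();
    let mut emu = Emulator { pc: 0, code_len: 100, breakpoints: bps };
    let mut hits = Vec::new();
    let mut resuming = false;
    while let Stop::Breakpoint(addr) = emu.run(resuming) {
        hits.push(addr);
        resuming = true;
    }
    println!("{hits:?}");
}
```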

saagarjha · on Sept 27, 2023

Use software breakpoints (which are mentioned but not described, the short story for those is you overwrite the address you care about with an illegal instruction and execution traps when it encounters that code, and then you undo it to continue).

parttimenerd · on Sept 27, 2023

I've done something similar in Python: A Python debugger from scratch in Python

- https://github.com/parttimenerd/python-dbg/
- Part 1: https://mostlynerdless.de/blog/2023/09/20/lets-create-a-pyth...
- Part 2: https://mostlynerdless.de/?p=1102&preview=1&_ppp=a17cda3e36

elischleifer · on Sept 27, 2023

Working at Microsoft back in the early 00s I spent a lot of unfriendly hours with WinDbg. On one particular project we hunted for a terrible crash for months until we discovered that we were compiling against the single-threaded CRT while using threads extensively... whoops.

zubairq · on Sept 27, 2023

Really nice read. Does anyone know any other good articles or videos about how to write a debugger?

jansommer · on Sept 27, 2023

I wrote this: https://ja.nsommer.dk/articles/x86-debugger-for-windows-and-...

It's a debugger for Windows (and Wine), like the one in the article, written in C. It uses software breakpoints, so the number of breakpoints is effectively unlimited.

mrazomor · on Sept 27, 2023

I really liked https://blog.tartanllama.xyz/writing-a-linux-debugger-setup/

I learned a lot.

emmanueloga_ · on Sept 27, 2023

I was asking myself this while reading the book "Crafting Interpreters". I posted a few resources I found on an issue about implementing debuggers [1], although honestly I still haven't gotten around to reading all of them (or to implementing a debugger! :-/).

--

1: https://github.com/munificent/craftinginterpreters/issues/92...

a1o · on Sept 27, 2023

Besides breakpoints, any ideas on how to inspect the value of a variable at each step, or figure out which variables are in scope, in the case of an interpreter?

emmanueloga_ · on Sept 27, 2023

I’m guessing you’ll have to work with the scopes in the resolver:
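One way to approach this in a tree-walking interpreter, sketched under assumptions: keep the runtime environment as a stack of scopes (similar in spirit to the chain of environments in Crafting Interpreters), have the interpreter call a hook before each statement, and when a breakpoint line is hit, walk the scopes from innermost to outermost so shadowed variables are hidden. `Env`, `Value` and `on_statement` are hypothetical names, not from the book.

```rust
use std::collections::{BTreeMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Num(f64),
    Str(String),
}

/// Hypothetical runtime environment: a stack of scopes, innermost last.
struct Env {
    scopes: Vec<BTreeMap<String, Value>>,
}

impl Env {
    fn push(&mut self) {
        self.scopes.push(BTreeMap::new());
    }
    fn pop(&mut self) {
        self.scopes.pop();
    }
    fn define(&mut self, name: &str, v: Value) {
        self.scopes.last_mut().expect("no scope").insert(name.to_string(), v);
    }

    /// What a debugger "locals" view would show: every visible name, with the
    /// innermost binding winning and shadowed outer bindings hidden.
    fn visible(&self) -> Vec<(String, Value)> {
        let mut seen = HashSet::new();
        let mut out = Vec::new();
        for scope in self.scopes.iter().rev() {
            for (name, v) in scope {
                if seen.insert(name.clone()) {
                    out.push((name.clone(), v.clone()));
                }
            }
        }
        out
    }
}

/// Hypothetical hook the interpreter would call before executing each statement.
fn on_statement(line: u32, breakpoints: &[u32], env: &Env) {
    if breakpoints.contains(&line) {
        for (name, v) in env.visible() {
            println!("line {line}: {name} = {v:?}");
        }
    }
}

fn main() {
    let mut env = Env { scopes: vec![BTreeMap::new()] }; // global scope
    env.define("x", Value::Num(1.0));
    env.define("greeting", Value::Str("hi".into()));
    env.push(); // entering a block
    env.define("x", Value::Num(2.0)); // shadows the global x
    on_statement(3, &[3, 5], &env);
    env.pop(); // leaving the block
    on_statement(5, &[3, 5], &env);
}
```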

https://github.com/munificent/craftinginterpreters/blob/mast...

a1o · on Sept 28, 2023

Ooh, thanks for this! :)

Modified3019 · on Sept 27, 2023

Not what you asked for, but if that's your interest you'll probably appreciate https://justine.lol/blinkenlights/

HansLambda · on Sept 27, 2023

Take a look here: https://eli.thegreenplace.net/tag/debuggers It was eye-opening for me.

wila · on Sept 27, 2023

How about a book?

I like "Advanced Windows Debugging" by Mario Hewardt and Daniel Pravat.

matt3210 · on Sept 27, 2023

Does anyone know a similar article using c/c++? Interesting concept.

SebastienWae · on Sept 27, 2023

Yes, the "Writing a Linux Debugger" series in C++. https://blog.tartanllama.xyz/writing-a-linux-debugger-setup/

And more generally there is "The Debugging Book" in python. https://www.debuggingbook.org/

emmanueloga_ · on Sept 28, 2023

The last book is an example of a book that is really about debugging, not about debugger implementation. This is the case for most books you will find with a title matching /.*debug.*/.

Debugger knowledge seems to be scattered across the internet and language implementations. Also I never found a language implementation book that talks about how to make the implementation friendlier/compatible with writing a debugger.

dgb23 · on Sept 27, 2023

This is amazing! I've been thinking about writing a debugger (for learning how they work etc.). This series is going to be a massive help!

timmisiak · on Sept 27, 2023

Honestly it's a great exercise for learning how low level stuff works in general. Happy to answer any questions you have!

xvilka · on Sept 27, 2023

Writing a cross-platform debugger (different OSes, POSIX and not, different architectures, different endianness, etc.): this is where all the pain lies.

tibbydudeza · on Sept 27, 2023

For want of a decent JS React debugger.

uxp8u61q · on Sept 27, 2023

Debugging JS with vscode is probably the nicest debugging experience I've had, bar maybe C# with the latest VS.

kaycey2022 · on Sept 29, 2023

Is there a guide that covers this?

heelwood · on Sept 27, 2023

[flagged]

_hl_ · on Sept 27, 2023

@dang