I’ll share an anecdote a friend of mine told me.
He was on a team building a Modula-2 compiler for OS/2, and his group was working on the debugger.
At some point a debugger becomes feature complete enough that you can use the debugger to ... debug the debugger.
But this was OS/2, which had true multiple processes (unlike its contemporary, Windows 3.1). So you could, naturally, run the debugger in one process and attach it to another process which just so happened to be another instance of the debugger.
As with all things, while doing this they encountered bugs in the debugger that, well, needed to be debugged.
He said there was a certain epiphany when they realized, because of the multi-process nature of OS/2, that they could debug the debugger debugging the debugger.
I would imagine this took a bit of focus. Turn away for a moment and it probably really messes with your head.
I think my record when I was on the WinDbg team was 5 debuggers deep.
I honestly think one of the best parts of writing a debugger is being your own recursive customer. I think that's something you only get to do for a few things. Debuggers, languages/compilers, and operating systems. And probably a few others.
I worked at a company that made JTAG probes. When you wanted to debug the firmware on the probe you’d attach another probe to it. And if you encountered a bug while debugging that probe, then you’d attach another probe…
Focus is still an issue today. We had a version of Firebug that would let you debug Firebug. It was great! A bit buggy though, so you can see where this is going…
That said, even today when debugging Chrome DevTools with Chrome DevTools, window placement is key!!! Ideally, different screens. That keeps the mind clear.
Surely you would then attach the debugger being debugged to the original debugger to debug it and stop the recursion (and of course instantly deadlock).
Tree-sitter is a tool for generating fast, incremental parsers. In particular, the algorithm is suited towards writing "language servers" for IDEs, which re-parse code incrementally as the user works. These kinds of incremental parsers have historically been a huge problem. It looks like Tree-sitter is an enormous practical advance in this area.
And discovering that there's a way to use Tree-sitter from Rust is fantastic. From the post:
Getting easy access to fast, incremental parsing is a huge win. And Tree-Sitter has support for being used from a huge list of languages, not just Rust.
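To give a concrete sense of that, here's a minimal sketch of a parse using the core C API (hedged: it assumes a generated JSON grammar is compiled and linked in, and error handling is omitted):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <tree_sitter/api.h>

    /* Provided by whichever generated grammar you link against. */
    const TSLanguage *tree_sitter_json(void);

    int main(void) {
        TSParser *parser = ts_parser_new();
        ts_parser_set_language(parser, tree_sitter_json());

        const char *source = "[1, null, \"two\"]";
        TSTree *tree = ts_parser_parse_string(parser, NULL,
                                              source, strlen(source));

        /* Print the parse tree as an s-expression. */
        char *sexp = ts_node_string(ts_tree_root_node(tree));
        printf("%s\n", sexp);

        free(sexp);
        ts_tree_delete(tree);
        ts_parser_delete(parser);
        return 0;
    }

For incremental re-parsing you'd pass the previous tree (after ts_tree_edit) instead of NULL, and the bindings for other languages wrap this same API.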
Tree-sitter also has a bunch of deficiencies that make it less than ideal for a number of use cases, and it can act bizarrely in some edge cases. Just evaluate tools like this cautiously, of course. But I like what it's done for the ecosystem as a whole!
Because tree-sitter lexes as it parses, you may have to use an external scanner in order to deal with this kind of stuff. Where are you stuck trying to deal with forward declarations?
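For reference, an external scanner is just a C file with a handful of entry points named after the grammar. A skeletal sketch, for a hypothetical grammar called "mylang" with one custom token:

    // src/scanner.c -- skeletal external scanner for a hypothetical
    // grammar named "mylang".
    #include <stdbool.h>
    #include <tree_sitter/parser.h>

    /* Order must match the `externals` list in grammar.js. */
    enum TokenType { MY_CUSTOM_TOKEN };

    void *tree_sitter_mylang_external_scanner_create(void) { return NULL; }
    void tree_sitter_mylang_external_scanner_destroy(void *payload) {}
    unsigned tree_sitter_mylang_external_scanner_serialize(void *payload, char *buffer) { return 0; }
    void tree_sitter_mylang_external_scanner_deserialize(void *payload, const char *buffer, unsigned length) {}

    bool tree_sitter_mylang_external_scanner_scan(void *payload, TSLexer *lexer,
                                                  const bool *valid_symbols) {
        /* The parser tells the scanner which external tokens are valid
         * at this point, so it can make context-dependent decisions
         * that the built-in lexer (which lexes as it parses) cannot. */
        if (valid_symbols[MY_CUSTOM_TOKEN] && lexer->lookahead == '$') {
            lexer->advance(lexer, false);
            lexer->result_symbol = MY_CUSTOM_TOKEN;
            return true;
        }
        return false;
    }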
If this could be solved, we could port this AGS Script parser to the AGS Editor. Today, Adventure Game Studio uses a custom handmade parser built in C# for needs like auto-complete and its very simple refactoring features. I think if we could leverage tree-sitter we could speed things up and repurpose it to build things like an LSP for AGS Script.
It looks like a bug in the grammar. I’ll bookmark this and see if I can make time for it later. Probably won’t be able to. I recently built a grammar from scratch, so I’m okay at tree-sitter.
Oh, but if you do find time I would be eternally grateful! :) The Tree-sitter generated parser is amazing for being super fast and also for being able to tolerate partially written code. This working would mean a lot for the AGS community.
The part that makes tree-sitter useful for this kind of thing is the error recovery. It's hard to do error recovery correctly. Tree-sitter gives you the ability to continue parsing your code past errors, which makes it useful for authoring tools.
Absolutely, rust sitter is fantastic. I haven't used any other parsers in Rust so I don't have much of a comparison point, but it's probably hard to get much more clear and concise, which I think really helps.
Basically you overwrite the instruction you want to break at with a breakpoint instruction (e.g. int 3 on x86). This will cause the process to trap, and the OS will then let the debugger process know about it somehow, e.g. via the SIGTRAP signal on Unix.
The debugger then replaces the int 3 opcode (which is conveniently a single byte) with the first byte of the original instruction so that execution can continue.
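On Linux the mechanics look roughly like this (a minimal x86-64 sketch using ptrace; the helper names are mine and error handling is omitted):

    #include <stdint.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>

    #define INT3 0xCCL  /* x86 breakpoint opcode, conveniently one byte */

    /* Patch an int 3 over the first byte at addr; return the original
     * word so the byte can be restored later. */
    long set_breakpoint(pid_t pid, uintptr_t addr) {
        long orig = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
        ptrace(PTRACE_POKETEXT, pid, (void *)addr,
               (void *)((orig & ~0xFFL) | INT3));
        return orig;
    }

    /* Put the original byte back so execution can continue. */
    void clear_breakpoint(pid_t pid, uintptr_t addr, long orig) {
        ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)orig);
    }

One wrinkle on x86: when the tracee traps (waitpid reports a SIGTRAP stop), the instruction pointer points just past the int 3, so the debugger also rewinds it by one byte before restoring and resuming.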
Wouldn’t that race against any other thread in the process? I guess you could stop all threads when you hit the breakpoint and start them again after you restore the breakpoint, but the synchronisation of that would be really tricky too.
You could also do something like have a clean mapping table (i.e. the code with no breakpoints installed) that you install for just the thread doing the step. You then revert back to the normal mapping table with the breakpoint after the step. As you are only modifying the executable section, as long as you are not using self-modifying code, there should be no data inconsistency in transiently having multiple copies of the executable.
Emulation is an option, rotating hardware debug registers is another option, detecting self-jumps is another option.
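For the hardware-register route on Linux/x86-64, installing one looks roughly like this (a hedged sketch via the ptrace user area; x86 only has four such slots, hence the rotating):

    #include <stddef.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>

    /* Install a hardware execute breakpoint in slot 0 of the tracee. */
    int set_hw_breakpoint(pid_t pid, unsigned long addr) {
        /* DR0 holds the linear address to watch. */
        if (ptrace(PTRACE_POKEUSER, pid,
                   (void *)offsetof(struct user, u_debugreg[0]),
                   (void *)addr) == -1)
            return -1;
        /* DR7: set the L0 bit to enable slot 0; R/W0=00 and LEN0=00
         * mean "break on instruction execution". */
        return (int)ptrace(PTRACE_POKEUSER, pid,
                           (void *)offsetof(struct user, u_debugreg[7]),
                           (void *)0x1UL);
    }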
I only ever implemented a debugger for the ESP8266, and it was just good enough for me and my team to get our job done, so it didn't handle many edge cases like that.
As i_don_t_know stated, if the CPU has the ability to single-step an instruction, you use that (see the sketch after this list). Otherwise:

* Restore the original instruction byte.
* Find the next instruction, and set a temporary software breakpoint there.
* Resume for the one instruction.
* Restore the original instruction byte at the temporary software breakpoint.
* Set the software breakpoint back in the original instruction.
* Resume running.
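When the CPU does have single-step support (on Linux/x86 it's exposed as PTRACE_SINGLESTEP), the whole dance collapses into something like this sketch (reusing the hypothetical set_breakpoint/clear_breakpoint helpers from the earlier snippet; error handling omitted):

    #include <stdint.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>
    #include <sys/wait.h>

    /* Step the tracee over a breakpoint at addr, then re-arm it. */
    void step_over_breakpoint(pid_t pid, uintptr_t addr, long orig) {
        int status;
        struct user_regs_struct regs;

        /* The CPU already executed the int 3, so RIP points one past
         * it: rewind to the patched instruction. */
        ptrace(PTRACE_GETREGS, pid, NULL, &regs);
        regs.rip = addr;
        ptrace(PTRACE_SETREGS, pid, NULL, &regs);

        clear_breakpoint(pid, addr, orig);  /* restore original byte  */
        ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL);
        waitpid(pid, &status, 0);           /* wait for the step trap */
        set_breakpoint(pid, addr);          /* re-arm the breakpoint  */
    }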
The other thing to keep in mind is dealing with JMP, CALL, and conditional branch instructions. It can get pretty messy pretty quickly, which is why I find low-level debuggers on old 8-bit CPUs a marvel, as they had only software breakpoints to work with.
Yes, software breakpoints are difficult to get correct (the main reason why I started with hardware breakpoints). It gets more complicated with kernel debugging, where a single step (trap flag) could get pre-empted by an interrupt handler. And you can't always single-step a CPU and leave all other CPUs frozen.
Others have already mentioned software breakpoints, where the instruction is replaced; another option is to run the code in an emulator that supports a virtually unlimited number of breakpoints. For example, using QEMU with its GDB stub.
Use software breakpoints (which are mentioned but not described; the short story is that you overwrite the instruction at the address you care about with a breakpoint/illegal instruction, execution traps when it reaches that code, and then you undo it to continue).
Working at Microsoft back in the early 00s I spent a lot of unfriendly hours with WinDbg. On one particular project we hunted a terrible crash for months until it was uncovered that we were compiling against the single-threaded CRT while using threads extensively... whoops.
I was asking myself this while reading the book "Crafting Interpreters". I posted a few resources I found on an issue about implementing debuggers [1] -- although honestly I still haven't gotten around to reading all of them (or to implementing a debugger! :-/).
Besides breakpoints, any ideas on inspecting the value of a variable in each step, figuring out what variables are in scope, for the case of an interpreter?
The last book is an example of a book that is really about debugging, not about debugger implementation. This is the case for most books you will find with a title matching /.*debug.*/.
Debugger knowledge seems to be scattered across the internet and language implementations. Also I never found a language implementation book that talks about how to make the implementation friendlier/compatible with writing a debugger.