I've lost the original reference, but Joe Marshall once wrote in comp.lang.lisp:
> Here's an anecdote I heard once about Minsky. He was showing a student how to use ITS to write a program. ITS was an unusual operating system in that the 'shell' was the DDT debugger. You ran programs by loading them into memory and jumping to the entry point. But you can also just start writing assembly code directly into memory from the DDT prompt. Minsky started with the null program. Obviously, it needs an entry point, so he defined a label for that. He then told the debugger to jump to that label. This immediately raised an error of there being no code at the jump target. So he wrote a few lines of code and restarted the jump instruction. This time it succeeded and the first few instructions were executed. When the debugger again halted, he looked at the register contents and wrote a few more lines. Again proceeding from where he left off he watched the program run the few more instructions. He developed the entire program by 'debugging' the null program.
Everything needs a catchy name, so I call it Debugger Driven Development.
I write the first few lines of a function, however much I feel sure about, then add a dummy statement at the end and set a breakpoint there. When the code stops at the breakpoint, I can see exactly what my code did and what data I now have on hand.
I use that new knowledge to write the next few lines, again until I get to something I'm unsure of or where I'd just like to get a better view of the data. Set a breakpoint there and view that new data.
Repeat as needed until the function is done.
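A minimal sketch of what that looks like in practice (the function and the dummy statement here are made up for illustration): write the part you're sure of, end on a throwaway statement, and park the breakpoint there.

```c
#include <stdio.h>

/* Hypothetical example: parse a header line whose format I only half know. */
static int parse_header(const char *line) {
    int version = 0, flags = 0;
    sscanf(line, "V%d F%d", &version, &flags);   /* the part I'm sure about */

    /* Not sure what to do with `flags` yet, so stop here and look at it.
       `volatile` keeps the compiler from optimizing the dummy line away. */
    volatile int breakpoint_here = flags;
    (void)breakpoint_here;

    return version;
}

int main(void) {
    printf("version = %d\n", parse_header("V2 F7"));
    return 0;
}
```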
At my work we have many internal APIs that are "documented" but the documentation is fairly lacking. With the debugger, I can see not only what the API claims to do, but what it really does with my actual input.
I am bummed that so many developers today eschew debuggers. I even read an article recently along the lines of "These famous programmers don't use debuggers, and you shouldn't either". Why would anyone want to talk people out of using such a useful tool? It makes no sense to me.
We did that in the 1980s and the 1990s on the Commodore 64 and the Amiga, only back then the debugger was called a monitor, and it was quickly discovered that programming in a monitor was slow and error-prone because it couldn't recompute addresses when the code changed. That is why native assemblers appeared: Turbo Assembler on the Commodore 64 and ASM-One, TRASH’M-One, Seka and MasterSeka on the Amiga. ASM-One and TRASH’M-One include an excellent debugger built natively into the integrated development environment, and stepping through the code after assembling it is a joy.
I'd love to use a debugger more, but in my day job at least, the use of Docker makes it a hassle to set up.
For standalone Java applications, or when using Visual C++ though, it's so much better than printing out state.
> I am bummed that so many developers today eschew debuggers. I even read an article recently along the lines of "These famous programmers don't use debuggers, and you shouldn't either". Why would anyone want to talk people out of using such a useful tool? It makes no sense to me.
That would be because of what I call "long vs. short-term". If you're only thinking of the next few lines and doing that a lot, you will have effectively trained yourself out of looking at the bigger picture. As someone who taught programmers, I've seen what "debugger driven development" code looks like (because that's how some of them will try to start writing code.) It's not pretty. There's a reason a lot of highly productive (but not necessarily famous) programmers consider debuggers as a last-resort tool: writing code that needs debugging should be a rare occurrence.
Many PC magazines of the late 80s/early 90s had program listings (in Asm) for small utilities that you created by typing them into DEBUG, the very basic debugger that came with DOS. The C64 and ZX ones had similar listings, although I believe those were more commonly in the platform's variant of BASIC. Unfortunately, I don't think this culture existed around Apple's machines since the Macintosh (or the Lisa that came before it).
The way to low-level format early MFM drives was to run DEBUG, enter assembly and call some special routine in the controller chip on the drive. These were instructions that came in the manual with the drive. Pretty wild times. We've come a long way.
Yes, we actually had to do stuff like this. When I started my first job in the late 1980s, it was normal for hard disks to have a little label with a list of the (known) bad blocks on the drive. (Hand-written by someone at the factory on the first (circa 15-20MB) disks I used; later, on things like big ESDI disks, dot-matrix printed.) In some formatting tools, you had to manually enter them; formats took a long time, so eliminating retries on bad blocks could save half an hour.
Novell Netware came with its own low-level formatter called `COMPSURF`: COMPrehensive SURFace analysis. Dozens of people would be sharing a server's hard disk, so data losses would be extra-bad -- and might well bring down the server, losing everyone's work.
Note: the assumption in the early days of Novell was that workstations didn't have hard disks of their own and booted off the server too, making a LAN tens of thousands of £/$ cheaper than giving everyone their own HDD.
Running COMPSURF before you installed took hours. Server HDDs were big -- hundreds of megabytes! Scanning all that took ages.
I do a limited form of this with C/C++ projects that take forever to compile, with many conditional breakpoints that alter control flow to my liking that I then "solidify" into actual code when I have it looking like I want.
Before VS 2022, it used to be called edit-and-continue; now they are doubling down on it, improving the use cases that are actually supported, and it got renamed to hot reload.
That's how I learnt x86 assembly using MS-DOS and "debug" - which was a program that came with DOS. The proper assemblers at that time were Microsoft's MASM and Borland's TASM. With no access to those, the only option was to use the one bundled in DOS. Fun times, where you had to compute relative addresses of JMPs, based on the address where the JMP instruction sat. And then, you could even write to a particular cylinder/sector/offset on the hard disk and replace the boot sector with your own code.
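For anyone who never had to do that arithmetic by hand: a short JMP is the opcode EB followed by a signed 8-bit displacement measured from the end of the two-byte instruction, so every jump meant a little subtraction. A worked example (addresses chosen arbitrarily):

```c
#include <stdio.h>

int main(void) {
    unsigned jmp_addr = 0x0100;   /* offset where the JMP instruction sits */
    unsigned target   = 0x0110;   /* offset we want to jump to             */

    /* The CPU adds the displacement to the address of the *next*
       instruction, i.e. jmp_addr + 2 for a two-byte short JMP. */
    int rel8 = (int)target - (int)(jmp_addr + 2);

    printf("JMP to %04X from %04X encodes as: EB %02X\n",
           target, jmp_addr, (unsigned)rel8 & 0xFF);   /* prints EB 0E */
    return 0;
}
```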
Those who enjoy Asmrepl might also enjoy "Cheap EMUlator: lightweight multi-architecture assembly playground" [0]. It supports 32- and 64-bit variants of the Intel, ARM, MIPS and SPARC instruction sets, provides a visual experience, and supports many operating systems.
If you are on Windows and need something in a console, a nice colorful asm REPL is available in WinRepl [1], which is similar to "yrp604/rappel (Linux) and Tyilo/asm_repl".
Not exactly the same, but https://www.endbasic.dev/ tries to achieve precisely that: a REPL with built in graphics for learning purposes, albeit with BASIC instead of asm.
These two lines (`mov ax, 0x13` followed by `int 0x10`) are deeply ingrained in the minds of a whole generation of programmers. They start a 320x200 graphics mode with a 256-color palette, and you can start dumping your pixels into segment 0xA000 right away.
I have yet to find a modern graphics programming environment that is so comfortable and easy to use as this.
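For anyone who never got to play with it, here is roughly the whole thing in C rather than assembly (a sketch, assuming a 16-bit DOS compiler such as Turbo C and something like DOSBox to run it in):

```c
#include <dos.h>     /* int86, MK_FP */
#include <conio.h>   /* getch */

int main(void) {
    union REGS r;
    unsigned char far *vga = (unsigned char far *)MK_FP(0xA000, 0);
    int x, y;

    r.x.ax = 0x0013;             /* INT 10h, AX=0013h: 320x200, 256 colors */
    int86(0x10, &r, &r);

    for (y = 0; y < 200; y++)    /* dump pixels straight into video memory */
        for (x = 0; x < 320; x++)
            vga[y * 320 + x] = (unsigned char)(x ^ y);   /* XOR pattern */

    getch();                     /* wait for a key...                      */
    r.x.ax = 0x0003;             /* ...then back to 80x25 text mode        */
    int86(0x10, &r, &r);
    return 0;
}
```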
Well, you're sort of comparing heavyweight OS graphics stack APIs with old school firmware ones. Even so, things like SDL2 are dead simple: one requests a window region and it's possible to write bytes to the resulting buffer that show up in a window. That said, modern firmware interfaces are still pretty clean. If you write a UEFI hello world, it's possible to access the raw frame buffer with just a connection to the GOP, which is just a couple of lines of code in C. It's conceptually pretty close to what you're describing, except it's designed to work with a slightly more modern programming paradigm.
I wouldn't call SDL2 "dead simple", unless sarcastically. Just opening an empty SDL window requires writing about 20 lines of code that deal with several different abstractions: a "window", a "surface", a "renderer", an "event". I only want an array of pixels that I can edit and see the results in realtime. It is of course possible to do that, but it seems ridiculously overcomplicated.
I was taught as a kid to program simple graphical demos using peek and poke in basic. Then in assembler. In either case, stupid me got colored pixels on the screen after a few minutes of work. Kids these days, how do they start? Please, don't tell me "matplotlib" or I will cry myself to sleep.
It's basically: grab a window, show it, grab a render/draw buffer, update it, and make it visible.
I don't really find the base C version much more complex, although it does have a bit of boilerplate around init/window creation/grab surface/display surface/etc. I'm not sure I would consider that particularly complex. Sure, SDL can get complex when you start trying to use GL/etc., but if all you want is a buffer to write bytes that become pixels, it's pretty straightforward IMHO.
I would say it's roughly the same level of complexity as (if not less than) HTML canvas+JS.
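For concreteness, the "buffer of bytes that become pixels" version looks roughly like this (a sketch with no error handling, assuming the window surface is 32 bits per pixel, which is the common case):

```c
#include <SDL2/SDL.h>

int main(void) {
    SDL_Init(SDL_INIT_VIDEO);
    SDL_Window *win = SDL_CreateWindow("pixels", SDL_WINDOWPOS_CENTERED,
                                       SDL_WINDOWPOS_CENTERED, 320, 200, 0);
    SDL_Surface *surf = SDL_GetWindowSurface(win);

    /* The window surface is just a buffer we write pixels into. */
    Uint32 *pixels = (Uint32 *)surf->pixels;
    for (int y = 0; y < surf->h; y++)
        for (int x = 0; x < surf->w; x++)
            pixels[y * (surf->pitch / 4) + x] =
                SDL_MapRGB(surf->format, (Uint8)x, (Uint8)y, (Uint8)(x ^ y));

    SDL_UpdateWindowSurface(win);   /* make it visible */

    SDL_Event e;                    /* keep the window up until it's closed */
    while (SDL_WaitEvent(&e) && e.type != SDL_QUIT)
        ;

    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}
```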
Wow, this brings back memories of my final project for my "programming for Engineering students" course in the mid-80s.
I wrote a DOS TSR program (remember those?) which would pop up a window when you pressed a key sequence and present you with an ASM86 REPL.
You could selectively 'save' pieces of code, and then when you exited the window, it would paste the saved code as inline assembly code (a hex byte array surrounded by some Turbo Pascal syntax) into your keyboard buffer - the assumption being that you are running the Turbo Pascal IDE, of course.
The TSR itself was written in x86 assembly, which added a level of complexity. I would have given an arm and a leg to be able to do it in a high-level language like Ruby.
I can only take a guess: that for a TSR on an early DOS PC, you really wanted it to be small. TSRs took a significant chunk of your base memory, and as you only got 640 kB of that, you wanted to save as much as you could.
In the later days of DOS, programs grew so big that they wanted all of that 640 kB to themselves. Optional-extra type TSRs went out of fashion and DOS (first DR DOS 5, then MS played copy-cat with MS-DOS 5) gained built-in memory managers to load necessary TSRs (e.g. CD, mouse and keyboard drivers, disk cache, etc.) into Upper Memory Blocks.
UMBs were a 386 thing: you used a 386 memory manager to map any unused bits of the upper memory area in the PC's memory map (i.e. from 640 kB up to 1 MB) as RAM. Anywhere that wasn't being used for ROM or memory-mapped I/O, you could put RAM there and then load TSRs into these little chunks of RAM -- one or two dozen kB each.
Yes, we were that desperate for base memory. It didn't matter if you had 2 or 4 or 16 MB of RAM, DOS could only run programs in the first 1 MB of it, and only freely use the first 640 kB of that first meg. All the rest could only be used for data, disk caches, and other non-executable stuff.
A side-effect of having a 386 memory manager, for real DOS power users, was that fancy 3rd party ones like Quarterdeck QEMM could also offer multitasking. Quarterdeck sold a tool called DESQview that let you run multiple DOS programs side-by-side and switch between them -- radical stuff in the 1980s.
But once you had that, you didn't need TSRs any more.
I fondly remember writing my first game using assembly that I hand typed from a magazine article on an Amiga. It didn't work because of a reversed peek/poke. It took us all day to figure it out, but we got it working!
Apple //e & ][+ had one built in. It was called the "monitor". You typed "CALL -151" and you started typing assembly code. You could run, save, dump memory and read registers. When I got my first 286 I was surprised I couldn't do the same thing.
I didn't know that until about a decade later, unfortunately!
People forget that in 1984 information wasn't a click away.
The problem with owning a Hong Kong-made 286 clone in 1984, and using pirated software, is that it was extremely hard to learn things. I was limited by the books at my local "Waldenbooks" computer section, which was about 20 books. Computer shopper and Byte magazine were kinda helpful, but I learned very, very slowly. It wasn't until I entered college that I started learning rapidly, but the focus wasn't on PCs (it was still MTS mainframes). It took until my first job writing 16-bit drivers that I finally started learning the nuts and bolts of MSDOS.
The Apple ][ with Integer basic had a better one which had a built in mini-assembler. Very fun and useful. It was a real shame it got pushed out by the bloated Microsoft Basic. ;-)
I wrote it because I can never remember what the `test` instruction does to the zero flag. Every time I use the instruction I have to look up the docs. Looking up docs is fine, but running code in a REPL helps me remember things better.
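For the record (and because I keep forgetting it too): `test` ANDs its operands, throws the result away, sets ZF if that result was zero, clears CF and OF, and sets SF and PF from the result. A quick way to see it outside the REPL, assuming x86-64 and GCC/Clang-style inline assembly:

```c
#include <stdint.h>
#include <stdio.h>

/* Run `test rax, rax` with a given value and return the resulting ZF bit. */
static int zf_after_test(uint64_t value) {
    uint64_t rflags;
    __asm__ volatile (
        "test %%rax, %%rax\n\t"   /* AND rax with itself, discard the result */
        "pushfq\n\t"              /* push RFLAGS so we can look at it        */
        "pop %0"
        : "=r"(rflags)
        : "a"(value)
        : "cc");
    return (int)((rflags >> 6) & 1);   /* ZF lives in bit 6 of RFLAGS */
}

int main(void) {
    printf("ZF after `test rax, rax` with rax=0: %d\n", zf_after_test(0));  /* 1 */
    printf("ZF after `test rax, rax` with rax=7: %d\n", zf_after_test(7));  /* 0 */
    return 0;
}
```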
It's a shame that modern debuggers don't have mini-assemblers included like the original Apple II. Having a REPL would be real nice. For one, I wouldn't have to type 90 (NOP) into memory windows to blank out code like non-fatal ASSERTs.
Thanks; my assembly experience was with earlier processors, with a single argument for their test instruction (kind of like calling x86 test with two same arguments). I should have checked what the x86 test instruction does before replying.
I do a lot of program analysis work, and it's occasionally useful to see the pre- and post-machine states of arbitrary instructions. I have my own (more? less?) hacky version of this program that I use for that purpose; I know other people use GEF and similar GDB extensions for similar purposes.
Learning assembly can be a pain, especially without something like gdb (with `layout regs` and `layout asm`).
This is much simpler and doesn't require you to type 4-5 extra commands (start gdb, set a breakpoint, set the layouts, step through the code), thus avoiding the pain that gdb can be for very simple asm programs.
This reminds me of a fun project I once did, writing an x86 assembler in Lotus 123, using lookup tables. On the odd occasion when it worked, it was immensely fulfilling.
This could be implemented with Jupyter notebooks as a Jupyter kernel or maybe with just fancy use of explicitly returned objects that support the (Ruby-like, implicit) IPython.display.display() magic.
More links on how Jupyter kernels, implicit display(), and DAP (the Debug Adapter Protocol) work: "Evcxr: A Rust REPL and Jupyter Kernel" https://news.ycombinator.com/item?id=25923123
Using intrinsics correctly generally requires understanding assembly, because they are supposed to match the assembly you'd want to generate. Just sprinkling them around because you're not familiar with x86 assembly is unlikely to be productive.
A toy project I have in mind is bootstrapping a lisp in asm and then using lisp macros as assembler macros to build up a high level language that would effectively be native code.
Sounds like it'd be cool for the sake of it, but just in case you (or other readers) aren't aware (Edit -- looks like you are very aware ;) SBCL already compiles Lisp code to native code. It's not the same as (asm) macros all the way down, but still. You can even inspect the assembly of a function with the built-in function DISASSEMBLE, and see how it changes with different optimization levels or type declarations or other things. https://pvk.ca/Blog/2014/03/15/sbcl-the-ultimate-assembly-co... is worth a read too for a cool experiment in generating custom assembly for a VM idea.
My understanding of wasm (which could be very wrong) is that it's a stack-based virtual machine (like cpython), rather than a load/store or register/memory ISA.
You could probably visualize the operand stack and opcode sequence, but it wouldn't be quite as "flashy" as x86's state transitions look when visualized here.
It's not emulating x86: it looks like it's assembling instructions on the fly and executing them in a mmap'd region. In other words, it's a very simple JIT.
But you probably can run it on an M1 anyways, since Apple's Rosetta will do the dynamic binary translation for you under the hood. YMMV.
It's a bit more complicated than that. Code is assembled into a shared memory buffer. The application spawns a child process that runs the code in the shared memory buffer. The parent process attaches to the child using ptrace to inspect and manipulate the CPU state and memory of the subprocess.
The app is entirely written in Ruby. So, it might run on Apple M1, but only if you're running an x86 Ruby interpreter through Rosetta.
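A rough C sketch of that pattern (not asmrepl's actual code, which is Ruby; Linux/x86-64, error handling omitted): the parent writes machine code into a shared buffer, the child runs it under ptrace, and the parent reads the child's registers when it stops.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* x86-64 machine code: mov rax, 42 ; int3 */
    unsigned char code[] = { 0x48, 0xc7, 0xc0, 0x2a, 0x00, 0x00, 0x00, 0xcc };

    /* Shared mapping so the parent could keep appending code for later steps. */
    unsigned char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    memcpy(buf, code, sizeof(code));

    pid_t pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);   /* let the parent inspect us */
        ((void (*)(void))buf)();                 /* run the assembled code    */
        _exit(0);
    }

    int status;
    waitpid(pid, &status, 0);                    /* child stopped at the int3 */

    struct user_regs_struct regs;
    ptrace(PTRACE_GETREGS, pid, NULL, &regs);    /* peek at its CPU state     */
    printf("child rax = %llu\n", (unsigned long long)regs.rax);   /* 42 */

    kill(pid, SIGKILL);
    return 0;
}
```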
would it? rosetta is a jit translator isn't it? how would it know to translate the instructions that are being generated on the fly interactively? unless there's hardware support in the m1 for translation or some other interrupt that gets triggered to do translation on the fly...
In general, the way you handle translation of machine code tends to revolve around compiling small dynamic traces (basically, the code from the current instruction pointer to the next branch instruction), with a lot of optimizations on top of that to make very common code patterns much faster than having to jump back to your translation engine every couple of instructions. The interactive generation this article implies is most likely handled using the x86 trap flag (which causes a trap interrupt after every single instruction is executed), which is infrequent enough that it's likely to be fully interpreted instead of using any sort of dynamic trace caching. In the case of x86 being generated by a JIT of some sort, well, you're already looking at code only when it's being jumped to, so whether the code comes from the program, some dynamic library being loaded later, or being generated on the fly doesn't affect its execution.
Rosetta contains both an AOT static binary translator and a JIT dynamic binary translator. That’s how Apple managed to get JS engines working even when the host browser was running as x86-on-M1.
I'd assume Rosetta works for newly marked executable pages by not actually flagging them as executable. When control flow attempts to transfer there, a page fault occurs since the page is not actually executable; this is the trap that allows Rosetta to step in, see what code was about to be executed, write out an ARM equivalent of that code to other memory, and redirect execution to the new ARM code before resuming.
This basic sort of support is needed for any application targeting x86 that uses any form of dynamic code generation, which is probably a whole lot more than most people think (even some forms of dynamic linking use small amounts of generated code, since that is more efficient than calling a method through a pointer to a pointer to the method).
x86 code is never actually marked as executable from the CPU's point of view, since that CPU does not know how to execute x86 code. The pages which contain the translated code are, but those are not something the x86 code knows about.
> x86 code is never actually marked as executable from the CPU's point of view, since that CPU does not know how to execute x86 code. The pages which contain the translated code are, but those are not something the x86 code knows about.
No, pages and the executable bit are something that the processor knows about.
Sorry, I don't understand what you are trying to say. Of course the CPU knows about pages and the executable bit? But there is no executable bit on a page filled with x86 code running on an ARM CPU, because the ARM CPU cannot execute that. It can only execute the translated ARM code that sits somewhere else, essentially out of sight for the x86 program.
The JIT'd ARM code pages are W^X, and that's not optional on macOS ARM. But W^X was opt-in on x86 macOS, so for backwards compatibility Rosetta can't require the x86 code to implement it in order to function.
So your model of how Rosetta works is off - the translation would need to support remapping the original code page read-only regardless of whether the x86 code did so, and letting a subsequent write invalidate the JIT cache of that page, instead of relying solely on the emulated process to implement W^X.
Systems that install new machine code without changing page permissions run an instruction cache barrier after installing and before running. Rosetta catches this instruction.
x86 does not require an explicit icache flush because the hardware keeps the instruction cache coherent with data writes. Rosetta emulates this correctly, which means it must be able to invalidate its code without encountering such an instruction.
The region is RWX, and code is put into it and then executed without a cache flush. This requires careful setup by the runtime, and here's how Rosetta does it, line by line:
1. buffer is created and marked as RW-, since the next thing you do with a RWX buffer is obviously going to be to write code into it.
2. buffer is written to directly, without any traps.
3. The indirect function call is compiled to go through an indirect branch trampoline. It notices that this is a call into a RWX region and creates a native JIT entry for it. buffer is marked as R-X (although it is not actually executed from, the JIT entry is.)
4. The write to buffer traps because the memory is read-only. The Rosetta exception server catches this and maps the memory back RW- and allows the write through.
5. Repeat of step 3. (Amusingly, a fresh JIT entry is allocated even though the code is the same…)
As you can see, this allows for pretty acceptable performance for most JITs that are effectively W^X even if they don't signal their intent specifically to the processor/kernel. The first write to the RWX region "signals" (heh) an intent to do further writes to it, then the indirect branch instrumentation lets the runtime know when it's time to do a translation.
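For reference, the kind of toy program that produces that trace looks something like this (a sketch: plain self-modifying x86-64 code with no mprotect calls and no cache maintenance, which is exactly the case Rosetta has to cope with; the numbered comments match the steps above):

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* x86-64: mov eax, 1 ; ret */
    unsigned char code[] = { 0xb8, 0x01, 0x00, 0x00, 0x00, 0xc3 };

    /* step 1: ask for an RWX buffer (which Rosetta initially maps RW-) */
    unsigned char *buffer = mmap(NULL, 4096,
                                 PROT_READ | PROT_WRITE | PROT_EXEC,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    memcpy(buffer, code, sizeof(code));   /* step 2: plain write, no trap     */

    int (*fn)(void) = (int (*)(void))buffer;
    printf("%d\n", fn());                 /* step 3: indirect call, translate */

    buffer[1] = 0x02;                     /* step 4: write faults, page flips
                                             back to RW-                      */
    printf("%d\n", fn());                 /* step 5: call again, retranslate;
                                             prints 2 this time               */
    return 0;
}
```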
Writing to an address would invalidate all JIT code associated with it, not just code that starts at that address. Lookup is done on the indirect branch, not on write, so a new entry would be generated once execution runs through it.
> How do you think it detects a change to executable memory without a permissions change or a flush?
One way this could be implemented is the one mentioned above: by making sure all x86-executable pages are marked read-only (in the real page tables, not from "the x86 API"). Whenever any code writes into one, the resulting page fault can flush out the existing translation and transparently return to the x86 program, which can proceed to write into the region without taking a write fault (the kernel will actually mark the pages as writable in the page tables now).
When the x86 program then jumps into the modified code, no translation exists anymore, and the resulting page fault from trying to execute can trigger the translation of the newly modified pages. The (real, not-pretend) writable bit is removed from the x86 code pages again.
To the x86 code, the pages still look like they are writable, but in the actual page tables they are not. So the x86 code does not (need to) change the permission of the pages.
I don't know if that's exactly how it is implemented, but it is a way.
How are you disagreeing with me, then? The actual page table entries that the ARM CPU looks at will never mark a page containing x86 code as executable. x86 execution bit semantics are implemented, but on a different layer. From the ARM CPU's POV, the x86 code is always just data.
> The implementation of AMD64 is in software. It knows about page executable bits. The 'x86' code knows about them.
Where did I claim anything else? The thing I claimed the x86 code does not know about is the pages that contain the translated ARM code, which are distinct from the pages that contain the x86 code. The former pages are marked executable in the actual page tables, the latter pages have a software executable bit in the kernel, but are not marked as such in the actual page tables.
> Again, how do you think things like V8 and the JVM work on Rosetta otherwise?
Did I write something confusing that gave the wrong impression? My last answer says: "x86 execution bit semantics are implemented, but on a different layer".
you think that x86 pages are marked executable by the arm processor? probably not.
maybe arm pages with an arm wrapper that calls the jit for big literals filled with x86 code are, or arm pages loaded with stubs that jump into the jit to compile x86 code sitting in data pages are... but if the arm processor cannot execute x86 pages directly, then it wouldn't make a lot of sense for them to be marked executable, would it?
Ah, in this case I took "x86 execution semantics" just as how it behaves from user space, i.e. what permissions you can set and that they behave the same from an x86 observer (no matter what shenanigans is actually going behind the scenes).