I really liked reading the README. It shows that you are passionate about Project Oberon and the book. This in turn motivates me to read it. Thank you for your work.
> The largest chapter of the 1992 edition of this book dealt with the compiler translating Oberon programs into code for the NS32032 processor. This processor is now neither available nor is its architecture recommendable. Instead of writing a new compiler for some other commercially available architecture, I decided to design my own in order to extend the desire for simplicity and regularity to the hardware. The ultimate benefit of this decision is not only that the software, but also the hardware of the Oberon System is described completely and rigorously. The processor is called RISC. The hardware modules are described exclusively in the language Verilog.
I wonder how it compares to Nand To Tetris. Other than that, it seems really interesting. Has anyone read it?
As another user said, it's not oriented towards a beginner. You would want some, maybe most, of the background from Nand2Tetris first. Take chapter 16, about the RISC processor mentioned above, for example: it just throws you into the deep end with the Verilog code. Here is the CPU's interface to the system bus, here are the registers, here is how the multiplier unit works.
The good part is the commentary about the design decisions and trade-offs. That is invaluable, because it is the wisdom of Niklaus Wirth, towards the end of his career, drawing on a lifetime of experience. He was one of those rare polymaths with a broad and deep understanding of both the circuitry and the more abstract parts of CS. He always generalized and tried to understand the principle, and in the book he places things in their historical context and explains how they developed. Because it's Wirth, the history lesson is often based on personal experience. It produces a good synthesis, in my opinion. For example:
> The second [interface] (MouseX) is included here for historical reasons. It was used by the computer Lilith in 1979, and used the same Mouse as its ancestor Alto (at PARC, 1975). It is distinguished by a very simple hardware without its own microprocessor, which is currently contained in most mice. This goes at a cost of a 9-wire cable. But today, microprocessors are cheaper than cables. We include this interface here, because it allows for a simple explanation of the principle of pointing devices.
Nand2Tetris is the perfect introductory project for this field, spanning hardware, OS, and compiler design. That said, I completed the project and found it lacking in depth; it doesn't investigate any of the topics thoroughly. But that is also what makes it the perfect introductory project.
If you don't like it (probably due to the lack of documentation), at that stage you can also design your own projects: basically a CPU sub-project using Verilog or any other HDL, which then leads into an OS and compiler project.
the wirth risc processor is immensely simpler to describe and program than the tecs/nand2tetris processor, which borders on unusable. i've gone through the process of 'designing' the nand2tetris processor on nandgame, and i'm pretty sure the nand2tetris processor is simpler to wire up from gates. but the wirth risc processor is a lot easier to get running on an fpga or, i bet, to simulate with verilator, because it uses a real hdl instead of something that someone who's never designed hardware thinks an hdl might look like. probably the nand2tetris processor would require less code in verilog if you coded it up; it's quite similar to chuck thacker's 'a tiny computer for teaching' https://www.cl.cam.ac.uk/teaching/1112/ECAD+Arch/files/Thack... which is, like the nand2tetris processor, based on the data general ('dirty genitals') nova architecture
nand2tetris will get you from nand gates to tetris and to bytecode interpreters. but oberon will get you from synthesizable verilog (which can be easily converted into nand gates but almost never is) to a fully usable gui operating system that can recompile its own source code. sadly it cannot resynthesize its own fpga bitstream because you cannot run vivado on it (though i see 71bae0447c737f454371dcf3b84fc62c says below it can at least simulate its own hardware)
a thing they both have in common is the lack of a usable name for the processor architecture
IIUC, in its 1990s incarnation Wirth was able to get the bitstream formats, but this century everything has been closed off, so now it transpiles to Verilog.
The idea of an educational computer has been bugging me for the last 5 years.
If you consider modern hardware and OSes, it is practically impossible to have a machine simple enough that you could teach the young generation with it. Fantasy consoles like PICO-8 are good options for programming, but not for understanding the hardware underneath. That is why you still have schools that use old architectures for teaching.
A minimal RISC-V implementation is quite simple. There is a RISC-V implementation of xv6, though that’ll require slightly more than the absolute minimum RISC-V implementation: specifically, you’ll need CSRs, M, S and U modes, and paging.
If you don’t care about paged memory you can make do with just M and U mode. I have a small RTOS that targets some of the WCH microcontrollers with that configuration. It does use the PMP, but even that isn’t really necessary.
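For illustration, here is a minimal sketch of that M/U configuration (nothing here is taken from that RTOS; the CSR names and bit layouts come from the RISC-V privileged spec): one top-of-range PMP entry that grants U-mode access to a single RAM region, written as M-mode C with inline assembly.

    /* A sketch, not production code: configure PMP entry 1 as a TOR
       (top-of-range) region [base, top) with R/W/X for U-mode.
       Entry 0 stays OFF and only supplies the lower bound for entry 1. */
    #include <stdint.h>

    #define PMP_R   0x01u   /* readable   */
    #define PMP_W   0x02u   /* writable   */
    #define PMP_X   0x04u   /* executable */
    #define PMP_TOR 0x08u   /* address matching: top-of-range */

    void pmp_allow_user_region(uintptr_t base, uintptr_t top)
    {
        /* pmpaddrN registers hold the address shifted right by two bits */
        asm volatile ("csrw pmpaddr0, %0" :: "r"(base >> 2));
        asm volatile ("csrw pmpaddr1, %0" :: "r"(top  >> 2));

        /* entry 1's configuration byte lives in bits 15:8 of pmpcfg0 */
        uint32_t cfg = (PMP_TOR | PMP_R | PMP_W | PMP_X) << 8;
        asm volatile ("csrw pmpcfg0, %0" :: "r"(cfg));
    }

Because U-mode accesses that match no implemented PMP entry are denied, this single entry is essentially the whole protection story, and M-mode itself is unaffected unless the lock bit is set.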
> Selfie is a self-contained 64-bit, 12KLOC C implementation of: (...) a tiny (...) subset of C called C Star (C*) (...) to a tiny (...) subset of RISC-V called RISC-U[;] a[n] (...) emulator (...) that executes RISC-U code[;] (...) a (...) hypervisor (...) that provides RISC-U virtual machines
so they have a self-hosted instruction set architecture, compiler, and operating system, though the operating system is much simpler than xv6. because the instruction set is a subset of risc-v you can run its code on actual risc-v hardware (or qemu-system-riscv), but presumably you could also design risc-u hardware in verilog that was simpler than a full implementation of rv64i with whatever extensions the hypervisor needs
or maybe a better question is: why care about it? as far as I've understood, paged memory is a legacy from a time when cheap and fast memory wasn't a thing
how feasible is it to get rid of memory pages? I guess the hardest thing would be untangling interprocess memory safety from pages??
The answer is memory fragmentation. Process memory safety can theoretically be done with a different scheme such as hardware memory capabilities. You could probably even do demand paging if the memory capability is not a true pointer, but some sort of memory-system-coherent handle. But as soon as a process wants to allocate a 1 GB chunk and you have 2 GB free, but only in 64 KB chunks, you have a problem. You could go around copying data to compact the physical memory, but now you have serious interference problems.
You then run into the next problem of using, say, 32 MB in a "hot loop" in the middle of a 64 GB demand-loaded data structure on a 32 GB machine. You cannot greedily load the entire data structure from disk, so you need some sort of subset feature on your memory handles. But then what do you do about using two disjoint sections separated by over 32 GB? You need some way of having multiple subsets that correspond to physical addresses that do not respect the handle offset. Subset 1 corresponds to physical address range A and subset 2 corresponds to an uncorrelated physical address range B. Congratulations, you have reinvented memory mapping with extra steps.
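To make that concrete, here is a hypothetical sketch of such a handle-with-subsets scheme (the names and layout are invented for illustration); translating a handle-relative offset into a physical address ends up being exactly the job of a page-table walk, just with variable-sized entries.

    /* A hypothetical "memory handle with subsets": once each resident
       subset needs its own physical range, translating an offset is a
       (variable-granularity) page-table walk by another name. */
    #include <stddef.h>
    #include <stdint.h>

    struct subset {
        uint64_t handle_offset;   /* where this piece sits inside the handle   */
        uint64_t length;          /* how much of it is resident                */
        uint64_t phys_base;       /* uncorrelated physical address it maps to  */
    };

    struct mem_handle {
        uint64_t size;            /* full logical size, e.g. 64 GB             */
        size_t   nsubsets;
        struct subset subsets[];  /* resident pieces, e.g. two 32 MB hot ranges */
    };

    /* Returns UINT64_MAX when the offset is not resident, i.e. "fault
       and demand-load it", which is the other half of paging. */
    uint64_t translate(const struct mem_handle *h, uint64_t off)
    {
        for (size_t i = 0; i < h->nsubsets; i++) {
            const struct subset *s = &h->subsets[i];
            if (off >= s->handle_offset && off < s->handle_offset + s->length)
                return s->phys_base + (off - s->handle_offset);
        }
        return UINT64_MAX;
    }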
Project Oberon does not use paged memory. For the most part memory defragmentation is done by a relatively simple system-wide garbage collector (by copying live objects to one end of the heap).
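To illustrate the idea, here is a generic sliding-compaction sketch in C (not the actual Oberon collector, which is written in Oberon and organized differently): after a mark phase, one pass computes each live object's forwarding address and a second pass slides it toward the low end of the heap; the pointer-fixup pass is elided because it needs type information.

    /* Sliding ("Lisp-2" style) compaction sketch; assumes every object
       starts with this header and that object sizes keep things aligned. */
    #include <stdalign.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    typedef struct Obj {
        size_t size;        /* total size in bytes, header included */
        int    marked;      /* set by the (elided) mark phase       */
        struct Obj *fwd;    /* new address, filled in by pass 1     */
    } Obj;

    static alignas(max_align_t) unsigned char heap[1 << 20];
    static size_t heap_used;                /* bytes allocated so far */

    static Obj *alloc(size_t payload)       /* trivial bump allocator */
    {
        Obj *o = (Obj *)(heap + heap_used);
        o->size = sizeof(Obj) + payload;
        o->marked = 0;
        o->fwd = NULL;
        heap_used += o->size;
        return o;
    }

    static void compute_forwarding(void)    /* pass 1 */
    {
        size_t scan = 0, dest = 0;
        while (scan < heap_used) {
            Obj *o = (Obj *)(heap + scan);
            if (o->marked) {
                o->fwd = (Obj *)(heap + dest);
                dest += o->size;
            }
            scan += o->size;
        }
    }

    /* Pass 2 would rewrite every root and pointer field to point at
       o->fwd; it needs type information, so it is left out here. */

    static void relocate(void)              /* pass 3 */
    {
        size_t scan = 0, new_used = 0;
        while (scan < heap_used) {
            Obj *o = (Obj *)(heap + scan);
            size_t sz = o->size;
            if (o->marked) {
                Obj *dst = o->fwd;
                memmove(dst, o, sz);        /* slide toward the low end */
                dst->marked = 0;
                new_used = (size_t)((unsigned char *)dst - heap) + sz;
            }
            scan += sz;
        }
        heap_used = new_used;
    }

    int main(void)
    {
        Obj *garbage = alloc(64);
        Obj *live    = alloc(64);
        (void)garbage;
        live->marked = 1;                   /* pretend the marker found it */
        compute_forwarding();
        relocate();
        printf("heap in use after compaction: %zu bytes\n", heap_used);
        return 0;
    }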
adding an mmu to a basic rv32e processor might double its size and power consumption, and more than double the verification effort; moreover, if you're targeting applications where deterministic execution time or even worst-case execution time (wcet) is a concern, the mmu is a very likely source of nondeterminism
so it's not so much that you don't care about it as that you might not be able to afford it. 'cheap and fast' depends on what scale of machine you're talking about; a 1¢ computer (not yet available) can afford less than a 10¢ computer like the pms150c or (rv32e) ch32v003, which can afford less than a 1-dollar computer like the stm32f103c8t6 or (rv32i) gd32vf103, which can afford less than a 10-dollar computer like a raspberry pi, which can afford less than a 100-dollar computer like a cellphone, which can afford less than a 1000-dollar computer like a gaming rig, which can afford less than a 10-kilobuck computer like a largish cpu server, which can afford less than a 100-kilobuck computer like a petabox
unix originally ran on the pdp-11, the relevant models of which had interprocess memory safety in the form of segmentation, but no paging. i've never used a pdp-11. adding paging (for example, on the vax, the sun-1, and the i386) enabled a variety of new unix features:
- as you point out, it enables a process to be larger than physical memory;
- fork() became immensely faster because it didn't have to copy all of process memory, just the page table, and mark the existing pages copy-on-write;
- execve() became immensely faster for a similar reason: it could 'demand-page' the program into memory as you executed parts of it, instead of waiting to start executing it until the whole thing had been loaded from disk;
- shared libraries became possible, so that executable code used by many programs at once could exist as only a single copy in memory (though you could do this without paging if all the processes share the same address space, perhaps with different permissions imposed by an mpu — this wasn't considered an option for unix in part because it would involve either giving up fork() or only having one process in memory at a time);
- similarly, it became possible for processes to communicate through shared memory buffers, which is commonly used to get images onto the screen quickly;
- it became possible to memory-map files, like on multics, so you can access data in them without copying it, which normally takes about twice as long as accessing it;
- it became possible for user programs to use the paging hardware to implement the write barriers for their garbage collectors by using mprotect(), though that's never been a very popular thing to do because sigsegv handlers are slow and usually nonportable (a minimal sketch of the trick follows this list);
- and, as veserv pointed out, it eased fragmentation.
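to make the mprotect() item concrete, here's a minimal posix sketch of the trick (linux-flavored; the heap size and dirty-page bookkeeping are invented, and error checking is omitted): protect the heap read-only when a collection cycle starts, catch the first write to each page in a sigsegv handler, record the page as dirty, and unprotect it so the write can proceed.

    /* sketch of an mprotect()-based write barrier: dirty-page tracking
       for a hypothetical collector; error handling omitted for brevity */
    #define _GNU_SOURCE             /* for MAP_ANONYMOUS on glibc */
    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define HEAP_PAGES 256

    static unsigned char *heap;
    static long page_size;
    static int dirty[HEAP_PAGES];   /* pages written since the barrier was armed */

    static void on_segv(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        uintptr_t addr = (uintptr_t)si->si_addr;
        uintptr_t base = (uintptr_t)heap;
        if (addr < base || addr >= base + (uintptr_t)HEAP_PAGES * (uintptr_t)page_size)
            abort();                /* a genuine crash, not our barrier */
        size_t page = (size_t)(addr - base) / (size_t)page_size;
        dirty[page] = 1;            /* remember it for the collector */
        mprotect(heap + page * (size_t)page_size, (size_t)page_size,
                 PROT_READ | PROT_WRITE);   /* let the write proceed on retry */
    }

    int main(void)
    {
        page_size = sysconf(_SC_PAGESIZE);
        heap = mmap(NULL, (size_t)HEAP_PAGES * (size_t)page_size,
                    PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        struct sigaction sa = {0};
        sa.sa_sigaction = on_segv;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        /* "start a collection cycle": arm the barrier over the whole heap */
        mprotect(heap, (size_t)HEAP_PAGES * (size_t)page_size, PROT_READ);

        heap[5 * page_size + 16] = 42;   /* faults once, gets logged, then succeeds */
        printf("page 5 dirty: %d\n", dirty[5]);
        return 0;
    }

strictly speaking mprotect() isn't on the async-signal-safe list, which is part of why the approach stays unpopular, but for a synchronous fault like this it works in practice on the usual platforms.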
non-unix systems used paging for a variety of even more creative purposes:
- efficient system image checkpointing as in keykos or eumel, by way of atomically marking all pages on the system copy-on-write and then streaming out the dirty ones to disk, so you never had to reboot; after a power failure or system crash, all the same programs would be running in the same state as at the last checkpoint. qemu can do this too, i think
- distributed single-address-space oses, where memory pages migrate around a cluster over a network according to where they're being accessed, so every program on the cluster is running in a single shared 64-bit address space; this didn't turn out to be as useful as it sounds
- insert your mindblowing creative idea here
anyway it's totally possible to implement memory protection without paging, and lots of computers have, past (with segmentation) and present (with mpus). but paging gives you a lot more than just memory protection
Vintage 8-bit hardware is extremely comprehensible, and it teaches you fundamentals which absolutely still apply today. Ben Eater's YouTube videos are fantastic for this, both his 6502 project and his homebrew "from scratch" breadboard computer.
I think that somehow a computer that can learn to talk to other computers would be the pinnacle solution
This is within reach now with LLMs. The remaining challenge would be connecting the computers in the first place (a hardware compatibility issue), but the software should be able to figure the other software out somehow.
Back when I took compilers in college we wrote a compiler for Oberon. I couldn’t quite find my original class site, but this one seems roughly right (from a few years before I took the class): https://cseweb.ucsd.edu/~wgg/CSE131B/
The original 1992 book is longer. As mentioned earlier, the code generation chapter is smaller in the new edition because the RISC CPU is much simpler than the original processor.
But the original also, I think, has much more source code. I don’t know if the original book contains the entirety of the system, but the new one seems to contain more highlights of interfaces and selected examples. It likely relies on the ubiquity of source availability on the internet, which clearly was not the case in 1992.