Just like you read the classics in literature, I think it would be interesting to read some of the classic codebases. They should fit the following criteria:
- The project was highly successful or influential
- Freely available source code
- Ideally the bulk of the project was written by one person
So far my list includes Linux, the John Carmack id releases (wolf3d, DOOM, Quake), SQLite, and vim. Any others I'm missing?
Those aren’t necessarily written as people would do it today, but Knuth’s literate sources for TeX[1] and METAFONT[2] are explicitly meant for reading. The literate program family also includes LCC[3], but A retargetable C compiler is a book that you’d need to buy; and PBRT[4], but Physically based rendering is more exposition than program (even if the program is a perfectly good one). The source for Unix V6[5] with the accompanying commentary by Lions is probably as much of a classic as it’s possible to get. And as an eccentric choice in a similar format, may I suggest cmForth[6], perhaps paired with Footsteps in an empty valley[7]?
Also, though this is not precisely what you’re asking for, The architecture of open-source applications and its sequels[8] have the original designers’ reflections on some well-known codebases.
I can also recommend Niklaus Wirth's "Project Oberon" - not quite literate programming, but extremely well explained code examples and data structures of the Oberon OS/language/UI system: http://www.projectoberon.net
Wirth's "Compiler Construction" is very concise and written in a similar way: https://people.inf.ethz.ch/wirth/CompilerConstruction/ - he manages to describe a complete compiler for an Oberon subset (earlier versions of the book used PASCAL) in a bit more than 100 pages.
I used to read the OBS classic code base before it was replaced by OBS Studio. The original OBS was monumental when it came out. Yes, it was a Windows-only code base, with the GUI coded in straight Win32, and didn’t support the extensive array of sources that OBS Studio now does. But it opened instantly, and had the best D3D and OpenGL capture performance available. To my mind OBS Classic single-handedly boosted Twitch streaming to the stratosphere. This happened despite existing and new competition in live streaming apps (primarily Wirecast and XSplit). OBS Classic was just that good for its time.
OBS Classic even enabled shared texture capture of Direct3D9 contexts, via a hack to d3d9.dll which had to be maintained for every new major Windows update that came out.
OBS Studio carries the torch with a much cleaner cross-platform core library written in C, and better performance and stability, which itself is an achievement.
Since OBS Classic is now gone, I'd recommend anyone interested in programming video, audio, and cross-platform native apps to read through the current OBS Studio code base on GitHub.
> I find it interesting that originally each command was a separate executable.
That's actually still the case. `git` is just a wrapper that, when called as `git <command>`, looks for `git-<command>` under `libexec/git-core/`.
You can add any command named `git-foo` to your PATH and then run it with `git foo`. For example, I have `git-gsr`, a simple shell script that does a global search and replace in a git repo. I run it as `git gsr <old> <new>`.
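As a sketch, a hypothetical `git-gsr` along those lines might look like this (my reconstruction, not the parent's actual script, and it naively assumes the patterns contain no sed metacharacters or slashes):

    #!/bin/sh
    # git-gsr: global search and replace across tracked files in a repo.
    # Put this in PATH as "git-gsr"; git then dispatches "git gsr" to it.
    set -e
    old=$1
    new=$2
    # git grep -l lists the tracked files containing the pattern;
    # sed -i then rewrites each of them in place.
    git grep -l -- "$old" | while IFS= read -r f; do
        sed -i "s/$old/$new/g" "$f"
    done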
I am not a programmer (yet). I am an OpenBSD fanboy. I've read stories of people reading the code and raving about the quality. It ticks two of the three checkboxes you mentioned (it's not just one person). Does this qualify?
I needed to learn how a very specific OS thing worked, and I read the code from Linux, OpenBSD, L4, Hurd, Minix, and a few other projects.
The OpenBSD kernel code was the easiest to follow because it favored simplicity over all other things (improved security was just a byproduct).
For the record, the microkernels were the hardest to follow despite having tons of books and a few experts nearby. But that's a whole different discussion.
Yes, I think so. Being written by one person isn't a hard requirement, it's just that I feel like you get a better sense of programming style and someone's approach to problem solving when you read code that hasn't been touched by too many people.
Projects with a maintainer who strictly enforces code style and quality would still fit the description. From what I've heard, OpenBSD falls under this category. I'll add it to my list.
If people are interested in looking at Unix kernel implementations: the OpenSolaris code is out there, too. Now, I haven't looked at it myself (not a kernel hacker) and it's more like the exact opposite of a one-person effort, but I've heard praise about its alleged code quality more than once. So it might be educative (as far as the design of a commercial Unix kernel goes).
Stockfish is a good one, though the original codebase was called Glaurung or something like that. Written mostly by one person or a small handful (the original, anyway), I think. Obviously successful, wildly influential in its niche, and the code is open source.
Though honestly, if you were going to read a chess engine, you might as well read modern Stockfish. You'll learn a lot about squeezing out every last CPU cycle and byte of memory, as well as parallelism in a non-"embarrassingly parallel" problem domain. I recommend starting with the transposition table (ttable.cpp), since it's very self-contained.
Yeah, chess engines are a good call. They're often relatively small and self-contained, and feature a lot of good programming tricks and optimizations. I have read some of the Stockfish code, but if I were going for something more "classic" I might try to dig into GNU Chess, Fruit, or Crafty.
While I was reading Julia Evans's "Behind Hello World on Linux"[0], I had some fun digging up the zsh[1] source code. Granted, I needed some help from Cody AI[2], and I even wrote some notes from my learning.
https://blog.yelinaung.com/posts/behind-hello-world-on-linux...
eval[0]. Originally written by John McCarthy to define LISP, it has had massive influence over the implementation of all interpreted and dynamic languages.
McCarthy described EVAL as a theoretical entity; it was Steve Russell who said he would code it. McCarthy was surprised and said that Russell was misunderstanding it; it was just a theoretical description, not code.
But Russell basically hand-translated the calls into assembly code and got an interpreter out of it.
An interpreter written in Lisp cannot be executed if you don't already have an existing interpreter to run it.
The key insight that we can describe the interpreter in Lisp and then somehow compile it into code belongs to Russell. Eventually it became possible to do that by machine: when you have a Lisp compiler, the Lisp-in-Lisp interpreter can be compiled by machine rather than by hand.
Assorted members of the Unix and BSD family would probably qualify - Research Unix (at least v6, but maybe others) and 4.4BSD-Lite come to mind. Maybe Minix.
Obvious reply, but don't forget e.g. Andrew Tanenbaum's Minix - some code was made exactly for educational purposes. (There should be a category for that: classic codebases for production and classic codebases for education.)
Having hacked on that codebase (the heirloom-vi project has since replaced my personal ex-vi fork for my own use), I can say the heavy use of globals and assorted trickery means it's not the easiest thing to read.
For a simple but really rather elegant piece of code, I very much enjoyed reading the sources to abduco a while back.
I feel like we can be more (or less) flexible about the 'impact' and 'single author' criteria, but we _definitely_ need to be able to see the source :)
> counter = (counter >> 1) + priority ... Note that counter >> 1 is a faster way to divide by 2 ... This means that if a task is waiting for I/O for a long time, and its priority is higher than 2, counter value will monotonically increase when counter is updated.
Actually, the effect is that counter will exponentially decay toward 2 * priority (if the task is not runnable). You can sanity-check this by looking at the limit: if counter is 2*priority, then that expression will leave it at its current value.
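Spelled out: ignoring the integer truncation of the shift, the update is the linear recurrence

    c_{n+1} = \frac{c_n}{2} + p \quad\Longrightarrow\quad c_n = 2p + (c_0 - 2p)\,2^{-n}

so the distance from 2*priority halves on every update, whether counter starts above or below that value.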
When Red Hat IPOed (or soon after? quarterly report? I can't remember) they sent out a poster with the 0.01 source code to their shareholders. It fits and is fun to read from time to time.
Par for the course, honestly. Obviously major C compilers are only becoming more strict about acceptable code. For instance, when -fno-common was made the default in GCC 10, it ended up breaking a rather incredible number of packages.
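To make that concrete, here's a tiny, hypothetical reproduction of that breakage (the classic tentative-definition pattern that -fcommon used to paper over):

    # Two translation units both tentatively define the same global.
    cat > a.c <<'EOF'
    int counter;                       /* tentative definition */
    int main(void) { return counter; }
    EOF
    cat > b.c <<'EOF'
    int counter;                       /* second tentative definition */
    EOF

    gcc -fcommon    a.c b.c -o demo    # pre-GCC-10 default: linker merges the symbols
    gcc -fno-common a.c b.c -o demo    # GCC 10+ default: "multiple definition" error

Packages that relied on the old merging behavior had to add extern declarations or build with -fcommon.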
GCC supports a plethora of ISO standards, language extensions, pragmas, etc. It simply isn’t tractable to guarantee support for every behavior or language construct which GCC has tacitly supported. Compilers like GCC may go above and beyond the ISO C standards, but they aren’t in the business of supporting every C dialect possible.
Years of language-lawyering really refined our own understanding of what is and isn't correct code. A lot of older code is so full of UB that it can barely be said to be valid.
That's beside the point. The latest release of, let's say, Linux has bugs in it (statistically, unless it's the first perfect bug-free release ever). Would it be reasonable for a compiler to refuse to compile it? The compiler's job isn't to refuse to compile bad code if it could compile it, only to warn the user and try to help fix it if possible.
The problem isn't that the compiler chooses not to compile something if it deems it broken. The problem is that the broken code's behaviour is coupled to the compiler of the time.
The behaviour you want isn't "compile this version of Linux even though it has a bug", it's "compile this version of Linux as if you were a compiler at the time which accepted this broken code in this way", which is a very tall order, along the lines of bug-for-bug compatibility while also fixing the bug.
GCC 2.95 was a tremendously sticky release. The 3.x series was notorious for compiler bugs for a long time so many people stayed with 2.95 for years. I think a good number of people ended up skipping the entire 3.x release.
I vaguely recall that when I was building LFS some time in the past, gcc 2.95 was recommended over the 3.x series for some reason. So I looked it up and found a line about it in an old version of the book:
> This is an older release of GCC which we are going to install for the purpose of compiling the Linux kernel in Chapter 8. This version is recommended by the kernel developers when you need absolute stability. Later versions of GCC have not received as much testing for Linux kernel compilation. Using a later version is likely to work, however, we recommend adhering to the kernel developer's advice and using the version here to compile your kernel.
I wish I had a better reference. The bit about "compile this version of Linux as if you were a compiler at the time which accepted this broken code in this way" made me think of that.
I posted an Ask HN about a "Linux internals" book (look up "Windows Internals"). I would be glad to contribute to crowdfunding such an effort, not strictly for the kernel but for every major userspace subsystem as well.
I don't know if what's shown is the actual source code, but I notice spaces were used instead of tabs. Maybe Torvalds became a tab advocate only further down the line.
I actually went ahead and asked Linus Torvalds himself. He said he'd never use spaces, and always used tabs from the get go. So, my theory fails :)
> What? No.
> It always used tabs.
> And by "always used tabs" I mean "mistakes probably happened, and there may be spaces in some places", but afaik I've always used hard-tabs as the normal indentation.
> If you see something else, you probably have some archive that has been reformatted.
I feel like looking at the first working versions of a big successful project is a great way to understand how it works.
Usually it will only contain the most important core features without a lot of abstractions/generalizations. So it is actually manageable to read through all of the code in a couple of days.
I don’t know about this. A lot of the success of a project has to do with how they react to changing conditions and the needs of their users. Moreover I would say that most projects owe a great deal of their success to the nontechnical factors, e.g. licensing or other social factors.
Reading the initial working versions of a successful project will inform you on how that project used to work. In a sense it’s akin to reading a working draft of a great piece of literature without reading the final product.
Well, 1993 or 94, I had Linux, X11, fvwm(2?) and emacs running in 5MB. To be fair it was not very fast, and one later and extremely expensive £600 upgrade to 16MB made it perform much better.
Wow, an Atom N270? I'm envious, I've always wanted to find a fun way to make use of old hardware like that. I have a Google Cr-48 chromebook with a similar CPU and 2GB of RAM. I'd be interested in learning more about your setup to see if I could get something similar going.
I have an EEE 1005HA laptop with an N270, 2 GB RAM and an old SSD. I'm running the latest Debian (it is the last version to support x86-32) with Xfce, and I'm using it to read books in fbreader and Okular, play podcasts and SD videos with MPV, and read IRC (via SSH). Actually it is completely usable except for web browsers -- even simple pages which have not changed much since the laptop was new (such as this site or Wikipedia) are pretty slow in modern Chromium/Firefox; "modern" webpages are almost unusable. You can use links2 -g (in graphics mode) or Dillo, they are fast, but the usability is lacking.
- Install ZRAM, set 1/4 of your RAM as compressed ZRAM (a minimal setup sketch follows this list)
- Use Luakit, set hardware-acceleration-policy to either always or never, just try.
- Setting an Android 4.x User Agent in Luakit will help, too. Often pages sent for phones/tablets are much lighter.
- git clone https://github.com/stevenblack/hosts ; cd hosts ; sed -n '/Start Steven/,$p' < hosts > hosts.new, then append that hosts.new file to your current /etc/hosts file.
- MuPDF is far lighter than Okular.
- Fluxbox + Rox + Zukitre GTK2/3 and Fluxbox theme can be far lighter than the whole DE. Ping me back for an easy setup.
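For the ZRAM item above, a minimal sketch of the manual sysfs route, run as root (the 512M figure is an assumption for a 2 GB machine, and lz4 assumes kernel support; many distros also package this, e.g. as zram-tools):

    modprobe zram
    echo lz4  > /sys/block/zram0/comp_algorithm   # set the algorithm before sizing
    echo 512M > /sys/block/zram0/disksize         # ~1/4 of 2 GB RAM
    mkswap /dev/zram0
    swapon -p 100 /dev/zram0                      # prefer it over any disk swap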
In like 1997 I accidentally removed the wrong memory stick and booted up my 486dx2 with only 4MB of RAM. I heard the hard drive seek more or less constantly. But X started and some xterms. Never tried to use it, just wanted to shut it down cleanly. This was a FreeBSD 2.x system IIRC.
I believe the traditional resource is the Dinosaur book (Operating System Concepts by Silberschatz). I've also seen Operating Systems: Three Easy Pieces https://pages.cs.wisc.edu/~remzi/OSTEP/ come up.
Another common recommendation is to find a hackable existing OS/kernel. You already have a working OS, but you can inspect and rewrite portions without having to do everything yourself upfront. http://www.minix3.org/ seems to be a popular one. Various BSDs seem to be mentioned, as well.
Thanks. I just flipped through a lot of the Dinosaur book (10th ed); it's not hands-on, though.
Back in undergrad in the early 90s, our OS course had us spend the semester building a crappy x86 OS which could boot, run a couple of processes with basic I/O, and do simple signals and memory management. Even though it had nearly zero concepts relevant to modern OSes, it was so much fun. It didn't set me up in any way to be a serious OS designer, but it was the essence of programming a CPU and some devices, with nothing in between us. I'd love to experience that again. But I don't know where to find resources for the lowest levels: booting into a raw CPU and connecting directly to devices (including display and keyboard). To be sure, I can find plenty of OS texts on advanced memory and process management schemes and so on, but what about the lowest level?
After I started using Linux in 1994 I wanted to write my own OS that was entirely GUI based. I used this book to learn the fundamentals and it really would help explain a lot of the Linux code, and would also teach you how to even get your kernel loaded and booted, which is the hardest bit for a new OS:
No. CPUs need to be designed to tolerate busy loops without damaging themselves (with appropriate cooling). CPUs of that era did not measure their own temperature, but they also weren't trying to squeeze as much juice through as the later Pentium 4s that ran very hot. Modern CPUs are self-regulating and try very hard to avoid damaging themselves even with inadequate cooling.
I'm assuming the "halt and catch fire" thing is only really possible on pre-microprocessor machines built of discrete components, where the individual parts are small enough to overheat by themselves if driven wrong.
I'd guess the physical CPU package of an i386 would cope just fine with the few (hundreds?) of gates toggling in that small loop.
I wonder if it might be possible to do on a modern FPGA, if you artificially create some unstable circuit and deliberately pack it in a corner of the die?
There's probably some AVX512 concoction that would be the closest equivalent on a modern X64. There's probably an easy experiment -- if that concoction makes CPU freq drop and also makes whole-package thermals drop, it /could/ be due to localized die heating.
No. Intel CPUs, even way before Linux, were microcoded, so you're still using the full instruction fetch, decode, and microcode system for every step in your infinite for loop. You aren't wearing out the CPU any more than running any other code would.
None of the instructions likely to be emitted by that loop will be microcoded, and the instruction will always be fetched from L1 cache. That said, this won’t be an issue simply because CPUs are designed and tested to be able to handle hot loops.
I remember Win95 didn't use the hlt instruction in the idle thread; it just did the same as Linux. Power management wasn't a thing back then. I think ACPI and hlt came with WinNT only.
>> All x86 processors from the 8086 onward had the HLT instruction, but it was not used by MS-DOS prior to 6.0[2] and was not specifically designed to reduce power consumption until the release of the Intel DX4 processor in 1994. MS-DOS 6.0 provided a POWER.EXE that could be installed in CONFIG.SYS and in Microsoft's tests it saved 5%.[3]
I stand corrected; I was under the impression that the original Pentium was the first architecture that had HLT, but maybe that was just the first architecture I ran Rain on, since it had benefits there (having run Win95 on a 586, but never DOS on a 486 laptop).
Even single-tasked systems like MS-DOS still had interrupts. You could HLT the processor and a keyboard interrupt could wake it straight back up and resume execution anywhere in the MS-DOS kernel. It's just that the typical TDP of a CPU back then was a couple of watts, so there was literally no point in HLTing instead of busy-waiting, and nobody bothered.
Every x86 has had halt. Win95 was just not using it, even though you could write a 10-line program that would get context-switched in when idle and halt it. It was one of my first programs as a child, on a 486 DX2-66.
I just had ChatGPT generate said program, and I think it's very similar to what I wrote. I'm unsure if it ever did anything, but I've always been interested in efficiency:
    #include <stdio.h>
    #include <windows.h>

    int main(void) {
        printf("Setting process priority to low...\n");
        SetPriorityClass(GetCurrentProcess(), IDLE_PRIORITY_CLASS);

        printf("Halting the processor when no other programs are running...\n");
        while (1) {
            /* MSVC 32-bit inline-assembly syntax. Note HLT is a privileged
               instruction, so user-mode code will normally trap here rather
               than actually halt the CPU -- which may be why it never
               visibly did anything. */
            __asm {
                hlt
            }
        }
    }
The CPU would probably run cooler since it's not doing anything. Most of the circuit would be static, not flipping from 0->1 or 1->0 which is what tends to expend the most power.
Nope. They're built for it. Typically now on x86 at least you'd do a CLI then for(;;)HLT. That'd park the CPU unless a non-maskable interrupt was latched.
So, modern Linux is 4500x the lines of code of Linux v0.01. Does 36M lines of code in modern Linux mean that the complexity demon is winning? Or do we really, actually need that many?
I would love to see a graphical breakdown of how many lines of code, how many functions, etc are used for each major software module. Are there existing tools that do this?
You can use plain old 'wc' to count lines. Most of the Linux kernel codebase is drivers; the second most is architecture-specific code.
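For a rough per-directory breakdown with nothing but standard tools, a sketch like this works (it assumes a kernel checkout in ./linux; a tool like cloc would give a nicer per-language report):

    cd linux
    for d in drivers arch fs net sound kernel mm include; do
        printf '%-10s' "$d"
        find "$d" -name '*.[chS]' -print0 | xargs -0 cat | wc -l
    done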
I don't know the actual line count of the kernel compiled for my specific x64 machine, but considering how many significant changes have been made to the Linux kernel over the years, I don't think it's a victim of unnecessary bloat. The various virtualization features alone were industry-changing.
Modern Linux has 4500x the features of v0.01 - support for multiple CPU architectures, better scheduling, better memory allocators, different filesystems, permissions and isolation, more advanced networking, drivers, security. While there is bloat in all software systems, comparing them is difficult because one is an MVP and the other is the starship Enterprise.
Fascinating. I always love seeing the start of things. Oftentimes we put people like Linus above the critical eye because of their great accomplishments. Things like this remind us that they, too, are human. What an interesting look at what OS dev was like so long ago.
Would Linux 0.01 boot in a virtual environment (VirtualBox or QEMU)?
If not, it might be interesting to fork it into a linux-zero-dot-zero-one project that merely makes the minimal adjustments needed for it to run virtualized on modern hardware.