Exploring the Internals of Linux v0.01 (seiya.me)
519 points by ingve on Aug 12, 2023 | 125 comments



Just like you read the classics in literature, I think it would be interesting to read some of the classic codebases. They should fit the following criteria:

- The project was highly successful or influential

- Freely available source code

- Ideally the bulk of the project was written by one person

So far my list includes Linux, the John Carmack id releases (wolf3d, DOOM, Quake), SQLite, and vim. Any others I'm missing?


Those aren’t necessarily written as people would do it today, but Knuth’s literate sources for TeX[1] and METAFONT[2] are explicitly meant for reading. The literate program family also includes LCC[3], but A retargetable C compiler is a book that you’d need to buy; and PBRT[4], but Physically based rendering is more exposition than program (even if the program is a perfectly good one). The source for Unix V6[5] with the accompanying commentary by Lions is probably as much of a classic as it’s possible to get. And as an eccentric choice in a similar format, may I suggest cmForth[6], perhaps paired with Footsteps in an empty valley[7]?

Also, though this is not precisely what you’re asking for, The architecture of open-source applications and its sequels[8] have the original designers’ reflections on some well-known codebases.

[1] http://mirrors.ctan.org/info/knuth-pdf/tex/tex.pdf

[2] http://mirrors.ctan.org/info/knuth-pdf/mf/mf.pdf

[3] https://github.com/drh/lcc

[4] https://pbr-book.org/

[5] http://v6.cuzuco.com/

[6] https://github.com/ForthHub/cmFORTH/blob/combined/cmforth.ft...

[7] http://forth.org/OffeteStore/4001-footstepsFinal.pdf

[8] https://aosabook.org/


I can also recommend Niklaus Wirth's "Project Oberon" - not quite literate programming, but extremely well explained code examples and data structures of the Oberon OS/language/UI system: http://www.projectoberon.net

Wirth's "Compiler Construction" is very concise and written in a similar way: https://people.inf.ethz.ch/wirth/CompilerConstruction/ - he manages to describe a complete compiler for an Oberon subset (earlier editions of the book used Pascal) in a bit more than 100 pages.


> may I suggest cmForth[6]

Hmm, not a Forth I've come across before - I enjoyed reading JonesForth, though.


I used to read the OBS classic code base before it was replaced by OBS Studio. The original OBS was monumental when it came out. Yes, it was a Windows-only code base, with the GUI coded in straight Win32, and didn’t support the extensive array of sources that OBS Studio now does. But it opened instantly, and had the best D3D and OpenGL capture performance available. To my mind OBS Classic single-handedly boosted Twitch streaming to the stratosphere. This happened despite existing and new competition in live streaming apps (primarily Wirecast and XSplit). OBS Classic was just that good for its time. OBS Classic even enabled shared texture capture of Direct3D9 contexts, via a hack to d3d9.dll which had to be maintained for every new major Windows update that came out. OBS Studio carries the torch with a much cleaner cross-platform core library written in C, and better performance and stability, which itself is an achievement.

Since OBS Classic is now gone, I'd recommend anyone interested in programming video, audio, and cross-platform native apps to read through the current OBS Studio code base on GitHub.



Funny, if not helpful, error message:

     pw = getpwuid(getuid());
     if (!pw)
        usage("You don't exist. Go away!");

I find it interesting that originally each command was a separate executable.


> I find it interesting that originally each command was a separate executable.

That's actually still the case. `git` is just a wrapper that, when called as `git <command>`, looks for `git-<command>` under `libexec/git-core/`.

You can add any command named `git-foo` into your PATH and then run it with `git foo`. For example, I have `git-gsr` which is a simple shell script that does a global search and replace in a git repo. I run it as `git gsr <old> <new>`
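A minimal sketch of that dispatch mechanism, assuming a hypothetical standalone wrapper (the real git binary also handles built-in commands and prepends its libexec directory to the search path):

    #include <stdio.h>
    #include <unistd.h>

    /* Build "git-<command>" from argv[1] and exec it, letting
       execvp search PATH, so "git foo" runs "git-foo". */
    int main(int argc, char **argv) {
        char cmd[256];

        if (argc < 2) {
            fprintf(stderr, "usage: git <command> [args...]\n");
            return 1;
        }
        snprintf(cmd, sizeof(cmd), "git-%s", argv[1]);
        argv[1] = cmd;              /* reuse the argv tail for the child */
        execvp(cmd, argv + 1);      /* returns only on failure */
        perror(cmd);
        return 127;
    }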


They could search for it in the source, so uniqueness was enough.


Interesting how little there actually is


I am not a programmer (yet). I am an OpenBSD fanboy. I've read stories of people reading the code and raving about the quality. It ticks two of the three checkboxes you mentioned (it wasn't written by just one person). Does it qualify?


Can confirm.

I needed to learn how a very specific OS thing worked, so I read the code from Linux, OpenBSD, L4, Hurd, Minix, and a few other projects.

The OpenBSD kernel code was the easiest to follow because it favored simplicity over all other things (improved security was just a byproduct).

For the record, the microkernels were the hardest to follow despite having tons of books and a few experts nearby. But that's a whole different discussion.


Yes, I think so. Being written by one person isn't a hard requirement, it's just that I feel like you get a better sense of programming style and someone's approach to problem solving when you read code that hasn't been touched by too many people.

Projects with a maintainer who strictly enforces code style and quality would still fit the description. From what I've heard, OpenBSD falls under this category. I'll add it to my list.


If people are interested in looking at Unix kernel implementations: the OpenSolaris code is out there, too. Now, I haven't looked at it myself (I'm not a kernel hacker) and it's more like the exact opposite of a one-person effort, but I've heard praise for its alleged code quality more than once. So it might be educational (as far as the design of a commercial Unix kernel goes).


Stockfish is a good one, though the original codebase was called Glaurung or something like that. The original, at least, was written mostly by one person or a small handful, I think. Obviously successful, wildly influential in its niche, and the code is open source.

Though honestly, if you were going to read a chess engine, you might as well read modern Stockfish. You'll learn a lot about squeezing out every last CPU cycle and byte of memory, as well as about parallelism in a non-"embarrassingly parallel" problem domain. I recommend starting with the transposition table (tt.cpp), since it's very self-contained.
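For a flavor of what that looks like, here's a toy transposition table in C. Everything here (names, sizes, the depth-preferred replacement policy) is illustrative, not Stockfish's actual code:

    #include <stdint.h>

    #define TT_SIZE (1u << 20)            /* power of two: cheap index masking */

    /* One entry: the full hash key (to detect index collisions) plus
       the score and search depth at which it was recorded. */
    typedef struct {
        uint64_t key;
        int16_t  score;
        int8_t   depth;
    } TTEntry;

    static TTEntry table[TT_SIZE];

    static TTEntry *tt_probe(uint64_t key) {
        TTEntry *e = &table[key & (TT_SIZE - 1)];
        return e->key == key ? e : NULL;  /* NULL on miss or collision */
    }

    static void tt_store(uint64_t key, int16_t score, int8_t depth) {
        TTEntry *e = &table[key & (TT_SIZE - 1)];
        if (depth >= e->depth) {          /* depth-preferred replacement */
            e->key = key;
            e->score = score;
            e->depth = depth;
        }
    }

    int main(void) {
        uint64_t key = 0x9d39247e33776d41ull;   /* e.g. a Zobrist hash */
        tt_store(key, 42, 8);
        return tt_probe(key) ? 0 : 1;
    }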

GNU Emacs also comes to mind I suppose.


Yeah, chess engines are a good call. They're often relatively small and self-contained, and feature a lot of good programming tricks and optimizations. I have read some of the Stockfish code, but if I were going for something more "classic" I might try to dig into GNU Chess, Fruit, or Crafty.


The Apollo guidance computer source code? https://github.com/virtualagc/virtualagc


Very cool. Unfortunately it's probably not very relevant to current programming practice, but it definitely fits the bill for historical impact.



Redis is a great codebase. Very readable, probably even for someone unfamiliar with C.


Of course, Redis was originally written in Tcl:

https://gist.github.com/antirez/6ca04dd191bdb82aad9fb241013e...


While I was reading Julia Evans's "Behind 'Hello World' on Linux"[0], I had some fun digging into the zsh[1] source code. Granted, I needed some help from Cody AI[2], and I even wrote up some notes from my learning: https://blog.yelinaung.com/posts/behind-hello-world-on-linux...

[0]: https://jvns.ca/blog/2023/08/03/behind--hello-world

[1]: https://github.com/zsh-users/zsh

[2]: https://docs.sourcegraph.com/cody


Wozmon, the ROM on the Apple 1 that fit in 256 bytes. Ben Eater did a good video (1) recently going through the 6502 assembly.

(1) https://youtu.be/SpG8rgI7Hec



eval[0]. Originally written by John McCarthy to define LISP, it has had massive influence over the implementation of all interpreted and dynamic languages.

[0]: https://web.archive.org/web/20130319041327/http://www-formal... - this is a link directly to the page that has eval on it, but the rest of the paper leading up to it is important to really understand what's going on.


McCarthy described EVAL as a theoretical entity; it was Steve Russell who said he would code it. McCarthy was surprised and said that Russell had misunderstood: it was just a theoretical description, not code.

But Russell basically hand-translated the calls into assembly code and got an interpreter out of it.

An interpreter written in Lisp cannot be executed if you don't already have an existing interpreter to run it.

The key insight, that we can describe the interpreter in Lisp and then somehow compile it into code, belongs to Russell. Eventually it became possible to do that by machine: once you have a Lisp compiler, the Lisp-in-Lisp interpreter can be compiled by machine rather than by hand.


Assorted members of the unix and BSD family probably would qualify - research unix (at least v6, but maybe others) and 4.4bsd-lite come to mind. Maybe minix.


Obvious reply, but don't forget e.g. Andrew Tanenbaum's Minix; some of its code was made exactly for educational purposes. (There should be a category for that: classic codebases for production and classic codebases for education.)


Fairly old Glasgow Haskell Compiler: https://github.com/ghc/ghc/commit/e7d21ee4f8ac907665a7e170c7...

Bitcoin?

Redis?

Coreutils?


I studied assembly graphics programming on the Amiga 500 and had a good time


The original releases of gcc and GNU emacs both fit all your criteria.


>Any others I'm missing?

I would suggest MS-DOS: https://github.com/microsoft/MS-DOS


The Amoeba distributed operating system, from Andrew Tanenbaum.

The distribution also includes Amake, a parallel version of Make.

https://archiveos.org/amoeba/


Lua's codebase is also great for getting started


Not a single author, but the Erlang parts of the Erlang source code are quite readable. (I haven't looked at the C parts)


Redis...


(soon, for #1 I hope): SerenityOS and/or the Ladybird browser


Early versions of programming languages, like Ruby.


"vi" (Bill Joy's) not "vim"


Having hacked on that codebase (the heirloom-vi project has since replaced my personal ex-vi fork for my own use), the heavy use of globals and assorted trickery means it's not the easiest thing to read.

For a simple but really rather elegant piece of code, I very much enjoyed reading the sources to abduco a while back.



Tiny C Compiler


I was going to say, the list should include something by Fabrice Bellard. Tiny C Compiler is one.

https://bellard.org/tcc/

I was thinking maybe the first version/commit of QEMU would be interesting to read.


PostgreSQL


BBedit ?


That's not open source, is it?

I feel like we can be more (or less) flexible about the 'impact' and 'single author' criteria, but we _definitely_ need to be able to see the source :)


You might also be interested in "A Heavily Commented Linux Kernel Source Code": http://www.oldlinux.org/download/ECLK-5.0-WithCover.pdf


> counter = (counter >> 1) + priority ... Note that counter >> 1 is a faster way to divide by 2 ... This means that if a task is waiting for I/O for a long time, and its priority is higher than 2, counter value will monotonically increase when counter is updated.

Actually, the effect is that counter will exponentially decay toward 2 * priority (if the task is not runnable). You can sanity-check this by looking at the limit: if counter is 2 * priority, then that expression leaves it at its current value.
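To make that concrete, here's a quick standalone simulation (my own sketch; the priority value is arbitrary):

    #include <stdio.h>

    /* Iterate counter = (counter >> 1) + priority and watch it converge.
       The fixed point of c = c/2 + p is c = 2p; with integer truncation
       the sequence settles just below that bound. */
    int main(void) {
        int priority = 15;
        int counter = 0;

        for (int tick = 0; tick < 10; tick++) {
            counter = (counter >> 1) + priority;
            printf("tick %2d: counter = %d\n", tick, counter);
        }
        /* Output ends at counter = 29, i.e. 2*priority - 1. */
        return 0;
    }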


When Red Hat IPOed (or soon after? with a quarterly report? I can't remember), they sent their shareholders a poster with the 0.01 source code on it. It fits, and it's fun to read from time to time.


> I thought GCC (or C itself) has good backward compatibility, but it's not sufficient.

Ha! It's been a long time since GCC was even able to compile older versions of _itself_.


Par for the course, honestly. The major C compilers are only becoming more strict about what code they accept. For instance, when -fno-common was made the default in GCC 10, it ended up breaking a rather incredible number of packages.
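For anyone who hasn't hit this: the breakage usually comes from tentative definitions duplicated across translation units, which -fcommon used to merge silently. A minimal reproduction (my own illustration; the file and variable names are made up):

    /* a.c */
    int shared;                        /* tentative definition, no extern */
    int main(void) { return shared; }

    /* b.c */
    int shared;                        /* same symbol, second translation unit */

"gcc -fcommon a.c b.c" links fine because the two tentative definitions get merged; with GCC 10's default of -fno-common the link fails with a multiple-definition error, and the fix is to declare the variable extern in a header and define it in exactly one file.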


Setting stricter defaults makes sense, but not being able to compile older code even with options to disable checks seems like a regression.


GCC supports a plethora of ISO standards, language extensions, pragmas, etc. It simply isn’t tractable to guarantee support for every behavior or language construct which GCC has tacitly supported. Compilers like GCC may go above and beyond the ISO C standards, but they aren’t in the business of supporting every C dialect possible.


Years of language-lawyering have really refined our understanding of what is and isn't correct code. Older code is often so full of UB that it can barely be called valid code.

It's just a product of the times.


That's beside the point. The latest release of, let's say, Linux has bugs in it (statistically speaking, unless it's the first perfect bug-free release ever). Would it be reasonable for a compiler to refuse to compile it? The compiler's job isn't to refuse to compile bad code when it could compile it, only to warn the user and, where possible, help fix it.


The problem isn't that the compiler chooses not to compile something if it deems it broken. The problem is that the broken code's behaviour is coupled to the compiler of the time.

The behaviour you want isn't "compile this version of Linux even though it has a bug", it's "compile this version of Linux as if you were a compiler at the time which accepted this broken code in this way", which is a very tall order, along the lines of bug-for-bug compatibility while also fixing the bug.


GCC 2.95 was a tremendously sticky release. The 3.x series was notorious for compiler bugs for a long time, so many people stayed with 2.95 for years. I think a good number of people ended up skipping the entire 3.x series.


I vaguely recall that when I was building LFS some time in the past, gcc 2.95 was recommended over the 3.x series for some reason. So I looked it up and found a line about it in an old version of the book:

> This is an older release of GCC which we are going to install for the purpose of compiling the Linux kernel in Chapter 8. This version is recommended by the kernel developers when you need absolute stability. Later versions of GCC have not received as much testing for Linux kernel compilation. Using a later version is likely to work, however, we recommend adhering to the kernel developer's advice and using the version here to compile your kernel.

https://www.linuxfromscratch.org/museum/lfs-museum/5.0/LFS-B...

I wish I had a better reference. The bit about "compile this version of Linux as if you were a compiler at the time which accepted this broken code in this way" made me think of that.


Maintaining bugwards compatibility is a pain in the arse and eventually you really just have to cry havoc and let slip the dogs of breakage.


For code that has undefined behavior, the compiler may refuse to compile it or compile it into literally anything.

A common mistake is expecting code that has undefined behavior to be compiled into something that makes logical sense.

You may disagree with it but that’s how compilers work and how the standard describes they should work.
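A classic illustration (my own example, not from the thread; the function name is made up): signed integer overflow is undefined, so the optimizer may assume it never happens and fold an overflow check away entirely.

    #include <limits.h>
    #include <stdio.h>

    /* Signed overflow is UB, so a compiler may assume x + 1 never
       wraps and reduce this whole function to "return 0". */
    int wraps_if_incremented(int x) {
        return x + 1 < x;
    }

    int main(void) {
        /* At -O2, GCC and Clang typically print 0 here, even though a
           naive two's-complement reading of the source would give 1. */
        printf("%d\n", wraps_if_incremented(INT_MAX));
        return 0;
    }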


I posted an Ask HN about a "Linux internals" book (in the vein of "Windows Internals"). I would be glad to contribute to crowdfunding such an effort, though not strictly for the kernel: every major userspace subsystem should be covered as well.

https://news.ycombinator.com/item?id=33728590


>"Linus didn't have a machine with 8MB RAM:

'* For those with more memory than 8 Mb - tough luck. I've

  * not got it, why should you :-) The source is here. Change

  * it. (Seriously - it shouldn't be too difficult. ...'
Today, machines with 8GB RAM are very common. Furthermore, 8GB is not enough at all for software engineers ;)"

Bill Gates, 1981: "640K ought to be enough for anyone..."

Linus Torvalds, 1991: "8MB ought to be enough for anyone..."

Jen-Hsun Huang, 2023: "144TB ought to be enough for anyone..."

:-) <g> :-)

(Disclaimer: The above quotes are written for comedy purposes only! The above individuals referenced probably didn't actually say those things! <g>)


What is <g>? My first thought was https://developer.mozilla.org/en-US/docs/Web/SVG/Element/g but that can't be right.


It means to grin. Sometimes it shows up as *g* too.


Grin, if I remember rightly.


I don't know if what's shown is the actual source code, but I notice spaces were used instead of tabs. Maybe Torvalds became a tab advocate only further down the line.


They were trying to save storage space and build times as the source grew in size.


Any truth to this? Somebody must have run a benchmark at least?


I actually went ahead and asked Linus Torvalds himself. He said he'd never used spaces, and always used tabs from the get-go. So, my theory fails :)

> What? No.

> It always used tabs.

> And by "always used tabs" I mean "mistakes probably happened, and there may be spaces in some places", but afaik I've always used hard-tabs as the normal indentation.

> If you see something else, you probably have some archive that has been reformatted.


I hope the parent was being humorous.

One tab is 1 byte and 4 spaces are 4 bytes, so it doesn't actually work anyway.

And you can't rely on indentation being a multiple of 4, so you'd have to iterate over the bytes to check that they are actually spaces.

So no: it'd be slower, but not by enough to matter.

Edit: or is the parent proposing the opposite? I still hope they're being humorous, as it still doesn't really matter.


No, that was my actual theory for what happened, but apparently not true at all: https://news.ycombinator.com/item?id=37104012


A read through the topic in the style guide doesn't suggest anything of the sort.

https://www.kernel.org/doc/html/v4.10/process/coding-style.h...


I feel like looking at the first working versions of a big successful project is a great way to understand how it works.

Usually it will only contain the most important core features without a lot of abstractions/generalizations. So it is actually manageable to read through all of the code in a couple of days.


I don’t know about this. A lot of the success of a project has to do with how they react to changing conditions and the needs of their users. Moreover I would say that most projects owe a great deal of their success to the nontechnical factors, e.g. licensing or other social factors.

Reading the initial working versions of a successful project will inform you on how that project used to work. In a sense it’s akin to reading a working draft of a great piece of literature without reading the final product.


On 8MB: it was the minimum usable amount in the mid-to-late 90s for minimal systems, much like today's 1GB systems running some light X environment.

There was even a HOWTO for 4MB laptops.

If you wanted to run X comfortably, even with FVWM and rxvt, you needed 16MB.


Well, in 1993 or '94 I had Linux, X11, fvwm(2?) and Emacs running in 5MB. To be fair it was not very fast, and one later, extremely expensive £600 upgrade to 16MB made it perform much better.


If you had 5MB you would just spawn Emacs in a tty and use SVGAlib tools to display images.

Today I do the same with Gemini/Gopher/light HTTP pages, nsxiv to display pictures, and mpv+yt-dlp to watch videos and listen to music and podcasts.

That's on an Atom N270, which today is like having a 386 with 4MB was back in 1999.


Wow, an Atom N270? I'm envious; I've always wanted to find a fun way to make use of old hardware like that. I have a Google Cr-48 Chromebook with a similar CPU and 2GB of RAM. I'd be interested in learning more about your setup to see if I could get something similar going.


I have an EEE 1005HA laptop with an N270, 2 GB RAM, and an old SSD. I'm running the latest Debian (it is the last version to support x86-32) with Xfce, and I'm using it to read books in FBReader and Okular, play podcasts and SD videos with mpv, and read IRC (via SSH). It is actually completely usable except for web browsers: even simple pages that haven't changed much since the laptop was new (such as this site or Wikipedia) are pretty slow in modern Chromium/Firefox, and "modern" webpages are almost unusable. You can use links2 -g (in graphics mode) or Dillo; they are fast, but the usability is lacking.


- Install zram and set 1/4 of your RAM as compressed swap.

- Use Luakit; set hardware-acceleration-policy to either always or never, just try both.

- Setting an Android 4.x user agent in Luakit will help, too. Pages served for phones/tablets are often much lighter.

- git clone https://github.com/stevenblack/hosts ; cd hosts ; sed -n '/Start Steven/,$p' < hosts > hosts.new, then append that hosts.new file to your current /etc/hosts file.

- MuPDF is far lighter than Okular.

- Fluxbox + ROX + the Zukitre GTK2/3 and Fluxbox themes can be far lighter than a whole DE. Ping me back for an easy setup.


Around 1997 I accidentally removed the wrong memory stick and booted up my 486DX2 with only 4MB of RAM. I heard the hard drive seek more or less constantly, but X started, along with some xterms. I never tried to use it; I just wanted to shut it down cleanly. This was a FreeBSD 2.x system, IIRC.


To read with the recent https://cpu.land/


If one wanted to write their own little OS today, what would you recommend in terms of:

* Books or articles to read? Hopefully with a hands-on and practical approach.

* Target environment: VM or real hardware, and what specifically?


I'd start with https://wiki.osdev.org/Expanded_Main_Page . There's a whole community of people writing little OSs, and many OS 101 articles as a result.

VMs would be the most common target, but the Raspberry Pi is also quite popular.


Cool, thanks!


I believe the traditional resource is the Dinosaur book (Operating System Concepts by Silberschatz). I've also seen Operating Systems: Three Easy Pieces https://pages.cs.wisc.edu/~remzi/OSTEP/ come up.

Another common recommendation is to find a hackable existing OS/kernel. You already have a working OS, but you can inspect and rewrite portions without having to do everything yourself upfront. http://www.minix3.org/ seems to be a popular choice; the various BSDs get mentioned as well.


Thanks. I just flipped through a lot of the Dinosaur book (10th ed.), and it's not hands-on, though.

Back in undergrad in the early 90s, our OS course had us spend the semester building a crappy x86 OS that could boot and run a couple of processes with basic I/O, simple signals, and memory management. Even though it had nearly zero concepts relevant to modern OSes, it was so much fun. It didn't set me up in any way to be a serious OS designer, but it was the essence of programming a CPU and some devices, with nothing in between us. I'd love to experience that again. But I don't know where to find resources for the lowest levels: booting into a raw CPU and talking directly to devices (including display and keyboard). To be sure, I can find plenty of OS texts on advanced memory- and process-management schemes and so on, but what about the lowest level?


One example is Stephen Marz's "The Adventures of OS: Making a RISC-V Operating System using Rust": https://osblog.stephenmarz.com

Unfortunately, this hasn't been updated to work with the current RISC-V specs, so you will run into a number of problems, which can be frustrating...



Excellent, thank you!


After I started using Linux in 1994, I wanted to write my own OS that was entirely GUI-based. I used this book to learn the fundamentals; it really helps explain a lot of the Linux code, and it also teaches you how to get your kernel loaded and booted, which is the hardest bit for a new OS:

https://www.amazon.com/Developing-32-Bit-Operating-System-Cd...


Regarding the following code,

    printk("Kernel panic: %s\n\r", s);
    for (;;);

could the for-loop be damaging to the CPU by over-utilizing a small portion of it over and over, heating up a tiny spot on the die?


No. CPUs need to be designed to tolerate busy loops without damaging themselves (with appropriate cooling). CPUs of that era did not measure their own temperature, but they also weren't trying to squeeze as much juice through as the later Pentium 4s that ran very hot. Modern CPUs are self-regulating and try very hard to avoid damaging themselves even with inadequate cooling.


I'm assuming the "halt and catch fire" thing is only really possible on pre-microprocessor machines built of discrete components, where the individual parts are small enough to overheat by themselves if driven wrong.

I'd guess the physical CPU package of an i386 would cope just fine with the few (hundreds?) of gates toggling in that small loop.

I wonder if it might be possible on a modern FPGA, if you artificially created some unstable circuit and deliberately packed it into a corner of the die?

There's probably some AVX-512 concoction that would be the closest equivalent on modern x64. And there's an easy experiment: if that concoction makes the CPU frequency drop while whole-package thermals also drop, it /could/ be due to localized die heating.


Maybe not the for-loop, but there has been research into damaging CPUs by repeated execution of some particular instructions:

https://www.semanticscholar.org/paper/MAGIC%3A-Malicious-Agi...


No. Intel CPUs, even way before Linux, were microcoded, so you're still using the full instruction fetch, decode, and microcode system for every step of your infinite for-loop. You aren't wearing out the CPU any more than running any other code would.


None of the instructions likely to be emitted by that loop will be microcoded, and the instruction will always be fetched from L1 cache. That said, this won’t be an issue simply because CPUs are designed and tested to be able to handle hot loops.


My AMD 386DX/40 didn’t even have a fan or a heatsink.


CPUs didn't consume much power back then, even at full load (several watts[1]), and thus leaving the CPU in a busy loop was the norm.

[1] http://www.cocoon-culture.com/lib/noise-report/external-docs...


I remember Win95 didn't use the HLT instruction in the idle thread; it just did the same as Linux. Power management wasn't a thing back then. I think ACPI and HLT use came with NT only.


You could use Rain to cool down your CPU. That tool was useful under VMs and DOSBox, too.


On Pentium or higher. 486s and earlier didn’t really have an HLT instruction, iirc


>> All x86 processors from the 8086 onward had the HLT instruction, but it was not used by MS-DOS prior to 6.0[2] and was not specifically designed to reduce power consumption until the release of the Intel DX4 processor in 1994. MS-DOS 6.0 provided a POWER.EXE that could be installed in CONFIG.SYS and in Microsoft's tests it saved 5%.[3]


I stand corrected; I was under the impression that the original Pentium was the first architecture that had HLT, but maybe that was just the first architecture I ran Rain on, since it had benefits there (I ran Win95 on a 586, but never DOS on a 486 laptop).


Idle loops are harder to implement when your system doesn't have multitasking.


Even single tasked systems like MS-DOS still had interrupts. You could HLT the processor and a keyboard interrupt could wake it straight back up and resume execution anywhere in the MS-DOS kernel. It's just that the typical TDP of a CPU back then was a couple of watts so there was literally no point in HLTing instead of busy-waiting so nobody bothered.


Every x86 has had HLT. Win95 just wasn't using it, even though you could write a 10-line program that got context-switched in when idle and halted the CPU. It was one of my first programs as a child, on a 486 DX2-66.

I just had ChatGPT generate said program, and I think it's very similar to what I wrote. I'm unsure if it ever did anything, but I've always been interested in efficiency:

    #include <stdio.h>
    #include <windows.h>

    int main(void) {
        printf("Setting process priority to low...\n");
        SetPriorityClass(GetCurrentProcess(), IDLE_PRIORITY_CLASS);

        printf("Halting the processor when no other programs are running...\n");
        while (1) {
            __asm { hlt }   /* privileged; a ring-3 HLT would normally fault */
        }
    }


That’s pretty much what DOS did as well.


The CPU would probably run cooler, since it's not doing anything. Most of the circuit would be static, not flipping from 0->1 or 1->0, which is what tends to expend the most power.


It's not a risk or anything, but it does waste power compared to using the HLT instruction.


I think you need CLI; HLT, as HLT by itself still allows the machine to be woken up by an interrupt.


Even with interrupts enabled sticking a HLT in that loop would be better than not.


Nope. They're built for it. Typically now on x86 at least you'd do a CLI then for(;;)HLT. That'd park the CPU unless a non-maskable interrupt was latched.
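In C with GCC-style inline assembly, that idiom might look like the sketch below (assuming a freestanding kernel context; HLT is privileged, so this would fault in user space):

    /* Park the CPU, e.g. after a panic: mask maskable interrupts,
       then halt in a loop in case an NMI wakes the core back up. */
    static void halt_forever(void) {
        __asm__ volatile ("cli");
        for (;;)
            __asm__ volatile ("hlt");
    }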


No.


I really enjoyed this, thanks for sharing.

I’m pleased to see the frequency and depth of the comments in the code too, makes it all very accessible.

Anyone managed to get it to compile?


So, modern Linux is 4500x the lines of code of Linux v0.01. Does 36M lines of code in modern Linux mean that the complexity demon is winning? Or do we really, actually need that many?

I would love to see a graphical breakdown of how many lines of code, how many functions, etc are used for each major software module. Are there existing tools that do this?


You can use plain old 'wc' to count lines. Most of the Linux kernel codebase is drivers; the second biggest chunk is architecture-specific code.

I don't know the actual line count for the kernel as compiled for my specific x64 machine, but considering how many significant changes have been made to the Linux kernel over the years, I don't think it's a victim of unnecessary bloat. The various virtualization features alone were industry-changing.


Modern Linux has 4500x the features of v0.01: support for multiple CPU architectures, better scheduling, better memory allocators, different filesystems, permissions and isolation, more advanced networking, drivers, security. While there is bloat in all software systems, comparing them is difficult because one is an MVP and the other is the starship Enterprise.


"More advanced networking" is an understatement given the lack of networking in v0.01.


> better scheduling

Some people might debate this :)


Fascinating. I always love seeing the start of things. Oftentimes we place people like Linus above the critical eye because of their great accomplishments; things like this remind us that they, too, are human. What an interesting look at how OS dev was done so long ago.


Would Linux 0.01 boot in a virtual environment (VirtualBox or QEMU)?

If not, it might be interesting to fork it into a linux-zero-dot-zero-one project that makes only the minimal adjustments needed for it to run virtualized on modern hardware.


You can use an emulator like PCem instead for a period-accurate machine.



The strcpy implementation is probably similar, just moved to gcc.



