Hacker News new | past | comments | ask | show | jobs | submit login

Very interesting. Does anyone have any idea how the C compiler was bootstrapped. Was it first written in B, or was it written in PDP-11 assembler?



It was written in B. And B was written in B. To trace the bootstrap up into the high level, you have to go further back than the PDP-11. To bootstrap B on the PDP-7 and GE-635 machines he had access too, Ritchie wrote the minimum subset required to compile a compiler in B, in a language known as TMG (in the vein of, and a big influence on, Yacc and similar). This minimal bootstrap was then used to compile a B compiler written in B, and further development was self-hosted.

Later, the language would be retargeted to the PDP-11 while on the PDP-7. Various changes, like byte-addressed rather than word-addressed memory, led to it morphing into C after it was moved. There was no clear line between B and C -- the language was self-hosting the whole time as it changed from B into C.

Mr. Ritchie wrote a history from his perspective published in 1993. I've mostly just summarized it above. It's available here: https://web.archive.org/web/20150611114355/https://www.bell-...


I like that the Bootstrappable Builds folks are working on a bootstrap process that goes from a small amount of machine code (~512 bytes) all the way up to a full Linux distro, including compilers/interpreters for modern languages.

https://bootstrappable.org/


Nice, I hadn't heard of this project before!

In particular BOOTSTRA [1] looks really fun. I have also toyed with the idea of using MS-DOS 3.30 as a guaranteed-ubiquitous build environment.

It comes with a filesystem, a text editor (EDLIN.EXE), an object file linker (yup!), a debugger/assembler (DEBUG.EXE is an amazing tool), a programming language with decent string handling (GWBASIC.EXE), and a command interpreter with batch file scripting to glue it all together.

Not sure I would write the assembler as batch files though, that is really hardcore. :)

That entire build environment would fit on a single 360KB floppy.

[1] https://github.com/mniip/BOOTSTRA


I don't think that BOOTSTRA is in any way involved in the process bootrappable.org are working on, which starts with hex0 (the 512B machine code binary seed), proceeds through a ton of layers and eventually reaches non-interactive bash.

https://github.com/fosslinux/live-bootstrap/blob/master/part...


That repo is very cool. I want to see them build further on it, go through recursive gcc / clang versions, etc. all the way to a kernel.


What is "guaranteed-ubiquitous" about DOS 3.30?


There are (still!) a lot of copies floating around -- both online and offline -- and it runs on the most pervasive commodity platform that ever existed: 16 bit real mode x86.

Those binaries will boot on anything from an ancient IBM PC with an 8088 CPU all the way through to fairly recent x86 systems (if they still have legacy BIOS boot support).


You have a different meaning of "guaranteed" and "ubiquitous".

It wouldn't be defensible to use those descriptors for Windows XP. In a word, it would be wrong. Calling DOS 3.30 "guaranteed-ubiquitous" is even wronger than that.


When people say bootstrapping, is this how all computer languages came to be? Or there has been multiple attempts at bootstrapping? What I’m getting at is whether all languages have a single root.


Yes, bootstrapping with regard to languages is the trick of getting that first compiler running so you can compile the rest of the compiler.

It has been done many times. The first assemblers were written directly in machine language. The first compilers were written in assembly. Many implementations of FORTRAN, ALGOL, COBOL, etc. As late as the 1970s it was not unknown to write a new high level language directly in machine code. Steve Woz's BASIC for the Apple II was "hand-assembled", as he put it.

So, taken literally, certainly not. People still do it today as a hobby or educational project.

But if we take the question in a looser sense? Yes kind of, at least in the UNIXish world. No one has implemented a serious high level systems language, except in a high level systems language, for a long time now. Rust, for example, was initially implemented in Ocaml. And normally that's what would be called the first Rust bootstrap. But OCaml was implemented in C, probably compiled by Clang or GCC. GCC was written in C and... so on.

Often one finds it does lead back to DMR at a PDP-7. On the other hand, I strongly suspect something like IBM's COBOL compiler for their mainframes (still supported and maintained today!) would not have such a heritage.


IBM's COBOL was implemented in PL/X, which traces its heritage to PL/S. PL/S itself was developed as a replacement for using assembler.


> Steve Woz's BASIC for the Apple II was "hand-assembled", as he put it.

Very cool - so basically (haha) Woz's integer BASIC was bootstrapped by writing it in assembly language and translating it into machine code by hand.

I wonder if someone (Woz?) has written a 6502 assembler in integer BASIC to allow it to bootstrap itself?


I'm not an expert, but I guess "bootstrapping" only applies to compilers, and Apple II BASIC was an interpreter. An interpreter doesn't have to compile itself, only run the programs you give it. So on an 8-bit computer the priority is to have an implementation that (a, and most importantly) takes up as little space as possible and (b) is reasonably fast.


There are metacircular interpreters, like in Scheme (see SICP [1]). But this is rare, compared to bootstrapping compilers.

[1] https://mitpress.mit.edu/sites/default/files/sicp/full-text/...


The Apple ][ has a monitor in ROM that allows to enter machine code directly. The legend says that for the Apple I, Woz knew the complete integer basic machine code and could type it live into the monitor.


Also, the monitor on pre-][+ machines with integer BASIC also had a mini-assembler, so short assembly programs could be typed in as such instead of keying in the bytes. Both versions of the monitor had a mini-disassembler.

On the other hand, I still remember at least a few 6502 hex opcodes, not that I have any use for that information anymore. The instruction set is small enough that it doesn't surprise me that Woz would have it memorized.


I’m not aware of any basic that is implemented in itself. There must be one or two somewhere, but typically for systems like the apple ii or Commodore PET, basic was implemented always in assembly.

So most weren’t exactly bootstrapped the way we are talking here.


> No one has implemented a serious high level systems language, except in a high level systems language, for a long time now.

LuaJIT is a prominent counterexample. The base interpreter in written in assembly for performance. It comes with its own tradeoffs, especially portability.


LuaJIT uses it's DynASM library for both static and JIT code generation. DynASM is implemented in a mixture of Lua and C, and Lua is of course implemented using C. So LuaJIT actually depends on a C compiler to generate assembly. Plus, not all of LuaJIT is implemented using assembly, only some critical hotspots (e.g. the bytecode interpreter loop). The rest is just C.

DynASM is actually really cool.

I almost forgot: apropos bootstrapping, because LuaJIT requires Lua to build, it includes a single-file, stripped down version of PUC Lua 5.1 which it can build to bootstrap itself if the host lacks a Lua interpreter.


LuaJIT 1.x uses DynASM for JIT but LuaJIT 2.x doesn't and instead uses the IR to ASM compiler in https://github.com/LuaJIT/LuaJIT/blob/v2.1/src/lj_asm.c

A program using DynASM doesn't depend on Lua or a C compiler to generate assembly at runtime but you need both to build it.


Fascinating and inspiring, thank you for taking the time to distill it so clearly.


Interesting article on how the original CDC 6000 Pascal compilers were bootstrapped (including the scanned source code listings as PDFs!?!):

http://pascal.hansotten.com/niklaus-wirth/cdc-6000-pascal-co...

"The method used to create the first Pascal compiler, described by U. Ammann in The Zurich Implementation, was to write the compiler in a subset of unrevised Pascal itself, then hand-translate the code to SCALLOP, a CDC specific language. The compiler was then bootstrapped, or compiled using the Pascal source code and the SCALLOP based compiler, to get a working compiler that was written in Pascal itself. Then, the compiler was extended to accept the full unrevised Pascal language."


I don't know if I'm stating the obvious, but you only need to bootstrap if you're writing a compiler in the language it's compiling. So any language compiler written in a different language isn't bootstrapped.

Even if you're using a loose definition of "root" as in a parent language is any language involved in writing another language, I doubt there is one root. There are certainly have been languages and computer architectures (with assembly languages) that are independent islands.


Some languages have multiple implementations in different languages. The Bootstrappable Builds folks are working on proper bootstrap for everything.

https://bootstrappable.org/ https://bootstrapping.miraheze.org/wiki/Main_Page


Chuck Moore is an interesting example of someone who has bootstrapped several languages without touching a C compiler. He has written systems directly in machine code, and has gone as far as designing his own chips to directly run Forth and ColorForth.


Bootstrapping can also get recursive, eg. GCC and binutils are bootstrapped by compiling their sources with the GCC you already have on your build system; the result of this is an intermediary compiler, which is then used to compile the sources of itself again. (I should look up the following bit, but IIRC, the second generation result compiles the sources a third time, and the last two outputs are compared -- they should be equal, barring any non-deterministic optimizations.)


The basic sequence was:

- Rewrite B compiler in B (generating threaded code)

- Extend B to a language Ritchie called NB (new B). This compiler generated PDP assembly There is no version of the NB compiler known to exist.

- Continue extending NB until it became the very early versions of C.

You can read the longer version of this history here:

https://www.bell-labs.com/usr/dmr/www/chist.html


I think it was written in B as Ritchie extended B for the PDP-11. By that point the B compiler was written in B

https://sci-hub.do/10.1145/155360.155580


Real programmers use butterflies


A reference to https://xkcd.com/378/, for those wondering.


Flint knives and bear skins.


Abacus




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: