Bootstrapping a Compiler

marktangotango · on Jan 17, 2015

Using a scheme subset seems like cheating almost. The no syntax, s expression languages allow one to avoid the entire lexing, parsing, ast generation aspect of the endeavor.

Tratt has a blog post that expands on the bit about languages implemented in themselves and the consequences:

http://tratt.net/laurie/blog/entries/the_bootstrapped_compil...

ihnorton · on Jan 17, 2015

"ML -- a domain-specific language for writing ML compilers."

https://twitter.com/samth/status/481759160007004160

pjmlp · on Jan 17, 2015

ML was created for the LCF theorem prover, not for writing compilers.

It is just a nice side effect that functional programming, as well as, logic programming, make it so easy to write compilers.

seanmcdirmid · on Jan 17, 2015

Does ML have any logic programming capabilities beyond algebraic pattern matching?

pjmlp · on Jan 17, 2015

I don't agree.

Niklaus Wirth has bootstrapped all of his languages.

I doubt anyone would say Algol W, Pascal, Modula-2 or Oberon(-7) have compiler development specific features.

sklogic · on Jan 21, 2015

Actually, I always suspected that the tagged unions support in Pascal was added specifically for compiler development.

cosmez · on Jan 17, 2015

i don't mind this approach, you can waste an entire day with parsing even for the simplest language, this goes right to the point.

"Parsing is mostly a distraction, because we want to study the parts of programming languages that are not parsing"

http://cs.brown.edu/courses/cs173/2012/book/Everything__We_W...

seanmcdirmid · on Jan 17, 2015

Parsing is easy, you can hand write a recursive descent parser for a reasonably sane language in a couple hours. As the joke goes, XML and Lisp are languages for those who are afraid of parsers and syntax. Though I'm not sure I would want to argue this point with Shriram.

sklogic · on Jan 20, 2015

The language used in the article had come to quite an elaborate syntax later in the bootstrap cycle ( http://www.meta-alternative.net/pfdoc.pdf ). So, yes, it's a cheating: using a Lisp-like language to bootstrap a compiler for a high level language with a complex parser.

integricho · on Jan 17, 2015

I am no expert in this field, but I was always wondering, are there issues that could theoretically come up, due to bugs in a previous implementation of the bootstrapped language and leave no way to resolve them?

So specifically, while the language is not bootstrapped, the developers work in a probably 100% tested and working language to implement theirs. Suppose they leave some bugs in the first implementation of their language, and they bootstrap it using their buggy first implementation. Could some of these bugs later be impossible to correct? Like they are on such a low level of the language that even the language constructs that would be needed to correct them are buggy, so they can't do it?

alexchamberlain · on Jan 17, 2015

I vaguely remember reading a theoretical article about a compiler that would embed a virus in the result. If it propagated itself somehow, it could be impossible to remove it.

aneeshm · on Jan 17, 2015

You're probably referring to Reflections on Trusting Trust[1], by Ken Thompson.

[1] http://cm.bell-labs.com/who/ken/trust.html

pgeorgi · on Jan 17, 2015

not impossible. build an interpreter for your language and run the compiler in it, to build itself. the result will likely not be "infected" since the compiler hook that reinserts itself misses its cue. (also: compilers with a different structure)

samatman · on Jan 17, 2015

No. For any string, you can manipulate that string into any other string, given a Turing machine or equivalent.

integricho · on Jan 17, 2015

Hm I see, thanks for answering that. By any chance, do you know/recommend any book for total beginners about language implementations?