To be clear about the `del block` thing, the only thing the `del` keyword does in Python is unbind a name from the local scope. It does not, by itself, destroy the object, call `__del__`, or anything else. It's morally equivalent to `block = None`, except that future references to `block` will raise a NameError instead of giving you None.
When used at the end of a function, where all the locals are about to go out of scope anyway, it effectively does nothing.
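Roughly, in plain CPython terms:

    block = bytearray(1024)
    alias = block        # a second reference to the same object
    del block            # only removes the name "block" from this scope

    alias[0] = 1         # the object is still alive and usable via "alias"

    try:
        block            # unlike `block = None`, the name is simply gone
    except NameError as e:
        print(e)         # name 'block' is not defined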
I learned this with PyPy (or maybe it was Jython?). My usual bad habit in small scripts of depending on open files closing when they went out of scope suddenly exhausted all the file descriptors. Ouch.
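In sketch form (the path is made up), the habit and the portable fix:

    # Relying on the file object going out of scope: CPython's reference
    # counting usually closes it right away, but PyPy/Jython only close it
    # when their GC eventually runs, so descriptors can pile up.
    def leaky_read(path):
        f = open(path)
        return f.read()

    # Portable fix: the context manager closes the file deterministically
    # on every implementation.
    def safe_read(path):
        with open(path) as f:
            return f.read()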
Just what I love, a scientific and pedagogic CS article, in well formulated English.
I would appreciate some introductory notes or links to prerequisites though. That would perhaps entice an audience who has no previous experience in writing compilers.
I really want to be able to wrap my head around compilers, but I'm not sure what a good "user friendly" resource is that won't stuff lots of fancy compiler buzzwords in my face and become white noise to me.
I would recommend "Compiler Construction" by Niklaus Wirth. This small book (< 200 pages) will teach you the basics and show you how to build a single-pass compiler for a subset of Oberon (a Pascal successor).
For a more comprehensive example, you can have a look at my compiler for the full Oberon language targeting the JVM. It is based on the principles of that book: https://github.com/lboasso/oberonc
Start simple. If you haven't written an interpreter, do that first. Then figure out how to write a similar program that emits code for a virtual machine rather than evaluating the language.
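For example, a toy arithmetic language in Python (the language and instruction names are invented for illustration): the same recursive walk either evaluates directly or emits code for a tiny stack VM.

    # Expressions are nested tuples, e.g. ("+", ("num", 1), ("num", 2)).
    def interpret(expr):
        op = expr[0]
        if op == "num":
            return expr[1]
        lhs, rhs = interpret(expr[1]), interpret(expr[2])
        return lhs + rhs if op == "+" else lhs * rhs

    # The same walk, but emitting instructions instead of evaluating.
    def compile_expr(expr, code):
        op = expr[0]
        if op == "num":
            code.append(("PUSH", expr[1]))
        else:
            compile_expr(expr[1], code)
            compile_expr(expr[2], code)
            code.append(("ADD",) if op == "+" else ("MUL",))
        return code

    # A tiny stack machine that runs the emitted code.
    def run_vm(code):
        stack = []
        for ins in code:
            if ins[0] == "PUSH":
                stack.append(ins[1])
            else:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b if ins[0] == "ADD" else a * b)
        return stack.pop()

    e = ("+", ("num", 1), ("*", ("num", 2), ("num", 3)))
    assert interpret(e) == run_vm(compile_expr(e, [])) == 7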
I'm actually trying to write an easy-to-digest compiler and companion reading series. It's for a compilers independent study class. You can track my progress here: https://github.com/tekknolagi/tigress
At this point it's not well-documented, but the layers are numbered in order: l00, l01, l02... Each builds on top of the previous one.
EDIT: I just added a mini evaluator for debugging purposes and so people can see what the language is like.
Is it really that friendly when starting out, though? I found it brilliant, but quite the challenging read. I'd rather just read chapters four and five of SICP [1] and move on to Queinnec as a follow-up on more advanced topics.
I enjoyed this read because it was a very user-friendly overview. After reading, I knew exactly what major pieces did what and therefore what to search for next on my own terms.
I'm writing my first compiler now, and it's been quite a learning curve. I've also realized how much nicer my language is for writing compilers than the language in which I'm implementing the compiler. That made me realize I could write the compiler in my language and then write an interpreter for my language which would only need to run once--to generate the compiler--after which point my compiler would be self-hosting. :p
I haven't looked at any resources in particular. Basically my compiler has 3 parts--parser -> type checker -> code generation. Parsing was tricky at first, but then I stumbled upon a pattern which is pretty close to parser-combinators. Type checking seemed daunting, but my language happened to be Hindley-Milner compatible, and there's lots of reference information for HM. Code generation is easy in the base case, but I'm still wrapping my mind around template expansion.
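Roughly, the parser pattern looks like this (a minimal made-up sketch in Python, not the actual compiler code): a parser is a function from (text, position) to (value, next position), or None on failure, and combinators build bigger parsers out of smaller ones.

    def char(c):
        def parse(s, i):
            return (c, i + 1) if i < len(s) and s[i] == c else None
        return parse

    def seq(p, q):                      # p followed by q
        def parse(s, i):
            r = p(s, i)
            if r is None:
                return None
            v1, i = r
            r = q(s, i)
            if r is None:
                return None
            v2, i = r
            return ((v1, v2), i)
        return parse

    def alt(p, q):                      # p, or else q
        def parse(s, i):
            return p(s, i) or q(s, i)
        return parse

    ab_or_ac = alt(seq(char("a"), char("b")), seq(char("a"), char("c")))
    print(ab_or_ac("ac", 0))            # (('a', 'c'), 2)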
I think Wirth's compiler book is more approachable for a beginner. Once you are done with that book you can also read "Project Oberon" from the same author. This way you can understand the implementation of a full graphical OS written from scratch for a custom made RISC processor implemented on a FPGA. The book and source code are available at: http://www-oldurls.inf.ethz.ch/personal/wirth/ProjectOberon/
In my opinion, the most tedious part of writing this sort of thing from scratch is the x86/x86-64 instruction encoding. Grab the Intel/AMD manuals or an opcode database and have fun. Other projects to study: LLVM, dynasm, asmjit, peachpy, luajit jit, Intel XED, etc.
Off-topic question: when do you find time to do such interesting things as these? I'd really like to dive deeper into this topic. As a CS student who also works as a dev for a living, I'm happy just to get 8 hours of sleep.
Almost always in evenings, whenever I have time and feel like it. If you're a student, you're probably hosed with deadlines, so pick very small things to try out — things like really simple interpreters.
This particular project luckily turned out to be quick to get working. Also, I kept the scope very small, calling it a day right before getting big ideas.
A nice trick for debugging code generated at runtime is to temporarily add a CC (int3 breakpoint) instruction at the beginning of your generated function, so the debugger stops right there and you don't have to care where in memory your snippet is going to end up.
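For example, a minimal sketch of the whole run-generated-code idea in Python (Linux/x86-64 assumed; the bytes are a hand-encoded `mov eax, edi; add eax, esi; ret`):

    import ctypes, mmap

    # x86-64 System V: first two int args arrive in edi/esi, result in eax.
    code = bytes([0x89, 0xF8,    # mov eax, edi
                  0x01, 0xF0,    # add eax, esi
                  0xC3])         # ret
    # While debugging, prepend b"\xCC" (int3): the debugger then breaks
    # right at the start of the snippet, wherever it ended up in memory.

    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(code)

    # Treat the buffer's address as a C function pointer: int f(int, int).
    ftype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    f = ftype(addr)
    print(f(2, 3))               # 5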
This is a very interesting and well-written article, but I am puzzled by the technique being described as "JIT compilation".
Isn't JIT compilation the process of compiling some sort of intermediate code into machine code at runtime? For example, compiling JVM bytecode into machine code during execution, and using the cached machine code transparently on subsequent calls to that segment of code.
Not to detract from the article, or the interesting techniques and explanation, but I didn't see any compilation other than by gcc, which IMHO makes this AOT rather than JIT compilation.
I do see your point, but I tried to make it clear that this is compilation in an extremely restricted sense: It specializes and executes a function at runtime. And I wanted to cover the core technique of doing just that.
Perhaps I should have included a small example of compiling a linear list of abstract instructions into machine code. But that again would consist of compiling small templates in a coherent way, just like many actual compilers do.
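Something along these lines (a hypothetical Python sketch, not the article's actual templates): each abstract instruction maps to a fixed machine-code byte template, and "compilation" is mostly concatenating them with operands patched in.

    import struct

    # Hand-encoded x86-64 byte templates for a tiny stack machine.
    TEMPLATES = {
        # push a 32-bit immediate: 68 imm32
        "PUSH_IMM": lambda n: b"\x68" + struct.pack("<i", n),
        # pop rax; pop rcx; add eax, ecx; push rax
        "ADD":      lambda: b"\x58\x59\x01\xc8\x50",
        # pop rax; ret   (result is returned in eax)
        "RET":      lambda: b"\x58\xc3",
    }

    def compile_to_machine_code(instructions):
        code = b""
        for op, *operands in instructions:
            code += TEMPLATES[op](*operands)
        return code

    # (1 + 2) as a linear list of abstract instructions.
    code = compile_to_machine_code([
        ("PUSH_IMM", 1),
        ("PUSH_IMM", 2),
        ("ADD",),
        ("RET",),
    ])
    # `code` can then be copied into executable memory and called,
    # as in the mmap/ctypes sketch further up the thread.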
Anyway, point taken, and maybe I'll expand the article or follow up on it.
My subsequent comment acknowledges that JIT means more than compilation at runtime. In .NET desktop, it means compilation of all CIL to native on every run, just before execution. To me, that's just executing the second pass of a compiler at launch time, but Anders Hejlsberg says it's JIT[1], and I'm not going to argue :)
Anyway, I've made a follow-up post that shows how to JIT compile a subset of Python, directly based on the previous techniques: https://csl.name/post/python-compiler/
For me, the key concept that comes to mind for a JIT compiler is profiling at runtime to decide what to compile and how aggressively, which this article doesn't deal with. The other big thing is keeping the profiling and compilation cheap enough that they don't negate the gains.
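As a toy sketch of the profiling part (hypothetical, unrelated to the article's code): count executions in the slow path and only pay the compilation cost once a function crosses a hotness threshold.

    HOT_THRESHOLD = 1000   # arbitrary cutoff for this sketch

    def counting_jit(interpret, compile_to_native):
        """Interpret until the call count crosses the threshold, then pay
        the compilation cost once and switch to the compiled version."""
        state = {"calls": 0, "native": None}

        def call(*args):
            if state["native"] is not None:
                return state["native"](*args)
            state["calls"] += 1
            if state["calls"] >= HOT_THRESHOLD:
                state["native"] = compile_to_native()
            return interpret(*args)

        return call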
The article author seems to be using GCC not at runtime, but more as a generator for templates (at the machine code level) for the rest of the software to use. You don't really want to be assembling a template at runtime in this context.
Mixed-mode execution (interpreter+JIT when justified) isn't the only way to go.
.Net doesn't work that way, for instance. Unlike Java's HotSpot, .Net never interprets CIL instructions. They're always compiled down to native code for execution.
I wasn't aware of this. To be clear, it's not 100% accurate to say "never", because it depends on the .NET environment, but point taken.
I now understand that .NET (on desktop, anyway) compiles all intermediate code (CIL) to native code on every run. This is a way different kind of JIT than Java HotSpot, which executes intermediate code (bytecode) initially, profiles intermediate code execution, and only compiles the hot spots to native code after a while.
It seems like what "JIT" means has evolved since the time it meant "JVM HotSpot".
".Net" and "Common Language Runtime" refer specifically to Microsoft's implementation. The spec is called the "Common Language Infrastructure".
Wikipedia seems to use "Common Language Runtime" in both ways - the Mono article uses it in the generic sense, whereas the CLR's own article explicitly states that it refers to .Net's implementation.
.Net also has ngen.exe which isn't really JIT at all, it's traditional 'offline' ahead-of-time compilation. In a sense that's rather like the traditional Unix compile-and-install model. One sees the oxymoron "pre-JIT" used to describe this.
This is another difference from HotSpot, which historically never cached native code to disk, though other JVMs have always been able to do this (Excelsior Jet, and GCJ, for instance).
Apparently AOT may be coming to Java though - it may even already be here, I'm not quite sure.
I recall once reading that one of the reasons HotSpot didn't cache native code, was the issue of securing that cache. Not sure what they've done to address that, nor what .Net does; it seems a valid concern.
That's an interesting read re. Hejlsberg. Worth noting that Mono takes them up on that and does feature an interpreter.
Does anybody know the easiest way to compile to JVM bytecode?
I have a Scheme interpreter written in Kotlin, what is the best way to compile it, instead of interpreting? Where do I start?