I'm writing a chapter for AOSA on something like this: a self-hosting compiler of a subset of Python to Python bytecode. It'll present the full code (about the same length) and try to explain it well for people not into compilers yet, but in the meantime I recommend the Wirth book.
What's your favorite shorter intro? I'd especially like to reference other educational compilers that are self-hosting.
'Compiler Construction' by Niklaus Wirth[1] is a pretty good book too. It's got the hands-on feel of Crenshaw's book with a quick but, not too superficial, introduction to theory. The book is little more than 100 pages long.
The compiler is written in Oberon however, which is also the source language (actually the source language is Oberon-0, a subset) but, Oberon is super simple and can be learned on the go.
The Wirth book gives a fairly quick overview that other compiler books can expand on. If you started with (say) the dragon book, you'll be wading through 200 pages about automata theory and parsing algorithms before you even get to anything else. Wirth provides a twenty-page overview, recommends starting with recursive descent parsing, and just moves on to the next issue.
I recommend starting with the Wirth book, then Andrew Appel's _Modern Compiler Implementation in ML_. And don't forget about the exercises!
As I wrote elsewhere, I use this tiny compiler as an introduction to the course, which goes into much more details and of course covers how to compile realistic programming languages (with multiple types of data, variables, conditional and loop control structures, functions, pointers, list and arbitrary structures, etc.) down to assembly code.
My first introduction to this subject was “Let’s Build a Compiler” by Jack Crenshaw. Even though it’s quite old now, I still highly recommend if you’re looking for a gentle, no-frills introduction: https://compilers.iecc.com/crenshaw/
Writing a toy compiler, the entire pipeline end-to-end, for a language no matter how small. I can't think of another exercise that covers so many good issues to solve, and good practices for solving them.
The tiniest compiler book I have ever written. It covers all steps in the compilation process from high level language (T3X) to executable (ELF). Lots of diagrams, lots of code, zero theory. The tour through the compiler itself spans less than 100 pages. All code in the book is provided under the CC0 license (public domain). It can be downloaded below."
Practical Compiler Construction
This book offers a tour through the full compiler for a clean and sane subset of the C programming language (C89), covering lexical analysis, parsing, semantics, code generation, optimization, and runtime support, including lots of clarifying annotations and diagrams. The SubC compiler discussed in the book is in the public domain and can be downloaded below.
Kilo LISP, a kilo byte-sized LISP system
Kilo LISP is a purely symbolic LISP system that runs in 64K bytes of memory. It is written in C89 and compiles fine using SubC (below), Turbo C, or any modern C compiler. The code is about 25K bytes of comprehensible C and LISP.
The SubC Compiler
SubC is a clean, fast, and simple compiler for a subset of C89 that can compile itself on various BSDs, Linux, Windows, and other systems. It also cross-compiles to DOS. Its code is in the public domain.
The T3X Compilers
T3X is a small, portable, procedural, block-structured, recursive, almost typeless, and to some degree object-oriented programming language. It targets the Tcode machine, the 8086 under DOS, and the 386 and Alpha processors under Unix.
T3X9 is a tiny, block-structured, procedural language. Its compiler can compile itself in the blink of an eye (0.05s on a 750MHz notebook). It currently generates ELF executables for FreeBSD-x86. T3X9 is a subset of T3X.
The XT3X Compiler and Virtual Machine
A T3X9 compiler and Tcode machine extended with functions for X11 graphics and OSS sound. Intended for writing simple video games as they were ubiquituos in the 1970's and 80's."
Disclaimer: I am slightly biased towards small compilers -- but, that being said, there's a ton of other fascinating things on this page...
It's great to see this available and updated. It was one of my favorite compiler books growing up because it's so small and straight-forward. I feel it's a much better starting point for someone actually interested in building a compiler than say, the Dragon book.
Even more immediate and hands-on is Abdulaziz Ghuloum's "An Incremental Approach to Compiler Construction" paper: http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf (which chides Wirth for oversimplifying)
As very much a non-expert in compiler design, I found two books to be helpful introductory reads (neither are free though). I'm not sure whether someone proficient in the field would recommend them as my knowledge in the area is small, but I enjoyed them.
A Wirth-style compiler for a Wirth-style language can be written fairly rapidly by a single person, and the languages, compilers and runtime systems can be easily understood. The Oberon-07 language report, including the grammar, is 17 pages.
If simplicity doesn't matter to you, then this won't appeal to you, and that's fine. I mostly use Ruby these days, whose parser (in MRI anyway) is likely longer than the whole Oberon-07 language report alone (I haven't measured, but ca. Ruby 1.8.6 the parser alone was 6k lines+), so I've surrendered the simplicity, but it still stands out to me as an ideal and inspiration.
Wirth was perhaps too ascetic in his insistence on keeping the compilers minimal, but you can fine less extreme versions carried on in extensions done in the PhD dissertations of students in his group.
What's your favorite shorter intro? I'd especially like to reference other educational compilers that are self-hosting.
reply