What is the *conceptually* smallest *compiler* that can compile itself?
Asked Answered
M

5

18

In the spirit of this question, I'd like to ask a similar question, but about compilers, not about interpreters.

What is the conceptually smallest compiler that can compile its own code?

When I say "conceptually smallest" I mean that it uses only very basic concepts and builds up from there, rather than that it contains very short code. An example of why this is an important distinction is OTCC a very tiny C compiler which is small because it's obfuscated, not necessarily because it is conceptually simple (it may also be conceptually simple, but I don't know; it's obfuscated).

I would also like to add that the following also might be a very conceptually small program, but it doesn't actually tell us anything about what's going on, so it's not really what I'm looking for either:

(writefile argv[2] (generate (parse (readfile argv[1]))))

What I'm really looking for is a language that is:

  1. Turing complete.
  2. Capable of compiling itself.

I'm interested in this because

  1. it would be an interesting case study and
  2. it could be useful as a starting point for bootstrapping compilers.

If it doesn't exist, I may just write it myself. :)

Maintain answered 27/10, 2009 at 12:16 Comment(4)
it's a different question but has some overlap - might wanna have a look --> #1585567Kufic
If you're not sure if OTCC is conceptually simple or not, look at the unobfuscated source code. The symbol names are somewhat terse, but it's decently well-commented.Adowa
possible duplicate of What language features are required in a programming language to make a compiler?Mesentery
Couldn't you just use a language with a built in "compile" command?Hootenanny
P
15

I'm not really clear what you mean by 'conceptually smallest'. Presumably you're not interested in minimal Turing machines or representations in Lambda calculus? If you're talking about physical compiler implementations, then you're really talking about a compiler that generates machine code instructions. TCC, as mentioned by Anthony Mills' comment, is relevant. Another interesting discussion that should have practical application is this detailed description of a bootstrapping compiler written from scratch.

There was an interesting discussion a while back on the comp.compilers newsgroup that's worth checking out.

Papillary answered 27/10, 2009 at 14:48 Comment(0)
L
8

[I know this is a very late entry, but I think this is really relevant].

The smallest self-compiling compiler I know about is Val Schorre's 1963 MetaII compiler. Yes, from Nineteen Sixty Three. (There's a link on that page to his technical paper on the topic). If you like compilers, run to get this paper; its a gas, and its only 10 pages.

This isn't theory; this is practical. His paper provides with the compiler source code (some 20-30 lines IIRC), a description of metacompiling machinery, and a metacompiled program processes the source code and regenerates the exact same metacompiled program. You can replicate this result yourself in 1-2 days of really fun if not mind boggling codeing to implement the metamachine. [I learned to build compilers from this paper back in 1970 by doing exactly this]. Or, you can go play with a modern tutorial on MetaII that has it all prebuilt in JavaScript.

Once you have this metacompiler running, you can extend the syntax and the metamachine easily to bootstrap to larger metacompilers with more features, and/or to generate compilers for real applications. (I built a Pascal like BASIC compiler this way in the early 70s).

You can go the other way: you can start taking things out, and see how much you can remove and still be able to boostrap back up to the MetaII level. I did this once and managed to get rid of about 30% without losing cability or even a lot of expressive power; it dropped to some 20 lines of text and, remarkably, a simpler meta-machine.

A clever fellow named Doug Michels, associated a long time ago with the 1980s (Unix supplier) Santa Cruz Operation, told me that he had gone considerably further and reduced the metacompiler self description to a very small number of characters. I never saw the work so I don't really know how far he got.

[EDIT] Dig, dig, dig... found this gem (on Linkedin):

Bill McKeeman, Adjunct Faculty at Dartmouth said:

Doug was my undergraduate student; his senior thesis assignment was simple: write the shortest, extendable, self-compiling compiler. The front end took 27 characters; the whole thing took 63. It all fit on one IBM card. He published the result.

Dig, dig, dig some more: This seems to be Doug's 27 character paper. See Figure 2. By "front end", McKeeman apparantly means "just the parser"; the paper contains full translators that are a little larger.

You can't get compilers this small unless they are "conceptually simple".

Looseleaf answered 27/2, 2016 at 22:19 Comment(0)
L
7

You don't say what the target machine is or whether the compiler has to exist or just be imagined.

In the world of the imagination, I would say that an adaptation of John McCarthy's metacircular LISP interpreter would come pretty close. You might also want to look at John Reynold's paper Definitional Interpreters for Higher-Order Languages which although dense is a model of simplicity.

In the world of reality, I'd bet on Chez Scheme, but unfortunately the native-code compiler is proprietary and closed source. Still, you can learn from studying the interpreter. Another system worth studying is the Oberon compiler, which was designed to be built and understood by one person, and it is very clean.

Lid answered 27/10, 2009 at 23:31 Comment(0)
M
0

Background At one point, I wanted a small program to compile some Notepad edited scripts and run them on the fly. There is this nice project called "C# Script: The Missing Puzzle Piece". But, that is for professionals. And then, one night I went to do some coding. And came up with a code compiler. But, this was not enough. I wanted to store the source-code for this program into the program itself, and a final spec was to generate this same source code out of the program.

In short:

  1. There is only one executable.
  2. When starting the executable, it generates its own source code.
  3. When starting the executable again, it compiles this source code and executes it, showing the same user interface!

A nice test is to delete the executable and compile the generated source code by using Visual Studio or the command line C# compiler:

 del SelfReplication.exe
 csc SelfReplication.cs
 move SelfReplication.cs SelfReplication-old.cs
 SelfReplication.exe

The last statement generates a SelfReplication.cs file. The old and the new generated files are exactly the same!! A feature of the program is you can change (mutate) the source code, adding new functionality and generating a totally new executable. The new program will be able to replicate itself, including your mutation, in the same way as the original one.

https://www.codeproject.com/Articles/21297/Real-Self-Replicating-Program

Miscall answered 2/8, 2017 at 11:58 Comment(0)
I
0

Maybe some old response but I think this is probably the most simple high-level self-compiling compiler. It's only one file, one pass, zero-dependencies, and native compiler, whose sole objective, as a compiler, is to be able to compile itself: https://github.com/t-edson/CreaTuCompilador

Ivetteivetts answered 9/4, 2023 at 20:33 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Mooney

© 2022 - 2024 — McMap. All rights reserved.