How can Lisp be both dynamic and compiled?
Asked Answered
G

2

16

Okay, so first to get this out of the way: I have read the following answer:

How is Lisp dynamic and compiled?

but I don't really understand its answer.

In a language like Python, the expression:

x = a + b

Cannot really be compiled, as for the "compiler" it would be impossible to know the types of a and b (as the types are known only at run-time), and therefore how to add them.

This is what makes a language like Python impossible to compile without type declarations, correct? With declarations, the compiler knows that e.g. a and b are integers, and therefore knows how to add them, and translate that into native code.

So how does:

(setq x 60)
(setq y 40)
(+ x y)

work?

Compiled being defined as native ahead-of-time compilation.

EDIT

In reality, this question is more about whether dynamic languages without type declarations can be compiled, and if so, how?

EDIT 2

After much research (i.e. avid Wikipedia browsing) I think I understand the following:

  • dynamic typed languages are languages where the types are checked at run-time
  • static typed languages are languages where the types are checked when the program is compiled
  • type declarations allow the compiler to make the code more efficient because instead of making API calls all the time it can use more native 'functions' (that is why you can add type declarations to Cython code to speed it up, but don't have to, because it can still just call the Python libraries in the C code)
  • there are no datatypes in Lisp; therefore no types to be checked (the type is the data itself)
  • Obj-C has both static and dynamic declarations; the former are type-checked at compile time, the latter at run-time

Correct me if I am wrong on any of the above points.

Godfearing answered 21/8, 2013 at 12:16 Comment(11)
And how can Objective-C be dynamic and compiled? Well... dynamism vs. static nature and "compiled-ness" don't describe the same property. A language can be statically typed and compiled (like C), statically typed and interpreted (like C++ interpreted by Cling), dynamically typed and compiled (like Objective-C, Lisp, or JIT-ed JavaScript) and dynamically typed and interpreted (like Python, PHP, Lua, ...). They really have nothing to do with each other. The fact that static typing makes it easier for a compiler to catch errors and generate more efficient code is irrelevant.Memoried
As to "how to add them": polymorphism. The compiler generates code that does some kind of dynamic trickery based on the (run-time) types of a and b.Memoried
Then why does compiled Python need type annotations? And doesn't Obj-C have type annotations?Godfearing
Objective-C doesn't have type annotations, but it has declarations, just like C. Does it really need them? Not sure. Objective-C objects can, at least, be queried for their class at runtime.Memoried
sorry, that's what I meantGodfearing
but how does the compiler know the types of the variables, to be able to translate to the respective machine code? TGodfearing
It doesn't know the types (at least not in Objective-C). Check out some papers on polymorphism.Memoried
Lisp by default has no data types for variables, thus it can't check them. The data itself has the type information.Handley
The statement "there are no datatypes in Lisp" is wrong. There are datatypes, but they are not a property of variables but of values (which might be referred to by a variable). For example, if you (let ((n 8))), then n does not have a type, but the value 8, which n happens to be bound to in the scope of this let, is of type integer.Relaxation
@Godfearing The compiler doesn't need to know at compile time. It might require some type before computing though. You might be interested in 90 minute Scheme to C compiler talk which comes with code. It's not incremental and not so sophisticated, but pretty impressive by Marc Feeley. (author of Gambit)Wendeline
@Godfearing Matt Might has a compiler too that has more types.Wendeline
H
30

Example Code:

(setq x 60)
(setq y 40)
(+ x y)

Execution with a Lisp interpreter

In an Interpreter based Lisp above would be Lisp data and the interpreter looks at each form and runs the evaluator. Since it is running of the Lisp data structures, it will do this every time when it sees above code

  • get the first form
  • we have an expression
  • it is a SETQ special form
  • evaluate 60, the result is 60
  • look up the place for the variable x
  • set the variable x to 60
  • get the next form... ...
  • we have a function call to +
  • evaluate x -> 60
  • evaluate y -> 40
  • call the function + with 60 and 40 -> 100 ...

Now + is some piece of code which actually finds out what to do. Lisp typically has different number types and (almost) no processor has support for all those: fixnums, bignums, ratios, complex, float, ... So the + function needs to find out what types the arguments have and what it can do to add them.

Execution with a Lisp compiler

A compiler will simply emit machine code, which will do the operations. The machine code will do everything the interpreter does: checking the variables, checking the types, checking the number of arguments, calling functions, ...

If you run the machine code, it is much faster, since the Lisp expressions don't need to be looked at and interpreted. The interpreter would need to decode each expression. The compiler has that done already.

It is still slower than some C code, since the compiler does not necessarily know the types and just emits the fully safe and flexible code.

So this compiled Lisp code is much faster than the interpreter running the original Lisp code.

Using an optimizing Lisp compiler

Sometimes it is not fast enough. Then you need a better compiler and tell the Lisp compiler that it should put more work into the compilation and create optimized code.

The Lisp compiler might know the types of the arguments and variables. You can then tell the compiler to omit runtime checks. The compiler can also assume that + is always the same operation. So it may inline the necessary code. Since it knows the types, it may only generate the code for those types: integer addition.

But still the semantics of Lisp are different from C or machine operations. A + not only deals with various number types, it will also automatically switch from small integers (fixnums) to large integers (bignums) or signal errors on overflows for some types. You can also tell the compiler to omit this and just use a native integer addition. Then your code will be faster - but not as safe and flexible as normal code.

This is an example for the fully optimized code, using the 64Bit LispWorks implementation. It uses type declarations, inline declarations and optimization directives. You see that we have to tell the compiler a bit:

(defun foo-opt (x y)
  (declare (optimize (speed 3) (safety 0) (debug 0) (fixnum-safety 0))
           (inline +))
  (declare (fixnum x y))
  (the fixnum (+ x y)))

The code (64bit Intel machine code) then is very small and optimized for what we told the compiler:

       0:      4157             push  r15
       2:      55               push  rbp
       3:      4889E5           moveq rbp, rsp
       6:      4989DF           moveq r15, rbx
       9:      4803FE           addq  rdi, rsi
      12:      B901000000       move  ecx, 1
      17:      4889EC           moveq rsp, rbp
      20:      5D               pop   rbp
      21:      415F             pop   r15
      23:      C3               ret   
      24:      90               nop   
      25:      90               nop   
      26:      90               nop   
      27:      90               nop   

But keep in mind above code does something different from what the interpreter or safe code would do:

  • it only calculates fixnums
  • it does silently overflow
  • the result is also a fixnum
  • it does no error checking
  • it does not work for other numeric datatypes

Now the unoptimized code:

       0:      49396275         cmpq  [r10+75], rsp
       4:      7741             ja    L2
       6:      4883F902         cmpq  rcx, 2
      10:      753B             jne   L2
      12:      4157             push  r15
      14:      55               push  rbp
      15:      4889E5           moveq rbp, rsp
      18:      4989DF           moveq r15, rbx
      21:      4989F9           moveq r9, rdi
      24:      4C0BCE           orq   r9, rsi
      27:      41F6C107         testb r9b, 7
      31:      7517             jne   L1
      33:      4989F9           moveq r9, rdi
      36:      4C03CE           addq  r9, rsi
      39:      700F             jo    L1
      41:      B901000000       move  ecx, 1
      46:      4C89CF           moveq rdi, r9
      49:      4889EC           moveq rsp, rbp
      52:      5D               pop   rbp
      53:      415F             pop   r15
      55:      C3               ret   
L1:   56:      4889EC           moveq rsp, rbp
      59:      5D               pop   rbp
      60:      415F             pop   r15
      62:      498B9E070E0000   moveq rbx, [r14+E07]   ; SYSTEM::*%+$ANY-CODE
      69:      FFE3             jmp   rbx
L2:   71:      41FFA6E7020000   jmp   [r14+2E7]        ; SYSTEM::*%WRONG-NUMBER-OF-ARGUMENTS-STUB
  ...

You can see that it calls a library routine to do the addition. This code does all the Interpreter would do. But it does not need to interpret Lisp source code. It's already compiled to the corresponding machine instructions.

Why is compiled Lisp code fast(er)?

So, why is compiled Lisp code fast? Two situations:

  • unoptimized Lisp code: the Lisp runtime system is optimized for dynamic data structures and code does not need to be interpreted

  • optimized Lisp code: the Lisp compiler needs information or infers it and does a lot of work to emit optimized machine code.

As a Lisp programmer you would want to work with unoptimized, but compiled, Lisp code most of the time. It is fast enough and offers a lot comfort.

Different execution modes offer choice

As a Lisp programmer we have the choice:

  • interpreted code: slow, but easiest to debug
  • compiled code: fast at runtime, fast compilation, lots of compiler checks, slightly more difficult to debug, fully dynamic
  • optimized code: very fast at runtime, possibly unsafe at runtime, lots of compilation noise about various optimizations, slow compilation

Typically we only optimize those portions of code which need the speed.

Keep in mind that there are a lot of situations where even a good Lisp compiler can't do wonders. A fully generic object-oriented program (using the Common Lisp Object System) will almost always have some overhead (dispatching based on runtime classes, ...).

Dynamically typed and Dynamic are not the same

Note also that dynamically typed and dynamic are different properties of a programming language:

  • Lisp is dynamically typed because type checks are done at runtime and variables by default can be set to all kinds of objects. For this Lisp also needs types attached to the data objects themselves.

  • Lisp is dynamic because both the programming language Lisp and the program itself can be changed at runtime: we can add, change and remove functions, we can add, change or remove syntactic constructs, we can add, change or remove data types (records, classes, ...), we can change the surface syntax of Lisp in various ways, etc. It helps that Lisp is also dynamically typed to provide some of these features.

User interface: compiling and disassembling

ANSI Common Lisp provides

  • two standard functions to compile code: compile and compile file
  • one standard function to load source or compiled code: load
  • one standard function to disassemble code: disassemble
Handley answered 21/8, 2013 at 12:50 Comment(3)
TL;DR: Paragraph #4 is the essence.Memoried
'answered 18mins ago' ? Damn, I'm a slow typist :) Very nice explanation !Mikkel
Since I didn't see links in the answer note that in Common Lisp, you can compile code with compile and investigate the result with disassemble.Lewis
M
7

Compilation is a simple translation from one language to another. If you can express the same thing in language A and language B, you can compile this thing expressed in language A into the same thing in language B.

Once you have expressed your intent in some language, it is executed by being interpreted. Even when doing C, or some other compiled language, your statement is:

  1. Translated from C -> Assembly language
  2. Translated from Assembly -> machine code
  3. Interpreted by the machine.

A computer is actually an interpreter for a very basic language. Since it is so basic and so hard to work with, people came up with other languages which are easier to work with, and can be easily translated into equivalent statements in machine code (e.g. C). Then, you can hijack the compilation phase by performing the translation 'on-the-fly' as JIT compiler do, or by writing your own interpreter which executes directly statement in your high-level language (e.g. LISP or Python).

But note that interpreter are just a shortcut to execute your code directly ! If instead of executing the code, the interpreter printed whatever call it would be making, would it to execute the code, you would have... a compiler. Of course, that would be a very stupid compiler, and it would not make use of most of the information it has.

Actual compilers will try to gather as much information as they can from the whole program before generating the code. For instance, the following code:

const bool dowork = false;

int main() {
    if (dowork) {
        //... lots of code go there ... 
    }
    return 0;
}

Will in theory generate all the code inside the if branch. But a clever compiler will probably consider it unreachable and just ignore it, making use of the fact that it knows everything in the program and knows that dowork will always be false.

In addition to that, some language have types, which can help to dispatch function call, ensure some things at compile-time and help for the translation to machine code. Some languages like C require the programmer to declare the type of their variables. Others like LISP and Python just infer the type of the variable when it is set, and panic at runtime if you try to use a value of a certain type if another type is required (e.g. if you write (car 2) in most lisp interpreters, it will raise some error telling you a pair is expected). Types can be used to allocate the memory at compile time (e.g. a C compiler will allocate exactly 10 * sizeof(int) bytes of memory if it is required to allocate a int[10]), but this is not exactly required. In fact, most C programs use pointers to store arrays, which are basically dynamic. When dealing with a pointer, a compiler will generate/link to code which, at runtime, will perform the necessary checks, reallocations, etc. But the bottom line is that dynamic and compiled are not to be opposed. The Python or Lisp interpreters are compiled programs, but still can act on dynamic values. In fact, assembly language itself is not really typed, as the computer can perform any operation on any object, since all it 'sees' are streams of bits, and operations on bits. Higher level languages introduce arbitrary types and limits to make things more readable and prevent you from doing completely crazy things. But this is just to help you, not an absolute requirement.

Now that the philosophical rant is over, let's look at your example:

(setq x 60)
(setq y 40)
(+ x y)

And let's try to compile that to a valid C program. Once that is done, C compilers abound, so we can translate LISP -> C -> machine language, or pretty much anything else. Keep in mind that compilation is only translation (optimisations are cool too, but optional).

(setq 

This allocate a value. But we don't know what is allocated to what. Let's continue

(setq x 60)

Ok, we're allocating 60 to x. 60 is an integer literal, so its C type is int. Since there is no reason to assume x is of another type, this is equivalent to the C:

int x = 60;

Similarly for (setq y 40):

int y = 40;

Now we have:

(+ x y)

+ is a function which, depending on implementations, can take several types of arguments, but we know that x and y are integers. Our compilers knows that there exist an equivalent C statement, which is:

x + y;

So we just translate it. Our final C program:

int x = 60;
int y = 40;
x + y;

Which is a perfectly valid C program. It can get more tricky than this. For instance, if x and y are very big, most LISP won't let them overflow while C will, so you might code your compiler to have its own integer type as array of ints (or whatever you find relevant). If you are able to define common operations (like +) on these types, your new compiler will maybe translate the previous code into this instead:

int* x = newbigint("60");
int* y = newbigint("40");
addbigints(x, y);

With your functions newbigint and addbigints defined elsewhere, or generated by the compiler. It will still be valid C, so it will compile. In fact, your own interpreter is probably implemented in some lower-level language and already has representations for LISP objects in its own implementation, so it can use these directly.

By the way, that is exactly what the Cython compiler does for Python code :)

You can define types statically in Cython to get some extra speed/optimisations, but it is not required. Cython can translate your Python code directly into C, and then into machine code.

I hope that it makes it clearer ! Remember:

  1. ALL code is interpreted, eventually
  2. Compilers translate code into something which is easier/quicker to interpret. They often perform optimisations along the way, but this is not part of the definition
Mikkel answered 21/8, 2013 at 13:9 Comment(1)
(Note: Most C'ish languages, including C++ and C# (the ones that i know of that define both bool and const), reserve do as a keyword. The int main() example probably won't compile. Doesn't make the point any less valid, though.)Knitter

© 2022 - 2024 — McMap. All rights reserved.