Dynamic vs Static Compiler (JavaScript)

Asked 24/8, 2011 at 18:55 Answered 24/8, 2011 at 19:9

javascript static-analysis dynamic-languages compiler-construction

I'm currently writing a JavaScript compiler in ANTLR+Java.

I've read questions here on Stack Overflow on how to proceed with the execution - and the answer is always that it would be way too hard to do a static compilation (without JIT-information) of a dynamic language - but why is that exactly? There are of course the obvious "type resolving" problem and in JavaScript maybe a problem with the eval function - but are there other reasons? (because they don't seem too hard to overcome pure statically (no JITS))

I'm excluding JIT-based compilation because I figure it would be too hard for me to implement.

I have some experience in writing static compilers with a byte-code execution.

UPDATE:

All your answers are really helpfull understanding the problem. To clarify does this mean that JavaScript is harder to implement than other dynamic languages?

And does this also means that im better of using a Tree-based interpreter than e.g. Byte-code (if we forget about the property that JS always is shipped in raw source code - hence adding extra time for generating and IR and afterwards execute it)? - or should they be about equally easy / hard to do?

(Im new to SOF; dont know if this is the preferred way to update a question?)

Granulocyte answered 24/8, 2011 at 18:55 Comment(0)

There are lots of ways this conversation could go. Here's one direction. In javascript, nearly everything is an object and properties or methods can be added to any object at run-time. As such, you don't know at compile time what methods or properties will or won't be attached to an object. As such, everything has to be looked up at run-time.

For example:

var myObj = {};

function configureObject() {
    if (something in the environment) {
        myObj.myfunc = function () {alert("Hi");}
    } else {
        myObj.myfunc = function () {document.write("Hello");}
    }
}

Now, sometime later in the code you call myObj.myfunc(); It is not known at compile time what myfunc is or whether it's even an attribute of myObj. It has to be a run-time lookup.

In another example, take this line of code:

var c = a + b;

What his means depends entirely upon the types of a and b and those types are not known at compile time.

If a and b are both numbers, then this is an addition statement and c will be a number.

If either a or b is a string, then the other will be coerced to a string and c will be a string.

You can't precompile this kind of logic into native code. The execution environment has to record that this is a request for the addition operator between these two operands and it has to (at runtime) examine the types of the two operands and decide what to do.

Astral answered 24/8, 2011 at 19:2 Comment(1)

+1 This. Screw turing machines, there are enough problems with real-world code. The best thing you can hope to compile this into (statically, that is) is an unrolled version of what an interpreter would do anyway. – Frentz 25/8, 2011 at 7:10

The challenge with writing a static JavaScript compiler is that it is in general undecidably hard to determine what objects are being referenced at any program point or what functions are being called. I could use the fact that JavaScript is dynamic to decide which function to call based on the output of some Turing machine. For example:

var functionName = RunTuringMachineAndReportOutputOnTape(myTM, myInput);
eval(functionName + "();");

At this point, unless you have advance knowledge about what myTM and myInput are, it is provably impossible to decide what function will be invoked by the call to eval, since it's undecidable to determine what is on a Turing machine's tape if it halts (you can reduce the halting problem to this problem). Consequently, no matter how clever you are, and no matter how good of a static analyzer you build, you will never be able to correctly statically resolve all function calls. You can't even bound the set of functions that might be called here, since the Turing machine's output might define some function that is then executed by the above code.

What you can do is compile code that, whenever a function is called, includes extra logic to resolve the call, and possibly uses techniques like inline caching to speed things up. Additionally, in some cases you might be able to prove that a certain function is being called (or that one of a small number of functions will be called) and can then hardcode in those calls. You could also compile multiple versions of a piece of code, one for each common type (object, numeric, etc.), then emit code to jump to the appropriate compiled trace based on the dynamic type.

Opus answered 24/8, 2011 at 19:3 Comment(3)

There are any number of languages that are statically compiled but that cannot statically resolve all function call sites. In C, castable function pointers have this effect. All compilers for widely used languages solve this problem by punting some subset of call-site resolutions until runtime using a variety of mechanisms including simple pointer dereference, virtual tables, dynamic code loading, hashtable lookup, among others. There is nothing special about JavaScript that prevents this as evidenced by the fact that it has been done. – Argo 24/8, 2011 at 19:55

@Mike Samuel- Ah, I see what you're saying. I think I was interpreting the OP's question to say that he could statically resolve function calls and accesses, which clearly doesn't work. However, I think that my answer addresses this - I describe in the second half of the answer potential ways that you can compile code when you don't know what you're calling. Is this insufficient? – Opus 24/8, 2011 at 19:56

Fair enough. Only the OP can clarify their question. – Argo 24/8, 2011 at 19:57

V8 does that. See Compile JavaScript to Native Code with V8

With EcmaScript 3 and 5 non-strict there are a number of wrinkles around scopes which you don't run into in other dynamic languages. You might think that it is easy to do compiler optimizations on local variables, but there are edge cases in the language when it is not, even ignoring eval's scope introspection.

Consider

function f(o, x, y) {
  with (o) { return x + y + z; }
}

when called with

o = {};
o = { z: 3 };
o = { x: 1, z: 2 };
Object.prototype.z = 3, o = {};

and according to EcmaScript 3,

x = (function () { return toString(); })()

should produce quite a different result from

x = toString();

because EcmaScript 3 defines an activation record as an object with a prototype chain.

Argo answered 24/8, 2011 at 19:9 Comment(0)

Recommended topics

Hot tags