Can Perl be "statically" parsed?
Asked Answered
B

5

9

An article called "Perl cannot be parsed, a formal proof" is doing the rounds. So, does Perl decide the meaning of its parsed code at "run-time" or "compile-time"?

In some discussions I've read, I get the impression the arguments stem from imprecise terminology, so please try to define your technical terms in your answer. I have deliberately not defined "run-time", "statically" or "parsed" so that I can get perspectives from people who perhaps define those terms differently to me.

Edit:

This isn't about static analysis. Its a theoretical question about Perl's behaviour.

Blanchette answered 14/8, 2009 at 23:1 Comment(6)
modernperlbooks.com/mt/2009/08/on-parsing-perl-5.htmlBucko
And now, also modernperlbooks.com/mt/2009/08/how-a-perl-5-program-works.htmlOverstreet
Robert P: "Perl 5's execution model [is] definitely not the same as the traditional notion of an interpreter. ". Then he goes on to describe a traditional interpreter...Blanchette
Also from Hacker news: news.ycombinator.com/item?id=770072Melchior
@ Paul Biggar: Part of it is similar to a traditional interpreter. The part where it breaks into execution before it finishes interpreting the rest of the code is not.Overstreet
@Robert P: That has nothing to do with interpreting, its just a language feature. You may as well say that its not a traditional interpreter because it uses $_ (or pick a feature).Blanchette
A
19

Perl has a well-defined "compile time" phase, which is followed by a well-defined "runtime" phase. However, there are ways of transitioning from one to the other. Many dynamic languages have eval constructs that allow compilation of new code during the runtime phase; in Perl the inverse is possible as well -- and common. BEGIN blocks (and the implicit BEGIN block caused by use) invoke a temporary runtime phase during compile-time. A BEGIN block is executed as soon as it's compiled, instead of waiting for the rest of the compilation unit (i.e. current file or current eval) to compile. Since BEGINs run before the code that follows them is compiled, they can influence the compilation of the following code in practically any way (although in practice the main things they do are to import or define subroutines, or to enable strictness or warnings).

A use Foo; is basically equivalent to BEGIN { require foo; foo->import(); }, with require being (like eval STRING) one of the ways to invoke compile-time from runtime, meaning that we're now within compile-time within runtime within compile-time and the whole thing is recursive.

Anyway, what it boils down to for the decidability of parsing Perl is that since the compilation of one bit of code can be influenced by the execution of a preceding piece of code (which can in theory do anything), we've got ourselves a halting-problem type situation; the only way to correctly parse a given Perl file in general is by executing it.

Alysonalysoun answered 14/8, 2009 at 23:31 Comment(1)
More often the compilation of one bit of code can be influenced by the compilation of a preceding piece of code - notably whether an identifier is the name of a package or sub.Coot
J
11

Perl has BEGIN blocks, which runs user Perl code at compile-time. This code can affect the meaning of other code to be compiled, thus making it "impossible" to parse Perl.

For example, the code:

sub foo { return "OH HAI" }

is "really":

BEGIN {
    *{"${package}::foo"} = sub { return "OH HAI" };
}

That means that someone could write Perl like:

BEGIN {
    print "Hi user, type the code for foo: ";
    my $code = <>;
    *{"${package}::foo"} = eval $code;
}

Obviously, no static analysis tool can guess what code the user is going to type in here. (And if the user says sub ($) {} instead of sub {}, it will even affect how calls to foo are interpreted throughout the rest of the program, potentially throwing off the parsing.)

The good news is that the impossible cases are very corner-casey; technically possible, but almost certainly useless in real code. So if you are writing a static analysis tool, this will probably cause you no trouble.

To be fair, every language worth its salt has this problem, or something similar. As an example, throw your favorite code walker at this Lisp code:

(iter (for i from 1 to 10) (collect i))

You probably can't predict that this is a loop that produces a list, because the iter macro is opaque and would require special knowledge to understand. The reality is that this is annoying in theory (I can't understand my code without running it, or at least running the iter macro, which may not ever stop running with this input), but very useful in practice (iteration is easy for the programmer to write and the future programmer to read).

Finally, a lot of people think that Perl lacks static analysis and refactoring tools, like Java has, because of the relative difficulty in parsing it. I doubt this is true, I just think the need is not there and nobody has bothered to write it. (People do need a "lint", so there is Perl::Critic, for example.)

Any static analysis I have needed to do of Perl to generate code (some emacs macros for maintaining test counters and Makefile.PL) has worked fine. Could weird corner cases throw off my code? Of course, but I don't go out of my way to write code that's impossible to maintain, even though I could.

Joyejoyful answered 14/8, 2009 at 23:18 Comment(6)
So why do you use the terms "run Perl code at compile-time" rather than "compile Perl code at run-time". Whats the distinction? That's why I asked about terminology.Blanchette
So its just the terminology from the Perl community? We would be just as correct to say that the second "compile" occurs during the execution of the BEGIN block, as to say that the first execution occurs during the compile phase of the main code?Blanchette
Yes, though the end of the initial compilation phase is special.Coot
It's not just terminology. Although Perl might run some code in the Compile phase, and maybe compile some code in the Run phase, each of those also have hooks to run at the beginning and end of the Phases. Athough they are a little fuzzy on the inside, they have boundaries where other things happen.Penza
@brian d foy: Yes, but the names given to those phases by the Perl community do not uniquely reflect what the phases do, and the names chosen for those phases would have been just as accurate the other way.Blanchette
@Paul, no, the names reflect the Big Task of each of those phases. The names are purposeful, descriptive, and accurate.Penza
P
5

People have used a lot of words to explain various phases, but it's really a simple matter. While compiling Perl source, the perl intrepreter may end up running code that changes how the rest of the code will parse. Static analysis, which runs no code, will miss this.

In that Perlmonks post, Jeffrey talks about his articles in The Perl Review that go into much more detail, including a sample program that doesn't parse the same way every time you run it.

Penza answered 15/8, 2009 at 17:20 Comment(0)
V
3

C++ has a similar problem in its template system, but that doesn't stop compilers from compiling it. They will just break out or run forever on the corner cases where this sort of argument would apply.

Violinist answered 14/8, 2009 at 23:27 Comment(4)
Yes, well-said. Same idea as my post, and many fewer words :)Joyejoyful
It's not really similar - for C++ templates, all values involved are also compile-time expressions, and they are clearly distinct from runtime expressions. In Perl, with the example given to the linked article, the function could be defined differently depending on e.g. a string user inputs, so the rest of the program would be passed differently from the moment of input onwards. There's nothing even remotely similar in C++.Stroman
@Pavel You can create (almost) the exact analogue of the example in the article in C++ using templates and the declaration/initialization ambiguity. The fact that Perl can punt this to runtime whereas a C++ compiler needs to resolve it at compile time is irrelevant.Violinist
@Segfault: Static analysis happens before runtime.Jylland
O
3

Perl has a compile phase, but it's different than most normal compile phases when it comes to code. Perl's lexer turns the code into tokens, then a parser analyzes tokens to form an op tree. However, BEGIN {} blocks can interrupt this process and allow you to execute code. When doing a use. All BEGIN blocks execute before anything else, giving you a way to set up modules and namespaces. During the overall "compile" of a script, you most likely will use Perl to determine how the Perl module should look when it's done. sub, bare, implies adding it to the glob for the package, but you don't have to. For example, this is a (albeit, odd) way of setting up methods in a module:

package Foo;

use strict;
use warnings;
use List::Util qw/shuffle/;

my @names = qw(foo bar baz bill barn);
my @subs = (
    sub { print "baz!" },
    sub { die; },
    sub { return sub { die } },
);
@names = shuffle @names;
foreach my $index (0..$#subs) {
   no strict 'refs';
   *{$names[$index]} = $subs[$index];
}

1;

You have to interpret this to even know what it does! It's not very useful, but it's not something you can determine ahead of time. But it's 100% valid perl. Even though this feature can be abused, it can also do great tasks, like build complicated subs that all look very similar, programatically. It also makes it hard to know, for certain, what everything does.

That's not to say that a perl script can't be 'compiled' - in perl, compiling is merely determining, what right then, the module should look like. You can do that with a

perl -c myscript.pl

and it will tell you whether or not it can get to the point where it will start executing the main module. You just can't merely know from looking at it 'statically'.

However, as PPI demonstrates, we can get close. Really close. Close enough to do very interesting things, like (almost static) code analysis.

"Run Time", then, becomes what happens after all the BEGIN blocks have executed. (This is a simplification; there is a lot more to this. See perlmod for more.) It's still perl code being run, but it's a separate phase of execution, done after all the higher priority blocks have run.

chromatic has some detailed posts on his Modern::Perl blog:

Overstreet answered 14/8, 2009 at 23:28 Comment(4)
Presumably, you can make the BEGIN block check something on the file system or network, resulting in two parses of an identical program with two different meanings?Blanchette
Absolutely. I've seen (perhaps, misguided) perl developers use the BEGIN block to parse arguments on the command line, and then change the variables that are available for execution based on that. In fact, you don't even have to do it during the 'compile' phase; the above code could be executed multiple times. You could have it in a function, and even change the behavior of the module after it's been "compiled". The malability of perl code is something that's really only rivaled by other dynamic languages; LISP-like languages come to mind as a prime example.Overstreet
OK, but that's just playing with the symbol-table. But a line can't change meaning during the program (in the sense of the cited article), could it?Blanchette
@Paul Biggar: The article isn't talking about the parse-tree for a bit of code changing during runtime, it's talking about the inability (in the general case) to determine the parse tree for that code without executing anything.Haggadah

© 2022 - 2024 — McMap. All rights reserved.