Why are Perl source filters bad and when is it OK to use them?
L

7

26

It is "common knowledge" that source filters are bad and should not be used in production code.

When answering a similar but more specific question, I couldn't find any good references that clearly explain why filters are bad and when they can be safely used. I think now is the time to create one.

  1. Why are source filters bad?
  2. When is it OK to use a source filter?
Leeds answered 23/11, 2009 at 20:50 Comment(3)
Thanks, everyone. This is now the third result on a google search for "Perl source filters".Leeds
You have fulfilled part of Joel and Jeff's dream :)Diba
As long as this site is speedy, usable and useful, and the ads don't get in the way, I'll be happy to use it and contribute. If that lines J&J's pockets, so be it. But the main thing is that I can help build the Perl community. I don't have huge amounts of solid time to produce modules or contribute to the core. But I can answer questions. I do so here and at Perlmonks.Leeds
N
19

Only perl can parse Perl (see this example):

@result = (dothis $foo, $bar);

# Which of the following is it equivalent to?
@result = (dothis($foo), $bar);
@result = dothis($foo, $bar);

This kind of ambiguity makes it very hard to write source filters that always succeed and do the right thing. When things go wrong, debugging is awkward.
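To make the ambiguity concrete, here is a minimal sketch of how the same call shape parses differently depending on what perl already knows about the sub. The names one_arg and any_args are hypothetical, chosen for illustration; they are not from the PPI example.

```perl
use strict;
use warnings;

# Hypothetical subs: both just report how many arguments they received.
sub one_arg ($) { return scalar @_ }   # ($) prototype: acts like a unary operator
sub any_args    { return scalar @_ }   # no prototype: acts like a list operator

my ($foo, $bar) = (1, 2);

my @a = (one_arg $foo, $bar);    # parsed as (one_arg($foo), $bar)
my @b = (any_args $foo, $bar);   # parsed as (any_args($foo, $bar))

print scalar(@a), "\n";   # 2 elements: the return value 1, then $bar
print scalar(@b), "\n";   # 1 element: the return value 2
```

A source filter looking only at the text of the call has no way to tell these two cases apart unless it also tracks every declaration that is in scope.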

After crashing and burning a few times, I have developed the superstitious approach of never trying to write another source filter.

I do occasionally use Smart::Comments for debugging, though. When I do, I load the module on the command line:

$ perl -MSmart::Comments test.pl

so as to avoid any chance that it might remain enabled in production code.
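For illustration, a hypothetical test.pl might look like the sketch below (the variable names are made up). Run as plain perl test.pl, the line starting with ### is an ordinary comment and does nothing. Run with -MSmart::Comments, the module should treat that same line as a debugging statement and report the value of $total.

```perl
use strict;
use warnings;

my @prices = (3, 4, 5);
my $total  = 0;
$total += $_ for @prices;

# The next line is a plain comment unless Smart::Comments is loaded.
### $total

print "Total: $total\n";
```

Because the script itself never mentions Smart::Comments, there is nothing to forget to remove before deploying it.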

See also: Perl Cannot Be Parsed: A Formal Proof

Nephron answered 23/11, 2009 at 20:50 Comment(2)
According to $ perl -MO=Deparse -e '@result = (dothis $foo, $bar)' it parses as @result = ($foo->dothis, $bar); Talk about ambiguity. If we predeclare sub dothis with no prototype or a prototype of ($$) or (@) it parses as @result = dothis($foo, $bar). It only parses as @result = (dothis($foo), $bar) if we declare it with a prototype of ($).Catenate
@Chris Lutz: Yup, I remember doing the same thing when I first saw that snippet in the PPI docs. It is a very clever example.Enlarge
G
23

Why source filters are bad:

  1. Nothing but perl can parse Perl. (Source filters are fragile.)
  2. When a source filter breaks, pretty much anything can happen. (They can introduce subtle and very hard-to-find bugs.)
  3. Source filters can break tools that work with source code. (PPI, refactoring, static analysis, etc.)
  4. Source filters are mutually exclusive. (You can't use more than one at a time -- unless you're psychotic.)
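A minimal sketch of the fragility in point 1. It does not install a real source filter; it only simulates what one does, which is to receive the program text as a plain string and rewrite it. The bareword yes and the function naive_filter are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical filter logic: replace the made-up bareword `yes` with 1.
# A real source filter is handed the source as text in much the same way.
sub naive_filter {
    my ($source) = @_;
    $source =~ s/\byes\b/1/g;   # no idea what is code and what is a string
    return $source;
}

# The program the filter is meant to transform, kept as literal text.
my $source = <<'END';
my $ok = yes;
print "Please answer yes or no\n";
END

print naive_filter($source);
```

The assignment is rewritten as intended, but so is the word inside the string literal, so the prompt now reads "Please answer 1 or no". Telling those two cases apart reliably means parsing Perl, which is the problem in point 1.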

When they're okay:

  1. You're experimenting.
  2. You're writing throw-away code.
  3. Your name is Damian and you must be allowed to program in Latin.
  4. You're programming in Perl 6.
Gabrielgabriela answered 23/11, 2009 at 20:50 Comment(0)
B
10

I don't like source filters because you can't tell what code is going to do just by reading it. Additionally, things that look like they aren't executable, such as comments, might magically be executable with the filter. You (or more likely your coworkers) could delete what you think isn't important and break things.

Having said that, if you are implementing your own little language that you want to turn into Perl, source filters might be the right tool. However, just don't call it Perl. :)

Bearwood answered 23/11, 2009 at 20:50 Comment(5)
In that case, can we implement Perl 6 as a source filter? ;-)P
Take a look at some of the Perl6::* modules on CPAN; a few of them are source filters :-)Carolanncarole
Perl 6 is explicitly designed to have user-extensible syntax; one of its mottos is "All's fair if you predeclare"Extrauterine
You guys missed the bit about "just don't call it Perl" :)P
Case in point: I just found that a Test::Base::Filter wasn't working as expected because Spiffy only filtered methods that had a space between the name and the curly brace. I love Test::Base, but that was a tough bug to find!Alec
E
6

It's worth mentioning that Devel::Declare keywords (and, starting with Perl 5.11.2, pluggable keywords) aren't source filters and don't run afoul of the "only perl can parse Perl" problem. They are run by the perl parser itself: they take what they need from the input and then return control to the very same parser.

For example, when you declare a method in MooseX::Declare like this:

method frob ($bubble, $bobble does coerce) {
  ... # complicated code
}

The word "method" invokes the method keyword parser, which uses its own grammar to get the method name and parse the method signature (which isn't Perl, but it doesn't need to be -- it just needs to be well-defined). Then it leaves perl to parse the method body as the body of a sub. Anything anywhere in your code that isn't between the word "method" and the end of a method signature doesn't get seen by the method parser at all, so it can't break your code, no matter how tricky you get.

Escrow answered 23/11, 2009 at 20:50 Comment(0)
F
3

The problem I see is the same problem you encounter with any C/C++ macro more complex than defining a constant: It degrades your ability to understand what the code is doing by looking at it, because you're not looking at the code that actually executes.

Fomentation answered 23/11, 2009 at 20:50 Comment(7)
What about the macro #define ARRAY_SIZE(x) (sizeof(x)/sizeof((x)[0]))? Does that degrade your ability to understand what the code is doing just by looking at it?Catenate
@Chris: in that case, I would far rather you simply define an inline function than a macro.Congius
@Ether: sizeof won't work as an inline function. Chris's macro has to be a macro.Countershaft
To expand on @Kinopiko's point, if you define ARRAY_SIZE as an inline function, the array argument x will decay to a pointer and the trick in @Chris Lutz's comment will not work.Enlarge
The problem with the macro is that you can call it on anything that has a size and that supports bracket operators. That includes pointers, vectors, and maps, all of which are inappropriate for such a macro. A real function works great: template <typename T, std::size_t N> inline std::size_t size(T(&)[N]) { return N ; } You can't call that function on a pointer; its argument must be an array.Warrantor
@Chris That's a function which you happen to write as a macro for esoteric reasons, which sidesteps the point. But forget those outer parens and you're in a world of hurt, underscoring the danger involved in injecting code. Macros are less dangerous than source filters because they get inserted into the code by the compiler at the points where the caller uses them. Source filters just rewrite all the code. To best see Brad's point, look at the Perl 5 source code some time. It's more C macros than C.Averse
@Rob - That's a good solution for C++. Some of us still use C, and in the case of C the macro is by far sufficient. It'll break for pointers, but when you're writing C, you should know that already, and it shouldn't be a problem.Catenate
C
2

In theory, a source filter is no more dangerous than any other module, since you could easily write a module that redefines builtins or other constructs in "unexpected" ways. In practice, however, it is quite hard to write a source filter in a way where you can prove that it's not going to make a mistake. I tried my hand at writing a source filter that implements the Perl 6 feed operators in Perl 5 (Perl6::Feeds on CPAN). You can take a look at the regular expressions to see the acrobatics required to simply figure out the boundaries of expression scope. While the filter works, and provides a test bed to experiment with feeds, I wouldn't consider using it in a production environment without many, many more hours of testing.
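To make the "any module can redefine builtins" point concrete, here is a minimal sketch, not taken from any real module, that overrides time globally through CORE::GLOBAL. The fixed value 42 is arbitrary.

```perl
use strict;
use warnings;

BEGIN {
    no warnings 'once';
    # Override the builtin for all code compiled after this point.
    # A module could do the same thing from its own BEGIN block or import.
    *CORE::GLOBAL::time = sub { return 42 };
}

my $now = time;   # calls the override, not the real builtin
print "time is now $now\n";
```

The syntax of the calling code is still ordinary Perl here; only the behaviour behind a known name has changed.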

Filter::Simple certainly comes in handy by dealing with 'the gory details of parsing quoted constructs', so I would be wary of any source filter that doesn't start there.

In all, it really depends on the filter you are using and how broad a scope it tries to match against. If it is something simple like a C macro, then it's "probably" OK, but if it's something complicated then it's a judgement call. I personally can't wait to play around with Perl 6's macro system. Finally Lisp won't have anything on Perl :-)

Carolanncarole answered 23/11, 2009 at 20:50 Comment(4)
This simply isn't true. In theory a source filter is infinitely more dangerous. Firstly, not all internal CORE functions can be redefined in perl, and given that parsing perl requires perl (for the aforementioned reasons of prototyping and indirect object notation), it simply isn't fair to say "no more dangerous." A source filter is by its very design totally and unavoidably dependent on assumptions, whereas code isn't. Additionally, there is a mechanism to warn you or error during compilation if it can be detected that there is a problem, such as that of code-composition.Parapet
@EvanCarroll My point was that any module can manipulate the caller's space in potentially unexpected or dangerous ways, so you should always be cautious and prefer well-tested modules. I then go on to explain how it is much harder to ensure that a source filter will be safe. You might have seen that if you had read more than the first sentence of my post.Carolanncarole
A module author has to go out of their way to do something really wacky. You choose what's going to affect your caller; everything else is contained. Thus "modular". For a source filter, wackiness is the default. A filter touches every line of code in the caller, so you have to be real careful to only affect the ones you mean. Even the simplest source filter contains danger, whereas simple modules do not.Averse
A module might redefine builtins, but the syntax is still Perl. Source filters transfer a potentially non-Perl syntax into Perl syntax.Bearwood
B
1

There is a nice example here that shows the kind of trouble you can get into with source filters: http://shadow.cat/blog/matt-s-trout/show-us-the-whole-code/

They used a module called Switch, which is based on source filters. And because of that, they were unable to find the source of an error message for days.

Berl answered 23/11, 2009 at 20:50 Comment(0)
