Is it possible to use Perl's Marpa parser for a public network server?

Asked 10/11, 2015 at 12:46 Answered 10/11, 2015 at 13:56

The documentation of Perl's Marpa parser contains the following section about tainted data:

Marpa::R2 exists to allow its input to alter execution in flexible and powerful ways. Marpa should not be used with untrusted input. In Perl's taint mode, it is a fatal error to use Marpa's SLIF interface with a tainted grammar, a tainted input string, or tainted token values.

I am not sure I understand the consequences of this limitation. I understand that the grammar must not be tainted, but I do not understand why the input must not be tainted. To me, validating the input is the task of the parser. It sounds unreasonable to me that a parser has to trust its input.

Is it really that way? Is it impossible to implement any kind of public network service with Marpa?

I ask this because one of the reference use cases is the Marpa HTML parser, and it seems contradictory to me to use a parser for HTML that must not be used with tainted data, although about 99.99% of all HTML is possibly tainted.

Can anybody explain this contradiction?

Incurious answered 10/11, 2015 at 12:46 Comment(3)

There are ways to untaint data in Perl. – Fadiman 10/11, 2015 at 12:54

@Fadiman Yes, by parsing it. – Incurious 10/11, 2015 at 12:56

See "Laundering and Detecting Tainted Data" in perlsec. You can, for example, at least check the length of the input. – Fadiman 10/11, 2015 at 12:57

Marpa is actually safer than other parsers, because the language it parses is exactly that specified by the BNF. With regexes, PEG, etc., it's very hard to determine what language is actually parsed. In practice programmers tend to get a few test cases working and then give up.

In particular, the parsing of unwanted inputs could be a major security issue -- with traditional parsers you usually don't know everything you are letting through. Rarely does a test suite check to see if inputs which should be errors are in fact accepted. Marpa parses exactly the language in its specification -- nothing less and nothing more.

So why the scare language about taint mode? Marpa, in its most general case, can be seen as a programming language, and has exactly the same security issues. Allowing the user to execute arbitrary code is by definition insecure, and it is exactly what C, Perl, Marpa, etc. do by design. You cannot give an untrusted user a general language interface. This would be clear for C, Python, etc., but I thought someone might overlook it in the case of Marpa. Hence the scare language.

Marpa is IMHO more secure than competing technologies. However, in the most general case, that is not secure enough.
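To illustrate the point about parsing exactly the specified language, here is a minimal sketch, assuming Marpa::R2 is installed. The grammar, the `My_Actions` package, and the `add_expression` helper are illustrative assumptions and not part of the original answer. The grammar is fixed by the server, so clients only ever supply input for it. Under taint mode the input string would first have to be untainted (for example with a length and character check), since the SLIF refuses tainted input.

```perl
use strict;
use warnings;
use Marpa::R2;

# Illustrative grammar (an assumption, not from the post):
# exactly two non-negative integers joined by '+'.
my $dsl = <<'END_OF_DSL';
:default ::= action => ::first
lexeme default = latm => 1
Sum ::= Number '+' Number action => do_add
Number ~ [0-9]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

# The grammar is built once at startup; clients never supply or alter it.
my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Semantic action for the Sum rule: receives the per-parse object,
# then one argument per right-hand-side symbol.
sub My_Actions::do_add {
    my ( undef, $left, undef, $right ) = @_;
    return $left + $right;
}

# Hypothetical helper: returns the sum, or undef if the input is
# not in the language defined by the grammar.
sub add_expression {
    my ($input) = @_;
    my $value_ref = eval { $grammar->parse( \$input, 'My_Actions' ) };
    return defined $value_ref ? ${$value_ref} : undef;
}

for my $input ( '2 + 3', '2 + 3; system("id")' ) {
    my $sum = add_expression($input);
    print defined $sum ? "$input => $sum\n" : "rejected: $input\n";
}
```

The helper treats both an exception and an undefined result from `parse` as a rejection, so anything outside the grammar is refused rather than partially accepted.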

Kristlekristo answered 10/11, 2015 at 13:56 Comment(6)

Example: I use Marpa to implement a program that adds integers. The program listens on port 90, reads two numbers, and returns the sum. The intention of the program is to expose the addition, calculated on my server, to the world. Is this doable with Marpa without any security issues? Or is it possible to attack even the simplest parser generated by Marpa, because Marpa's parsing strategy itself is attackable? – Incurious 10/11, 2015 at 15:11

I take it you are not allowing the user to generate Marpa parsers, only allowing them access to a specific one. Yes, this can be made safe. And, as I say, it should be safer than other parsers where you do not really know what the language you are parsing actually is. – Kristlekristo 10/11, 2015 at 16:3

It might be useful if I explain the final sentence quoted: "In Perl's taint mode, it is a fatal error to use Marpa's SLIF interface with a tainted grammar, a tainted input string, or tainted token values." In choosing taint strategies, I went with being fascist. My reasoning is that if I was liberal in the taint strategy, I might be giving the user a false sense of security. And if I was fascist, I would force them to turn taint off, which would make them aware they needed to be careful. – Kristlekristo 10/11, 2015 at 22:0

Opposing Perl's taint mode does not help. It is not a promise to be careful that makes a program more secure; it is the certainty of not having forgotten any check. – Incurious 11/11, 2015 at 8:22

I didn't think I was opposing taint mode, as much as interpreting it. The only real alternatives to the approach I took that I can think of are 1) ignoring taintedness altogether; and 2) not allowing any use of Marpa when the taint flag is set. The approach I took tries to use the taintedness feature in a way that helps the user check security. I think it meets your criteria for making the program more secure -- it helps ensure the user has not forgotten any checks. – Kristlekristo 11/11, 2015 at 15:3

Stack Overflow discourages comment threads from turning into what it considers "discussions". Marpa has a Google group. More active is our IRC channel: #marpa on irc.freenode.net. A lot of Marpa-savvy folks hang out or lurk on the IRC channel, and I often find their input helpful. – Kristlekristo 11/11, 2015 at 15:10

Taint mode is an optional Perl setting that treats user input as untrusted. It stops you from using any "tainted" variables, such as those read directly from STDIN or %ENV, in certain functions, because doing so is dangerous.

The typical example is a code injection exploit, such as SQL injection.
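A minimal sketch of such an injection, assuming a hypothetical `students` table and a query built by plain string interpolation (neither is from the original post). No database is involved; the script only prints the statement that would be sent:

```perl
use strict;
use warnings;

# Hypothetical input, as it might arrive from a web form.
my $name = q{Robert'); DROP TABLE students;--};

# Building SQL by interpolation lets the input close the string
# literal and smuggle in a second statement.
my $sql = "INSERT INTO students (name) VALUES ('$name')";

print "$sql\n";
```

The printed statement contains a second command that the programmer never wrote, which is the kind of mistake taint checking is meant to make harder.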

That's all "taint mode" does - it enforces running a sanitisation prior to using untrusted input in a risky way.

Untainting is straightforward: apply a regular expression to your source data and keep only the captured part, so that any 'dangerous' metacharacters are excluded. (Note that perl doesn't actually know what is 'dangerous' and what isn't; it assumes you're not being an idiot and just 'matching' everything.)

This will error:

#!/usr/bin/perl -T
use strict;
use warnings;

my $tainted = $ENV{'USERNAME'};
system ( "echo $tainted" );

That's because I'm passing an untrusted variable through to "system", and it might contain an embedded code injection.

Insecure dependency in system while running with -T switch at

(It might also complain about insecure path)

So to untaint, I need to sanitise. A reasonable sanitisation would be - username must be only alphanumeric:

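#!/usr/bin/perl -T
use strict;
use warnings;

$ENV{'PATH'} = '/bin'; # an untainted value
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)}; # perlsec: these are checked as well

my $tainted = $ENV{'USERNAME'} // '';
# Captured groups are untainted; list context puts $1 into $untainted.
my ( $untainted ) = $tainted =~ m/(\w+)/;
die "no usable username\n" unless defined $untainted;
system ( "echo $untainted" ); # no error now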

And because I have used a regex - perl assumes I haven't done something boneheaded (like (.*)) and thus considers the data untainted.
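One caveat with the pattern above: `m/(\w+)/` is unanchored, so it extracts the first run of word characters and silently drops the rest instead of rejecting the input. A minimal sketch of a stricter variant, assuming a hypothetical helper `untaint_username` and an arbitrary 32-character limit (neither is from the original post), anchors the pattern so that the whole string has to match:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): returns the
# untainted username, or undef if the input is not acceptable.
sub untaint_username {
    my ($input) = @_;
    return unless defined $input;

    # \A and \z anchor the match to the whole string; the /a modifier
    # restricts \w to ASCII. The 32-character limit is an assumption.
    if ( $input =~ /\A(\w{1,32})\z/a ) {
        return $1;    # a captured group is untainted under default settings
    }
    return;           # reject instead of silently truncating
}

for my $candidate ( 'alice', 'alice; rm -rf /' ) {
    my $clean = untaint_username($candidate);
    print defined $clean ? "accepted: $clean\n" : "rejected: $candidate\n";
}
```

Rejecting outright is closer to what a parser does: the input either is in the expected language or it is not.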

Why is this important? Well, it depends what your parser does. It's not uncommon for parsers - by their nature - to get 'broken' by invalid input. See the above, for example - where escaping some inline SQL bypasses validation.

In your specific case:

taint mode is optional. You should use it when you're getting untrusted input (e.g. from potentially malicious users) but it's perhaps more trouble than it's worth for your own use.
Filtering HTML to validate length and character set is probably sensible. For example - checking it's an "ascii compatible character encoding".

Fundamentally though I think you're overthinking what taint checking is - it's not an exhaustive validation method - it's a safety net. All it does is ensure you've done some basic sanitisation before passing user input to an unsafe mechanism. That's to stop ridiculous gotchas like the one I outline - most of these can be caught by a simple regex.

If you're aware of the problem, and aren't concerned about malicious user input, then I don't think you need to be worried overly. A character whitelist will suffice, and then parse away.

Pierrepierrepont answered 10/11, 2015 at 13:5 Comment(5)

Nice article about Perl's concept of tainted data, but it does not answer my question at all. In the most basic case, untainting is done with regular expressions, which are Chomsky type-3 grammars. But sometimes this is not enough and we need a Chomsky type-2 grammar to verify the input. That is when we start using parser generators. The parser generated by YACC, Bison, ANTLR, or Marpa is the piece of software that has to do the verification when a regular expression is no longer enough. But Marpa says it is not allowed to do so. What is a parser generator good for, if not for untainting data? – Incurious 10/11, 2015 at 14:49

In perl untainting is only by regular expression sanitisation. Granted, you can parse and then use a 'null' regex to force the untaint flag. But I think you're over thinking what "taint" actually accomplishes - it's a safety net akin to strict and warnings. It tells you if you're using user input in a (possibly) unsafe way. But that's all. – Pierrepierrepont 10/11, 2015 at 15:9

The lexer of a parser does nothing more than evaluate input data against a set of regular expressions. – Incurious 10/11, 2015 at 15:16

Then you're already doing more than 'taint' checking would mandate. – Pierrepierrepont 10/11, 2015 at 15:17

No, I do exactly the same. I validate the input against a set of rules, and if the validation is successful, I accept the input as untainted. The only enhancement is that I put the regular expressions into some kind of context expressed by the grammar. But in the end it is a regular expression that decides which input data is accepted as untainted. – Incurious 10/11, 2015 at 15:24
