Short version
In the code below, $1 is tainted and I don't understand why.
Long version
I'm running Foswiki on a system with perl v5.14.2 and -T taint-check mode enabled.
While debugging a problem with that setup, I managed to construct the following SSCCE. (Note that I have edited this post: the first version was longer and more complicated, and some comments still refer to it.)
#!/usr/bin/perl -T
use strict;
use warnings;
use locale;
use Scalar::Util qw(tainted);
my $var = "foo.bar_baz";
$var =~ m/^(.*)[._](.*?)$/;
print(tainted($1) ? "tainted\n" : "untainted\n");
Although the input string $var is untainted and the regular expression is fixed, the resulting capture group $1 is tainted, which I find really strange.
The perlsec manual has this to say about taint and regular expressions:
Values may be untainted by using them as keys in a hash; otherwise the only way to bypass the tainting mechanism is by referencing subpatterns from a regular expression match. Perl presumes that if you reference a substring using $1, $2, etc., that you knew what you were doing when you wrote the pattern.
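The idiom perlsec describes can be sketched as follows. This is only an illustration, not Foswiki code: `untaint_filename` is a hypothetical helper, and the accepted character set is an assumption chosen for the example. The class is spelled out rather than written with `\w`, and the script does not `use locale`, so the perllocale rules quoted below should not come into play. Run it as `perl -T untaint.pl SOMENAME`; without `-T`, `tainted` reports every value as untainted.

```perl
#!/usr/bin/perl
# Sketch only; run as: perl -T untaint.pl SOMENAME
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: accept only plain file names made of ASCII letters,
# digits, '.', '_' and '-'. The class is spelled out instead of using \w.
sub untaint_filename {
    my ($name) = @_;
    return undef unless defined $name;
    # perlsec: a substring referenced through $1 is treated as untainted.
    return $name =~ m/^([A-Za-z0-9._-]+)$/ ? $1 : undef;
}

# Under -T, command-line arguments are tainted; the literal fallback is not.
my $arg = @ARGV ? $ARGV[0] : 'report_2023.txt';

# perlsec's other escape hatch: hash keys are never tainted.
my %seen = ($arg => 1);
my ($key) = keys %seen;

my $safe = untaint_filename($arg);

printf "argument: %s\n", tainted($arg) ? 'tainted' : 'untainted';
printf "hash key: %s\n", tainted($key) ? 'tainted' : 'untainted';
printf "capture:  %s\n",
    !defined $safe   ? 'rejected'
    : tainted($safe) ? 'tainted'
    :                  'untainted';
```

Passing an argument such as `../etc/passwd` should print `rejected` for the capture, since `/` is outside the accepted class.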
I would imagine that even if the input were tainted, the output would still be untainted. Observing the reverse, tainted output from untainted input, feels like a strange bug in perl. However, perlsec also points readers to the SECURITY section of perllocale, where we read:
when use locale is in effect, Perl uses the tainting mechanism (see perlsec) to mark string results that become locale-dependent, and which may be untrustworthy in consequence. Here is a summary of the tainting behavior of operators and functions that may be affected by the locale:
Comparison operators (lt, le, ge, gt and cmp) […]
Case-mapping interpolation (with \l, \L, \u or \U) […]
Matching operator (m//): Scalar true/false result never tainted. Subpatterns, either delivered as a list-context result or as $1 etc., are tainted if use locale (but not use locale ':not_characters') is in effect, and the subpattern regular expression contains \w (to match an alphanumeric character), \W (non-alphanumeric character), \s (whitespace character), or \S (non whitespace character). The matched-pattern variable, $&, $` (pre-match), $' (post-match), and $+ (last match) are also tainted if use locale is in effect and the regular expression contains \w, \W, \s, or \S.
Substitution operator (s///) […]
[⋮]
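To see how these documented rules behave on a particular perl, one can compare a pattern that uses `\w` with one whose characters are spelled out. This is a sketch for experimentation, not a statement of what any given perl version prints: by the quoted text only the `\w` capture should be tainted, but the question's result suggests that perl 5.14.2 taints in more cases than listed, so the output is best read as an observation. It needs `perl -T`; the `${^TAINT}` check only warns when taint mode is off.

```perl
#!/usr/bin/perl
# Sketch only; run as: perl -T probe.pl
use strict;
use warnings;
use locale;
use Scalar::Util qw(tainted);

warn "taint mode is off; every capture will be reported as untainted\n"
    unless ${^TAINT};

my $var = "foo.bar_baz";    # untainted literal, as in the question

# perllocale lists \w as a locale-dependent construct.
my ($with_w) = $var =~ m/^(\w+)/;

# The same match for this input, written without \w.
my ($spelled_out) = $var =~ m/^([A-Za-z0-9_]+)/;

printf "%-22s %s\n", 'm/^(\w+)/',
    tainted($with_w) ? 'tainted' : 'untainted';
printf "%-22s %s\n", 'm/^([A-Za-z0-9_]+)/',
    tainted($spelled_out) ? 'tainted' : 'untainted';
```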
This looks like it should be an exhaustive list, and I don't see how it could apply: my regex does not use any of \w, \W, \s or \S, so it should not depend on the locale.
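One way to check whether `use locale` really is what taints $1, even though the pattern contains none of the listed escapes, is to run the question's exact pattern once with the pragma in scope and once without. Because `use locale` is lexically scoped, it can be confined to a single block. The helper names below are made up for this sketch; run it with `perl -T`.

```perl
#!/usr/bin/perl
# Sketch only; run as: perl -T compare.pl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helpers: apply the question's pattern and return
# ($1, $2, 1 if $1 is tainted else 0).
sub split_with_locale {
    use locale;    # lexically scoped: affects only this block
    my ($s) = @_;
    return unless $s =~ m/^(.*)[._](.*?)$/;
    return ($1, $2, tainted($1) ? 1 : 0);
}

sub split_without_locale {
    my ($s) = @_;    # no "use locale" in scope here
    return unless $s =~ m/^(.*)[._](.*?)$/;
    return ($1, $2, tainted($1) ? 1 : 0);
}

my $var = "foo.bar_baz";
for my $case ([ 'with use locale', \&split_with_locale ],
              [ 'without',         \&split_without_locale ]) {
    my ($label, $split) = @$case;
    my ($head, $tail, $is_tainted) = $split->($var);
    printf "%-16s \$1=%s \$2=%s (%s)\n", $label, $head, $tail,
        $is_tainted ? 'tainted' : 'untainted';
}
```

If only the `use locale` line reports `tainted`, that would confirm the taint comes from locale handling rather than from the input, though it would not by itself say which part of the pattern triggers it.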
Can someone explain why this code taints the variable $1?
use locale;? Wouldn't hurt to send this to p5p using the perlbug tool. There appears to be a bug in Perl if not a bug in the docs. – Hillary
hillbilly Scalar::Util qw(tainted)? – Journeyman
Scalar::Util::tainted, it yields the same result. I just pasted the code Foswiki uses here, since it might reduce the dependencies of that code a little bit. I don't know if everyone has Scalar::Util available; the docs suggest using CPAN for it. – Outherod