Whenever I think that something is impossible in Perl, it usually turns out that I am wrong. And sometimes when I think that something is very difficult in Perl, I am wrong, too. @sln put me on the right track.
Let's not override \s just yet, although you could. For the sake of the heirs of your program who expect \s to mean something specific, instead let's define the sequence \_ to mean "any whitespace character or the _ character" inside a regular expression. The details are in the link above, but the implementation looks like:
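package myspace; # redefine \_ to mean [\s_]
use overload;
# '\\' maps to itself so that an escaped backslash is consumed as a unit
my %rules = ('\\' => '\\\\', '_' => qr/[\t\n\x{0B}\f\r _]/ );
sub import {
    die if @_ > 1;    # this module takes no import arguments
    # rewrite every regex constant compiled in the caller's scope
    overload::constant 'qr' => sub {
        my $re = shift;
        $re =~ s{\\(\\|_)}{$rules{$1}}gse;
        return $re;
    };
}
1;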
Now in your script, say
use myspace;
and now \_ in a regular expression means [\s_].
Demo:
use myspace;
while (<DATA>) {
    chomp;
    if ($_ =~ /aaa\s.*txt/) { # match whitespace
        print "match[1]: $_\n";
    }
    if ($_ =~ /aaa\_.*txt/) { # match [\s_]
        print "match[2]: $_\n";
    }
    if ($_ =~ /\\_/) { # match literal '\_'
        print "match[3]: $_\n";
    }
}
__DATA__
aaabbb.txt
aaa\_ccc.txt
cccaaa bbb.txt
aaa_bbb.txt
Output:
match[3]: aaa\_ccc.txt
match[1]: cccaaa bbb.txt
match[2]: cccaaa bbb.txt
match[2]: aaa_bbb.txt
The third case is to demonstrate that \\_ in a regular expression will match a literal \_, just as \\s will match a literal \s.
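To see why the third case works, it can help to look at what the handler's substitution does to the raw pattern text. The following is only a sketch: it applies the same substitution outside of overload::constant, the helper name rewrite_pattern is made up for illustration, and the character class is kept as a plain string (rather than the qr// object used in the module) so that the output stays readable.

```perl
use strict;
use warnings;

# Same replacement table as in myspace.pm, except that the character class
# is a plain string here (an assumption, purely for readable output).
my %rules = ('\\' => '\\\\', '_' => '[\t\n\x{0B}\f\r _]');

# Hypothetical helper: applies the handler's substitution to pattern text.
sub rewrite_pattern {
    my $re = shift;
    # A backslash followed by a backslash or an underscore is replaced as a
    # pair, so an escaped backslash can never "lend" itself to a following _.
    $re =~ s{\\(\\|_)}{$rules{$1}}g;
    return $re;
}

# Single-quoted strings: '\\\\_' is the three characters \ \ _
for my $pattern ('aaa\_.*txt', '\\\\_', 'aaa\s.*txt') {
    printf "%-12s => %s\n", $pattern, rewrite_pattern($pattern);
}
```

Only the first pattern is changed: its \_ becomes the character class. In \\_ the two backslashes are consumed together and put back unchanged, which leaves the underscore as an ordinary literal, and \s is not touched at all.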
$s=qr/[\s_]/; – Essay
Creating Custom RE Engines from the perlre docs. – Layfield
\s matches [\t\n\x0B\f\r ]. \x0B is a vertical tab character or line tabulation. In Unicode it matches another 18 extended characters – Soche
(?(DEFINE)(?<MY_PATTERN>...)) mechanism, but that'd end up uglier than [\s_] – Balzer
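For comparison, the $s=qr/[\s_]/; idea from the comments needs no module at all: the class is compiled once and interpolated wherever it is needed. This is a minimal sketch of that approach, reusing file names from the demo above; the loop and the printed labels are illustrative only.

```perl
use strict;
use warnings;

# Precompiled class: any whitespace character or an underscore.
my $s = qr/[\s_]/;

for my $name ('aaabbb.txt', 'cccaaa bbb.txt', 'aaa_bbb.txt') {
    # $s is interpolated into the larger pattern as a self-contained group.
    my $result = $name =~ /aaa$s.*txt/ ? 'match' : 'no match';
    print "$result: $name\n";
}
```

The trade-off is that every pattern has to spell out $s explicitly, whereas the overload::constant approach changes the meaning of \_ for all regular expressions in the importing scope.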