What characters must I escape in a Perl pre-compiled regex?

I'm having a hard time determining which characters must be escaped when using Perl's qr{} construct.

I'm attempting to create a multi-line precompiled regex for text that contains a myriad of normally escaped characters (#*.>:[]) and also contains another precompiled regex. Additionally, I need to match as strictly as possible for testing purposes.

my $output = q{# using defaults found in .config
*
*
Options:
  1. opt1
> 2. opt2
choice[1-2?]: };

my $sc = qr{(>|\s)}smx;
my $re = qr{# using defaults found in .config
*
*
Options:
$sc 1. opt1
$sc 2. opt2
choice[1-2?]: }mx;

if ( $output =~ $re ) {
  print "OK!\n";
}
else {
  print "D'oh!\n";
}

Error:

Quantifier follows nothing in regex; marked by <-- HERE in m/# using defaults found in .config
* <-- HERE 
*
Options:
(?msx-i:(>|\s)) 1. opt1
(?msx-i:(>|\s)) 2. opt2
choice[1-2?]: / at ./so.pl line 14.

Escaping the asterisks results in a failed match (the D'oh! output). Escaping the other pesky characters also results in a failed match. I could keep trying different combinations of what to escape, but there are a lot of variations here, and I'm hoping someone can provide some insight.

Thyme answered 14/11, 2008 at 19:50 Comment(0)

You have to escape the delimiter for qr//, and you have to escape any regex metacharacters that you want to use as literals. If you want those to be literal *'s, you need to escape them since the * is a regex quantifier.

Your problem here is the various regex flags that you've added. The /m doesn't do anything because you don't use the beginning- or end-of-string anchors (^, $). The /s doesn't do anything because you don't use the wildcard . metacharacter. The /x makes all of the whitespace in your regex meaningless, and it turns that line with the # into a regex comment.
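
To make the effect of /x concrete, here is a minimal sketch; the patterns and test strings are made up for illustration and are not taken from the question. It also suggests where the error message comes from: under /x, the first line of the original pattern (starting at the #) is read as a comment, so the first thing the regex engine actually sees is a bare *.

```perl
use strict;
use warnings;

# Without /x, the space in the pattern is a literal space.
my $plain = qr{a b};

# With /x, unescaped whitespace is ignored and '#' starts a comment
# that runs to the end of the line, so this pattern is just "ab".
my $extended = qr{a b   # this text is a comment, not part of the pattern
}x;

print "plain matches 'a b'\n"           if 'a b' =~ $plain;
print "plain does not match 'ab'\n"     if 'ab'  !~ $plain;
print "extended matches 'ab'\n"         if 'ab'  =~ $extended;
print "extended does not match 'a b'\n" if 'a b' !~ $extended;
```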

This is what you want, with regex flags removed and the proper things escaped:

my $sc = qr{(>|\s)};

my $re = qr{# using defaults found in \.config
\*
\*
Options:
$sc 1\. opt1
$sc 2\. opt2
choice\[1-2\?]: };

Although Damian Conway tells people in Perl Best Practices to always put these options on their regexes, you now see why he's wrong. You should only add them when you want what they do, and you should only add things when you know what they do. :) Here's what you might do if you want to use /x. You have to escape any literal whitespace, you need to denote the line endings somehow, and you have to escape the literal # character. What was readable before is now a mess:

my $sc  = qr{(>|\s)};
my $eol = qr{[\r\n]+};

my $re  = qr{\# \s+ using \s+ defaults \s+ found \s+ in \s+ \.config $eol
\*                    $eol
\*                    $eol
Options:              $eol
$sc \s+ 1\. \s+ opt1   $eol
$sc \s+ 2\. \s+ opt2   $eol
choice\[1-2\?]: \s+
}x;

if ( $output =~ $re ) {
  print "OK!\n";
}
else {
  print "D'oh!\n";
}
Hallah answered 14/11, 2008 at 19:56 Comment(4)
Argh! My understanding of what 's' and 'x' did was the inverse of reality. Hence the 's' missing from $re. But yes, I blame PBP here as well. :) - Thyme
The book explains what the options do and why to use them... you can't really blame the book for this. :) - Dewberry
I can blame the book. It says "Always use the /x flag" (p 236) and "Always use the /m flag" (p 237). The recommendation of "Always" is wrong. - Hallah
Blame solely lies with me :). A quick edit to my .perlcriticrc should remedy this. - Thyme

Sounds like what you really want is Expect, but the thing you are most immediately looking for is the quotemeta operator which escapes all characters that have special meanings to a regex.

To answer your question directly, however: in addition to the closing delimiter (in this case }), you need to escape, at a minimum, .^$[()|*+?{\
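
As a rough sketch of the quotemeta route applied to the text from the question: the literal chunks are passed through quotemeta and only the $sc sub-pattern is left as a real regex. The variable names $head, $opt1, $opt2 and $tail, and the \A and \z anchors, are my own additions for illustration, not part of the original code.

```perl
use strict;
use warnings;

my $output = q{# using defaults found in .config
*
*
Options:
  1. opt1
> 2. opt2
choice[1-2?]: };

my $sc = qr{(>|\s)};

# quotemeta backslashes every non-word character, so each literal
# chunk can be dropped into the pattern without hand-escaping.
my $head = quotemeta("# using defaults found in .config\n*\n*\nOptions:\n");
my $opt1 = quotemeta(" 1. opt1\n");
my $opt2 = quotemeta(" 2. opt2\n");
my $tail = quotemeta("choice[1-2?]: ");

# No /x here: the escaped whitespace and '#' must stay literal.
my $re = qr{\A$head$sc$opt1$sc$opt2$tail\z};

print $output =~ $re ? "OK!\n" : "D'oh!\n";
```

With the $output shown, this should print OK!. Inside a pattern, \Q...\E does the same job as quotemeta for an interpolated variable.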

Neukam answered 14/11, 2008 at 20:5 Comment(1)
Actually, this is being used in conjunction with Expect and Test::More. I just pared down the code for example's sake. - Thyme

Like brian said, you must escape the delimiter and regex metacharacters. Note that when using qr//x (which you are), you must also escape whitespace characters and # (which is a comment marker). You probably don't actually want to use /x here. If you want to be safe, you can escape any non-alphanumeric character.
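
For illustration only (the sample string and variable names are invented), here are two common ways to keep a space and a # literal under /x: a backslash in front of the character, or a one-character bracketed class for the space. A backslash before any non-alphanumeric ASCII character is always taken literally, which is why the "escape everything" approach is safe.

```perl
use strict;
use warnings;

my $line = '# a b';

# Backslash form: "\#" is a literal '#', "\ " is a literal space.
my $with_backslash = qr{\A \# \ a \ b \z}x;

# Class form: whitespace inside [...] is still significant under /x.
my $with_class = qr{\A \# [ ] a [ ] b \z}x;

print "backslash form matches\n" if $line =~ $with_backslash;
print "class form matches\n"     if $line =~ $with_class;
```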

Mottle answered 14/11, 2008 at 20:8 Comment(0)
