Problem with perl multiline matching

I'm trying to use a Perl one-liner to update some code that spans multiple lines, and I'm seeing some strange behavior. Here's a simple text file that shows the problem:

ABCD    START
         STOP    EFGH

I expected the following to work but it doesn't end up replacing anything:

perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt

After some experimenting I found that the \s+ in the original regex matches the newline but none of the leading whitespace on the second line, and adding a second \s+ doesn't help either. So for now I'm using a workaround: an intermediate regex that only removes the newline:

perl -pi -e 's/START\s+/START/s' input.txt

This creates the following intermediate file:

ABCD    START            STOP    EFGH

Then I can run the original regex (although the /s is no longer needed):

perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt

This creates the final, desired file:

ABCD    REPLACE    EFGH

It seems like the intermediate step should not be necessary. Am I missing something?

Piecrust answered 2/5, 2011 at 21:4 Comment(2)
Your Frequently Asked Question is answered in the very first sentence: "perldoc -q match" --> "I'm having trouble matching over more than one line. What's wrong?" - Petrous
/s only affects what . matches, so none of your /s modifiers are needed. - Winna

perl -p processes the file one line at a time. The regex you have is correct, but it is never matched against the multi-line string.
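To see why, here is a small sketch that applies the substitution to each line separately (which is effectively what -p does) and then once to the joined text. The two sample lines are copied from the question; everything else is illustrative:

```perl
use strict;
use warnings;

# The sample file from the question, one element per line.
my @lines = ("ABCD    START\n", "         STOP    EFGH\n");

# What perl -p effectively does: substitute within each line on its own.
my $line_by_line = join '', map { my $l = $_; $l =~ s/START\s+STOP/REPLACE/; $l } @lines;

# What slurp mode does: substitute once over the whole text.
(my $whole = join '', @lines) =~ s/START\s+STOP/REPLACE/;

print $line_by_line;   # unchanged: no single line holds both START and STOP
print $whole;          # ABCD    REPLACE    EFGH
```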

A simple strategy, assuming the file will fit in memory, is to read the whole thing (do this without -p):

$/ = undef;
$file = <>;
$file =~ s/START\s+STOP/REPLACE/sg;
print $file;

Note, I have added the /g modifier to specify global replacement.
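The same slurp approach can be wrapped in a sub that uses `local $/`, so the undefined record separator doesn't leak into later line-based reads. This is a minimal sketch; the sub name replace_in_handle and the in-memory filehandle standing in for input.txt are illustrative assumptions:

```perl
use strict;
use warnings;

# Slurp everything from a filehandle and apply the multi-line substitution.
sub replace_in_handle {
    my ($fh) = @_;
    local $/;                            # undef record separator, restored when the sub returns
    my $text = <$fh>;                    # reads the whole remaining input at once
    $text =~ s/START\s+STOP/REPLACE/g;   # /g replaces every START...STOP pair, not just the first
    return $text;
}

# Demo: an in-memory filehandle stands in for input.txt.
my $input = "ABCD    START\n         STOP    EFGH\nSTART STOP\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
my $output = replace_in_handle($fh);
close $fh;
print $output;
```

The /s modifier is dropped here because the pattern contains no `.`; \s already matches newlines.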

As a shortcut for all that extra boilerplate, you can run your existing one-liner with the -0777 option: perl -0777pi -e 's/START\s+STOP/REPLACE/sg' input.txt. You still need /g if the file may contain more than one START/STOP pair to replace.

A hiccup you might run into, although not with this regex: if the regex were START.+STOP (with /s, so that . can cross newlines) and a file contains multiple START/STOP pairs, greedy matching of .+ will consume everything from the first START to the last STOP. Use non-greedy matching (match as little as possible) with .+? instead.
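A quick sketch of the difference, using a made-up text with two START/STOP pairs spread over several lines:

```perl
use strict;
use warnings;

# Two START/STOP pairs spread over several lines (sample data, not from the question).
my $text = "START a\nSTOP keep\nSTART b\nSTOP\n";

# /s lets . match newlines, so .+ can cross line boundaries.
(my $greedy = $text) =~ s/START.+STOP/REPLACE/sg;    # .+ runs to the LAST STOP
(my $lazy   = $text) =~ s/START.+?STOP/REPLACE/sg;   # .+? stops at the FIRST STOP

print $greedy;   # a single REPLACE: " keep" between the pairs is lost
print $lazy;     # two REPLACEs with " keep" preserved between them
```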

If you want ^ and $ to match at line boundaries anywhere in the string, you also need the /m regex modifier.
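For example, on a made-up two-line string, ^ without /m only anchors at the very start of the string:

```perl
use strict;
use warnings;

my $text = "START one\nSTART two\n";

(my $without_m = $text) =~ s/^START/BEGIN/g;    # ^ = start of the string only
(my $with_m    = $text) =~ s/^START/BEGIN/mg;   # ^ = start of any line

print $without_m;   # only the first line changes
print $with_m;      # both lines change
```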

Sarcasm answered 2/5, 2011 at 21:11 Comment(3)
Also can't find any info on the -0. What does that flag do? - Breunig
This had been driving me nuts! Thanks so much :) - Rocker
Documentation on the -0 argument: perldoc.perl.org/perlrun#-0%5Boctal/hexadecimal%5D - Foam

You were close. You need either -00 or -0777:

perl -0777 -pi -e 's/START\s+STOP/REPLACE/' input.txt
Cinereous answered 3/5, 2011 at 12:53 Comment(3)
And what do -0777 and -00 do? I'm reading the perl manpage, but other than those numbers being octal (which was obvious), I can't find any information. Thanks! - Tetrapody
Option -0 sets the input record separator ($/). -0777 activates slurp mode, in which no record separator is defined, so the entire file is read at once. -00 activates paragraph mode, in which records are separated by blank lines. - Eyra
Documentation on the -0 argument: perldoc.perl.org/perlrun#-0%5Boctal/hexadecimal%5D - Foam
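To make the difference between -00 and -0777 concrete, here is a sketch that reads the same text with $/ set to '' (paragraph mode, what -00 sets) and to undef (slurp mode, what -0777 sets). The sample text, with one START/STOP pair separated by a blank line, and the count_matches sub are made up for illustration:

```perl
use strict;
use warnings;

# Sample where the second START/STOP pair is separated by a blank line.
my $input = "START\nSTOP\n\nSTART\n\nSTOP\n";

# Count START\s+STOP matches when reading $input with the given $/ value.
sub count_matches {
    my ($separator) = @_;
    open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
    local $/ = $separator;    # '' = paragraph mode (like -00), undef = slurp (like -0777)
    my $count = 0;
    while (my $record = <$fh>) {
        $count++ while $record =~ /START\s+STOP/g;
    }
    close $fh;
    return $count;
}

my $paragraph = count_matches('');     # records never span a blank line
my $slurp     = count_matches(undef);  # one record holds the whole input
print "paragraph mode: $paragraph, slurp mode: $slurp\n";
```

So -00 is enough for the question's file, but only -0777 (or -g) handles a pair that straddles a blank line.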

A relatively simple one-liner (reading the file in memory):

perl -pi -e 'BEGIN{undef $/;} s/START\s+STOP/REPLACE/sg;' input.txt

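Another alternative (not so simple) that doesn't slurp the file up front: it buffers lines only until a replacement is made (if nothing ever matches, the buffer still grows to the whole file):

perl -ni -e '$a .= $_;
             if ($a =~ s/START\s+STOP/REPLACE/s) { print $a; $a = ""; }
             if (eof) { print $a; $a = ""; }' input.txt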
Archfiend answered 3/5, 2011 at 1:40 Comment(0)
W
3
perl -MFile::Slurp -e '$content = read_file(shift); $content =~ s/START\s+STOP/REPLACE/s; print $content' input.txt
Wistful answered 2/5, 2011 at 21:22 Comment(1)
Whyever would you have people use a non-standard module for something that a single simple command-line will take care of completely? - Cinereous

Here's a one-liner that doesn't read the entire file into memory at once:

perl -i -ne 'if (($x = $last . $_) =~ s/START\n\s*STOP/REPLACE/)
  { print $x; $last = ""; } else { print $last; $last = $_; }
  print $last if eof ARGV' input.txt
Photovoltaic answered 3/5, 2011 at 1:18 Comment(1)
Nice, although I don't think ARGV is doing anything and can be removed. - Mackmackay
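Spelled out as a script, the two-line window looks roughly like this. It is a sketch under the same assumption as the one-liner (START ends one line and STOP begins the next); replace_streaming and the in-memory handles are hypothetical stand-ins for reading and rewriting input.txt:

```perl
use strict;
use warnings;

# Only the previous line is buffered, so memory use stays constant.
sub replace_streaming {
    my ($in, $out) = @_;
    my $last = '';
    while (my $line = <$in>) {
        my $pair = $last . $line;
        if ($pair =~ s/START\n\s*STOP/REPLACE/) {
            print {$out} $pair;   # both lines were consumed by the replacement
            $last = '';
        } else {
            print {$out} $last;   # the previous line can no longer be part of a match
            $last = $line;
        }
    }
    print {$out} $last;           # flush the final buffered line
}

my $input = "ABCD    START\n         STOP    EFGH\nlast line\n";
open my $in, '<', \$input or die "cannot open input handle: $!";
my $output = '';
open my $out, '>', \$output or die "cannot open output handle: $!";
replace_streaming($in, $out);
close $out;
close $in;
print $output;
```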

-g is an alias for -0777 (available since Perl 5.36), and is IMO more readable.

perl -g -pi -e 's/START\s+STOP/REPLACE/' input.txt

https://perldoc.perl.org/perlrun#-g

Foam answered 21/3 at 17:9 Comment(0)
