Multiline search replace with Perl

Asked 23/6, 2009 at 5:43 Answered 24/3, 2023 at 16:29

I know this kind of question has been asked many times before. I'm asking again because I feel like I've missed something simple and fundamental.

Is it possible to improve this kind of search-and-replace routine, for example by not opening the same file twice? Speed-related advice is also welcome.

Please note that this works with multiline matches and also replaces them with multiline strings.

#!/bin/perl -w -0777

local $/ = undef;

open INFILE, $full_file_path or die "Could not open file. $!";
$string =  <INFILE>;
close INFILE;

$string =~ s/START.*STOP/$replace_string/sm;

open OUTFILE, ">", $full_file_path or die "Could not open file. $!";
print OUTFILE ($string);
close OUTFILE;

Nuthouse answered 23/6, 2009 at 5:43 Comment(3)

It seems like you're trying to edit the file in place. That is, open it for reading as well as for writing. Is that correct? – Freeloader 23/6, 2009 at 5:46

Yes, editing file in place. That's the most common use case for me. – Nuthouse 23/6, 2009 at 6:5

I gravitate towards general solutions, but I sometimes need to be reminded that for something like this it can be (and was, today!) worth trying it in an IDE (e.g., IntelliJ's Find in Project + Find-and-Replace) rather than going through the ramp-up, trial and error, and after-the-fact checking of a scripting solution. – Selfimmolating 17/9, 2020 at 20:31

117

This kind of search and replace can be accomplished with a one-liner such as -

perl -i -pe 's/START.*STOP/replace_string/g' file_to_change

For more ways to accomplish the same thing check out this thread. To handle multi-line searches use the following command -

perl -i -pe 'BEGIN{undef $/;} s/START.*STOP/replace_string/smg' file_to_change

To convert the one-liner above into a Perl program, have a look at the perlrun documentation.

If you really find the need to convert this into a working program then just let Perl handle the file opening/closing for you.

#!/usr/bin/perl -pi
#multi-line in place substitute - subs.pl
use strict;
use warnings;

BEGIN {undef $/;}

s/START.*STOP/replace_string/smg;

You can then call the script with the filename as the first argument

$ perl subs.pl file_to_change

If you want a meatier script where you get to handle the file open/close operations yourself (don't we love all those 'die' statements), then have a look at the example in perlrun under the -i[extension] switch.

Forspent answered 23/6, 2009 at 5:57 Comment(11)

How do you convert this one liner to actual perl code? Does it get ugly? – Nuthouse 23/6, 2009 at 6:0

Check the edit, the BEGIN block now ensures that this works on multi-line matches too. – Forspent 23/6, 2009 at 6:47

Alright, can it be written as Perl code (not as a one-liner)? I want to know what happens to the file opening/writing routines. – Nuthouse 23/6, 2009 at 8:34

regexp /START.*STOP/smg will not match more than once. – Sachasachem 23/6, 2009 at 8:54

Hynek, by not match more than "once" I assume you state that because of the greedy * operator. I left it in just so that once the OP realises the need for *? he/she will have the /g extension ready. – Forspent 23/6, 2009 at 11:37

@muteW Can you please make clear what I should do to match a multi-line pattern in Perl, and what START and STOP are here? Please reply soon. – Mushy 21/4, 2011 at 9:20

START and STOP are the start and end, respectively, of the regular expression you are trying to match. By undef'ing the input-record separator('$/') we effectively get Perl to slurp in the entire file at once into $_ thereby enabling us to do multi-line substitutions. – Forspent 27/4, 2011 at 22:55

Even shorter version: perl -i -p0e 's/START.*STOP/replace_string/smg' file_to_change (-0 sets the line separator to nul). – Cliff 30/11, 2011 at 19:25

For those who want to know what undef $/; does: it enables what is called "slurp mode". More information here. – Demineralize 24/5, 2015 at 13:47

@Petr Check out the comments above yours. – Gambrel 28/11, 2017 at 7:18

(Note the -0777 option at the top of the code in the question, which makes this work.) – Styrax 20/9, 2021 at 2:54

101

Pulling the short answer from the comments, for anyone looking for a quick one-liner, and the reason Perl is ignoring their RegEx options from the command line.

perl -0pe 's/search/replace/gms' file

Without the -0 argument, Perl processes data line-by-line, which causes multiline searches to fail.

Gambrel answered 28/11, 2017 at 7:17 Comment(4)

Perfect. And if it does not seem to work, try with \R (matches all kinds of end of line) instead of \n. – Yaroslavl 11/3, 2018 at 15:14

For me the 0-switch was the crucial thing. Thanks and +1 – Vlada 24/10, 2018 at 20:42

perl -0777 -i -pe 's/search/replace/' 1.h works for me on macOS – Loireatlantique 15/4, 2020 at 3:19

On the Perls I have worked with, . does not match \n, so I had to use [\s\S]*. I wonder why nobody here mentioned it. – Pansir 30/3, 2022 at 6:43

Considering that you slurp in the whole contents of the file with:

local $/ = undef;

open INFILE, $full_file_path or die "Could not open file. $!";
$string =  <INFILE>;
close INFILE;

And then do all the processing with $string, there's no connection between how you handle the file and how you process the contents. You'd have an issue if you opened the file for writing before you were done reading it, since opening a file for writing creates a new file, discarding the previous contents.

If all you're trying to do is save on open/close statements, then do as Jonathan Leffer suggested. If your question is about multiline search and replace, then please clarify what the problem is.

Freeloader answered 23/6, 2009 at 6:50 Comment(4)

It's about generic multiline search and replace. Is it really fine that I open the same file pointer again even if the file is very huge? In one-liners there seem to be no need to open same file twice. I'm still missing something here. Maybe I should see Jonathan's example in practice. – Nuthouse 23/6, 2009 at 8:31

Creating a file handle has nothing to do with the file's size. It's just a pointer. The act of opening a file doesn't imply reading its contents. – Freeloader 23/6, 2009 at 9:15

I think this is close to my misunderstanding. How can the same file be opened once for both reading and writing, when reading necessarily means going through it to find possible matches? – Nuthouse 23/6, 2009 at 9:27

you have to read it once and only once. When you open it for writing you're not reading it at all. It doesn't matter how big the file was before you opened it for writing because you're discarding all of that anyway. – Freeloader 23/6, 2009 at 11:45

You might want to check out my Perl script, which is battle-tested (used heavily in production) and has quite a lot of features, such as:

do multiple search-replace or query-search-replace operations
search-replace expressions can be given on the command line or read from a file
processes multiple input files
recursively descend into directories and do multiple search/replace operations on all files
user-defined Perl expressions are applied to each line of each input file
optionally run in paragraph mode (for multi-line search/replace)
interactive mode
batch mode
optionally backup files and backup numbering
preserve modes/owner when run as root
ignore symbolic links, empty files, write-protected files, sockets, named pipes, and directory names
optionally replace lines only matching / not matching a given regular expression

https://github.com/tilo/replace_string

Megacycle answered 6/8, 2019 at 21:45 Comment(3)

-1 This is not an answer as you've not told OP how to solve the problem, but just pointed to your code instead. If you explained the key bit of your code that solves OP's query, that would be a better answer. – Periphery 13/10, 2021 at 8:55

@Periphery I provided a more general tool that also uses Perl. Did you look at its source code? You might find the answer there.. ;) Given that there is a handy tool for doing multi-file search/replace operations, it is better to use that instead of trying to code it "by hand". – Megacycle 13/10, 2021 at 21:16

That would be fine if you explained in the answer how your general solution solves the problem. Links (or repositories) can break, then future readers are none the wiser as to how your general solution helps anyone to do multiline search+replaces. See also: Your answer is in another castle and this answer to a similar question. – Periphery 13/10, 2021 at 21:42

The combination of a bash script and perl -pi is unbeatable. Here is an example of a bash function where you type the search and replace strings directly, just before the EOF label:

# usage: put this into a foobar.sh file, then source foobar.sh
# and call do_multiline_srch_and_replace directly from the shell
do_multiline_srch_and_replace(){

   # return (not exit), so a failed check does not close the shell that sourced this file
   test -z "$dir_to_work" && {
      echo "You must export dir_to_work=<<the-dir>> - it is empty !!!"; return 1;
   }
   test -d "$dir_to_work" || {
      echo "The dir to work on: \"$dir_to_work\" is not a dir !!!"; return 1;
   }

   echo "INFO dir_to_work: $dir_to_work" ; sleep 1
   echo "INFO START :: searching and replacing in the non-binary files only"

   while read -r file ; do
      echo "DEBUG working on the following file: $file"

      # patterns in the file names that we usually want to skip: git, node_modules, venv
      case "$file" in
         *.git*)
         continue ;;
         *node_modules*)
         continue ;;
         *.venv*)
         continue ;;
      esac
      # note: the search string goes exactly between the first two | of s|||gs, the replacement between the last two
      # the quoted 'EOF' guarantees that no special chars from the shell will affect the result
      perl -pi - <<'EOF' "$file"
BEGIN{undef $/;}
s|a multiline
string|the multiline
string to replace|gs
EOF
   done < <(find "$dir_to_work" -type f -not -exec file {} \; | grep text | cut -d: -f1)

   echo "INFO STOP  :: search and replace in non-binary files"

}

Gateshead answered 18/12, 2021 at 17:43 Comment(0)

I know this has been answered but this is how I managed to solve this.

Let's say you want to swap out a UUID, but only where there is a match on the line above, because you have many UUIDs that belong to other things.

perl call in a bash script in Ubuntu 20:

_UUID=$(uuidgen | sed 's/-//g')
export _UUID
perl -0777 -pi.back -e 's/(<stringProp\sname="Argument\.name">_BINARYVIDEOTEMPURL<\/stringProp>\n.*<stringProp\sname="Argument\.value">)[a-zA-Z0-9]{32}(<\/stringProp>)/$1$ENV{_UUID}$2/g;' test.txt

Your test.txt file reads like so: (not a valid XML I know but just create it)

<?xml version="1.0" encoding="UTF-8"?> <jmeterTestPlan version="1.2" properties="5.0" jmeter="5.2.1">
  <hashTree>
<TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="K8S Load Test Plan" enabled="true">      
  <stringProp name="TestPlan.user_define_classpath"></stringProp>
</TestPlan>
      <collectionProp name="Arguments.arguments">
        <elementProp name="_SESSIONID" elementType="Argument">
          <stringProp name="Argument.name">_SESSIONID</stringProp>
          <stringProp name="Argument.value">7c096b65-84b6-40c9-be93-a5891ec0394d</stringProp>
          <stringProp name="Argument.metadata">=</stringProp>
        </elementProp>
        <elementProp name="_BINARYVIDEOTEMPURL" elementType="Argument">
          <stringProp name="Argument.name">_BINARYVIDEOTEMPURL</stringProp>
          <stringProp name="Argument.value">64e1886127fa41c4a58e59fe2bb098e1</stringProp>
          <stringProp name="Argument.metadata">=</stringProp>
        </elementProp>
      </collectionProp>

So a lot is happening here but let me explain.

Create a new UUID to use as the replacement.
Export the UUID so that perl can pick it up from the environment variables.
Call perl to handle the search and replacement:

-0777 makes perl read the whole file at once, so the regex can match across line boundaries. I couldn't tell you how perl works internally.
-pi.back does in-place editing and backs up the original file with a .back extension.
-e is basically 's/replacethis/withthis/g', but it contains the regex with the newlines needed to match. It also shows how to use environment variables and capture groups to rebuild the string.

Anyways, hope this helps someone.

Amido answered 24/3, 2023 at 16:29 Comment(0)
