How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?

9

101

I have a file like the following and I would like to print the lines between two given patterns PAT1 and PAT2.

1
2
PAT1
3    - first block
4
PAT2
5
6
PAT1
7    - second block
PAT2
8
9
PAT1
10    - third block

I have read How to select lines between two marker patterns which may occur multiple times with awk/sed but I am curious to see all the possible combinations of this, either including or excluding the pattern.

How can I print all lines between two patterns?

Baxie answered 16/8, 2016 at 10:40 Comment(5)
I am posting an attempt at a canonical answer to How to select lines between two marker patterns which may occur multiple times with awk/sed so that all cases are covered. I follow It's OK to Ask and Answer Your Own Questions and posted the answer as Community Wiki, so feel free to improve it!Baxie
@Cyrus yes, thank you! I also checked this one before going ahead and posting this question/answer. The point here is to provide a set of tools on this, since the volume of comments (and votes to them) in my other answer led me to think that a generic post would be of good help to future readers.Baxie
See also thelinuxrain.com/articles/how-to-use-flags-in-awkTrebuchet
@fedorqui, I didn't hear back so I decided to have a go at improving the question to rank better on Google and clarifying what the scope is. Feel free to revert if you're not happy with it.Shelli
@Alex not sure where my comments back were expected, but in any case thanks for the edit! It looks fine to me. Thanks for taking the time on thisBaxie
B
156

Print lines between PAT1 and PAT2

$ awk '/PAT1/,/PAT2/' file
PAT1
3    - first block
4
PAT2
PAT1
7    - second block
PAT2
PAT1
10    - third block

Or, using variables:

awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' file

How does this work?

  • /PAT1/ and /PAT2/ each match lines that contain the respective text.
  • /PAT1/{flag=1} sets the flag when the text PAT1 is found in a line.
  • /PAT2/{flag=0} unsets the flag when the text PAT2 is found in a line.
  • flag is a pattern with the default action, which is to print $0: if flag is equal to 1, the line is printed. This way, it prints every line from the moment PAT1 occurs until the next PAT2 is seen. It also prints the lines from the last match of PAT1 up to the end of the file when no PAT2 follows.
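The flag logic described above can be checked on a small input. The following is a minimal sketch, assuming a shortened version of the question's sample file; the temporary file and the variable names sample, range_out and flag_out are illustrative and not part of the original answer.

```shell
# Build a shortened, hypothetical version of the question's sample file.
sample=$(mktemp)
printf '%s\n' 1 PAT1 3 4 PAT2 5 PAT1 7 PAT2 8 PAT1 10 > "$sample"

# Range form: prints from each PAT1 line through the next PAT2 line.
range_out=$(awk '/PAT1/,/PAT2/' "$sample")

# Flag form: set on PAT1, print while set, clear after the PAT2 line was printed.
flag_out=$(awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' "$sample")

# Both forms select the same lines, including the unterminated last block.
[ "$range_out" = "$flag_out" ] && echo "same output"
printf '%s\n' "$flag_out"

rm -f "$sample"
```

The flag form is longer than the range form, but it is the one that can be rearranged to include or exclude either marker.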

Print lines between PAT1 and PAT2 - not including PAT1 and PAT2

$ awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' file
3    - first block
4
7    - second block
10    - third block

This uses next to skip the remaining rules for the line that contains PAT1, so that this line is not printed.

This call to next can be dropped by reshuffling the blocks: awk '/PAT2/{flag=0} flag; /PAT1/{flag=1}' file.
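Since awk evaluates its rules in order for every line, the two exclusive variants should behave identically. A small sketch to confirm this, again on a shortened, hypothetical sample (the names sample, with_next and reordered are made up for the illustration):

```shell
# Shortened, hypothetical sample input.
sample=$(mktemp)
printf '%s\n' 1 PAT1 3 4 PAT2 5 PAT1 7 PAT2 8 PAT1 10 > "$sample"

# Variant with next: the PAT1 line sets the flag and skips the remaining rules.
with_next=$(awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' "$sample")

# Reordered variant: the flag is cleared before the print test and set after it,
# so neither marker line reaches the print test with the flag set.
reordered=$(awk '/PAT2/{flag=0} flag; /PAT1/{flag=1}' "$sample")

printf '%s\n' "$with_next"   # prints 3, 4, 7 and 10, one per line
[ "$with_next" = "$reordered" ] && echo "same output"

rm -f "$sample"
```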

Print lines between PAT1 and PAT2 - including PAT1

$ awk '/PAT1/{flag=1} /PAT2/{flag=0} flag' file
PAT1
3    - first block
4
PAT1
7    - second block
PAT1
10    - third block

Because flag is tested at the very end, it reflects whatever the current line has just set: it is already 1 on the PAT1 line (so that line is printed) and already 0 on the PAT2 line (so that line is not).

Print lines between PAT1 and PAT2 - including PAT2

$ awk 'flag; /PAT1/{flag=1} /PAT2/{flag=0}' file
3    - first block
4
PAT2
7    - second block
PAT2
10    - third block

Because flag is tested at the very beginning, it reflects the state left by the previous lines, and hence the closing pattern is printed but the starting one is not.

Print lines between PAT1 and PAT2 - excluding lines from the last PAT1 to the end of file if no other PAT2 occurs

This is based on a solution by Ed Morton.

awk 'flag{
        if (/PAT2/)
           {printf "%s", buf; flag=0; buf=""}
        else
            buf = buf $0 ORS
     }
     /PAT1/ {flag=1}' file

As a one-liner:

$ awk 'flag{ if (/PAT2/){printf "%s", buf; flag=0; buf=""} else buf = buf $0 ORS}; /PAT1/{flag=1}' file
3    - first block
4
7    - second block

# note the lack of third block, since no other PAT2 happens after it

This keeps all the selected lines in a buffer that starts being populated once PAT1 is found. The buffer keeps being filled with the following lines until PAT2 is found. At that point, the stored content is printed and the buffer is emptied. If the file ends before PAT2 is seen, the buffer is never printed.
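To see the effect of the buffer, the sketch below runs the buffered one-liner next to the plain exclusive form on a shortened, hypothetical sample whose last block has no closing PAT2. The variable names are illustrative only.

```shell
# Shortened, hypothetical sample input; the last PAT1 has no matching PAT2.
sample=$(mktemp)
printf '%s\n' 1 PAT1 3 4 PAT2 5 PAT1 7 PAT2 8 PAT1 10 > "$sample"

# Lines after PAT1 are appended to buf; buf is flushed only when PAT2 arrives.
buffered=$(awk 'flag{ if (/PAT2/){printf "%s", buf; flag=0; buf=""} else buf = buf $0 ORS}; /PAT1/{flag=1}' "$sample")

# For contrast: the plain exclusive form also prints the unterminated block.
plain=$(awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' "$sample")

# Show each result on a single line for easier comparison.
printf 'buffered: %s\n' "$(printf '%s' "$buffered" | tr '\n' ' ')"   # buffered: 3 4 7
printf 'plain:    %s\n' "$(printf '%s' "$plain" | tr '\n' ' ')"      # plain:    3 4 7 10

rm -f "$sample"
```

The price of the buffered form is memory: a whole block is held until its closing marker arrives.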

Baxie answered 16/8, 2016 at 10:40 Comment(10)
is it shortest match ?Energy
@MukulAnand it depends on the caseBaxie
how about if I want to print one word/column from lines in a file between patterns? here's one answer echo "n" | yum update | awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag{ print $5 }'Straka
Can I do grep over this awk? Like: $ awk '/PAT1/,/PAT2/' | grep "XYZ" ?Vacuva
how to do it, if lines between PART1 and PART2 are extracted only once using awk?Percolator
@Percolator just pipe normally: awk '... your things ...' | awk 'this awk'Baxie
I'd appreciate a solution where PAT1 may repeat before PAT2 is encountered, and all we want is what is non-greedy between the PAT1 which is closest to the PAT2. i.e. PAT1(some text we don't want)PAT1(some text we do want)PAT2Xantha
Doesn't work: awk: cmd. line:1: /<p>/{flag=1; next}/</p>/{flag=0} flag awk: cmd. line:1: ^ unterminated regexpWhitcomb
@Whitcomb can you post the exact command you used?Baxie
self-contained examples: inclusive: seq 10 | awk '/3/,/5/' and exclusive: seq 10 | awk '/3/{flag=1; next} /5/{flag=0} flag' and include start: seq 10 | awk '/3/{flag=1} /5/{flag=0} flag' and include end: seq 10 | awk 'flag; /3/{flag=1} /5/{flag=0}'Floaty
L
91

What about the classic sed solution?

Print lines between PAT1 and PAT2 - include PAT1 and PAT2

sed -n '/PAT1/,/PAT2/p' FILE

Print lines between PAT1 and PAT2 - exclude PAT1 and PAT2

GNU sed
sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p}}' FILE
Any sed [1]
sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p;};}' FILE

or even (Thanks Sundeep):

GNU sed
sed -n '/PAT1/,/PAT2/{//!p}' FILE
Any sed
sed -n '/PAT1/,/PAT2/{//!p;}' FILE

Print lines between PAT1 and PAT2 - include PAT1 but not PAT2

The following includes just the range start:

GNU sed
sed -n '/PAT1/,/PAT2/{/PAT2/!p}' FILE
Any sed
sed -n '/PAT1/,/PAT2/{/PAT2/!p;}' FILE

Print lines between PAT1 and PAT2 - include PAT2 but not PAT1

The following includes just the range end:

GNU sed
sed -n '/PAT1/,/PAT2/{/PAT1/!p}' FILE
Any sed
sed -n '/PAT1/,/PAT2/{/PAT1/!p;}' FILE

[1] Note about BSD/Mac OS X sed

A command like this one:

sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p}}' FILE

would emit an error:

▶ sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p}}' FILE
sed: 1: "/PAT1/,/PAT2/{/PAT1/!{/ ...": extra characters at the end of p command

For this reason this answer has been edited to include BSD and GNU versions of the one-liners.
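The portable spellings (with a semicolon before each closing brace) can be tried side by side. This is a minimal sketch on a shortened, hypothetical sample; the variable names sed_none, sed_start and sed_end are made up for the illustration.

```shell
# Shortened, hypothetical sample input.
sample=$(mktemp)
printf '%s\n' 1 PAT1 3 4 PAT2 5 PAT1 7 PAT2 8 PAT1 10 > "$sample"

# Exclude both markers.
sed_none=$(sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p;};}' "$sample")

# Keep the start marker only.
sed_start=$(sed -n '/PAT1/,/PAT2/{/PAT2/!p;}' "$sample")

# Keep the end marker only.
sed_end=$(sed -n '/PAT1/,/PAT2/{/PAT1/!p;}' "$sample")

printf '%s\n' "$sed_none"   # prints 3, 4, 7 and 10, one per line

rm -f "$sample"
```

As with the awk range form, the unterminated last block is printed up to the end of the file.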

Lenna answered 16/8, 2016 at 14:55 Comment(12)
Hey, the classic is even shorter!Storeroom
What about the case of the starting line also matching the end pattern (but perhaps not vice-versa)? That would break your 3rd case at least.Aeolipile
Then the start and end pattern is not well chosen or the regex need to be more precise.Lenna
not sure about other versions, but with GNU sed, the first one can be simplified to sed -n '/PAT1/,/PAT2/{//!p}' file ... from manual empty regular expression ‘//’ repeats the last regular expression matchIdea
@Idea Thanks for the hint. POSIX says: If an RE is empty (that is, no pattern is specified) sed shall behave as if the last RE used in the last command applied (either as an address or as part of a substitute command) was specified. Looks like the only remaining question here is how to interpret the last RE. BSD is saying something to this. Look here (Point 23): github.com/freebsd/freebsd/blob/master/usr.bin/sed/POSIXLenna
@Lenna thanks for additional info... so if I understood correctly, /PAT1/,/PAT2/{//!p} will work only if last RE is dynamic.. if it was static, // would resolve to /PAT2/Idea
Looks like. Hard to find an incompatible version to prove that. :)Lenna
Note there is a new answer suggesting improvements to this one.Baxie
@fedorqui, there's my best go at it.Shelli
@AlexHarvey I think it is a great example of kindness what you did here, by sharing your knowledge to improve other answers. Ultimately, this was my goal when I posted this question, so we could have a canonical (yet another one :P) set of sources. Many thanks!Baxie
@AlexHarvey Let me share my view on this: I once answered How to select lines between two marker patterns which may occur multiple times... and kept getting quite a lot of comments asking for similar cases. Also, when being active in these tags I felt that I was reusing the same one-liners over and over again. For this I thought that a question-answer covering most of the cases could be useful. +25 stars, +30 votes, ~30K visits, lots of duplicates to this seem to agree with this. Of course it is not comprehensive but it seems to be working well.Baxie
If you compose the sed command from variables, then you must create it from a combination of parts, some in single quotes, and some in double quotes. Something like this: sed -n "/$pattern1/,/$pattern2/"'{//!p}'. The bash shell will not expand the variables if they are in single quotes. But if you contain the whole command in double quotes, bash will interpret ! as a history command, and will expand it. So that part of the sed command must be in single quotes.Shea
K
13

Using grep with PCRE (where available) to print markers and lines between markers:

$ grep -Pzo "(?s)(PAT1(.*?)(PAT2|\Z))" file
PAT1
3    - first block
4
PAT2
PAT1
7    - second block
PAT2
PAT1
10    - third block
  • -P perl-regexp, PCRE. Not in all grep variants
  • -z Treat the input as a set of lines, each terminated by a zero byte instead of a newline
  • -o print only matching
  • (?s) DotAll, i.e. dot matches newlines as well
  • (.*?) nongreedy find
  • \Z Match only at end of string, or before newline at the end

Print lines between markers excluding end marker:

$ grep -Pzo "(?s)(PAT1(.*?)(?=(\nPAT2|\Z)))" file
PAT1
3    - first block
4
PAT1
7    - second block
PAT1
10    - third block
  • (.*?)(?=(\nPAT2|\Z)) nongreedy find with lookahead for \nPAT2 and \Z

Print lines between markers excluding markers:

$ grep -Pzo "(?s)((?<=PAT1\n)(.*?)(?=(\nPAT2|\Z)))" file
3    - first block
4
7    - second block
10    - third block
  • (?<=PAT1\n) positive lookbehind for PAT1\n

Print lines between markers excluding start marker:

$ grep -Pzo "(?s)((?<=PAT1\n)(.*?)(PAT2|\Z))" file
3    - first block
4
PAT2
7    - second block
PAT2
10    - third block
Kolva answered 16/8, 2016 at 13:10 Comment(1)
Liking this because it works with regex patterns. The chosen solution did not.Egyptian
S
10

For completeness, here is a Perl solution:

Print lines between PAT1 and PAT2 - include PAT1 and PAT2

perl -ne '/PAT1/../PAT2/ and print' FILE

or:

perl -ne 'print if /PAT1/../PAT2/' FILE

Print lines between PAT1 and PAT2 - exclude PAT1 and PAT2

perl -ne '/PAT1/../PAT2/ and !/PAT1/ and !/PAT2/ and print' FILE

or:

perl -ne 'if (/PAT1/../PAT2/) {print unless /PAT1/ or /PAT2/}' FILE 

Print lines between PAT1 and PAT2 - exclude PAT1 only

perl -ne '/PAT1/../PAT2/ and !/PAT1/ and print' FILE

Print lines between PAT1 and PAT2 - exclude PAT2 only

perl -ne '/PAT1/../PAT2/ and !/PAT2/ and print' FILE

See also:

  • Range operator section in perldoc perlop for more on the /PAT1/../PAT2/ grammar:

Range operator

...In scalar context, ".." returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors.

  • See perldoc perlrun for the -n option, which makes Perl behave like sed -n.

  • Perl Cookbook, 6.8 for a detailed discussion of extracting a range of lines.

Shelli answered 20/4, 2019 at 12:16 Comment(0)
G
8

Here is another approach

Include both patterns (default)

$ awk '/PAT1/,/PAT2/' file
PAT1
3    - first block
4
PAT2
PAT1
7    - second block
PAT2
PAT1
10    - third block

Mask both patterns

$ awk '/PAT1/,/PAT2/{if(/PAT2|PAT1/) next; print}' file
3    - first block
4
7    - second block
10    - third block

Mask start pattern

$ awk '/PAT1/,/PAT2/{if(/PAT1/) next; print}' file
3    - first block
4
PAT2
7    - second block
PAT2
10    - third block

Mask end pattern

$ awk '/PAT1/,/PAT2/{if(/PAT2/) next; print}' file
PAT1
3    - first block
4
PAT1
7    - second block
PAT1
10    - third block
Gourd answered 16/8, 2016 at 14:29 Comment(0)
I
7

Alternatively:

sed '/START/,/END/!d;//d'

The first command deletes all lines except for those between and including START and END. The //d then deletes the START and END lines themselves, since an empty regex // makes sed reuse the most recently applied pattern.
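A quick way to check this is to compare the short form with an explicit equivalent. The sketch below assumes a small, hypothetical input with START and END markers and was written with GNU sed in mind; the variable names are made up for the illustration.

```shell
# Hypothetical sample input with two complete START/END blocks.
sample=$(mktemp)
printf '%s\n' 1 START 3 4 END 5 START 7 END 8 > "$sample"

# Short form from the answer: delete everything outside the range,
# then delete the marker lines via the empty regex.
short_form=$(sed '/START/,/END/!d;//d' "$sample")

# Explicit form for comparison: print only in-range lines that match neither marker.
explicit_form=$(sed -n '/START/,/END/{/START/!{/END/!p;};}' "$sample")

printf '%s\n' "$short_form"   # prints 3, 4 and 7, one per line
[ "$short_form" = "$explicit_form" ] && echo "same output"

rm -f "$sample"
```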

Ivanna answered 2/2, 2017 at 18:13 Comment(0)
D
5

This is like a foot-note to the 2 top answers above (awk & sed). I needed to run it on a large number of files, and hence performance was important. I put the 2 answers to a load-test of 10000 times:

sedTester.sh

for i in `seq 10000`;do sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p;};}' patternTester >> sedTesterOutput; done

awkTester.sh

 for i in `seq 10000`;do awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' patternTester >> awkTesterOutput; done

Here are the results:

zsh sedTester.sh  11.89s user 39.63s system 81% cpu 1:02.96 total
zsh awkTester.sh  38.73s user 60.64s system 79% cpu 2:04.83 total

The sed solution seems to be roughly twice as fast as the awk solution (on Mac OS).

Dolmen answered 26/10, 2019 at 6:29 Comment(0)
S
4

You can do what you want with sed by suppressing the normal printing of pattern space with -n. For instance to include the patterns in the result you can do:

$ sed -n '/PAT1/,/PAT2/p' filename
PAT1
3    - first block
4
PAT2
PAT1
7    - second block
PAT2
PAT1
10    - third block

To exclude the patterns and just print what is between them:

$ sed -n '/PAT1/,/PAT2/{/PAT1/{n};/PAT2/{d};p}' filename
3    - first block
4
7    - second block
10    - third block

Which breaks down as

  • sed -n '/PAT1/,/PAT2/ - locate the range between PAT1 and PAT2 and suppress printing;

  • /PAT1/{n}; - if the line matches PAT1, n loads the next line of input (the PAT1 line itself is not printed, because of -n);

  • /PAT2/{d}; - if it matches PAT2 delete line;

  • p - print all lines that fell within /PAT1/,/PAT2/ and were not skipped or deleted.

Storeroom answered 16/8, 2016 at 15:10 Comment(2)
Thanks for the interesting one-liners and its breakdown! I have to admit I still prefer awk, it looks clearer to me :)Baxie
I got done sorting through this one only to find hek2mgl had a shorter way -- take a look at his classic sed solution.Storeroom
S
3

This might work for you (GNU sed) on the proviso that PAT1 and PAT2 are on separate lines:

sed -n '/PAT1/{:a;N;/PAT2/!ba;p}' file

Turn off implicit printing by using the -n option and act like grep.

N.B. All solutions that use the range idiom, i.e. the /PAT1/,/PAT2/ command, share the same edge case: when PAT1 exists but no closing PAT2 follows, they print from PAT1 to the end of the file.
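The difference can be made visible on an input whose last block is never closed. This is a sketch under two assumptions: the sample file is a shortened, hypothetical one, and the sed in use exits without printing when N runs out of input under -n (GNU sed does). The loop is written as separate -e fragments, which is simply another spelling of the one-liner above in which each label ends with its own fragment.

```shell
# Shortened, hypothetical sample input; the last PAT1 has no matching PAT2.
sample=$(mktemp)
printf '%s\n' 1 PAT1 3 4 PAT2 5 PAT1 7 PAT2 8 PAT1 10 > "$sample"

# Range idiom: the unterminated last block is printed up to the end of the file.
range_out=$(sed -n '/PAT1/,/PAT2/p' "$sample")

# Loop form: N keeps appending lines until PAT2 shows up, then the block is printed.
# If the input ends first, sed quits and the incomplete block is dropped.
loop_out=$(sed -n -e '/PAT1/{' -e ':a' -e 'N' -e '/PAT2/!ba' -e 'p' -e '}' "$sample")

printf 'range: %s\n' "$(printf '%s' "$range_out" | tr '\n' ' ')"
printf 'loop:  %s\n' "$(printf '%s' "$loop_out" | tr '\n' ' ')"

rm -f "$sample"
```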

For completeness:

# PAT1 to PAT2 without PAT1
sed -n '/PAT1/{:a;N;/PAT2/!ba;s/^[^\n]*\n//p}' file 

# PAT1 to PAT2 without PAT2
sed -n '/PAT1/{:a;N;/PAT2/!ba;s/\n[^\n]*$//p}' file 

# PAT1 to PAT2 without PAT1 and PAT2   
sed -n '/PAT1/{:a;N;/PAT2/!ba;/\n.*\n/!d;s/^[^\n]*\n\|\n[^\n]*$//gp}' file

N.B. In the last solution PAT1 and PAT2 may be on consecutive lines and therefore a further edge case may arise. IMO both are deleted and nothing printed.

Strick answered 23/11, 2020 at 14:50 Comment(0)
