How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?

Asked 16/8, 2016 at 10:40 Answered 23/11, 2020 at 14:50

Solved shell perl awk sed pattern-matching

101

I have a file like the following and I would like to print the lines between two given patterns PAT1 and PAT2.

1
2
PAT1
3    - first block
4
PAT2
5
6
PAT1
7    - second block
PAT2
8
9
PAT1
10    - third block

I have read How to select lines between two marker patterns which may occur multiple times with awk/sed but I am curious to see all the possible combinations of this, either including or excluding the pattern.

How can I print all lines between two patterns?

Baxie answered 16/8, 2016 at 10:40 Comment(5)

I am posting an attempt of canonical answer to How to select lines between two marker patterns which may occur multiple times with awk/sed so that all cases are covered. I follow It's OK to Ask and Answer Your Own Questions and posted the answer as Community Wiki, so feel free to improve it! – Baxie 16/8, 2016 at 10:41

@Cyrus yes, thank you! I also checked this one before going ahead and posting this question/answer. The point here is to provide a set of tools on this, since the volume of comments (and votes on them) in my other answer led me to think that a generic post would be of good help to future readers. – Baxie 16/8, 2016 at 10:49

See also thelinuxrain.com/articles/how-to-use-flags-in-awk – Trebuchet 16/8, 2016 at 23:18

@fedorqui, I didn't hear back so I decided to have a go at improving the question to rank better on Google and clarifying what the scope is. Feel free to revert if you're not happy with it. – Shelli 20/4, 2019 at 12:47

@Alex not sure where my comments back were expected, but in any case thanks for the edit! It looks fine to me. Thanks for taking the time on this – Baxie 20/4, 2019 at 21:57

156

Print lines between PAT1 and PAT2

$ awk '/PAT1/,/PAT2/' file
PAT1
3    - first block
4
PAT2
PAT1
7    - second block
PAT2
PAT1
10    - third block

Or, using variables:

awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' file

How does this work?

/PAT1/ matches lines containing the text PAT1, and /PAT2/ likewise matches lines containing PAT2.
/PAT1/{flag=1} sets the flag when the text PAT1 is found in a line.
/PAT2/{flag=0} unsets the flag when the text PAT2 is found in a line.
flag is a pattern with the default action, which is to print $0: if flag equals 1, the line is printed. This prints every line from the one where PAT1 occurs up to and including the next line where PAT2 is seen. It also prints the lines from the last match of PAT1 up to the end of the file.
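To watch the flag change state, the sketch below (an illustration, not part of the original answer) recreates the sample file from the question and prints the flag value in front of every line, at the point where the bare flag pattern would be evaluated. The temporary file and the trace variable are assumptions made for the example.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 1 2 PAT1 '3    - first block' 4 PAT2 5 6 \
    PAT1 '7    - second block' PAT2 8 9 PAT1 '10    - third block' > "$sample"

# Same rule order as the flag one-liner, but instead of printing only when
# flag is set, print the flag value (0 or 1) in front of every line.
trace=$(awk '/PAT1/{flag=1} {printf "%d: %s\n", flag, $0} /PAT2/{flag=0}' "$sample")
printf '%s\n' "$trace"

rm -f "$sample"
```

Lines prefixed with 1: are the ones the one-liner prints. The PAT2 lines still show 1 because the flag is cleared only after they have been handled.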

Print lines between PAT1 and PAT2 - not including PAT1 and PAT2

$ awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' file
3    - first block
4
7    - second block
10    - third block

This uses next to skip the line that contains PAT1 in order to avoid this being printed.

This call to next can be dropped by reshuffling the blocks: awk '/PAT2/{flag=0} flag; /PAT1/{flag=1}' file.

Print lines between PAT1 and PAT2 - including PAT1

$ awk '/PAT1/{flag=1} /PAT2/{flag=0} flag' file
PAT1
3    - first block
4
PAT1
7    - second block
PAT1
10    - third block

Because flag is evaluated at the very end, each line is judged after both marker rules have run: a PAT1 line has just set the flag and is printed, while a PAT2 line has just cleared it and is not.

Print lines between PAT1 and PAT2 - including PAT2

$ awk 'flag; /PAT1/{flag=1} /PAT2/{flag=0}' file
3    - first block
4
PAT2
7    - second block
PAT2
10    - third block

Because flag is evaluated at the very beginning, each line is judged using the state left by the previous lines, so the closing pattern is printed but the starting one is not.
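The four awk variants above differ only in whether the marker lines reach the flag test. As a rough sketch, they can be folded into one shell function that takes the patterns as arguments. The function name between, its mode words and the awk variable names are hypothetical, chosen for this example only.

```shell
# between MODE START END FILE   (hypothetical helper, not from the answer)
#   MODE:       both | none | start | end  -> which marker lines to keep
#   START, END: awk regular expressions
between() {
    awk -v mode="$1" -v p1="$2" -v p2="$3" '
        $0 ~ p1 { flag = 1; if (mode == "none" || mode == "end") next }
        $0 ~ p2 { if (flag && (mode == "both" || mode == "end")) print
                  flag = 0; next }
        flag
    ' "$4"
}

# Usage on the sample input from the question.
sample=$(mktemp)
printf '%s\n' 1 2 PAT1 '3    - first block' 4 PAT2 5 6 \
    PAT1 '7    - second block' PAT2 8 9 PAT1 '10    - third block' > "$sample"
between none PAT1 PAT2 "$sample"
rm -f "$sample"
```

Passing the patterns with -v avoids splicing shell variables into the awk program. One caveat: awk processes backslash escapes in -v values, so a pattern containing a backslash needs it doubled.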

Print lines between PAT1 and PAT2 - excluding lines from the last PAT1 to the end of file if no other PAT2 occurs

This is based on a solution by Ed Morton.

awk 'flag{
        if (/PAT2/)
           {printf "%s", buf; flag=0; buf=""}
        else
            buf = buf $0 ORS
     }
     /PAT1/ {flag=1}' file

As a one-liner:

$ awk 'flag{ if (/PAT2/){printf "%s", buf; flag=0; buf=""} else buf = buf $0 ORS}; /PAT1/{flag=1}' file
3    - first block
4
7    - second block

# note the lack of third block, since no other PAT2 happens after it

This keeps all the selected lines in a buffer that starts being populated once PAT1 is found. It keeps being filled with the following lines until PAT2 is found. At that point, it prints the stored content and empties the buffer.
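A variation on the same buffering idea, sketched here as an assumption rather than taken from Ed Morton's solution: restart the buffer every time PAT1 is seen and keep the marker lines in it. This still drops an unterminated final block, includes PAT1 and PAT2 in the output, and, if PAT1 repeats before PAT2, prints only from the PAT1 closest to PAT2. The sample input is made up for the illustration.

```shell
# Made-up input: PAT1 repeats before PAT2, and the last block is never closed.
sample=$(mktemp)
printf '%s\n' 1 PAT1 'not wanted' PAT1 'wanted' PAT2 2 PAT1 'never closed' > "$sample"

complete=$(awk '
    /PAT1/         { flag = 1; buf = "" }          # (re)start at every PAT1
    flag           { buf = buf $0 ORS }            # collect markers and body
    flag && /PAT2/ { printf "%s", buf; flag = 0 }  # flush only on a closing PAT2
' "$sample")
printf '%s\n' "$complete"

rm -f "$sample"
```

Because the buffer is reset on each PAT1, anything collected after an earlier, unclosed PAT1 is discarded.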

Baxie answered 16/8, 2016 at 10:40 Comment(10)

is it shortest match ? – Energy 9/10, 2019 at 9:34

@MukulAnand it depends on the case – Baxie 9/10, 2019 at 11:5

how about if I want to print one word/column from lines in a file between patterns? here's one answer echo "n" | yum update | awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag{ print $5 }' – Straka 4/8, 2020 at 14:44

Can I do grep over this awk? Like: $ awk '/PAT1/,/PAT2/' | grep "XYZ" ? – Vacuva 12/10, 2020 at 4:21

how to do it, if lines between PART1 and PART2 are extracted only once using awk? – Percolator 26/10, 2020 at 3:26

@Percolator just pipe normally: awk '... your things ...' | awk 'this awk' – Baxie 26/10, 2020 at 6:12

I'd appreciate a solution where PAT1 may repeat before PAT2 is encountered, and all we want is what is non-greedy between the PAT1 which is closest to the PAT2. i.e. PAT1(some text we don't want)PAT1(some text we do want)PAT2 – Xantha 18/12, 2020 at 23:33

Doesn't work: awk: cmd. line:1: /<p>/{flag=1; next}/</p>/{flag=0} flag awk: cmd. line:1: ^ unterminated regexp – Whitcomb 29/1, 2023 at 4:20

@Whitcomb can you post the exact command you used? – Baxie 30/1, 2023 at 18:27

self-contained examples: inclusive: seq 10 | awk '/3/,/5/' and exclusive: seq 10 | awk '/3/{flag=1; next} /5/{flag=0} flag' and include start: seq 10 | awk '/3/{flag=1} /5/{flag=0} flag' and include end: seq 10 | awk 'flag; /3/{flag=1} /5/{flag=0}' – Floaty 7/2 at 18:5

What about the classic sed solution?

Print lines between PAT1 and PAT2 - include PAT1 and PAT2

sed -n '/PAT1/,/PAT2/p' FILE

Print lines between PAT1 and PAT2 - exclude PAT1 and PAT2

GNU sed

sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p}}' FILE

Any sed¹

sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p;};}' FILE

or even (Thanks Sundeep):

GNU sed

sed -n '/PAT1/,/PAT2/{//!p}' FILE

Any sed

sed -n '/PAT1/,/PAT2/{//!p;}' FILE

Print lines between PAT1 and PAT2 - include PAT1 but not PAT2

The following includes just the range start:

GNU sed

sed -n '/PAT1/,/PAT2/{/PAT2/!p}' FILE

Any sed

sed -n '/PAT1/,/PAT2/{/PAT2/!p;}' FILE

Print lines between PAT1 and PAT2 - include PAT2 but not PAT1

The following includes just the range end:

GNU sed

sed -n '/PAT1/,/PAT2/{/PAT1/!p}' FILE

Any sed

sed -n '/PAT1/,/PAT2/{/PAT1/!p;}' FILE

¹ Note about BSD/Mac OS X sed

A command like this here:

sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p}}' FILE

Would emit an error:

▶ sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p}}' FILE
sed: 1: "/PAT1/,/PAT2/{/PAT1/!{/ ...": extra characters at the end of p command

For this reason this answer has been edited to include BSD and GNU versions of the one-liners.
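One way to sidestep the difference, offered as a sketch: write the script with each command on its own line, so that every closing brace is preceded by a newline. That form should be accepted by both GNU and BSD sed. The temporary file and the inner variable are assumptions made for the example.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 1 2 PAT1 '3    - first block' 4 PAT2 5 6 \
    PAT1 '7    - second block' PAT2 8 9 PAT1 '10    - third block' > "$sample"

# Exclude both markers; one command per line instead of a one-liner.
inner=$(sed -n '/PAT1/,/PAT2/{
    /PAT1/!{
        /PAT2/!p
    }
}' "$sample")
printf '%s\n' "$inner"

rm -f "$sample"
```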

Lenna answered 16/8, 2016 at 14:55 Comment(12)

Hey, the classic is even shorter! – Storeroom 16/8, 2016 at 15:15

What about the case of the starting line also matching the end pattern (but perhaps not vice-versa)? That would break your 3rd case at least. – Aeolipile 8/1, 2017 at 12:22

Then the start and end pattern is not well chosen or the regex need to be more precise. – Lenna 8/1, 2017 at 13:12

not sure about other versions, but with GNU sed, the first one can be simplified to sed -n '/PAT1/,/PAT2/{//!p}' file ... from manual empty regular expression ‘//’ repeats the last regular expression match – Idea 20/6, 2017 at 9:42

@Idea Thanks for the hint. POSIX says:

If an RE is empty (that is, no pattern is specified) sed shall behave as if the last RE used in the last command applied (either as an address or as part of a substitute command) was specified.

Looks like the only remaining question here is how to interpret the last RE. BSD is saying something to this. Look here (Point 23): github.com/freebsd/freebsd/blob/master/usr.bin/sed/POSIX – Lenna 20/6, 2017 at 13:4

@Lenna thanks for additional info... so if I understood correctly, /PAT1/,/PAT2/{//!p} will work only if last RE is dynamic.. if it was static, // would resolve to /PAT2/ – Idea 20/6, 2017 at 13:12

Looks like. Hard to find an incompatible version to prove that. :) – Lenna 20/6, 2017 at 13:16

Note there is a new answer suggesting improvements to this one. – Baxie 18/4, 2019 at 5:48

@fedorqui, there's my best go at it. – Shelli 18/4, 2019 at 13:54

@AlexHarvey I think it is a great example of kindness what you did here, by sharing your knowledge to improve other answers. Ultimately, this was my goal when I posted this question, so we could have a canonical (yet another one :P) set of sources. Many thanks! – Baxie 18/4, 2019 at 14:0

@AlexHarvey Let me share my view on this: I once answered How to select lines between two marker patterns which may occur multiple times... and kept getting quite a lot of comments asking for similar cases. Also, when being active in these tags I felt that I was reusing the same one-liners over and over again. For this I thought that a question-answer covering most of the cases could be useful. +25 stars, +30 votes, ~30K visits, lots of duplicates to this seem to agree with this. Of course it is not comprehensive but it seems to be working well. – Baxie 18/4, 2019 at 21:42

If you compose the sed command from variables, then you must create it from a combination of parts, some in single quotes, and some in double quotes. Something like this: sed -n "/$pattern1/,/$pattern2/"'{//!p}'. The bash shell will not expand the variables if they are in single quotes. But if you put the whole command in double quotes, bash will interpret ! as a history command, and will expand it. So that part of the sed command must be in single quotes. – Shea 6/6, 2023 at 13:41

Using grep with PCRE (where available) to print markers and lines between markers:

$ grep -Pzo "(?s)(PAT1(.*?)(PAT2|\Z))" file
PAT1
3    - first block
4
PAT2
PAT1
7    - second block
PAT2
PAT1
10    - third block

-P perl-regexp, PCRE. Not in all grep variants
-z Treat the input as a set of lines, each terminated by a zero byte instead of a newline
-o Print only the matching part
(?s) DotAll, i.e. dot matches newlines as well
(.*?) Non-greedy match
\Z Match only at end of string, or before newline at the end
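One practical detail when using -z: each match is written with a trailing zero byte rather than a newline. The sketch below, an illustration that assumes a grep built with PCRE support, converts those bytes with tr so the blocks show up as ordinary lines. The probe for -P and the blocks variable are assumptions made for the example.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 1 2 PAT1 '3    - first block' 4 PAT2 5 6 \
    PAT1 '7    - second block' PAT2 8 9 PAT1 '10    - third block' > "$sample"

# -P is not available in every grep, so probe for it first.
if printf 'x\n' | grep -Pq 'x' 2>/dev/null; then
    # Each match ends with a NUL byte because of -z; turn those into newlines.
    blocks=$(grep -Pzo '(?s)(PAT1(.*?)(PAT2|\Z))' "$sample" | tr '\0' '\n')
    printf '%s\n' "$blocks"
else
    blocks=unsupported
    echo 'this grep does not support -P' >&2
fi

rm -f "$sample"
```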

Print lines between markers excluding end marker:

$ grep -Pzo "(?s)(PAT1(.*?)(?=(\nPAT2|\Z)))" file
PAT1
3    - first block
4
PAT1
7    - second block
PAT1
10    - third block

(.*?)(?=(\nPAT2|\Z)) nongreedy find with lookahead for \nPAT2 and \Z

Print lines between markers excluding markers:

$ grep -Pzo "(?s)((?<=PAT1\n)(.*?)(?=(\nPAT2|\Z)))" file
3    - first block
4
7    - second block
10    - third block

(?<=PAT1\n) positive lookbehind for PAT1\n

Print lines between markers excluding start marker:

$ grep -Pzo "(?s)((?<=PAT1\n)(.*?)(PAT2|\Z))" file
3    - first block
4
PAT2
7    - second block
PAT2
10    - third block

Kolva answered 16/8, 2016 at 13:10 Comment(1)

Liking this because it works with regex patterns. The chosen solution did not. – Egyptian 23/8, 2023 at 11:21

For completeness, here is a Perl solution:

Print lines between PAT1 and PAT2 - include PAT1 and PAT2

perl -ne '/PAT1/../PAT2/ and print' FILE

or:

perl -ne 'print if /PAT1/../PAT2/' FILE

Print lines between PAT1 and PAT2 - exclude PAT1 and PAT2

perl -ne '/PAT1/../PAT2/ and !/PAT1/ and !/PAT2/ and print' FILE

or:

perl -ne 'if (/PAT1/../PAT2/) {print unless /PAT1/ or /PAT2/}' FILE

Print lines between PAT1 and PAT2 - exclude PAT1 only

perl -ne '/PAT1/../PAT2/ and !/PAT1/ and print' FILE

Print lines between PAT1 and PAT2 - exclude PAT2 only

perl -ne '/PAT1/../PAT2/ and !/PAT2/ and print' FILE

sedTester.sh

for i in `seq 10000`;do sed -n '/PAT1/,/PAT2/{/PAT1/!{/PAT2/!p;};}' patternTester >> sedTesterOutput; done

awkTester.sh

 for i in `seq 10000`;do awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' patternTester >> awkTesterOutput; done

Here are the results:

zsh sedTester.sh  11.89s user 39.63s system 81% cpu 1:02.96 total
zsh awkTester.sh  38.73s user 60.64s system 79% cpu 2:04.83 total

The sed solution seems to be about twice as fast as the awk solution (Mac OS).

Dolmen answered 26/10, 2019 at 6:29 Comment(0)

You can do what you want with sed by suppressing the normal printing of pattern space with -n. For instance to include the patterns in the result you can do:

$ sed -n '/PAT1/,/PAT2/p' filename
PAT1
3    - first block
4
PAT2
PAT1
7    - second block
PAT2
PAT1
10    - third block

To exclude the patterns and just print what is between them:

$ sed -n '/PAT1/,/PAT2/{/PAT1/{n};/PAT2/{d};p}' filename
3    - first block
4
7    - second block
10    - third block

Which breaks down as

sed -n '/PAT1/,/PAT2/ - suppress automatic printing (-n) and select the range from PAT1 to PAT2;
/PAT1/{n}; - if the line matches PAT1, move on to the next (n) line without printing it;
/PAT2/{d}; - if the line matches PAT2, delete it;
p - print every line within /PAT1/,/PAT2/ that was not skipped or deleted.

Storeroom answered 16/8, 2016 at 15:10 Comment(2)

Thanks for the interesting one-liners and its breakdown! I have to admit I still prefer awk, it looks clearer to me :) – Baxie 16/8, 2016 at 15:17

I got done sorting through this one only to find hek2mgl had a shorter way -- take a look at his classic sed solution. – Storeroom 16/8, 2016 at 15:19

This might work for you (GNU sed) on the proviso that PAT1 and PAT2 are on separate lines:

sed -n '/PAT1/{:a;N;/PAT2/!ba;p}' file

Turn off implicit printing by using the -n option and act like grep.

N.B. All solutions using the range idiom i.e. /PAT1/,/PAT2/ command suffer from the same edge case, where PAT1 exists but PAT2 does not and therefore will print from PAT1 to the end of the file.

For completeness:

# PAT1 to PAT2 without PAT1
sed -n '/PAT1/{:a;N;/PAT2/!ba;s/^[^\n]*\n//p}' file 

# PAT1 to PAT2 without PAT2
sed -n '/PAT1/{:a;N;/PAT2/!ba;s/\n[^\n]*$//p}' file 

# PAT1 to PAT2 without PAT1 and PAT2   
sed -n '/PAT1/{:a;N;/PAT2/!ba;/\n.*\n/!d;s/^[^\n]*\n\|\n[^\n]*$//gp}' file

N.B. In the last solution PAT1 and PAT2 may be on consecutive lines and therefore a further edge case may arise. IMO both are deleted and nothing printed.

Strick answered 23/11, 2020 at 14:50 Comment(0)
