Tola, resurrecting this question because it had a fairly simple regex solution that wasn't mentioned. This problem is a classic case of the technique explained in this question to "regex-match a pattern, excluding..."
The idea is to build an alternation (a series of |) where the left sides match what we don't want in order to get it out of the way... then the last side of the | matches what we do want, and captures it to Group 1. If Group 1 is set, you retrieve it and you have a match.
So what do we not want?
First, we want to eliminate the whole outer block if there is unwanted between outer-start and inner-start. You can do it with:
outer-start(?:(?!inner-start).)*?unwanted.*?outer-end
This will be to the left of the first |. It matches a whole outer block.
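To see this first branch on its own, here is a minimal Python sketch. The three sample strings are made up for illustration (they are not from the original question), and re.DOTALL stands in for the (?s) flag used further down.

```python
import re

# First branch only: an outer block with "unwanted" before inner-start.
# re.DOTALL lets "." cross newlines, like the (?s) flag.
FIRST_BRANCH = re.compile(
    r"outer-start(?:(?!inner-start).)*?unwanted.*?outer-end", re.DOTALL
)

# Hypothetical sample blocks, invented for this sketch.
bad = "outer-start unwanted inner-start text-that-i-want inner-end outer-end"
good = "outer-start filler inner-start text-that-i-want inner-end outer-end"
late = "outer-start inner-start text-that-i-want inner-end unwanted outer-end"

for name, block in (("bad", bad), ("good", good), ("late", late)):
    match = FIRST_BRANCH.search(block)
    print(name, "->", "whole block consumed" if match else "no match")
```

The "late" block is not matched here because (?:(?!inner-start).)*? refuses to step past inner-start, so an unwanted that comes later is left for the second branch.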
Second, we want to eliminate the whole outer block if there is unwanted between inner-end and outer-end. You can do it with:
outer-start(?:(?!outer-end).)*?inner-end(?:(?!outer-end).)*?unwanted.*?outer-end
This will be the middle |. It looks a bit complicated because we want to make sure that the "lazy" *? does not jump over the end of a block into a different block.
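A quick way to convince yourself that the (?!outer-end) guard matters is to compare it with a plain lazy .*? version. This is only a sketch: NAIVE is a deliberately weaker pattern written for the comparison (it is not part of the solution), and the two sample blocks are invented.

```python
import re

# Guarded version: the second branch from the answer.
GUARDED = re.compile(
    r"outer-start(?:(?!outer-end).)*?inner-end(?:(?!outer-end).)*?unwanted.*?outer-end",
    re.DOTALL,
)
# Naive version (for comparison only): plain lazy dots, no guard.
NAIVE = re.compile(r"outer-start.*?inner-end.*?unwanted.*?outer-end", re.DOTALL)

# Hypothetical input: a clean block followed by a block we want to discard.
block_clean = "outer-start inner-start text-that-i-want inner-end outer-end"
block_bad = "outer-start inner-start text-that-i-want inner-end unwanted outer-end"
text = block_clean + " " + block_bad

# The naive pattern starts in the clean block and runs on into the next one.
print(NAIVE.search(text).group(0) == text)
# The guarded pattern only consumes the block that really contains "unwanted".
print(GUARDED.search(text).group(0) == block_bad)
```

Without the guard, the lazy quantifier happily walks past the first outer-end looking for unwanted, and the clean block is thrown away together with the bad one.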
Third, we match and capture what we want. This is:
inner-start\s*(text-that-i-want)\s*inner-end
So the whole regex, in free-spacing mode, is:
(?xs)
outer-start(?:(?!inner-start).)*?unwanted.*?outer-end # don't want this
| # OR (also don't want that)
outer-start(?:(?!outer-end).)*?inner-end(?:(?!outer-end).)*?unwanted.*?outer-end
| # OR capture what we want
inner-start\s*(text-that-i-want)\s*inner-end
On this demo, look at the Group 1 captures on the right: It contains what we want, and only for the right block.
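If you want to try the Group 1 check in code, here is a small Python sketch. Assumptions: the flags re.VERBOSE | re.DOTALL play the role of the inline (?xs), the helper name wanted_spans is made up, and the three-block sample is invented for illustration.

```python
import re

# Same three-way alternation; re.VERBOSE | re.DOTALL replace the inline (?xs).
PATTERN = re.compile(
    r"""
    outer-start(?:(?!inner-start).)*?unwanted.*?outer-end   # don't want this
    |                                                        # OR (also don't want that)
    outer-start(?:(?!outer-end).)*?inner-end(?:(?!outer-end).)*?unwanted.*?outer-end
    |                                                        # OR capture what we want
    inner-start\s*(text-that-i-want)\s*inner-end
    """,
    re.VERBOSE | re.DOTALL,
)

# Hypothetical sample: two blocks to discard, then one to keep.
SAMPLE = """\
outer-start unwanted inner-start text-that-i-want inner-end outer-end
outer-start inner-start text-that-i-want inner-end unwanted outer-end
outer-start filler inner-start text-that-i-want inner-end filler outer-end
"""


def wanted_spans(text):
    """Return (start, end) of every Group 1 capture, ignoring the other matches."""
    return [m.span(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


for start, end in wanted_spans(SAMPLE):
    print(start, end, SAMPLE[start:end])
```

The regex matches three times on this sample, but the first two matches are the "get it out of the way" branches and leave Group 1 unset, so only the capture from the last block is kept.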
In Perl and PCRE (used for instance in PHP), you don't even have to look at Group 1: you can force the regex to skip the two blocks we don't want. The regex becomes:
(?xs)
(?: # non-capture group: the things we don't want
outer-start(?:(?!inner-start).)*?unwanted.*?outer-end # don't want this
| # OR (also don't want that)
outer-start(?:(?!outer-end).)*?inner-end(?:(?!outer-end).)*?unwanted.*?outer-end
)
(*SKIP)(*F) # we don't want this, so fail and skip
| # OR capture what we want
inner-start\s*\Ktext-that-i-want(?=\s*inner-end)
See demo: it directly matches what you want.
The technique is explained in full detail in the question and article below.
Reference