sed - Get only the replaced string from a multiline input & omit unmatched lines!
D

4

25

I want sed to omit all non-matching lines and output only the replaced string (from the one or more lines that match).

In other words: I've got a haystack and want only the needle returned, not all the hay that was searched and left unaltered.

Or, put differently again: search and replace a regex-described string in a multi-line string, and get only that string returned (as is possible with the PHP function preg_replace, http://www.php.net/manual/en/function.preg-replace.php).

My current workaround is to first filter with grep, and then pipe only the matching lines into sed for replacement:

echo -e "Bla\nBla\nImportant1: One \nBla\nImportant2: Two\nBla\nBla" | egrep "^Important1.*$" | sed -E "s/^Important1: *\b(.*)\b */\1/g"
# From the multi-line input I want only "One" (with leading/trailing whitespace removed, hence matching the word boundaries with "\b")
# And I want no "Bla" lines in the result!

But I'd like a single solution within sed. Or is this outside sed's intended usage, so that I should use something else? By the way, the question "multiline sed using backreferences" seemed somehow related, but I am not sure!

Digitalism answered 4/4, 2011 at 21:3 Comment(0)
T
20

The following solution has been tested on both Mac and Linux.

You can use sed like this:

echo -e "Bla\nBla\nImportant1: One \nBla\nImportant2: Two\nBla\nBla" |
   sed -n 's/^Important1: *\([^ ]*\) */\1/p'

OUTPUT:

One

Explanation

sed -n 's/^Important1: *\([^ ]*\) */\1/p'

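-n    # quiet / silent: suppress the automatic printing of every line

s/^Important1: *\([^ ]*\) */\1/p
#  s/.../\1/   replace "Important1: One " with the 1st group, i.e. "One"
#  p           flag on the s command: print the line only if a substitution was made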
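To see why both -n and the p flag matter, here is a minimal sketch that runs the same substitution three ways. It uses printf instead of echo -e (an assumption made for portability), and the variable names are purely illustrative.

```shell
# Sample input: one matching line between two non-matching ones.
input=$(printf 'Bla\nImportant1: One \nBla')

# Without -n: sed prints every line, substituted or not.
all=$(printf '%s\n' "$input" | sed 's/^Important1: *\([^ ]*\) */\1/')

# With -n and the p flag: only lines where the substitution succeeded.
only=$(printf '%s\n' "$input" | sed -n 's/^Important1: *\([^ ]*\) */\1/p')

# With -n and p as a separate command: every line is printed again.
separate=$(printf '%s\n' "$input" | sed -n -e 's/^Important1: *\([^ ]*\) */\1/' -e 'p')

printf 'all:\n%s\n\nonly:\n%s\n\nseparate:\n%s\n' "$all" "$only" "$separate"
```

Only the second form drops the "Bla" lines, because p used as a flag of s fires only when a substitution was actually made.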
Thorough answered 4/4, 2011 at 21:26 Comment(9)
This solution also works! Could you please explain how and why? Thanks! - Digitalism
@Digitalism -1 to this answer. The h;g; is completely unnecessary and can be removed, and you get the same answer (mine)! It's akin to X = Y + 1 - 1; just write X = Y! In fact, why stop at one pair of h;g;? Why not sed -nE "/^Important1: /{h;g;h;g;h;g;h;g;h;g;s/^Important1: *\b(.*)\b *$/\1/;p}", which also gives the same answer. - Galvanism
@SiegeX: Thanks for pointing out the extra h;g; in the sed command of my answer. However, if you try the other solutions, they simply DON'T WORK under Mac OS, but my edited answer above works on both Mac and Linux. - Thorough
@anubhava: -E must be some sort of Mac extension because it's not part of POSIX sed. In either case, you don't need -E if you just escape the parentheses and use single quotes, not double quotes, to surround the expression (always use single quotes). So in your completely revised answer it would be sed -n 's/^Important1: *\([^ ]*\) */\1/p'. But now we are back to Christian's answer. - Galvanism
@SiegeX: Thanks a lot, edited the answer to remove -E as well. - Thorough
@anubhava: You can also remove the backslash before the space in the character class. Space is not special in a regex, so it doesn't need escaping. In the future (or now if you want), it is much better to upvote others' answers that got there first than to duplicate them in your own. Your revised, revised answer is now 99% the same as Christian's. - Galvanism
@SiegeX: I have removed the extra \; however, if you notice, the other solutions still don't work on Mac OS. - Thorough
@SiegeX: No, -E is not a Mac extension. It is a GNU extension, but now also part of POSIX. -E stands for Extended Regular Expression. In ERE, parentheses are special by default, and backslashes suppress their grouping and capturing effects instead of enabling them. austingroupbugs.net/view.php?id=528 - Underbred
Wow, looks like lots of work went into exactly the problem I was having (and what I suspect is quite a common problem). Thanks, everyone, for persisting and finding a solution, and making it so easy for me, many years later. ☺ - Fassold
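The BRE versus ERE point from the comments above can be checked with a small sketch. It assumes a sed that accepts -E (current GNU and BSD versions do); the variable names are illustrative only.

```shell
# Sample input: one matching line between two non-matching ones.
input=$(printf 'Bla\nImportant1: One \nBla')

# Basic regular expressions (default): groups are written \( ... \).
bre=$(printf '%s\n' "$input" | sed -n 's/^Important1: *\([^ ]*\) */\1/p')

# Extended regular expressions (-E): groups are written ( ... ).
ere=$(printf '%s\n' "$input" | sed -n -E 's/^Important1: *([^ ]*) */\1/p')

printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
```

Both forms capture the same text; mixing them up (unescaped parentheses without -E) leaves \1 undefined, which is the kind of error reported elsewhere in this thread.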
G
7
sed -n '/^Important1.*$/s/^Important1: *\b\(.*\)\b */\1/p'

Proof of Concept

$ echo -e "Bla\nBla\nImportant1: One \nBla\nImportant2: Two\nBla\nBla" | sed -n '/^Important1.*$/s/^Important1: *\b\(.*\)\b */\1/p'
One
Galvanism answered 4/4, 2011 at 21:9 Comment(6)
As I just noted, you can skip the match before the substitution, because the p flag on the s will only print substituted lines (those that start with Important1:). - Cyrenaic
@SiegeX: This works fine for me! The differences from the version by @Christian Semrau: 1) Omitted the -e argument 2) Omitted /^Important1.*$/ 3) Escaped the brackets with backslashes \( Could you elaborate on WHY it works then? Thanks! - Digitalism
This solution is correct! Could someone please vote it up, as I have too few reputation points myself. - Digitalism
@porg: if this answer solved your problem, could you please accept it by clicking the checkmark next to the vote count. - Galvanism
@porg: 1) Some (all newer?) sed versions accept the omitted -e flag, because they can infer that -e is the only sensible option before the script argument. 2) As I noted, the match is not required in this case; I don't know if there is a performance difference. 3) That was my mistake, sed requires the backslashes. - Cyrenaic
@Christian Semrau: Thanks for the explanations! - Digitalism
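The claim in the comments above, that the leading address is redundant here, can be checked with a short sketch. As an assumption for portability it swaps the \b-based pattern for the [^ ]* form used elsewhere in this thread; the variable names are illustrative.

```shell
# Sample input with two similar keys; only Important1 should be extracted.
input=$(printf 'Bla\nImportant1: One \nBla\nImportant2: Two\nBla')

# With an address: the s command runs only on lines matching /^Important1/.
with_address=$(printf '%s\n' "$input" |
  sed -n '/^Important1/s/^Important1: *\([^ ]*\) */\1/p')

# Without the address: s is tried on every line, but p still prints
# only the lines where the substitution succeeded.
without_address=$(printf '%s\n' "$input" |
  sed -n 's/^Important1: *\([^ ]*\) */\1/p')

printf 'with address:    %s\nwithout address: %s\n' "$with_address" "$without_address"
```

An address would only change the result if it selected different lines than the pattern of the s command itself.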
C
7

This sed command does what your combination of egrep and sed does:

echo -e "Bla\nBla\nImportant1: One \nBla\nImportant2: Two\nBla\nBla" |
   sed -n -e "s/^Important1: *\b\(.*\)\b */\1/p"

This performs the substitution and prints only the lines on which it matched, after the substitution has been applied.

Cyrenaic answered 4/4, 2011 at 21:9 Comment(3)
Sorry, doesn't work for me! Mac OS X 10.5.8 built-in sed returns: sed: 1: "s/^Important1: *\b(.*)\ ...": \1 not defined in the RE. Mac OS X 10.5.8 with fink-installed GNU sed 4.2.1 returns: sed: -e expression #1, char 31: invalid reference \1 on `s' command's RHS - Digitalism
My mistake, sed needs to escape the grouping brackets: "s/^Important1: *\b\(.*\)\b */\1/p" (changed it in the answer). - Cyrenaic
@porg: you should change your choice and accept this answer instead. The one you chose may have a decent explanation attached to it, but the answer is just plain wrong, as I noted in my comment under that answer. When you pull out the unnecessary h;g; portion you get back my answer, and Christian's answer is better than mine because my answer has an unnecessary portion in it as well, although not as egregious as h;g; - Galvanism
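Since \b is not part of POSIX regular expressions, its support can differ between sed implementations. The following is a hedged sketch of one way to trim the value without it, using POSIX character classes; the multi-word sample value and the variable names are assumptions added for illustration.

```shell
# Sample input: the value has extra leading and trailing spaces
# and contains two words.
input=$(printf 'Bla\nImportant1:   One Two  \nBla')

# [[:space:]]* skips the leading whitespace, \(.*[^[:space:]]\) captures
# up to the last non-whitespace character, and [[:space:]]*$ drops the rest.
value=$(printf '%s\n' "$input" |
  sed -n 's/^Important1:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p')

# Brackets make any leftover whitespace visible.
printf '[%s]\n' "$value"
```

Unlike the [^ ]* form, this keeps values that contain internal spaces, but it will not match a line whose value is empty.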
K
1

In order to keep your original expression:

sed -E "s/^Important1: *\b(.*)\b */\1/g"

you can use the -n option for sed and add the p flag to the end of your s command like this:

sed -En "s/^Important1: *\b(.*)\b */\1/gp"

proof:

echo -e "Bla\nBla\nImportant1: One \nBla\nImportant2: Two\nBla\nBla" | sed -En "s/^Important1: *\b(.*)\b */\1/gp"

The s command uses the following format:

sed OPTIONS... 's/regexp/replacement/flags'

The -n or --silent option suppresses automatic printing of the pattern space.

The p flag is used to print the new pattern space if a substitution was made.
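As a small sketch of how -n and the p flag behave when several lines match, the pattern below is widened to any ImportantN key. That wider pattern is an assumption for illustration and not part of the original question.

```shell
# Sample input with two keys that both match the widened pattern.
input=$(printf 'Bla\nBla\nImportant1: One \nBla\nImportant2: Two\nBla\nBla')

# Each line where the substitution succeeds is printed once; the rest are dropped.
values=$(printf '%s\n' "$input" |
  sed -n 's/^Important[0-9]*: *\([^ ]*\) */\1/p')

printf '%s\n' "$values"
```

Because the pattern is anchored with ^, it can match at most once per line, so the g flag is harmless here but not needed.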

Kumquat answered 22/5, 2020 at 4:55 Comment(0)
