Why does a space cause the remembered pattern in sed to output different things?

I'm trying to extract the value of the value attribute from this XML line in the terminal, so I'm using sed.

abcs='<param name="abc" value="bob3" no_but_why="4"/>'

echo "$abcs" | sed -e 's/.*value="\(.*\)" .*/\1/'
echo "$abcs" | sed -e 's/.*value="\(.*\)".*/\1/'

The output is:

bob3
bob3" no_but_why="4

Why does the second command, the one without the space, print more than just the value I wanted? Why would \1 be affected by that?

Necaise answered 11/7, 2016 at 20:32 Comment(0)

The difference comes from the greedy .* inside the capture group. Both patterns use it; what changes is what has to follow the group.

In the second regex the group only has to be followed by a double quote. There is another double quote after no_but_why="4, and because .* is greedy it matches as far as it can, up to the last " before />. So \1 becomes bob3" no_but_why="4.

In the first regex the group has to be followed by a double quote and then a space. After value=" the only quote followed by a space is the one right after bob3 (the last quote on the line is followed by /), so .* cannot extend any further and \1 is just bob3.
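The version with the space only works here because bob3 happens to be followed by the last quote-plus-space on the line. As a minimal sketch, using a hypothetical line with two extra attributes (x and y are made up for illustration), the same greedy behavior breaks that version too:

```shell
# Hypothetical input: same shape as the question's line, with two trailing attributes.
line='<param name="abc" value="bob3" x="1" y="2"/>'

# The greedy \(.*\) extends to the LAST quote that is followed by a space,
# which is now the quote closing x="1", not the one closing value="bob3".
greedy=$(printf '%s\n' "$line" | sed -e 's/.*value="\(.*\)" .*/\1/')

printf '%s\n' "$greedy"
# prints: bob3" x="1
```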

To avoid this, use a negated character class instead of .* inside the group.

Consider these sed command examples:

sed -e 's/.*value="\([^"]*\)" .*/\1/' <<< "$abcs"
bob3

sed -e 's/.*value="\([^"]*\)".*/\1/' <<< "$abcs"
bob3

Both commands now produce the same output, bob3, because the negated character class [^"]* cannot match a double quote: it stops at the next " instead of running to the very last " in the input, as .* does.
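As a further sketch along the same lines, the negated character class keeps working when more attributes follow the one you want, and combining -n with the p flag makes sed print nothing for lines that have no value attribute at all. Both input lines here are hypothetical.

```shell
# Hypothetical inputs: one line with extra attributes, one without a value attribute.
with_value='<param name="abc" value="bob3" x="1" y="2"/>'
without_value='<param name="abc" x="1"/>'

# [^"]* cannot cross a double quote, so the group stops at the first closing quote.
# -n suppresses the default output; the p flag prints only lines where the
# substitution actually happened.
found=$(printf '%s\n' "$with_value" | sed -n 's/.*value="\([^"]*\)".*/\1/p')
missing=$(printf '%s\n' "$without_value" | sed -n 's/.*value="\([^"]*\)".*/\1/p')

printf 'found=%s missing=%s\n' "$found" "$missing"
# prints: found=bob3 missing=
```

Without -n and p, a line that does not match would be printed unchanged, which is easy to mistake for a real value.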

Helenehelenka answered 11/7, 2016 at 20:39 Comment(2)
So what do I do to make it only match the first such occurrence? - Necaise
Ah I see, that makes sense! - Necaise
