How do you specify non-capturing groups in sed?
Z

4

69

Is it possible to specify non-capturing groups in sed?

If so, how?

Parentheses in sed have two functions: grouping and capturing.

So I'm asking about using parentheses to do the grouping, but without capturing: non-capturing grouping parentheses that aren't literal, what are called non-capturing groups. I've seen the syntax (?:regex) for non-capturing groups, but it doesn't work in sed.

Linguistic note: in the UK, the term brackets is used generally, for "round brackets" or "square brackets". In the UK, brackets usually refers to "( )", since "( )" are so common, and the term parentheses is hardly used. In the USA, the term brackets refers specifically to "[ ]". So, to avoid confusing anybody in the USA, I've not used the word brackets in the question.

Zahn answered 28/1, 2011 at 1:19 Comment(3)
I am / was aware of the meaning alternating between literal and grouping-capturing, based on whether they are escaped or not, and that whether -r or not reverses it.Zahn
No sed that I know of supports non-capturing groups. However, Perl is readily available, and is probably the right answer if you really need them.Manifestative
@TobySpeight yeah you're rightZahn
Z
35

The answer is that, as of writing, you can't: sed does not support it.

Non-capturing groups have the syntax (?:a) and are PCRE syntax.

Sed supports BRE (basic regular expressions, aka POSIX BRE), and GNU sed has the option -r that makes it support ERE (extended regular expressions, aka POSIX ERE), but still not PCRE.

Perl will work, on Windows or Linux.

Examples here:

https://superuser.com/questions/416419/perl-for-matching-with-regular-expressions-in-terminal

E.g. this, from Cygwin on Windows:

$ echo -e 'abcd' | perl -0777 -pe 's/(a)(?:b)(c)(d)/\1/s'
a

$ echo -e 'abcd' | perl -0777 -pe 's/(a)(?:b)(c)(d)/\2/s'
c

There is a program, albeit for Windows, which can do search and replace on the command line and does support PCRE. It's called rxrepl. It's not sed, of course, but it does search and replace with PCRE support.

C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(c)" -r "\1"
a

C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(c)" -r "\3"
c

C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(?:c)" -r "\3"
Invalid match group requested.

C:\blah\rxrepl>echo abc | rxrepl -s "(a)(?:b)(c)" -r "\2"
c

C:\blah\rxrepl>

The author (not me) mentioned his program in an answer over here: https://superuser.com/questions/339118/regex-replace-from-command-line

It has a really good syntax.

The standard thing to use would be perl, or almost any other programming language that people use.

Zahn answered 11/4, 2016 at 10:39 Comment(1)
Not sure why you'd add a link to a windows-only answer to a question tagged Linux, but whatever floats your boat, I guess ... ;)Leonleona
T
36

Parentheses can be used for grouping alternatives. For example:

sed 's/a\(bc\|de\)f/X/'

says to replace "abcf" or "adef" with "X", but the parentheses also capture. (The \| alternation in basic syntax is a GNU extension.) There is no facility in sed to do such grouping without also capturing. If you have a complex regex that does both alternative grouping and capturing, you simply have to be careful to select the correct capture group in your replacement.
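
As a minimal sketch of that (the input string is made up, and -E is used because GNU sed accepts it as a synonym for -r), the alternation group below takes number \1 even though only the digits are wanted, so the replacement has to use \2:

```shell
# Group 1 exists only to hold the alternation; group 2 is the part we want.
out=$(echo 'adef-42' | sed -E 's/a(bc|de)f-([0-9]+)/\2/')
echo "$out"
```

Adding or removing a grouping pair earlier in the pattern shifts the numbers of every group after it, which is the main thing to watch for.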

Perhaps you could say more about what it is you're trying to accomplish (what your need for non-capturing groups is) and why you want to avoid capture groups.

Edit:

There is a type of non-capturing bracket, (?:pattern), that is part of Perl-Compatible Regular Expressions (PCRE). It is not supported in sed (but is when using grep -P).

Togs answered 28/1, 2011 at 5:16 Comment(3)
It was just me being old-fashioned, wanting on principle to be able to reduce overhead with non-capturing brackets where capturing wasn't necessary, and also wanting to know whether sed can do it and how, just to know. I did go through a regex quiz months ago that insisted on non-capturing brackets when capturing wasn't necessary (it didn't use sed, though). It made the regex look messier, since (?: is not as easy on the eyes, but I suppose it was meant to instil what the quiz's author believed, probably rightly, to be good habits.Zahn
@barlop: Ah! Now I understand what you're getting at. The (?:) style non-capturing brackets is part of Perl-Compatible Regular Expressions (PCRE) which is not supported in sed (but is in grep -P).Togs
@Zahn you should switch your answer to Dennis'. Although mine eventually answered your question, it did so in a much round about way.Elyn
E
7

I'll assume you are speaking of the backreference syntax, which uses parentheses ( ), not brackets [ ].

By default, sed interprets ( ) literally and does not attempt to make a backreference from them. You need to escape them, as in \( \), to make them special. Only when you use the GNU sed -r option is the escaping reversed: with sed -r, unescaped ( ) produce backreferences and escaped \( \) are treated as literal. Examples follow:

POSIX sed

$ echo "foo(###)bar" | sed 's/foo(.*)bar/@@@@/'
@@@@

$ echo "foo(###)bar" | sed 's/foo(.*)bar/\1/'
sed: -e expression #1, char 16: invalid reference \1 on `s' command's RHS
-bash: echo: write error: Broken pipe

$ echo "foo(###)bar" | sed 's/foo\(.*\)bar/\1/'
(###)

GNU sed -r

$ echo "foo(###)bar" | sed -r 's/foo(.*)bar/@@@@/'
@@@@

$ echo "foo(###)bar" | sed -r 's/foo(.*)bar/\1/'
(###)

$ echo "foo(###)bar" | sed -r 's/foo\(.*\)bar/\1/'
sed: -e expression #1, char 18: invalid reference \1 on `s' command's RHS
-bash: echo: write error: Broken pipe

Update

From the comments:

Group-only, non-capturing parentheses ( ), which would let you use something like intervals {n,m} without creating a backreference \1, don't exist. First, the unescaped interval syntax {n,m} is not part of the basic syntax that sed uses by default (there it is written \{n,m\}); you must use the GNU -r extension to write it this way. As soon as you enable -r, any grouping parentheses will also be capturing for backreference use. Examples:

$ echo "123.456.789" | sed -r 's/([0-9]{3}\.){2}/###/'
###789

$ echo "123.456.789" | sed -r 's/([0-9]{3}\.){2}/###\1/'
###456.789
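
For comparison, a sketch assuming GNU sed (other seds should behave the same for this basic syntax): without -r, the same grouping and interval can be written by escaping the parentheses and braces, and the group still captures.

```shell
# Basic (non -r) syntax: \( \) is the group and \{2\} / \{3\} are intervals.
out1=$(echo "123.456.789" | sed 's/\([0-9]\{3\}\.\)\{2\}/###/')
# \1 holds the text matched by the last repetition of the group ("456.").
out2=$(echo "123.456.789" | sed 's/\([0-9]\{3\}\.\)\{2\}/###\1/')
echo "$out1"
echo "$out2"
```

Either way, there is no spelling in which the parentheses group without also capturing.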
Elyn answered 28/1, 2011 at 1:33 Comment(11)
thanks, but what about non-literal grouping non-capturing? and where does it say -r is GNU and default(-e?) is POSIX? I figured default(-e?) is BRE, and -r is ERE. And I guessed both are POSIX with GNU.Zahn
Also, regarding your comment on my terminology.. I think the term brackets could mean round or square or curly, and isn't just square as you say.Zahn
the -r option is a GNU extension. It is not part of the POSIX sed syntax. The -e flag allows you to specify more than one sed script inline, it doesn't have anything to do with BRE vs ERE. For example echo "foo" | sed -e 's/foo/bar/' -e 's/bar/baz/'Elyn
I appreciate your answer, it has interesting info but when I said non-capturing brackets, I meant also not literal brackets, I meant non-capturing grouping brackets.Zahn
@Zahn RE: bracket terminology. I'm not going to say calling ( ) "brackets" is for sure wrong, but I will say that by and large the common terminology is as follows: Brackets [], Parentheses (), Braces {}. Wikipedia seems to back this up as well (see the far right image half way down).Elyn
thanks.. Is sed without -r, POSIX-GNU BRE, and sed with -r POSIX-GNU ERE ? Without -r doesn't have intervals but does have * which could use grouping brackets.Zahn
POSIX-GNU BRE does exist.. what you say suggests that sed doesn't use it.. something must.Zahn
OK. Your answer's examples weren't necessary, because I am familiar with escaping brackets, and it wasn't what I wanted, because either way they weren't grouping and non-capturing. The answer, then, was merely no: sed doesn't have the facility. But a "why" is good, and you've given one in your current answer and comments. However, I find your current explanations not very complete, with many glaring unanswered questions. E.g. would GNU's sed use POSIX BRE and not GNU BRE? You say BRE doesn't do intervals (OK) and hence has no need for grouping. But it does have *, and that can make use of grouping.Zahn
nevertheless, I will accept your answer that sed doesn't allow it and you had some interesting things to say in the comments / update to your question.Zahn
In some dialects of English parentheses are referred to as round brackets.Togs
@Dennis And that Wikipedia article all about brackets lists parentheses as a type of bracket. Certainly in British English we write ( ) and just call them brackets, and that Wikipedia article mentions the UK. But one must learn to speak American, so we're on the same page; it's good to know. Writing English in Britain we don't use [ ]. I dunno about America, but if we meant [ ] when we said brackets, they'd be something we'd never write with a pen. In the UK, brackets existed before computers and are, and were, something you write with a pen, so they typically mean round ones.Zahn
G
3

As said, it is not possible to have non-capturing groups in sed.

It may be obvious, but non-capturing groups are not a necessity (unless you run into the backreference limit, \9).

One can just use the desired capturing groups and ignore the undesired ones as if they were non-capturing.

So, e.g., of the two captures here, \1 and \2, you can ignore \1 and just use \2:

$ echo blahblahblahc | sed -r "s/(blah){1,10}(.)/\2/"
c
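
The limit mentioned above can be sketched as follows, assuming GNU sed: only \1 through \9 exist, so with ten groups \10 is read as \1 followed by a literal 0, and the tenth group cannot be referenced at all.

```shell
# Ten capturing groups; \10 expands to group 1 ("a") plus a literal "0",
# not to the tenth group ("j").
out=$(echo abcdefghij | sed -E 's/(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)/\10/')
echo "$out"
```

This is the one case where grouping-only parentheses would really be missed in sed, since every grouping pair uses up one of the nine numbers.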

For reference, nested capturing groups are numbered by the position-order of "(".

E.g.,

echo "apple and bananas and monkeys" | sed -r "s/((apple|banana)s?)/\1x/g"

applex and bananasx and monkeys (note: "s" in bananas, first bigger group)

vs

echo "apple and bananas and monkeys" | sed -r "s/((apple|banana)s?)/\2x/g"

applex and bananax and monkeys (note: no "s" in bananas, second smaller group)
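
The same counting rule with more groups, as a sketch (the date string is a made-up input): groups are numbered by the position of their opening parenthesis, so the outer group is \1 and the groups nested inside it are \2 and \3.

```shell
# \1 = outer group "2024-01", \2 = "2024", \3 = "01", \4 = "15"
out=$(echo "2024-01-15" | sed -E 's/(([0-9]{4})-([0-9]{2}))-([0-9]{2})/\1|\2|\3|\4/')
echo "$out"
```

The | characters in the replacement are literal separators, used here only to make the four captures easy to tell apart.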

Geoponics answered 14/3, 2018 at 17:2 Comment(6)
What you're saying is really obvious, it's really obvious that non-capturing groups are not an absolute necessity and that you can still use capturing groups but just ignore what was captured. Capturing is mainly just an added potential benefit though one that one doesn't have to use. Nevertheless, perl still offers non-capturing parens. Nobody has ever claimed that non-capturing parens serve some important function that couldn't be done without them. You're just writing here correcting an imagined misconception that nobody has.Zahn
You write "It is obvious. I didn't mean to "correct". Just added it in case anyone as forgetful as I am thinks it is needed to use perl or such for any non-capturing purpose." <-- It's more that there's no such thing as a non-capturing purpose. In the sense that a non-capturing purpose, or non-capturing, isn't necessary, / absolutely necessary / isn't (ever) a necessity. If you were to define a "non-capturing purpose" and grouping was needed then yes it would be necessary, but you're clearly not defining such a purpose in this approach/answer and neither did I in the question, nor should oneZahn
You are right. I edited the answer and just left the example commenting on how nested numbered groups are numbered, which is again quite obvious. If you think it is better to just remove the answer I will do it. ThanksGeoponics
Hector actually that part of the answer was important as otherwise it wouldn't be that clear what your point was. Also another thing that could be more clear is your example with sed, 'cos with all the backslashes used with sed I was first mainly looking at the english of your answer anyway. i'd include that part about counting, as that was what made the answer very clear. You could make the answer clearer by simplifying your sed example with sed -r to reduce backslashes.Zahn
Done. Thanks for the suggestion.Geoponics
They're not a necessity until you're bumping up against the back reference limit (e.g. \9).Togliatti
