How to use back-reference of sed replacement command correctly considering a special Regular Expression
I am learning the sed s/regexp/replacement/ command on Linux.

Here are some numbers from phone.txt:

(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(777)555-1217

I'd like to use the regular expression (which I have tested on https://www.freeformatter.com/regex-tester.html)

 (\(555\))(.*-)(.*$)

to match numbers that begin with (555). I then want to print the three captured parts of each matched number like this (shown for the number (555)555-1212):

Area code: (555) Second: 555- Third: 1212

I tried the following command:

cat phone.txt | sed 's/\(\\\(555\\\)\)\(.*-\)\(.*$)/Area code: \1 Second: \2 Third: \3/'

But the system gave me:

sed: -e expression #1, char 66: Unmatched ( or \(

The general command for all numbers was:

cat phone.txt | sed 's/\(.*)\)\(.*-\)\(.*$\)/Area code: \1 Second: \2 Third: \3/'

Source: https://www.tutorialspoint.com/unix/unix-regular-expressions.htm

But I want to run sed only on numbers that begin with (555) and include that prefix in the output through a back-reference.

Could you tell me how to write this command correctly?

Gobbler answered 30/12, 2018 at 19:32 Comment(2)
You may use sed -E 's/(\(555\))(.*-)(.*)/Area code: \1 Second: \2 Third: \3/'. With POSIX ERE syntax, you may escape parentheses as in all common online regex testers. - Dinh
When you use BRE, you don't have to escape ( to get a literal parenthesis; just use the parenthesis on its own, so ( instead of \\\(. - Musser

You are using POSIX BRE syntax in your sed command, and in such patterns, unescaped parentheses match literal parentheses. Escaped parentheses there define capturing groups.

You may use

sed -E 's/(\(555\))(.*-)(.*)/Area code: \1 Second: \2 Third: \3/'


In POSIX ERE syntax (enabled with the -E option), literal parentheses must be escaped, as in all common online regex testers, and unescaped parentheses define capturing groups.
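
As a rough side-by-side sketch of the two syntaxes, the same substitution can be written either way. The sample input is inlined with printf instead of reading phone.txt, and the function names sample, bre and ere are made up for illustration:

```shell
# Sample data inlined so the snippet is self-contained (stands in for phone.txt).
sample() { printf '%s\n' '(555)555-1212' '(666)555-1215'; }

# BRE (sed default): ( ) are literal, \( \) create capturing groups.
bre() { sed 's/\((555)\)\(.*-\)\(.*\)/Area code: \1 Second: \2 Third: \3/'; }

# ERE (sed -E): \( \) are literal, ( ) create capturing groups.
ere() { sed -E 's/(\(555\))(.*-)(.*)/Area code: \1 Second: \2 Third: \3/'; }

sample | bre
sample | ere
```

Both pipelines should reformat the (555) line and print the (666) line unchanged, since a line that does not match the pattern is passed through as-is.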

Transudate answered 30/12, 2018 at 20:22 Comment(1)
Thank you, that is the point I wanted to know. It is important to run sed with -E, -r, or --regexp-extended to enable extended regular expression syntax. - Gobbler

You can generalize by using the formatting already present in each line to pick out the first 555, the second 555 and the final 1212, without hard-coding any specific prefix in the s/find/replace/ substitution itself. You can then restrict which lines are processed by putting a matching condition before the substitution, which is where your 555, 666, etc. would go.

To include the pattern match along with the substitution, you use the following form:

sed '/pattern/s/find/replace/'

To suppress output for all lines except those that match the pattern, pass the -n option to suppress automatic printing of the pattern space, and add a p at the end of the substitute form to explicitly print the lines where the substitution was made, e.g.

sed -n '/pattern/s/find/replace/p'

Now, let's turn to your problem at hand. To limit your reformatted output to only those lines beginning with (555) you would do:

$ sed -n '/^(555)/s/^(\([^)]*\))\([^-]*\)-\(.*\)$/Area code: (\1) Second: \2- Third: \3/p' file
Area code: (555) Second: 555- Third: 1212
Area code: (555) Second: 555- Third: 1213
Area code: (555) Second: 555- Third: 1214

(note: the backreferences capture only the numbers and not the (..) or '-')

To reformat all lines, remove the -n, the /pattern/ and the final p, leaving only the basic sed 's/find/replace/' form, e.g.

$ sed 's/^(\([^)]*\))\([^-]*\)-\(.*\)$/Area code: (\1) Second: \2- Third: \3/' file
Area code: (555) Second: 555- Third: 1212
Area code: (555) Second: 555- Third: 1213
Area code: (555) Second: 555- Third: 1214
Area code: (666) Second: 555- Third: 1215
Area code: (777) Second: 555- Third: 1217
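
If the goal is instead to reformat only the (555) lines while still printing the others unchanged, the /pattern/s/find/replace/ form can be used without -n and the trailing p. A minimal sketch, assuming the sample data is inlined via printf rather than read from file (the function name reformat_555 is hypothetical):

```shell
# Reformat only lines starting with (555); every other line passes through untouched
# because sed still auto-prints the pattern space when -n is not given.
reformat_555() {
    sed '/^(555)/s/^(\([^)]*\))\([^-]*\)-\(.*\)$/Area code: (\1) Second: \2- Third: \3/'
}

printf '%s\n' '(555)555-1212' '(666)555-1215' | reformat_555
```

Here the address /^(555)/ only decides where the substitution is attempted. It does not filter the output.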

Look things over and let me know if you have further questions.

Taxiway answered 30/12, 2018 at 21:18 Comment(4)
Thank you for your detailed answer. So the regular expression ^(\([^)]*\))\([^-]*\)-\(.*\)$ is POSIX BRE syntax, and the escaped parentheses define capturing groups (thanks to @wiktor-stribiżew's answer). Am I right? - Gobbler
Yes, you are 100% correct. I prefer BRE if there is a reasonable way to form the REGEX. - Taxiway
Thank you. I would like to accept your answer, but the answer from @wiktor-stribiżew told me the point I wanted to know. Your answer worked well too. - Gobbler
No problem, it's completely up to you. Glad to help. - Taxiway
