Regular expression matching a comma bounded by non-whitespace

I am trying to replace commas bounded by non-whitespace characters with a space, while leaving other commas untouched (in R).

Imagine I have:

j<-"Abc,Abc, and c"

and I want:

"Abc Abc, and c"

This almost works:

gsub("[^ ],[^ ]"," " ,j)

But it also removes the characters on either side of the comma, giving:

"Ab bc, and c"
Montanez answered 1/3, 2017 at 12:39

You can use a PCRE regex (perl = TRUE) with a negative lookbehind and a negative lookahead:

j <- "Abc,Abc, and c"
gsub("(?<!\\s),(?!\\s)", " ", j, perl = TRUE)
## => [1] "Abc Abc, and c"


Details:

  • (?<!\\s) - there must be no whitespace immediately before the ,
  • , - a literal ,
  • (?!\\s) - there must be no whitespace immediately after the ,
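A minimal sketch wrapping the pattern in a helper; `replace_inner_commas` is a hypothetical name, not part of the answer, and the inputs are made up for illustration. Because `\\s` covers tabs and newlines as well as spaces, a comma followed by a tab is left alone, and because lookarounds do not consume characters, consecutive commas such as `x,y,z` are all replaced:

```r
# Hypothetical helper wrapping the lookaround pattern from the answer above
replace_inner_commas <- function(x) {
  # (?<!\\s): no whitespace (space, tab, newline, ...) immediately before the comma
  # (?!\\s):  no whitespace immediately after the comma
  gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)
}

inputs <- c("Abc,Abc, and c", "a,\tb", "x,y,z")
result <- replace_inner_commas(inputs)
print(result)
```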

An alternative solution is to match a , enclosed in word boundaries (\\b), i.e. a comma with a word character on each side:

j <- "Abc,Abc, and c"
gsub("\\b,\\b", " ", j)
## => [1] "Abc Abc, and c"

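A quick comparison, using inputs chosen for illustration: `\\b` only finds a boundary next to a word character (letter, digit or underscore), so the word-boundary version leaves a comma between two punctuation characters untouched, while the lookaround version replaces any comma with non-whitespace on both sides:

```r
x <- c("Abc,Abc, and c", "(a),(b)")

# Word-boundary version (default regex engine, as in the answer)
wb <- gsub("\\b,\\b", " ", x)

# Lookaround version from the first part of the answer (PCRE)
la <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)

print(wb)  # the comma between ")" and "(" is not at a word boundary
print(la)  # the same comma has non-whitespace on both sides, so it is replaced
```

Which behavior is wanted depends on whether "non-whitespace" in the question should include punctuation.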

Tiedeman answered 1/3, 2017 at 12:42
Is this functionally equivalent: "(?<=\\S),(?=\\S)"? – Conjuncture
No. Negative lookarounds are not equivalent to positive ones, because a positive lookaround requires the pattern to be present. The difference usually shows at the start or end of the string: (?<=\S) requires a non-whitespace character before the comma, so there can be no match at the start of the string, whereas (?<!\s) only says there must not be a whitespace character before it, so the start of the string is allowed. – Illyes
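A small check of the difference described in the comment, using made-up inputs with a comma at the very start and at the very end of the string:

```r
x <- c(",abc", "abc,", "Abc,Abc, and c")

# Negative lookarounds: "no whitespace" is also satisfied at the string edges
neg <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)

# Positive lookarounds: a non-whitespace character must actually be present
pos <- gsub("(?<=\\S),(?=\\S)", " ", x, perl = TRUE)

print(neg)
print(pos)
```

Both give the same result on the question's string; they differ only for the edge commas.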

You can use backreferences like this:

gsub("([^ ]),([^ ])","\\1 \\2" ,j)
[1] "Abc Abc, and c"

The parentheses in the regular expression capture the characters adjacent to the comma, and \\1 and \\2 in the replacement insert those captured values back, in the order they were captured.
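One caveat, sketched with an illustrative input: each match consumes the characters on both sides of the comma, so when two commas are only one character apart the second one is skipped. The lookaround pattern from the first answer does not consume those characters:

```r
x <- "a,b,c"

# Backreference version: the match "a,b" consumes the "b",
# so the search resumes at ",c" and the second comma is never matched
backref <- gsub("([^ ]),([^ ])", "\\1 \\2", x)

# Lookaround version: only the comma itself is consumed
look <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)

print(backref)
print(look)
```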

Hydrosome answered 1/3, 2017 at 12:41

We can try a lookahead on its own:

gsub(",(?=[^ ])", " ", j, perl = TRUE)
#[1] "Abc Abc, and c"
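Note that this pattern only checks the character after the comma. A sketch with an illustrative input where the comma follows a space: the lookahead-only version still replaces it, while the two-sided lookaround version leaves it alone:

```r
x <- c("Abc,Abc, and c", "Abc ,Abc")

# Only the character after the comma is checked
ahead_only <- gsub(",(?=[^ ])", " ", x, perl = TRUE)

# Both sides are checked
both_sides <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)

print(ahead_only)  # second element ends up with two consecutive spaces
print(both_sides)  # second element is unchanged
```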
Dossal answered 1/3, 2017 at 12:41

This also seems to work, using stringr:

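library("stringr")
j <- "Abc,Abc, and c"
# str_replace() replaces only the first match; str_replace_all() replaces every match
str_replace_all(j, "(\\w+),(\\w+)", "\\1 \\2")
## => [1] "Abc Abc, and c"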
Vivacity answered 1/3, 2017 at 12:53