using captured groups in str_replace / stri_replace - stringi vs stringr [duplicate]
Most stringr functions are just wrappers around corresponding stringi functions. str_replace_all is one of those. Yet my code does not work with stri_replace_all, the corresponding stringi function.

I am writing a quick regex to convert (a subset of) camel case to spaced words.

I am quite puzzled as to why this works:

str <- "thisIsCamelCase aintIt"
stringr::str_replace_all(str, 
                         pattern="(?<=[a-z])([A-Z])", 
                         replacement=" \\1")
# "this Is Camel Case aint It"

And this does not:

stringi::stri_replace_all(str, 
                          regex="(?<=[a-z])([A-Z])", 
                          replacement=" \\1")
# "this 1s 1amel 1ase aint 1t"
Housewares answered 19/8, 2016 at 10:26 Comment(1)
One option would be stri_replace_all(str, regex = "(?<=[a-z])(?=[A-Z])", replacement = " ") (comment by Wulf)

If you look at the source of stringr::str_replace_all, you'll see that it calls fix_replacement(replacement) to convert \\# capture-group references to $#. The help for stringi::stri_replace_all, on the other hand, clearly shows that it expects $1, $2, etc. for capture groups, so the \\1 form is never translated when you call stringi directly.

str <- "thisIsCamelCase aintIt"
stri_replace_all(str, regex="(?<=[a-z])([A-Z])", replacement=" $1")
## [1] "this Is Camel Case aint It"
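To illustrate the kind of translation described above, here is a minimal sketch of a helper that rewrites \\1-style references into the $1 form. The name to_icu_refs is hypothetical, and this is a simplification rather than stringr's actual fix_replacement: it assumes single-digit group numbers and does not handle escaped backslashes or literal dollar signs in the replacement.

```r
# Hypothetical helper (not part of stringr or stringi): turn "\\1"-style
# capture-group references into the "$1" style that stringi expects.
# Assumption: single-digit group numbers, no escaped backslashes.
to_icu_refs <- function(replacement) {
  gsub("\\\\([0-9])", "$\\1", replacement)
}

converted <- to_icu_refs(" \\1")
print(converted)
# [1] " $1"

# Only runs if stringi is installed.
if (requireNamespace("stringi", quietly = TRUE)) {
  print(stringi::stri_replace_all_regex("thisIsCamelCase aintIt",
                                        "(?<=[a-z])([A-Z])",
                                        converted))
}
```

In practice it is simpler to write the replacement as " $1" directly when calling stringi; the helper only makes the difference between the two conventions explicit.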
Hageman answered 19/8, 2016 at 11:59 Comment(0)

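The option below returns the same output from both functions. It matches the zero-width position between a lowercase and an uppercase letter, so the replacement needs no capture-group reference.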

pat <- "(?<=[a-z])(?=[A-Z])"
str_replace_all(str, pat, " ")
#[1] "this Is Camel Case aint It"
stri_replace_all(str, regex=pat, " ")
#[1] "this Is Camel Case aint It"

The examples on the ?stri_replace_all help page use $1, $2 to refer to capture groups in the replacement:

stri_replace_all_regex('123|456|789', '(\\p{N}).(\\p{N})', '$2-$1')

So it works if we replace \\1 with $1:

stri_replace_all(str, regex = "(?<=[a-z])([A-Z])", " $1")
#[1] "this Is Camel Case aint It"
Wulf answered 19/8, 2016 at 11:1 Comment(0)
