Odd Behavior with Greedy Modifiers Inside Capture Groups

Asked 26/2, 2014 at 23:45 Answered 27/2, 2014 at 2:16

Solved regex r posix-ere

Consider the following commands:

text <- "abcdEEEEfg"

sub("c.+?E", "###", text)
# [1] "ab###EEEfg"                          <<< OKAY
sub("c(.+?)E", "###", text)
# [1] "ab###EEfg"                           <<< WEIRD
sub("c(.+?)E", "###", text, perl = TRUE)
# [1] "ab###EEEfg"                          <<< OKAY

The first does exactly what I expect: the lazy .+? stops at the first E. The second should be essentially identical to the first, since all I'm doing is adding a capturing group (which I'm not even using), yet for some reason it consumes an extra E. That said, it isn't fully greedy either (if it were, it would have consumed all the Es). Even weirder, it still matches the pattern, even though the sub result suggests the .+? piece left out EE, which can no longer be matched by the rest of the regular expression. This suggests an offset problem in computing the length of the matched sub-expression rather than in the matching itself.

The final one is exactly the same but run with PCRE, and that works as expected.

Am I missing something or is this behavior undocumented/buggy?
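One way to tell whether the match itself or only the reported length is off is to look at the match directly with regexpr and regmatches instead of going through sub. This is just a minimal sketch; the names m_tre and m_pcre are made up for illustration, and I've left the default-engine output uncommented because it may differ between R versions.

```r
text <- "abcdEEEEfg"

# Locate the match with each engine instead of replacing it.
m_tre  <- regexpr("c(.+?)E", text)               # default engine
m_pcre <- regexpr("c(.+?)E", text, perl = TRUE)  # PCRE engine

# Substring each engine reports as matched.
regmatches(text, m_tre)
regmatches(text, m_pcre)        # "cdE"

# Start position and length of the PCRE match.
as.integer(m_pcre)              # 3
attr(m_pcre, "match.length")    # 3
```

If the default engine reports a longer match.length than PCRE for the same pattern, that would be consistent with the offset explanation above.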

Adorno answered 26/2, 2014 at 23:45 Comment(2)

This smells like a bug in R. – Apothecary 27/2, 2014 at 0:1

Posted as a bug on the tre github page. – Adorno 19/6, 2014 at 13:19

R uses libtre, version 0.8. For more stability, you should always use perl = TRUE.

Note that

sub("c(.+?)E?", "###", text)

works.
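To see which variants are affected, the patterns from this thread can be run through both engines side by side. A rough sketch, assuming the same text as in the question; the variable names are only for illustration, and the default-engine column is whatever your R build produces.

```r
text <- "abcdEEEEfg"
patterns <- c("c.+?E", "c(.+?)E", "c(.+?)E?")

# Run sub() with the default engine and with PCRE for each pattern.
tre  <- vapply(patterns, function(p) sub(p, "###", text), character(1))
pcre <- vapply(patterns, function(p) sub(p, "###", text, perl = TRUE),
               character(1))

# Rows where the two engines disagree are the suspicious ones.
data.frame(pattern = patterns, tre = tre, pcre = pcre,
           agree = tre == pcre, row.names = NULL)
```

Under PCRE all three patterns should give "ab###EEEfg", so any row with agree equal to FALSE points at the default engine.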

Dayna answered 27/2, 2014 at 2:16 Comment(4)

This is what I've always done, but there are some things not implemented with the perl = T flag (regexec in particular). My actual bug had come up while trying to use regexec (or more specifically, the str_match_all/etc. tools in stringr that rely on it) and I was similarly able to work around it by adding .* after the pattern, though for the sub example it obviously doesn't work. If no one else has more info by the morning I'll take this as the answer. Do you know if there are any plans to update the library? Looks like 0.8 has been around for 4 years. – Adorno 27/2, 2014 at 2:30
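For anyone who needs the capture group itself and wants to stay on PCRE, a possible alternative to regexec is regexpr with perl = TRUE, which attaches capture.start and capture.length attributes to its result. A minimal sketch, assuming a single capture group as in the question; the names starts, lens and group are made up for illustration.

```r
text <- "abcdEEEEfg"

# With perl = TRUE, regexpr() also records where each capture group matched.
m <- regexpr("c(.+?)E", text, perl = TRUE)

starts <- as.vector(attr(m, "capture.start"))   # start of group 1
lens   <- as.vector(attr(m, "capture.length"))  # length of group 1

# Cut the captured text out of the original string.
group <- substring(text, starts, starts + lens - 1L)
group   # "d"
```

This only covers the first match; gregexpr with perl = TRUE returns the same attributes for every match.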

Actually, looks like the TRE library has already been updated (search for TRE). – Adorno 27/2, 2014 at 2:35

I fixed my answer to reflect the update. It doesn't look like development is continuing on libtre. There are several open issues, one of which is about R. I think this should be raised as a bug to the R development team. – Dayna 27/2, 2014 at 3:33

I submitted this to R and got sent packing, with the suggestion that I submit it to TRE instead. I submitted it to laurikari as well, though I suspect you're right that it is the same issue you linked. – Adorno 19/6, 2014 at 13:25
