Capture string in regex replacement

From what I can gather from the Pharo documentation on regex, I can define a regular expression object such as:

re := '(foo|re)bar' asRegex

And I can replace the matched regex with a string via this:

re copy: 'foobar blah rebar' replacingMatchesWith: 'meh'

Which results in: 'meh blah meh'.

So far, so good. But I want to replace only the 'bar' and leave the prefix alone. Therefore, I need a way to reference the captured parenthesized group in the replacement string:

re copy: 'foobar blah rebar' replacingMatchesWith: '%1meh'

And I want the result: 'foomeh blah remeh'. However, this just gives me: '%1meh blah %1meh'. I also tried using \1, or \\1, or $1, or {1} and got the literal string replacement, e.g., '\1meh blah \1meh' as a result.

I can do this easily enough in GNU Smalltalk with:

'foobar blah rebar' replacingAllRegex: '(foo|re)bar' with: '%1meh'

But I can't find anything in the Pharo regex documentation that tells me how to do this in Pharo. I've also done a lot of googling for Pharo regex without turning anything up. Is this capability part of the RxMatcher class or some other Pharo regex class?

Stig answered 24/5, 2016 at 1:40 Comment(3)
It seems Pharo does not support replacement with capturing groups. - Towering
Well, have you tried the usual backreferencing styles? Like \1, or \\1 or $1 (perhaps, with matchesReplacedWith)? Capturing groups are supported, as is clear from what matching can do in Pharo, but there is no hint on whether backreferences are supported as parts of replacement patterns. - Emmert
@WiktorStribiżew Yes, I tried \1, \\1, and $1 as well. In each case, the replacement was the literal string. I updated my question to mention those attempts. I see capturing groups are supported as far as matching goes. There are examples in the documentation for capturing and enumerating the captures. However, there is nothing about backreferencing them in a replacement string. This seems fundamental to regex find/replace to me, so I'm surprised it's not supported. - Stig

After experimenting a bit with the RxMatcher class, I made the following modification to the RxMatcher#copyStream:to:replacingMatchesWith: selector:

copyStream: aStream to: writeStream replacingMatchesWith: aString
    "Copy the contents of <aStream> on the <writeStream>,
     except for the matches. Replace each match with <aString>."

    | searchStart matchStart matchEnd |
    stream := aStream.
    markerPositions := nil.
    [searchStart := aStream position.
    self proceedSearchingStream: aStream] whileTrue: [
        matchStart := (self subBeginning: 1) first.
        matchEnd := (self subEnd: 1) first.
        aStream position: searchStart.
        searchStart to: matchStart - 1 do:
            [:ignoredPos | writeStream nextPut: aStream next].

        "------- The following lines replace: writeStream nextPutAll: aString ------"
        "Expand the {n} placeholders in the replacement string with the captured group strings"
        writeStream nextPutAll: (aString format: self subexpressionStrings).
        "-------"

        aStream position: matchEnd.
        "Be extra careful about successful matches which consume no input.
        After those, make sure to advance or finish if already at end."
        matchEnd = searchStart ifTrue: 
            [aStream atEnd
                ifTrue: [^self "rest after end of whileTrue: block is a no-op if atEnd"]
                ifFalse:    [writeStream nextPut: aStream next]]].
    aStream position: searchStart.
    [aStream atEnd] whileFalse: [writeStream nextPut: aStream next]

And then I added this method in the "accessing" category:

subexpressionStrings
   "Answer an Array with the string matched by each capture group.
    Subexpression 1 is the whole match, so it is skipped: element 1 of
    the answer is the first parenthesized group. A group with no match
    (nil) is represented by an empty string."
   ^ (2 to: self subexpressionCount) collect: [ :n |
      (self subexpression: n) ifNil: [ '' ] ]

With this modification, I can reference captured groups in the replacement string using the String#format: placeholder syntax, where {1} is the first parenthesized group, {2} the second, and so on:

re := '((foo|re)ba(r|m))' asRegex.
re copy: 'foobar meh rebam' replacingMatchesWith: '{2}bu{3} (was {1})'

Results in:

'foobur (was foobar) meh rebum (was rebam)'
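
For comparison, here is a minimal sketch of a variant that leaves RxMatcher unpatched. It uses the stock #copy:translatingMatchesUsing: selector, which passes each matched string to a block, and has the block ask the matcher for the captured group via #subexpression:. The assumption here is that #subexpression: can be sent from inside the block while the copy is still in progress, much as the patched method above does at the same point; that is an assumption, not something the Pharo documentation states. Index 1 is the whole match, so the first parenthesized group is index 2.

```smalltalk
| re |
re := '(foo|re)bar' asRegex.
"The block receives the matched text, which is ignored here; the
 replacement is rebuilt from the first capture group (index 2).
 Assumption: #subexpression: is usable while the copy is running."
re
    copy: 'foobar blah rebar'
    translatingMatchesUsing: [ :matched | (re subexpression: 2) , 'meh' ]
```

If that assumption holds, this should yield the 'foomeh blah remeh' the question asks for. The cost is that the replacement logic lives in a block rather than in a reusable replacement string.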
Stig answered 30/5, 2016 at 3:9

Did you check the Regex help? There is no #replacingAllRegex:, but the matcher has #subexpression:

Steven answered 28/5, 2016 at 9:1 Comment(1)
Isn't this really a comment? ;) I read all the online documentation on Pharo regex I could find (which is pretty much the same material repeated). I know there's no #replacingAllRegex: in Pharo. I was citing that as an example of what I could do in GNU Smalltalk. I know the matcher has #subexpression:, but there is no selector that performs a regex replacement with references to those subexpression matches, as the regex libraries of other languages (including GNU Smalltalk) allow. If I'm mistaken, can you show me an example? - Stig
