Joining regular expressions in Julia

x = r"abc"
y = r"def"
z = join([x,y], "|")

z # => r"r\"abc\"|r\"def\""

Is there a way to join (and in general manipulate) Regex objects that deals only with the regex content, i.e. that does not treat the r prefix as if it were part of the content? The desired output for z is:

z # => r"abc|def"
Natividad answered 9/12, 2013 at 19:17 Comment(5)
What is the output you're getting? - Cocks
@UriMikhli It is the last line in the first code block. - Natividad
Well, there's Regex(join([x.pattern,y.pattern], "|")), but that's not very pretty, and I don't know how it would behave in more complex cases. - Bellanca
@Bellanca Not pretty, but better than what I had. I didn't know about the pattern attribute! - Natividad
I think you should open this as an issue or maybe a pull request on github.com/julialang/julia. I think this behaviour is an oversight. - P
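
The pattern-based approach from the comments can be wrapped in a small function. This is a minimal sketch, assuming a recent Julia 1.x; joinregex is a hypothetical helper name, not something provided by Base. Each part is wrapped in a non-capturing group, which is one way to handle the "more complex cases" where a part already contains an alternation.

```julia
# Hypothetical helper (not part of Base): join Regex objects by alternation.
# Only the pattern text of each part is used. Every part is wrapped in a
# non-capturing group (?:...) so its contents stay together as one alternative.
function joinregex(parts::Regex...)
    grouped = ["(?:" * p.pattern * ")" for p in parts]
    return Regex(join(grouped, "|"))
end

x = r"abc"
y = r"def"
z = joinregex(x, y)

println(z.pattern)              # (?:abc)|(?:def)
println(occursin(z, "xxdefxx")) # true
```

Flags on the parts (for example the i in r"abc"i) are not carried over in this sketch, because only the pattern text is reused.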
macro p_str(s) s end
x = p"abc"
y = p"def"
z = Regex(join([x,y], "|"))

The r"quote" operator actually compiles a regular expression for you, which takes time. If you only have parts of a regular expression that you want to combine into a bigger one, you should store the parts using "regular quotes".

But what about the sketchy escaping rules that you get with r"quote" versus "regular quotes", you ask? If you want the r"quote" escaping rules but don't want to compile a regular expression immediately, you can use a macro like:

macro p_str(s) s end

Now you have a p"quote" that escapes like an r"quote" but just returns a string.
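
A small sketch of what that buys you, assuming a recent Julia 1.x. It compares a p"quote" fragment with the equivalent ordinary string and then joins two fragments; the variable names are only illustrative.

```julia
macro p_str(s) s end   # same macro as above: hands back the literal text as a string

a = p"\d+"             # backslash is kept as written
b = "\\d+"             # an ordinary string needs the backslash doubled
println(a == b)        # true

# join only concatenates the strings it is given, so the fragments
# arrive at Regex with their backslashes intact.
re = Regex(join([p"\d+", p"[a-z]+"], "|"))
println(occursin(re, "42"))   # true
```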

Not to go off topic, but you might define a bunch of quotes for getting around tricky alphabets. Here are some convenient ones:

                                       # "baked\nescape"    -> baked\nescape
macro p_mstr(s) s end                  # p"""raw\nescape""" -> raw\\nescape
macro dq_str(s) "\"" * s * "\"" end    # dq"with quotes"    -> "with quotes"
macro sq_str(s) "'" * s * "'" end      # sq"with quotes"    -> 'with quotes'
macro s_mstr(s) strip(lstrip(s))  end  # s"""  "stripme" """-> "stripme"

When you're done making fragments you can do your join and make a regex like:

myre = Regex(join([x, y], "|"))

Just like you thought.

If you want to learn more about what members an object has (such as Regex.pattern) try:

julia> dump(r"pat")
Regex 
  pattern: ASCIIString "pat"
  options: Uint32 33564672
  regex: Array(Uint8,(61,)) [0x45,0x52,0x43,0x50,0x3d,0x00,0x00,0x00,0x00,0x28  …   0x1d,0x70,0x1d,0x61,0x1d,0x74,0x72,0x00,0x09,0x00]
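
The dump output above comes from an older Julia release, and the exact fields may differ in current versions. As a hedged sketch, assuming Julia 1.x, fieldnames gives a shorter listing, and the pattern field can be read directly:

```julia
re = r"abc|def"i

println(fieldnames(Regex))  # lists the field names; :pattern is among them
println(re.pattern)         # abc|def  (the i flag is stored separately)
```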
Galle answered 10/12, 2013 at 0:6 Comment(2)
Thanks Michael. It seems the answer to my question is no. Your answer contains some cool stuff (I didn't even know about dump()), but I already understand that I can construct regexes by manipulating string parts and then calling Regex(). The specific scenario I have, though, is when one has regexes and not strings. I guess in that case you have to use pattern. - Natividad
It seems that once you use join() to combine the p-strings the escaping reverts to what it would normally be in a string. So the combined pattern does not have the correct escaping after all. I could be missing something, of course, since I am new to Julia. - Thorvald

Instead of joining regexes, I think it is better to join strings and then convert the result to a regex. That way, you can solve your problem as follows:

x = "abc"
y = "def"
z = Regex(join([x,y], "|"))
println(z)

You should get r"abc|def" as the output.


Note: this builds on Michael Fox's answer, with the macro removed.

Irritating answered 25/12, 2022 at 10:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.