is there any compiler that can convert regexp to fsm? or could convert to human words?
Asked Answered
D

2

8

Something that can convert

r"a+|(?:ab+c)"

to

{
    (1, 'a') : [2, 3],
    (2, 'a') : [2],
    (3, 'b') : [4, 3],
    (4, 'c') : [5]
}

or something similar

and accepting in 2 or 5

Deflect answered 24/6, 2012 at 8:15 Comment(1)
Back in our Theoretical CS classes we did have a method of converting regex to a FSM. After all that's exactly what a regex engine has to to anyway.Craniate
R
4

i have some code that will do this. it's not well documented and it's not supported, but if you're interested you're welcome to look at it.

the library is called rxpy and the repository is http://code.google.com/p/rxpy

the routine that does parsing is parse_pattern at http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/parser/pattern.py#871

if you call repr(...) on the result from that you get a graph in the "dot language" - https://en.wikipedia.org/wiki/DOT_language

for example, see the tests as http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/parser/_test/parser.py#47

to show what i mean ,let's look at the test at http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/parser/_test/parser.py#234 which is for 'ab*c':

"""digraph {
 0 [label="a"]
 1 [label="...*"]
 2 [label="b"]
 3 [label="c"]
 4 [label="Match"]
 0 -> 1
 1 -> 2
 1 -> 3
 3 -> 4
 2 -> 1
}"""

that starts at 0 which can match an "a" to go to state 1. from there you can match a "b" to go to state 2 or a "c" to go to state 3. state 2 then has a transition back to 1 that can consume another "b", etc etc. it's a bit ugly to read by hand, but when the test fails you get a little graph displayed on the screen.

the library also has various "engines" which will match strings against this graph (and so do regular expression matching). but it is much slower than the python library (because it is pure python).

this is not supported and may not be very clear - sorry - but i think it's close to what you want and you're welcome to use it if it's useful (MPL or LGPL licence).

Renee answered 24/6, 2012 at 12:46 Comment(0)
C
6

You have a debug flag that prints your regexp in a more readable form:

>>> import re
>>> re.compile(r"a+|(?:ab+c)", flags=re.DEBUG)
branch
  max_repeat 1 65535
    literal 97
or
  subpattern None
    literal 97
    max_repeat 1 65535
      literal 98
    literal 99
<_sre.SRE_Pattern object at 0x0000000002325328>
Coray answered 24/6, 2012 at 10:43 Comment(0)
R
4

i have some code that will do this. it's not well documented and it's not supported, but if you're interested you're welcome to look at it.

the library is called rxpy and the repository is http://code.google.com/p/rxpy

the routine that does parsing is parse_pattern at http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/parser/pattern.py#871

if you call repr(...) on the result from that you get a graph in the "dot language" - https://en.wikipedia.org/wiki/DOT_language

for example, see the tests as http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/parser/_test/parser.py#47

to show what i mean ,let's look at the test at http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/parser/_test/parser.py#234 which is for 'ab*c':

"""digraph {
 0 [label="a"]
 1 [label="...*"]
 2 [label="b"]
 3 [label="c"]
 4 [label="Match"]
 0 -> 1
 1 -> 2
 1 -> 3
 3 -> 4
 2 -> 1
}"""

that starts at 0 which can match an "a" to go to state 1. from there you can match a "b" to go to state 2 or a "c" to go to state 3. state 2 then has a transition back to 1 that can consume another "b", etc etc. it's a bit ugly to read by hand, but when the test fails you get a little graph displayed on the screen.

the library also has various "engines" which will match strings against this graph (and so do regular expression matching). but it is much slower than the python library (because it is pure python).

this is not supported and may not be very clear - sorry - but i think it's close to what you want and you're welcome to use it if it's useful (MPL or LGPL licence).

Renee answered 24/6, 2012 at 12:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.