How to find minimum, maximum length strings generated given a regular expression? [closed]
Asked Answered
I

3

8

How to find minimum and maximum length given a regular expression ?

For example

[1-9]?[0-9]

This regular expression can generate a minimum 1 (0 or 1 0r 2.... or 9) and a maximum of string length 2 (10 or 11 or 12 or ......19 or 20 or 21...........or 99)

Similarly, can anyone provide a function which can calculate minimum and maximum length given a regular expression? Which can take below regex as input?

^[a-zA-Z0-9][a-zA-Z0-9.-]{0,64}[a-zA-Z0-9]$
^[a-zA-Z0-9._-]{1,255}$
^[a-zA-Z0-9 !#$'()*+,./:;=?@\\^_`~-]{1,30}$
^[]a-zA-Z0-9 !#$'()*+,./:;=?@[^_`{|}~-]{0,50}$
^((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])$
Immunochemistry answered 27/2, 2014 at 23:50 Comment(10)
What do you mean by: "I have tried Regular Expression Minimum Length"???Garniture
Regular expressions do not generate anything. They describe (or match) sth.Columbic
I am sorry not generate.Just the minimum n maximum lengths i meant.Immunochemistry
Keep in mind, as soon as you see * or + the maximum length goes to (essentially) infinity.Bozovich
@Alfe: Regular expressions in the CS theory sense generate regular languages. The language generated by an expression is simply the set of all strings that the expression will match.Twill
@HughBothwell: Assuming there's no skullduggery involved with backtracking (which takes the expression beyond the theoretical regular languages).Twill
@nneonneo, I've taught CS beginners for several years at the university in Berlin. Your comment reminds me of that time ;-) And, yes, as you said it: In that sense, a regular expression generates a regular language (and not strings which are elements of that language, as OP put it). But here we are at Stackoverflow which is much more practical oriented. If the question really was about a theoretical CS, it would even be offtopic here, and I would vote for migrating it to computer science. Maybe that's a good choice anyway?Columbic
@Alfe: Possibly. Though the question also asks about Python regexes in particular, which makes it slightly more concrete.Twill
Have you looked at the output of re.compile(expression, re.DEBUG)? It doesn't fully answer your question but it goes quite a long way...Abatement
Unfortunately, that debug output is printed directly to stdout; so to even get it into data structures for further processing, you'd have to capture that output and then you'd have to parse the debug output. All in all the complexity will be the same as with parsing the original regexp, I fear.Columbic
C
5

Regular expressions consist of just a very small set of elements.

  1. atoms (e. g. a or [a-k] or .),
  2. choices (e. g. r1|r2),
  3. repetitions (e. g. r{3,10}, r+, r*, r?).
  4. groups (e. g. (r)) which can be subject to repetitions or choices.
  5. Specials (e. g. ^, $).

That's more or less it unless we want to add non-consuming look-aheads and similar, but they are not part of your example input, so I will not consider these.

How long (minimum/maximum) can these be?

  1. 1 / 1 (atoms are constant in size)
  2. min(minlen(r) for r in choices) / max(maxlen(r) for r in choices)
  3. minlen(r) * minrepretition / maxlen(r) * maxrepetition
  4. minlen(r) / maxlen(r)
  5. 0 (positional parameters match the empty string).

So, what you will need is a regexp parser (as Hugh Bothwell suggested it in his answer) which returns you sth like an abstract syntax tree (absy) of a given regexp; this absy then can be analyzed using the rules I sketched above to find the minimal or maximal length for a string the given regexp can match.

Columbic answered 28/2, 2014 at 9:16 Comment(0)
B
4

There is some starting code at http://pyparsing.wikispaces.com/file/view/invRegex.py for a regex parser in pyparsing; it shouldn't be hard to modify to do what you want.

Some tutorials can be found at http://pyparsing.wikispaces.com/Examples

Bozovich answered 28/2, 2014 at 0:11 Comment(1)
Pyparsing is no longer hosted on wikispaces.com. Go to github.com/pyparsing/pyparsingWilliamwilliams
G
2

Looks like you need to build a regex parser to parse these regular expressions and calculate that for you. Something that would look at brackets as a single character, braces as the variable len, and the |'s for more variability. Looks like you've got a lot of homework in front of you. Good luck!

Edit, some additional help.

Ok, here's a bit to perhaps get you started:

This regular expression, for example:

^[a-zA-Z0-9 !#$'()*+,./:;=?@\\^_`~-]{1,30}$
^^--------one of these characters--^^----^^-end of string
^---start of string                   ^one to thirty times

So this regular expression would be 1 to 30 characters long.

Does that help? But seriously, I'm not going to do more than this, you need to read the re docs: http://docs.python.org/library/re.html

Garniture answered 28/2, 2014 at 0:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.