Producing all possible matches of a regular expression


Asked 10/7, 2011 at 21:13 Answered 10/7, 2011 at 21:38

Solved regex string algorithm production

Given a regular expression, I want to produce the set of strings that the regular expression would match. Note that this set would not be infinite, because there would be a maximum length for each string. Are there any well-known algorithms for doing this? Are there any research papers I could read to gain insight into this problem?

Thanks.

P.S. Would this sort of question be more appropriate on the Theoretical CS Stack Exchange?

Firdausi answered 10/7, 2011 at 21:13 Comment(5)

Well, we can't vote to move to Theoretical CS, so you can flag your question and ask a mod. – Fanfare 10/7, 2011 at 21:17

All the possible strings correspond to all the possible paths through the state machine that end in a match. But this is like asking for all the possible programs of limited length that match the output of my program. – Voncile 10/7, 2011 at 21:23

When you say a "maximum length" for each string, do you mean your regex does not contain any + or * operators? – Kingdon 10/7, 2011 at 21:26

Why would you want to do that? – Lachish 10/7, 2011 at 21:29

@Ray Toal: The regular expressions can contain + or * but a maximum string length would ensure there are not infinite possibilities. Am I making sense? – Firdausi 12/7, 2011 at 23:17

Are there any well known algorithms in place to do this?

In the Perl ecosystem, the Regexp::Genex CPAN module does this.

In Python, the sre_yield module generates the matching words. Regex inverter also does this.

A recursive algorithm is described here (link1, link2), and several Java libraries that do this are mentioned here.
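To give a feel for the recursive idea, here is a minimal sketch in Python. It is not taken from the linked posts or from any of the libraries above. It assumes the regex has already been parsed into a small, hypothetical tuple-based syntax tree (`lit`, `alt`, `cat`, `star`), and it uses the maximum length from the question to keep the result finite.

```python
def enumerate_matches(node, max_len):
    """Return the set of strings of length <= max_len matched by `node`.

    `node` is a hypothetical tuple-based syntax tree (an assumption of this
    sketch, not the format of any real regex library):
      ("lit", "text")        literal string
      ("alt", n1, n2, ...)   alternation   n1|n2|...
      ("cat", n1, n2, ...)   concatenation n1 n2 ...
      ("star", n)            Kleene star   n*
    """
    kind = node[0]
    if kind == "lit":
        return {node[1]} if len(node[1]) <= max_len else set()
    if kind == "alt":
        out = set()
        for child in node[1:]:
            out |= enumerate_matches(child, max_len)
        return out
    if kind == "cat":
        out = {""}
        for child in node[1:]:
            child_set = enumerate_matches(child, max_len)
            # Keep only concatenations that still fit in the length budget.
            out = {a + b for a in out for b in child_set
                   if len(a) + len(b) <= max_len}
        return out
    if kind == "star":
        # Drop the empty string so every repetition makes progress.
        inner = enumerate_matches(node[1], max_len) - {""}
        out = {""}
        frontier = {""}
        while frontier:
            frontier = {a + b for a in frontier for b in inner
                        if len(a) + len(b) <= max_len} - out
            out |= frontier
        return out
    raise ValueError(f"unknown node kind: {kind!r}")


# a(b|c)* with a maximum length of 3
ast = ("cat", ("lit", "a"), ("star", ("alt", ("lit", "b"), ("lit", "c"))))
print(sorted(enumerate_matches(ast, 3)))
# ['a', 'ab', 'abb', 'abc', 'ac', 'acb', 'acc']
```

The `+` operator needs no separate case here: `x+` can be written as `("cat", x, ("star", x))`.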

Generation of random words/strings that match a given regex: xeger (Python)

Are there any research papers I could read to gain insight into this problem?

Yes, the following papers are available for counting the strings that would match a regex (or obtaining generating functions for them):

"Counting occurrences for a finite set of words: an inclusion-exclusion approach" by F. Bassino, J. Clement, J. Fayolle, and P. Nicodeme (2007): paper, slides
"Regexpcount, a symbolic package for counting problems on regular expressions and words" by Pierre Nicodeme (2003): paper, link, link, code
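For a quick intuition about counting (as opposed to listing) the matches, here is a small dynamic-programming sketch. It is not the method of either paper. It assumes the regex has already been compiled into a deterministic automaton, represented here as a hypothetical dict of dicts, and it simply counts the accepted strings up to a maximum length.

```python
def count_accepted(transitions, start, accepting, max_len):
    """Count strings of length <= max_len accepted by a DFA.

    `transitions` maps state -> {symbol: next_state}. This representation is
    an assumption of the sketch. Because the automaton is deterministic, each
    accepted string corresponds to exactly one walk from `start`.
    """
    counts = {start: 1}  # number of walks of the current length ending in each state
    total = 1 if start in accepting else 0
    for _ in range(max_len):
        next_counts = {}
        for state, n in counts.items():
            for nxt in transitions.get(state, {}).values():
                next_counts[nxt] = next_counts.get(nxt, 0) + n
        counts = next_counts
        total += sum(n for state, n in counts.items() if state in accepting)
    return total


# DFA for a(b|c)*: state 0 is the start, state 1 is accepting.
dfa = {0: {"a": 1}, 1: {"b": 1, "c": 1}}
print(count_accepted(dfa, 0, {1}, 3))  # 7
```

The count grows exponentially with the maximum length for a regex like this one, which is worth checking before trying to enumerate every match.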

Greenwich answered 10/7, 2011 at 21:38 Comment(2)

Thank you. I may just go ahead and use that module. Are you aware of any resources that discuss how to implement such a module? I was curious how it could be implemented elegantly/efficiently. – Firdausi 12/7, 2011 at 16:57

Just check its source code. Intuitively, a regex is an automaton, and so a graph. Generating all strings matching a regex means finding all paths that start at the "start" node and end in an "accept" node of the automaton, so it is just an enumeration of paths inside a graph. – Greenwich 14/7, 2011 at 16:32
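The path-enumeration idea in the comment above can be sketched as follows. This is only an illustration, not the source of any of the modules mentioned. It assumes a deterministic automaton given as a hypothetical dict of dicts, and it treats "paths" as walks that may revisit states, cut off at the maximum string length.

```python
def enumerate_accepted(transitions, start, accepting, max_len):
    """List all strings of length <= max_len accepted by a DFA.

    `transitions` maps state -> {symbol: next_state}. This representation is
    an assumption of the sketch. Walks may revisit states, which is how loops
    such as `*` produce repeated symbols; the length bound stops the search.
    """
    results = []
    stack = [(start, "")]  # (current state, string spelled out so far)
    while stack:
        state, prefix = stack.pop()
        if state in accepting:
            results.append(prefix)
        if len(prefix) == max_len:
            continue  # length budget used up, do not extend this walk
        for symbol, nxt in transitions.get(state, {}).items():
            stack.append((nxt, prefix + symbol))
    return sorted(results)


# DFA for a(b|c)*: state 0 is the start, state 1 is accepting.
dfa = {0: {"a": 1}, 1: {"b": 1, "c": 1}}
print(enumerate_accepted(dfa, 0, {1}, 3))
# ['a', 'ab', 'abb', 'abc', 'ac', 'acb', 'acc']
```

With a nondeterministic automaton the same string can be reached along several walks, so the results would need deduplicating (for example by collecting them in a set).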
