I'm wondering how to find a set of all matches to a given regex with a finite number of matches.
For example:
All of these example you can assume they start with ^
and end with $
`hello?` -> (hell, hello)
`[1-9][0-9]{0,3}` -> (1,2,3 ..., 9998, 9999)
`My (cat|dog) is awesome!` -> (My cat is awesome!, My dog is awesome!)
`1{1,10}` -> (1,11, ..., 111111111, 1111111111)
`1*` -> //error
`1+` -> //error
`(1|11){2}` -> (1,11,111,1111) //notice how it doesn't repeat any of the possibilities
I'd also be interested if there was a way of retrieving count a unique a solutions to the regex or if there is a way to determine if the regex has a finite solutions.
It would be nice if the algorithm could parse any regex, but a powerful enough subset of the regex would be fine.
I'm interested in a PHP solution to this problem, but other languages would also be fine.
EDIT:
I've learned in my Formal Theory class about DFA which can be used to implement regex (and other regular languages). If I could transform the regex into a DFA the solution seems fairly straight forward to me, but that transformation seems rather tricky to me.
EDIT 2:
Thanks for all the suggestions, see my post about the public github project I'm working on to "answer" this question.
/hel+o?/i
matches Hello, Hell, and Hel. That is not the same as generation, though. – LatinizeAll of these example you can assume they start with ^ and end with $
– Hyacinthlanguage agnostic
[i.e. general solutions, for every language] orphp
[solution can and should use php tools]. Also: are you assuming ascii or unicode? for unicode the regex...
could be problematic [too much possibilities] – Bouillon