Does order not matter in regular expressions?
Asked Answered
C

2

3

I was looking at the question posed in this stackoverflow link (Regular expression for odd number of a's) for which it is asked to find the regular expression for strings that have odd number of a over Σ = {a,b}.

The answer given by the top comment which works is b*(ab*ab*)*ab*.

I am quite confused - a was placed just before the last b*, does this ordering actually matter? Why can't it be b*a(ab*ab*)*b* instead (where a is placed after the first b*), or any other permutation of it?

Another thing I am confused about is why it is (ab*ab*)* and not (b*ab*ab*)*. Isn't b*ab*ab* the more accurate definition of 'having exactly 2 a'?

Centrosome answered 28/10, 2019 at 12:20 Comment(0)
G
2

Why can't it be b*a(ab*ab*)*b* instead?

b*a(ab*ab*)*b* does not work because it would require the string to have two consecutive as before the first non-leading b, wouldn't it? For example, abaa would not be matched by your proposed regex when it should. Use the regex debugger on a site like Regex101 to see this for yourself.

On the other hand, moving the whole ab* part to the start (b*ab*(ab*ab*)*) works as well.

why it is (ab*ab*)* and not (b*ab*ab*)*?

(b*ab*ab*)* does work, but the first b* is quite redundant because whatever b there is left, will be matched by the last b* in the group. There is also a b* before the group, which causes the b* to not be able to match anything, hence it is redundant.

Gong answered 28/10, 2019 at 12:35 Comment(0)
C
0

There are infinitely many equivalent regular expressions which generate a given (infinite) regular language. A particular expression might be preferable in some cases and by certain authors: one might prefer a minimal expression, or one which shows structure or symmetry, or even one that simplifies the reasoning in a proof by induction.

Your particular suggestion to move the a is insufficient since, as noted above, that ensures the substring aa will appear in any string with more than one a. However, abab could be changed to baba to make that placement work. Choosing babab* would work with either placement. You could even go for an expression like bab + bababab + (babab*)a(babab*) which might be nice to work with depending on your application. Something like b*(abab)ab* has the advantage of being minimal (if it's not strictly minimal, it must be pretty close).

Chadburn answered 28/10, 2019 at 15:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.