Is it possible to simplify this regular expression any further?
Asked Answered
E

4

6

I'm working on some homework for my compiler class and I have the following problem:

Write a regular expression for all strings of a's and b's that contain an odd number of a's or an odd number of b's (or both).

After a lot of whiteboard work I came up with the following solution:

(aa|bb)* (ab|ba|a|b) ((aa|bb)* (ab|ba) (aa|bb)* (ab|ba) (aa|bb)*)*

However, Is this is the most simplified I can get it? I've considered constructing the DFA trying to minimize the number of states there to see if it would help me simplify but I figured I would ask the regex gurus on SO first.

Eleventh answered 16/9, 2009 at 19:28 Comment(6)
What advanced features of regex are you allowed to use?Corsica
he is using regular expressions in Computer Science, not PCRE or posix regex's ;) They are different.Lights
@Brad Gilbert, I assume we are only allowed to use the regex which has been introduced so far in the book which isn't much. (*, +, ?, |, [], ^). Pretty plain.Eleventh
Brings back memories from when I graded such homework as a TA. Some of the most interesting stuff I've graded, by far. :)Muggy
+1 for asking "why/how?" when the answer was posted. :)Muggy
Thank you everyone! From the combined insight of Greg D, Walt W, and sepp2k we were able to come up with a valid answer.Eleventh
B
8

Take Greg D's recommendation of starting with a(aa)*, and going from there. Sepp2k almost has it right, but the real consideration is that you don't care about the other letter. What I mean is, when you are looking at the "odd number of a's" constraint, you don't care at all about what b's are in your string. Thus, stick b*'s anywhere you can :)

Sepp2k's answer is almost right, but this one is correct:

b* a b* (a b* a b* )* | a* b a* (b a* b a* )*

To elaborate, this regex figures out all strings with an odd number of a's (first section), and OR's those strings with any strings containing an odd number of b's.

Braces answered 16/9, 2009 at 20:0 Comment(3)
@Walt W, I'm running this one through its paces but I think you are correct.Dravidian
please tell me the regular expression for any string that contain even number of a's and even number of b's ?Balalaika
Do you mean an even number of a's OR an even number of b's? I suppose you could do an AND with zero-length lookaheads... That's not standard regex stuff though. If you want to change this equation from odd to even, just drop the first two terms of each segment (ba from the left side and ab from the right side)Braces
S
5

This should work:

b* a b* (a b* a b*)* |  a* b a* (b a* b a*)*
Schouten answered 16/9, 2009 at 19:50 Comment(8)
I was writing something similar :) To elaborate, this regex figures out all strings with an odd number of a's (first section), and OR's those strings with any strings containing an odd number of b's. There's a slight error here though, as first term needs b* at the end, and second option needs a* at the end. Otherwise, abbba will not be accepted.Braces
@sepp2k, this is working in all my test cases. Can you describe your thought process when you were making that? It is far simpler than the path I was going down.Dravidian
Nobody said it can't be ambiguous. Walt is correct, it isn't finished, but all the important bits are there. :)Muggy
@Walt W: You're absolutely right. Fixed. @Simucal: Walt explained it pretty well.Schouten
@sepp2k, yea, we submitted out comments at the same time.Dravidian
@sepp2k: Err, I was a little wrong, you need the b* / a* at the end to be inside the parenthesis, or else you can't have b's between two groups of two a's (like aaabbaa). I posted an answer with that fix..Braces
I think this might fail on: ABBBABB.Dravidian
Simulcal: Refer to my last comment / answer.Braces
M
2

I'm afraid that I don't believe your regex as written is correct. Consider the string:

aba

We have a couple choices for matches, but the fact that it's odd-length means we must match a lone a at the front, so:

(a)(ba)

But, sadly, it's impossible for your second main grouping there to match (ba).

When dealing with a constraint like this, I found it easier to start from the core constraint and go from there. In this case, your constraint is "odd," so start with

a(aa)*

to force an odd number of a's and go from there. :)

Muggy answered 16/9, 2009 at 19:38 Comment(1)
@Greg D, that is true. Let me think on it for a second.Dravidian
C
0

I think you need to approach the problem differently.

You are trying to match anything that doesn't have even numbers of both a and b.

It would probably be easier to start with something that matches even numbers of a and b. All you have to do at that point would be to add something on the end that matches the smallest string that you actually want to match.

Corsica answered 16/9, 2009 at 20:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.