How do I find the language from a regular expression?

Asked 3/10, 2011 at 2:53 Answered 4/10, 2011 at 7:43

Solved regex context-free-grammar regular-language formal-languages

How would I find the language for the following regular expressions over the alphabet {a, b}?

aUb*
(ab*Uc)
ab*Ubc*
a*bc*Uac

EDIT: Before i get downvoted like crazy, i'd appreciate it if someone could show me the steps towards solving these problems, not just the solution. Maybe even walking me through one so I can do the rest on my own.

Thanks!

Barbarity answered 3/10, 2011 at 2:53 Comment(19)

What languages are you talking about? – Bedtime 3/10, 2011 at 2:57

have you ever heard of regular expressions? – Barbarity 3/10, 2011 at 2:59

Do regular expressions have a language? – Bedtime 3/10, 2011 at 3:0

you can build a regular expression from a language based on a given alphabet. So for example based on {a, b} there are a given set of rules which could build each of the following lines above. – Barbarity 3/10, 2011 at 3:0

@prgram4fun Do you mean 'regex flavors', or am I completely misunderstanding? – Betthel 3/10, 2011 at 3:3

Hmm, I never heard of that. Maybe who downvoted you didn't know either. Let's see what people answer and I will learn this as well :) – Bedtime 3/10, 2011 at 3:4

@muntoo Im pretty sure youre misunderstanding. this has to do with grammars/languages, not sure what 'regex flavors' are. – Barbarity 3/10, 2011 at 3:6

@Mosty Mostacho haha i hope we can learn together, the more the merrier! – Barbarity 3/10, 2011 at 3:7

@prgram4fun, why is "c" in some of your regular expressions, when the alphabet is limited to "a" and "b"? – Kokand 3/10, 2011 at 3:13

@Kokand i am not sure. could it possibly be placeholder for a and b? that's how the question was written down, but now that you pointed it out it does seem wrong. – Benignant 3/10, 2011 at 3:17

@prgram4fun maybe you wrote it wrong? – Benignant 3/10, 2011 at 3:19

@MostyMostacho, Hopefully this can clear some confusion up. The OP, is referring to the Formal Language Theory of Regular expressions. Mostacho, you're referring to the more common programming usage of regex's. – Jetton 3/10, 2011 at 3:20

@Benignant nope, that's exactly how the assignment was typed out. – Barbarity 3/10, 2011 at 3:21

@jb Thanks for the info. Looks like a mathematical (and compact) approach to regular expressions. Anyway I still don't get the question :( – Bedtime 3/10, 2011 at 3:27

@MostyMostacho, yea, it's basically the math behind how regex's work. And the only reason I understand the question is cause I'm taking this class right now too :) (doesn't mean I know the answer though...) – Jetton 3/10, 2011 at 3:32

@jb does OP have his tags correct? maybe you could suggest better tags so others who are more familiar with the concepts could help answer the question. – Benignant 3/10, 2011 at 3:50

@tehman, I don't know what CFG's are, so I left it. I added the formal language tag. – Jetton 3/10, 2011 at 3:54

@Mosty: The language of a regular expression is the regular set denoted by the expression. – Propene 4/10, 2011 at 11:57

@Jetton Not all languages generated by context-free grammars are generated by regular grammars. The OP presumably meant regular grammar, not context-free grammar. – Propene 4/10, 2011 at 11:59

Edit: short answer, * means "zero or more repetitions" in almost all regex/grammar syntaxes include perl5 and RFC 5234. * typically binds more tightly than concatenation and alternation.

You say you want a language over the alphabet (a, b), but include c and U in your expressions. I'm going to assume that you want a language grammar over the alphabet (a, b, c) in a form like BNF given a regular expression where U is a low-precedence union operator, similar to | in perl re.

In that case,

a|b*

is equivalent to the BNF:

L := a
   | <M>
M := epsilon
   | b <M>

The * operator means zero or more, so the first clause in M is the base case, and the second clause is a recursive use of M that includes a terminal b.

L is just either a single terminal a or the nonterminal M.

(ab*|c)

L ::= a <M>
    | c
M ::= epsilon
    | b <M>

M is derived the same way as above, and the rest is self explanatory.

ab*|bc*

L ::= a <M>
    | b <N>
M ::= epsilon
    | b <M>
N ::= epsilon
    | c <N>

N is derived in the same way as M above.

a*bc*|ac

The * in most regular expression languages binds more tightly than concatenation, so this is the same as

(a*b(c*))|(ac)

which boils down to

L ::= <M> b <N>
    | a c
M ::= epsilon
    | a <M>
N ::= epsilon
    | c <N>

In general to convert a regex to BNF, you simply use adjacency in a regex to mean adjaceny in BNF, and U or | in a regex means | in BNF.

If you define a nonterminal <X> ::= x then you can handle x* thus:

L ::= epsilon
    | <X> <L>

With the same nonterminal <X> ::= x then you can handle x+ thus:

L ::= <X>
    | <L> <X>

That gets you the repetition operators * and + which leaves ?. x? is simply

L ::= epsilon
    | <X>

Grub answered 3/10, 2011 at 4:13 Comment(2)

ok that was a little hard to understand at first, but i got it down. so basically the * means you can have an infite amount of that letter? so for example the last regular expression could mean 0 or as many a's as you like, 1 b, 0 or as many C's as you like , or ac. And like you said, the c does not make sense since it is not part of the alphabet. so i am assuming it's a mistake on the professors part. thank you! – Barbarity 3/10, 2011 at 4:37

@prgram4fun, yes, * means any number, + means one or more, ? means zero or one. All of these bind more tightly than concatenation, and concatenation binds more tightly than union. – Grub 3/10, 2011 at 5:2

If you know what star, union and concatenation mean, these should be easy. The first is a union b star. According to order of operations, this means a union (b star). Union means anything in the language on the left or anything in the language on the right. a means the language consisting of the length-one string a; b star means the language consisting of any string which consists of zero or more b symbols and nothing else. So this language is {empty, a, b, bb, bbb, bbbb, ...}. In the second one, ab* means all strings consisting of an a followed by zero or more b symbols. Do star first, then concatenation, then union, obeying the given explicit parentheses.

Ruyle answered 3/10, 2011 at 4:20 Comment(0)

Although Mike gave grammars generating the languages denoted by the regular expressions, your assignment requests the languages themselves. Because you're dealing with regular expressions, your answers must be regular sets.

Recall the definition of regular sets over an alphabet:

Let Σ be an alphabet. The class of regular sets over Σ is the smallest class
containing ∅, {λ}, and {a}, for all a ∈ Σ, and closed under union, concatenation,
and Kleene star.

Now recall the definition of regular expressions over an alphabet:

Let Σ be an alphabet. The class of regular expressions over Σ is the smallest
class containing ∅, λ, and a, for all a ∈ Σ, and closed under union, concat-
enation, and Kleene star.

The translation, therefore, should be straightforward. In fact, it consists of inserting curly brackets around each letter! For example:

a ∪ b*  denotes {a} ∪ {b}*
ab* ∪ c denotes {a}{b}* ∪ {c}
...

If you want to express the language of each regular expression in set-builder notation, you can revert to the definitions of the operations over regular sets. Recall:

Let A and B be regular sets. Then
   1  A ∪ B = {x : x ∈ A ∨ x ∈ B}
   2. AB    = {xy : x ∈ A ∧ y ∈ B}
   3. A*    = ∪[i = 0 -> ∞]A^i

The regular sets can be translated into set builder notation by substitution of the definitions of the regular set operations. To avoid introducing nested set-builder notation, I've used equality in conjunction with the definition of concatenation to express the concatenation of regular sets.

{a} ∪ {b}*    = {w : w ∈ {a} ∨ w ∈ ∪[i = 0 -> ∞]{b}^i}
{a}{b}* ∪ {c} = {w : (w = xy ∧ (x ∈ {a} ∧ y ∈ ∪[i = 0 -> ∞]{b}^i)) ∨ w ∈ {c}}
...

You should now be able to find the languages denoted by the remaining expressions without difficulty.

Propene answered 4/10, 2011 at 7:43 Comment(0)

Recommended topics

Hot tags