Regular expression not containing 101 [closed]
Asked Answered
G

6

6

I came across the regular expression not containing 101 as follows:

010+(1+00+000)+(0+1+0+)

I was unable to understand how the author come up with this regex. So I just thought of string which did not contain 101:

01000100

I seems that above string will not be matched by above regex. But I was unsure. So tried translating to equivalent pcre regex on regex101.com, but failed there too (as it can be seen my regex does not even matches string containing single 1.

Whats wrong with my translation? Is above regex indeed correct? If not what will be the correct regex?

Gereron answered 17/1, 2016 at 9:9 Comment(3)
Question: would it not be easier to have a regex that matches strings containing 101 and then keep only the unmatched strings ?Wasteland
That string is matched. Do you need to match a string that does not contain 101?Pompea
It doesn't look very clear what the meaning of + is in your question. To me, it seems as if a + in the same line as the numbers is intended to mean what's typically expressed as |. That's not how you translated it though.Airborne
T
1

Read the explanation in the right side tab in regex101. It tells you what your regex does.

I think you misunderstood what the list operatorcharacter class does; inside a list operator ([) , the other characters such as ( won't be metacharacters anymore so the expression [(0*1*0*)[1(00)(000)] will be equivalent to [01()*[] which means it matches 0 or 1 or ( or ) or [.

The correct translation of the regular expression

0∗1∗0∗+(1+00+000)∗+(0+1+0+)∗

will be as follows:

^((?:0*1*0*)|(?:1|00|000)*|(?:0+1+0+)*)$

regex101

Regular expression visualization

Debuggex Demo

What your regex [(0*1*0*)[1(00)(000)]*(0+1+0+)*] does:

  • [(0*1*0*)[1(00)(000)]* matches any of characters 0, (, ), *, [ zero or more times.
  • (0+1+0+)* matches the pattern 0+1+0+ 0 or more times
  • ] matches the character ]

So you expression is equivalent to

[([)01](0+1+0+)*]

which is not a regular expression to match strings that do not contain 101.

Taciturnity answered 17/1, 2016 at 9:46 Comment(1)
Yes thanks I didn't know inside [], meta characters are not allowed. I was thinking [] is like "any character out of", so just put multiple character sequences inside () to make "any word out of ". Poor me.Gereron
A
5

Here is a bit shorter expression ^0*(1|00+)*0*$

https://www.regex101.com/r/gG3wP5/1

Explanation:

  • (1|00+)* we can mix zeroes and ones as long as zeroes occur in groups
  • ^0*...0*$ we can have as many zeroes as we want in prefix/suffix

Direct translation of the original regexp would be like

^(0*1*0*|(1|00|000)*|(0+1+0+)*)$

Update
This seems like artificially complicated version of the above regexp:

  • (1|00|000)* is the same as (1|00+)*
    • it is almost the solution, but it does not match strings 0, 01.., and ..10
  • 0*1*0* doesn't match strings with 101 inside, but matches 0 and some of 01.., and ..10
    • we still need to match those of 01.., and ..10 which have 0 & 1 mixed inside, e.g. 01001.. or ..10010
  • (0+1+0+)* matches some of the remaining cases but there are still some valid strings unmatched
    • e.g. 10010 is the shortest string that is not matched by all of the cases.

So, this solution is overly complicated and not complete.

Agrigento answered 17/1, 2016 at 9:39 Comment(4)
I think this one should be the standard solution for this home work assignment and not the original regex which is pretty complicated for back engineering. I guess the author of the original regex lacked the insight, that "we can mix zeroes and ones as long as zeroes occur in groups"Izzo
@Izzo , I wasn't trying to simplify regex or anything, I was trying to show where the OP went wrong by not knowing the change of metacharacters in the regex quantifiers :), If this was a question about regex that do not match strings with 101 , the answer would have been different :)Taciturnity
yeah your regex is far more intuitive than the one given in slides. Just wondering if you can explain the logic behind slide's regex too...?Gereron
You can reduce the steps needed, if you write your pattern like this: ^0*(?:1*0{2,})*1*0*$Abaft
T
1

Read the explanation in the right side tab in regex101. It tells you what your regex does.

I think you misunderstood what the list operatorcharacter class does; inside a list operator ([) , the other characters such as ( won't be metacharacters anymore so the expression [(0*1*0*)[1(00)(000)] will be equivalent to [01()*[] which means it matches 0 or 1 or ( or ) or [.

The correct translation of the regular expression

0∗1∗0∗+(1+00+000)∗+(0+1+0+)∗

will be as follows:

^((?:0*1*0*)|(?:1|00|000)*|(?:0+1+0+)*)$

regex101

Regular expression visualization

Debuggex Demo

What your regex [(0*1*0*)[1(00)(000)]*(0+1+0+)*] does:

  • [(0*1*0*)[1(00)(000)]* matches any of characters 0, (, ), *, [ zero or more times.
  • (0+1+0+)* matches the pattern 0+1+0+ 0 or more times
  • ] matches the character ]

So you expression is equivalent to

[([)01](0+1+0+)*]

which is not a regular expression to match strings that do not contain 101.

Taciturnity answered 17/1, 2016 at 9:46 Comment(1)
Yes thanks I didn't know inside [], meta characters are not allowed. I was thinking [] is like "any character out of", so just put multiple character sequences inside () to make "any word out of ". Poor me.Gereron
A
1

0* 1* ( (00+000)* 1*)* (ε+0)

i think this expression covers all cases because --
any number apart from 1 can be broken into constituent 2's and 3's i.e. any number n=2*i+3*j. So there can be any number of 0's between 2 consecutive 1's apart from one 0.Hence, 101 cannot be obtained.

ε+0 for expressions ending in one 0.

Athodyd answered 22/12, 2017 at 14:58 Comment(0)
A
0

The RE for language not containing 101 as sub-string can also be written as (0*1*00)*.0*.1*.0*

This may me a smaller one then what you are using. Try to make use of this.

Amphimacer answered 31/12, 2017 at 12:41 Comment(0)
P
0

Regular expression I got: (0+10)*1*. (Looks simple :P)

I just considered all cases to make this.

You consider two 1's; we have to end up with continuous 1's.

  • Case 1: 11111111111111...

  • Case 2: 0000000011111111111111...(once we take two 1's we cant accept 0's so one and only chance is to continue with 1's)

    If you consider only one 1 which was followed by 0, no issue and after one 1 we can have any number of 0's.

  • Case 3: 00000000 10100100010000100000100000 1111111111

    => (0*+10*)*1*

Final answer: (0+10)*1*.

Project answered 14/8, 2021 at 7:27 Comment(0)
D
0

The RE for language not containing "101" as substrings can also be written as 0*(1*00(0)*)*1*0*

You won't get "101" because always there is 0 or more than 1 0's between any 1s

Decibel answered 10/8 at 9:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.