Testing intersection of two regular languages

Asked 26/2, 2010 at 0:33 Answered 26/2, 2010 at 2:0

I want to test whether two languages have a string in common. Both of these languages are from a subset of regular languages described below and I only need to know whether there exists a string in both languages, not produce an example string.

The language is specified by a glob-like string like

/foo/**/bar/*.baz

where ** matches 0 or more characters, and * matches zero or more characters that are not /, and all other characters are literal.

Any ideas?

thanks, mike

EDIT:

I implemented something that seems to perform well, but have yet to try a correctness proof. You can see the source and unit tests

Complicity answered 26/2, 2010 at 0:33 Comment(2)

What language will you use to perform the check? You are probably going to need to write a test bed for this. If you could post a fairly complete test bed it would help. – Whippoorwill 26/2, 2010 at 1:30

This will need to run in JS. I will of course have to write a testbed. I found a useful subset for which I can compute intersection efficiently by doing some tricks. The useful subset is one where * and ** can only appear at the start or directly after a /, and a / cannot be adjacent to another /. That means that I never need to worry whether foo can match boo*baz -- I have to do backtracking, but not a ridiculous amounts since I can always turn text following a * or ** into a suffix check. – Complicity 2/3, 2010 at 1:29

Build FAs A and B for both languages, and construct the "intersection FA" AnB. If AnB has at least one accepting state accessible from the start state, then there is a word that is in both languages.

Constructing AnB could be tricky, but I'm sure there are FA textbooks that cover it. The approach I would take is:

The states of AnB is the cartesian product of the states of A and B respectively. A state in AnB is written (a, b) where a is a state in A and b is a state in B.
A transition (a, b) ->r (c, d) (meaning, there is a transition from (a, b) to (c, d) on symbol r) exists iff a ->r c is a transition in A, and b ->r d is a transition in B.
(a, b) is a start state in AnB iff a and b are start states in A and B respectively.
(a, b) is an accepting state in AnB iff each is an accepting state in its respective FA.

This is all off the top of my head, and hence completely unproven!

Berna answered 26/2, 2010 at 1:32 Comment(1)

Well, this is a well-documented construction called the Cartesian Product Machine, many people beat you to this, and it is a well-documented and correct method to get an FA recognizing the intersection of languages recognized by other FAs. Just saying. – Eccentricity 16/8, 2011 at 20:59

I just did a quick search and this problem is decidable (aka can be done), but I don't know of any good algorithms to do it. One is solution is:

Convert both regular expressions to NFAs A and B
Create a NFA, C, that represents the intersection of A and B.
Now try every string from 0 to the number of states in C and see if C accepts it (since if the string is longer it must repeat states at one point).

I know this might be a little hard to follow but this is only way I know how.

Banna answered 26/2, 2010 at 2:0 Comment(0)

Recommended topics

Hot tags