Longest common contiguous subsequence - algorithm
Asked Answered
T

3

7

My question is simple: Is there an O(n) algorithm for finding the longest contiguous subsequence between two sequences A and B? I searched it, but all the results were about the LCS problem, which is not what I'm seeking.

Note: if you are willing to give any sample code, you are more than welcome to do so, but please, if you can, in C or C++.

Edit: Here is an example:

A: { a, b, a, b, b, b, a }
B: { a, d, b, b, b, c, n }
longest common contiguous subsequence: { b, b, b }
Tadpole answered 25/12, 2012 at 18:7 Comment(4)
If you aren't seeking lcs please supply a trivial example of what you are seekingPutumayo
This is the same as the longest common substring.Hatchery
that looks the same as LCS. Or is it LCS with the additional constraint that the sequence must be a repetition of the same symbols? i.e. {a,b,c,d,d} and {d,d,a,b,c} yields {d,d}?Fibrosis
Clarifying fgb's comment for other newbs like me: Longest Common Contiguous Subsequence = Longest Common String. As Wikipedia says, " unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences". Hence "contiguous subsequence" or "consecutive subsequence" can be replaced by "substring".Sinclare
D
2

Yes, you can do this in linear time. One way is by building suffix trees for both the pattern and the text and computing their intersection. I can't think of a way to do this without involving suffix trees or suffix arrays, though.

Digamy answered 25/12, 2012 at 18:23 Comment(3)
The solution with suffix trees is not linear.Tadpole
@RondogiannisAristophanes yes, it is. It's linear in n + m, so it's linear in n if n = m.Benkley
It takes linear time to build a suffix tree. (Surprising, but true.) Suffix trees take linear space. Walking two suffix trees simultaneously to find the longest common string takes linear time.Digamy
W
1

I am not sure whether there exists an O(n) algorithm. Here is a O(n*n) dynamic solution, maybe it is helpful to you. Let lcs_con[i][j] represent the longest common contiguous subsequence which end with element A_i from array A and B_j from array B. Then we can get the equations below:

lcs_con[i][j]=0 if i==0 or j==0
lcs_con[i][j]=0 if A_i != B_j
lcs_con[i][j]=lcs_con[i-1][j-1] if A_i==B_j

we can record the maximum of lcs_con[i][j] and the ending index during the calculation to get the longest common contiguous subsequence. The code is below:

#include<iostream>

using namespace std;


int main()
{
    char A[7]={'a','b','a','b','b','b','a'};
    char B[7]={'a','d','b','b','b','c','n'};

    int lcs_con[8][8];    
    memset(lcs_con,0,8*8*sizeof(int));

    int result=-1;
    int x=-1;
    int y=-1;

    for(int i=1;i<=7;++i)
      for(int j=1;j<=7;++j)
      {
          if(A[i-1]==B[j-1])lcs_con[i][j]=lcs_con[i-1][j-1]+1;
          else lcs_con[i][j]=0;

          if(lcs_con[i][j]>result)
          {
               result=lcs_con[i][j];
               x=i;
               y=j;                   
          }
      }

   if(result==-1)cout<<"There are no common contiguous subsequence";
   else
   {
       cout<<"The longest common contiguous subsequence is:"<<endl;
       for(int i=x-result;i<x;i++)cout<<A[i];
       cout<<endl;
   }

   getchar();
   getchar();

   return 0;    
}

Hope it helps!

Wilke answered 26/12, 2012 at 17:43 Comment(1)
I think there is a typo: lcs_con[i][j]=lcs_con[i-1][j-1] if A_i==B_j should be lcs_con[i][j]=lcs_con[i-1][j-1] + 1 if A_i==B_jWellknown
S
0

that is what you are looking for:

KMP algorithm c implementation

the basic theory:

  1. How to find Longest Common Substring using C++

  2. http://en.wikipedia.org/wiki/Longest_common_substring_problem

Salvador answered 25/12, 2012 at 18:25 Comment(4)
This sounds interesting. Could you please briefly explain this algorithm?Tadpole
Didn't you understand the wiki summary? in corman algo book there is a nice explanation as wellSalvador
KMP would not be linear, since you need to run it more than one time. You need a suffix tree.Benkley
The fastest solution I can think of that uses KMP will be quadratic. Can you elaborate?Digamy

© 2022 - 2024 — McMap. All rights reserved.