Finding longest common subsequence in O(NlogN) time
Asked Answered
A

6

8

Is there any way of finding the longest common subsequence of two sequences in O(NlogN) time?

I read somewhere that there is a way to achieve this using binary search.

I know the dp approach that takes O(N2) time.

Affiche answered 10/6, 2015 at 22:46 Comment(0)
H
13

For the general case, the O(N^2) dynamic programming algorithm is the best you can do. However, there exist better algorithms in some special cases.

  1. Alphabet size is bounded

This is a very common situation. Sequences consisting of letters from some alphabet (e.g. English) lie in this category. For this case, the O(N*M) algorithm can be optimised to get an O(N^2/logN) with method of four Russians. I don't know how exactly, you can search for it.

  1. Both sequences consist of distinct elements

An example problem is "Given two permutations of numbers from 1 to N, find their LCS". This one can be solved in O(N*logN).
Let the sequences be A and B.
Define a sequence C. C[i] is the index of B[i] in A. (A[C[i]] = B[i])
LCS of A and B is the longest increasing subsequence of C.

Hithermost answered 10/6, 2015 at 23:42 Comment(1)
The second case requires only one string to have distinct elements.Aisha
H
4

The dynamic programming approach, which is O(n2) for general case. For certain other cases, there are lower-complexity algorithms:

  • For a fixed alphabet size (which doesn't grow with n), there's the Method of Four Russians which brings the time down to O(n2/log n) (see here).

  • See here another further optimized case.

Hexahedron answered 10/6, 2015 at 23:36 Comment(1)
There's also the very practical O(nd) approach of Meyers, where d is the Levenshtein distance between the two strings -- this is O(n) if there are a bounded number of differences. TTBOMK it's still what's used in diff.Immoral
D
2

Assuming Exponential Time Hypothesis (which is stricter than P is not equal to NP, but is still widely believed to be true), it is not possible to do it in time O(N^{2 - eps}) for any positive constant eps, see "Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping" by Karl Bringmann and Marvin Kunnemann (pre-print on arXiv is available).

Roughly speaking, it means that the general case of this problem cannot be solved in time better than something like O(N^2/log N), so if you want faster algorithms you have to consider additional constraints (some properties of the strings) or look for approximate solution.

Dermatology answered 10/2, 2017 at 2:14 Comment(0)
F
1

The longest common subsequence between two sequences is essentially n-squared.

Masek and Patterson (1980) made a minor improvement to n-squared / log n using the so-called "Four Russians" technique.

In most cases the additional complexity introduced by such convoluted approaches is not justified by the small gains. For practical purposes you can consider the n-squared approach as the reasonable optimum in typical applications.

Firestone answered 10/6, 2015 at 23:36 Comment(1)
There's also the very practical O(nd) approach of Meyers, where d is the Levenshtein distance between the two strings -- this is O(n) if there are a bounded number of differences. TTBOMK it's still what's used in diff.Immoral
C
0
vector <int> LIS;
int LongestIncreasingSubsequence(int n){
    if(!n) return 0;
    LIS.emplace_back(arr[0]);
    for(int i = 1; i < n; i++){
        if(arr[i] > LIS.back()) LIS.emplace_back(arr[i]);
        else *lower_bound(LIS.begin(), LIS.end(), arr[i]) = arr[i];
    }
    return LIS.size();
}
Coordination answered 12/12, 2021 at 8:28 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Lully
R
0

The following is a O(NLOGM) approach of the LCS problem that uses binary search:

//If the strings are not the same length, take the longest one
//Map each letter with a ArrayList representing the occurences on that letter (indexes)
//The ArrayLists are thus sorted. This will enable binary search later on
Map<Character, List<Integer>> map = new HashMap<>();
for (int i = 0; i < first.length(); i++) {
    char c = first.charAt(i);
    if (!map.containsKey(c)) {
        map.put(c, new ArrayList<>());
    }
    map.get(c).add(i);
}

//Loop on the second string and increment an index to check that any matching character is a valid match
//For sequence match we need to maintain the same order as in the initial string so we use an increasing index for that

int index = 0;
StringBuilder lcs = new StringBuilder();

for (int j = 0; j < second.length(); j++) {
    char c = second.charAt(j);
    if (map.containsKey(c)) {
        List<Integer> indexes = map.get(c);
        //The following method uses Binary Search to find the first bigger or equal index
        //This is possible since the indexes list is sorted by definition
        int firstBigger = firstBiggerOrEqualIndex(indexes, index);
        if (firstBigger != -1) {
            index = firstBigger+1;
            lcs.append(c);
        }
    }
}
return lcs.toString();

The method firstBiggerOrEqualIndex uses Binary Search like this:

private static int firstBiggerOrEqualIndex(List<Integer> indexes, int index) {

    int start = 0;
    int end = indexes.size()-1;
    int res = -1;
    while(start <= end) {
        int mid = (start+end)/2;
        int value = indexes.get(mid);
        if (value > index) {
            res = value;
            end = mid-1;
        }
        else if (value == index) {
            res = value;
            break;
        }
        else {
            start = mid + 1;
        }
    }
    return res;
}
Renal answered 15/4, 2023 at 18:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.