Efficient longest common subsequence algorithm library?

Asked 7/9, 2010 at 13:25 Answered 30/9, 2013 at 20:3

c++ algorithm performance dynamic-programming lcs

I'm looking for a (space-)efficient implementation of an LCS algorithm for use in a C++ program. The inputs are two random-access sequences of integers.
I'm currently using the dynamic programming approach from the Wikipedia page on LCS. However, that takes O(mn) memory and time, and it fails with out-of-memory errors on larger inputs.
I have read about Hirschberg's algorithm, which improves memory usage considerably, as well as the Hunt-Szymanski and Masek-Paterson algorithms. Since these aren't trivial to implement, I'd prefer to try them on my data with an existing implementation. Does anyone know of such a library? Since text diff tools are pretty common, I'd imagine there are some open source libraries around.
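
For readers who want to experiment before a library turns up, here is a minimal sketch of Hirschberg's divide-and-conquer idea for integer sequences. It is not taken from any existing library; the names `Seq`, `lcs_row`, `hirschberg` and `lcs_hirschberg` are made up for illustration. It keeps only two table rows at a time, so memory is linear in the length of the second sequence (plus the recursion stack), while time stays O(mn).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Seq = std::vector<int>;  // hypothetical alias, for brevity only

// Last row of the LCS-length table for a[a0, a1) against b[b0, b1).
// Forward:  result[k] = LCS length of the a-range and the first k elements of the b-range.
// Reverse:  result[k] = LCS length of the a-range and the last k elements of the b-range.
static std::vector<std::size_t> lcs_row(const Seq& a, std::size_t a0, std::size_t a1,
                                        const Seq& b, std::size_t b0, std::size_t b1,
                                        bool reverse) {
    const std::size_t m = a1 - a0, n = b1 - b0;
    std::vector<std::size_t> prev(n + 1, 0), cur(n + 1, 0);
    for (std::size_t i = 0; i < m; ++i) {
        const int x = reverse ? a[a1 - 1 - i] : a[a0 + i];
        for (std::size_t k = 1; k <= n; ++k) {
            const int y = reverse ? b[b1 - k] : b[b0 + k - 1];
            cur[k] = (x == y) ? prev[k - 1] + 1 : std::max(prev[k], cur[k - 1]);
        }
        std::swap(prev, cur);  // prev now holds the row just computed
    }
    return prev;
}

// Appends one LCS of a[a0, a1) and b[b0, b1) to out.
static void hirschberg(const Seq& a, std::size_t a0, std::size_t a1,
                       const Seq& b, std::size_t b0, std::size_t b1, Seq& out) {
    if (a0 == a1 || b0 == b1) return;
    if (a1 - a0 == 1) {
        // Single element left in a: it is the LCS if it occurs anywhere in the b-range.
        const auto first = b.begin() + b0, last = b.begin() + b1;
        if (std::find(first, last, a[a0]) != last) out.push_back(a[a0]);
        return;
    }
    const std::size_t mid = a0 + (a1 - a0) / 2;
    const auto left = lcs_row(a, a0, mid, b, b0, b1, false);
    const auto right = lcs_row(a, mid, a1, b, b0, b1, true);
    // Pick the split point of b that maximizes the combined LCS length.
    const std::size_t n = b1 - b0;
    std::size_t best = 0;
    for (std::size_t k = 1; k <= n; ++k)
        if (left[k] + right[n - k] > left[best] + right[n - best]) best = k;
    hirschberg(a, a0, mid, b, b0, b0 + best, out);
    hirschberg(a, mid, a1, b, b0 + best, b1, out);
}

Seq lcs_hirschberg(const Seq& a, const Seq& b) {
    Seq out;
    hirschberg(a, 0, a.size(), b, 0, b.size(), out);
    return out;
}
```

Calling `lcs_hirschberg(a, b)` returns one longest common subsequence; when several exist, which one you get depends on the tie-breaking in the split search. Generalizing from `int` to a template parameter should be straightforward, but that is left out to keep the sketch short.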

Lublin asked 7/9, 2010 at 13:25 Comment(4)

Are you interested in the actual longest common subsequence or just its length? – Goodly 7/9, 2010 at 14:8
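
The distinction matters because the length alone needs far less memory than the full table. A rough sketch, assuming integer sequences and using the made-up name `lcs_length`: only the previous and the current row of the dynamic programming table are kept, and the shorter input is used for the row, so memory is O(min(m, n)).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// LCS length only, with two rolling rows instead of the full m x n table.
std::size_t lcs_length(const std::vector<int>& a, const std::vector<int>& b) {
    // Iterate over the longer sequence and keep rows sized by the shorter one.
    const std::vector<int>& rows = a.size() >= b.size() ? a : b;
    const std::vector<int>& cols = a.size() >= b.size() ? b : a;
    std::vector<std::size_t> prev(cols.size() + 1, 0), cur(cols.size() + 1, 0);
    for (int x : rows) {
        for (std::size_t j = 1; j <= cols.size(); ++j)
            cur[j] = (x == cols[j - 1]) ? prev[j - 1] + 1
                                        : std::max(prev[j], cur[j - 1]);
        std::swap(prev, cur);  // prev now holds the row just computed
    }
    return prev[cols.size()];
}
```

This cannot reconstruct the subsequence itself, because the rows needed for backtracking are discarded.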

Disappointed that a few quick web searches didn't turn up anything especially useful (loads of ad hoc implementations for char in C, but nothing with either Hirschberg's linear-space speedup or templated on element type for C++). If you do find (or create :D) anything, please update! – Multivalent 8/9, 2010 at 6:4

Also: Not directly on-topic, but Myers had a couple of O(nd) algorithms, where d is the number of edits needed. Very nice for inputs that you expect to be similar! (I think one of these is still used in most diffs.) – Multivalent 8/9, 2010 at 6:5
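
To make the O(nd) idea concrete, here is a sketch of the basic greedy variant as I understand it, computing only the LCS length. The function name `lcs_length_myers` is made up. It searches for the smallest number d of insertions and deletions that turns one sequence into the other, tracking the furthest point reached on each diagonal; the LCS length is then (n + m - d) / 2.

```cpp
#include <cstddef>
#include <vector>

// LCS length via a Myers-style greedy search over edit distance d.
// Time is O((n + m) * d), memory O(n + m); fast when the inputs are similar.
std::size_t lcs_length_myers(const std::vector<int>& a, const std::vector<int>& b) {
    using Idx = std::ptrdiff_t;
    const Idx n = static_cast<Idx>(a.size());
    const Idx m = static_cast<Idx>(b.size());
    const Idx max_d = n + m;
    if (max_d == 0) return 0;
    const Idx offset = max_d;  // shifts diagonal k in [-max_d, max_d] to a valid index
    // v[k + offset] = furthest x reached so far on diagonal k, where k = x - y.
    std::vector<Idx> v(static_cast<std::size_t>(2 * max_d + 2), 0);
    for (Idx d = 0; d <= max_d; ++d) {
        for (Idx k = -d; k <= d; k += 2) {
            Idx x;
            if (k == -d || (k != d && v[k - 1 + offset] < v[k + 1 + offset]))
                x = v[k + 1 + offset];      // step down: take an element of b
            else
                x = v[k - 1 + offset] + 1;  // step right: drop an element of a
            Idx y = x - k;
            // Follow the diagonal while elements match (these cost no edits).
            while (x < n && y < m &&
                   a[static_cast<std::size_t>(x)] == b[static_cast<std::size_t>(y)]) {
                ++x;
                ++y;
            }
            v[k + offset] = x;
            if (x >= n && y >= m)
                return static_cast<std::size_t>((n + m - d) / 2);
        }
    }
    return 0;  // not reached: d = n + m always suffices
}
```

Recovering the subsequence itself needs more bookkeeping (storing the per-d state, or the linear-space refinement described in the same paper), which this sketch omits.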

The best I have found so far is wordaligned.org/articles/longest-common-subsequence, although you have to be careful: the C++ version increments iterators past the end when performing the recursive calls, which needs fixing. Also, it does not implement the common prefix/suffix optimization. – Lublin 8/9, 2010 at 15:55
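
The common prefix/suffix optimization mentioned here can be sketched independently of whichever LCS algorithm is used. The struct `Trimmed` and the function `trim_common_ends` are hypothetical names. Matching leading and trailing elements always belong to some LCS, so they can be stripped first and the expensive algorithm run only on the differing middle parts.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Trimmed {
    std::size_t prefix;  // number of leading elements equal in both sequences
    std::size_t suffix;  // number of trailing elements equal in both, not overlapping the prefix
};

// Counts the common prefix and common suffix of a and b.
Trimmed trim_common_ends(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t limit = std::min(a.size(), b.size());
    std::size_t p = 0;
    while (p < limit && a[p] == b[p]) ++p;
    std::size_t s = 0;
    // Stop at limit - p so the suffix never reuses elements counted in the prefix.
    while (s < limit - p && a[a.size() - 1 - s] == b[b.size() - 1 - s]) ++s;
    return {p, s};
}
```

The full result is then the prefix, followed by the LCS of `a[prefix, a.size() - suffix)` and `b[prefix, b.size() - suffix)`, followed by the suffix. For diff-like inputs that differ only in a few places, this can shrink the problem dramatically.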

When searching for things like that, try scholar.google.com. It is much better for finding scholarly works. It turned up this document, a "survey of longest common subsequences algorithms": http://www.biotec.icb.ufmg.br/cabi/artigos/seminarios2/subsequence_algorithm.pdf

Encounter answered 9/9, 2010 at 6:16 Comment(2)

Grudging +1 because the OP really wants library implementations of said algorithms, not descriptions. But probably a useful paper anyway. – Multivalent 9/9, 2010 at 10:42

Also it would be helpful to know the date of publication & other details. – Multivalent 9/9, 2010 at 10:45

The page "Hirschberg's Algorithm" embeds a JavaScript implementation, which is almost C.

Cozmo answered 30/9, 2013 at 20:3 Comment(0)

Not C++ but Python, but I think it is usable.

http://wordaligned.org/articles/longest-common-subsequence

Anaximenes answered 22/9, 2011 at 12:34 Comment(0)
