Longest common prefix (LCP) of a list of strings

Asked 23/11, 2017 at 19:52 Answered 29/11, 2018 at 22:42

Solved list prolog prolog-dif prolog-cut logical-purity

lcs([ H|L1],[ H|L2],[H|Lcs]) :-
    !,
    lcs(L1,L2,Lcs).
lcs([H1|L1],[H2|L2],Lcs):-
    lcs(    L1 ,[H2|L2],Lcs1),
    lcs([H1|L1],    L2 ,Lcs2),
    longest(Lcs1,Lcs2,Lcs),
    !.
lcs(_,_,[]).

longest(L1,L2,Longest) :-
    length(L1,Length1),
    length(L2,Length2),
    (  Length1 > Length2
    -> Longest = L1
    ;  Longest = L2
    ).

This is my code so far. How could I optimize it so that it prints the prefix, e.g.:

["interview", "interrupt", "integrate", "intermediate"]

should return "inte"

A bit rusty with Prolog, haven't done it in a while :)

Shillelagh answered 23/11, 2017 at 19:52 Comment(1)

Is there anything wrong with the code? Does it provide a solution? Or an incorrect solution? Having it "print" the prefix when it doesn't currently is a feature addition, not an "optimization". – Discomfort 23/11, 2017 at 20:19

First, let's start with something related, but much simpler.

:- set_prolog_flag(double_quotes, chars).  % "abc" = [a,b,c]

prefix_of(Prefix, List) :-
   append(Prefix, _, List).

commonprefix(Prefix, Lists) :-
   maplist(prefix_of(Prefix), Lists).

?- commonprefix(Prefix, ["interview", "integrate", "intermediate"]).
   Prefix = []
;  Prefix = "i"
;  Prefix = "in"
;  Prefix = "int"
;  Prefix = "inte"
;  false.

(See this answer, how printing character lists with double quotes is done.)

This is the part that is fairly easy in Prolog. The only drawback is that it doesn't give us the maximum, but rather all possible solutions including the maximum. Note that all strings do not need to be known, like:

?- commonprefix(Prefix, ["interview", "integrate", Xs]).
   Prefix = []
;  Prefix = "i", Xs = [i|_A]
;  Prefix = "in", Xs = [i, n|_A]
;  Prefix = "int", Xs = [i, n, t|_A]
;  Prefix = "inte", Xs = [i, n, t, e|_A]
;  false.

So we get as response a partial description of the last unknown word. Now imagine, later on we realize that Xs = "induce". No problem for Prolog:

?- commonprefix(Prefix, ["interview", "integrate", Xs]), Xs = "induce".
   Prefix = [], Xs = "induce"
;  Prefix = "i", Xs = "induce"
;  Prefix = "in", Xs = "induce"
;  false.

In fact, it does not make a difference whether we state this in hindsight or just before the actual query:

?- Xs = "induce", commonprefix(Prefix, ["interview", "integrate", Xs]).
   Xs = "induce", Prefix = []
;  Xs = "induce", Prefix = "i"
;  Xs = "induce", Prefix = "in"
;  false.

Can we now based on this formulate the maximum? Note that this effectively necessitates some form of extra quantor for which we do not have any direct provisions in Prolog. For this reason we have to limit us to certain cases we know will be safe. The easiest way out would be to insist that the list of words does not contain any variables. I will use iwhen/2 for this purpose.

maxprefix(Prefix, Lists) :-
   iwhen(ground(Lists), maxprefix_g(Prefix, Lists)).

maxprefix_g(Prefix, Lists_g) :-
   setof(N-IPrefix, ( commonprefix(IPrefix, Lists_g), length(IPrefix, N ) ), Ns),
   append(_,[N-Prefix], Ns).   % the longest one

The downside of this approach is that we get instantiation errors should the list of words not be known.

Note that we made quite some assumptions (which I hope really hold). In particular we assumed that there is exactly one maximum. In this case this holds, but in general it could be that there are several independent values for Prefix. Also, we assumed that IPrefix will always be ground. We could check that too, just to be sure. Alternatively:

maxprefix_g(Prefix, Lists_g) :-
   setof(N, IPrefix^ ( commonprefix(IPrefix, Lists_g), length(IPrefix, N ) ), Ns),
   append(_,[N], Ns),
   length(Prefix, N),
   commonprefix(Prefix, Lists_g).

Here, the prefix does not have to be one single prefix (which it is in our situation).

The best, however, would be a purer version that does not need to resort to instantiation errors at all.

Kile answered 23/11, 2017 at 21:50 Comment(0)

Here's the purified variant of the code proposed (and subsequently retracted) by @CapelliC:

:- set_prolog_flag(double_quotes, chars).

:- use_module(library(reif)).

lists_lcp([], []).
lists_lcp([Es|Ess], Ls) :-
   if_((maplist_t(list_first_rest_t, [Es|Ess], [X|Xs], Ess0),
        maplist_t(=(X), Xs))
       , (Ls = [X|Ls0], lists_lcp(Ess0, Ls0))
       , Ls = []).

list_first_rest_t([], _, _, false).
list_first_rest_t([X|Xs], X, Xs, true).

Above meta-predicate maplist_t/3 is a variant of maplist/2 which works with term equality/inequality reification—maplist_t/5 is just the same with higher arity:

maplist_t(P_2, Xs, T) :-
   i_maplist_t(Xs, P_2, T).

i_maplist_t([], _P_2, true).
i_maplist_t([X|Xs], P_2, T) :-
   if_(call(P_2, X), i_maplist_t(Xs, P_2, T), T = false).

maplist_t(P_4, Xs, Ys, Zs, T) :-
   i_maplist_t(Xs, Ys, Zs, P_4, T).

i_maplist_t([], [], [], _P_4, true).
i_maplist_t([X|Xs], [Y|Ys], [Z|Zs], P_4, T) :-
   if_(call(P_4, X, Y, Z), i_maplist_t(Xs, Ys, Zs, P_4, T), T = false).

First here's a ground query:

?- lists_lcp(["a","ab"], []).
false.                                % fails (as expected)

Here are the queries presented in @Fatalize's fine answer.

?- lists_lcp(["interview",X,"intermediate"], "inte").
   X = [i,n,t,e]
;  X = [i,n,t,e,_A|_B], dif(_A,r)
;  false.

?- lists_lcp(["interview","integrate",X], Z).
   X = Z, Z = []
;  X = Z, Z = [i]
;  X = Z, Z = [i,n]
;  X = Z, Z = [i,n,t]
;  X = Z, Z = [i,n,t,e]
;  X = [i,n,t,e,_A|_B], Z = [i,n,t,e]
;  X = [i,n,t,_A|_B]  , Z = [i,n,t]  , dif(_A,e)
;  X = [i,n,_A|_B]    , Z = [i,n]    , dif(_A,t)
;  X = [i,_A|_B]      , Z = [i]      , dif(_A,n)
;  X = [_A|_B]        , Z = []       , dif(_A,i).

?- lists_lcp([X,Y], "abc").
   X = [a,b,c]      , Y = [a,b,c|_A]
;  X = [a,b,c,_A|_B], Y = [a,b,c]
;  X = [a,b,c,_A|_B], Y = [a,b,c,_C|_D], dif(_A,_C)
;  false.

?- lists_lcp(L, "abc").
   L = [[a,b,c]]
;  L = [[a,b,c],[a,b,c|_A]]
;  L = [[a,b,c,_A|_B],[a,b,c]]
;  L = [[a,b,c,_A|_B],[a,b,c,_C|_D]], dif(_A,_C)
;  L = [[a,b,c],[a,b,c|_A],[a,b,c|_B]]
;  L = [[a,b,c,_A|_B],[a,b,c],[a,b,c|_C]]
;  L = [[a,b,c,_A|_B],[a,b,c,_C|_D],[a,b,c]]
;  L = [[a,b,c,_A|_B],[a,b,c,_C|_D],[a,b,c,_E|_F]], dif(_A,_E) 
…

Last, here's the query showing improved determinism:

?- lists_lcp(["interview","integrate","intermediate"], Z).
Z = [i,n,t,e].                              % succeeds deterministically

Lioness answered 29/11, 2017 at 12:58 Comment(11)

Nice, but it seems we can take it a step further! – Kile 30/11, 2017 at 12:16

@false. Further? More (&better) queries showing that answers do not "overlap"? – Lioness 30/11, 2017 at 17:3

There is definitely a better way! – Kile 30/11, 2017 at 17:10

@false. A better implementation? – Lioness 30/11, 2017 at 17:13

Yes, indeed. Different approach! Imagine you have lcp/3. That is, the lcp for two lists. And now ... – Kile 30/11, 2017 at 17:14

@false. What about universal termination in cases like lists_lcp([X,Y], "abc") ? – Lioness 30/11, 2017 at 20:26

Your query possesses an infinite set of solution that cannot be described with a finite number of answers. So how can you maintain termination? – Kile 30/11, 2017 at 20:44

Rather: Approximate maximal efficiency for the cases that are comparable to traditional algorithms – Kile 30/11, 2017 at 20:44

@false. Really? What about X = [a,b,c], Y = [a,b,c|_A] ; X = [a,b,c,_A|_B], Y = [a,b,c] ; X = [a,b,c,_A|_B], Y = [a,b,c,_C|_D], dif(_A,_C) (see above)? – Lioness 30/11, 2017 at 21:43

My reasoning was flawed. It;s just not only for [X,Y] but also larger lists! – Kile 1/12, 2017 at 16:17

(And... it seems the expansion of (',')/3 is flawed, too. Some more meta-predicate issues are badly needed... 'elp!) – Kile 1/12, 2017 at 16:44

Here is how I would implement this:

:- set_prolog_flag(double_quotes, chars).

longest_common_prefix([], []).
longest_common_prefix([H], H).
longest_common_prefix([H1,H2|T], P) :-
    maplist(append(P), L, [H1,H2|T]),
    (   one_empty_head(L)
    ;   maplist(head, L, Hs),
        not_all_equal(Hs)
    ).

one_empty_head([[]|_]).
one_empty_head([[_|_]|T]) :-
    one_empty_head(T).

head([H|_], H).

not_all_equal([E|Es]) :-
    some_dif(Es, E).

some_dif([X|Xs], E) :-
    if_(diffirst(X,E), true, some_dif(Xs,E)).

diffirst(X, Y, T) :-
    (   X == Y -> T = false
    ;   X \= Y -> T = true
    ;   T = true,  dif(X, Y)
    ;   T = false, X = Y
    ).

The implementation of not_all_equal/1 is from this answer by @repeat (you can find my implementation in the edit history).

We use append and maplist to split the strings in the list into a prefix and a suffix, and where the prefix is the same for all strings. For this prefix to be the longest, we need to state that the first character of at least two of the suffixes are different.

This is why we use head/2, one_empty_head/1 and not_all_equal/1. head/2 is used to retrieve the first char of a string; one_empty_head/1 is used to state that if one of the suffixes is empty, then automatically this is the longest prefix. not_all_equal/1 is used to then check or state that at least two characters are different.

Examples

?- longest_common_prefix(["interview", "integrate", "intermediate"], Z).
Z = [i, n, t, e] ;
false.

?- longest_common_prefix(["interview", X, "intermediate"], "inte").
X = [i, n, t, e] ;
X = [i, n, t, e, _156|_158],
dif(_156, r) ;
false.

?- longest_common_prefix(["interview", "integrate", X], Z).
X = Z, Z = [] ;
X = [_246|_248],
Z = [],
dif(_246, i) ;
X = Z, Z = [i] ;
X = [i, _260|_262],
Z = [i],
dif(_260, n) ;
X = Z, Z = [i, n] ;
X = [i, n, _272|_274],
Z = [i, n],
dif(_272, t) ;
X = Z, Z = [i, n, t] ;
X = [i, n, t, _284|_286],
Z = [i, n, t],
dif(_284, e) ;
X = Z, Z = [i, n, t, e] ;
X = [i, n, t, e, _216|_224],
Z = [i, n, t, e] ;
false.

?- longest_common_prefix([X,Y], "abc").
X = [a, b, c],
Y = [a, b, c|_60] ;
X = [a, b, c, _84|_86],
Y = [a, b, c] ;
X = [a, b, c, _218|_220],
Y = [a, b, c, _242|_244],
dif(_218, _242) ;
false.

?- longest_common_prefix(L, "abc").
L = [[a, b, c]] ;
L = [[a, b, c], [a, b, c|_88]] ;
L = [[a, b, c, _112|_114], [a, b, c]] ;
L = [[a, b, c, _248|_250], [a, b, c, _278|_280]],
dif(_248, _278) ;
L = [[a, b, c], [a, b, c|_76], [a, b, c|_100]] ;
L = [[a, b, c, _130|_132], [a, b, c], [a, b, c|_100]];
…

Skylab answered 24/11, 2017 at 8:40 Comment(6)

longest_common_prefix([[A],[B],[b]], []), A=a,B=b. gives two identical solutions? – Kile 24/11, 2017 at 12:17

@Kile My implementation introduces redundant dif constraints that I don't see how to avoid. – Skylab 24/11, 2017 at 12:36

Up to not_all_equal_/1 this is a highly Prologish approach! – Kile 24/11, 2017 at 12:38

@Kile I am in the process of writing a question on the implementation of not_all_equal, because it seems like a useful predicate bu a difficult one to implement properly… – Skylab 24/11, 2017 at 12:39

but how would I then join the list of characters together, eg [i,n,t,e] as "inte"? @Kile – Shillelagh 24/11, 2017 at 21:20

Please note that with the Prolog flag set as above, [i,n,t,e] = "inte"! So they are the same. Seem my answer how to get "inte" written as shown above! – Kile 24/11, 2017 at 21:34

This previous answer presented an implementation based on if_/3.

:- use_module(library(reif)).

Here comes a somewhat different take on it:

lists_lcp([], []).
lists_lcp([Es|Ess], Xs) :-
   foldl(list_list_lcp, Ess, Es, Xs).                % foldl/4

list_list_lcp([], _, []).
list_list_lcp([X|Xs], Ys0, Zs0) :-
   if_(list_first_rest_t(Ys0, Y, Ys)                 % if_/3
      , ( Zs0 = [X|Zs], list_list_lcp(Xs, Ys, Zs) )
      ,   Zs0 = []
      ).

list_first_rest_t([], _, _, false).
list_first_rest_t([X|Xs], Y, Xs, T) :-
   =(X, Y, T).                                       % =/3

Almost all queries in my previous answer give the same answers, so I do not show them here.

lists_lcp([X,Y], "abc"), however, does not terminate universally anymore with the new code.

Lioness answered 30/11, 2017 at 22:49 Comment(8)

This list_first_rest_t, can't this be expressed more succinctly? – Kile 30/11, 2017 at 22:51

... like a conjunction of simpler conditions? – Kile 30/11, 2017 at 22:52

Yes, Ys0 = [X|Ys]. OTOH this would make some residual dif/2 goals more complex... worth it? – Lioness 30/11, 2017 at 23:28

for some reason when i run the code it crashes, it says "stack.pl:6: Singleton variables: [Y]" @Lioness – Shillelagh 1/12, 2017 at 13:15

@blazing. Which query are you running? This stack.pl-message does not make much sense to me. Please provide more data showing the problems you encountered. – Lioness 1/12, 2017 at 14:36

the code doesn't compile, I tried to run it on swipl in my terminal but it didn't just gave me that error @Lioness – Shillelagh 1/12, 2017 at 14:44

@blazing. Download and install library(reif). I added a link in my answer. – Lioness 1/12, 2017 at 14:49

@blazing. Be specific about the error you get! "That error" doesn't help me localize where the problem is. – Lioness 1/12, 2017 at 14:51

A simple version:

:- set_prolog_flag(double_quotes, chars).
pref([],_,[]).
pref(_,[],[]).
pref([H|T1],[H|T2],[H|Tr]):-
    pref(T1,T2,Tr).
pref([H|_],[H|_],[]).
pref([H1|_],[H2|_],[]):-
    dif(H1,H2).

lcf([],[]).
lcf([W],R):-
    pref(W,W,R).
lcf([W1,W2|L],R):-
    pref(W1,W2,R),
    lcf([W2|L],R).

Examples:

pref("interview","integrate",R).
R = [i, n, t, e] ;
R = [i, n, t] ;
R = [i, n] ;
R = [i] ;
R = [] ;
False.

lcf(["interview", "interrupt", "integrate", "intermediate"],R).
R = [i, n, t, e]

lcf(["interview", "interrupt", X, "intermediate"],R).
R = X, X = [i, n, t, e, r]

Garate answered 21/7, 2018 at 11:19 Comment(0)

I recently had to implement this for two lists, and this is the code I came up with. It assumes the two input lists are sufficiently instantiated.

longest_common_prefix([X|Xs], [X|Ys], [X|Common]) :- !,
    longest_common_prefix(Xs, Ys, Common).
longest_common_prefix(_, _, []).

This is easily extended to multiple lists:

lcs([], []).
lcs([L1|Ls], Prefix) :-
    foldl(longest_common_prefix, Ls, L1, Prefix).

If you don't like using foldl:

lcs([], []).
lcs([L1|Ls], Prefix) :-
    lcs(Ls, L1, Prefix).

lcs([], Prefix, Prefix).
lcs([L1|Ls], Prefix0, Prefix) :-
    longest_common_prefix(L1, Prefix0, Prefix1),
    lcs(Ls, Prefix1, Prefix).

Cirque answered 29/11, 2018 at 22:42 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Examples

Recommended topics

Hot tags