Check if string is substring in Prolog
Asked Answered
D

4

7

Is there a way to check if a string is a substring of another string in Prolog? I tried converting the string to a list of chars and subsequently checking if the first set is a subset of the second that that doesn't seem to be restrictive enough. This is my current code:

isSubstring(X,Y):-
        stringToLower(X,XLower),
        stringToLower(Y,YLower),
        isSubset(XLower,YLower).

isSubset([],_).
isSubset([H|T],Y):-
        member(H,Y),
        select(H,Y,Z),
        isSubset(T,Z).

stringToLower([],[]).
stringToLower([Char1|Rest1],[Char2|Rest2]):-
        char_type(Char2,to_lower(Char1)),
        stringToLower(Rest1,Rest2).

If I test this with

isSubstring("test","tesZting").

it returns yes, but should return no.

Dow answered 27/11, 2013 at 18:40 Comment(4)
A sublist algorithm should satisfy - can you give an example of your char list subset algorithm and the input that makes it fail?Dashed
I just edited the question and added the code and an example.Dow
I am not sure, but what you want is rather a subsequence. See en.wikipedia.org/wiki/SubstringMarston
I mean a string as in Java.Dow
L
1

Prolog strings are lists, where each element of the list is the integer value representing the codepoint of the character in question. The string "abc" is exactly equivalent to the list [97,98,99] (assuming your prolog implementation is using Unicode or ASCII, otherwise the values might differ). That leads to this (probably suboptimal from a Big-O perspective) solution, which basically says that X is a substring of S if

  • S has a suffix T such that, and
  • X is a prefix of T

Here's the code:

substring(X,S) :-
  append(_,T,S) ,
  append(X,_,T) ,
  X \= []
  .

We restrict X to being something other than the empty list (aka the nil string ""), since one could conceptually find an awful lot of zero-length substrings in any string: a string of length n has 2+(n-1) nil substrings, one between each character in the string, one preceding the first character and one following the last character.

Looksee answered 27/11, 2013 at 22:39 Comment(0)
M
6

It is not clear what you mean by a string. But since you say you are converting it to a list, you could mean atoms. ISO Prolog offers atom_concat/3 and sub_atom/5 for this purpose.

?- atom_concat(X,Y,'abc').
   X = '', Y = abc
;  X = a, Y = bc
;  X = ab, Y = c
;  X = abc, Y = ''.

?- sub_atom('abcbcbe',Before,Length,After,'bcb').
   Before = 1, Length = 3, After = 3
;  Before = 3, Length = 3, After = 1.

Otherwise, use DCGs! Here's how

seq([]) --> [].
seq([E|Es]) --> [E], seq(Es).

... --> [] | [_], ... .

subseq([]) --> [].
subseq(Es) --> [_], subseq(Es).
subseq([E|Es]) --> [E], subseq(Es).

seq_substring(S, Sub) :-
   phrase((...,seq(Sub),...),S).

seq_subseq(S, Sub) :-
   phrase(subseq(Sub),S).

Acknowledgements

The first appearance of above definition of ... is on p. 205, Note 1 of

David B. Searls, Investigating the Linguistics of DNA with Definite Clause Grammars. NACLP 1989, Volume 1.

Marston answered 27/11, 2013 at 19:43 Comment(0)
L
1

Prolog strings are lists, where each element of the list is the integer value representing the codepoint of the character in question. The string "abc" is exactly equivalent to the list [97,98,99] (assuming your prolog implementation is using Unicode or ASCII, otherwise the values might differ). That leads to this (probably suboptimal from a Big-O perspective) solution, which basically says that X is a substring of S if

  • S has a suffix T such that, and
  • X is a prefix of T

Here's the code:

substring(X,S) :-
  append(_,T,S) ,
  append(X,_,T) ,
  X \= []
  .

We restrict X to being something other than the empty list (aka the nil string ""), since one could conceptually find an awful lot of zero-length substrings in any string: a string of length n has 2+(n-1) nil substrings, one between each character in the string, one preceding the first character and one following the last character.

Looksee answered 27/11, 2013 at 22:39 Comment(0)
S
1

The problem is with your isSubset/2.
There are two distinct situations that you've tried to capture in one predicate. Either you're looking for the first position to try to match your substring, or you've already found that point and are checking whether the strings 'line up'.

isSubset([], _).
isSubSet(Substring, String) :-
    findStart(Substring, String, RestString),
    line_up(Substring, RestString).

findStart([], String, String).
findStart([H|T], [H|T1], [H|T1]).
findStart(Substring, [_|T], RestString) :-
    findStart(Substring, T, RestString).

line_up([], _).
line_up([H|T], [H|T1]) :-
    line_up(T, T1).

You can combine these into one predicate, as follows:

isSublist([], L, L).
isSublist([H|T], [H|T1], [H|T1]) :-
    isSublist(T, T1, T1).
isSublist(L, [_|T], Rest) :-
    isSublist(L, T, Rest).
Surtout answered 28/11, 2013 at 7:52 Comment(0)
J
1

Using DCG's you can do the following: (SWI)

%                   anything  substring anything
substr(String) --> ([_|_];[]), String,  ([_|_];[]).

% is X a substring of Y ?
substring(X,Y) :- phrase(substr(X),Y).
Juieta answered 22/6, 2014 at 16:19 Comment(2)
This is rejected due to [_|_] in B, GNU, SICStus, Ciao, and the current DCG draft. It seems only SWI and YAP permit this.Marston
Ok, I'll add that note. Thanks @false.Juieta

© 2022 - 2024 — McMap. All rights reserved.