Listing as facts
Let's try to explain this with a counterexample. We specify the nouns, verbs, etc. with simple facts:
det(the).
det(a).
n(woman).
n(man).
v(shoots).
Now we can implement a noun phrase np as:
np([X,Y]) :-
det(X),
n(Y).
In other words we say "a noun phrase is a sequence of two words, the first being a det and the second an n". And this will work: if we query np([a,woman]), it will succeed, etc.
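As a minimal sketch of what querying this looks like (assuming the facts and the np/1 clause above are loaded; the exact output format depends on your Prolog system):
?- np([a,woman]).
true.
?- np([a,shoots]).
false.
?- np(X).
X = [the,woman] ;
X = [the,man] ;
X = [a,woman] ;
X = [a,man].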
But now we need to do something more advanced: define the verb phrase. There are two possible verb phrases: the one with a verb and a noun phrase, which was originally defined as:
vp(X,Z):- v(X,Y),np(Y,Z).
We could define it as:
vp([X|Y]) :-
v(X),
np(Y).
And the one with only a single verb:
vp(X,Z):- v(X,Z).
That can be converted into:
vp([X]) :-
v(X).
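With both clauses loaded, a quick sketch of how vp/1 behaves (again assuming the facts above):
?- vp([shoots]).
true.
?- vp([shoots,the,man]).
true.
?- vp([the,man]).
false.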
The guessing problem
The problem however is that the two variants have a different number of words: there are verb phrases with one word and verb phrases with three words. That's not really a problem in itself, but now say - I know this is not correct English - that there exists a sentence defined as a vp followed by an np, so this would be:
s(X,Z):- vp(X,Y), np(Y,Z).
in the original grammar.
The problem is that if we want to transform this into our new way of representing it, we need to know how much vp will consume (how many words will be eaten by vp). We cannot know this in advance: at this point we don't know much about the sentence, so we cannot guess whether vp will eat one word or three.
We might of course guess the number of words with:
s([X|Y]) :-
vp([X]),
np(Y).
s([X,Y,Z|T]) :-
vp([X,Y,Z]),
np(T).
But I hope you can imagine that if you were to define verb phrases with 1, 3, 5 and 7 words, things would get problematic. Another way to solve this is to leave it to Prolog:
s(S) :-
append(VP,NP,S),
vp(VP),
np(NP).
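A quick sketch of using it, with the whole sentence as a single list (and the non-English vp-then-np order from above):
?- s([shoots,the,man,the,woman]).
true.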
Now Prolog will first guess how to subdivide the sentence into two parts and then try to match each part. But the problem is that for a sentence with n words, there are n+1 ways to split it into two parts.
So Prolog will for instance first split it as:
VP=[],NP=[shoots,the,man,the,woman]
(remember we swapped the order of the verb phrase and noun phrase). Evidently vp will not be very happy if it gets an empty list, so that will be rejected easily. But next it tries:
VP=[shoots],NP=[the,man,the,woman]
Now vp is happy with only shoots, but it will require some computational effort to realize that. np is however not amused with this long a part. So Prolog backtracks again:
VP=[shoots,the],NP=[man,the,woman]
Now vp will complain again that it has been given too many words. Finally Prolog will split it correctly:
VP=[shoots,the,man],NP=[the,woman]
The point is that it requires a large number of guesses, and for each of these guesses vp and np have to do work as well. For a really complicated grammar, vp and np might split the sentence further, resulting in a huge amount of trial and error.
The true reason is that append/3 has no "semantic" clue about how to split the sentence, so it tries all possibilities. What we really want is an approach where vp itself can tell us what share of the sentence it actually wants.
Furthermore, if you have to split the sentence into three parts, the number of ways to do this increases to O(n^2), and so on. So guessing will not do the trick.
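You can see this blind splitting by querying append/3 directly; a small sketch (output as in, for example, SWI-Prolog):
?- append(VP, NP, [shoots,the,man,the,woman]).
VP = [],
NP = [shoots,the,man,the,woman] ;
VP = [shoots],
NP = [the,man,the,woman] ;
VP = [shoots,the],
NP = [man,the,woman] ;
VP = [shoots,the,man],
NP = [the,woman] ;
...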
You might also try to generate verb phrases first, and then hope one of them matches the start of the sentence:
s(S) :-
vp(VP),
append(VP,NP,S),
np(NP).
But in that case the number of guessed verb phrases will blow up exponentially. Of course you can give "hints" etc. to speed up the process, but it will still take some time.
The solution
What you want to do is pass the relevant part of the sentence to every predicate, such that a predicate looks like:
predicate(Subsentence,Remaining)
Subsentence is the list of words the predicate starts from. For instance, for a noun phrase it might look like [the,woman,shoots,the,man]. Every predicate consumes the words it is interested in: the words up to a certain point. In this case the noun phrase is only interested in [the,woman], because that is a noun phrase. To enable the remaining parsing, it returns the remaining part [shoots,the,man], in the hope that some other predicate can consume the remainder of the sentence.
For our table of facts that is easy:
det([the|W],W).
det([a|W],W).
n([woman|W],W).
n([man|W],W).
v([shoots|W],W).
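A small sketch of how such a fact behaves (your Prolog may or may not report a leftover choice point):
?- det([the,woman,shoots,the,man], Rest).
Rest = [woman,shoots,the,man].
?- det([woman,shoots,the,man], Rest).
false.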
It thus means that if you have a sentence [the,woman,shoots,...] and you ask det/2 whether it starts with a determiner, it will say: "yes, the is a determiner, but the remaining part [woman,shoots,...] is not part of the determiner, please match it with something else".
This matching works because a list is represented as a linked list: [the,woman,shoots,...] is actually represented as [the|[woman|[shoots|...]]] (each cell points to the next "sublist"). If you match:
[the|[woman|[shoots|...]]]
det([the|W], W)
it will unify [woman|[shoots|...]] with W and thus result in:
det([the|[woman|[shoots|...]]], [woman|[shoots|...]]).
By returning the remaining list, it has thus consumed the the part.
Higher level predicates
Now suppose we define the noun phrase as:
np(X,Z):- det(X,Y), n(Y,Z).
If we again call it with [the,woman,shoots,...], X will be unified with that list. It will first call det/2, which will consume the, without any need for backtracking. Next Y is equal to [woman,shoots,...]; now n/2 will consume woman and return [shoots,...]. This is also the result np/2 will return, and yet another predicate will have to consume this.
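A minimal sketch of calling this two-argument noun phrase:
?- np([the,woman,shoots,the,man], Rest).
Rest = [shoots,the,man].
?- np([shoots,the,man], Rest).
false.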
Usefulness
Say you introduce your name as an additional noun:
n([doug,smith|W],W).
(sorry for using lowercase, but otherwise Prolog sees these as variables).
It will simply try to match the first two words with doug and smith, and if that succeeds, try to match the remainder of the sentence. So one can make a sentence like [the,doug,smith,shoots,the,woman] (sorry about that; furthermore, in English some noun phrases map directly to a noun, np(X,Y) :- n(X,Y), so the the could be dropped in a more complete English grammar).
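A small sketch of the extended noun in action (assuming the facts and the np/2 clause from above):
?- n([doug,smith,shoots,the,woman], Rest).
Rest = [shoots,the,woman].
?- np([the,doug,smith,shoots,the,woman], Rest).
Rest = [shoots,the,woman].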
Guessing completely eliminated?
Is the guessing completely eliminated? No. It is still possible that there is overlap in consuming. For instance you might add:
n([doug,smith|W],W).
n([doug|W],W).
In that case, if you query for [the,doug,smith,shoots,the,woman], it will first consume/eat the in det/2, and next look for a noun to consume from [doug,smith,...]. There are two candidate clauses; which one is tried first depends on the clause order. Say Prolog first tries to eat only doug and to match [smith,shoots,...] as an entire verb phrase; since smith is not a verb, it will backtrack, reconsider eating only one word, and thus decide to eat both doug and smith as a noun instead.
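A small sketch of the overlap itself, with the two clauses in the order listed above:
?- n([doug,smith,shoots,the,woman], Rest).
Rest = [shoots,the,woman] ;
Rest = [smith,shoots,the,woman].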
The point is that, when using the append/3 approach, Prolog would have to do this backtracking as well.
Conclusion
By using difference lists, you can eat an arbitrary number of words. The remainder is returned so that other sentence parts, like the verb phrase, can aim to consume it. Furthermore, the list is always fully grounded, so one definitely does not brute-force generate all kinds of verb phrases first.
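Putting it together, the verb phrase and sentence keep exactly the same two-argument shape (these are the vp and s clauses from the original grammar above); a minimal sketch, where the empty list as the second argument says the whole sentence must be consumed:
% a verb phrase is a verb followed by a noun phrase, or just a verb
vp(X, Z) :- v(X, Y), np(Y, Z).
vp(X, Z) :- v(X, Z).
% the (deliberately non-English) sentence: a verb phrase followed by a noun phrase
s(X, Z) :- vp(X, Y), np(Y, Z).
?- s([shoots,the,man,the,woman], []).
true.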