Taking an example from the Introduction to Latin Wikiversity, consider the sentence:
the sailor gives the girl money
We can handle this in Prolog with a DCG fairly elegantly with this pile of rules:
sentence(s(NP, VP)) --> noun_phrase(NP), verb_phrase(VP).
noun_phrase(Noun) --> det, noun(Noun).
noun_phrase(Noun) --> noun(Noun).
verb_phrase(vp(Verb, DO, IO)) --> verb(Verb), noun_phrase(IO), noun_phrase(DO).
det --> [the].
noun(X) --> [X], { member(X, [sailor, girl, money]) }.
verb(gives) --> [gives].
And we see that this works:
?- phrase(sentence(S), [the,sailor,gives,the,girl,money]).
S = s(sailor, vp(gives, money, girl)) ;
It seems to me that the DCG is really optimized for handling word-order languages. I'm at a complete loss as to how to handle this Latin sentence:
nauta dat pecuniam puellae
This means the same thing (the sailor gives the girl money), but the word order is free. All of these permutations mean exactly the same thing:
nauta dat puellae pecuniam
nauta puellae pecuniam dat
puellae pecuniam dat nauta
puellae pecuniam nauta dat
dat pecuniam nauta puellae
The first thing that occurs to me is to enumerate the permutations:
sentence(s(NP, VP)) --> noun_phrase(NP), verb_phrase(VP).
sentence(s(NP, VP)) --> verb_phrase(VP), noun_phrase(NP).
but this won't do, because while nauta belongs to the subject noun phrase, puellae, which belongs to the object noun phrase, is subordinate to the verb but can still precede it. I wonder if I should approach it by building some kind of attributed list first, like so:
?- attributed([nauta,dat,pecuniam,puellae], Attributed).
Attributed = [noun(nauta,nom), verb(do,3,s), noun(pecunia,acc), noun(puella,dat)]
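For concreteness, here is roughly how I imagine producing that list. This is only a sketch, assuming SWI-Prolog and a hand-written lexicon table; lex/2 is a name I made up, and it covers just the four word forms in this sentence rather than doing any real morphological analysis:

```prolog
:- use_module(library(apply)).   % maplist/3 (SWI-Prolog)

% lex(SurfaceForm, Analysis): hypothetical lexicon, only for this example.
lex(nauta,    noun(nauta, nom)).
lex(pecuniam, noun(pecunia, acc)).
lex(puellae,  noun(puella, dat)).
lex(dat,      verb(do, 3, s)).

% attributed(+Words, -Attributed): look up each word on its own,
% ignoring its position in the sentence.
attributed(Words, Attributed) :-
    maplist(lex, Words, Attributed).
```

A form with more than one reading (puellae can also be genitive singular, for instance) would presumably just get several lex/2 facts, and backtracking would enumerate the alternatives.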
This seems like it will turn out to be necessary (and I don't see a good way to do it), but grammatically it's pushing food around on my plate. Maybe I could write a parser with some kind of horrifying non-DCG contraption like this:
% member/2 only inspects the list; nothing is consumed.
parse(s(NounPhrase, VerbPhrase), Attributed) :-
    parse(subject_noun_phrase(NounPhrase), Attributed),
    parse(verb_phrase(VerbPhrase), Attributed).
parse(subject_noun_phrase(Noun), Attributed) :-
    member(noun(Noun, nom), Attributed).
parse(object_noun_phrase(Noun), Attributed) :-
    member(noun(Noun, acc), Attributed).
This seems like it would work, but only as long as I have no recursion; as soon as I introduce a subordinate clause I'm going to reuse subjects in an unhealthy way.
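To illustrate the reuse problem: one variation I can imagine is to remove each constituent from the list as it is used, with select/3 instead of member/2. The following is a sketch only, assuming SWI-Prolog and the attributed-list format shown earlier. It handles exactly one flat clause with a third-person-singular verb, and it does nothing about deciding where a subordinate clause begins or ends, which is the part I'm actually stuck on:

```prolog
:- use_module(library(lists)).   % select/3 (SWI-Prolog)

% parse(-Tree, +Attributed): every constituent is taken out of the list
% when it is used, so the same noun cannot be picked up twice.
% The final [] requires that nothing is left over.
parse(s(Subject, vp(Verb, DO, IO)), Attributed) :-
    select(noun(Subject, nom), Attributed, Rest0),
    select(verb(Verb, 3, s),   Rest0,      Rest1),
    select(noun(DO, acc),      Rest1,      Rest2),
    select(noun(IO, dat),      Rest2,      []).
```

Because the lookup is by case rather than by position, any ordering of the four attributed words should give the same tree, for example s(nauta, vp(do, pecunia, puella)).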
I just don't see how to get from a non-word-order sentence to a parse tree. Is there a book that discusses this? Thanks.