Is adaptive parsing possible in Prolog?
Asked Answered
T

2

8

I am attempting to write an adaptive parser in Prolog: in other words, a parser that can modify its own parsing rules at runtime.

In order to do this, I'll need to generate new predicates at runtime, but I'm not sure if this is possible. Would it be possible to write a predicate that takes a list like this one:

generate_dcg_rule([A," is greater than ",B]).

...and then generates a new predicate like this one?

expr(A," is greater than ",B) -->
    symbol(A)," is greater than ",symbol(B).
Tude answered 24/4, 2016 at 0:6 Comment(0)
H
6

Yes, that's easily possible.

Prolog is a very dynamic language, and you can assert arbitrary clauses at runtime using for example assertz/1.

You can expand DCG rules to ordinary Prolog rules using your Prolog's term expansion mechanism that is usually used to do this.

For example, using expand_term/2:

?- expand_term((expr(A," is greater than ",B) -->
    symbol(A)," is greater than ",symbol(B)), Clause).
Clause =  (expr(A, [' ', i, s, ' ', g, r, e|...], B, _G242, _G253):-symbol(A, _G242, _G264), _G264=[' ', i, s, ' ', g|...], symbol(B, _G275, _G253)).

You can assert such clauses with assertz/1:

?- expand_term((Head --> Body), Clause), assertz(Clause).

Note that I am using double_quotes set to chars in this example, i.e., use:

:- set_prolog_flag(double_quotes, chars).

in your source files. This is a good idea in any case.

Note also that I am assuming that you have already found a way to translate the lists you give in your generate_dcg_rule/1 example to actual DCGs. For this translation, I rather recommend a predicate like list_dcg/2 that declaratively describes the relation between such lists and DCG rules. The advantage is clear: You can for example test such relations interactively and with test cases etc. For your concrete example, one clause that defines this relation could be similar to:

list_dcg([A,Ls,B], DCG) :-
        Ls = "is greater than ",
        DCG = (expr(A, Ls, B) --> symbol(A), Ls, symbol(B)).

I leave generalizing this to your other use cases as an exercise. In total, one way to assert such clauses dynamically is thus:

?- list_dcg(List, DCG), expand_term(DCG, Clause), assertz(Clause).

Note how we benefit from Prolog's homoiconic nature in such examples: Prolog rules and DCG rules have a natural representation as Prolog terms, and we can simply write them down and reason about them like any other terms also within goals.

Haden answered 24/4, 2016 at 1:7 Comment(2)
With set_prolog_flag/2 being both a directive and a built-in predicate, why use the initialization/1 directive, which would delay setting the flag to after the source file containing it is compiled and loaded?Unideaed
Very good point, thank you! I've changed this to use the directive right away.Haden
T
2

I have written an adaptive parser that converts English phrases into mathematical expressions. You can easily extend it with your own translation rules, so it can be used to create extensible natural language user interfaces.

This is one possible input:

(A squared) times (b squared) equals c to the power of 3 times the product of 5 and 6

and this is its output:

(A * A) * (b * b) = c ^ 3 * 5 * 6

The program's implementation is shown here:

:- initialization(main).
:- set_prolog_flag(double_quotes, chars).  % This is for SWI 7+ to revert to the prior interpretation of quoted strings.

%This is an adaptive parser for SWI-Prolog.

main :-
    %Type any kind of input here to see the output! The input must be compatible with the grammar that is defined below.        
    Input = "(A squared) times (b squared) equals c to the power of 3 times the product of 5 and 6",

    iterated_translate(Input,Output), writeln(Input), writeln(Output), writeln('\n'), writeln('\n').

%The grammar is defined here. The variables must be uppercase letters.
%The input in each translation rule is followed by its output.
theList(TheList) :-
            %You can easily extend this parser by adding more rules to this list.
            TheList = 
            [['A to the power of B',
                 'A ^ B'],
            %The next transformation is the final output of 'A to the power of B'.
            ['A ^ B',
                'A ^ B'],
            ['A * B',
                'A * B'],
            ['the product of A and B',
                'A times B'],
            ['A squared',
                 'the product of A and A'],
            ['A times B',
                'A * B'],
            ['A = B',
                'A = B'],
            ['A equals B', 'A = B']].
%This is the end of the grammar. The rest of the translator is implemented below.

output_expr(Lang,[Input,[A,B]]) -->
        {
        theList(TheList),
        list_to_output__(TheList,TheList1,[A,B]),
        member([Input,Output],
            TheList1)
        },
        input_to_output_(Lang,Input,Output).

atom_is_upper(N) :-
    atom_chars(N, [L]),
    char_type(L, upper).

atom_is_var(Names_to_vars,Atom,Var) :-
    atom(Atom),atom_is_upper(Atom),member([Atom,Var],Names_to_vars).

list_to_output__([],[],_) :- true.
list_to_output__([Start1|Rest1],[Start2|Rest2],Vars) :-
    list_to_output_(Start1,Start2,Vars),list_to_output__(Rest1,Rest2,Vars).

list_to_output_([A1_,B1_],[A2,B2],Vars) :- atomic_list_concat(A1,' ',A1_),atomic_list_concat(B1,' ',B1_),list_to_output(A1,A2,Vars),list_to_output(B1,B2,Vars).

list_to_output([],[],_) :- true.
list_to_output([Start1|Rest1],[Start2|Rest2],[A,B]) :-
    (Start1='A'->Start2=A;Start1='B'-> Start2=B;Start1=Start2),list_to_output(Rest1,Rest2,[A,B]).

list_to_grammar_(Lang,Start,Rest) -->
    {(Start = [A])->(Rest = []->Start1 = expr(Lang,A);Start1 = parentheses_expr(Lang,A));atom_chars(Start,Start1)},Start1.

list_to_grammar(Lang,[Start]) -->
    list_to_grammar_(Lang,Start,[]).

list_to_grammar(Lang,[Start|Rest]) -->
    list_to_grammar_(Lang,Start,Rest),ws_,list_to_grammar(Lang,Rest).

a_number([A,B]) -->
    (a__number(A), ".", a__number(B)).

a_number(A) -->
    a__number(A).

a__number([L|Ls]) --> digit(L), a__number_r(Ls).
a__number_r([L|Ls]) --> digit(L), a__number_r(Ls).
a__number_r([])     --> [].
digit(Let)     --> [Let], { code_type(Let, digit) }.

symbol([L|Ls]) --> letter(L), symbol_r(Ls).
symbol_r([L|Ls]) --> letter(L), symbol_r(Ls).
symbol_r([])     --> [].
letter(Let)     --> [Let], { code_type(Let, alpha) }.

ws --> "";((" ";"\n"),ws).
ws_ --> (" ";"\n"),ws.

input_to_output(Lang,A,B) -->
    {Lang = input} ->
        A;
    {Lang=output} ->
        B.

input_to_output_(Lang,A,B) -->
    {A_=list_to_grammar(Lang,A),B_=list_to_grammar(Lang,B)},input_to_output(Lang,A_,B_).

parentheses_expr(Lang,["(",A,")"]) -->
    ("(",(expr(Lang,A)),")").

parentheses_expr(_,symbol(A)) -->
    symbol(A).

parentheses_expr(_,a_number(A)) -->
    a_number(A).

expr(Lang,A) -->
    parentheses_expr(Lang,A);output_expr(Lang,A).

translate(Input1,Output1) :-
    phrase(output_expr(input,Ls),Input1),
    phrase(output_expr(output,Ls),Output1).

iterated_translate(Input1, Output2) :-
    %Keep translating until the input is the same as the output.
    translate(Input1,Output1),
    (Input1=Output1, Output1 = Output2;iterated_translate(Output1,Output2)).

Based on this example, I wrote another adaptive grammar system with a natural language user interface.

Tude answered 11/6, 2016 at 5:15 Comment(4)
Oh yes that's nice! In input_to_output//3, you can pull the unification into the DCG head, using two clauses like input_to_output(input, A, _) --> A. and input_to_output(output, _, B) --> B. Advantage: More general (can be used also if first argument is uninstantiated)! In list_to_output/3, you can simply use facts instead of Head :- true., i.e., simply write Head.. Moreover, it seems you can benefit from maplist/3 if you reorder the arguments to place Vars first. You can then say maplist(list_to_output(Vars), Ls0, Ls) and shorten the code. Thank you for sharing this!Haden
@Haden I've been trying to make this parser compatible with left-recursive grammars also. Would this be possible in ISO Prolog?Tude
Yes, that's of course possible. One way to do this is to use a technique called left-factorization. It's a nice exercise to rewrite an existing grammar in this way. A different approach is to pass around a list of tokens that can at most be consumed, and let each DCG rule state in advance the number of tokens it will itself eventually consume. Using a simple fixpoint algorithm, you can then pull these numbers "in front" of each rule, and check in advance if there are enough tokens for the rule to be applicable. You're definitely on the right track, it's very nice to do this all in Prolog!Haden
@Haden Also, there is an implementation of an Earley parser in Prolog that could be used for this purpose.Tude

© 2022 - 2024 — McMap. All rights reserved.