Prolog - DCG parser with input from file
As part of a project I need to write a parser that can read a file and parse it into facts I can use in my program.

The file structure looks as follows:

property = { el1 , el2 , ... }.  

What I want in the end is:

property(el1).
property(el2).
...

I read my file like this:

main :-
       open('myFile.txt', read, Str),
       read_file(Str,Lines),
       close(Str),
       write(Lines), nl.

read_file(Stream,[]) :-
                       at_end_of_stream(Stream).

read_file(Stream,[X|L]) :-
                          \+ at_end_of_stream(Stream),
                          read(Stream,X),
                          parse(X),            % Here I call upon my parser.
                          read_file(Stream,L).

Now I have read in several books and online about DCG, but they all explain the same simple examples where you can generate sentences like "the cat eats the bat" etc... When I want to use it for the above example I fail miserably.

What I did manage was "parsing" the following line:

property = el1.

to

property(el1).

with this:

parse(X) :-
           X =.. List,    % Reason I do this is because X is one atom and not a list.
           phrase(sentence(Statement), List),
           asserta(Statement).

sentence(Statement) --> ['=', Gender, Person] , { Statement =.. [Gender, Person] }.

I don't even know if I'm using the DCG correctly here, so any help on this would be appreciated. The problem I'm having now is how to do this with multiple elements in my list, and how to handle '{' and '}'.
What I really want is a DCG that can handle these types of sentences (with more than 2 elements): [image: sentence split in parts]

Now I know many people around here refer to the libraries dcg_basics and pio when it comes to dcgs. However, I have an additional problem that when I try to use the library I receive the error:

ERROR: (c:/users/ldevriendt/documents/prolog/file3.pl:3):
      Type error: `text' expected, found `http/dcg_basics'
Warning: (c:/users/ldevriendt/documents/prolog/file3.pl:3):
      Goal (directive) failed: user:[library(http/dcg_basics)]

when I do this:

:- [library(http/dcg_basics)].

Additional info:

Any help on this would be appreciated!

EDIT: The aim of this question is to learn more about DCG and its use in parsers.

Cotquean answered 24/12, 2012 at 16:50 Comment(8)
try replacing it with :- use_module(library(dcg/basics)). - Arraign
but I think your SWI-Prolog could be too old - Arraign
The version installed on my laptop appears to be an older version of SWI-Prolog indeed. I installed everything on my desktop and the use of libraries now works fine. Thank you! - Cotquean
Thank you, Floris, for the question +1. I think it would be kind if you could accept my answer, if useful. - Arraign
As said before, I'd first like to try out a DCG parser if that is possible. I'm trying to use elements of my course (the DCG) to get this working. While your solution to the other question works fine for that example, it will not work in other cases where the input is different. For example, sometimes the preferences will be given as: m1: w1 > {w2 , w3} > w4 (not necessarily in that order). Also the list can be incomplete and also needs to be handled. I'm thinking I'll have more possibilities when using the DCG. - Cotquean
Hi Floris, see my edit... - Arraign
Did you finally go for a DCG? (fellow student) - Ceil
In the end I went for a DCG, although my implementation is kinda complicated. I'm sure mine can be written more simply. Both answers below helped me in finally achieving it. I've heard from other students that they used a different way which looks a lot easier to read in code. That way is partly the same as in CapelliC's linked answer (the one he refers to). - Cotquean
A
13

As long as your file is in plain Prolog syntax, you're advised to use Prolog term I/O. Fully structured terms are read with a single call. Using a DCG is far more complicated, and probably a bit less efficient (I'm not sure here and would have to measure, but read(Term) invokes a Prolog parser implemented in C). See this other question, which uses the very same format (at the least, you could check whether someone else already got an answer here on SO for the same assignment).

edit after comments...

You're right that DCGs are the right way to handle general parsing in Prolog. Arguments in DCG productions can be seen as semantic attributes, so programming with DCGs can be seen as providing a working semantic analysis of the input (see Attribute Grammar, an important technique in language engineering as well).

And indeed the presented examples can perfectly well be solved without the hacks required with term IO.

Here it is:

:- use_module(library(pio)).  % autoload(ed), added just for easy browsing
:- use_module(library(dcg/basics)).

property(P) -->
    b, "my props", b, "=", b, "{", elS(Es) , b, "}", b,
    { P =.. [property|Es] }.

% skip blanks before each element, so "{ el1 , el2 }" parses as well as "{1,2,3}"
elS([E|Es]) --> b, el(E), b, ("," -> elS(Es) ; {Es = []}).
el(N) --> number(N).
el(S) --> csym(S). % after Jeremy Knees' comment...
b --> blanks.

%   parse a C symbol
csym(S) -->
    [F], { code_type(F, csymf) },
    csym1(Cs),
    !, { atom_codes(S, [F|Cs]) }.

csym1([C|Cs]) -->
    [C], { code_type(C, csym) },
    csym1(Cs).
csym1([]) --> [].

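With that, we have (on SWI-Prolog 7 or later, where double-quoted text in a query is a string object rather than a code list, write the input in backquotes instead):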

?- phrase(property(P), "my props = {1,2,3}").
P = property(1, 2, 3).

Thanks to library(pio) (pure input) we can apply the same semantic programming to Prolog streams, via phrase_from_file/2, and be rewarded with the same behaviour as phrase/2.
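As a rough sketch of how the pieces could fit the file format from the question, the grammar below reads statements of the form name = { e1 , e2 , ... }. from a file and asserts one fact per element. The names statements//1, statement//1, elements//1, ident//1 and load_properties/1 are made up for this illustration, and the sketch assumes SWI-Prolog, at least one element between the braces, and no comments in the file.

```prolog
:- use_module(library(pio)).                                  % phrase_from_file/2
:- use_module(library(dcg/basics), [blanks//0, number//1]).   % import only what is used

% statements(-Pairs): zero or more "name = { e1 , e2 , ... }." statements.
statements([S|Ss]) --> blanks, statement(S), !, statements(Ss).
statements([])     --> blanks.

statement(Name-Es) -->
    ident(Name), blanks, "=", blanks,
    "{", elements(Es), blanks, "}", blanks, ".".

% elements(-Es): comma-separated elements, blanks allowed around each one.
elements([E|Es]) -->
    blanks, element(E), blanks,
    ( "," -> elements(Es) ; { Es = [] } ).

element(N) --> number(N).
element(A) --> ident(A).

% ident(-Atom): a letter or underscore followed by letters, digits, underscores.
ident(A) -->
    [C], { code_type(C, csymf) },
    ident_rest(Cs),
    { atom_codes(A, [C|Cs]) }.

ident_rest([C|Cs]) --> [C], { code_type(C, csym) }, !, ident_rest(Cs).
ident_rest([])     --> [].

% load_properties(+File): assert one Name(Element) fact per parsed element.
load_properties(File) :-
    phrase_from_file(statements(Pairs), File),
    forall(( member(Name-Es, Pairs), member(E, Es) ),
           ( Fact =.. [Name, E], assertz(Fact) )).
```

If myFile.txt contains the line property = { el1 , el2 }. then, after ?- load_properties('myFile.txt'). the query ?- property(X). should enumerate el1 and el2. Collecting Name-Elements pairs first and asserting afterwards keeps the grammar itself free of side effects.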

more

This other answer shows a practical way to implement an expression calculator with operator resolution and lazy evaluation.

Arraign answered 24/12, 2012 at 18:1 Comment(2)
Ah, apparently someone did ask the same question and he might very well be someone from the same class (although I don't know him). He doesn't seem to have had the idea to use a DCG (which is something I assumed we had to use). Ideally I wanted it in a DCG, but if I can't figure it out, I'll try your solution. - Cotquean
Here the assumption is made that you always have a list of numbers {1,2,3}. But what if you want to read it as atoms? For example {el1, el2}? - Ceil
G
1

Well, the purpose of a homework question is to learn. Doing it with a DCG will teach you a more generally useful skill than horsing operators about.

I think your issues are less with DCGs inherently than with string handling.

You have a bunch of places where you use univ (the =.. operator) to convert between lists and strings. Univ probably is NOT what you want here. Univ unifies a term with a list.

foo(bar, baz)  =..  [foo, bar, baz]

What you need to understand is that a string in Prolog can take several different forms. The string 'hi Floris' could be

'hi Floris' - this is an atom - a 'solid chunk' of thing. The single quotes aren't needed for some character sequences (see your book), so hi_floris is a perfectly good atom without single quotes.

[104,105,32,70,108,111,114,105,115] - a list of ASCII codes.  This is likely what you want. These can be written with double quotes, "hi Floris", in Prolog code.

To save your sanity, put

:- portray_text(true).  

in your file so it prints out "hi Floris" in debug, not a bunch of numbers.

There's also a list of one character atoms

[h, i, ' ', 'F', l, o, r, i, s]

but you probably don't want those.
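To make the three forms concrete, here is a small sketch using the standard conversion predicates atom_codes/2 and atom_chars/2. The helper name text_forms/3 is invented for this example.

```prolog
% text_forms(+Atom, -Codes, -Chars): three views of the same text.
text_forms(Atom, Codes, Chars) :-
    atom_codes(Atom, Codes),   % atom <-> list of character codes
    atom_chars(Atom, Chars).   % atom <-> list of one-character atoms

% A DCG written with double-quoted literals consumes the code-list form,
% so convert an atom to codes before handing it to phrase/2.
greeting --> "hi ", name_codes.

name_codes --> [C], { code_type(C, alpha) }, name_codes.
name_codes --> [].

greets(Atom) :-
    atom_codes(Atom, Codes),
    phrase(greeting, Codes).
```

Calling ?- text_forms('hi Floris', Codes, Chars). binds Codes to the code list and Chars to the one-character atom list shown above, and ?- greets('hi Floris'). succeeds. Passing the atom itself to phrase/2 would not work, which is the usual reason an attempt with univ goes wrong.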

You might find the SICStus compatibility predicate read_line useful.

Now, in a DCG, you sometimes want to match 'literals' - literally that thing. If so, put that in a list. Here's a DCG for if statements in some vaguely VBish language

if_statement  --> "if", wh, "(", condition, ")", wh, 
                  "then", wh, body, wh, "else", wh,
                  else_body, wh, "endif".

% whitespace
wh -->  [].
wh -->  " ", wh.
wh --> [10], wh.   % newline (LF)
wh --> [13], wh.   % carriage return (CR)

The wh's everywhere are optional whitespace.

Now, for overall strategy, you can either read in one line at a time, or read in the whole file. For one line, use read_line, which returns a list of codes. read_file_to_codes will get the whole file.

If you use the whole file strategy, and newlines are significant, you'll need to remove them from the definition of whitespace, obviously.
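Both strategies can be sketched as follows. This assumes SWI-Prolog's library(readutil), where the line reader is called read_line_to_codes/2 and read_file_to_codes/3 reads a whole file. The grammar (line//1, document//1, word//1) and the predicates parse_file/2 and parse_lines/2 are placeholders invented for the example; they parse simple "key = value" lines.

```prolog
:- use_module(library(readutil)).  % read_file_to_codes/3, read_line_to_codes/2

% Placeholder grammar: one "key = value" entry, whitespace optional.
line(Key=Value) --> wh, word(Key), wh, "=", wh, word(Value), wh.

word(A) --> alnums(Cs), { Cs = [_|_], atom_codes(A, Cs) }.

alnums([C|Cs]) --> [C], { code_type(C, alnum) }, !, alnums(Cs).
alnums([])     --> [].

% Here whitespace includes newlines, so entries may span line breaks.
wh --> [C], { code_type(C, space) }, !, wh.
wh --> [].

document([R|Rs]) --> line(R), !, document(Rs).
document([])     --> [].

% Whole-file strategy: read all codes, then parse once.
parse_file(File, Results) :-
    read_file_to_codes(File, Codes, []),
    phrase(document(Results), Codes).

% Line strategy: parse each line separately, skipping lines that do not parse.
parse_lines(File, Results) :-
    setup_call_cleanup(open(File, read, In),
                       read_lines(In, Results),
                       close(In)).

read_lines(In, Results) :-
    read_line_to_codes(In, Codes),
    (   Codes == end_of_file
    ->  Results = []
    ;   phrase(line(R), Codes)
    ->  Results = [R|Rs],
        read_lines(In, Rs)
    ;   read_lines(In, Results)
    ).
```

The line strategy tolerates blank or malformed lines, while the whole-file strategy fails if any part of the file does not fit the grammar. Which behaviour is preferable depends on how strict the input format is meant to be.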

And, of course, all this leads to the question of why questions about this problem are flooding SO instead of the instructor's inbox.

Gymnosophist answered 26/12, 2012 at 4:45 Comment(0)
J
1

I parse the string into a list and then manipulate the list. Using DCG you can convert

T = (saf>{saf, as13s}>a32s>asf).

to

S = [saf-0, saf-1, as13s-1, a32s-2, asf-3] .

What you need to do:

1. parseLine(<<Yourpattern>>,Position) --> parseLine(L,Position), parseLine(R,NewPosition)
2. parseLine(Item,Pos) --> [Item-Pos].

Here you have 2 patterns to handle: (L>R) and {L,R}. That isn't very complicated, and the result is really easy to read.

Jacie answered 4/1, 2013 at 7:58 Comment(0)
V
0

IMHO, DCG grammar rules are quite ugly for tokenizing; I really think DCG should never even have been proposed for that task. The real strength of DCG is parsing tokens, because Prolog works with symbols. So I would say the best option is to create a foreign call to, say, a C tokenizer that unifies its result with a plain list of tokens, and then let DCG do what it was designed for. This way the implementation is cleaner and you don't have to worry about parsing carriage returns, blanks...

Say you have a hypothetical language with a statement that looks as follows:

object:
       object in a yields b,
       object in b yields C.

I don't even want to imagine tokenizing this in DCG; I am too lazy to learn how to do so with a tool which was not designed for such a task. It would be better to have a foreign call to a predicate that provides me with the plain list of tokens.

 tokenize(A,ListOfTokens), phrase(yourDCGstartRule(Information), ListOfTokens, _).

The list for our running example will look simply as:

ListOfTokens = [object,:,object,in,a,yields,b,',',object,in,b,yields,c].

I think this is much more elegant, and your rules map onto it accordingly. I could be wrong in my thoughts, but in the end it's a matter of taste, and to my taste DCG is not a tokenizer and I would never use it for that unless strictly required. Admittedly I can spot some applications where it would make sense to use it as a tokenizer as well, but I still think the tasks should be separated.

Please notice that I am NOT saying that Prolog doesn't have good facilities. You could always do the tokenizing in Prolog, but you should separate the tasks and let DCG deal only with symbols and some other strictly needed characters or strings (such as uppercase strings, like proper names, or other characters).

Finally, it seems to me that people might have forgotten that tokenizing and parsing are two separate tasks, even more so in Prolog: tokens are symbols, which is what Prolog is good at, and parsing tokens/symbols (not characters) is what DCG does best, since the embedded semantics then interface directly with Prolog, which is the desirable scenario.
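For illustration, a grammar over the token list of the running example could look like the sketch below. It assumes the tokenizer has already produced a list of atoms as shown above (with the colon and comma as quoted atoms). The rule names object_def//1, rules//1 and rule//1, and the result terms object/1 and yields/2, are invented for this example.

```prolog
% object_def(-Def): "object :" followed by comma-separated rules.
object_def(object(Rules)) --> [object, ':'], rules(Rules).

% rules(-Rs): one or more rules separated by ',' tokens.
rules([R|Rs]) --> rule(R), ( [','] -> rules(Rs) ; { Rs = [] } ).

% rule(-R): "object in A yields B" as a yields(A, B) term.
rule(yields(In, Out)) --> [object, in, In, yields, Out].
```

With the token list from above, ?- phrase(object_def(D), [object,':',object,in,a,yields,b,',',object,in,b,yields,c]). gives D = object([yields(a,b), yields(b,c)]). Each grammar rule mirrors one phrase of the language, with no whitespace handling in sight.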

Violet answered 28/4, 2013 at 9:6 Comment(0)
