Why does this very simple grammar cause GLR parsers to choke?
Asked Answered
U

2

5

I've tried several different parser generators (Bison, DParser, etc.) that claim to be able to generate GLR parsers i.e., ones that can handle ambiguous grammars. Here is a very simple ambiguous grammar of the type I'm talking about:

START: A | B;
A: C | D;
B: C | D;
C: T1 | T2;
D: T3 | T4;
T1: 't1';
T2: 't2';
T3: 't3';
T4: 't4';

I can generate the parsers just fine, but I get "unresolved ambiguity" errors or just outright crashes when I give the parser input that should be valid. There are no problems of any kind when I change the grammar to an unambiguous version.

What am I not understanding about GLR parsers? I thought the whole point was that in cases of ambiguity, ALL possible parses are tracked up until they merge or reach a dead end. All I need is a parser that can tell me whether there is ANY valid parse of the input.

Thanks for any help.

edit:

This is frustrating. Using %dprec and %merge I've been able to get Bison to handle ambiguous rules and terminals, but it still chokes on very simple but highly pathological pseudo-English grammars of the kind that I need to handle:

S: NP VP | NPA VP;
NPA: D N | NP PP;
NP: D N | NP PP | NPA;
VP: V NP | VP PP;
PP: P NP;
D: "the" | "a";
P: "in" | "with";
V: "saw";
N: "saw" | "girl" | "boy" | "park" | "telescope";

With input "a boy saw a girl", Bison is unable to parse and returns with code 1. Tom on the other hand, deals with this grammar and this input sentence flawlessly, and even naturally handles unknown terminals by just assigning them to all possible terminal types. But unlike Bison, Tom chokes on large grammars. (By "chokes" I mean fails in various different ways. If the failure modes would be helpful, I can report those.)

Anyone have any other ideas?

Uroscopy answered 20/2, 2014 at 2:21 Comment(0)
G
5

Unfortunately, bison really insists on producing a (single) parse, so you have to specify some way to merge ambiguous parses. If you don't, and there is more than one possible parse, bison's GLR parser will indeed complain that the parse is ambiguous.

If you don't really care which of the multiple parses is accepted, then it's not too difficult to bend bison to your will. The simplest way is to just assign a different %dprec to every possibly ambiguous production. Bison will then select whichever applicable production happens to have the best precedence.

You can even get bison to tell you about multiple parses with a simple %merge function; there is an example in the bison manual. (The documentation of this feature isn't great but it might be adequate to your needs. If not, feel free to ask a more specific question.)

I don't have much experience with DParser, but the manual indicates that its behaviour when faced with multiple possible parses is similar: the default is to complain, but you can provide a trivial merge function of your own: (The quote comes from Section 12, Ambiguities)

Ambiguities are resolved automatically based on priorities and associativities. In addition, when the other resolution techniques fail, user defined ambiguity resolution is possible. The default ambiguity handler produces a fatal error on an unresolved ambiguity. This behavior can be replaced with a user defined resolvers the signature of which is provided in dparse.h.


Here's an example bison GLR grammar for the second example. I left out the lexer, which is really not relevant (and slightly embarrassing because I was rushing).

%{
  int yylex();
  void yyerror(const char* msg);
%}

%error-verbose
%glr-parser

%token WORD_A "a"
%token WORD_BOY "boy"
%token WORD_GIRL "girl"
%token WORD_IN "in"
%token WORD_LIKED "liked"
%token WORD_PARK "park"
%token WORD_SAW "saw"
%token WORD_TELESCOPE "telescope"
%token WORD_THE "the"
%token WORD_WITH "with"

%%
S  : NP VP      {puts("S: NP VP");}     %dprec 1
   | NPA VP     {puts("S: NPA VP");}    %dprec 2
   ;
NPA: D N        {puts("NPA: D N");}     %dprec 3
   | NP PP      {puts("NPA: NP PP");}   %dprec 4
   ;
NP : D N        {puts("NP: D N");}      %dprec 6
   | NP PP      {puts("NP: NP PP");}    %dprec 7
   | NPA        {puts("NP: NPA");}      %dprec 10
   ;
VP : V NP       {puts("VP: V NP ");}    %dprec 11
   | VP PP      {puts("VP: VP PP");}    %dprec 12
   ;
PP : P NP       {puts("PP: P NP");}     %dprec 14
   ;
D  : "the"      {puts("D: the");}       %dprec 15
   | "a"        {puts("D: a");}         %dprec 16
   ;
P  : "in"       {puts("P: in");}        %dprec 17
   | "with"     {puts("P: with");}      %dprec 18
   ;
V  : "liked"    {puts("V: liked");}     %dprec 19
   | "saw"      {puts("V: saw");}       %dprec 20
   ;
N  : "girl"     {puts("N: girl");}      %dprec 21
   | "boy"      {puts("N: boy");}       %dprec 22
   | "park"     {puts("N: park");}      %dprec 23
   | "saw"      {puts("N: saw");}       %dprec 24
   | "telescope"{puts("N: telescope");} %dprec 25
   ;
%%

int main(int argc, char** argv) {
  printf("yyparse returned %d\n", yyparse());
  return 0;
}

Compilation:

$ make ambig2
bison30 -v -d -o ambig2.c ambig2.y 
ambig2.y: warning: 6 shift/reduce conflicts [-Wconflicts-sr]
ambig2.y: warning: 10 reduce/reduce conflicts [-Wconflicts-rr]
gcc-4.8 -ggdb -Wall -D_POSIX_C_SOURCE=200809L -std=c99 -c -o ambig2.o ambig2.c
gcc-4.8   ambig2.o   -o ambig2
rm ambig2.o ambig2.c

Sample parses:

$ ./ambig2 <<<"a boy saw a girl"
D: a
N: boy
NPA: D N
V: saw
D: a
N: girl
NPA: D N
NP: NPA
VP: V NP 
S: NPA VP
yyparse returned 0

$ ./ambig2 <<<"a saw saw the saw in a saw"
D: a
N: saw
NPA: D N
V: saw
D: the
N: saw
NPA: D N
NP: NPA
VP: V NP 
P: in
D: a
N: saw
NPA: D N
NP: NPA
PP: P NP
VP: VP PP
S: NPA VP
yyparse returned 0
Gilliam answered 20/2, 2014 at 3:55 Comment(8)
Thanks for your reply. I will look more closely at telling Bison and DParser how to resolve ambiguities. Hopefully I can say something like: if(ambiguity) { do nothing; } In the meantime, I found this ancient, tiny (~700 lines of pure C) parser called Tom (cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/parsing/…) that so far is handling everything I throw at it perfectly. Maybe all the much fancier guys like Bison are much larger because they are concerned with efficient parsing, which I have no need for.Uroscopy
@user3268289: In both cases, the ambiguity routine is only called if there is an ambiguity, and it can do nothing. So yes, it's pretty simple. While bison is, of course, concerned about efficiency, the main reason for its behaviour is that it is concerned with parsing real languages for the purposes of compilation/interpretation, and those generally require an unambiguous parse. The %dprec solution is a lot of typing but otherwise trivial; just add %dprec N to the end of every production, where N is a unique integer (you could number them sequentially, for example.)Gilliam
Rici, your ideas were good and improved Bison's handling of some nasty grammars. It still doesn't work for some cases though, as in the second example in the original question above.Uroscopy
Rici, sorry I guess you saw a version of the second example while I was editing it. Can you also get the newest version of the second example (with "girl" "boy" "telescope" etc.) to work? Really appreciate your help by the way.Uroscopy
@user3268289: Your second example includes "NP: NP" which (oddly) is not removed by bison's glr parser, and which therefore creates an endless loop. Aside from that, the same transformation seems to work. What input are you having trouble with? By the way, I'd list saw as both an N and a V rather than put it into a separate grammatical category, but maybe I'm missing your point. (Oh, and you have no Vs)Gilliam
Sorry, that's what I get for editing and trying things out at the same time. The second example now should be correct. Bison can't handle it, but Tom can. Don't know why.Uroscopy
@user3268289: Okey-dokey, there you go.Gilliam
Thanks very much Rici, I appreciate your help. It looks like I was having some problems with my tokenizer/lexer, not the grammar as I had thought. It works great in Bison now, although I am rapidly learning that Bison and related tools are not appropriate for NLP, which is what I am really trying to do.Uroscopy
R
2

Your grammar doesn't cause GLR parsers to choke.

You need a GLR parsing engine that delivers what GLR parsers are supposed to deliver: parsing in the face of ambiguities, and handing you the result. Presumably you use additional context to resolve the ambiguities. (You can tangle context-checking into the parsing process if you really insist on avoiding producing context-prevented ambiguities. If you do that, you get the kind of complications that the GCC guys had when they tried to parse C and C++ with LALR).

Here's the output for OP's problem, given to our DMS Software Reengineering Toolkit's GLR parser generator. I had to define a lexer and a grammar compatible for DMS:

Lexer (defining individual tokens as words; a more scalable version might have defined word class tokens such as D P V N):

%%
%%main
#skip "\s+"
#skip "[\u000d\u000a]+"
#token 'the' "the"
#token 'a' "a"
#token 'in' "in"
#token 'with' "with"
#token 'saw' "saw"
#token 'girl' "girl"
#token 'boy' "boy"
#token 'park' "park"
#token 'telescope' "telescope"
%%

Grammar (DMS doesn't bother with EBNF):

S = NP VP ;
S = NPA VP ;
NPA = D N ;
NPA = NP PP ;
NP = D N ;
NP = NP PP ;
NP = NPA ;
VP = V NP ;
VP = VP PP ;
PP = P NP ;
D = 'the' ;
D = 'a';
P = 'in' ;
P = 'with' ;
V = 'saw' ;
N = 'saw' ;
N = 'girl' ;
N = 'boy' ;
N = 'park' ;
N = 'telescope' ;

Sample file "aboysawagirl.txt"

a boy saw a girl\n

From start to finish, building lexer and parser (about 10 minutes of fumbling...)

Parsing the sample file and dumping the automatically built tree:

C:\DMS\Domains\simpenglish\Tools\Parser\Source>run ..\domainparser ++AST ..\..\Lexer\aboysawagirl.txt
simpenglish Domain Parser Version 2.5.15
Copyright (C) 1996-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit
24 tree nodes in tree.
3 ambiguity nodes in tree.
(AMBIGUITY<S=11>@simpenglish=31@#1f35140^0{2} Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
 (S@simpenglish=1@#1f350e0^1#1f35140:1 Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
  (AMBIGUITY<NP=12>@simpenglish=31@#1f34ba0^1#1f350e0:1{2} Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   (NP@simpenglish=5@#1f34b80^1#1f34ba0:1 Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |(D@simpenglish=12@#1f34aa0^2#1f34b80:1#1f34b40:1 Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   | ('a'@simpenglish=22@#1f349c0^1#1f34aa0:1[Keyword:0] Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt)'a'
   |)D#1f34aa0
   |(N@simpenglish=18@#1f34b20^2#1f34b80:2#1f34b40:2 Line 1 Column 3 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   | ('boy'@simpenglish=27@#1f34a80^1#1f34b20:1[Keyword:0] Line 1 Column 3 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt)'boy'
   |)N#1f34b20
   )NP#1f34b80
   (NP@simpenglish=7@#1f34c60^1#1f34ba0:2 Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |(NPA@simpenglish=3@#1f34b40^2#1f35040:1#1f34c60:1 Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   | (D@simpenglish=12@#1f34aa0^2... [ALREADY PRINTED] ...)
   | (N@simpenglish=18@#1f34b20^2... [ALREADY PRINTED] ...)
   |)NPA#1f34b40
   )NP#1f34c60
  )AMBIGUITY#1f34ba0
  (VP@simpenglish=8@#1f34fc0^1#1f350e0:2 Line 1 Column 7 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   (V@simpenglish=15@#1f34d60^1#1f34fc0:1 Line 1 Column 7 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |('saw'@simpenglish=25@#1f34b00^2#1f34d60:1#1f34d40:1[Keyword:0] Line 1 Column 7 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt)'saw'
   )V#1f34d60
   (AMBIGUITY<NP=12>@simpenglish=31@#1f34f00^2#1f34f80:2#1f34fc0:2{2} Line 1 Column 11 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |(NP@simpenglish=5@#1f34e60^1#1f34f00:1 Line 1 Column 11 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   | (D@simpenglish=12@#1f34da0^2#1f34e60:1#1f34de0:1 Line 1 Column 11 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |  ('a'@simpenglish=22@#1f34ce0^1#1f34da0:1[Keyword:0] Line 1 Column 11 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt)'a'
   | )D#1f34da0
   | (N@simpenglish=17@#1f34dc0^2#1f34e60:2#1f34de0:2 Line 1 Column 13 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |  ('girl'@simpenglish=26@#1f34d80^1#1f34dc0:1[Keyword:0] Line 1 Column 13 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt)'girl'
   | )N#1f34dc0
   |)NP#1f34e60
   |(NP@simpenglish=7@#1f34f20^1#1f34f00:2 Line 1 Column 11 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   | (NPA@simpenglish=3@#1f34de0^1#1f34f20:1 Line 1 Column 11 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |  (D@simpenglish=12@#1f34da0^2... [ALREADY PRINTED] ...)
   |  (N@simpenglish=17@#1f34dc0^2... [ALREADY PRINTED] ...)
   | )NPA#1f34de0
   |)NP#1f34f20
   )AMBIGUITY#1f34f00
  )VP#1f34fc0
 )S#1f350e0
 (S@simpenglish=2@#1f35040^1#1f35140:2 Line 1 Column 1 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
  (NPA@simpenglish=3@#1f34b40^2... [ALREADY PRINTED] ...)
  (VP@simpenglish=8@#1f34f80^1#1f35040:2 Line 1 Column 7 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   (V@simpenglish=15@#1f34d40^1#1f34f80:1 Line 1 Column 7 File C:/DMS/Domains/simpenglish/Tools/Lexer/aboysawagirl.txt
   |('saw'@simpenglish=25@#1f34b00^2... [ALREADY PRINTED] ...)
   )V#1f34d40
   (AMBIGUITY<NP=12>@simpenglish=31@#1f34f00^2... [ALREADY PRINTED] ...)
  )VP#1f34f80
 )S#1f35040
)AMBIGUITY#1f35140

Your simple english grammar parser parses your example sentence in different ways.

This is a lot more spectacular with a full C++11 grammar.

Reddick answered 20/2, 2014 at 23:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.