What do flex and bison need from each other?
Asked Answered
F

1

2

When using flex and bison together,

  • whyc does a flex file need to #include a C header created by bison?

  • compilation requires both the C source files created by bison and by flex. What do the C source files created by bison and by flex require from each other?

Frottage answered 7/2, 2019 at 22:45 Comment(1)
flex need the definition of yylval from y.tab.h. bison just needs extern int yylex();. I can't make any sense of your second point, but you have to compile all your source code to be able to link it.Foe
F
5

The most important thing in the bison-generated header are the enum values used to identify the token types (which are the values returned to the parser by lexical actions).

The header also declares the YYSTYPE semantic type, and the variable yylval (which has that type), used to communicate the semantic value of each token to the parser. (At least, for tokens which have semantic values.) Similarly, if the parser uses location information, the header defines the YYLTYPE location type, and the variable yylloc of that type.

Since header dependencies cannot be circular, the parser does not have any header dependency on the scanner. It is for this reason that your bison input file must include a declaration of yylex.

That's fine for the classical interface between parser and scanner, which use global variables to communicate. And it also more or less works with a re-entrant ("pure") parser, in which the semantic value (and location, if used) are communicated through arguments to yylex, although the fact that the declaration of yylex is not automatic is more annoying.

Where it starts to break down is when the scanner is also re-entrant. In that case, the parser must call the scanner with an opaque scanner context object of type yyscan_t. yyscan_t "belongs to" the scanner, so it can only be defined in a header for the scanner. But as noted above, that would lead to a circular dependency chain. This reveals the weakness in the traditional model: the parser is a client of the scanner, so the fact that the scanner depends on the parser to define essential datastructures is a dependency inversion.

This is a very real problem because the public interface for the re-entrant scanner includes functions whose prototypes require parser-specific datatypes (YYSTYPE and YYLTYPE), while the parser prototype will almost certainly need to accept a scanner context object as an argument, so it cannot be declared without the scanner-specific datatype yyscan_t.

The usual solution to this problem is to break encapsulation by noting that yyscan_t is simply a void*, so it can be declared as such in the parser, thereby avoiding the need for the parser to #include the scanner header (as long as it does not require access to any other public methods declared in that header).

A less ugly solution, in my opinion, is to avoid this particular configuration altogether. Bison lets you request a reentrant "push parser", and that can be used along with a reentrant scanner without any of the above complications.

In the push parser model, the scanner is the top level function called to parse the file, and the scanner calls the parser each time it identifies a token. The header dependencies are no longer circular because the parser has no need to know anything about the scanner's context object, or for that matter about the prototype of yylex(). The scanner is now a client of the parser, and thus has a natural header dependency on the parser so the fact that the parser defines the token enums and semantic and location datatypes is no longer exceptional.

As well as simplifying the header dependencies between the two components, the push parser often simplifies control flow inside the scanner itself. In many use cases, a single scanner pattern will result in the identification of multiple tokens. In the traditional model, the scanner must keep a queue of tokens, releasing them one at a time when called by the parser. But in the push model, a scanner action can simply call the parser several times, once for each identified token. This model was popularised by the Lemon parser generator (part of sqlite3) and subsequently implemented by other parser generators including bison.

Feder answered 8/2, 2019 at 0:1 Comment(6)
Thanks. If I use bison without flex, what do I need to do differently comparing to use bison with flex?Frottage
For example, #54601202Frottage
@tim: did you look at the examples in the bison manual? Those are complete working calculators complete with hand-written yylex.Feder
Is what bison file need from flex just yylex()? If I define yylex() in the third section of bison file, do I not need flex file and flex any more?Frottage
@tim: as simple as that. See the examples I cited.Feder
This is exactly the kind of info I needed to grasp how flex and bison work, I have a hard time finding meaningful info in a lot of manuals especially flex's. Thank you so much!Faena

© 2022 - 2024 — McMap. All rights reserved.