How does flex support bison-location exactly?

8

23

I'm trying to use flex and bison to build a filter that extracts certain grammar elements from a complex language. My plan is to use flex + bison to recognise the grammar and dump out the locations of the elements of interest. (A script would then grab the text at the dumped locations.)

I found that flex supports a bison feature called bison-locations, but how exactly does it work? I tried the example in the flex documentation, but yylloc does not seem to be set automatically by flex: I always get (1,0)-(1,0). Can flex calculate each token's location automatically? If not, what interface function am I expected to implement? Is there an example?

Is there a better tool for this?

Best Regards, Kevin

Edit:

The interface for yylex has now become:

int yylex(YYSTYPE * yylval_param,YYLTYPE * yylloc_param );

The bison manual does not specify how the lexer should set yylloc_param correctly. For me it is hard to manually track the column number of each token.

Gorlicki answered 18/3, 2009 at 1:49 Comment(1)
In more modern versions of Flex, look up %option bison-locations and %option bison-bridge (see Flex manual — Code-level and API options).Emeric
8

Take a look at section 3.6 of the Bison manual - that seems to cover locations in some detail. Combined with what you found in the Flex manual, that may be sufficient.

Emeric answered 18/3, 2009 at 2:50 Comment(1)
I figured out that only the line number is important to me.Gorlicki
23

The yylex declaration probably changed because you used a reentrant scanner or a pure parser. Many documents around the web suggest that this is required for bison locations to work, but it is not.

I needed line numbers too and found the Bison documentation confusing in that regard. The simple solution (using the global variable yylloc) is to add the %locations directive to your Bison file:

%{
...
%}
%locations
...
%%
...

in your lexer:

%{
...
#include "yourparser.tab.h"  /* This is where it gets the definition for yylloc from */
#define YY_USER_ACTION yylloc.first_line = yylloc.last_line = yylineno;
%}
%option yylineno
...
%%
...

The YY_USER_ACTION macro is "called" before each of your token actions and updates yylloc. Now you can use the @N/@$ rules like this:

statement : error ';'   { fprintf(stderr, "Line %d: Bad statement.\n", @1.first_line); }

Or use the yylloc global variable:

void yyerror(char *s)
{
  fprintf(stderr, "ERROR line %d: %s\n", yylloc.first_line, s);
}
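
For context on the @$ notation mentioned above: for a rule with at least one right-hand-side symbol, Bison's default location action makes @$ start where @1 starts and end where the last symbol ends. The following is a minimal standalone sketch of that behaviour, not Bison's actual code; the Span type and span_merge function are hypothetical names used only for illustration.

```c
/* Hypothetical stand-in for the location struct Bison generates by default. */
typedef struct {
    int first_line, first_column;
    int last_line, last_column;
} Span;

/* Combine the locations of the first and last right-hand-side symbols
   into the location of the whole rule: start of the first, end of the last. */
static Span span_merge(Span first_rhs, Span last_rhs)
{
    Span out;
    out.first_line   = first_rhs.first_line;
    out.first_column = first_rhs.first_column;
    out.last_line    = last_rhs.last_line;
    out.last_column  = last_rhs.last_column;
    return out;
}
```

With only line numbers filled in by the lexer, as in this answer, the column fields simply carry whatever values they were initialized with.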
Heyes answered 27/4, 2011 at 22:53 Comment(3)
I don't think this is enough. I've tried this and always get a `yylloc' undeclared error when I try to build. There must be something else that you have to do to enable yylloc.Vile
Did you add the %locations directive? Did you include your generated .tab.h file in the lexer? Maybe you're using very old versions of bison+flex? It works for me with Bison 2.4.1 and Flex 2.5.35.Heyes
In a reentrant scanner, I had to use yyget_lineno(scanner) instead of just yylineno.Whitsunday
22

Neither bison nor flex updates yylloc automatically, but it's actually not difficult to do it yourself—if you know the trick.

The trick to implementing yylloc support is that, even though yyparse() declares yylloc, it never changes it. That means that if you modify yylloc in one call to the lexer, you'll find the same values in it at the next call. Thus, yylloc will contain the position of the last token. Since the last token's end is the same as the current token's start, you can use the old yylloc value to help you determine the new value.

In other words, yylex() should not calculate yylloc; it should update yylloc.

To update yylloc, we must first copy the last_ values to first_, and then update the last_ values to reflect the length of the just-matched token. (This is not the strlen() of the token; it's the lines-and-columns length.) We can do this in the YY_USER_ACTION macro, which is called just before any lexer action is performed; that ensures that if a rule matches but it doesn't return a value (for instance, a rule skipping whitespace or comments), the location of that non-token is skipped, rather than being included at the beginning of the actual token, or lost in a way that makes the location tracking inaccurate.

Here's a version meant for a reentrant parser; you could modify it for a non-reentrant parser by swapping the -> operators for .:

#define YY_USER_ACTION \
    yylloc->first_line = yylloc->last_line; \
    yylloc->first_column = yylloc->last_column; \
    for(int i = 0; yytext[i] != '\0'; i++) { \
        if(yytext[i] == '\n') { \
            yylloc->last_line++; \
            yylloc->last_column = 0; \
        } \
        else { \
            yylloc->last_column++; \
        } \
    }

If you'd prefer, you could instead put that code in a function and make the macro call the function, but the two techniques are equivalent.
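
As a rough sketch of the function form, here is the same update written as plain C so it can be tried outside a scanner. The Loc type is a hypothetical stand-in for the generated YYLTYPE, and update_loc is an illustrative name, not something flex or bison provides.

```c
/* Hypothetical stand-in for the YYLTYPE struct that Bison generates. */
typedef struct {
    int first_line, first_column;
    int last_line, last_column;
} Loc;

/* Move the start of the span to the previous end, then walk the matched
   text to find the new end. Mirrors the macro body above. */
static void update_loc(Loc *loc, const char *text)
{
    loc->first_line = loc->last_line;
    loc->first_column = loc->last_column;
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\n') {
            loc->last_line++;       /* a newline starts a new line... */
            loc->last_column = 0;   /* ...and resets the column */
        } else {
            loc->last_column++;
        }
    }
}
```

In a reentrant scanner, where yylloc is a pointer, the macro would then reduce to something like #define YY_USER_ACTION update_loc(yylloc, yytext); assuming the function takes the real YYLTYPE.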

Welles answered 2/3, 2014 at 8:36 Comment(3)
Much more helpful than the other answers, though I'd suggest making a function and just calling that in the macro - static void update_loc(YYLTYPE *loc, char *txt) { loc->first_line = ...} / #define YY_USER_ACTION update_loc(yylloc, yytext);.Fieldsman
Never mind about the location thing, turns out I had a rule that matched the whole line and REJECTed it.Fieldsman
This is by far the most helpful answer, and IMO should be the accepted oneGiltzow
16

I like Shlomi's answer.

In addition, I wanted to update the column location as well. I found http://oreilly.com/linux/excerpts/9780596155971/error-reporting-recovery.html, which made more sense after reading Shlomi's answer.

Unfortunately there is a typo on that page for yylloc. I've simplified it below a bit.

In your parser add:

%locations

in your lexer:

%{

#include "parser.tab.h"

int yycolumn = 1;

#define YY_USER_ACTION yylloc.first_line = yylloc.last_line = yylineno; \
    yylloc.first_column = yycolumn; yylloc.last_column = yycolumn + yyleng - 1; \
    yycolumn += yyleng; \
    yylval.str = strdup(yytext);

%}

%option yylineno

The column value may not strictly track columns; it seems to just keep increasing. That's down to my ignorance, and I apologize if it confuses anyone. I'm currently using the column as a file character count, which in my case is more useful than a column position.

Hope that helps.

Casserole answered 6/11, 2011 at 2:30 Comment(7)
The reason the column number just keeps increasing is because you never set it back to 1 on a newline and neither will Flex, since it doesn't even know about your yycolumn variable. Apparently what you need to do is keep track of the newlines yourself instead of relying on %option yylineno.Camarilla
Doesn't yylval.str = strdup(yytext) set the contents of Bison token? You would only want this to be the default action in a rule if every token was a string, no?Gustative
Is the yylval.str = strdup(yytext); part of the macro necessary?Heliochrome
I think yylval.str = strdup(yytext); is not necessaryJaimie
I am using the YY_USER_ACTION from this answer in my reentrant scanner, but without the definition "int yycolumn = 1;" as for reentrant scanners, it is already defined as a macro for yy_bs_column in the buffer struct. I am puzzled that the first token on line 1 gets column 0, whereas the first on subseq. lines get column 1. The yy.lex.c file contains code (not by me, so from flex "skeleton") to initialize it to 0 and reset to 0 after incrementing yylineno. I wonder if this is because the LF gets counted somehow as well, or if the column pos should be 1-based, and initialized in yy.lex.c to 1?Integrand
Silly me; given that the code above defines it as "int yycolumn = 1;" the outcome would be a 1-based column position; however the "skeleton" code sets it to 0 as mentioned, so it would seem the intention was to have it 0-based. Separating the NL out of my WS rule and resetting yycolumn to 0 there (after the YY_USER_ACTION has been performed) ensures a consistent 0-based column position. This seemed easier than editing yy.lex.c to have it initialized to 1. I leave my previous comment and add this in the hope it may help somebody stumbling into the same problem.Integrand
As I cannot edit my comments after 5 minutes, this is just to note that I meant lex.yy.c where I wrote yy.lex.c...Integrand
8

Shlomi's answer is the simplest solution if you only care about keeping the line number. However, if you also want column numbers then you need to keep track of them.

One way to do that is to add yycolumn = 1 rules everywhere a newline shows up (as suggested in David Elson's answer), but if you don't want to keep track of all the places a newline could show up (whitespace, comments, etc.), an alternative is inspecting the yytext buffer at the start of every action:

static void update_loc(){
  static int curr_line = 1;
  static int curr_col  = 1;

  yylloc.first_line   = curr_line;
  yylloc.first_column = curr_col;

  {char * s; for(s = yytext; *s != '\0'; s++){
    if(*s == '\n'){
      curr_line++;
      curr_col = 1;
    }else{
      curr_col++;
    }
  }}

  yylloc.last_line   = curr_line;
  yylloc.last_column = curr_col-1;
}

#define YY_USER_ACTION update_loc();

Finally, one thing to note is that once you start keeping track of column numbers by hand you might as well also keep track of the line numbers in the same place and not bother with using Flex's yylineno option.

Camarilla answered 3/10, 2013 at 0:58 Comment(0)
5

So, I got this to "work", but with a couple of extra steps (I may have overlooked them here ... apologies in that case):

  1. In parser.y, I had to say:

    #define YYLEX_PARAM &yylval, &yylloc
    

    even with %locations and bison --locations, to get it to pass the data.

  2. In lexer.l I had to use -> instead of . for yylloc

  3. Also in lexer.l, I reset the column in the action:

    [\n] { yycolumn = 1; }
    

Obviously a bit more complex, for \r etc, but at least I got it to work.
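
One possible way to handle the \r cases is to treat "\r\n", a lone "\r", and a lone "\n" each as a single line break when walking the matched text. The sketch below is only an illustration under that assumption; Cursor and cursor_advance are hypothetical names, and it assumes a "\r\n" pair is never split across two separate matches.

```c
/* Hypothetical 1-based line/column cursor. */
typedef struct {
    int line;
    int column;
} Cursor;

/* Advance the cursor over matched text, counting "\r\n", "\r" and "\n"
   each as exactly one line break. */
static void cursor_advance(Cursor *c, const char *text)
{
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\r' || *p == '\n') {
            if (*p == '\r' && p[1] == '\n')
                p++;            /* swallow the '\n' of a "\r\n" pair */
            c->line++;
            c->column = 1;      /* columns restart at 1 on a new line */
        } else {
            c->column++;
        }
    }
}
```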

Candi answered 1/6, 2012 at 22:3 Comment(1)
Adding --locations to the command line OR %locations after the grammar will mean that yylloc is in scope of the .y file, provided you refer to it from code in the final "%%" section.Mundell
2

An addition to Shlomi's answer:

If you're using %define api.pure in bison to create a reentrant parser, you also need to specify %option bison-locations in flex. This is because in a reentrant parser yylloc is not a global variable, and needs to be passed into the lexer.

So, in the parser:

%define api.pure
%locations

in the lexer:

#include "yourparser.tab.h"
#define YY_USER_ACTION yylloc->first_line = yylloc->last_line = yylineno;
%option bison-locations
%option yylineno
Saccharometer answered 9/11, 2015 at 16:27 Comment(0)
1

I think I managed to make it work (credit goes to the author of the ltcalc lexical analyzer in the Bison manual). By default, bison creates a yylloc that contains

{ first_line, first_column , last_line , last_column }

We only need to update those values in our lexical analyzer. For example:

[ \t]     { ++yylloc.last_column; }
[\n]      { ++yylloc.last_line; yylloc.last_column = 0; return EOL; }
[a-zA-Z]+ { 
            yylloc.last_column += strlen(yytext);
            return IDENTIFIER;
          }

Now in bison, to retrieve those fields:

statement : IDENTIFIER '=' expression 
            { printf("%d - %d\n", @1.last_line, @1.last_column); }

By default these fields are initialized to one; we should initialize the column fields to zero, otherwise they will report the wrong column.
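
The rules above only move the last_ fields. A hand-written lexer in the style of ltcalc also copies the end of the previous token into the first_ fields once whitespace has been skipped. The following standalone sketch shows that step over a string instead of a flex scanner; WordLoc and next_word are hypothetical names, and the 0-based starting column is an assumption that follows the remark above.

```c
#include <ctype.h>

/* Hypothetical stand-in for the default yylloc fields. */
typedef struct {
    int first_line, first_column;
    int last_line, last_column;
} WordLoc;

/* Scan the next alphabetic word from *p, updating *loc.
   Returns the word length, or 0 when no word is found. */
static int next_word(const char **p, WordLoc *loc)
{
    const char *s = *p;

    /* Whitespace only moves the end marker. */
    while (*s == ' ' || *s == '\t' || *s == '\n') {
        if (*s == '\n') {
            loc->last_line++;
            loc->last_column = 0;
        } else {
            loc->last_column++;
        }
        s++;
    }

    /* Step: the new token starts where the previous text ended. */
    loc->first_line = loc->last_line;
    loc->first_column = loc->last_column;

    int n = 0;
    while (isalpha((unsigned char)s[n]))
        n++;
    loc->last_column += n;   /* end marker now sits just past the word */
    *p = s + n;
    return n;
}
```

Starting from a location of {1, 0, 1, 0}, each call leaves first_column at the word's 0-based start and last_column one past its end.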

Anecdotic answered 13/8, 2015 at 14:57 Comment(0)
