Does GNU/Flex C++ work at all?

Quoting the book flex & bison (O'Reilly, John Levine, 2009):

"Bison can create parsers in C++. Although flex appears to be able to create C++ scanners, the C++ code doesn't work.[21] Fortunately, C scanners created by flex compile under C++ and it is not hard to use a flex C scanner with a bison C++ parser". (Footnote [21]: "This is confirmed by the guy who wrote it. It will probably be fixed eventually, but it turned out to be surprisingly hard to design a good C++ interface for flex scanners.")

Before I commit to the effort of writing a rather complex Flex scanner, I (and I think many of us) would like to know whether anything about this has changed since 2009. Is anyone out there successfully writing Flex/C++ parsers? If so, is it worth the effort, or is a C scanner with a C++ parser still the safest course?

Salamanca answered 16/4, 2016 at 16:5 Comment(2)
Fully explained here: bitbucket.org/emmetac/miscellany/src/master/flex/… (Greatnephew)
I was using it for exactly that purpose 25 years ago. Of course it works. (Footlambert)

It's totally possible, and it works great once it is set up. Unfortunately, documentation on a pure C++ Flex/Bison lexer and parser is not easy to find or follow.

I can show you the skeleton of a parser I wrote, but it's just an example of how you could do it.

Mind that some of this code was arrived at by trial and error, since documentation is scarce, so there could be superfluous operations or things that are not exactly correct, but it works.

ypp file

%skeleton "lalr1.cc"
%require "3.0.2"

%defines
%define api.namespace {script}
%define parser_class_name {Parser}

%define api.token.constructor
%define api.value.type variant
%define parse.assert true

%code requires {

  namespace script
  {
    class Compiler;
    class Lexer;
  }
}

%lex-param { script::Lexer &lexer }
%lex-param { script::Compiler &compiler }
%parse-param { script::Lexer &lexer }
%parse-param { script::Compiler &compiler }

%locations
%initial-action
{
  @$.begin.filename = @$.end.filename = &compiler.file;
};

%define parse.trace
%define parse.error verbose

%code top {
  #include "Compiler.h"
  #include "MyLexer.h"
  #include "MyParser.hpp"

  static script::Parser::symbol_type yylex(script::Lexer &scanner, script::Compiler &compiler) {
    return scanner.get_next_token();
  }

  using namespace script;
}

// tokens and grammar

void script::Parser::error(const location_type& l, const std::string& m)
{
  compiler.error(l,m);
}

With this setup you can use C++ types everywhere in the grammar, for example:

%type<std::list<Statement*>> statement_list for_statement
...
statement_list:
  { $$ = std::list<Statement*>(); }
  | statement_list statement { $1.push_back($2); $$ = $1; }
;
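For readers new to Bison's variant-based values, the second alternative above is ordinary C++ on a std::list. Here is a minimal sketch of the same append step outside any grammar, assuming a hypothetical Statement struct and a helper named appendStatement (neither appears in the original). It hands the list on by move rather than by copy, which is roughly what writing `$$ = std::move($1);` in the action should achieve:

```cpp
#include <cassert>
#include <list>
#include <utility>

// Hypothetical stand-in for the AST node type used in the grammar.
struct Statement {
  int id;
};

// Mirrors the action `{ $1.push_back($2); $$ = $1; }`:
// append the new statement, then hand the list on as the rule's value.
// Taking the list by value and returning it lets the caller move the
// list in and out instead of copying every element.
std::list<Statement*> appendStatement(std::list<Statement*> list, Statement* stmt) {
  assert(stmt != nullptr);
  list.push_back(stmt);
  return list;  // implicitly moved, since it is a by-value parameter
}
```

A caller would write `stmts = appendStatement(std::move(stmts), stmt);` so that the list nodes are reused rather than duplicated on each reduction.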

l file

%{
  #include "MyParser.hpp"
  #include "MyLexer.h"
  #include "Compiler.h"
  #include <string>

  typedef script::Parser::token token;

  // yyterminate() must return the end-of-input token from the scanning function
  #define yyterminate() return script::Parser::make_END(loc)

  static script::location loc;

  using namespace script;
%}

%x sstring
%x scomment

%option nodefault
%option noyywrap
%option c++
%option yyclass="Lexer"
%option prefix="My"


%{
  # define YY_USER_ACTION  loc.columns((int)yyleng);
%}


%%

%{
  loc.step();
%}
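The two location calls above follow the usual Bison pattern: YY_USER_ACTION runs before every rule action and extends the end of the current location by the length of the match, while loc.step() at the top of the scanning function moves the beginning up to the previous end. Here is a minimal sketch of that bookkeeping, using a hypothetical MiniLocation struct as a simplified stand-in for the generated script::location (the real class also tracks lines and a filename):

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for Bison's generated location class:
// columns only, 1-based, with `end` pointing one past the last character.
struct MiniLocation {
  unsigned begin = 1;
  unsigned end = 1;

  // Like location::step(): start the next token where the last one ended.
  void step() { begin = end; }

  // Like location::columns(n): extend the current token by n columns.
  void columns(unsigned n) { end += n; }
};

// Simulates scanning one token of the given length:
// step() on entry to the scanner, then columns() as YY_USER_ACTION would do.
MiniLocation scanToken(MiniLocation loc, std::size_t length) {
  loc.step();
  loc.columns(static_cast<unsigned>(length));
  return loc;
}
```

Scanning a 3-character token and then a 2-character token from a fresh location gives the ranges 1 to 4 and 4 to 6, which is the kind of span an error message would report.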

Then you'll need a header file that defines your Lexer class. It inherits from yyFlexLexer, which is how C++ Flex works. Something like:

#if ! defined(yyFlexLexerOnce)
#undef yyFlexLexer
// the name must match %option prefix="My" in the .l file
#define yyFlexLexer MyFlexLexer
#include <FlexLexer.h>
#endif

#undef YY_DECL
#define YY_DECL script::Parser::symbol_type script::Lexer::get_next_token()

#include "MyParser.hpp"

namespace script
{
  class Compiler;

  class Lexer : public yyFlexLexer
  {
  public:

    Lexer(Compiler &compiler, std::istream *in) : yyFlexLexer(in), compiler(compiler) {}

    virtual script::Parser::symbol_type get_next_token();
    virtual ~Lexer() { }

  private:

    Compiler &compiler;
  };

}

The last step is defining your Compiler class, which gets called from the Bison grammar rules (that's what the %parse-param directives in the ypp file are for). Something like:

#include "parser/MyParser.hpp"
#include "parser/MyLexer.h"
#include "parser/location.hh"

#include "Symbols.h"

namespace script
{
  class Compiler
  {

  public:
    Compiler();

    std::string file;

    void error(const location& l, const std::string& m);
    void error(const std::string& m);

    vm::Script* compile(const std::string& text);

    bool parseString(const std::string& text);

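    void setRoot(ASTRoot* root);
    Node* getRoot() { return root.get(); }

  private:
    // assumed: an owning smart pointer, as implied by root.get(); needs <memory>
    std::unique_ptr<ASTRoot> root;
  };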
}

Now you can run the parser easily and entirely from C++ code, e.g.:

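bool Compiler::parseString(const std::string &text)
{
  constexpr bool shouldGenerateTrace = false;

  // the lexer reads from any std::istream, here an in-memory string (needs <sstream>)
  std::istringstream ss(text);

  // construct directly: the flex base class is not meant to be copied
  script::Lexer lexer(*this, &ss);
  script::Parser parser(lexer, *this);
  parser.set_debug_level(shouldGenerateTrace);
  return parser.parse() == 0;
}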

The only thing you must take care of is making flex produce a C++ lexer: either keep %option c++ in the .l file, as above, or invoke flex on the .l file with the -+ (--c++) argument.

With some care I have also been able to run multiple independent, reentrant lexers/parsers in the same project.
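The reason several scanners can coexist is that a C++ flex scanner keeps its state in the members of a class instance rather than in globals. Here is a minimal sketch of that idea, using a hypothetical hand-written WordScanner in place of a generated lexer (it only splits on whitespace). Like script::Lexer it reads from a std::istream*, and two instances can be advanced in any order without disturbing each other:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a generated lexer class: all scanning state
// (the input stream and the token count) lives in the instance.
class WordScanner {
public:
  explicit WordScanner(std::istream* in) : in_(in) { assert(in_ != nullptr); }

  // Returns the next whitespace-separated word, or "" at end of input.
  std::string next() {
    std::string word;
    if (*in_ >> word) {
      ++count_;
      return word;
    }
    return "";
  }

  // Number of words this particular instance has returned so far.
  int count() const { return count_; }

private:
  std::istream* in_;
  int count_ = 0;
};
```

With real generated scanners, each one additionally needs its own prefix so the class names do not clash. This sketch only illustrates the per-instance state.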

Marchpane answered 16/4, 2016 at 16:22 Comment(1)
Thanks to all. I especially like the word reentrant in one of the answers. I'll give the C++ lexer a go. (Salamanca)
