Class hierarchy of tokens and checking their type in the parser

I'm attempting to write a reusable parsing library (for fun).

I wrote a Lexer class which generates a sequence of Tokens. Token is a base class for a hierarchy of subclasses, each representing different token type, with its own specific properties. For example, there is a subclass LiteralNumber (deriving from Literal and through it from Token), which has its own specific methods for dealing with numeric value of its lexeme. Methods for dealing with lexemes in general (retrieving their character string representation, position in the source etc.) are in the base class, Token, because they're general to all token types. Users of this class hierarchy can derive their own classes for specific token types not predicted by me.

Now I have a Parser class which reads the stream of tokens and tries to match them with its syntax definition. For example, it has a method matchExpression, which in turn calls matchTerm and this one calls matchFactor, which has to test if the current token is Literal or Name (both derived from Token base class).

The problem is:
I need to check now what is the type of the current token in the stream and whether it matches the syntax or not. If not, throw an EParseError exception. If yes, act accordingly to get its value in the expression, generate machine code, or do whatever the parser needs to do when the syntax matches.

But I've read much about that checking the type in runtime, and deciding from it, is a bad design™, and it should be refactored as polymorphic virtual methods. Of course, I agree with that.

So my first attempt was to put some type virtual method in the Token base class, which would be overrided by the derived classes and return some enum with type id.

But I already see a shortcomings of this approach: Users deriving from Token their own classes of tokens won't be able to add additional id's to the enum, which is in the library source! :-/ And the goal was to allow them for extending the hierarchy for new types of tokens when they'll need it.

I could also return some string from the type method, which would allow for easy defining new types.

But still, in both these cases, the information about base types is lost (only leaf type is returned from type method) and the Parser class wouldn't be able to detect the Literal derived type when someone would derive from it and override the type to return something other than "Literal".

And of course the Parser class, which also is meant for extending by users (that is, writing their own parsers, recognizing their own tokens and syntax) doesn't know what descendants of the Token class will be there in the future.

Many FAQs and books on design recommend in this scenario to take the behavior from the code which needs to decide by type, and put it into the virtual method overriden in derived classes. But I cannot imagine how could I put this behavior into the Token descendants, because it's not their busines, for example, to generate machine code, or evaluate expressions. Moreover, there are parts of the syntax which need to match more than one token, so there is no one particular token which I could put that behavior into. It's rather the responsibility of particular syntax rules, which could match more than one token as their terminal symbols.

Any ideas how to improve this design?

class Base { public: // Shared virtual functions // ... template <typename T> T *instance() {return dynamic_cast<T *>(this);} }; class Derived : public Base { // ... }; // Somewhere in your code Base *x = f(); if (x->instance<Derived>()) ;// Do something // or Derived *d = x->instance<Derived>();

Recommended topics

Hot tags