I'm attempting to write a reusable parsing library (for fun).
I wrote a Lexer
class which generates a sequence of Tokens
. Token
is a base class for a hierarchy of subclasses, each representing different token type, with its own specific properties. For example, there is a subclass LiteralNumber
(deriving from Literal
and through it from Token
), which has its own specific methods for dealing with numeric value of its lexeme. Methods for dealing with lexemes in general (retrieving their character string representation, position in the source etc.) are in the base class, Token
, because they're general to all token types. Users of this class hierarchy can derive their own classes for specific token types not predicted by me.
Now I have a Parser
class which reads the stream of tokens and tries to match them with its syntax definition. For example, it has a method matchExpression
, which in turn calls matchTerm
and this one calls matchFactor
, which has to test if the current token is Literal
or Name
(both derived from Token
base class).
The problem is:
I need to check now what is the type of the current token in the stream and whether it matches the syntax or not. If not, throw an EParseError
exception. If yes, act accordingly to get its value in the expression, generate machine code, or do whatever the parser needs to do when the syntax matches.
But I've read much about that checking the type in runtime, and deciding from it, is a bad design™, and it should be refactored as polymorphic virtual methods. Of course, I agree with that.
So my first attempt was to put some type
virtual method in the Token
base class, which would be overrided by the derived classes and return some enum
with type id.
But I already see a shortcomings of this approach: Users deriving from Token
their own classes of tokens won't be able to add additional id's to the enum
, which is in the library source! :-/ And the goal was to allow them for extending the hierarchy for new types of tokens when they'll need it.
I could also return some string
from the type
method, which would allow for easy defining new types.
But still, in both these cases, the information about base types is lost (only leaf type is returned from type
method) and the Parser
class wouldn't be able to detect the Literal
derived type when someone would derive from it and override the type
to return something other than "Literal"
.
And of course the Parser
class, which also is meant for extending by users (that is, writing their own parsers, recognizing their own tokens and syntax) doesn't know what descendants of the Token
class will be there in the future.
Many FAQs and books on design recommend in this scenario to take the behavior from the code which needs to decide by type, and put it into the virtual method overriden in derived classes. But I cannot imagine how could I put this behavior into the Token
descendants, because it's not their busines, for example, to generate machine code, or evaluate expressions. Moreover, there are parts of the syntax which need to match more than one token, so there is no one particular token which I could put that behavior into. It's rather the responsibility of particular syntax rules, which could match more than one token as their terminal symbols.
Any ideas how to improve this design?
boost::spirit
is polymorphic, and I think AXE is, too. – Hemline