The first thing to consider is: at what point do we have enough information to do the semantic check?
For a statically typed language like C, we can do this semantic check right at parse time, in a syntax-directed action such as those triggered by rule reductions in Yacc.
Your parser needs to maintain symbol tables. That is, whenever you open a new scope, such as a function body or a statement block, you create a new symbol table object for that scope and keep a pointer to it in some global parser variable as the "current scope". Each scope also has a pointer to its parent (enclosing) scope. When the scope closes, you restore the parent as the "current scope". This opening and closing of scopes is tied to the parser rules that handle block constructs such as function bodies, statement bodies, or structure bodies.
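The scope chain described above can be sketched as follows. This is a minimal sketch under stated assumptions: `struct scope`, its `depth` field, `scope_open` and `scope_close` are hypothetical names, and the per-scope symbol storage is omitted so that only the parent link and the "current scope" global are shown.

```c
#include <stdlib.h>

/* Hypothetical scope record: only the link to the enclosing scope is
   shown; the symbol storage itself is omitted in this sketch. */
struct scope {
    struct scope *parent;   /* enclosing scope, NULL for file scope */
    int depth;              /* 0 for file scope, for illustration only */
};

/* The parser's global "current scope" pointer. */
static struct scope *current_scope = NULL;

/* Called from the parser action that begins a block: push a new scope
   whose parent is the scope that was current until now. */
void scope_open(void)
{
    struct scope *s = malloc(sizeof *s);
    if (s == NULL)
        abort();
    s->parent = current_scope;
    s->depth = current_scope ? current_scope->depth + 1 : 0;
    current_scope = s;
}

/* Called from the action that ends a block: restore the enclosing scope.
   Assumes a matching scope_open() call. Freeing here is a simplification;
   a compiler whose syntax tree points into the scope would keep it. */
void scope_close(void)
{
    struct scope *s = current_scope;
    current_scope = s->parent;
    free(s);
}
```

In a Yacc grammar, `scope_open` would typically be called from a mid-rule action right after the opening `'{'` of a block, and `scope_close` from the action at the closing `'}'`.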
The scope contains associations between names and semantic information: what kind of symbol each name is, and other attributes such as its type.
When your parser processes a declaration of some kind, it enters the declared name into the current symbol table, and from then on the name is known.
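Entering a declared name into a scope might look like the sketch below. It assumes a hypothetical per-scope linked list of symbols; `symbol_define`, `scope_find_local`, `enum symbol_kind` and the string `type` field are illustrative stand-ins, not part of any real library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol kinds; a real C front end needs more of them. */
enum symbol_kind { SYM_VARIABLE, SYM_FUNCTION, SYM_TYPEDEF };

struct symbol {
    const char *name;
    enum symbol_kind kind;
    const char *type;        /* stand-in for a real type representation */
    struct symbol *next;     /* next symbol in the same scope */
};

struct scope {
    struct scope *parent;    /* enclosing scope, NULL for file scope */
    struct symbol *symbols;  /* simple linked list; a hash table scales better */
};

/* Search a single scope only, without consulting parent scopes. */
struct symbol *scope_find_local(const struct scope *s, const char *name)
{
    for (struct symbol *sym = s->symbols; sym != NULL; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;
    return NULL;
}

/* Enter a declared name into the given scope. Returns NULL if the name is
   already declared in this same scope, so the caller can diagnose it.
   (C permits some compatible redeclarations, such as repeated function
   prototypes; this sketch simply rejects every duplicate.) */
struct symbol *symbol_define(struct scope *s, const char *name,
                             enum symbol_kind kind, const char *type)
{
    if (scope_find_local(s, name) != NULL)
        return NULL;
    struct symbol *sym = malloc(sizeof *sym);
    if (sym == NULL)
        abort();
    sym->name = name;        /* assumes the lexer's string outlives the table */
    sym->kind = kind;
    sym->type = type;
    sym->next = s->symbols;  /* push onto the front of the scope's list */
    s->symbols = sym;
    return sym;
}
```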
So, fast forward to our problem: how to check whether a name is undefined. This is not difficult. Somewhere, your parser has rules like
primary_expression : '(' expression ')'
/* ...*/
| CONSTANT
| IDENT
;
A primary expression can be an identifier naming a variable, an enumeration constant, or a function. If the language requires identifiers to be declared before they are used, we can put the check right here.
In the action for IDENT, we look up the identifier in the current symbol table. If the search comes up with nothing, we report an error that the identifier is undefined.
Pseudo-code:
primary_expression : '(' expression ')'
/* ...*/
| CONSTANT
| IDENT {
struct symbol *sym = symbol_lookup(current_scope, $1);
if (sym == NULL) {
static_error("undeclared identifier %s", $1);
$$ = error_node();
} else {
/* ... */
}
}
The symbol_lookup function does not look only in the current scope! If the identifier is not found there, it recurses into the parent scope, and so on. The top-level scope in the chain is the file scope. If the identifier is found there, it is a global identifier of some kind; if it is not found there either, it is undefined.
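The recursive search through enclosing scopes can be sketched as below. This is a minimal sketch that assumes each scope holds a linked list of symbols and a parent pointer; the struct layouts are hypothetical, and every symbol attribute other than the name is omitted.

```c
#include <string.h>

/* Hypothetical, minimal layouts: only what the lookup needs. */
struct symbol {
    const char *name;
    struct symbol *next;     /* next symbol in the same scope */
};

struct scope {
    struct scope *parent;    /* enclosing scope, NULL above file scope */
    struct symbol *symbols;
};

/* Search the given scope, then its parent, and so on up to file scope.
   The innermost declaration wins, which is what implements shadowing.
   Returns NULL if the name is declared nowhere in the chain. */
struct symbol *symbol_lookup(const struct scope *s, const char *name)
{
    if (s == NULL)
        return NULL;         /* searched past file scope: undeclared */
    for (struct symbol *sym = s->symbols; sym != NULL; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;
    return symbol_lookup(s->parent, name);
}
```

The recursion is a tail call, so it could equally be written as a loop over the `parent` links; the behavior is the same.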
I also made up static_error; it takes printf-like arguments, adds file/line number information, and increments the error count (so that when the parser is done, it can indicate failure if the error count is nonzero). I made up error_node as well; it is a function or macro that produces some kind of node indicating an error (perhaps just a null pointer). Your parser rule has to produce something and store it into $$, so for an identifier that does not exist, we put a marker into the tree instead.
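The two made-up helpers, static_error and error_node, could be implemented roughly as follows. This is a sketch under assumptions: the globals `current_file`, `current_line` and `error_count` are hypothetical (a real parser would get file and line from its lexer), and `struct node` stands in for whatever syntax tree type the parser uses.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical position and error-count globals; a real lexer would
   update the line counter as it consumes newlines. */
static const char *current_file = "input.c";
static int current_line = 1;
static int error_count = 0;

/* printf-like diagnostic: prefixes file:line, prints the message to
   stderr, and counts the error so the driver can report failure. */
void static_error(const char *fmt, ...)
{
    va_list ap;
    fprintf(stderr, "%s:%d: error: ", current_file, current_line);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    error_count++;
}

struct node;   /* the syntax tree node type is left opaque here */

/* Simplest possible error marker: a null node. A dedicated "error"
   node kind would let later passes skip it without null checks. */
struct node *error_node(void)
{
    return NULL;
}
```

After `yyparse()` returns, the driver can treat a nonzero `error_count` as failure even if the parse itself succeeded. With GCC or Clang, declaring `static_error` with `__attribute__((format(printf, 1, 2)))` lets the compiler check the format arguments at each call site.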
If you're writing a compiler in C using Yacc, you have a lot of work to do: you must invent all these data structures, such as symbol tables, and write the supporting libraries yourself.