In compiler construction, is a token the same thing as a symbol, or just another term for it? After some research, my understanding is that a token is a symbol with a reference to the symbol table, so a kind of attributed symbol, i.e. a symbol with some additional information. Thanks for any clarification :-)
A token is not necessarily a symbol in the symbol table. For example, if a token is a reserved word, then it is not entered in the symbol table. If a token is an identifier, then it will likely be entered in the symbol table.
Take for example the following declaration:
char s[100];
A lexical analyzer could output the following tokens:
<"char", IDENTIFIER>
depending on the implementation, "char" could instead be recognized as a reserved word (in C, char is a keyword) or be entered in the symbol table as a predefined type name,
<"s", IDENTIFIER>
"s" is entered in symbol table as a variable identifier,
<"[", OPEN_SQUARE_BRACKET>
not entered in symbol table,
<"100", INTEGER_LITERAL>
not entered in symbol table,
<"]", CLOSE_SQUARE_BRACKET>
not entered in symbol table,
<";", SEMI_COLON>
not entered in symbol table.
So you basically enter in the symbol table only those tokens that you need to reference later during the compilation process. E.g., later in the function body, when you find
strcpy(s, "Hello, world\n");
you recognize again the token <"s", IDENTIFIER> and look it up in the symbol table. The symbol table will say that "s" has been declared as a variable of type char [].
So, I would say a token is any chunk of input that is recognized by the lexical analyzer, and only certain tokens with a special meaning are entered as symbols in the symbol table.
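To make the distinction concrete, here is a minimal sketch in Python of a lexer for the declaration above, followed by a toy rule that enters only the declared identifier into a symbol table. The token kind names follow the ones used in this answer; KEYWORDS, tokenize, the KEYWORD kind, and the "identifier right after a type keyword is a declaration" rule are assumptions made for illustration, not how any particular compiler does it. The sketch treats "char" as a reserved word, which is one of the two options mentioned above.

```python
import re

# Hypothetical set of reserved words for this sketch.
KEYWORDS = {"char", "int"}

# Token kinds named after the ones used in the answer; SKIP is whitespace.
TOKEN_SPEC = [
    ("INTEGER_LITERAL", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPEN_SQUARE_BRACKET", r"\["),
    ("CLOSE_SQUARE_BRACKET", r"\]"),
    ("SEMI_COLON", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (lexeme, kind) pairs for the given source text."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        # Reserved words look like identifiers but get their own kind.
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((lexeme, kind))
    return tokens


tokens = tokenize("char s[100];")

# Toy declaration rule (an assumption): an identifier that directly follows
# a type keyword is a newly declared variable and goes into the symbol table.
symbol_table = {}
for previous, current in zip(tokens, tokens[1:]):
    if previous[1] == "KEYWORD" and current[1] == "IDENTIFIER":
        symbol_table[current[0]] = {"kind": "variable", "base_type": previous[0]}

for token in tokens:
    print(token)
print(symbol_table)
```

All six chunks of input become tokens, but only "s" ends up in symbol_table. A later lookup such as symbol_table["s"] is what lets the compiler recover the declaration when it sees the token for s again.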
A symbol is the given constant under a particular key, as in S = S in that table or Pi = Pi in that equation, whereas a token represents that given as the medium of exchange in that context.
When parsing the code, you split the source file into tokens. You might use the analogy of words in a sentence. Tokens are created by the Lexer/Scanner.
int x = arr[4];
In this example, the tokens would be:
int
x
=
arr
[
4
]
;
The parser checks the syntax and (often) generates an abstract syntax tree (AST). This tree may be traversed multiple times to gather symbols and types, perform checks, etc.
Symbols are
- variables
- functions
- etc.
Here the variables x and arr would be symbols. Symbols are inserted into the symbol table. They may hold additional information, like type, address, etc.
So the token x represents the symbol x of type int. To use the analogy with words again: subject, object, and predicate are represented by words, but their meaning is different.
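As a rough illustration of tokens versus symbols for the statement above, the following Python sketch splits the line into tokens and then resolves each token against a small symbol table. Symbol, split_tokens, and the table contents are hypothetical; in particular, the type given for arr is an assumption, since its declaration is not shown in the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Symbol:
    """A symbol table entry: a name plus additional information."""
    name: str
    type: str


def split_tokens(source):
    # Identifiers/keywords, integer literals, or any other single
    # non-whitespace character, in order of appearance.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


tokens = split_tokens("int x = arr[4];")

# Symbol table as it might look after the declarations were processed.
symbol_table = {
    "x": Symbol("x", "int"),
    "arr": Symbol("arr", "int[]"),  # assumed type, not given in the example
}

# Every chunk of input is a token; only some tokens refer to a symbol.
resolved = [(token, symbol_table.get(token)) for token in tokens]
for token, symbol in resolved:
    print(repr(token), "->", symbol)
```

Tokens such as =, [ and ; resolve to None, while x and arr resolve to entries that carry type information the bare token does not have.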