In compiler construction, is a symbol the same as a token?

3

15

In compiler construction, when people talk about tokens, is a token the same as a symbol, or just another term for one? After some research, my understanding is that a token is a symbol with a reference to the symbol table, so some kind of attributed symbol, i.e. a symbol with additional information. Is that right? Thanks for any clarification :-)

Plucky answered 29/7, 2011 at 12:10 Comment(0)
19

A token is not necessarily a symbol in the symbol table. For example, if a token is a reserved word, then it is not entered in the symbol table. If a token is an identifier, then it will likely be entered in the symbol table.

Take for example the following declaration:

char s[100];

A lexical analyzer could output the following tokens:

<"char", IDENTIFIER>

depending on the implementation, "char" could be recognized as a reserved word or be entered in the symbol table as a predefined type name (in C, char is a keyword, so a C lexer would normally emit a keyword token here rather than IDENTIFIER),

<"s", IDENTIFIER>

"s" is entered in symbol table as a variable identifier,

<"[", OPEN_SQUARE_BRACKET>

not entered in symbol table,

<"100", INTEGER_LITERAL>

not entered in symbol table,

<"]", CLOSE_SQUARE_BRACKET>

not entered in symbol table,

<";", SEMI_COLON>

not entered in symbol table.
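The token stream above can be reproduced with a small regex-based lexer. This is only a minimal sketch in Python, not how any particular compiler does it: the `tokenize` function, the `KEYWORDS` set, and the `KEYWORD` kind are assumptions made for illustration, while the other kind names are taken from the list above. It assumes the reserved-word interpretation of "char".

```python
import re

# Assumption: type names such as "char" are treated as reserved words.
KEYWORDS = {"char", "int"}

# Hypothetical token specification for a tiny subset of C.
TOKEN_SPEC = [
    ("INTEGER_LITERAL", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPEN_SQUARE_BRACKET", r"\["),
    ("CLOSE_SQUARE_BRACKET", r"\]"),
    ("SEMI_COLON", r";"),
    ("WHITESPACE", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (lexeme, kind) pairs; whitespace is skipped."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind = match.lastgroup  # name of the alternative that matched
        lexeme = match.group()
        pos = match.end()
        if kind == "WHITESPACE":
            continue
        # Reserved words look like identifiers, so reclassify them here.
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((lexeme, kind))
    return tokens


tokens = tokenize("char s[100];")
for lexeme, kind in tokens:
    print(f'<"{lexeme}", {kind}>')
```

Note that the lexer itself never touches a symbol table in this sketch: it only classifies chunks of input. Deciding which of those tokens become symbols is left to a later phase.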

So you basically enter in the symbol table only those tokens that you need to reference later during the compilation process. E.g., later in the function body, when you find

strcpy(s, "Hello, world\n");

you recognize the token <"s", IDENTIFIER> again and look it up in the symbol table. The symbol table will say that "s" has been declared as a variable of type char[100].
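The declare-then-look-up step could be sketched like this in Python. The `SymbolTable` class, its `declare` and `lookup` methods, and the extra token kind names are hypothetical, and the token lists are written out by hand rather than produced by a real lexer. The sketch records the full array type char[100] for "s", which in a real compiler would be worked out by the parser rather than passed in as a string.

```python
class SymbolTable:
    """Maps identifier names to the information recorded at declaration."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, type_name):
        if name in self._entries:
            raise KeyError(f"redeclaration of {name!r}")
        self._entries[name] = {"kind": "variable", "type": type_name}

    def lookup(self, name):
        # Returns None for names that were never declared.
        return self._entries.get(name)


table = SymbolTable()

# Declaration `char s[100];`: only the identifier "s" is entered.
table.declare("s", "char[100]")

# Hand-written tokens for the later use `strcpy(s, "Hello, world\n");`.
use_tokens = [
    ("strcpy", "IDENTIFIER"),
    ("(", "OPEN_PAREN"),
    ("s", "IDENTIFIER"),
    (",", "COMMA"),
    ('"Hello, world\\n"', "STRING_LITERAL"),
    (")", "CLOSE_PAREN"),
    (";", "SEMI_COLON"),
]

# Only identifier tokens are looked up; punctuation and literals are not.
resolved = {
    lexeme: table.lookup(lexeme)
    for lexeme, kind in use_tokens
    if kind == "IDENTIFIER"
}
print(resolved)
```

In this sketch "strcpy" resolves to nothing because nothing declared it. In a real C compilation its declaration would come from <string.h>, so it would be in the table as a function before this line is reached.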

So, I would say a token is any chunk of input that is recognized by the lexical analyzer, and that only certain tokens with a special meaning are entered as symbols in the symbol table.

Barbarity answered 29/7, 2011 at 12:30 Comment(3)
Thank you so much for this great reply! I somehow mixed up the name for the symbol table ;-) - Plucky
So, following from the above: would it be fair to argue that all tokens entered into the symbol table are in fact only identifiers? That is, what tokens might be entered into the symbol table that are not identifiers? Is the definition of a symbol exactly an identifier (as opposed to other tokens such as '[', ';', '100', etc.)? I'm speculating, I have no idea. - Khaddar
As far as I know you are correct: the only tokens that are entered in the symbol table are identifiers (of functions, variables, etc.). See also en.wikipedia.org/wiki/Symbol_table - Barbarity
0

A symbol is the GIVEN constant IN THAT KEY as in S=S in that table, or Pi = Pi in that equation, whereas a token represents the given as the medium OF exchange in that condition.

Tokoloshe answered 2/7, 2017 at 16:0 Comment(0)
0

When parsing code, you first split the source file into tokens, much like splitting a sentence into words. Tokens are created by the lexer (also called the scanner).

int x = arr[4];

In this example, the tokens would be:

  • int
  • x
  • =
  • arr
  • [
  • 4
  • ]
  • ;

The parser checks the syntax and (often) generates an abstract syntax tree (AST). This tree may be traversed multiple times to gather symbols and types, perform checks, etc.

Symbols are

  • variables
  • functions
  • etc.

Here the variables x and arr would be symbols. Symbols are inserted into the symbol table. They may hold additional information, such as type, address, etc.

So the token x represents the symbol x of type int. To use the analogy with words again: subject, object, and predicate are represented by words, but the role a word plays is not the same thing as the word itself.
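One way to picture the difference is that a token is one occurrence in the source text, while a symbol is the single program entity that several occurrences can refer to. The following Python sketch is only an illustration: the `Token` and `Symbol` classes, the `resolve` helper, and the later statement `x = x + 1;` are all made up for this example, and the symbol table is assumed to have been filled in already by the declaration of x.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    lexeme: str
    kind: str
    column: int  # where this occurrence starts in the source line


@dataclass
class Symbol:
    name: str
    type_name: str


# Assumed to exist already, e.g. from the declaration `int x = arr[4];`.
symbols = {"x": Symbol("x", "int")}

# Hand-written tokens for a hypothetical later statement `x = x + 1;`.
tokens = [
    Token("x", "IDENTIFIER", 0),
    Token("=", "ASSIGN", 2),
    Token("x", "IDENTIFIER", 4),
    Token("+", "PLUS", 6),
    Token("1", "INTEGER_LITERAL", 8),
    Token(";", "SEMI_COLON", 9),
]


def resolve(token, symbols):
    """Return the symbol an identifier token refers to, or None otherwise."""
    if token.kind != "IDENTIFIER":
        return None
    return symbols.get(token.lexeme)


first = resolve(tokens[0], symbols)
second = resolve(tokens[2], symbols)

print(tokens[0] == tokens[2])  # two distinct tokens (different columns)
print(first is second)         # both refer to the one symbol x
```

The two x tokens differ (they sit at different columns), yet both resolve to the same `Symbol` object, which is where the type information lives.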

Meunier answered 21/3, 2024 at 9:16 Comment(0)
