Does Symbol table for C++ code contain function names along with class names?

I have been searching through various posts about whether the symbol table for C++ code contains function names along with class names. Something I found in one post is that it depends on the type of compiler:

if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table

but if it is a multi-pass compiler, it could add information about the class(es) it encounters and their subroutines so that it could do argument type checking and issue meaningful error messages.

I could not understand whether it is actually compiler-dependent or not. I assumed that a C++ compiler would put function names, along with class names, in the table whether it is a single-pass or a multi-pass compiler. How does it depend on the number of passes? I don't have deep knowledge of this. Moreover, could anyone show a sample symbol table for a simple C++ class, i.e. what it would look like (function names with class name)?

Whit asked 14/8, 2015 at 13:49 Comment(5)
Err... we're talking about the symbol table here, the one that has to unambiguously identify an entity in order to resolve external dependencies? How, do you think, would "unambiguously" work unless the function name were included? – Aerospace
Read this: en.m.wikipedia.org/wiki/Name_mangling – Marcello
"if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table" - I can't imagine what you mean by this. What do you suppose a symbol table is for? I think you've got some fundamental misunderstanding that lies below the level of this question. – Amazonas
Basically, the symbol table contains the fully qualified names, including the parameter types. – Chowder
Note that any kind of database table with an enumerated column can be split up into multiple tables. If I have one FOO table with a COLOR=RED | GREEN | BLUE column, I can split that table into three tables FOO_RED, FOO_GREEN and FOO_BLUE. I can split a symbol table with TYPE = FUNCTION | CLASS | VARIABLE into 3 separate tables for functions, classes and variables. Logically it's all the same, just a matter of convenience. – Vish
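A minimal C++ sketch of that point, assuming a hypothetical `Entry` record tagged with a `Kind` (these names are illustrative, not from any real compiler): splitting the single table by kind puts every entry in exactly one per-kind table, so nothing is lost.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical entry kinds; a real compiler tracks many more.
enum class Kind { Function, Class, Variable };

struct Entry {
    std::string name;
    Kind kind;
};

// Split one combined table into per-kind tables. Every entry lands in
// exactly one bucket, so the information content is unchanged.
std::map<Kind, std::vector<Entry>> splitByKind(const std::vector<Entry>& table) {
    std::map<Kind, std::vector<Entry>> split;
    for (const Entry& e : table) {
        split[e.kind].push_back(e);
    }
    return split;
}
```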

Most compiler textbooks will tell you about symbol tables, and often show you the details for a language of modest complexity such as Pascal. You won't find information about C++ symbol tables in a textbook; it is too arcane.

We offer a complete C++14 front end for our DMS Software Reengineering Toolkit. It parses C++, builds detailed ASTs, and performs name-and-type resolution, which includes building a precise symbol table.

What follows are slides from our tutorial on how to use DMS, focused on the C++ symbol table structures.

OP asked specifically for a view of what happens with classes. The following diagram shows this for the tiny C++ program in its upper left corner. The rest of the diagram shows boxes, which represent what we call "symbol spaces" (or "scopes"). These are essentially hash tables mapping symbol names (each box lists the symbols it owns) to the information that DMS knows about each symbol: the source file location of its definition, a list of AST nodes that reference the definition, and a complex union that represents the type (which may in turn point to other types). The arrows show how symbol spaces are connected; an arrow from space A to space B means "scope A is contained within scope B". Typically, when the lookup process searches scope A for a symbol x and does not find it, it continues the search in scope B. You'll note that the arrows are numbered with integers; these tell the search machinery to look in the lowest-numbered parent scope first, before trying parent scopes reached by arrows with larger numbers. This is how scopes are ordered. (Note that class C inherits from A and B; any lookup of a field in class C, such as "b", will be forced to look first in the scope for A, and then in the scope for B.) In this way, the C++ lookup rules are achieved.
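The lookup behavior described above can be sketched in C++ roughly as follows. This is a hypothetical, simplified model (the names `SymbolSpace`, `SymbolInfo`, and `lookup` are illustrative, not DMS's actual API): each space is a hash table plus an ordered list of parent spaces, and the position in that list stands in for the integer label on each arrow.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-symbol payload; a real tool records much more
// (definition location, referencing AST nodes, a full type representation).
struct SymbolInfo {
    std::string description;
};

// A "symbol space": a hash table of names plus ordered parent arcs.
// parents[0] plays the role of the lowest-numbered arc, parents[1] the next.
struct SymbolSpace {
    std::string label;
    std::unordered_map<std::string, SymbolInfo> symbols;
    std::vector<const SymbolSpace*> parents;

    // Look in this space first; if the name is absent, search the parents
    // in arc order (depth-first) and return the first match found.
    std::optional<SymbolInfo> lookup(const std::string& name) const {
        if (auto it = symbols.find(name); it != symbols.end()) {
            return it->second;
        }
        for (const SymbolSpace* parent : parents) {
            if (auto found = parent->lookup(name)) {
                return found;
            }
        }
        return std::nullopt;
    }
};
```

With C's space listing A before B as parents, looking up `b` in C finds A's `b` first. This first-match policy only mirrors the arc ordering described in the answer; it does not model C++'s ambiguity diagnosis when the same member name is found in more than one base class.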

Note that the class names are recorded in the (unique) global namespace because they are declared at top level. If they had been defined in some explicit namespace, that namespace would have a corresponding symbol space of its own recording the declared classes, and the namespace itself would be recorded in the global symbol space.
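A qualified name such as `N::A` would then be resolved by walking from the global space into the namespace's own space. Here is a minimal sketch, assuming a hypothetical `Space` type (and helper `resolveQualified`) in which every namespace or class both gets an entry in its enclosing space and owns a nested space of its own:

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical sketch: a namespace or class gets its own symbol space,
// and that space is itself recorded (by name) in the enclosing space.
struct Space {
    std::map<std::string, std::string> symbols;              // name -> kind
    std::map<std::string, std::unique_ptr<Space>> children;  // nested spaces

    Space& declareSpace(const std::string& name, const std::string& kind) {
        symbols[name] = kind;
        auto& slot = children[name];
        if (!slot) slot = std::make_unique<Space>();
        return *slot;
    }
};

// Resolve a qualified name such as "N::A" by walking one space per component.
// Returns the recorded kind, or an empty string if any component is missing.
std::string resolveQualified(const Space& global, const std::string& qualified) {
    const Space* current = &global;
    std::string::size_type start = 0;
    while (true) {
        auto sep = qualified.find("::", start);
        std::string part = qualified.substr(start, sep - start);
        if (sep == std::string::npos) {
            auto it = current->symbols.find(part);
            return it == current->symbols.end() ? "" : it->second;
        }
        auto child = current->children.find(part);
        if (child == current->children.end()) return "";
        current = child->second.get();
        start = sep + 2;
    }
}
```

Note that `A` declared inside `N` is not found as a plain top-level `A`; it is reachable only through `N`'s space.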

C++ Symbol Table: Class Perspective

OP did not ask what the symbol table looks like for function bodies, but I happen to have an illustrative slide for that, too, below. The symbol spaces work the same way. What this slide shows is the linkage between a symbol space and the scoped region it represents. That linkage is implemented by a pointer, associated with the symbol space, to the corresponding AST(s); there can be more than one, because namespace definitions can be scattered across multiple places.
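The "scattered" namespace case is ordinary C++. In the snippet below (the namespace `util` and its functions are made up for illustration), `util` is opened twice; conceptually both blocks are linked to a single symbol space, which is why the second block can see `twice` from the first.

```cpp
// First definition block of namespace util.
namespace util {
int twice(int x) { return 2 * x; }
}

// A second, separate block of the same namespace. Names declared in the
// first block are visible here because both blocks share one scope.
namespace util {
int quadruple(int x) { return twice(twice(x)); }  // finds util::twice
}
```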

Note that in this case, the function name is recorded in the global namespace because it is declared at top level. If it had been defined inside the scope of a class, the function name would have been recorded in the symbol space for the class body (see the previous diagram).
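The effect of recording a function name in the class's space rather than the global one is visible in plain C++. In this small example (all names are illustrative), a free function and a member function share the name `f`; unqualified lookup inside the class finds the member first, because the class scope is searched before the enclosing global scope.

```cpp
// A free function f, recorded in the global symbol space.
int f() { return 1; }

struct Widget {
    // A member function f, recorded in Widget's own symbol space.
    int f() { return 2; }

    // Unqualified lookup starts in the class scope, so this finds Widget::f.
    int callUnqualified() { return f(); }

    // The :: prefix forces lookup to start in the global namespace.
    int callGlobal() { return ::f(); }
};
```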

C++ Symbol Table: Function Perspective

As a general rule, the details of how the symbol table is organized are completely dependent on the compiler and the choices its designers made. In our case, we designed a very general symbol table management package because we planned to use (and have used) the same package to handle multiple languages (C, C++, Java, COBOL, several legacy languages) in a uniform way. However, the abstract structures of symbol spaces and inheritance will have to be implemented in essentially equivalent ways across C++ compilers; after all, they have to model the same information. I'd expect similar structures in the GCC and Clang compilers (well, the integer-numbered inheritance arcs, maybe not :)

As a practical matter, it doesn't matter how many "passes" your compiler has. It pretty much has to build these structures to remember what it knows about the symbols, within a pass, and across passes.

While building a C++ parser is very hard by itself, building such a symbol table is much harder; the effort dwarfs the effort to build the parser. Our C++ name resolver is some 250K SLOC of attribute-grammar code compiled and executed by DMS. Getting the details right is an enormous headache: the C++ reference manual is enormous and confusing, the facts are scattered all across the document, in a variety of places it is contradictory (we try to send complaints about this to the committee), and compilers are inconsistent with one another (we have versions for GCC and Visual Studio 201x).

Update March 2017: We now have symbol tables for C++14. Update June 2018: We now have symbol tables for C++17.

Hezekiah answered 14/8, 2015 at 14:43 Comment(2)
What is a "precise symbol table"? All linkers I know of were able to work with the default "precise" symbol table... So what exactly is the difference? – Heigho
There are a lot of sloppy reverse-engineering tools out there that build sloppy symbol tables: they are incomplete, and they fail to model inheritance, let alone overload lookup, correctly. Doxygen, I think, now uses Clang, but earlier versions used a complete (sloppy) hack for parsing C++ and then built such a sloppy table. Tools that actually manipulate real programs (GCC, Clang, linkers, DMS) can't avoid being precise, or they wouldn't work. – Hezekiah

A symbol table maps names to constructs within the program. As such it is used to record the names of classes, functions, variables, and anything else that has a user-specified name within the program.

(There are two common kinds of symbol table: one that the compiler maintains while it is compiling your program, and another that exists in the object file so that it can be linked with other objects. The two are strongly related, but need not have similar internal representations. Typically only some of the symbols from the compiler's symbol table are output into the object file.)

Part of what you say makes no sense:

if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table

How can the compiler determine to what construct a name refers if it cannot look it up in the symbol table?

but if it is a multi-pass compiler, it could add information about the class(es) it encounters and their subroutines so that it could do argument type checking and issue meaningful error messages.

There's no reason it could not do this in a single pass.
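To illustrate, here is a hypothetical one-pass checker (the `Item` type and `checkInOnePass` function are invented for this sketch). It records each declaration's parameter types as soon as it sees them, and checks each later call against the table within the same pass. It accepts only exact type matches, unlike real C++, which also considers conversions and overloads.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical input: each item is either a declaration or a call,
// processed strictly in source order.
struct Item {
    bool isDeclaration;
    std::string name;
    std::vector<std::string> types;  // parameter types or argument types
};

std::vector<std::string> checkInOnePass(const std::vector<Item>& items) {
    std::map<std::string, std::vector<std::string>> table;  // name -> param types
    std::vector<std::string> errors;
    for (const Item& item : items) {
        if (item.isDeclaration) {
            table[item.name] = item.types;  // record as soon as it is seen
        } else if (auto it = table.find(item.name); it == table.end()) {
            errors.push_back("undeclared: " + item.name);
        } else if (it->second != item.types) {
            errors.push_back("argument mismatch: " + item.name);
        }
    }
    return errors;
}
```

A declaration of `f(int, int)` followed by the calls `f(int, int)`, `f(int)` and `g()` yields exactly two diagnostics: the argument mismatch for `f` and the undeclared `g`.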

I could not understand whether it is actually compiler dependent or not?

All compilers are going to use a symbol table, but its use will be hidden inside the implementation.

I was assuming that compiler(for C++ code) would put function names with class names in the table whether it is single pass or multi pass compiler. How is it dependent on the passes?

How is what dependent on the passes? All names go in the symbol table (that's what it's for), and symbol resolution is usually important for just about everything else the compiler does, so it needs to be done early, i.e. in the first pass. In fact, the main purpose of the first pass in a multi-pass compiler may well be just to build the symbol table!
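One concrete reason C++ needs class members recorded early: inside a class, a member function body may refer to members declared further down. Compilers typically handle this by deferring the bodies of such inline member functions until the whole class has been seen, so the class's table of members is complete before those bodies are checked. The class below is illustrative.

```cpp
// next() uses count_ and step_, which are declared later in the class.
// This is valid because member function bodies are checked in the
// context of the complete class.
class Counter {
public:
    int next() { return ++count_ + step_; }

private:
    int count_ = 0;
    int step_ = 10;
};
```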

Moreover, could anyone show a sample symbol table for a simple C++ class, how would it look like (function names with class name)?

I'll give it a stab:

class A
{
    int a;
    void f(int, int);
};

This will yield a symbol table containing the symbols "A", "a", and "f". Typically "a" and "f" would be marked with a scope to simplify lookup, e.g.:

"A"  -> (class)
"A::a"  ->  (class variable member)
"A::f(int,int)"  ->  (class function member)

It's also possible that the a and f symbols will not be stored in the top-level symbol table, but rather that each name space (including C++ namespaces and classes) will have its own symbol table, containing the symbols defined inside it. But this is, arguably, just a data structure choice. You can still abstractly view the symbol table as a flat table, where a name maps to a construct.
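To see why the per-scope layout is just a data structure choice, here is a hypothetical sketch (the `ScopeTable` type and `flatten` function are illustrative) that stores class A's members in their own table and then derives the flat view listed earlier by prefixing each name with its scope path.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical nested representation: each scope owns its own table.
struct ScopeTable {
    std::string name;                          // "" for the global scope
    std::map<std::string, std::string> names;  // local name -> kind
    std::vector<ScopeTable> nested;            // classes/namespaces inside it
};

// Produce the equivalent flat view by prefixing each name with its scope path.
void flatten(const ScopeTable& scope, const std::string& prefix,
             std::map<std::string, std::string>& out) {
    for (const auto& [local, kind] : scope.names) {
        out[prefix + local] = kind;
    }
    for (const ScopeTable& inner : scope.nested) {
        flatten(inner, prefix + inner.name + "::", out);
    }
}
```

Flattening a global scope that holds class A, whose own table holds `a` and `f(int,int)`, produces the three keys `A`, `A::a` and `A::f(int,int)`.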

In general the "A::a" symbol would not be output to the object file, since it is not required for linking.

Amazonas answered 14/8, 2015 at 14:15 Comment(0)

Short answer: yes; you can see them with 'nm --demangle' on Linux.

Long answer: Function entries in the symbol table contain the function name plus the return type and, if the function belongs to a class, the class name too. However, the names, types (not always) and classes are not written out in full, to use less space; instead they are encoded into strings called mangled names (converting them back to readable form is called demangling). Each mangled name is unique, and you can parse the full class name back out of it. To view the symbol table of your program, you can use 'nm' on Linux.

http://linux.about.com/library/cmd/blcmdl1_nm.htm

It has a --demangle flag that shows the original names. You can compile a few short programs to see what comes out.
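The same demangling that nm --demangle performs is also available to a program. This sketch assumes GCC or Clang on Linux, where names follow the Itanium C++ ABI and the runtime exposes `abi::__cxa_demangle` in `<cxxabi.h>`; MSVC uses a different mangling scheme and does not provide that header. The wrapper name `demangle` is illustrative.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium-ABI mangled name using the C++ runtime's demangler.
std::string demangle(const char* mangled) {
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;  // demangling failed: return the input unchanged
    }
    std::string result(raw);
    std::free(raw);  // __cxa_demangle allocates its result with malloc
    return result;
}
```

For example, `demangle("_ZN1A1fEii")` gives `A::f(int, int)`, the member function `f` of class A from the earlier answer. The class name and parameter types are recovered, but no return type appears.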

Solarium answered 14/8, 2015 at 14:19 Comment(2)
A mangled name does not typically include the return type. – Remuneration
You're confusing the symbols embedded in object files with the symbol table built by the compiler, which itself has no use for mangled names. – Hezekiah
