What is Symbol Resolution?

Asked 24/7, 2009 at 2:20 Answered 13/6 at 14:55

This seems to be one of those things that everyone talks about but no one defines... I can't seem to find any information on this topic. What is symbol resolution? This is the best thing I've found: http://docs.oracle.com/cd/E23824_01/html/819-0690/chapter2-90421.html#chapter2-93321

Does it have something to do with how your program is compiled?

Citadel answered 24/7, 2009 at 2:20

Well, now that you mention Unix's nm, I can pinpoint what symbol resolution means here.

Executable files can refer to entities that are not defined inside themselves, for instance variables or procedures in shared libraries. Those entities are identified by external symbols. An executable may also have internal symbols that other files can reference, as is, of course, the case for libraries.

In this context, symbol resolution means assigning the proper addresses to all the external entities a program refers to, once the program has been loaded into memory. This means patching every location in the loaded program where an external symbol is referenced.

These addresses depend on where in memory the code containing the external symbols has been loaded.

On Unix, the default is to link programs against the system's shared libraries instead of pre-linking everything they need into the executable. When compiling a program with gcc, for instance, you pass the -static flag if you want it to be statically linked instead of having unresolved symbolic references.

Look up "shared libraries" for further information.

Indefeasible answered 24/7, 2009 at 2:33

Makes sense, thank you very much. I will look up "shared libraries" as well. Do you have any books you could recommend for learning more about this? – Citadel 24/7, 2009 at 3:37

The best book I know of on the topic is Linkers and Loaders by John R. Levine. – Verde 14/10, 2012 at 16:42

As mentioned, it can refer to run-time or link-time symbol resolution. However, you shouldn't forget compile-time symbol resolution.

These are the rules a language uses to map symbols to "things": symbols being just about anything that looks like a name (local, member and global variables, functions, methods, types, etc.) and "things" being the compiler's understanding of what the name refers to.

The rules for doing this can be fairly simple (for instance, IIRC, in C it's little more than an ordered list of places to look) or complex (C++ has all sorts of cases involving overloading, templates and whatnot). Generally, these rules interact with the semantics of the program, and sometimes they can even result in ambiguities:

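C++:

#include <cstdio>

int First(int i) { return i; }
float First(float f) { return f; }

void Second(int (*fn)(int)) { printf("int"); }
void Second(float (*fn)(float)) { printf("float"); }

int main() {
    Second(&First); // What will be printed? Neither: both overloads of Second are exact matches, so the call is ambiguous and does not compile.
}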

Lindesnes answered 24/7, 2009 at 15:44

I am not sure in what context you mean symbol resolution, but it reminds me of dlopen(3) and dlsym(3), which are used for run-time symbol resolution in shared libraries.

Listed answered 24/7, 2009 at 2:34

I mean it in the context of using it with the nm command on Unix. Does that help? I'm not sure what dlopen is either, so I can't say if you're close. – Citadel 24/7, 2009 at 2:38

When building a compiler, the role of the parser is to deliver an abstract syntax tree (AST), which represents the code "stripped" of syntactic frills (whitespace, punctuation, etc.). Suppose this tree contains a declaration of a variable "x" of type "int". Another part of the AST may concern a use of "x", such as a binary node that captures "x+42". That node has two branches: one that captures "x" and another that captures the integer literal 42. However, this "x" is not yet linked to the declaration: nothing in the AST says that it corresponds to the declared "x". As far as the AST is concerned, these are two separate symbols!

The role of symbol resolution is to link these two "x": for example, the identifier "x" used in the expression "x+42" should point to its presumed definition. In general, there are two ways of doing this: either actually create a link (a pointer) from each use to its definition, or maintain symbol tables throughout the compiler's lifetime. Note that even in the first case you need to build a symbol table, which records definitions (e.g. variable declarations) as they are encountered; but this table can be discarded once the links have been correctly created. In the second case, we often keep several symbol tables "hooked" to certain AST nodes. The former is more memory-intensive but faster; the latter is more compact but more laborious to use. Personally, I use both, depending on the project!

Nerissa answered 13/6 at 14:55
