Offset in nm symbol value?

Just to give you some context, here's what I'm trying to achieve: I am embedding a const char* in a shared object file in order to have a version string in the .so file itself. I am doing data analysis and this string enables me to let the data know which version of the software produced it. This all works fine.

The issue I am having is when I try to read the string out of the .so library directly. I tried to use

nm libSMPselection.so | grep _version_info

and get

000000000003d968 D __SMPselection_version_info

This is all fine and as expected (the char* is called _SMPselection_version_info). However, I would have expected to now be able to open the file, seek to 0x3d968 and start reading my string, but all I get is garbage.

When I open the .so file and simply search for the contents of the string (I know how it starts), I can find it at address 0x2e0b4. At this address it's there, zero terminated and as expected. (I am using this method for now.)

I am not a computer scientist. Could someone please explain to me why the symbol value shown by nm isn't correct or, put differently, what the symbol value is if it isn't the address of the symbol?

(By the way I am working on a Mac with OSX 10.7)

Gratian asked 3/5, 2012 at 11:43 Comment(0)

Nobody suggested the simplest way: write a small program that dynamically loads your library (give it the name on the command line), calls dlsym() for your symbol (whose name can also come from the command line), casts the result to a string pointer and prints it to stdout.
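A minimal C++ sketch of that loader, assuming POSIX dlopen()/dlsym() from <dlfcn.h>; the function name read_version and its symbol_is_pointer flag are hypothetical. One step is easy to miss: for a char const* variable, dlsym() returns the address of the pointer, not of the text, so the sketch dereferences once in that case. RTLD_LAZY only defers binding of function references until they are first called; it cannot make dlopen() skip dependent libraries that are missing.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>

// Hypothetical helper: loads the library at lib_path and returns the version
// string stored in `symbol`, or nullptr on failure.
//   symbol_is_pointer == true : char const* sym = "...";        (dlsym yields &sym)
//   symbol_is_pointer == false: extern char const sym[] = "..."; (dlsym yields the text)
// The handle is deliberately left open so the returned pointer stays valid.
const char *read_version(const char *lib_path, const char *symbol,
                         bool symbol_is_pointer) {
    assert(lib_path != nullptr && symbol != nullptr);
    // RTLD_LAZY defers binding of function references until first call;
    // missing dependent libraries still make dlopen() fail.
    void *handle = dlopen(lib_path, RTLD_LAZY | RTLD_LOCAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen: %s\n", dlerror());
        return nullptr;
    }
    dlerror();  // clear any stale error state before dlsym()
    // Pass the source-level name; on OS X dlsym() adds the leading '_' itself.
    void *addr = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != nullptr || addr == nullptr) {
        std::fprintf(stderr, "dlsym: %s\n",
                     err != nullptr ? err : "symbol has a null address");
        dlclose(handle);
        return nullptr;
    }
    if (symbol_is_pointer) {
        // The symbol is a pointer variable: read the pointer stored at its address.
        return *static_cast<const char *const *>(addr);
    }
    return static_cast<const char *>(addr);  // the symbol is the character data
}
```

Called from a main() that passes argv[1] and argv[2] and prints the result with std::puts() when it is non-null, this becomes the command-line tool described above. For the question's variable, the name to pass is _SMPselection_version_info (as written in the source, without the extra underscore nm shows). On Linux with glibc older than 2.34, link with -ldl.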

Omnibus answered 3/5, 2012 at 18:07 Comment(2)
This is a great idea. I'm trying it right now. There is only one problem: the libraries that I am testing have a rather long chain of dependencies on other libraries. If I try to load them with dlopen, I get Symbol-not-found errors. The version string that I am interested in of course has no dependencies. How do I make dl ignore dependencies? – Gratian
I have checked. This works great if I have all dependencies loaded, which is one of my two use cases. Thanks for the idea. – Gratian

Assuming it's an ELF or similarly structured binary, you have to take into account the address at which each part of the file is loaded, which is recorded in the ELF program headers.

Using objdump -Fd on your binary, you can have the disassembler also show the exact file offset of a symbol.

Using objdump -x you can find this loader address, usually 0x400000 for standard (non-PIE) x86-64 Linux executables.

The next thing to be careful about is whether it's an indirect string, which you can check most easily with objdump -g. If it is, then at the position reported by objdump -Fd you will not find the string itself but its address, and from that address you need to subtract the loader address again. Let me show you an example from one of my binaries:

objdump -Fd BIN | grep VersionString
  45152f:       48 8b 1d 9a df 87 00    mov    0x87df9a(%rip),%rbx        # ccf4d0 <acVersionString> (File Offset: 0x8cf4d0)

objdump -x BIN
...
LOAD off    0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**12
...

So we look at 0x8cf4d0 in the file and find in the hexeditor:

008C:F4D0 D8 C1 89 00  00 00 00 00  01 00 00 00  FF FF FF FF

So we read the little-endian value stored there, 0x89C1D8, subtract 0x400000 to get 0x49C1D8, and when we look there in the hex editor we find:

0049:C1D0 FF FF 7F 7F  FF FF 7F FF  74 72 75 6E  6B 5F 38 30
0049:C1E0 34 33 00 00  00 00 00 00  00 00 00 00  00 00 00 00

which decodes to "trunk_8043" (the string starts at 0x49C1D8, the ninth byte of the first row).
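The arithmetic in this walkthrough can be written out as a small C++ sketch. The helper names are hypothetical, and it assumes a little-endian 64-bit target and a LOAD segment at file offset 0 mapped at 0x400000, as in the objdump -x output above. In general the conversion is: file offset = vaddr - p_vaddr + p_offset of the LOAD segment containing the address; "subtract 0x400000" is the special case p_offset = 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assembles a 64-bit value from 8 bytes stored least-significant byte first,
// e.g. the "D8 C1 89 00 00 00 00 00" seen at file offset 0x8CF4D0.
std::uint64_t read_le64(const unsigned char *p) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) {
        v = (v << 8) | p[i];
    }
    return v;
}

// General form of "subtract the loader address": file offset =
// vaddr - p_vaddr + p_offset for the LOAD segment containing vaddr.
std::uint64_t vaddr_to_file_offset(std::uint64_t vaddr, std::uint64_t seg_vaddr,
                                   std::uint64_t seg_offset) {
    assert(vaddr >= seg_vaddr);
    return vaddr - seg_vaddr + seg_offset;
}

// Reads a NUL-terminated string that starts at `offset` inside a byte buffer.
std::string c_string_at(const unsigned char *buf, std::size_t size, std::size_t offset) {
    std::string s;
    for (std::size_t i = offset; i < size && buf[i] != '\0'; ++i) {
        s.push_back(static_cast<char>(buf[i]));
    }
    return s;
}
```

Fed the bytes from the hex-editor rows above, these helpers reproduce 0x89C1D8, 0x49C1D8 and "trunk_8043".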

YMMV, especially with other file formats, but that is the general way these things are structured, with lots of warts and details that deviate in special cases.

Samaveda answered 3/5, 2012 at 12:26 Comment(2)
OK, thanks, I feel like you answered it. What I was hoping for was a way to get the string without scanning the entire file (or disassembling it). By the way, my version of objdump doesn't have the -F option (I'm using GNU objdump 2.17.50.0.6-20.el5 20061020). – Gratian
@Simon: That is a pretty ancient version of objdump (I can't even remember anymore what 2006 was like). You can compute this file offset yourself by subtracting the same 0x400000 offset from 0xccf4d0. Maybe there is also a tool that does all of this for you, or you could write a small script yourself. – Samaveda

On Linux you have the 'strings' command, which helps you extract strings from binaries.

http://linux.about.com/library/cmd/blcmdl1_strings.htm

In HP-UX (and I think in other Unix flavors too) there's a similar command called 'what'. It extracts only strings that start with "@(#)", but if you control the content of the string this is not a problem.
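A rough C++ approximation of what 'what' does, for illustration only (the function name is hypothetical, and the real command may stop at additional characters): collect every string that follows an "@(#)" marker, stopping at a NUL or newline. The newline stop is why a multi-line version string gets truncated by this kind of tool.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collects every string introduced by the "@(#)" marker in `data`, stopping
// at a NUL or a newline, roughly in the spirit of the `what` command.
std::vector<std::string> what_strings(const std::string &data) {
    static const std::string marker = "@(#)";
    static const std::string stops("\0\n", 2);  // NUL and newline terminate a match
    std::vector<std::string> found;
    std::size_t pos = data.find(marker);
    while (pos != std::string::npos) {
        std::size_t start = pos + marker.size();
        std::size_t end = data.find_first_of(stops, start);
        if (end == std::string::npos) {
            end = data.size();
        }
        found.push_back(data.substr(start, end - start));
        pos = data.find(marker, end);  // continue after the current match
    }
    return found;
}
```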

Omnibus answered 3/5, 2012 at 11:59 Comment(2)
How will that help him get the contents of a specific symbol? – Samaveda
"what" is nice, but I really want my string to span multiple lines, and what stops at newlines. The strings command prints all strings without telling me where my own string ends. Also, it seems to just read the entire file, which is exactly what I do. It would be more elegant if I could read the symbol entry and jump to the string directly. – Gratian

Why would you expect the offset displayed by nm to be the offset in the .so file? .so files are not simply memory images; they contain a lot of other information as well, and have a more or less complicated format. Under Unix (at least under most Unices), shared objects use the ELF format (macOS, which the question is about, uses Mach-O, which is likewise organized into segments). To find the information, you will have to interpret the various fields in the file to find where the symbol you want is located, in which segment, and where that segment starts in the file. (You can probably find a library that will simplify reading them.)
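As a sketch of interpreting those fields for ELF: walk the program headers, find the PT_LOAD segment whose address range contains the symbol value, and compute vaddr - p_vaddr + p_offset. This assumes a 64-bit ELF file already read into memory, with the same byte order as the host, and the structure definitions from glibc's <elf.h>; the function name is hypothetical. It does not apply to Mach-O, where the analogous fields are vmaddr and fileoff in segment_command_64 load commands.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <elf.h>
#include <vector>

// Maps a virtual address (such as an nm symbol value) to a file offset by
// finding the PT_LOAD segment that contains it. Returns -1 if the image is not
// a 64-bit ELF file or the address is not backed by bytes in the file.
long long elf64_vaddr_to_offset(const std::vector<unsigned char> &image,
                                std::uint64_t vaddr) {
    Elf64_Ehdr eh;
    if (image.size() < sizeof eh) return -1;
    std::memcpy(&eh, image.data(), sizeof eh);
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr)) {
        return -1;
    }
    for (unsigned i = 0; i < eh.e_phnum; ++i) {
        std::uint64_t at = eh.e_phoff + std::uint64_t(i) * eh.e_phentsize;
        Elf64_Phdr ph;
        if (at > image.size() || image.size() - at < sizeof ph) return -1;
        std::memcpy(&ph, image.data() + at, sizeof ph);
        // Only the first p_filesz bytes of a segment exist in the file; the rest
        // of p_memsz (e.g. .bss) is zero-filled at load time.
        if (ph.p_type == PT_LOAD && vaddr >= ph.p_vaddr &&
            vaddr - ph.p_vaddr < ph.p_filesz) {
            return static_cast<long long>(vaddr - ph.p_vaddr + ph.p_offset);
        }
    }
    return -1;
}
```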

Also, if you are correct in saying that you've embedded a char const*, i.e. that your code contained something like:

char const* version = "...";

then the address or offset of version is the address or offset of the pointer, not of the string data it points to. Defining it as:

char const version[] = "...";

will solve this.
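A small C++ illustration of the difference (the variable names and the string are hypothetical placeholders). With the pointer form, the symbol nm reports is a pointer variable in writable data, consistent with the D type in the question's nm output, while the characters sit elsewhere. With the array form, the symbol is the characters themselves; it is declared extern here because in C++ a namespace-scope const object otherwise has internal linkage and would not show up in nm.

```cpp
#include <cassert>

// Pointer form: the symbol names a pointer; its address is not where the text is.
char const *version_ptr = "libSMPselection 1.0";

// Array form: the symbol names the characters themselves. `extern` keeps
// external linkage in C++, so the symbol is exported and visible to nm.
extern char const version_arr[] = "libSMPselection 1.0";

// The array occupies exactly the characters plus the terminating NUL.
static_assert(sizeof(version_arr) == sizeof("libSMPselection 1.0"),
              "array holds the text itself");
```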

Finally, the simplest solution might be to just make sure that the string has some highly identifiable pattern, and scan the entire file linearly looking for this pattern.
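A sketch of that linear scan in C++ (the function name is hypothetical): it reads the file in fixed-size chunks and carries the last pattern.size() - 1 bytes into the next round, so a marker straddling a chunk boundary is still found. It assumes the stream was opened in binary mode ("rb") and is positioned at the start.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns the file offset of the first occurrence of `pattern` in `f`, or -1.
long find_pattern(std::FILE *f, const std::string &pattern) {
    assert(f != nullptr && !pattern.empty());
    std::string window;     // carried-over tail of the previous chunk + new chunk
    long window_start = 0;  // file offset of window[0]
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        window.append(buf, n);
        std::size_t hit = window.find(pattern);
        if (hit != std::string::npos) {
            return window_start + static_cast<long>(hit);
        }
        // Keep only the bytes that could still be the start of a match.
        std::size_t keep = pattern.size() - 1;
        if (window.size() > keep) {
            window_start += static_cast<long>(window.size() - keep);
            window.erase(0, window.size() - keep);
        }
    }
    return -1;
}
```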

Clamant answered 3/5, 2012 at 12:30 Comment(3)
Scanning the entire file is exactly what I do. It just seems less elegant, and I want to learn something, so I asked this question. Declaring the array instead of the pointer makes it disappear from the list of symbols that nm displays. – Gratian
@Gratian Well, it is more elegant to parse the file correctly, but it's also a lot more work. As for declaring the array instead of a pointer, it disappears because of a subtlety of C++: a const object at namespace scope has internal linkage by default. If you declare it extern char const version[] = "...", this won't happen; the extern forces external linkage, and the initialization makes it a definition rather than just a declaration. – Clamant
Thanks, of course I forgot about linkage! Using the extern keyword, the string now appears in the symbol table, and the address I get from nm actually matches the location of the string. It works now: I am able to get the string by seeking to the address I get from nm! – Gratian
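Once nm's value and the file offset coincide, reading the string is a seek plus a read up to the NUL, sketched below in C++ (the function name is hypothetical). This shortcut assumes that the segment holding the symbol is mapped at the same address as its file offset, as it evidently was for the asker's OS X library; for other files (for example ELF shared objects, whose data segment is often mapped at a different address than its file offset) the value must first be converted using the segment's address and file offset.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Seeks to `offset` in `f` and returns the bytes up to the first NUL (or EOF).
// Returns an empty string if the seek fails.
std::string read_c_string_at(std::FILE *f, long offset) {
    assert(f != nullptr);
    std::string s;
    if (std::fseek(f, offset, SEEK_SET) != 0) {
        return s;
    }
    int c;
    while ((c = std::fgetc(f)) != EOF && c != '\0') {
        s.push_back(static_cast<char>(c));
    }
    return s;
}
```

nm prints the value in hexadecimal, so it can be converted with std::stol(value, nullptr, 16) before being passed in, with the library opened via std::fopen(path, "rb").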

© 2022 - 2024 — McMap. All rights reserved.