Analyze backtrace of a crash occurring due to a faulty library

In my application I have set up a signal handler to catch segfaults and print backtraces. My application loads some plugin libraries when the process starts.

If my application crashes with a segfault, due to an error in the main executable binary, I can analyze the backtrace with:

addr2line -Cif -e ./myapplication 0x4...

It accurately displays the function and the source_file:line_no

However, how do I analyze the backtrace if the crash occurs due to an error in the plugin, as in the backtrace below?

/opt/myapplication(_Z7sigsegvv+0x15)[0x504245]
/lib64/libpthread.so.0[0x3f1c40f500]
/opt/myapplication/modules/myplugin.so(_ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi+0x6af)[0x7f5588fe4bbf]
/opt/myapplication/modules/myplugin.so(_Z11myplugin_reqmodP12CONNECTION_TP7Filebuf+0x68)[0x7f5588fe51e8]
/opt/myapplication(_ZN10Processors7ExecuteEiP12CONNECTION_TP7Filebuf+0x5b)[0x4e584b]
/opt/myapplication(_Z15process_requestP12CONNECTION_TP7Filebuf+0x462)[0x4efa92]
/opt/myapplication(_Z14handle_requestP12CONNECTION_T+0x1c6d)[0x4d4ded]
/opt/myapplication(_Z13process_entryP12CONNECTION_T+0x240)[0x4d79c0]
/lib64/libpthread.so.0[0x3f1c407851]
/lib64/libc.so.6(clone+0x6d)[0x3f1bce890d]

Both my application and the plugin libraries have been compiled with gcc and are unstripped. When executed, my application loads the plugin with dlopen. Unfortunately, the crash is occurring at a site where I cannot run the application under gdb.

I Googled around frantically for an answer, but all the sites discussing backtrace and addr2line leave out scenarios where a faulty plugin has to be analyzed. I hope some kind-hearted hack knows a solution to this dilemma and can share some insights. It would be invaluable for fellow programmers.

Tons of thanks in advance.

Dodona answered 19/9, 2013 at 10:18 Comment(7)
Is there something special about these "plugin libraries" or are they just standard shared libraries? - Absorber
Also, have you seen this SO question #7556545? - Absorber
@Absorber Yes, I had checked out that link a few months back and had also asked for a bit more clarity in the derivation; see my comment there. Btw, my plugin libraries are simple C/C++ .so objects, nothing really special about them. Sincere thanks for taking interest. - Dodona
Oh, that was your comment there! Didn't realize that while looking at the question. Do you have the lib's starting address from the pmap? Where that other question gets the numbers from isn't quite clear to me, but imho it should be as simple as (in your case) 0x7f5588fe4bbf - [appropriate start address from pmap]. If you're having trouble finding the right sections, add the part of the pmap corresponding to your library to the question. - Absorber
@Absorber pmap -d <pid> results in:
00007fab30287000      48 r-x-- 0000000000000000 008:00001 myplugin.so
00007fab30293000    2044 ----- 000000000000c000 008:00001 myplugin.so
00007fab30492000       4 r-x-- 000000000000b000 008:00001 myplugin.so
00007fab30493000       4 rwx-- 000000000000c000 008:00001 myplugin.so
So my library starting address would be 0x7fab30287000, right? - Dodona
Hrm. That looks like the library is now at a different virtual address than where it was when your program last crashed :-/ . Nothing about myplugin.so in the 0x7f5.... range? - Absorber
@Absorber Yes, you are right. The segfault occurred at another site yesterday, and the backtraces were captured then. Btw, I was struggling to get the formatting right while I suppose you were trying to respond here. Regret the inconvenience. - Dodona

Here are some hints that may help you debug this:

The address in your backtrace is an address in the address space of the process at the time it crashed. That means that, if you want to translate it into a 'physical' address relative to the start of your library (the kind of address addr2line expects for a shared object), you have to subtract the start address of the library's mapping, as shown by pmap, from the address in your backtrace.
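
As a rough sketch of that subtraction in plain shell arithmetic: the helper name offset_in_lib is made up for illustration, and the load base 0x7f5588fdc000 is an assumption, because no pmap was captured for the crashed run. It is back-computed from the 0x8bbf offset that the nm-based approach further down arrives at.

```shell
# offset_in_lib RUNTIME_ADDR LOAD_BASE
# Prints the library-relative offset that can be handed to addr2line.
offset_in_lib() {
    printf '0x%x\n' $(( $1 - $2 ))
}

# 0x7f5588fdc000 is an assumed load base, not taken from a real pmap.
offset_in_lib 0x7f5588fe4bbf 0x7f5588fdc000
```

With a real pmap taken from the crashed process, the second argument would be the start address of the library's first mapping.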

Unfortunately, this means that you need a pmap of the process taken before it crashed. Load addresses for libraries are generally not constant between runs, even on a single system (address space layout randomization, where enabled, randomizes them), and they certainly are not portable across systems, as you have noticed.
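
One way to have the mapping at hand is to save the memory map while the process is still running. The following is only a sketch, assuming the Linux /proc/<pid>/maps text format (address range, permissions, offset, device, inode, path); lib_base is a hypothetical helper name, not an existing tool.

```shell
# lib_base LIBNAME
# Reads /proc/<pid>/maps-formatted text on stdin and prints the start
# address of the first mapping whose path contains LIBNAME. The first
# match is used on the assumption that maps are listed in ascending order.
lib_base() {
    awk -v lib="$1" 'index($NF, lib) { split($1, range, "-"); print "0x" range[1]; exit }'
}

# Possible usage while the process is alive (pid is a placeholder):
#   lib_base myplugin.so < /proc/"$pid"/maps
```

The value it prints is only valid for that particular run of the process, so it would have to be logged alongside the backtrace.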

In your position, I would try:

  • demangling the symbol names with c++filt -n or manually. I don't have a shell right now, so here is my manual attempt: _ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi is ICAPSection::process(CONNECTION_T *, Filebuf *, int). This may already be helpful. If not:
  • use objdump or nm (I'm pretty sure they can do that) to find the address corresponding to the mangled name, then add the offset (+0x6af as per your stacktrace) to this, then look up the resulting address with addr2line.
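
The inputs for the second hint can be pulled out of a backtrace line mechanically. This is a sketch that assumes frames have the path(symbol+0xoffset)[0xaddress] shape seen in the question's backtrace; parse_frame is a hypothetical helper name. Frames without a symbol, such as the libpthread ones, simply produce no output.

```shell
# parse_frame
# Reads backtrace lines on stdin and prints, for each frame that carries
# a symbol: "<library> <mangled symbol> <offset> <runtime address>".
parse_frame() {
    sed -n 's/^\(.*\)(\(.*\)+\(0x[0-9a-f]*\))\[\(0x[0-9a-f]*\)]$/\1 \2 \3 \4/p'
}

frame='/opt/myapplication/modules/myplugin.so(_ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi+0x6af)[0x7f5588fe4bbf]'
printf '%s\n' "$frame" | parse_frame
```

The second and third fields are what nm and the offset addition need; the mangled symbol could also be piped through c++filt for readability.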
Absorber answered 19/9, 2013 at 13:23 Comment(2)
Bingo! You absolutely nailed it. It worked, and is precise. Can't thank you enough. I knew I would get an ace hack around here. Thanks again. - Dodona
I used nm. It has fewer switches and the output is rather easier to use. Check out below, I have elaborated upon your tip, hopefully - correctly. Cheers. - Dodona

us2012's answer was exactly the trick required to solve the problem. I am restating it here to help any other newbie struggling with the same problem, or in case somebody wishes to offer improvements.

In the backtrace it is clearly visible that the flaw exists in the code for myplugin.so. And the backtrace indicates that it exists at:

/opt/myapplication/modules/myplugin.so(_ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi+0x6af)[0x7f5588fe4bbf]

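The line corresponding to this fault cannot be located as simplistically as the following, because 0x7f5588fe4bbf is an address in the crashed process's address space rather than an address within the library file: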

addr2line -Cif -e /opt/myapplication/modules/myplugin.so 0x7f5588fe4bbf

The correct procedure here is to use nm or objdump to determine the address of the mangled symbol. (Demangling, as done by us2012, is not really necessary at this point.) So using:

nm -Dlan /opt/myapplication/modules/myplugin.so | grep "_ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi"

I get:

0000000000008510 T _ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi   /usr/local/src/unstable/myapplication/sources/modules/myplugin/myplugin.cpp:518

It is interesting to note that myplugin.cpp:518 actually points to the line containing the opening "{" of the function ICAPSection::process(CONNECTION_T *, Filebuf *, int).

Next we add 0x6af to the address revealed by the nm output above (0000000000008510), using the Linux shell command:

 printf '0x%x\n' $(( 0x0000000000008510 + 0x6af ))

And that results in 0x8bbf

This is the address of the faulty code within the library, and the actual source_file:line_no can be precisely determined with addr2line:

addr2line -Cif -e /opt/myapplication/modules/myplugin.so 0x8bbf

Which displays:

std::char_traits<char>::length(char const*)
/usr/include/c++/4.4/bits/char_traits.h:263
std::string::assign(char const*)
/usr/include/c++/4.4/bits/basic_string.h:970
std::string::operator=(char const*)
/usr/include/c++/4.4/bits/basic_string.h:514
??
/usr/local/src/unstable/myapplication/sources/modules/myplugin/myplugin.cpp:622

I am not too sure why the function name was not displayed here, but myplugin.cpp:622 was quite precisely where the fault was.
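
For anyone repeating this often, the steps above can be strung together roughly as follows. This is only a sketch: sym_plus_offset and resolve_frame are names invented here, and it assumes the symbol is present in the library's dynamic symbol table so that nm -D lists it.

```shell
# sym_plus_offset NM_ADDR OFFSET
# NM_ADDR is the hex address as printed by nm (no 0x prefix),
# OFFSET is the +0x... value from the backtrace frame.
sym_plus_offset() {
    printf '0x%x\n' $(( 0x$1 + $2 ))
}

# resolve_frame LIB MANGLED_SYMBOL OFFSET
# Looks the symbol up with nm, adds the offset, and asks addr2line
# for the function and source_file:line_no.
resolve_frame() {
    sym_addr=$(nm -D "$1" | awk -v s="$2" '$3 == s { print $1; exit }')
    [ -n "$sym_addr" ] || { echo "symbol not found: $2" >&2; return 1; }
    addr2line -Cif -e "$1" "$(sym_plus_offset "$sym_addr" "$3")"
}

# Possible usage with the frame from the backtrace:
#   resolve_frame /opt/myapplication/modules/myplugin.so \
#       _ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi 0x6af
```

Keeping the arithmetic in its own function makes it easy to check by hand: for the values above, 0000000000008510 plus 0x6af gives 0x8bbf.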

Dodona answered 19/9, 2013 at 15:28 Comment(0)
