Why does this dynamic library loading code work with gcc?

Background:

I've found myself with the unenviable task of porting a C++ GNU/Linux application to Windows. One of the things this application does is search for shared libraries on specific paths and then load classes out of them dynamically using the POSIX dlopen() and dlsym() calls. We have a very good reason for loading this way that I will not go into here.

The Problem:

To look up symbols generated by a C++ compiler with dlsym() or GetProcAddress() by their plain names, the symbols must be exported unmangled by declaring them in an extern "C" linkage block. For example:

#include <list>
#include <string>

using std::list;
using std::string;

extern "C" {

    list<string> get_list()
    {
        list<string> myList;
        myList.push_back("list object");
        return myList;
    }

}

This code is perfectly valid C++ and compiles and runs with numerous compilers on both Linux and Windows. MSVC, however, does not accept it because "the return type is not valid C". The workaround we've come up with is to change the function to return a pointer to the list instead of the list object:

#include <list>
#include <string>

using std::list;
using std::string;

extern "C" {

    list<string>* get_list()
    {
        list<string>* myList = new list<string>();
        myList->push_back("ptr to list");
        return myList;
    }

}

I've been trying to find an optimal solution for the GNU/Linux loader that will either work with both the new function and the old legacy function prototype, or at least detect when the deprecated function is encountered and issue a warning. It would be unseemly for our users if the code just segfaulted when they tried to use an old library. My original idea was to install a SIGSEGV signal handler for the duration of the call to get_list (I know this is icky, and I'm open to better ideas). To confirm that loading an old library would segfault where I thought it would, I ran a library using the old function prototype (returning a list object) through the new loading code (which expects a pointer to a list). To my surprise, it just worked. My question is: why?
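One way to detect an old library without a signal handler is sketched below. It rests on an assumption the question does not state: that new-style libraries can export their entry point under a different symbol name. The name get_list_ptr, the types PluginKind and ResolvedPlugin, and the function resolve_get_list are all hypothetical. The lookup parameter stands in for dlsym() bound to one library handle, so the decision logic can be exercised without a real shared library.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <list>
#include <string>

// Signature of the new-style entry point (returns a heap-allocated list).
typedef std::list<std::string>* (*GetListPtrFn)();

enum class PluginKind { NewStyle, Legacy, Missing };

struct ResolvedPlugin {
    PluginKind kind;
    GetListPtrFn get_list;  // non-null only for PluginKind::NewStyle
};

// `lookup` stands in for dlsym() bound to one library handle, e.g.
//   [handle](const char* name) { return dlsym(handle, name); }
// "get_list_ptr" is a hypothetical name for the new entry point.
ResolvedPlugin resolve_get_list(const std::function<void*(const char*)>& lookup)
{
    // An empty std::function would throw std::bad_function_call when invoked.
    assert(static_cast<bool>(lookup));

    if (void* sym = lookup("get_list_ptr")) {
        return ResolvedPlugin{PluginKind::NewStyle,
                              reinterpret_cast<GetListPtrFn>(sym)};
    }
    if (lookup("get_list") != nullptr) {
        // Old prototype found: warn and refuse to call it through the wrong type.
        std::fprintf(stderr,
                     "warning: library exports deprecated get_list(); "
                     "please rebuild it\n");
        return ResolvedPlugin{PluginKind::Legacy, nullptr};
    }
    return ResolvedPlugin{PluginKind::Missing, nullptr};
}
```

Because the old and new prototypes never share a symbol name under this scheme, the loader never has to call a function through a mismatched pointer type to find out which kind of library it has.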

The loading code below works with both function prototypes listed above. I've confirmed that it works on Fedora 12, RedHat 5.5, and RedHawk 5.1 using gcc versions 4.1.2 and 4.4.4. Compile the libraries using g++ with -shared and -fPIC, and link the executable against dl (-ldl).

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <list>
#include <string>

using std::list;
using std::string;

int main(int argc, char **argv)
{
    void *handle;
    list<string>* (*getList)(void);
    char *error;

    handle = dlopen("library path", RTLD_LAZY);
    if (!handle)
    {
        fprintf(stderr, "%s\n", dlerror());
        exit(EXIT_FAILURE);
    }

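    /* Clear any stale error so the dlerror() check after dlsym() is reliable. */
    dlerror();

    /* Assign through void** to avoid a direct cast from void* to a function pointer. */
    *(void **) (&getList) = dlsym(handle, "get_list");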

    if ((error = dlerror()) != NULL)
    {
        fprintf(stderr, "%s\n", error);
        exit(EXIT_FAILURE);
    }

    list<string>* libList = (*getList)();

    for(list<string>::iterator iter = libList->begin();
          iter != libList->end(); iter++)
    {
        printf("\t%s\n", iter->c_str());
    }

    dlclose(handle);

    exit(EXIT_SUCCESS);
}
Nostradamus answered 23/2, 2011 at 21:20 Comment(5)
Because you got lucky. I suspect if you tried this sort of thing with a more complicated program, you'd start seeing effects of a smashed stack or similar. - Diphthongize
The code I've posted is simplified. The actual application is around 100k lines of code and I've run some pretty extensive test cases that all seem to work. I agree though, this shouldn't work unless there is some quirk with GCC in this instance. - Nostradamus
I am not sure that "they must be unmangled" is true. If you ask dlsym() for the mangled name, will it not find it correctly? - Passementerie
It will, but you have to know the mangled name. Name mangling algorithms are compiler specific, which is another can of worms altogether. - Nostradamus
I would recommend switching over to the use of mangled names. Yes, that means you have to know the mangled names, but you can get the compiler to tell you what they are by compiling a big dummy function that calls each of the functions of interest, and then inspecting the undefined symbol list for the object file you get. And you won't have this MSVC problem. - Miscellanea

As aschepler says, it's because you got lucky.

As it turns out, the ABIs that gcc (and most other compilers) use on both x86 and x64 return 'large' structs (too big to fit in a register) by passing an extra 'hidden' pointer argument to the function. The function uses that pointer as the space in which to store the return value and then returns the pointer itself. The same happens for C++ class types with a non-trivial copy constructor or destructor, such as std::list, regardless of their size. So a function of the form

struct foo func(...)

is roughly equivalent to

struct foo *func(struct foo *, ...)

where the caller is expected to allocate space for a 'foo' (probably on the stack) and pass in a pointer to it.

So if you have a function that expects to be called this way (expecting to return a struct) and instead call it via a function pointer that returns a pointer, it MAY appear to work. If the garbage bits it gets for the extra argument (whatever the caller happened to leave in that register or stack slot) point to somewhere writable, the called function will happily write its return value there and then return that pointer, so the calling code gets back something that looks like a valid pointer to the struct it is expecting. The code may superficially appear to work, but it's probably clobbering a random bit of memory that may be important later.
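The lowering described above can be approximated by hand. The following is a minimal sketch, not what the compiler literally emits: get_list_lowered and lowered_call_returns_callers_storage are hypothetical names, and the hidden pointer is written as an ordinary void* parameter. It shows a correct caller supplying the storage, which is exactly the step that goes missing when the function is called through the wrong pointer type.

```cpp
#include <cassert>
#include <list>
#include <new>
#include <string>

typedef std::list<std::string> StringList;

// What the library author writes: return by value.
StringList get_list_by_value()
{
    StringList myList;
    myList.push_back("list object");
    return myList;
}

// Hand-written approximation of the lowered form: the caller passes the
// address of uninitialized storage, the callee constructs the result there
// and returns that same address.
StringList* get_list_lowered(void* result_storage)
{
    StringList* result = new (result_storage) StringList();
    result->push_back("list object");
    return result;
}

// A correct caller: it provides suitably aligned storage for the result
// and destroys the object once it is done with it.
bool lowered_call_returns_callers_storage()
{
    alignas(StringList) unsigned char storage[sizeof(StringList)];
    StringList* returned = get_list_lowered(storage);

    bool same_address =
        (static_cast<void*>(returned) == static_cast<void*>(storage));
    bool same_contents = (*returned == get_list_by_value());

    returned->~StringList();  // the caller owns the object's lifetime
    return same_address && same_contents;
}
```

In the mismatched call from the question, nothing plays the role of storage: the callee constructs the list at whatever address it finds in the hidden-argument slot and hands that address back.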

Davison answered 24/2, 2011 at 1:13 Comment(0)
