Dynamic Loading Without extern "C"

I'd like to use libdl to dynamically load arbitrary C++ code, not just C-style interfaces. The problem is identifying name-mangled symbols at runtime.

As described here, one solution is to remove name mangling by using extern "C".

http://www.tldp.org/HOWTO/C++-dlopen/theproblem.html

This solution has the drawback of limiting dynamically loaded resources to C style interfaces. Dynamically loaded functions cannot, for instance, be overloaded functions.

What is a good way to overcome this limitation?

One possible solution would be a tool that computes the mangled names from the library source code, with an accompanying function that returns those names when the library needs to be linked. Does LLVM provide tools for this?

Maybe a clumsy solution would be a function that takes a function signature, creates dummy code with a function that has the signature, pipes into the compiler that was used with a flag for generating assembly, parses the output to retrieve the mangled name, and returns the mangled name as a string. The string could then be passed to dlsym().

To keep the problem concrete, here are two example programs that illustrate something the extern "C" solution can't dynamically load without modifying library code. The first dynamically links a library in traditional C++ fashion. The second uses dlopen. Linking an overloaded function in the first program is simple. There's no simple way to link the overloaded function in the second program.

Program 1: Load-Time Dynamic Linking

main.cpp

// forward declarations of functions that will be linked
void say(int);
void say(float);

int main() {
    int myint = 3;
    say(myint);
    float myfloat = 5.0f;
    say(myfloat);
}

say.cpp

#include <iostream>

//extern "C" function signatures would collide

//extern "C" void say(int a) {
void say(int a) {
    std::cout << "The int value is " << a << ".\n";
}

//extern "C" void say(float a) {
void say(float r) {
    std::cout << "The float value is " << r << ".\n";
}

output

$ ./main
The int value is 3.
The float value is 5.

Program 2: Runtime Dynamic Linking

main_with_dl.cpp

#include <iostream>
#include <dlfcn.h>

int main() {
    // open library
    void* handle = dlopen("./say_externC.so", RTLD_LAZY);
    if (!handle) {
        std::cerr << "dlopen error: " << dlerror() << '\n';
        return 1;
    }

    // load symbol
    typedef void (*say_t)(int);

    // clear errors, find symbol, check errors
    dlerror();
    say_t say = reinterpret_cast<say_t>(dlsym(handle, "say"));
    const char *dlsym_error = dlerror();
    if (dlsym_error) {
        std::cerr << "dlsym error: " << dlsym_error << '\n';
        dlclose(handle);
        return 1;
    }

    // use function
    int myint = 3;
    say(myint);
    // can't load in void say(float)
    // float myfloat = 5.0f;
    // say(myfloat);

    // close library
    dlclose(handle);
}

output

$ ./main_with_dl
The int value is 3.

Compiling

Makefile

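CXX = g++

all: main main_with_dl say_externC.so

# Recipe lines must start with a tab character.
main: main.cpp say.so
	$(CXX) -o $@ $^

# -ldl is needed on systems where libdl is separate from libc.
main_with_dl: main_with_dl.cpp
	$(CXX) -o $@ $< -ldl

# Shared objects should be built as position-independent code.
%.so : %.cpp
	$(CXX) -shared -fPIC -o $@ $<

.PHONY: clean
clean:
	rm -f main main_with_dl say.so say_externC.so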
Lianaliane answered 6/6, 2014 at 18:30 Comment(18)
I was researching how you can pass the decorated name to dlsym, but it occurs to me that if you're trying to call an overloaded function and a conversion will be necessary, then that's not enough. Since there's no way (yet) in C++ to determine the signature of the overloaded function that would be called after conversions, the only option I know of is to make wrappers in Program 2 that resolve the overloads and delegate to Program 1. Unless... typeid? - Coop
Losing automatic conversion is not a tragedy and, importantly, that can be worked around in client code. - Lianaliane
But it means you must have a wrapper with the exact same signature in the client code, which means overloads can't be added later to the library. Is that a problem for you? - Coop
You understand that even with a solution such as you propose, you still couldn't use objects in such an environment, right? - Replenish
@MichaelKohne You mean that once you get function pointers from dlsym() you still need to know how to call them and what to expect as the return value? - Lianaliane
@MooingDuck I see what you're saying. That's not a huge problem. If the client code has to have function pointer names that indicate the signature, that's fine. - Lianaliane
Just noticed: "creates dummy code with a function that has the signature, pipes into the compiler that was used with a flag for generating assembly, parses the output to retrieve the mangled name, and returns the mangled name as a string." This is the sort of thing that explains the awesomeness that is Visual Studio lib/dll pairs. - Coop
@MooingDuck Wait, I don't see what you're saying. Why couldn't other overloads be added to the library? - Lianaliane
@MooingDuck Ha, what is it that Visual Studio lets you do? - Lianaliane
@Praxeolitic: Because each "wrapper" in the client would have a 1:1 relation with one in the library. If you add one in the library, there's no wrapper in the client that calls it, so the client would never use it. - Coop
@Praxeolitic: When Visual Studio generates a dynamic library, it also generates a link library. You link your client code with the link library, and the link library knows the signatures and mangled names and such, and automagically resolves all of this. I didn't even realize that this stuff was complicated until I did research for this question; it just works in Windows. Actually, I just realized that none of that applies to DLLs whose names you don't know, like plugins. Never mind. - Coop
@MooingDuck Got it. Well, if the code dynamically loading the library was itself library code, that might be a concern, but let's not get too crazy. If something gets added to the library, assume a human may or may not choose to use that something in the client code. - Lianaliane
@MooingDuck That Visual Studio solution is pretty good. I might try implementing that. - Lianaliane
Let us continue this discussion in chat. - Coop
@Lianaliane - Compilers have an enormous amount of leeway in how they lay out objects in memory, how they construct vtables, where they put the vtables, etc. All of this can change not only from compiler to compiler, but between compiler versions. Add in the various optimizations and structure packing options, and I don't see how you can possibly make this work for objects. There's just no sensible way for one compiler to understand another's object layout. - Replenish
@MichaelKohne Good point. That is indeed a limitation, but virtual base classes that provide known interfaces could go pretty far for interacting with objects that were unknown at link time. - Lianaliane
1. Run nm on your dynamic library. 2. Run c++filt on the output of nm. 3. Voila! You have a table that maps between the mangled and demangled forms of names. 4. Use it. - Professionalize
@Lianaliane - Sadly, I don't think that even really works. If you're dealing with different compilers, you don't even have a guarantee of what order they build their vtables in, so even pure virtual bases are no good. - Replenish

Thanks to Mooing Duck I was able to come up with a solution using clang and inspired by Visual Studio.

The key is the __FUNCDNAME__ macro provided by Visual Studio and clang, which resolves to the mangled (decorated) name of the enclosing function. By defining wrapper functions in the client with the same signatures as the functions we want to load dynamically, we can get __FUNCDNAME__ to produce exactly the mangled names that need to be passed to dlsym().

Here's the new version of program 2 that can call both void say(int) and void say(float).

EDIT Mooing Duck dropped more knowledge on me. Here's a version of main_with_dl.cpp that works with say.cpp in the question.

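#include <cstdlib>
#include <iostream>
#include <dlfcn.h>

// handle of the dynamically loaded library, set in main()
void* handle;

// look up a (mangled) symbol name in the library and cast it to func_sig
template<class func_sig> func_sig get_func(const char* signature)
{
    // clear errors, find symbol, check errors
    dlerror();
    func_sig func = reinterpret_cast<func_sig>(dlsym(handle, signature));
    const char *dlsym_error = dlerror();
    if (dlsym_error) {
        std::cerr << "dlsym error: " << dlsym_error << '\n';
        dlclose(handle);
        std::exit(1);
    }
    return func;
}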

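// wrapper with the same signature as the library function, so that
// __FUNCDNAME__ expands to the mangled name of void say(int)
void say(int a) {
    typedef void(*func_sig)(int);
    // resolved once, on the first call
    static func_sig func = get_func<func_sig>(__FUNCDNAME__);
    return func(a);
}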

void say(float a) {
    typedef void(*func_sig)(float);
    static func_sig func = get_func<func_sig>(__FUNCDNAME__);
    return func(a);
}

int main() {
    // open library
    //void* handle = dlopen("./say_externC.so", RTLD_LAZY);
    handle = dlopen("./say.so", RTLD_LAZY);
    if (!handle) {
        std::cerr << "dlopen error: " << dlerror() << '\n';
        return 1;
    }

    // use function
    int myint = 3;
    say(myint);
    float myfloat = 5.0f;
    say(myfloat);

    // close library
    dlclose(handle);
}

http://coliru.stacked-crooked.com/a/7249cc6c82ceab00

The code must be compiled using clang++ with the -fms-extensions flag for __FUNCDNAME__ to work.

Lianaliane answered 6/6, 2014 at 20:9 Comment(3)
coliru.stacked-crooked.com/a/62dffb457ca3eb6a has less code duplication and fewer calls to dlerror() after the symbol has already been resolved once. - Coop
The links (both in the comment, @MooingDuck, and in the answer) do not work anymore. - Downwards
They work for me. I've inlined the code from the answer, for future resilience. - Coop
