Compiling part of a C++ program for GPU

Asked 22/5, 2014 at 13:14 Answered 9/6, 2014 at 8:28

Is it possible to compile (C++) code for the GPU with nvcc into a shared object (.so file) and load it dynamically from a C++ program (in this case, CERN's ROOT, which is essentially a C++ interpreter, "CINT")?

A simple example that I would like to run is:

#include <cstdio>   // printf
#include <cstdlib>  // exit

extern "C"
void TestCompiled() {
  printf("test\n");
  exit(0);
}

This code was compiled with nvcc --compiler-options '-fPIC' -o TestCompiled_C.so --shared TestCompiled.cu. Loading the shared object into ROOT with:

{ // Test.C program
  int error, check;
  check = gROOT->LoadMacro("TestCompiled_C.so", &error);
  cout << "check " << check << " " << " error: " << error << endl;
  TestCompiled();  // run macro
  exit(0); 
}

loads the library OK, but does not find TestCompiled():

$ root -b -l Test.C
root [0] 
Processing Test.C...
check 0  error: 0
Error: Function Hello() is not defined in current scope  Test.C:11:
*** Interpreter error recovered ***

Doing the same by compiling the first test script with ROOT (without the extern line, compiling with root TestCompiled.C++) works… What can I try in order to make the C++ program find the test function when nvcc does the compilation?

Loraine answered 22/5, 2014 at 13:14 Comment(7)

Have you looked at "G++ NVCC How to compile CUDA code then Link it to a G++ C++ project"? – Featly 29/5, 2014 at 2:49

If the C++ program cannot find the test function, that suggests a library path problem. I don't use nvcc, but generally you either have to export the library path or pass -Wl,-rpath=/path/to/your/lib for the library and its functions to be found. Run ldd -v executablename and see whether the C++ executable has any problem seeing the library. I'm sure you are using it already, but the CUDA Toolkit has exhaustive documentation. – Featly 29/5, 2014 at 3:53

@DavidC.Rankin Thank you, but the library is found (loaded); it's just that the function inside it is not found (maybe it is mangled in a different way?)… My student will have a look at the links you gave. – Loraine 29/5, 2014 at 8:24

Is the header file containing the function declaration included in the C++ source file you compile with nvcc? – Featly 31/5, 2014 at 0:19

No header file was included. We'll try this. Note, however, that everything works well if ROOT itself compiles the first (TestCompiled) program: it does not need any header file to load the compiled file and run its TestCompiled() function. – Loraine 1/6, 2014 at 15:4

@DavidC.Rankin: Including a header file appeared to be part of the solution, indeed. There are details at root.cern.ch/phpBB3/viewtopic.php?f=3&t=18147. – Loraine 5/6, 2014 at 1:11

Glad you were pointed in the right direction. Sometimes it is just a forest-for-the-trees issue. (happens to me all the time :) Especially when you are working to marry multiple compilers and libraries together... – Featly 9/6, 2014 at 2:33

I'm copying, for reference, the salient points of the answer from the RootTalk forum that solved the problem:

A key point is that ROOT's C/C++ interpreter (CINT) requires a "CINT dictionary" for the externally compiled function. (There is no problem when compiling through ROOT, because ACLiC creates this dictionary when it pre-compiles the macro with root TestCompiled.C++.)

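So, an interface header, TestCompiled.h, must be created (it is loaded further down as TestCompiled.h++; the trailing ++ tells ROOT to compile it with ACLiC, which generates the dictionary):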

#ifdef __cplusplus
extern "C" {
#endif

  void TestCompiled(void);

#ifdef __cplusplus
} /* end of extern "C" */
#endif

The interface must then be loaded inside ROOT along with the shared object:

{ // Test.C ROOT/CINT unnamed macro (interpreted)
  Int_t check, error;
  check = gROOT->LoadMacro("TestCompiled_C.so", &error);
  std::cout << "_C.so check " << check << " error " << error << std::endl;
  check = gROOT->LoadMacro("TestCompiled.h++", &error);
  std::cout << "_h.so check " << check << " error " << error << std::endl;
  TestCompiled(); // execute the compiled function
}

ROOT can now use the externally compiled program: root -b -l -n -q Test.C works.

This can be tested with, e.g., g++ on the following TestCompiled.C:

#include <cstdio>
extern "C" void TestCompiled(void) { printf("test\n"); }

compiled with

g++ -fPIC -shared -o TestCompiled_C.so TestCompiled.C

Loraine answered 9/6, 2014 at 8:28 Comment(0)

I am assuming that the shared object file being output is like any other shared library, such as one created with GCC using the -shared option. In that case, to load the object dynamically, you need to call dlopen to get a handle to the shared object. You can then call dlsym to look up a symbol in it.

void *object_handle = dlopen("TestCompiled_C.so", RTLD_NOW);
if (object_handle == NULL)
{
  printf("%s\n", dlerror());
  // Exit or return error code
}
void *test_compiled_ptr = dlsym(object_handle, "TestCompiled");
if (!test_compiled_ptr)
{
  printf("%s\n", dlerror());
  // Exit or return error code
}

void (*test_compiled)() = (void (*)()) test_compiled_ptr;
test_compiled();

You will need to include dlfcn.h and link with -ldl when you compile.
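
For reference, the same dlopen/dlsym sequence can be wrapped so that the handle is always closed and failures surface as exceptions. This is only a sketch under stated assumptions: the SharedLibrary class and its function() member are hypothetical names introduced here for illustration, not part of ROOT, CUDA, or the answer above, and a POSIX system providing <dlfcn.h> is assumed.

```cpp
#include <dlfcn.h>    // dlopen, dlsym, dlclose, dlerror (POSIX)
#include <stdexcept>  // std::runtime_error
#include <string>

// Hypothetical helper (an assumption, not from the original answer):
// owns a dlopen() handle and releases it with dlclose() on destruction.
class SharedLibrary {
public:
    explicit SharedLibrary(const std::string& path)
        : handle_(dlopen(path.c_str(), RTLD_NOW)) {
        if (handle_ == nullptr) {
            const char* msg = dlerror();
            throw std::runtime_error(msg ? msg : "dlopen failed");
        }
    }

    ~SharedLibrary() {
        if (handle_ != nullptr) dlclose(handle_);
    }

    // Non-copyable: the handle must be closed exactly once.
    SharedLibrary(const SharedLibrary&) = delete;
    SharedLibrary& operator=(const SharedLibrary&) = delete;

    // Looks up the symbol `name` and returns it cast to the
    // function-pointer type Fn. The caller must supply the right type.
    template <typename Fn>
    Fn function(const std::string& name) const {
        dlerror();  // clear any stale error state before the lookup
        void* symbol = dlsym(handle_, name.c_str());
        const char* err = dlerror();
        if (err != nullptr) throw std::runtime_error(err);
        if (symbol == nullptr) throw std::runtime_error("symbol is null: " + name);
        return reinterpret_cast<Fn>(symbol);
    }

private:
    void* handle_;
};

// Possible use with the library from the question (inside some function):
//   SharedLibrary lib("./TestCompiled_C.so");
//   auto test_compiled = lib.function<void (*)()>("TestCompiled");
//   test_compiled();
```

Two details are worth keeping in mind. dlsym looks the function up by its symbol name, so the lookup of "TestCompiled" relies on the extern "C" declaration keeping that name unmangled. Also, when the path given to dlopen contains no slash, the dynamic linker searches its usual library directories (for example those in LD_LIBRARY_PATH) rather than the current directory, which is why the usage comment spells the path as ./TestCompiled_C.so.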

The difference between this and what you are doing now is that you are loading the library statically rather than dynamically. Even though shared objects are "dynamically linked libraries," as they are called in the Windows world, doing it the way you are now loads all of the symbols in the object when the program is launched. To load specific symbols dynamically at run time, you need to do it this way.

Byyourleave answered 2/6, 2014 at 3:3 Comment(0)