Is there any way to compile additional code at runtime in C or C++?

Here is what I want to do:

  1. Run a program and initialize some data structures.
  2. Then compile additional code that can access/modify the existing data structures.
  3. Repeat step 2 as needed.

I want to be able to do this with both C and C++ using gcc (and eventually Java) on Unix-like systems (especially Linux and Mac OS X). The idea is basically to implement a read-eval-print loop for these languages that compiles expressions and statements as they are entered and uses them to modify existing data structures (something that is done all the time in scripting languages). I am writing this tool in Python, which generates the C/C++ files, but this should not be relevant.

I have explored doing this with shared libraries but learned that modifying shared libraries does not affect programs that are already running. I have also tried using shared memory but could not find a way to load a function onto the heap. I have also considered using assembly code but have not yet attempted to do so.

I would prefer not to use any compilers other than gcc unless there is absolutely no way to do it in gcc.

If anyone has any ideas or knows how to do this, any help will be appreciated.

Fletafletch answered 12/5, 2012 at 14:39 Comment(3)
You should look at llvm/clang. But a REPL for C or C++ sounds like quite a task. - Valvule
Something to do with this? - Equilibrium
I'd rather use a real scripting language, unless you can tell us what all this is for. - Mischief
S
14

I think you may be able to accomplish this using dynamic libraries and loading them at runtime (using dlopen and friends).

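/* "./" matters: a name without a slash is looked up in the library search path, not the current directory */
void * lib = dlopen("./mynewcode.so", RTLD_LAZY);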
if(lib) {
    void (*fn)(void) = dlsym(lib, "libfunc");

    if(fn) fn();
    dlclose(lib);
}

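You would obviously have to compile the new code as you go along, but if you keep replacing mynewcode.so I think this will work for you. Note that you need to dlclose the old handle before calling dlopen again; otherwise dlopen just hands back the copy that is already loaded.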
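
To sketch the points raised in the comments below (RTLD_NOW, dlerror() and the POSIX cast workaround), here is a minimal helper in C. The name call_int_fn and the int (*)(int) signature are assumptions made purely for illustration; your generated code would export whatever signature you choose. Passing NULL as the path asks dlopen for the running program itself, which is handy for trying the helper on a libc function. Link with -ldl if your libc requires it.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load `path` (NULL means the running program),
 * look up `symbol` as an int (*)(int), call it with `arg`, and store
 * the result in *out. Returns 0 on success, -1 on any failure. */
int call_int_fn(const char *path, const char *symbol, int arg, int *out)
{
    /* RTLD_NOW resolves every symbol up front, so a broken library
     * fails here instead of at the first call. */
    void *lib = dlopen(path, RTLD_NOW);
    if (lib == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();                              /* clear any stale error */
    int (*fn)(int);
    *(void **)(&fn) = dlsym(lib, symbol);   /* cast workaround from dlopen(3) */
    const char *err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(lib);
        return -1;
    }

    *out = fn(arg);
    dlclose(lib);                           /* drop our reference again */
    return 0;
}
```

For example, call_int_fn("./mynewcode.so", "libfunc", 1, &r) would load a freshly compiled library (assuming it exports a libfunc with that signature), and call_int_fn(NULL, "abs", -5, &r) should leave 5 in r on a typical dynamically linked build.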

Spoils answered 12/5, 2012 at 14:46 Comment(9)
Loading should be supported; I'm not sure that unloading is supported in all cases, however. - Gazetteer
@ChrisStratton: I'll confess I'm far from an expert on runtime loading, but the man page leads me to believe the symbols are unloaded at dlclose (specifically the RTLD_NODELETE flag). Take all that with a grain of salt though :). - Spoils
@ChrisStratton I don't know about 'all' cases, but in one project of mine I have never seen dlclose() fail to unload the symbols, unless of course RTLD_NODELETE is passed, in which case it does not unload them. - Jeffiejeffrey
N.B. cosine = (double (*)(double)) dlsym(handle, "cos"); .. According to the ISO C standard, casting between function pointers and 'void *', as done above, produces undefined results. POSIX.1-2003 and POSIX.1-2008 accepted this state of affairs and proposed the following workaround: *(void **) (&cosine) = dlsym(handle, "cos"); (1/2) - Jeffiejeffrey
This (clumsy) cast conforms with the ISO C standard and will avoid any compiler warnings. The 2013 Technical Corrigendum to POSIX.1-2008 (a.k.a. POSIX.1-2013) improved matters by requiring that conforming implementations support casting 'void *' to a function pointer. Nevertheless, some compilers (e.g., gcc with the '-pedantic' option) may complain about the cast used in this program. (From dlopen(3)) Just as a note about this issue :) (2/2) - Jeffiejeffrey
Oh, one more thing: in the past I noticed that when the CFLAGS of the shared libraries did not match those of the binary that loads them, the binary crashed (apart from the flags that make it a .so, obviously). You also have to make sure the binary can load them properly, maybe with -rdynamic (I am vague on this one). I don't remember the specifics as this was decades ago, but when I looked at it more deeply, matching the flags stopped the (random) crashing. And depending on the environment you might need different PIC compiler options. - Jeffiejeffrey
Make that one more thing now. As for RTLD_LAZY, just to clarify what it means: for functions (not variables), unresolved symbols are resolved only when the function is called. Depending on the use and importance of the functions this might not be desirable, as in it might be better to abort the program right away if the symbols cannot be resolved. In that case you can use RTLD_NOW. You can use dlerror() to report the error of the most recent related call. You of course need the linker flag -ldl for these functions. I think that's all I have now! - Jeffiejeffrey
Where is the library that has all these functions? - Nagari
@Nagari - From the man page: Link with -ldl. If you don't have libdl.so on your system, check your package manager or distribution's documentation. - Spoils
G
16

There is one simple solution:

  1. create your own library containing the functions you need
  2. load the created library
  3. execute functions from that library, passing your structures as function arguments

To use your structures, the library has to include the same header files as the host application.

structs.h:

struct S {
    int a,b;
};

main.cpp:

#include <iostream>
#include <fstream>
#include <dlfcn.h>
#include <stdlib.h>

#include "structs.h"

using namespace std;

int main ( int argc, char **argv ) {

    // create own program
    ofstream f ( "tmp.cpp" );
    f << "#include<stdlib.h>\n#include \"structs.h\"\n extern \"C\" void F(S &s) { s.a += s.a; s.b *= s.b; }\n";
    f.close();

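    // create library (-fPIC so the object can go into a shared library)
    system ( "/usr/bin/gcc -shared -fPIC tmp.cpp -o libtmp.so" );

    // load library
    void * fLib = dlopen ( "./libtmp.so", RTLD_LAZY );
    if ( !fLib ) {
        cerr << "Cannot open library: " << dlerror() << '\n';
        return 1;
    }

    if ( fLib ) {
        // dlsym returns void*, which C++ will not convert implicitly; F returns void
        void ( *fn ) ( S & ) = reinterpret_cast<void ( * ) ( S & )> ( dlsym ( fLib, "F" ) );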

        if ( fn ) {
            for(int i=0;i<11;i++) {
                S s;
                s.a = i;
                s.b = i;

                // use function
                fn(s);
                cout << s.a << " " << s.b << endl;
            }
        }
        dlclose ( fLib );
    }

    return 0;
}

output:

0 0
2 1
4 4
6 9
8 16
10 25
12 36
14 49
16 64
18 81
20 100

You can also create a self-modifying program that changes its own source code, recompiles itself, and then replaces its running image via execv, using shared memory to save resources.

Gillie answered 12/5, 2012 at 15:42 Comment(4)
Very useful info, but how would you go about including main.cpp in tmp.cpp? - Eggnog
Okay, I was going to edit the question to answer you, but there is no need :) You can't include main.cpp in tmp.cpp. If you want to share some data, you have to use headers (or write it directly into the file) and pass the structures into the dynamically created function :) - Gillie
Thanks! When using headers, however, the values of the shared variables are not the same, so I end up having to pass them to the function. I am wondering if there is some way to get around passing variables. - Eggnog
If you include a header in a .cpp file, it's the same as if you had written its contents into that file. So you end up with two instances of the variables (one in main.cpp and one in the dynamic library). If the header defining the variables were included in two objects (.cpp files) of the same library, you would get a multiple-definition error when building it. You have to use the "extern" keyword in the header to tell the compiler that these variables are not instantiated in the current object and will be resolved by the linker. You can make the variables 'static', in which case they are instantiated privately in every object, but then you won't share anything anyway. - Gillie
V
5

Even though LLVM is used today mostly for its optimization and backend roles in compilation, at its core it is the Low-Level Virtual Machine.

LLVM can JIT code, even though the return types may be quite opaque, so if you are ready to wrap your own code around it and don't worry too much about the casts that are going to take place, it may help you.

However, C and C++ are not really friendly to this kind of thing.

Venlo answered 12/5, 2012 at 15:1 Comment(0)
S
4

This can be done portably with OpenCL

OpenCL is a widely supported standard, mainly used for offloading calculations to specialized hardware, such as GPUs. However, it also works just fine on CPUs and actually performs run-time compilation of C99-like code as one of its core features (this is how the hardware portability is achieved). The newer versions (2.1+) also accept a large subset of C++14.

A basic example of such run-time compilation & execution might look something like this:

#ifdef __APPLE__
#include<OpenCL/opencl.h>
#else
#include<CL/cl.h>
#endif
#include<stdlib.h>
int main(int argc,char**argv){//assumes source code strings are in argv
    cl_int e = 0;//error status indicator
    cl_platform_id platform = 0;
    cl_device_id device = 0;
    e=clGetPlatformIDs(1,&platform,0);                                      if(e)exit(e);
    e=clGetDeviceIDs(platform,CL_DEVICE_TYPE_ALL,1,&device,0);              if(e)exit(e);
    cl_context context = clCreateContext(0,1,&device,0,0,&e);               if(e)exit(e);
    cl_command_queue queue = clCreateCommandQueue(context,device,0,&e);     if(e)exit(e);
    //the lines below could be done in a loop, assuming you release each program & kernel
    //argv[0] is the program's own name, so the source strings start at argv[1]
    cl_program program = clCreateProgramWithSource(context,argc-1,(const char**)argv+1,0,&e);
    cl_kernel kernel = 0;                                                   if(e)exit(e);
    e=clBuildProgram(program,1,&device,0,0,0);                              if(e)exit(e);
    e=clCreateKernelsInProgram(program,1,&kernel,0);                        if(e)exit(e);
    e=clSetKernelArg(kernel,0,sizeof(int),&argc);                           if(e)exit(e);
    e=clEnqueueTask(queue,kernel,0,0,0);                                    if(e)exit(e);
    //realistically, you'd also need some buffer operations around here to do useful work
    //wait for the kernel to finish before main returns
    e=clFinish(queue);                                                      if(e)exit(e);
}
School answered 11/8, 2017 at 18:12 Comment(0)
H
3

Yes - you can do this with Runtime Compiled C++ (or take a look at the RCC++ blog and videos), or one of its alternatives.

You could also check out this talk I gave along with Matthew Jack from Kythera AI on RCC++ at the Develop Game Conference; there is a transcript of the RCC++ talk on my devblog.

Halliday answered 13/2, 2015 at 9:55 Comment(0)
G
2

If nothing else works - in particular, if un-loading a shared library ends up not being supported on your runtime platform, you could do it the hard way.

1) use system() or whatever to execute gcc or make or whatever to build the code

2) either link it as a flat binary or parse whatever format (elf?) the linker outputs on your platform yourself

3) get yourself some executable pages, either by mmap()'ing an executable file or by doing an anonymous mmap with the execute bit set and copying/unpacking your code there (not all platforms care about that bit, but let's assume you have one that does)

4) flush any data and instruction caches (since consistency between the two is typically not guaranteed)

5) call it via a function pointer or whatever
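
Steps 3 to 5 can be sketched roughly as follows. This is only an illustration under several assumptions: the machine code is hand-written for "return 42" on x86-64 and AArch64 rather than produced by gcc, the name run_generated_code is made up, and the page is mapped writable first and switched to executable afterwards because many systems refuse pages that are writable and executable at once. Some platforms restrict this further and need extra flags or entitlements.

```c
#define _DEFAULT_SOURCE             /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hand-assembled machine code for "return 42;" (illustrative only). */
#if defined(__x86_64__)
static const unsigned char code[] = {
    0xB8, 0x2A, 0x00, 0x00, 0x00,   /* mov eax, 42 */
    0xC3                            /* ret         */
};
#define HAVE_CODE 1
#elif defined(__aarch64__)
static const unsigned char code[] = {
    0x40, 0x05, 0x80, 0x52,         /* mov w0, #42 */
    0xC0, 0x03, 0x5F, 0xD6          /* ret         */
};
#define HAVE_CODE 1
#else
static const unsigned char code[] = { 0 };
#define HAVE_CODE 0
#endif

/* Returns the value produced by the generated code, or -1 on failure
 * or on an architecture this sketch does not cover. */
int run_generated_code(void)
{
    if (!HAVE_CODE)
        return -1;

    size_t len = 4096;
    /* step 3: get a writable anonymous mapping and copy the code in */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);

    /* ...then make it executable instead of writable */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* step 4: flush the instruction cache (GCC/Clang builtin) */
    __builtin___clear_cache((char *)mem, (char *)mem + sizeof code);

    /* step 5: call it via a function pointer */
    int (*fn)(void);
    *(void **)(&fn) = mem;
    int result = fn();

    munmap(mem, len);
    return result;
}
```

Real code coming out of gcc would additionally need its relocations resolved before it could be copied in like this, which is the part that makes this "the hard way".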

Of course there's another option too - depending on the level of interaction you need, you could build a separate program and either launch it and wait for the result, or fork off and launch it and talk to it by pipes or sockets. If this would meet your needs, it would be a lot less tricky.
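
A rough sketch of the launch-and-wait variant, using popen() to run a command through the shell and read its standard output. The helper name run_and_capture is hypothetical, and the command string stands in for whatever program you have just built.

```c
#define _POSIX_C_SOURCE 200809L     /* for popen/pclose */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run `cmd` through the shell, copy up to cap-1
 * bytes of its stdout into buf (always NUL-terminated), and return the
 * status reported by pclose(), or -1 if the command could not start. */
int run_and_capture(const char *cmd, char *buf, size_t cap)
{
    if (cap == 0)
        return -1;

    FILE *p = popen(cmd, "r");
    if (p == NULL)
        return -1;

    size_t n = fread(buf, 1, cap - 1, p);   /* output beyond cap-1 bytes is dropped */
    buf[n] = '\0';

    return pclose(p);                       /* waits for the child to exit */
}
```

You would call it as run_and_capture("./generated_program", out, sizeof out) after each compile. The obvious limitation is that the child starts from scratch every time, so any state has to be passed in and out explicitly.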

Gazetteer answered 12/5, 2012 at 15:17 Comment(0)
S
0

You could consider using EasyJit:

This is a project started a few years back by a compiler engineer named Juan Manuel Martinez Caamano. The idea is that you have an LLVM-backed library which lets you pass pointers to functions in your code and "JIT-(re)compile" them while fixing the values of some of their parameters. Behind the scenes, the original compilation actually saves some bitcode/bytecode for each relevant function, which the LLVM library can use when so triggered.

CppCon 2018 talk about EasyJIT

Spittoon answered 29/5 at 10:43 Comment(0)
