WARNING:
I have to explicitly warn everyone who tries to do this. The general premise of having a shared library hook dlsym has several significant drawbacks. The biggest issue is that the original dlsym implementation of glibc internally uses stack unwinding techniques to find out from which loaded module the function was called. If the intercepting shared library then calls the original dlsym on behalf of the original application, this breaks lookups that use things like RTLD_NEXT, because the current module is now your hook library rather than the module that originally made the call.
It might be possible to implement this the correct way, but it requires a lot more work. Without having tried it, I think that by using dlinfo to get to the linked list of link maps, you could walk through all modules individually and do a separate dlsym for each one to get the RTLD_NEXT behavior right. You still need the address of your caller for that, which you might get via the old backtrace(3) family of functions.
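As a rough, untested-in-production sketch of that idea: the helper below walks the link map chain starting after the module that contains a given caller address and queries each module separately. Note that it uses dladdr1() with RTLD_DL_LINKMAP (a glibc extension) instead of dlinfo() to find the caller's link map, and that the name lookup_after_caller is made up for this example. It is only an approximation of RTLD_NEXT, because dlsym() on a handle also searches that module's dependencies. Inside a real hook you would have to call the original dlsym through a saved pointer, and the caller address could come from backtrace(3) or from the GCC/Clang builtin __builtin_return_address(0).

```c
#define _GNU_SOURCE /* for dladdr1() and RTLD_DL_LINKMAP */
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical helper: look up `name` in the modules that were loaded
 * after the module containing the address `caller`.
 * Returns NULL if the caller's module or the symbol cannot be found. */
void *lookup_after_caller(const void *caller, const char *name)
{
    Dl_info info;
    struct link_map *map = NULL;

    assert(caller != NULL && name != NULL);

    /* find the link map of the module that contains the caller */
    if (!dladdr1(caller, &info, (void **)&map, RTLD_DL_LINKMAP) || map == NULL)
        return NULL;

    /* walk all modules that come after the caller's module */
    for (map = map->l_next; map != NULL; map = map->l_next) {
        void *handle;
        void *sym;

        if (map->l_name == NULL || map->l_name[0] == '\0')
            continue; /* no usable file name for this entry */

        /* RTLD_NOLOAD: only get a handle if the module is already loaded */
        handle = dlopen(map->l_name, RTLD_LAZY | RTLD_NOLOAD);
        if (handle == NULL)
            continue;

        sym = dlsym(handle, name);
        dlclose(handle); /* drops only the reference taken above */
        if (sym != NULL)
            return sym;
    }
    return NULL;
}
```

Called from the main program with one of its own addresses as caller, this should find libc functions such as strlen, since libc comes later in the chain.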
MY OLD ANSWER FROM 2013
I stumbled across the same problem with hdante's answer as the commenter: calling __libc_dlsym() directly crashes with a segfault. After reading some glibc sources, I came up with the following hack as a workaround:
#include <string.h>

/* glibc-internal lookup function; the third argument is the caller's address */
extern void *_dl_sym(void *, const char *, void *);

extern void *dlsym(void *handle, const char *name)
{
    /* my target binary is even asking for dlsym() via dlsym()... */
    if (!strcmp(name, "dlsym"))
        return (void *)dlsym;
    return _dl_sym(handle, name, (void *)dlsym);
}
NOTE two things with this "solution":
- This code bypasses the locking which is done internally by (__libc_)dlsym(), so to make this thread-safe, you should add some locking.
- The third argument of _dl_sym() is the address of the caller. glibc seems to reconstruct this value by stack unwinding, but I just use the address of the function itself. The caller address is used internally to find the link map the caller is in, to get things like RTLD_NEXT right (and using NULL as the third argument will make the call fail with an error when using RTLD_NEXT). However, I have not looked at glibc's unwinding functionality, so I'm not 100% sure that the above code will do the right thing, and it may happen to work just by chance alone...
The solution presented so far has some significant drawbacks: _dl_sym() acts quite differently than the intended dlsym() in some situations. For example, trying to resolve a symbol which does not exist exits the program instead of just returning NULL. To work around that, one can use _dl_sym() just to get the pointer to the original dlsym() and use that for everything else (like in the "standard" LD_PRELOAD hook approach without hooking dlsym at all):
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <dlfcn.h>
#include <string.h>

extern void *_dl_sym(void *, const char *, void *);

extern void *dlsym(void *handle, const char *name)
{
    static void *(*real_dlsym)(void *, const char *) = NULL;

    /* look up the original dlsym() once and reuse it afterwards */
    if (real_dlsym == NULL)
        real_dlsym = _dl_sym(RTLD_NEXT, "dlsym", (void *)dlsym);
    /* my target binary is even asking for dlsym() via dlsym()... */
    if (!strcmp(name, "dlsym"))
        return (void *)dlsym;
    return real_dlsym(handle, name);
}
UPDATE FOR 2021 / glibc-2.34
Beginning with glibc 2.34, the function _dl_sym() is no longer publicly exported. Another approach I can suggest is to use dlvsym() instead, which is officially part of the glibc API and ABI. The only downside is that you now need the exact version to ask for the dlsym symbol. Fortunately, that version is also part of the glibc ABI; unfortunately, it varies per architecture. However, running grep 'GLIBC_.*\bdlsym\b' -r sysdeps in the root folder of the glibc sources will tell you what you need:
[...]
sysdeps/unix/sysv/linux/i386/libc.abilist:GLIBC_2.0 dlsym F
sysdeps/unix/sysv/linux/i386/libc.abilist:GLIBC_2.34 dlsym F
[...]
sysdeps/unix/sysv/linux/x86_64/64/libc.abilist:GLIBC_2.2.5 dlsym F
sysdeps/unix/sysv/linux/x86_64/64/libc.abilist:GLIBC_2.34 dlsym F
glibc 2.34 actually introduced new versions of this function, but the old versions are still kept around for backwards compatibility.
For x86_64, you could use:
real_dlsym=dlvsym(RTLD_NEXT, "dlsym", "GLIBC_2.2.5");
And if you want both the newest version and, potentially, the dlsym of another interceptor in the same process, you can use that function pointer to do an unversioned query again:
real_dlsym=real_dlsym(RTLD_NEXT, "dlsym");
If you actually need to hook both dlsym and dlvsym in your shared object, this approach of course won't work either.
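For the case where only dlsym is hooked, the dlvsym() lookup can be wrapped into a small helper that tries several version tags in turn, so that the same source works on more than one architecture. This is only a sketch: the names find_real_dlsym and resolve_with_real_dlsym are made up, the version list contains just the tags from the grep output above, and other architectures may need additional entries. The caching is deliberately simple and not thread-safe.

```c
#define _GNU_SOURCE /* for dlvsym() and RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

typedef void *(*dlsym_fn)(void *, const char *);

/* Version tags to try, taken from the abilist excerpt above.
 * Assumption: one of them matches the running glibc. */
static const char *const dlsym_versions[] = {
    "GLIBC_2.34",  /* listed for both i386 and x86_64 */
    "GLIBC_2.2.5", /* x86_64 */
    "GLIBC_2.0",   /* i386 */
};

/* Hypothetical helper: returns the next dlsym() after the calling
 * module, or NULL if no listed version exists. Not thread-safe. */
dlsym_fn find_real_dlsym(void)
{
    static dlsym_fn cached = NULL;
    size_t i;

    if (cached != NULL)
        return cached;

    for (i = 0; i < sizeof(dlsym_versions) / sizeof(dlsym_versions[0]); i++) {
        /* dlvsym() returns NULL if this version tag does not exist */
        void *sym = dlvsym(RTLD_NEXT, "dlsym", dlsym_versions[i]);
        if (sym != NULL) {
            cached = (dlsym_fn)sym;
            break;
        }
    }
    return cached;
}

/* Example use: resolve a symbol through the pointer found above. */
void *resolve_with_real_dlsym(void *handle, const char *name)
{
    dlsym_fn real_dlsym = find_real_dlsym();

    assert(name != NULL);
    return real_dlsym != NULL ? real_dlsym(handle, name) : NULL;
}
```

Inside the hooked dlsym() you would call find_real_dlsym() once and forward all other requests to the returned pointer, as in the 2013 code above.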
UPDATE: hooking both dlsym() and dlvsym() at the same time
Out of curiosity, I thought about some approach to hook both of the glibc symbol query methods, and I came up with a solution using an additional wrapper library which links to libdl. The idea is that the interceptor library can dynamically load this library at runtime using dlopen() with the RTLD_LOCAL | RTLD_DEEPBIND flags, which creates a separate linker scope for this object that also contains libdl, so that dlsym and dlvsym are resolved to the original functions and not to the ones in the interceptor library. The problem now is that our interceptor library cannot directly call any function inside the wrapper library, because we cannot use dlsym, which is our original problem.
However, the shared library can have an initialization function, which the linker will call before the dlopen() returns. We just need to pass some information from the initialization function of the wrapper library to the interceptor library. Since both are in the same process, we can use the environment block for that.
This is the code I came up with:
dlsym_wrapper.h:
#ifndef DLSYM_WRAPPER_H
#define DLSYM_WRAPPER_H
#define DLSYM_WRAPPER_ENVNAME "DLSYM_WRAPPER_ORIG_FPTR"
#define DLSYM_WRAPPER_NAME "dlsym_wrapper.so"
typedef void* (*DLSYM_PROC_T)(void*, const char*);
#endif
dlsym_wrapper.c, compiled to dlsym_wrapper.so:
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include "dlsym_wrapper.h"

/* runs when the library is loaded, before dlopen() returns */
__attribute__((constructor))
static void dlsym_wrapper_init(void)
{
    if (getenv(DLSYM_WRAPPER_ENVNAME) == NULL) {
        /* big enough to hold our pointer as hex string with "0x" prefix, plus a NUL-terminator */
        char buf[sizeof(DLSYM_PROC_T)*2 + 3];
        DLSYM_PROC_T dlsym_ptr = dlsym;
        int len = snprintf(buf, sizeof(buf), "%p", (void *)dlsym_ptr);

        /* snprintf returns a negative value on error and >= size on truncation */
        if (len > 0 && len < (int)sizeof(buf)) {
            if (setenv(DLSYM_WRAPPER_ENVNAME, buf, 1)) {
                // error, setenv failed ...
            }
        } else {
            // error, writing pointer hex string failed ...
        }
    } else {
        // error: environment variable already set ...
    }
}
And one function in the interceptor library to get the pointer to the original dlsym() (should be called only once, guarded by a mutex):
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include "dlsym_wrapper.h"

static void *dlsym_wrapper_get_dlsym(void)
{
    const char *dlsym_wrapper_name = DLSYM_WRAPPER_NAME;
    void *wrapper;
    const char *ptr_str;
    void *res = NULL;
    void *ptr = NULL;

    if (getenv(DLSYM_WRAPPER_ENVNAME)) {
        // error: already defined, shouldn't be...
    }
    wrapper = dlopen(dlsym_wrapper_name, RTLD_LAZY | RTLD_LOCAL | RTLD_DEEPBIND | RTLD_NOLOAD);
    if (wrapper) {
        // error: dlsym_wrapper.so already loaded ...
        // it is important that we load it by ourselves to a separate linker scope
    }
    wrapper = dlopen(dlsym_wrapper_name, RTLD_LAZY | RTLD_LOCAL | RTLD_DEEPBIND);
    if (!wrapper) {
        // error: dlsym_wrapper.so can't be loaded
        return NULL;
    }
    ptr_str = getenv(DLSYM_WRAPPER_ENVNAME);
    if (!ptr_str) {
        // error: dlsym_wrapper.so failed...
    } else if (sscanf(ptr_str, "%p", &ptr) == 1) {
        if (ptr) {
            // success!
            res = ptr;
        } else {
            // error: got invalid pointer ...
        }
    } else {
        // error: failed to parse pointer...
    }
    // this is a bit evil: close the wrapper. we can be sure
    // that libdl still is used, as this module uses it (dlopen)
    dlclose(wrapper);
    return res;
}
This of course assumes that dlsym_wrapper.so is in the library search path. However, you may prefer to just inject the interceptor library via LD_PRELOAD using a full path, without modifying LD_LIBRARY_PATH at all. To do so, you can add dladdr(dlsym_wrapper_get_dlsym,...) to find the path of the interceptor library itself, and use that for searching the wrapper library, too.
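A possible way to do the dladdr() part is sketched below. The helper name sibling_path is made up for this example. It assumes that dli_fname contains the path the module was loaded from, and it falls back to the current directory if that path has no directory component, which may or may not be what you want.

```c
#define _GNU_SOURCE /* for dladdr() */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<dir of the module containing addr>/<file>"
 * into out. Returns 0 on success, -1 on failure or truncation. */
int sibling_path(const void *addr, const char *file, char *out, size_t out_size)
{
    Dl_info info;
    const char *slash;
    int n;

    assert(addr != NULL && file != NULL && out != NULL);

    /* dladdr() returns 0 if addr is not inside any loaded module */
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return -1;

    slash = strrchr(info.dli_fname, '/');
    if (slash == NULL) {
        /* assumption: no directory part means "current directory" */
        n = snprintf(out, out_size, "./%s", file);
    } else {
        /* copy everything before the last '/', then append the file name */
        int dir_len = (int)(slash - info.dli_fname);
        n = snprintf(out, out_size, "%.*s/%s", dir_len, info.dli_fname, file);
    }
    return (n > 0 && (size_t)n < out_size) ? 0 : -1;
}
```

In the interceptor, you would pass (void *)dlsym_wrapper_get_dlsym as addr and DLSYM_WRAPPER_NAME as file, and hand the resulting path to dlopen() instead of the bare library name.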