Why can the value of the symbol returned by dlsym() be null?

Asked 18/12, 2012 at 21:40 Answered 3/12, 2018 at 9:24

In Linux. Per the dlsym(3) Linux man page,

    Since the value of the symbol could actually be NULL
    (so that a NULL return from dlsym() need not indicate an error),

Why is this? When can a symbol (for a function, specifically) actually be NULL? I am reviewing code and found a piece that calls dlerror() to clear any pending error, then dlsym(), then dlerror() again to check for errors. But it does not check whether the resulting function pointer is NULL before calling it:

dlerror();
a_func_name = ...dlsym(...);
if (dlerror()) goto end;
a_func_name(...); // Never checked if a_func_name == NULL;

I am just a reviewer, so I don't have the option to simply add the check. Perhaps the author knows NULL can never be returned. My job is to challenge that, but I don't know what could make dlsym() return a valid NULL, so I can't check whether such a condition could be met in this code's context. I have not found the right thing to read with Google; a pointer to good documentation would be enough, though an explicit explanation would be great.

Average answered 18/12, 2012 at 21:40 Comment(1)

You can define, in assembler or using GCC specific tricks, a given symbol to be at address 0 and you could dlsym that symbol. – Semmes 19/12, 2012 at 6:12

Well, if it's returned with no errors, then the pointer is valid, and NULL is about as illegal as any random pointer from the shared object, such as the wrong function, the wrong data, or whatever.

Erny answered 18/12, 2012 at 21:45 Comment(2)

This would make sense if the return value was the value of the shared variable (or function). But it's supposed to be the address, isn't it (or does that depend on flags)? Well, presumably it's actually reading a value out of a table of addresses, and the binary could be edited to have zeros (or any invalid pointer as you said) in that table. – Goerke 18/12, 2012 at 22:25

Well, I'm not sure, but couldn't an exported symbol have an absolute address? – Erny 18/12, 2012 at 22:26

I know of one particular case where the symbol value returned by dlsym() can be NULL, which is when using GNU indirection functions (IFUNCs). However, there are presumably other cases, since the text in the dlsym(3) manual page pre-dates the invention of IFUNCs.

Here's an example using IFUNCs. First, a file that will be used to create a shared library:

$ cat foo.c 
/* foo.c */

#include <stdio.h>

/* This is a 'GNU indirect function' (IFUNC) that will be called by
   dlsym() to resolve the symbol "foo" to an address. Typically, such
   a function would return the address of an actual function, but it
   can also just return NULL.  For some background on IFUNCs, see
   https://willnewton.name/uncategorized/using-gnu-indirect-functions/ */

asm (".type foo, @gnu_indirect_function");

void *
foo(void)
{
    fprintf(stderr, "foo called\n");
    return NULL;
}

Now the main program, which will look up the symbol foo in the shared library:

$ cat main.c
/* main.c */

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
    void *handle;
    void (*funcp)(void);

    handle = dlopen("./libfoo.so", RTLD_LAZY);  /* Name matches the library built below */
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        exit(EXIT_FAILURE);
    }

    dlerror();      /* Clear any outstanding error */

    funcp = dlsym(handle, "foo");

    printf("Results after dlsym(): funcp = %p; dlerror = %s\n",
            (void *) funcp, dlerror());

    exit(EXIT_SUCCESS);
}

Now build and run to see a case where dlsym() returns NULL, while dlerror() also returns NULL:

$ cc -Wall -fPIC -shared -o libfoo.so foo.c
$ cc -Wall -o main main.c libfoo.so -ldl
$ LD_LIBRARY_PATH=. ./main
foo called
Results after dlsym(): funcp = (nil); dlerror = (null)
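
For a reviewer, the practical consequence is that a lookup has three possible outcomes, not two. The following is a minimal sketch of how calling code could tell them apart. The names classify_lookup and enum lookup_status are hypothetical, introduced only for illustration; only dlsym() and dlerror() come from the real API.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical outcome codes for this sketch (not part of any library). */
enum lookup_status {
    LOOKUP_ERROR,   /* dlerror() reported a failure */
    LOOKUP_NULL,    /* lookup succeeded, but the symbol's value is NULL */
    LOOKUP_OK       /* lookup succeeded with a non-NULL address */
};

/* Look up 'name' in 'handle', store the result in *addr_out, and
   report which of the three outcomes occurred. */
enum lookup_status
classify_lookup(void *handle, const char *name, void **addr_out)
{
    const char *err;

    dlerror();                  /* Clear any outstanding error */
    *addr_out = dlsym(handle, name);
    err = dlerror();            /* Save it: a second call would return NULL */

    if (err != NULL)
        return LOOKUP_ERROR;
    return (*addr_out == NULL) ? LOOKUP_NULL : LOOKUP_OK;
}
```

With the IFUNC library above, a lookup of "foo" would fall into the LOOKUP_NULL case, which is exactly the case the reviewed code would go on to call through.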

Coward answered 3/12, 2018 at 8:32 Comment(0)

It can't be if the library/PIE is a product of normal C compilation, as C won't ever put a global object at the NULL address, but you can get a symbol to resolve to NULL using special linker tricks:

null.c:

#include <stdio.h>
extern char null_addressed_char;
int main(void) 
{
    printf("&null_addressed_char=%p\n", &null_addressed_char);
}

Compile, link, and run:

$ clang null.c -Xlinker --defsym -Xlinker null_addressed_char=0 && ./a.out
&null_addressed_char=(nil)

If you don't allow any such weirdness, you can treat NULL returns from dlsym as errors.
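
Treating NULL as an error could look like the sketch below. The helper name lookup_nonnull is hypothetical and the message wording is made up for illustration; the point is only that a NULL result is rejected whether or not dlerror() has anything to say about it.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns the symbol's address, or NULL if the
   lookup failed OR the symbol legitimately resolved to NULL. Either
   way, the caller only needs a single NULL check before use. */
void *
lookup_nonnull(void *handle, const char *name)
{
    void *addr;

    dlerror();                  /* Clear any stale error message */
    addr = dlsym(handle, name);

    if (addr == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym(%s): %s\n", name,
                err != NULL ? err : "symbol resolved to NULL");
    }
    return addr;
}
```

A caller would then write something like `funcp = lookup_nonnull(handle, "foo"); if (funcp == NULL) goto end;` and never call through a null pointer.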

Peridotite answered 3/12, 2018 at 9:24 Comment(7)

Thanks. But, in what circumstances could that lead dlsym() to return NULL as the symbol value? I can't produce a case, but there's probably something I am missing. – Coward 4/12, 2018 at 10:52

@Coward If you were to make a shared lib out of the null.c file using defsym to provide the null_addressed_char symbol, you should be able to dlopen it and dlsym null_addressed_char and it should return NULL without setting an error. Theoretically. Practically, Linux's dynamic linker is setting an error and unless you remove main (as it references null_addressed_char) it won't even dlopen. In light of what the dlsym manpage says, I'd call that a bug. – Peridotite 4/12, 2018 at 11:47

@Coward It's funny but it looks like while the dlsym manpage wants people to allow for the possibility of dynamic symbols being NULL addressed, the Linux dynamic linker doesn't play nice with them. – Peridotite 4/12, 2018 at 11:49

"Practically, Linux's dynamic linker is setting an error and unless you remove main (as it references null_addressed_char) it won't even dlopen." Yes. That's what I see. And if one removes the reference to null_addressed_char, so that the library can load, then a lookup with dlsym() gives NULL + dlerror()="undefined symbol". This seems specifically to do with null_addressed_char having the value 0. If I --defsym to a nonzero value, then dlsym() succeeds. But what text in the dlsym(3) man page makes you conclude that it's a bug? – Coward 4/12, 2018 at 12:31

@Coward The dlsym manpage says: "Since the value of the symbol could actually be NULL (so that a NULL return from dlsym() need not indicate an error), the correct way to test for an error is to call dlerror(3) to clear any old error conditions, then call dlsym(), and then call dlerror(3) again, saving its return value into a variable, and check whether this saved value is not NULL. " The Linux linker doesn't allow for this case. The lookup succeeded, it just returned NULL, so dlerror() shouldn't have been set. – Peridotite 4/12, 2018 at 13:46

@Coward Similarly, dlopen should've succeeded even though &null_addressed_char was referenced, since that reference should've been successfully resolved to NULL. – Peridotite 4/12, 2018 at 13:48

Yes, thinking about it more (and also after lightly instrumenting the lookup code in glibc), I agree that this does look like a bug. – Coward 4/12, 2018 at 15:13

dlerror() returns the last error, not the status of the last call. So if nothing else the code you show may potentially get a valid result from dlsym() and fool itself into thinking there was an error (because there was still one in the queue). The purpose behind dlerror is to provide human-readable error messages. If you aren't printing the result, you are using it wrong.

Mckamey answered 18/12, 2012 at 22:8 Comment(7)

That's the purpose of the dlerror() call immediately before dlsym -- to clear the most recent error variable. There is no queue (if the man page can be believed). – Goerke 18/12, 2012 at 22:16

Ah, missed that. Yeah, so this is senseless but correct. dlsym is documented as returning NULL on error, but a non-NULL result from dlerror is equivalent (barring things like threadsafety bugs -- obviously there's a race here if another thread is doing the same nonsense). It's still an abuse of the API. – Mckamey 18/12, 2012 at 22:19

Surely every thread has its own copy of the error variable. In any case, this is the correct way to call the API, not an abuse. Compare: errno = 0; int a = itoa(s); if (errno) ... because if (a) cannot distinguish s = "0"; from s = "Garbage";. – Goerke 18/12, 2012 at 22:22

This is the correct way, but on the other hand, in this particular case checking for NULL should be enough as well. – Erny 18/12, 2012 at 22:28

That's probably so in glibc (the same is true for errno/perror), but not per the docs, which are silent. I wouldn't count on all C libraries being as robust... In fact I just checked bionic and it's not threadsafe at all. And "abuse" isn't about bugs, it's about intent -- dlerror is designed to format error messages, period. Using it to avoid the need to check the return value that you already have is just insane, sorry. – Mckamey 18/12, 2012 at 22:30

This isn't "an abuse of the API", it's exactly how the dlsym man page says to use it. It's a little circuitous if you know the symbol can't be NULL, but if it can be it's exactly what you should do – Forint 19/5, 2015 at 19:4

Off: Nowadays errno is a function returning an integer-pointer (#define errno (*_thread_errno_addr())). – Drogheda 3/12, 2018 at 18:0
