Building .so with recursive function in it
Asked Answered
B

1

4

During working on some project, I faced the issue that I can't build so library. I have got the error like: relocation R_X86_64_PC32 against symbol '' can not be used when making a shared object; recompile with -fPIC Eventually I managed to find the root cause. And it was recursive function in the library. For example, I have the following well know example:

.section .text
.globl factorial
.type  factorial,STT_FUNC
factorial:
    push %rbp
    mov %rsp,%rbp

    mov 16(%rbp),%rax
    cmp $1,%rax
    je end_factorial
    dec %rax
    push %rax  #this is how we pass the argument to function
    call factorial
    pop %rbx
    inc %rbx
    imul %rbx,%rax
end_factorial:
    mov %rbp, %rsp
    pop %rbp
    ret

Now, let's try to build the shared library:

as  -g -o fact.o fact.s
ld -shared fact.o -o libfact.so
ld: fact.o: relocation R_X86_64_PC32 against symbol `factorial' can not be used when making a shared object; recompile with -fPIC

If I wrap the factorial function, like this:

.section .text
.globl fact
.type  fact,STT_FUNC
fact:
factorial:
    push %rbp
    mov %rsp,%rbp

    mov 16(%rbp),%rax
    cmp $1,%rax
    je end_factorial
    dec %rax
    push %rax  #this is how we pass the argument to function
    call factorial
    pop %rbx
    inc %rbx
    imul %rbx,%rax
end_factorial:
    mov %rbp, %rsp
    pop %rbp
    ret

I can build the so library with no errors.


The question is: why do I get an error while building shared library that contains recursive function? P.S. The static linking works fine in that case. Thanks!

Bulganin answered 19/3, 2018 at 21:31 Comment(1)
For global symbols, you need to use either the PLT or the GOT, that is call factorial@PLT or call *factorial@GOTPCREL(%rip). If you wish, you can do the wrapping in reverse of course so you can keep the public factorial symbol and use some local for the recursion.Suave
A
4

factorial is a global label, so it can be subject to symbol interposition. See Sorry state of dynamic libraries on Linux. (Also, an example of interposing malloc with LD_PRELOAD, and some docs).

When making a shared library, the target of the call factorial instruction is not assumed to be the factorial: label defined in the same file. That's because you used .globl factorial.

As Jester points out, you should define a separate local label for the call target so you can keep the global factorial name.

You can make a simpler "helper" function that uses its own custom calling convention and doesn't make stack frames with %rbp for the recursive part, if you want. (But taking an arg on the stack is already non-standard for x86-64).


You could call through the PLT or memory-indirect through the GOT, but don't do that; you don't want the extra overhead on each call, and you don't want symbol interposition to replace your non-standard-calling-convention implementation with a normal one that passes the first integer arg in %rdi.

Speaking of which, passing an arg on the stack is slow. You do need to save/restore something, unless you rewrite the recursion to be tail-recursive, like factorial_helper(accumulator*n, n-1). But you don't also need to make a stack frame with %rbp every time.

You're not maintaining 16-byte stack alignment before the call, but you don't need that when calling your own private functions that don't care about that.

Of course, if you care about performance at all, you wouldn't use a recursive implementation in the first place, because the only reason to do that for factorial is as a learning exercise. Rewriting to tail-recursive allows you (or the compiler if writing in C) to turn the call / ret into a jmp, which of course becomes a loop.


Related: What are good examples that actually motivate the study of recursion?. A binary tree traversal, or the Ackermann function, are easier to implement recursively than iteratively, but factorial or Fibonacci are harder (and in the case of Fibonacci, much slower).

Avelin answered 19/3, 2018 at 22:25 Comment(3)
As a side note, GCC8 should know to optimize recursive calls to not use PLT (thanks to PR56727).Archives
Thank you very much, now it is clear. Factorial was just an example. Also you said about 16 byte stack alignment. I thought that it should be 8 byte alignment on x86-64, isn't it?Bulganin
@AndrewBolotov: if this answers your question, click the "accept" checkmark under the up/down vote arrows. :)Avelin

© 2022 - 2024 — McMap. All rights reserved.