What determines binary compatibility of shared libraries on Linux?

I am building a shared library on Linux, which serves as a "plugin" to some software (to be specific, it extends Mathematica).

I find that if I build on Ubuntu 16.04, the resulting library does not work on RHEL 7.6. However, if I build on RHEL 7.6, the library works both on RHEL and Ubuntu.

By "does not work", I mean that Mathematica refuses to load it, but it only gives a generic and unhelpful "failed to load" error message.

I have eliminated a number of factors that could break compatibility, and I cannot find any more. This question is about what else might affect compatibility beyond what I list below.

The library is written in a mix of C and C++, but it exports a C interface. It is built with -static-libstdc++ and -static-libgcc. If I use ldd on the .so file, the only dependencies it lists are:

linux-vdso.so.1 =>  (0x00007ffc757b9000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa286e62000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa287854000)

One potential source of incompatibility is the glibc version. I looked at the symbols in the library using nm -gC, and the highest GLIBC version reference I see when I build on Ubuntu is 2.14. RHEL 7.6 has glibc 2.17, i.e. newer than 2.14. Thus I do not believe that the incompatibility is due to glibc.
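The version check described above can also be done in code. The following is a minimal sketch, not the procedure actually used here: it scans text such as the output of nm -gC for GLIBC_x.y tags and returns the highest one. The function name highest_glibc_version is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns the highest GLIBC_x.y[.z] version tag found in `text` as a list of
// numeric components (e.g. {2, 14}); returns an empty vector if none is found.
// Non-numeric tags such as GLIBC_PRIVATE are ignored.
std::vector<int> highest_glibc_version(const std::string& text) {
    const std::string tag = "GLIBC_";
    std::vector<int> best;
    std::size_t pos = 0;
    while ((pos = text.find(tag, pos)) != std::string::npos) {
        pos += tag.size();
        std::vector<int> version;
        std::size_t i = pos;
        while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
            int value = 0;
            while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
                value = value * 10 + (text[i] - '0');
                ++i;
            }
            version.push_back(value);
            // Continue only if a dot is followed by another digit.
            if (i + 1 < text.size() && text[i] == '.' &&
                std::isdigit(static_cast<unsigned char>(text[i + 1]))) {
                ++i;
            } else {
                break;
            }
        }
        // std::vector compares lexicographically, so {2, 14} > {2, 2, 5}.
        if (version > best) best = version;
    }
    return best;
}
```

Feeding it the saved nm output and comparing the result against the glibc version of the target system (2.17 on RHEL 7.6) should give the same answer as reading the symbol list by eye.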

What else is there that can cause a shared object compiled on Ubuntu 16.04 not to load on RHEL 7.6?


Update: I managed to coax Mathematica into giving a more descriptive error (through a poorly documented feature), so I now have a concrete error message. The same can also be seen with @Ctx's suggestion to set LD_DEBUG=all.

The error is:

IGraphM.so: undefined symbol: _ZTVNSt7__cxx1115basic_stringbufIcSt11char_traitsIcESaIcEEE

(IGraphM.so is my library.)
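Outside the host application, the same kind of message can usually be obtained by loading the library directly and printing what the dynamic loader reports. The following is a minimal sketch, assuming a POSIX system that provides dlopen; the helper name try_load is hypothetical, and older glibc versions need -ldl at link time.

```cpp
#include <dlfcn.h>
#include <string>

// Tries to load a shared object the way a plugin host might.
// Returns an empty string on success, or the dynamic loader's error text.
std::string try_load(const std::string& path) {
    dlerror();  // clear any stale error state
    // RTLD_NOW resolves all undefined symbols immediately, so missing
    // symbols are reported at load time rather than on first use.
    void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        const char* message = dlerror();
        return message ? message : "unknown dlopen failure";
    }
    dlclose(handle);
    return "";
}
```

A standalone test program does not reproduce the host's environment exactly, because the host process may already have other libraries loaded, so a successful load here does not guarantee a successful load inside the host.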

This symbol would seem to be part of libstdc++ unless I am mistaken. Why does this error occur if I specified -static-libstdc++ and verified that ldd does not list libstdc++?
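Demangling the name shows what is actually missing. The following is a small sketch using abi::__cxa_demangle from <cxxabi.h>, which is available with GCC and Clang; the wrapper name demangle is an assumption made for illustration.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Returns the demangled form of an Itanium-ABI mangled name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);  // the buffer is allocated with malloc by __cxa_demangle
    return result;
}
```

For the symbol in the error message, the result begins with "vtable for std::__cxx11::basic_stringbuf<char". The __cxx11 namespace marks the libstdc++ ABI introduced with GCC 5.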


Update 2:

Per the advice by SergeyA and this Q&A, I compiled after defining _GLIBCXX_USE_CXX11_ABI=0. This does fix the incompatibility.
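To confirm which ABI a given translation unit is compiled against, the macro can be inspected directly. The following is a minimal sketch, assuming libstdc++ (with other standard libraries the macro is not defined); both function names are hypothetical.

```cpp
#include <string>    // any libstdc++ header defines _GLIBCXX_USE_CXX11_ABI
#include <typeinfo>

// Returns 1 for the new (C++11) libstdc++ ABI, 0 for the old ABI,
// and -1 if the standard library is not libstdc++.
int libstdcxx_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
    return _GLIBCXX_USE_CXX11_ABI;
#else
    return -1;
#endif
}

// Reports whether the mangled name of std::string contains the
// __cxx11 inline namespace, which is how new-ABI symbols are tagged.
bool std_string_uses_cxx11_abi() {
    const std::string mangled = typeid(std::string).name();
    return mangled.find("__cxx11") != std::string::npos;
}
```

Compiling with -D_GLIBCXX_USE_CXX11_ABI=0 should make libstdcxx_abi() return 0 and remove the __cxx11 component from the mangled name of std::string.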

But I still do not understand why. The error message complains about a missing symbol. Where is this symbol normally loaded from? I was under the impression that if I use -static-libstdc++, then it should be contained within my library. This seems to be wrong.

While I seem to have a practical solution for the incompatibility for this specific case, I would appreciate some explanations so in the future I can solve similar problems on my own.

Lise answered 13/11, 2018 at 14:51 Comment(20)
glibc should be fine. I assume both systems are the same platform (say, x86-64). Perhaps Mathematica is a 32-bit application but you're supplying a 64-bit shared object? Have you tried creating an empty "dummy" plugin with no code, and using C only, to see if that works? The result might rule out some possibilities either way. - Thyme
Also look around and see if Mathematica writes a log file that might contain more information. I'd be surprised if it didn't log something useful somewhere. - Thyme
You can try to export LD_DEBUG=all in the console where you start Mathematica. Maybe it yields some debug output which can help you. - Geez
@Thyme Yes, same architecture (x86-64) and same version of Mathematica, but now that you asked, I started to wonder if the CPUs of the two machines support the same instruction set (e.g. when compiled on one system, gcc emits an instruction that only newer CPUs have). However, that would cause a crash, not a non-fatal failure, no? - Lise
@Geez Thanks! This is a good tip, it's the kind of thing I was looking for. - Lise
With @Ctx's suggestion, I see a concrete error, namely error: symbol lookup error: undefined symbol: _ZTVNSt7__cxx1115basic_stringbufIcSt11char_traitsIcESaIcEEE (fatal). That is very strange, because it looks to be from libstdc++, which I thought was statically linked, and which is not listed by ldd. - Lise
Your missing symbol demangles to vtable for std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >. #33395434 might have some clues. - Taster
@Taster The advice given there to define _GLIBCXX_USE_CXX11_ABI=0 works. But I do not understand why this matters at all. I thought that if libstdc++ was statically linked, then my library is self-contained, and whether it works should not depend on the environment in which it is run. I am probably wrong, but some explanation would be welcome :-) - Lise
@Lise What happens if you link your shared library with the --no-allow-shlib-undefined linker flag? - Taster
@Taster No errors are reported with this linker option either (regardless of the _GLIBCXX_USE_CXX11_ABI value). - Lise
@Lise OK, do you want an answer? ;) - Taster
@Taster Yes, but I would especially be interested in an explanation of why this happens. So this symbol (std::__cxx11::basic_stringbuf) seems to be available on one OS, but not the other. Where is it loaded from? If it is loaded from another shared library (e.g. libstdc++.so.6) then why is that library not listed by ldd? Is this because it was loaded not by my library, but by the host process that uses my library as a plugin? - Lise
Either way, you gave me the key to fixing this, so I would accept an answer. - Lise
Maybe this happens because my library uses exceptions internally? (Externally, it presents a C interface, not a C++ one.) - Lise
@Lise I do not have a full explanation, but I posted an answer to the best of my knowledge. - Taster
@Taster I am sorry, there was a misunderstanding. I thought that --no-allow-shlib-undefined just triggered some warnings in certain cases. The .so file I get with that option still does not work. By "no messages" I meant that specifying this option does not trigger any warnings. The ABI downgrade does work fine. - Lise
OK, I will edit my answer to remove this part. - Taster
@Taster I found this answer that seems to say that even if I link libstdc++ statically, sometimes the symbols already loaded by the host process (i.e. a different libstdc++) will be used (instead of the symbols linked statically into my library). - Lise
For highly platform-dependent questions like this, you should always state the compiler and its version number. This helps other people help you. E.g., there is an old "wontfix" Clang bug / "feature" that affects vtable generation for a class with a pure virtual d'tor and no non-pure virtual methods (it is technically an accurate implementation of a spec bug in the Itanium ABI, hence "feature"). - Williams
Do you use std::string as a function parameter or a return value? That would explain it. - Fourinhand

I can't explain why your .so library doesn't link all the used symbols statically (and instead leaves them as undefined), but I can offer a practical suggestion on how to fix the issue at hand.

You can stop linking libstdc++ statically into your plugin and instead use the one available on the host system. As built, this does not work for you because of the ABI incompatibility between the build and target platforms. You can downgrade the ABI used by your plugin by defining the macro _GLIBCXX_USE_CXX11_ABI=0.

Taster answered 13/11, 2018 at 17:8 Comment(0)
