Why is library API + compiler ABI enough to ensure compatibility between objects with different versions of gcc?

Asked 22/8, 2018 at 23:14 Answered 2/9, 2018 at 14:52

I've encountered a case where I may want to use a C++ shared object library compiled with one version of gcc with some code that will be compiled with another version of gcc. In particular, I want to use methods that return some STL containers like std::string and std::map.

The gcc website and many old stackoverflow posts (e.g. here) discuss this issue. My current understanding is that

Most of the concern and most of the posts on this issue are about cross-compatibility between .so files and .dll files. This is very difficult, due to different compiler ABIs.
For cross-compatibility between .so files compiled with different versions of gcc (at least with gcc version >= 3.4), all you need to ensure is that the standard library API hasn't changed (and, if it has, that there is dual ABI support).

My question has to do with how this works at a machine level. It seems possible that gcc could change the header implementing std::string, even if the library API has not changed, in order to make it more efficient or for other reasons. If so, then two different pieces of code are compiled with two different std::string headers, and are basically defining two different classes with the same name. What guarantees that, when we pass a std::string from code that uses one header to code that uses another, the object won't be corrupted or misread somehow?

For example, suppose that I have the following files:

// File a.h:

#ifndef FILE_A
#define FILE_A

#include <string>

class X {
  public:
    std::string f();
};

#endif  // FILE_A


// File a.cpp:

#include "a.h"

std::string X::f() {
  return "hello world";
}


// File b.cpp:

#include <iostream>
#include <string>
#include "a.h"

int main() {
  std::string x = X().f();
  std::cout << x << std::endl;
}

(The only purpose of the class X here is to introduce a bit more name-mangling into the shared object library while I am testing how this works.)

Now I compile these as follows:

/path/to/gcc/version_a/bin/g++ -fPIC -shared a.cpp -o liba.so
/path/to/gcc/version_b/bin/g++ b.cpp -L. -la -o b

When I run b, it uses the definition of std::string that comes from the header in version_b. But the object returned by X().f() is built by machine code that was compiled using the copy of that header from version_a of gcc.

I don't understand very much about the low-level mechanics of compilers, linkers, and machine instructions. But it seems to me like we are breaking a fundamental rule here, which is that the definition of a class has to be the same every time it is used, and if not, we have no guarantee that the scenario above will work.
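To make the concern concrete, here is a minimal sketch that simulates it inside a single program. The namespaces version_a and version_b are hypothetical stand-ins for the std::string headers of two gcc releases (the real std::string layout is more complicated). Both define a class with the same members in a different order, and the raw bytes produced through one definition are read back through the other.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical "header from gcc version_a".
namespace version_a {
struct StringRep {
    std::size_t size;
    std::size_t capacity;
};
}  // namespace version_a

// Hypothetical "header from gcc version_b": same members, swapped order.
namespace version_b {
struct StringRep {
    std::size_t capacity;
    std::size_t size;
};
}  // namespace version_b

static_assert(sizeof(version_a::StringRep) == sizeof(version_b::StringRep),
              "same size, so nothing obviously breaks at the type level");
static_assert(std::is_trivially_copyable<version_a::StringRep>::value &&
                  std::is_trivially_copyable<version_b::StringRep>::value,
              "copying the raw bytes with memcpy is well defined");

// "Library" side, built with version_a: creates the object.
version_a::StringRep make_in_library() { return {5, 15}; }  // size 5, capacity 15

// "Caller" side, built with version_b: receives the same bytes and
// interprets them with its own definition of the class.
std::size_t size_seen_by_caller() {
    version_a::StringRep produced = make_in_library();
    version_b::StringRep received;
    std::memcpy(&received, &produced, sizeof received);
    return received.size;  // reads the slot version_a used for capacity
}
```

The caller gets 15 instead of 5: nothing crashes, the value is simply wrong, which is exactly the kind of silent misread the question is worried about.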

Edit: I think that the main resolution to my confusion is that the phrase "library API" means something much more general in this context than it does in the uses of the term "API" that I am used to. The gcc documentation seems to indicate, in a very vague way, that pretty much any change to the include files that implement the standard library can be considered a change in the library API. See the discussion in the comments on Mohan's answer for details.

Foulmouthed asked 22/8, 2018 at 23:14

"basically defining two different classes with the same name" to maintain compatibility, gcc barely modifies the classes, only in ways it knows won't cause problems. Some bugs are kept for decades because fixing them would break the ABI. – Percent 30/8, 2018 at 16:0

It seems like it is possible that gcc can change the header implementing std::string

It can't make arbitrary changes. That would (as you surmise) break things. But only some changes to std::string will affect the memory layout of the class, and those are the ones that matter.

For an example of an optimisation that wouldn't affect the memory layout: they could change the code inside

size_t string::find (const string& str, size_t pos = 0) const;

to use a more efficient algorithm. That wouldn't change the memory layout of the string.
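A sketch of this kind of safe change, under the simplifying assumption that a string is just a pointer plus a length (old_lib, new_lib and Text are hypothetical names): the two versions use different algorithms in find, but the data members are identical, so an object built by one version can be handed to the other and still be read correctly.

```cpp
#include <cstddef>
#include <cstring>

namespace old_lib {
struct Text {
    const char* data;
    std::size_t size;

    // Original implementation: a plain linear scan.
    std::size_t find(char c) const {
        for (std::size_t i = 0; i < size; ++i) {
            if (data[i] == c) return i;
        }
        return static_cast<std::size_t>(-1);  // "not found", like std::string::npos
    }
};
}  // namespace old_lib

namespace new_lib {
struct Text {
    const char* data;  // same members, same order, same types
    std::size_t size;

    // "Optimised" implementation: different code, identical result.
    std::size_t find(char c) const {
        const void* hit = std::memchr(data, static_cast<unsigned char>(c), size);
        return hit ? static_cast<std::size_t>(static_cast<const char*>(hit) - data)
                   : static_cast<std::size_t>(-1);
    }
};
}  // namespace new_lib

// The layout did not change, so both versions agree on size and offsets.
static_assert(sizeof(old_lib::Text) == sizeof(new_lib::Text), "same size");
static_assert(offsetof(old_lib::Text, size) == offsetof(new_lib::Text, size),
              "same offsets");

// Object created by "old" code, consumed by "new" code.
std::size_t find_across_versions(const char* s, char c) {
    old_lib::Text produced{s, std::strlen(s)};
    new_lib::Text received;
    std::memcpy(&received, &produced, sizeof received);
    return received.find(c);
}
```

For example, find_across_versions("hello world", 'w') finds the character at index 6 even though the object was created by the old version.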

In fact, if you temporarily ignore the fact that everything is templated and so has to be in header files, you can imagine string as being defined in a .h file and implemented in a .cpp file. The memory layout is determined only from the contents of the header file. Anything in the .cpp file could be safely changed.

An example of something they couldn't do is to add a new data member to string. That would definitely break things.
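By contrast, a sketch of the breaking kind of change (again with hypothetical names): adding one data member changes the object's size and moves the existing members, so code compiled against the old definition would reserve too little storage and look for size in the wrong place.

```cpp
#include <cstddef>

namespace v1 {
struct StringRep {
    char* data;
    std::size_t size;
};
}  // namespace v1

namespace v2 {
struct StringRep {
    std::size_t cached_hash;  // hypothetical new data member
    char* data;
    std::size_t size;
};
}  // namespace v2

// Code built against v1 reserves sizeof(v1::StringRep) bytes and reads
// `size` at the v1 offset; code built against v2 disagrees on both.
constexpr std::size_t old_size_offset = offsetof(v1::StringRep, size);
constexpr std::size_t new_size_offset = offsetof(v2::StringRep, size);

static_assert(sizeof(v2::StringRep) > sizeof(v1::StringRep), "object grew");
static_assert(old_size_offset != new_size_offset, "existing member moved");
```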

You mentioned the dual ABI case. What happened there is that they needed to make a breaking change, so they had to introduce a new string class. One of the classes is std::string and the other is std::__cxx11::string. (Messy things happen under the hood, so most users don't realise they are using std::__cxx11::string on newer versions of the compiler/standard library.)
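libstdc++ does this renaming with an inline namespace: the new classes live in std::__cxx11, but because that namespace is inline, source code can keep writing std::string. Below is a minimal sketch of the same trick using a hypothetical library called mylib; it is an illustration of the technique, not the actual libstdc++ code.

```cpp
#include <string>
#include <typeinfo>

namespace mylib {
// Old layout, kept under its original name so old binaries still link.
namespace v1 {
struct text {
    const char* data;
};
}  // namespace v1

// New layout. Because v2 is inline, plain `mylib::text` means `mylib::v2::text`.
inline namespace v2 {
struct text {
    const char* data;
    unsigned long size;
};
}  // namespace v2
}  // namespace mylib

// Source code only ever spells `mylib::text`...
std::string default_type_name() { return typeid(mylib::text).name(); }

// ...but the compiler-level name still records which version was used.
std::string old_type_name() { return typeid(mylib::v1::text).name(); }
```

With GCC and Clang, the inline namespace also becomes part of the mangled name of every function that takes a mylib::text. That is how mixing the old and new versions tends to show up as a link error instead of a silent layout mismatch.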

Midwest answered 2/9, 2018 at 14:10

Sorry for the long delay in response; I've been on vacation. I think that this answer mostly makes sense except for one major terminology question: What does "library API" actually mean, then? This must have a different meaning in the context of gcc and ABI compatibility than it does in the wider world of programming. To me, with the definition of "API" that I am used to, it is completely possible to add a private data member to a class and not change the API! – Foulmouthed 6/9, 2018 at 15:25

My previous comment is the main remaining conceptual issue, but just for completeness, let me recap my understanding at this point. In order for things to work, we really only need the data members to be the same, because objects actually only contain data members. Methods are implemented somewhere else by the compiler. Generally speaking, it doesn't really matter if one copy of a string calls one compiled version of a method and another copy calls another version, as long as those two versions do the same thing. So the only point that things can go wrong is in copying the data members. – Foulmouthed 6/9, 2018 at 15:28
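That recap matches how typical compilers lay out objects: a non-virtual member function is compiled as an ordinary function with a hidden this parameter, so it adds nothing to each object. A small check (the type names are made up for this example):

```cpp
#include <cstddef>

struct PlainData {
    const char* data;
    std::size_t size;
};

struct WithMethods {
    const char* data;
    std::size_t size;

    // None of these are stored in the object; each call passes `this`
    // to a single function whose code lives elsewhere in the binary.
    std::size_t length() const { return size; }
    bool empty() const { return size == 0; }
    char front() const { return data[0]; }
};

static_assert(sizeof(WithMethods) == sizeof(PlainData),
              "non-virtual member functions add no per-object storage");
```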

@Foulmouthed I can't swear I have the terminology down. But I would say that in C++, the library API changes if you add a private data member. It's also not quite the case that you need just the data members to be the same (and in the same order). Two different compilers are allowed to generate different memory layouts for the same struct with the same members, and indeed a single compiler can generate different memory layouts if you fiddle with the switches that affect packing. Also, there is something called the virtual function table which is generated for a class and ... – Midwest 6/9, 2018 at 16:32

the contents of that table will change if you add a new virtual function. Have a read of en.wikipedia.org/wiki/Virtual_method_table . That will also affect binary compatibility. There are probably other cases as well; I don't think about this often enough to be able to list things exhaustively. – Midwest 6/9, 2018 at 16:34
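Both points from these comments can be checked with sizeof and offsetof in a short sketch (the struct names are illustrative). #pragma pack is a compiler extension, supported by GCC, Clang and MSVC, that changes member alignment, so the same members end up at different offsets; GCC's -fpack-struct option has a similar effect on a whole translation unit. Declaring a virtual function gives each object a hidden pointer to the class's vtable, which changes its size.

```cpp
#include <cstddef>

// Same members, default alignment: padding is inserted after `c`.
struct Natural {
    char c;
    int i;
};

// Same members, packed to 1-byte alignment: no padding, so `i` moves.
#pragma pack(push, 1)
struct Packed {
    char c;
    int i;
};
#pragma pack(pop)

// No virtual functions: the object holds only its data member.
struct NoVirtual {
    std::size_t size;
    std::size_t length() const { return size; }
};

// Virtual functions: in common implementations every object also
// carries a hidden pointer to the class's vtable.
struct WithVirtual {
    std::size_t size = 0;
    virtual ~WithVirtual() = default;
    virtual std::size_t length() const { return size; }
};
```

The vtable case can also break compatibility without changing sizeof: inserting a new virtual function in the middle of a class shifts the vtable slots of the functions declared after it, so a caller compiled against the old header would call through the wrong slot.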

It's hard to fit too much information into comments, so let me give you some links and you can ask more specific questions after reading them. opensource.apple.com/source/gcc/gcc-5026.1/libstdc++-v3/docs/… (just read the very first section, up to "versioning") – Midwest 6/9, 2018 at 16:35

#99797 – Midwest 6/9, 2018 at 16:36

#2171677 – Midwest 6/9, 2018 at 16:37

I had seen the first link before, but when I just re-read it, I noticed a very interesting definition: "The C++ Standard Library has many include files, types defined in those include files, specific named functions, and other behavior. The text of these behaviors, as written in source include files, is called the Application Programing Interface, or API." If I am reading this correctly, it basically means that any change to the source include files may be considered a change to the API. This definition is very poor and vague, but this resolves my confusion at least. – Foulmouthed 10/9, 2018 at 16:9

GCC has to do whatever it takes so that our programs work. If using different implementations of std::string in different translation units means our programs are broken, then gcc is not allowed to do that.

This is applicable to any given version of GCC.

GCC goes out of its way to remain backwards compatible. That is, it strives to keep the above applicable across different versions of GCC, not just within a given version. However, it cannot guarantee that all its versions will remain compatible forever. When backward compatibility can no longer be kept, an ABI change is introduced.

Since the big GCC 5 ABI change, such changes are introduced in a way that deliberately tries to break your build if you combine old and new binaries. It does so by renaming the std::string and std::list classes at the binary level. This propagates to all functions and templates that have std::string or std::list parameters. If you try to pass e.g. a std::string between translation units compiled against incompatible ABI versions, your program will fail to link. The mechanism is not 100% foolproof, but it catches many common cases.
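One way to see this renaming from inside a program is to look at the name the compiler records for std::string. typeid(...).name() is implementation-defined; with GCC and Clang it is the mangled name, and under libstdc++'s new ABI it contains __cxx11. The _GLIBCXX_USE_CXX11_ABI macro is specific to libstdc++, so this sketch treats it as absent on other standard libraries.

```cpp
#include <string>
#include <typeinfo>

// Name the implementation records for std::string (implementation-defined).
// Under libstdc++'s new ABI this is the mangled name of
// std::__cxx11::basic_string<char, ...>.
std::string string_type_name() { return typeid(std::string).name(); }

// Which string ABI this translation unit was compiled for. The macro is
// defined by libstdc++'s configuration header, which <string> pulls in.
bool compiled_with_cxx11_string_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
    return true;
#else
    return false;
#endif
}

// The recorded type name and the macro should agree.
bool name_mentions_cxx11() {
    return string_type_name().find("__cxx11") != std::string::npos;
}
```

On a libstdc++ toolchain built with dual ABI support, compiling the same file with -D_GLIBCXX_USE_CXX11_ABI=0 should flip both results to false.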

The alternative would be to silently produce broken executables, which no one wants.

The dual ABI is a way for the newer versions of the GCC standard library binary to remain compatible with older executables. Basically it has two versions of everything that involves std::string and std::list, with different symbol names for the linker, so older programs that use the old names can still be loaded and run.

There's also a compilation flag, the _GLIBCXX_USE_CXX11_ABI macro (e.g. -D_GLIBCXX_USE_CXX11_ABI=0), that allows the newer versions of GCC to produce binaries compatible with the older ABI (and incompatible with newer binaries produced without it). It is not recommended to use it unless you absolutely have to.

Leaflet answered 2/9, 2018 at 14:52

This is a great explanation of the dual ABI system, but it doesn't really address the conceptual issue that I have with how all of this works at the binary level. – Foulmouthed 6/9, 2018 at 15:29

@Foulmouthed The dual ABI mechanism makes reasonably sure that either version_a and version_b are binary compatible, or your code will fail to build. If you are using two incompatible versions with no dual ABI mechanism in place, then you are breaking the rules and your code won't work. If you still have a conceptual issue, you may want to explain in greater detail what it is exactly, because I have no idea whatsoever. – Leaflet 6/9, 2018 at 15:52