std::regex and dual ABI

Today I have found an interesting case of the dual libstdc++ ABI affecting compatibility of libraries.

Long story short, I have two libraries that both use std::regex internally. One is built with the CXX11 ABI and one is not. When these two libraries are linked together in one executable, it crashes on startup (before main is entered).

The libraries are unrelated and do not expose interfaces that mention any std:: types. I thought such libraries should be immune to dual ABI issues. Apparently not!

The issue can be reproduced easily this way:

// file.cc
#include <regex>
static std::regex foo("(a|b)");

// main.cc
int main() {}

// build.sh
g++ -c -o new.o file.cc
g++ -c -o old.o file.cc -D_GLIBCXX_USE_CXX11_ABI=0
g++ -o main main.cc new.o old.o
./main

And the output is:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

The issue persists whatever I do. file.cc can be made into two separate source files, compiled into separate shared libraries, the two std::regex objects may have different names, they can be made global, static or automatic (one will need to call corresponding functions from main then). None of this helps.

Apparently (this is what came out of my short investigation) the libstdc++ regex compiler has some kind of internal static data that stores std::string objects, and when two ABI-incompatible pieces of code try to use that data, they get conflicting ideas about the layout of std::string objects.
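To make the "conflicting ideas about the layout" concrete, here is a minimal sketch that reports which std::string ABI a translation unit was compiled against and how large the string object is. The names kUsesCxx11Abi, string_abi_name and string_object_size are made up for this illustration; only the _GLIBCXX_USE_CXX11_ABI macro comes from libstdc++.

```cpp
#include <cstddef>
#include <string>

// _GLIBCXX_USE_CXX11_ABI is defined by libstdc++'s own headers once any
// standard header has been included. Other standard libraries (for example
// libc++) do not define it, so the #else branch also covers them.
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
constexpr bool kUsesCxx11Abi = true;
#else
constexpr bool kUsesCxx11Abi = false;
#endif

// Hypothetical helper: names the std::string ABI this translation unit sees.
const char* string_abi_name() {
    return kUsesCxx11Abi ? "libstdc++ new (cxx11) ABI"
                         : "libstdc++ old ABI, or not libstdc++";
}

// Size of the std::string object itself, not of the characters it owns.
std::size_t string_object_size() { return sizeof(std::string); }
```

Calling these two functions from a file built once normally and once with -D_GLIBCXX_USE_CXX11_ABI=0 should show two different object sizes (on a typical x86-64 Linux system, usually 32 bytes for the new ABI and 8 for the old one). Any data that one side writes as a std::string and the other side reads as a std::string is therefore misinterpreted.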

So my questions are:

  • Is there a workaround for this problem?
  • Should this be considered a bug in libstdc++?

The problem is reproducible in several versions of g++/libstdc++ (I tried a few from 5.4 to 7.1). It doesn't occur with libc++.

Cardwell answered 17/7, 2018 at 13:14 Comment(0)

The problem goes back to why libstdc++ has a dual ABI in the first place. Two points matter here: (1) the new ABI was introduced specifically to conform to the C++11 standard's requirements on std::string (and on other things not relevant to this discussion); (2) _GLIBCXX_USE_CXX11_ABI works independently of the language dialect, and is used to build C++03 and C++11 code together.

The regex library was introduced in C++11 and uses strings internally. So you build your C++11 (or later) basic_regex template code with _GLIBCXX_USE_CXX11_ABI=0. That means you are using a C++11 regex object with a pre-C++11 implementation of strings.

Should that work? It depends on how regex uses strings: if it relies on the new implementation (e.g. on copy-on-write being forbidden), then no; otherwise yes. What can happen? Anything.

The bottom line: you should not use _GLIBCXX_USE_CXX11_ABI=0 for new code written in a post-C++03 dialect (C++11, 14, 17, ...), because it selects implementations that do not meet the new guarantees on standard types, particularly std::string.

Can I use _GLIBCXX_USE_CXX11_ABI=0 with -std=c++11 or later? The GCC developers took care that you can build new code against the old ABI, which makes it possible to use new features together with old shared libraries. However, that might not be a good idea: the code is written to a new standard while the standard library does not conform to that standard, which may turn out badly later. Your problem is an example of that: you can mix the two ABIs, and here it is not working.

_GLIBCXX_USE_CXX11_ABI=0 is really useful when you call, for example, foo(std::string const&) defined in some .so library that was compiled with the old ABI. In that case you compile the source file that calls it with the old ABI, and keep all other sources on the new ABI.
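As a rough sketch of that layout, the one source file that talks to the old-ABI library can expose a std-free entry point, so that only this file needs to be compiled with -D_GLIBCXX_USE_CXX11_ABI=0. Everything named here is an assumption made for illustration: legacy_length stands in for a function like foo(std::string const&) from the old library (it is defined inline only so the example is self-contained), and legacy_length_shim is a hypothetical wrapper. Note that, per the question, this kind of isolation did not protect code that uses std::regex on both sides.

```cpp
#include <cstddef>
#include <string>

// Stand-in for a function exported by the old-ABI shared library. In a real
// build this would only be declared here and defined inside the .so.
std::size_t legacy_length(std::string const& s) { return s.size(); }

// Hypothetical shim: the only symbol the rest of the program calls. Its
// signature mentions no std:: types, so callers built with the new ABI never
// pass a std::string across the boundary.
extern "C" std::size_t legacy_length_shim(const char* text) {
    // Treat a null pointer as an empty string rather than invoking
    // undefined behavior in the std::string constructor.
    return legacy_length(std::string(text ? text : ""));
}
```

The rest of the program would then call legacy_length_shim("abc") and stay on the new ABI, with the std::string being constructed entirely inside the old-ABI translation unit.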

"The problem is reproducible in several versions of g++/libstdc++ (I tried a few from 5.4 to 7.1). It doesn't occur with libc++."

libc++ does not have this duality; it has a single string implementation.

I cannot give a clear answer as to where this exception comes from or why. I can only guess that there is some shared global resource related to regex, string, or locale that is not clearly separated between the ABIs. The two ABIs then work with it differently, which can result in anything: an exception, a segmentation fault, or other unexpected behavior. IMHO, it is best to stick to the rules I mentioned above, which most closely reflect the intent of _GLIBCXX_USE_CXX11_ABI and the dual ABI.

Fallow answered 17/7, 2018 at 15:23 Comment(4)
"So you build your c++-11 (or higher) template basic_regex code with _GLIBCXX_USE_CXX11_ABI=0". This is not a problem in and of itself. It works perfectly well when the entire program is built with _GLIBCXX_USE_CXX11_ABI=0. If, as you say, std::regex is not usable with the old ABI, libstdc++ should immediately #error if _GLIBCXX_USE_CXX11_ABI is defined to be 0 when <regex> is included. I didn't know libc++ doesn't have dual ABIs, thanks! - Cardwell
"This is not a problem in and of itself." @n.m. I have updated the answer. IMHO, that is exactly the problem: not the eventual symptom, but the root cause. - Fallow
"there is some shared global resource related to regex" This is my guess too. When I run this in the debugger, I see a call to new with a huge size argument that actually makes sense when viewed as a pointer. So this is consistent with two versions of the std::string layout overlapping. - Cardwell
Yup, I saw exactly the same. It comes from _M_mutate, which is exactly the function involved in copy-on-write in the old ABI, which, simply put, means sharing memory between two independent string objects. - Fallow
