Providing a C API to your C++ library and strict aliasing

A common pattern when providing a C API is to forward-declare opaque types in your public header, pass pointers to them through your API functions, and then reinterpret_cast those pointers to your real C++ types once inside the translation unit (and therefore back in C++ land).

Using LLVM as an example:

In Types.h this typedef is declared:

typedef struct LLVMOpaqueContext *LLVMContextRef;

LLVMOpaqueContext is not referenced anywhere else in the project.

In Core.h the following method is declared:

LLVMContextRef LLVMContextCreate(void);

Which is defined in Core.cpp:

LLVMContextRef LLVMContextCreate() {
  return wrap(new LLVMContext());
}

wrap (and unwrap) is defined by a macro in CBindingWrapping.h:

#define DEFINE_SIMPLE_CONVERSION_FUNCTIONS(ty, ref)     \
  inline ty *unwrap(ref P) {                            \
    return reinterpret_cast<ty*>(P);                    \
  }                                                     \
                                                        \
  inline ref wrap(const ty *P) {                        \
    return reinterpret_cast<ref>(const_cast<ty*>(P));   \
  }

And used in LLVMContext.h:

DEFINE_SIMPLE_CONVERSION_FUNCTIONS(LLVMContext, LLVMContextRef)

So we see that the C API basically takes a pointer to an LLVMOpaqueContext and casts it to a pointer to llvm::LLVMContext in order to call whatever method is needed on the object.

My question is: isn't this in violation of the strict aliasing rules? If not, why not? And if so, how can this type of abstraction at the public interface boundary be achieved legally?

Cursed answered 12/3, 2018 at 11:25 Comment(0)

It's not a strict aliasing violation. To start with, strict aliasing is about accessing an object via a glvalue of the wrong type.

In your question, you create an LLVMContext, and then use an LLVMContext lvalue to access it. No illegal aliasing there.

The only issue that could arise is the pointer conversion not yielding back the same pointer. But that is not a problem either, since reinterpret_cast is guaranteed to give back the same pointer in a round-trip conversion, so long as the alignment requirement of the pointer type we convert to and back from is no stricter than that of the original type.
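
As a rough illustration of that round trip, here is a minimal sketch. OpaqueContext, ContextRef, Context and round_trip_ok are hypothetical stand-ins, not LLVM's actual definitions. The point is only that the opaque pointer is converted and carried around but never dereferenced, while every access to the object goes through its real type.

```cpp
// Hypothetical stand-ins for the real types; these are NOT LLVM's definitions.
struct OpaqueContext;                 // never defined, like LLVMOpaqueContext
typedef OpaqueContext *ContextRef;    // what the C header would expose

class Context {
public:
  int value = 42;
};

// Same shape as the wrap/unwrap pair in the question.
inline ContextRef wrap(const Context *p) {
  return reinterpret_cast<ContextRef>(const_cast<Context *>(p));
}

inline Context *unwrap(ContextRef p) {
  return reinterpret_cast<Context *>(p);
}

// Returns true when the round trip gives back the original pointer and
// the object is only ever read through its real type.
bool round_trip_ok() {
  Context *original = new Context();
  ContextRef handle = wrap(original);   // conversion only, no access
  Context *back = unwrap(handle);       // converted back to the real type
  bool ok = (back == original) && (back->value == 42);
  delete back;
  return ok;
}
```

Nothing here ever reads or writes through an OpaqueContext glvalue, which is why the aliasing rule never comes into play.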

Whether or not it's a good or bad way to go about things is debatable. I personally would not bother with LLVMOpaqueContext and return a struct LLVMContext*. It's still an opaque pointer, and it doesn't matter that the C header declares it with struct while the type definition is with class. The two are interchangeable up to the point of the type definition.
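
A minimal sketch of that alternative, assuming the C++ class lives in the global namespace (a class inside a namespace, such as llvm::LLVMContext, cannot be named from C this way). All names here (Context, ContextRef, ContextCreate, ContextValue, ContextDispose) are made up for illustration, and the header and implementation are shown in one block for brevity.

```cpp
// ---- What a shared public header might contain (hypothetical names) ----
#ifdef __cplusplus
class Context;                        // same class-key as the definition below
typedef Context *ContextRef;
extern "C" {
#else
typedef struct Context *ContextRef;   // C only ever sees an incomplete struct
#endif

ContextRef ContextCreate(void);
int ContextValue(ContextRef ctx);
void ContextDispose(ContextRef ctx);

#ifdef __cplusplus
}  // extern "C"
#endif

// ---- What the C++ implementation file might contain ----
class Context {
public:
  int value = 7;
};

// No casts are needed: ContextRef already is a pointer to the real type.
extern "C" ContextRef ContextCreate(void) { return new Context(); }
extern "C" int ContextValue(ContextRef ctx) { return ctx->value; }
extern "C" void ContextDispose(ContextRef ctx) { delete ctx; }
```

The #ifdef is optional; it only keeps the class-key consistent on the C++ side, which avoids compiler warnings about mixing struct and class for the same type.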

Landahl answered 12/3, 2018 at 11:31 Comment(22)
@HolyBlackCat You said "all pointers (apart from function pointers) are guaranteed to have same size and representation". Since when? – Jacey
@Jacey I'm not sure about representation, but more or less sure about the size. I'll look it up. – Kamakura
Even if they were, the idea that you can form invalid references as long as you don't dereference them is a myth. – Tapping
@LightnessRacesinOrbit - I think the myth is perpetuated by a poor choice of name. "Strict accessing" just doesn't have the same ring to it as "strict aliasing". And the latter doesn't imply an access is required to even cause a problem. – Landahl
@StoryTeller: Because it isn't – Tapping
@LightnessRacesinOrbit - Correct me if I'm wrong, but even this expression statement *p; for a pointer p is formally an access due to an lvalue-to-rvalue conversion. So long as you just lug a pointer around, you aren't doing anything overly sinister. – Landahl
@StoryTeller I can't talk about C++, but in C void *p = malloc(42); free(p); if (p) definitely is UB. – Jacey
@Jacey - Invalid pointer values are a thing in C++ as well. But I think language-lawyer-wise there's a difference with regard to the cause of the UB between a strict aliasing violation and invalid addresses being used (like you use them in your snippet). – Landahl
@Jacey In C++, after deallocation the pointer becomes invalid, which forbids it to be dereferenced but not read. Regardless, the UB coming from invalid pointers isn't strict aliasing, which, as mentioned, is caused by accessing an object through a type different from the object's. – Northwestwards
@StoryTeller: Doesn't matter why it's UB – Tapping
@StoryTeller: Regarding struct vs. class: That is true, but Clang, with -Wall, will warn when the keywords are used inconsistently. This could be solved with an #ifdef __cplusplus I guess, to avoid the need for a #pragma. – Gainsborough
@ArneVogel - I imagine Clang picked it up due to its interoperability with MSVC. I know Microsoft's mangled names are affected by that. But I don't think this constrained case of returning a pointer is likely to cause a problem, so the warning can be turned off in the TU that defines the class. Thanks for bringing it up however. I didn't actually know Clang behaved this way until now. – Landahl
*p doesn't do an l-to-r conversion on the pointee in C++ unless you have a pointer to volatile. – Mancuso
@PasserBy But then you compare the (now invalid) pointer value with NULL. Is it not UB? – Roseannaroseanne
@PasserBy : No, even reading an invalid pointer value is UB. (If you have a segment+offset architecture, and loading an invalid segment descriptor causes a trap, then just testing for NULL can blow up.) – Albuminoid
@Kamakura : All pointers to struct are the same size and representation, but that is not true of other pointers. In particular, char* and void* have been different sizes to other pointers in historic implementations. – Albuminoid
@MartinBonner Do you have a standard reference for that? – Kamakura
@Kamakura : 3.9.2 p3 in n4296 "A pointer to cv-qualified (3.9.3) or cv-unqualified void can be used to point to objects of unknown type. Such a pointer shall be able to hold any object pointer. An object of type cv void* shall have the same representation and alignment requirements as cv char*." (my emphasis). Note that this dispensation is specific to char* and void*. – Albuminoid
@MartinBonner In C++14 [basic.stc.dynamic.deallocation] "Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior." It was UB back in C++11. – Northwestwards
@MartinBonner Also, in regards to lvalue-to-rvalue conversion [conv.lval] "if the object to which the glvalue refers contains an invalid pointer value, the behavior is implementation-defined." – Northwestwards
@PasserBy Oo! That's interesting. Presumably an implementation is free to define the behaviour as "may terminate the program without warning". – Albuminoid
@MartinBonnersupportsMonica: More interesting would be whether an "implementation-defined" action would be allowed to raise a signal before all observable actions preceding the pointer action in question have occurred, or cause the program to terminate without warning at any arbitrary time after the action occurred. So far as I can tell, older standards sought to use the phrase "Undefined Behavior" rather than "Implementation-Defined" behavior in all cases where such loosey-goosey semantics might be appropriate, including those where 99% of implementations should behave identically. – Lorraine
