When do we break binary compatibility?

Asked 10/5, 2016 at 21:43 Answered 10/5, 2016 at 22:7

I was under the impression that doing any of the following breaks binary compatibility:

Add a new public virtual method: virtual void aMethod();
Add a new public non-virtual method: void aMethod();
Implement a public pure-virtual method from an interface: virtual void aMethod() override;

By that I mean that a project built against a previous version of the DLL would no longer be able to load it once the new methods are available.

From what I have tested using Visual Studio 2012, none of these break anything. Dependency Walker reports no error and my test application was calling the appropriate method.

DLL:

class EXPORT_LIB MyClass {
public:
  void saySomething();
};

Executable:

int _tmain(int argc, _TCHAR* argv[])
{
  MyClass wTest;
  wTest.saySomething();
  return 0;
}

The only misbehavior I found was when MyClass implemented a pure-virtual interface, my executable called one of the pure-virtual methods, and I then added a new pure-virtual method before the one used by my executable. In this case, Dependency Walker did not report any error, but at runtime the executable called the wrong method.

class IMyInterface {
public:
  virtual void foo() = 0;
};

In the executable

IMyInterface* wTest = new MyClass();
wTest->foo();

Then I change the interface without rebuilding my executable

class IMyInterface {
public:
  virtual void bar() = 0;
  virtual void foo() = 0;
};

It is now quietly calling bar() instead of foo().

Are all three of the changes I listed above safe to make?

EDIT:

Doing this

class EXPORT_LIB MyClass {
public:
  virtual void saySomething();
};

Executable:

MyClass wTest;
wTest.saySomething();

Then rebuild DLL with this:

class EXPORT_LIB MyClass {
public:
  virtual void saySomething2();
  virtual void saySomething();
  virtual void saySomething3();
};

The executable still calls the correct saySomething().

Impuissant answered 10/5, 2016 at 21:43 Comment(13)

Please clarify "binary compatibility". Do you mean that all the data and functions are in the exact same place in the file? – Claudiaclaudian 10/5, 2016 at 21:47

@ThomasMatthews You build something against a DLL. You change that DLL but don't rebuild your own executable. It now crashes because your app cannot load the newest version of the DLL. – Impuissant 10/5, 2016 at 21:50

If you change anything in the class (apart from whitespace), then it's not compatible. – Mascle 10/5, 2016 at 21:52

@LokiAstari But why is my test all running ok without any weird behavior? – Impuissant 10/5, 2016 at 21:53

@Ceros: Impossible to say. But UB can do anything. Even appear to work. – Mascle 10/5, 2016 at 21:54

@LokiAstari So it was just luck that my test cases did not do anything spectacular? – Impuissant 10/5, 2016 at 21:56

Changing the size of the virtual method table definitely breaks compatibility. Thus, adding a virtual method, whether public or private, requires recompilation. – Lobster 10/5, 2016 at 21:57

@Lobster You mean that it is now a case of UB? I ask because I was able to add multiple virtual methods without rebuilding my executable, and it was not crashing either. – Impuissant 10/5, 2016 at 22:3

@Impuissant - yeah, Crazy Eddie has it correct, below: in VC++ you can extend the virtual table at the end by adding virtual methods after all other virtual methods, but those additions obviously do affect derived classes, breaking their binary compatibility. – Helper 10/5, 2016 at 22:4

@Helper I added virtual methods anywhere in my class declaration inside my DLL without any issue. It appeared to work correctly. Was it luck, and is it still UB? – Impuissant 10/5, 2016 at 22:5

Consider using a subclass of the class with its own virtual methods, and retry your tests. – Lobster 10/5, 2016 at 22:6

@Impuissant - that's surprising to me. I would not have expected it to work. – Helper 10/5, 2016 at 22:8

@Helper OK, I believe you meant that VC++ is reorganizing the virtual table for me, regardless of where I put my new method declarations, except in the case of my pure-virtual interface. – Impuissant 10/5, 2016 at 22:29

Breaking binary compatibility doesn't always result in the DLL not loading; in many cases you'll end up with memory corruption, which may or may not be immediately obvious. It depends a lot on the specifics of what you've changed and how things were, and now are, laid out in memory.

Binary compatibility between DLLs is a complex subject. Let's start by looking at your three examples:

Add a new public virtual method virtual void aMethod();

This will almost certainly result in undefined behaviour. It is very much compiler dependent, but most compilers use some form of vtable for virtual methods, so adding new ones changes the layout of that table.
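To make the vtable point concrete, here is a minimal sketch that models a vtable as a plain array of function pointers. It is only a simulation: real vtables are a compiler implementation detail, and the names `Slot`, `kVtableV1`, `kVtableV2` and `callFooAsOldClient` are made up for illustration. It assumes the old executable has the slot index of foo() baked into its compiled code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hand-rolled stand-in for one vtable entry (hypothetical, for illustration).
using Slot = std::string (*)();

std::string foo() { return "foo"; }
std::string bar() { return "bar"; }

// Version 1 of the interface: foo() occupies slot 0.
const std::array<Slot, 1> kVtableV1 = {{foo}};

// Version 2: bar() is declared before foo(), so foo() moves to slot 1.
const std::array<Slot, 2> kVtableV2 = {{bar, foo}};

// The executable was built against version 1, so it always uses slot 0.
constexpr std::size_t kFooSlotInOldClient = 0;

template <std::size_t N>
std::string callFooAsOldClient(const std::array<Slot, N>& vtable) {
    return vtable[kFooSlotInOldClient]();
}

void demoSlotShift() {
    // Against the old table the call reaches foo().
    assert(callFooAsOldClient(kVtableV1) == "foo");
    // Against the new table the same slot now holds bar().
    assert(callFooAsOldClient(kVtableV2) == "bar");
}
```

In this model nothing fails at load time, which matches the observation that Dependency Walker reports no error: the table is still there, only its contents have moved.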

Add a new public non-virtual method void aMethod();

This is fine for a global function or a member function. A member function is essentially just a global function with a hidden 'this' argument. It doesn't change the memory layout of anything.
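As a rough illustration of why a new non-virtual member is harmless, the sketch below compares two versions of a small class and writes the new member as an equivalent free function. The types `WidgetV1` and `WidgetV2` and the function `doubledFree` are hypothetical names introduced only for this example.

```cpp
#include <cassert>

// Version 1 of a hypothetical exported class.
struct WidgetV1 {
    int value;
    int get() const { return value; }
};

// Version 2 adds one non-virtual member function and nothing else.
struct WidgetV2 {
    int value;
    int get() const { return value; }
    int doubled() const { return value * 2; }
};

// Conceptually, doubled() behaves like a free function that receives the
// object explicitly in place of the hidden 'this' argument.
int doubledFree(const WidgetV2& self) { return self.value * 2; }

// Adding the member function did not change the object's size.
static_assert(sizeof(WidgetV1) == sizeof(WidgetV2),
              "a non-virtual member function adds no per-object storage");

void demoNonVirtual() {
    WidgetV2 w{21};
    assert(w.doubled() == doubledFree(w));
}
```

Existing callers never reference the new symbol, so they keep working without a rebuild.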

Implement a public pure-virtual method from an interface virtual void aMethod() override;

This won't exactly cause any undefined behaviour, but as you've found, it won't do what you expect. Code that was compiled against the previous version of the library won't know this function has been overridden, so it will not call the new implementation; it'll carry on calling the old one. This may or may not be a problem depending on your use case, and it shouldn't cause any other side effects. However, I think your mileage could vary here depending on which compiler you're using, so it's probably best to avoid this.

What will stop a DLL from being loaded is changing the signature of an exported function in any way (including changing parameters and scope) or removing a function, because the dynamic linker then won't be able to find it. This only applies if the function in question is actually used, as the linker only imports functions that are referenced in the code.

There are also many more ways to break binary compatibility between DLLs, which are beyond the scope of this answer. In my experience they usually follow a theme of changing the size or layout of something in memory.
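The load-time rule can be sketched as a simple lookup: every symbol the executable imports must exist in the DLL's export table. This is only a toy model of what a dynamic linker does, and the symbol strings below are invented placeholders, not real decorated names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the imported symbols that the export table cannot satisfy.
std::vector<std::string> unresolvedImports(
    const std::vector<std::string>& imports,
    const std::set<std::string>& exports) {
    std::vector<std::string> missing;
    for (const std::string& name : imports) {
        if (exports.count(name) == 0) {
            missing.push_back(name);
        }
    }
    return missing;
}

// The executable only references saySomething() (placeholder symbol name).
const std::vector<std::string> kExeImports = {"MyClass::saySomething()"};

// New DLL where an unused function was removed: everything still resolves.
const std::set<std::string> kExportsUnusedRemoved = {"MyClass::saySomething()"};

// New DLL where the signature changed: the old symbol no longer exists.
const std::set<std::string> kExportsSignatureChanged = {"MyClass::saySomething(int)"};

void demoImports() {
    assert(unresolvedImports(kExeImports, kExportsUnusedRemoved).empty());
    assert(unresolvedImports(kExeImports, kExportsSignatureChanged).size() == 1);
}
```

A non-empty result corresponds to the case where the loader refuses to start the program.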

Edit: I just remembered that there is an excellent article on the KDE Wiki on binary compatibility in C++, including a very good list of do's and don'ts with explanations and workarounds.

Bantamweight answered 10/5, 2016 at 22:7 Comment(4)

@Di Paolo I added many virtual methods anywhere in my class declaration, not just at the end, and it still appeared to work correctly. Also, if I remove a method that is not used by the caller, everything seems fine too. I know that removing a method that was actually in use is another story. – Impuissant 10/5, 2016 at 22:12

Your application may appear to work correctly; in fact it may even work correctly. That is the nature of undefined behaviour and memory corruption. The point is that by adding or removing virtual methods you're modifying the layout of a vtable, so your application could end up reading or writing to the wrong places in memory. Since your example app is so simple, you may just get lucky and not write over any memory that's being used. In a more complex app you could see subtle errors, corruption, or access violations. Finally, if the method isn't being used, there is no dependency to break. – Bantamweight 10/5, 2016 at 22:21

I get it. "Finally if the method isn't being used, there is no dependency to break." Isn't it going to change the vtable as well, just as if you were adding new virtual methods? – Impuissant 10/5, 2016 at 22:23

If you remove a virtual method, yes. If you remove a global or member method, then I think it'll only cause a problem if that method is being used outside the DLL. – Bantamweight 10/5, 2016 at 22:26

C++ doesn't say.

Visual Studio generally follows COM rules, allowing you to add virtual methods to the end of your most derived class unless they are overloads.

Any non-static data member will change the binary layout as well.
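A small sketch of the data-member point, using two hypothetical versions of a struct (`PointV1` and `PointV2` are made-up names): inserting a member changes both the size of the object and the offset of the members after it, and an old executable has the old values compiled in.

```cpp
#include <cassert>
#include <cstddef>

// Layout the executable was compiled against.
struct PointV1 {
    int x;
    int y;
};

// Layout in the rebuilt DLL: a member was inserted before y.
struct PointV2 {
    int x;
    int z;
    int y;
};

// The object grew, so old code allocates too little space for it.
static_assert(sizeof(PointV2) > sizeof(PointV1), "size changed");

// y moved, so old code reads and writes it at the wrong offset.
static_assert(offsetof(PointV2, y) > offsetof(PointV1, y), "offset of y changed");

void demoLayout() {
    // Where old code looks for y, new code has stored z.
    assert(offsetof(PointV1, y) == offsetof(PointV2, z));
}
```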

Non-virtual functions don't affect binary compatibility.

Templates make a huge mess because of name mangling.

Your best bet to retain binary compatibility is to use both the pimpl idiom and the NVI (non-virtual interface) idiom quite liberally.
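For readers unfamiliar with the two idioms, here is a minimal sketch that combines them. The class `Greeter` and its members are hypothetical and not taken from the question; in a real library the first part would live in the public header and the second part in a source file inside the DLL.

```cpp
#include <cassert>
#include <memory>
#include <string>

// --- What would go in the public header ---
class Greeter {
public:
    Greeter();
    virtual ~Greeter();

    // NVI: the public entry point is non-virtual and forwards to a
    // private virtual customisation point.
    std::string saySomething() const;

private:
    virtual std::string doSaySomething() const;

    // Pimpl: data members live behind an opaque pointer, so the size of
    // Greeter stays the same when the implementation gains new fields.
    struct Impl;
    std::unique_ptr<Impl> impl_;
};

// --- What would go in the source file inside the DLL ---
struct Greeter::Impl {
    std::string greeting = "hello";
};

Greeter::Greeter() : impl_(new Impl) {}
Greeter::~Greeter() = default;  // defined here, where Impl is complete

std::string Greeter::saySomething() const { return doSaySomething(); }
std::string Greeter::doSaySomething() const { return impl_->greeting; }

void demoGreeter() {
    Greeter g;
    assert(g.saySomething() == "hello");
}
```

With this shape, new fields can be added to `Impl` without touching the layout clients were compiled against. Note that adding or reordering virtual functions still changes the vtable, so NVI narrows what clients depend on directly but does not remove that constraint.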

Demavend answered 10/5, 2016 at 21:57 Comment(7)

Non-virtual functions can affect binary compatibility, if you change their signature. – Soke 10/5, 2016 at 21:58

Also inline definitions - if changed - won't be updated where they were compiled into calling code - this is not just for members but also for top-level functions that use the changed class (e.g., operators). – Helper 10/5, 2016 at 22:1

@Crazy Eddie I did add virtual methods before and after a method that was called from my executable without any issue. It was appearing to work correctly. – Impuissant 10/5, 2016 at 22:7

Silently calling the wrong function doesn't seem to qualify as, "working correctly," to me...but to each their own I guess. – Demavend 10/5, 2016 at 22:15

@Helper - inlines aren't part of the binary interface. Anything they use is, so they break encapsulation, but they themselves are not. – Demavend 10/5, 2016 at 22:16

@CrazyEddie I meant virtual method from the class itself, not from a pure virtual interface. – Impuissant 10/5, 2016 at 22:17

@CrazyEddie - you are of course correct, but treating this more as a practical question of what I can change and expect to work or break, and considering that we generally program against header files, even across DLLs, I thought it was fair to point out that inlines, while they don't break the layout of either the class or the virtual table, will in fact "break compatibility" in the sense that your classes in the DLL won't work the way you expect against code that hasn't been recompiled. The "breaks encapsulation" vs. "binary compatibility" distinction is subtle ... – Helper 10/5, 2016 at 22:21
