As sergej already mentioned, it's the Pimpl idiom, which is also a subset of the Bridge design pattern.
But I wanted to lend a C perspective to this topic. As I got more into C++, I was surprised that this practice had such a name attached to it, since a similar practice was applied in C with similar pros and cons (plus one extra pro due to something lacking in C).
C Perspective
In C, it's fairly common practice to have an opaque pointer to a forward-declared struct, like so:
// Foo.h:
#ifndef FOO_H
#define FOO_H

struct Foo; // opaque: the definition lives in Foo.c

struct Foo* foo_create(void);
void foo_destroy(struct Foo* foo);
void foo_do_something(struct Foo* foo);

#endif
// Foo.c:
#include <stdlib.h> // for malloc/free
#include "Foo.h"

struct Foo
{
    // ...
};

struct Foo* foo_create(void)
{
    return malloc(sizeof(struct Foo));
}

void foo_destroy(struct Foo* foo)
{
    free(foo);
}

void foo_do_something(struct Foo* foo)
{
    // Do something with foo's state.
}
This carries similar pros/cons to the Pimpl, yet with one additional pro for C. In C, there is no private specifier for structs, making this the sole way to hide information and prevent access to struct internals from the outside world; there it became both a means of hiding internals and of preventing access to them.
In C++ there is that nice private specifier allowing us to prevent access to internals, yet we can't hide their visibility completely from the outside world unless we use something like a Pimpl, which basically wraps this C idea of an opaque pointer to a forward-declared UDT in the form of a class with one or more constructors and a destructor.
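A minimal C++ sketch of that wrapping (the Widget name and its members are illustrative, and the header and source are shown in one listing for brevity; the small value() accessor is only there to make the effect observable):

```cpp
// Widget.h (the public header): no internals are visible to clients.
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();             // defined in Widget.cpp, where Impl is complete
    void do_something();
    int value() const;
private:
    struct Impl;                   // forward-declared; defined in Widget.cpp
    std::unique_ptr<Impl> pimpl_;
};

// Widget.cpp: the only file that ever sees the internals.
struct Widget::Impl {
    int state = 0;
};

Widget::Widget() : pimpl_(new Impl) {}
Widget::~Widget() = default;       // Impl is complete here, so unique_ptr can delete it

void Widget::do_something() { ++pimpl_->state; }
int Widget::value() const { return pimpl_->state; }
```

Note the destructor is declared in the header but defined in the source file; with std::unique_ptr, the deletion must happen where Impl is a complete type.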
Efficiency
Perhaps one of the most glaring cons independent of a unique context is that this kind of representation splits what could be a single contiguous memory block into two chunks, one for the pointer and another for the data fields, like so:
[Opaque Pointer]-------------------------->[Internal Data Fields]
... this is often described as introducing an additional level of indirection, but it's not the indirection that's the performance issue here so much as the degradation to the locality of reference, and the additional compulsory cache misses and page faults when heap-allocating and accessing these internals for the first time.
With this representation, we also can no longer simply allocate everything we need on the stack. Only the pointer can live on the stack, while the internals must be allocated on the heap.
The performance cost associated with this tends to be the most pronounced if we're storing an array of a bunch of these handles (in C, the opaque pointer itself, in C++, an object containing one). In such a case, we end up with an array of, say, a million pointers which could potentially point all over the place, and we end up paying for it in the form of increased page faults and cache misses and heap (free store) allocation/deallocation overhead.
This can end up leaving us with performance analogous to Java storing a generic list of a million instances of user-defined types and processing them sequentially (runs and hides).
Efficiency: Fixed Allocator
One way to significantly mitigate (but not eliminate) this cost is to use, say, an O(1) fixed allocator for the internals. This helps most in cases where we're working with an array of Foos, since such an allocator lets the Foo internals be stored in a (more) contiguous memory layout, improving locality of reference.
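As a rough illustration of the idea (a toy, not a production allocator: it assumes blocks of at least sizeof(void*) bytes and glosses over alignment subtleties), such a pool can hand out blocks in O(1) from one contiguous slab via an intrusive free list:

```cpp
#include <cstddef>
#include <vector>

class FixedPool {
public:
    FixedPool(std::size_t block_size, std::size_t block_count)
        : slab_(block_size * block_count), block_size_(block_size) {
        // Thread every block onto the free list up front.
        for (std::size_t i = 0; i < block_count; ++i)
            push(slab_.data() + i * block_size_);
    }
    void* allocate() {                  // O(1): pop the free-list head
        if (!head_) return nullptr;     // pool exhausted
        void* p = head_;
        head_ = *static_cast<void**>(head_);
        return p;
    }
    void deallocate(void* p) { push(p); }  // O(1): push back onto the list
private:
    // Store the next-pointer inside the free block itself (intrusive list).
    void push(void* p) { *static_cast<void**>(p) = head_; head_ = p; }
    std::vector<char> slab_;            // one contiguous chunk for all blocks
    std::size_t block_size_;
    void* head_ = nullptr;
};
```

Every block handed out comes from the same slab, so a million Foo internals end up packed together instead of scattered across the heap.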
Efficiency: Bulk Interface
An approach that embraces a very different design mindset is to start modeling your public interfaces at a coarser level to be Foo aggregates (an interface for a container of Foo instances), and to hide the ability to even instantiate a Foo individually from the outside world. This is only appropriate in some scenarios, but in such cases we can reduce the cost to a single pointer indirection for the entire container, which becomes practically free if the public interface consists of high-level algorithms operating on many hidden Foo objects at once.
As a blatant example of this (though hopefully no one ever does it), we don't want to use a Pimpl strategy to hide the details of a single pixel of an image. Instead we want to model our interface at the level of the whole image, which consists of a bunch of pixels, with public operations that apply to many pixels at once. The same idea applies to a single particle vs. a particle system, or even possibly a single sprite in a video game. We can always bulk up our interfaces if we find ourselves with performance hotspots from modeling things at too granular a level and paying memory or abstraction penalties (dynamic dispatch, e.g.) for it.
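A sketch of what that coarser interface might look like (Image, brighten, and pixel are illustrative names; header and source are merged into one listing). One opaque pointer covers the whole container, while the pixels stay contiguous behind it:

```cpp
#include <memory>
#include <vector>

class Image {
public:
    Image(int width, int height);
    ~Image();
    void brighten(int amount);      // bulk operation over all pixels at once
    int pixel(int x, int y) const;  // occasional element access
private:
    struct Impl;                    // no Pixel type is exposed at all
    std::unique_ptr<Impl> impl_;    // single indirection for the entire image
};

// Implementation side: pixels live in one contiguous vector.
struct Image::Impl {
    int width, height;
    std::vector<int> pixels;
};

Image::Image(int w, int h) : impl_(new Impl{w, h, std::vector<int>(w * h, 0)}) {}
Image::~Image() = default;

void Image::brighten(int amount) {
    for (int& p : impl_->pixels) p += amount;  // tight loop, good locality
}

int Image::pixel(int x, int y) const { return impl_->pixels[y * impl_->width + x]; }
```

The client never instantiates a pixel; it only invokes image-level operations, so the pointer hop is paid once per bulk call rather than once per pixel.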
"If you want peak performance, you gotta get pumped! Bulk up those interfaces! Get to the choppa!" -- Imaginary Arnie advice after putting a screwdriver through someone's jugular.
Lighter Headers
As can be seen, these practices completely hide the internals of a class or struct from the outside world. From a compile-time and header standpoint, this also serves as a decoupling mechanism.
When the internals of a Foo are no longer visible to the outside world through a header file, build times go down immediately just for having a smaller header. Perhaps more significantly, the internals of Foo may require other header files to be included, like Bar.h. By hiding away the internals, Foo.h no longer needs to include Bar.h (only Foo.cpp would include it). Since Bar.h might also include other headers with a cascading effect, this can dramatically reduce the amount of work required of the preprocessor and make our header substantially more lightweight than it was before the use of a Pimpl.
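To make the decoupling concrete, here is a sketch (with Bar defined inline purely so the listing is self-contained; in real code these would be four separate files). Foo.h never mentions Bar, so only Foo.cpp pays for Bar.h:

```cpp
// Bar.h stand-in: imagine a heavy header that pulls in many others.
struct Bar { int data = 0; };

// Foo.h: note there is no #include "Bar.h" and no mention of Bar anywhere.
#include <memory>

class Foo {
public:
    Foo();
    ~Foo();
    int bar_data() const;
private:
    struct Impl;                  // Bar lives inside Impl, out of sight
    std::unique_ptr<Impl> impl_;
};

// Foo.cpp: the only translation unit that would include Bar.h.
struct Foo::Impl { Bar bar; };
Foo::Foo() : impl_(new Impl) {}
Foo::~Foo() = default;
int Foo::bar_data() const { return impl_->bar.data; }
```

Clients including Foo.h never preprocess Bar.h or anything it drags in, which is exactly where the cascading build savings come from.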
So while Pimpls have some runtime cost, they reduce build-time cost. Even in the most performance-critical fields, the majority of a complex codebase will favor productivity over the utmost runtime efficiency. From a productivity perspective, lengthy build times can be a killer, so trading a slight runtime performance degradation for build performance can be a good trade-off.
Cascading Changes
Furthermore, by hiding the visibility of the internals of Foo, changes made to them no longer impact its header file. This lets us simply change Foo.cpp, e.g., to change the internals of Foo, with only that one source file needing to be recompiled. This also relates to build times, but specifically in the context of small (possibly very small) changes, where having to recompile all kinds of things might be a real PITA.
As a bonus, this might also improve the sanity of all of your teammates in a team setting if they don't have to recompile everything for some small change to the private details of some class.
With this, everyone can potentially get their work done at a faster pace, leaving more time in their schedule to visit their favorite bar and get hammered and so forth.
API and ABI
One less obvious pro (but quite a significant one in an API context) arises when you are exposing an API for plugin developers, including third parties writing source code outside of your control. In such a case, if you expose the internal state of a class or struct such that the handles accessed by the plugins include these internals directly, we end up with a very fragile ABI. Binary dependencies start to look like this:
[Plugin Developer]----------------->[Internal Data Fields]
One of the biggest problems here is that any change to these internal states breaks the ABI that plugins directly depend upon to work. Practical result: we end up with a bunch of plugin binaries, written by possibly all kinds of people, which no longer work with our product until new versions are published against the new ABI.
Here an opaque pointer (Pimpl included) introduces an intermediary which protects us from such ABI breakages:
[Plugin Developer]----->[Opaque Pointer]----->[Internal Data Fields]
... and that can go a very long way towards backwards plugin compatibility, since you're now free to change private internals without risking such plugin breakages.
Pros and Cons
Here is a summary of the pros and cons along with a few additional, minor ones:
Pros:
- Results in lightweight headers.
- Mitigates cascading build changes. Internals can be changed while impacting only one compilation unit (aka translation unit, i.e. source file) as opposed to many.
- Hides internals which can be beneficial even from an aesthetic/documentation perspective (don't show clients using a public interface more than they need to see in order to use it).
- Prevents clients from depending on fragile ABIs which would break the moment a single internal detail is modified, mitigating cascading breakages to binaries as a result of an ABI change.
Cons:
- Costs some runtime efficiency (mitigated by bulkier interfaces or efficient fixed allocators).
- Minor: slightly more boilerplate code to read/write for implementors (though no duplication of any non-trivial logic).
- Cannot be applied to class templates, which require their complete definition to be visible at every site where code is generated from them.
TL;DR
So anyway, above is a brief introduction to this idiom along with some history and parallels to practices predating it in C.