As sergej already mentioned, it's the Pimpl idiom, which is also a subset of the Bridge design pattern.
But I wanted to lend a C perspective to this topic. As I got more into C++, I was surprised that this practice had such a name attached to it, since a similar practice was applied in C with similar pros and cons (plus one extra pro due to something lacking in C).
C Perspective
In C, it's fairly common practice to have an opaque pointer to a forward-declared struct, like so:
// Foo.h:
#ifndef FOO_H
#define FOO_H

struct Foo; // opaque: the definition lives in Foo.c

struct Foo* foo_create(void);
void foo_destroy(struct Foo* foo);
void foo_do_something(struct Foo* foo);

#endif
// Foo.c:
#include <stdlib.h> // for malloc/free
#include "Foo.h"

struct Foo
{
    // ...
};

struct Foo* foo_create(void)
{
    return malloc(sizeof(struct Foo));
}

void foo_destroy(struct Foo* foo)
{
    free(foo);
}

void foo_do_something(struct Foo* foo)
{
    // Do something with foo's state.
}
This carries similar pros/cons to the Pimpl, yet with one additional pro for C. In C, there is no private specifier for structs, making this the sole way to hide information and prevent access to struct internals from the outside world; there it became both a means of hiding internals and of preventing access to them.
In C++ there is that nice private specifier allowing us to prevent access to internals, yet we can't hide their visibility completely from the outside world unless we use something like a Pimpl, which basically wraps this C idea of an opaque pointer to a forward-declared UDT in the form of a class with one or more constructors and a destructor.
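A minimal C++ sketch of that wrapping (the Widget name and its members are illustrative, and the header and source are shown in one listing for brevity; the small value() accessor is only there to make the effect observable):

```cpp
// Widget.h (the public header): no internals are visible to clients.
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();             // defined in Widget.cpp, where Impl is complete
    void do_something();
    int value() const;
private:
    struct Impl;                   // forward-declared; defined in Widget.cpp
    std::unique_ptr<Impl> pimpl_;
};

// Widget.cpp: the only file that ever sees the internals.
struct Widget::Impl {
    int state = 0;
};

Widget::Widget() : pimpl_(new Impl) {}
Widget::~Widget() = default;       // Impl is complete here, so unique_ptr can delete it

void Widget::do_something() { ++pimpl_->state; }
int Widget::value() const { return pimpl_->state; }
```

Note the destructor is declared in the header but defined in the source file; with std::unique_ptr, the deletion must happen where Impl is a complete type.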
Efficiency
Perhaps one of the most glaring cons independent of a unique context is that this kind of representation splits what could be a single contiguous memory block into two chunks, one for the pointer and another for the data fields, like so:
[Opaque Pointer]-------------------------->[Internal Data Fields]
... this is often described as introducing an additional level of indirection, but it's not the indirection that's the performance issue here so much as the degradation to the locality of reference, and the additional compulsory cache misses and page faults when heap-allocating and accessing these internals for the first time.
With this representation, we also can no longer simply allocate everything we need on the stack. Only the pointer can live on the stack, while the internals must be allocated on the heap.
The performance cost associated with this tends to be the most pronounced if we're storing an array of a bunch of these handles (in C, the opaque pointer itself, in C++, an object containing one). In such a case, we end up with an array of, say, a million pointers which could potentially point all over the place, and we end up paying for it in the form of increased page faults and cache misses and heap (free store) allocation/deallocation overhead.
This can end up leaving us with performance analogous to Java storing a generic list of a million instances of user-defined types and processing them sequentially (runs and hides).
Efficiency: Fixed Allocator
One way to significantly mitigate (but not eliminate) this cost is to use, say, an O(1) fixed allocator for the internals. This helps most in cases where we're working with an array of Foos, since such an allocator lets the Foo internals be stored in a (more) contiguous memory layout, improving locality of reference.
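As a rough illustration of the idea (a toy, not a production allocator: it assumes blocks of at least sizeof(void*) bytes and glosses over alignment subtleties), such a pool can hand out blocks in O(1) from one contiguous slab via an intrusive free list:

```cpp
#include <cstddef>
#include <vector>

class FixedPool {
public:
    FixedPool(std::size_t block_size, std::size_t block_count)
        : slab_(block_size * block_count), block_size_(block_size) {
        // Thread every block onto the free list up front.
        for (std::size_t i = 0; i < block_count; ++i)
            push(slab_.data() + i * block_size_);
    }
    void* allocate() {                  // O(1): pop the free-list head
        if (!head_) return nullptr;     // pool exhausted
        void* p = head_;
        head_ = *static_cast<void**>(head_);
        return p;
    }
    void deallocate(void* p) { push(p); }  // O(1): push back onto the list
private:
    // Store the next-pointer inside the free block itself (intrusive list).
    void push(void* p) { *static_cast<void**>(p) = head_; head_ = p; }
    std::vector<char> slab_;            // one contiguous chunk for all blocks
    std::size_t block_size_;
    void* head_ = nullptr;
};
```

Every block handed out comes from the same slab, so a million Foo internals end up packed together instead of scattered across the heap.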
Efficiency: Bulk Interface
An approach that embraces a very different design mindset is to start modeling your public interfaces at a coarser level to be Foo aggregates (an interface for a container of Foo instances), and to hide the ability to even instantiate a Foo individually from the outside world. This is only appropriate in some scenarios, but in such cases we can reduce the cost to a single pointer indirection for the entire container, which becomes practically free if the public interface consists of high-level algorithms operating on many hidden Foo objects at once.
As a blatant example of this (though hopefully no one ever does it), we don't want to use a Pimpl strategy to hide the details of a single pixel of an image. Instead we want to model our interface at the level of the whole image, which consists of a bunch of pixels, with public operations that apply to many pixels at once. The same idea applies to a single particle vs. a particle system, or even possibly a single sprite in a video game. We can always bulk up our interfaces if we find ourselves with performance hotspots from modeling things at too granular a level and paying memory or abstraction penalties (dynamic dispatch, e.g.) for it.
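A sketch of what that coarser interface might look like (Image, brighten, and pixel are illustrative names; header and source are merged into one listing). One opaque pointer covers the whole container, while the pixels stay contiguous behind it:

```cpp
#include <memory>
#include <vector>

class Image {
public:
    Image(int width, int height);
    ~Image();
    void brighten(int amount);      // bulk operation over all pixels at once
    int pixel(int x, int y) const;  // occasional element access
private:
    struct Impl;                    // no Pixel type is exposed at all
    std::unique_ptr<Impl> impl_;    // single indirection for the entire image
};

// Implementation side: pixels live in one contiguous vector.
struct Image::Impl {
    int width, height;
    std::vector<int> pixels;
};

Image::Image(int w, int h) : impl_(new Impl{w, h, std::vector<int>(w * h, 0)}) {}
Image::~Image() = default;

void Image::brighten(int amount) {
    for (int& p : impl_->pixels) p += amount;  // tight loop, good locality
}

int Image::pixel(int x, int y) const { return impl_->pixels[y * impl_->width + x]; }
```

The client never instantiates a pixel; it only invokes image-level operations, so the pointer hop is paid once per bulk call rather than once per pixel.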
"If you want peak performance, you gotta get pumped! Bulk up those interfaces! Get to the choppa!" -- Imaginary Arnie advice after putting a screwdriver through someone's jugular.
Lighter Headers
As can be seen, these practices completely hide the internals of a class or struct from the outside world. From a compile-time and header standpoint, this also serves as a decoupling mechanism.
When the internals of a Foo are no longer visible to the outside world through a header file, build times go down immediately just for having a smaller header. Perhaps more significantly, the internals of Foo may require other header files to be included, like Bar.h. By hiding away the internals, Foo.h no longer needs to include Bar.h (only Foo.cpp would include it). Since Bar.h might also include other headers with a cascading effect, this can dramatically reduce the amount of work required of the preprocessor and make our header substantially more lightweight than it was before the use of a Pimpl.
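To make the decoupling concrete, here is a sketch (with Bar defined inline purely so the listing is self-contained; in real code these would be four separate files). Foo.h never mentions Bar, so only Foo.cpp pays for Bar.h:

```cpp
// Bar.h stand-in: imagine a heavy header that pulls in many others.
struct Bar { int data = 0; };

// Foo.h: note there is no #include "Bar.h" and no mention of Bar anywhere.
#include <memory>

class Foo {
public:
    Foo();
    ~Foo();
    int bar_data() const;
private:
    struct Impl;                  // Bar lives inside Impl, out of sight
    std::unique_ptr<Impl> impl_;
};

// Foo.cpp: the only translation unit that would include Bar.h.
struct Foo::Impl { Bar bar; };
Foo::Foo() : impl_(new Impl) {}
Foo::~Foo() = default;
int Foo::bar_data() const { return impl_->bar.data; }
```

Clients including Foo.h never preprocess Bar.h or anything it drags in, which is exactly where the cascading build savings come from.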
So while Pimpls have some runtime cost, they reduce build-time cost. Even in the most performance-critical fields, the majority of a complex codebase will favor productivity over the utmost runtime efficiency. From a productivity perspective, lengthy build times can be a killer, so trading a slight runtime performance degradation for build performance can be a good trade-off.
Cascading Changes
Furthermore, by hiding the visibility of the internals of Foo, changes made to them no longer impact its header file. This lets us simply change Foo.cpp, e.g., to change the internals of Foo, with only that one source file needing to be recompiled. This also relates to build times, but specifically in the context of small (possibly very small) changes, where having to recompile all kinds of things might be a real PITA.
As a bonus, this might also improve the sanity of all of your teammates in a team setting if they don't have to recompile everything for some small change to the private details of some class.
With this, everyone can potentially get their work done at a faster pace, leaving more time in their schedule to visit their favorite bar and get hammered and so forth.
API and ABI
One less obvious pro (but quite a significant one in an API context) arises when you are exposing an API for plugin developers, including third parties writing source code outside of your control. In such a case, if you expose the internal state of a class or struct such that the handles accessed by the plugins include these internals directly, we end up with a very fragile ABI. Binary dependencies start to look like this:
[Plugin Developer]----------------->[Internal Data Fields]
One of the biggest problems here is that any change to these internal states breaks the ABI that plugins directly depend upon to work. Practical result: we end up with a bunch of plugin binaries, written by possibly all kinds of people, which no longer work with our product until new versions are published against the new ABI.
Here an opaque pointer (Pimpl included) introduces an intermediary which protects us from such ABI breakages:
[Plugin Developer]----->[Opaque Pointer]----->[Internal Data Fields]
... and that can go a very long way towards backwards plugin compatibility, since you're now free to change private internals without risking such plugin breakages.
Pros and Cons
Here is a summary of the pros and cons along with a few additional, minor ones:
Pros:
- Results in lightweight headers.
- Mitigates cascading build changes. Internals can be changed while impacting only one compilation unit (aka translation unit, i.e. source file) as opposed to many.
- Hides internals which can be beneficial even from an aesthetic/documentation perspective (don't show clients using a public interface more than they need to see in order to use it).
- Prevents clients from depending on fragile ABIs which would break the moment a single internal detail is modified, mitigating cascading breakages to binaries as a result of an ABI change.
Cons:
- Costs some runtime efficiency (mitigated by bulkier interfaces or efficient fixed allocators).
- Minor: slightly more boilerplate code to read/write for implementors (though no duplication of any non-trivial logic).
- Cannot be applied to class templates, which require their complete definition to be visible at every site where code is generated from them.
TL;DR
So anyway, above is a brief introduction to this idiom along with some history and parallels to practices predating it in C.