Why aren't my include guards preventing recursive inclusion and multiple symbol definitions?

Two common questions about include guards:

  1. FIRST QUESTION:

    Why aren't include guards protecting my header files from mutual, recursive inclusion? I keep getting errors about non-existing symbols which are obviously there or even weirder syntax errors every time I write something like the following:

    "a.h"

    #ifndef A_H
    #define A_H
    
    #include "b.h"
    
    ...
    
    #endif // A_H
    

    "b.h"

    #ifndef B_H
    #define B_H
    
    #include "a.h"
    
    ...
    
    #endif // B_H
    

    "main.cpp"

    #include "a.h"
    int main()
    {
        ...
    }
    

    Why do I get errors compiling "main.cpp"? What do I need to do to solve my problem?


  2. SECOND QUESTION:

    Why aren't include guards preventing multiple definitions? For instance, when my project contains two files that include the same header, sometimes the linker complains about some symbol being defined multiple times. For instance:

    "header.h"

    #ifndef HEADER_H
    #define HEADER_H
    
    int f()
    {
        return 0;
    }
    
    #endif // HEADER_H
    

    "source1.cpp"

    #include "header.h"
    ...
    

    "source2.cpp"

    #include "header.h"
    ...
    

    Why is this happening? What do I need to do to solve my problem?

Ladonna answered 16/2, 2013 at 11:55 Comment(11)
I don't see how this is different than #554182 and #14425762Epidote
@LuchianGrigore: The first Q&A is not directly related to include guards, or at least IMO it does not explain why include guards give troubles with dependencies. The second one does address one of the two questions (the second one), but in a less extensive and detailed way. I wanted to group these two Q&As about include guards together because it seems to me they are tightly related.Ladonna
I can't see why this should be an FAQ. If you feel like I removed that tag wrongly, feel free to discuss this in the C++ chat room.Knecht
@sbi: I'm fine with you removing the tag, no problem. I just thought that since it's a frequently asked question about C++, it should be tagged as faq-c++.Ladonna
@Andy: The Stackoverflow C++ FAQ tag was brought into life to address the problem of recurring C++ questions on Stackoverflow. From my POV, this isn't such a question. ICBWT.Knecht
@sbi: Well, in the last few days I've seen at least 4 questions on SO from beginners puzzled by multiple definitions or mutual inclusions, so from my POV it is a recurring question. That's why I've bothered to write this whole thing in the first place: why would I write a Q&A for beginners otherwise? But of course, I understand that everyone has a subjective perception of what's "frequent", and my perception might not match yours. Although I still believe this should be tagged as c++-faq, I have nothing against a higher-rep user with more experience to enforce his view.Ladonna
@Andy I hardly answer any questions anymore nowadays, so I could be wrong. As I said: Take this to the chat. If you find a majority there, I'm fine with the tag.Knecht
@sbi: As I wrote, I'm fine with it, I don't really mind and I don't want to bother people in the chat with something they likely don't care about. However, if it is really the case that you no longer answer questions here or follow which ones are frequently asked, then I think you should have done the poll yourself before removing the tag.Ladonna
seems like a FAQ to meTiemannite
Seems like a FAQ to me too, but it is already answered (granted the explanation here is a lot more thorough and better). Consider closing the other two questions as dupes of this one.Epidote
Consider using #pragma once instead of header guards. It's not part of the standard, but it is supported by pretty much any compiler you would use (and many you won't): en.wikipedia.org/wiki/Pragma_once#Portability Amal

FIRST QUESTION:

Why aren't include guards protecting my header files from mutual, recursive inclusion?

They are.

What they are not helping with is dependencies between the definitions of data structures in mutually-including headers. To see what this means, let's start with a basic scenario and see why include guards do help with mutual inclusions.

Suppose your mutually including a.h and b.h header files have trivial content, i.e. the ellipses in the code sections from the question's text are replaced with the empty string. In this situation, your main.cpp will happily compile. And this is only thanks to your include guards!

If you're not convinced, try removing them:

//================================================
// a.h

#include "b.h"

//================================================
// b.h

#include "a.h"

//================================================
// main.cpp
//
// Good luck getting this to compile...

#include "a.h"
int main()
{
    ...
}

You'll notice that the compiler will report a failure when it reaches the inclusion depth limit. This limit is implementation-specific. Per Paragraph 16.2/6 of the C++11 Standard:

A #include preprocessing directive may appear in a source file that has been read because of a #include directive in another file, up to an implementation-defined nesting limit.

So what's going on?

  1. When parsing main.cpp, the preprocessor will meet the directive #include "a.h". This directive tells the preprocessor to process the header file a.h, take the result of that processing, and replace the string #include "a.h" with that result;
  2. While processing a.h, the preprocessor will meet the directive #include "b.h", and the same mechanism applies: the preprocessor shall process the header file b.h, take the result of its processing, and replace the #include directive with that result;
  3. When processing b.h, the directive #include "a.h" will tell the preprocessor to process a.h and replace that directive with the result;
  4. The preprocessor will start parsing a.h again, will meet the #include "b.h" directive again, and this will set up a potentially infinite recursive process. When reaching the critical nesting level, the compiler will report an error.

When include guards are present, however, no infinite recursion will be set up in step 4. Let's see why:

  1. (same as before) When parsing main.cpp, the preprocessor will meet the directive #include "a.h". This tells the preprocessor to process the header file a.h, take the result of that processing, and replace the string #include "a.h" with that result;
  2. While processing a.h, the preprocessor will meet the directive #ifndef A_H. Since the macro A_H has not yet been defined, it will keep processing the following text. The subsequent directive (#define A_H) defines the macro A_H. Then, the preprocessor will meet the directive #include "b.h": the preprocessor shall now process the header file b.h, take the result of its processing, and replace the #include directive with that result;
  3. When processing b.h, the preprocessor will meet the directive #ifndef B_H. Since the macro B_H has not yet been defined, it will keep processing the following text. The subsequent directive (#define B_H) defines the macro B_H. Then, the directive #include "a.h" will tell the preprocessor to process a.h and replace the #include directive in b.h with the result of preprocessing a.h;
  4. The preprocessor will start processing a.h again, and meet the #ifndef A_H directive again. However, during the previous pass, macro A_H was defined. Therefore, the preprocessor will skip the following text this time until the matching #endif directive is found, and the output of this processing is the empty string (supposing nothing follows the #endif directive, of course). The preprocessor will therefore replace the #include "a.h" directive in b.h with the empty string, and will trace back the execution until it replaces the original #include directive in main.cpp.
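The four steps above can be replayed with a toy model. This is only a sketch and not how a real preprocessor is implemented: the names Header, Preprocessor, make_example and trace are made up for illustration, and max_depth is an arbitrary stand-in for the implementation-defined nesting limit.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy description of a header: its guard macro (empty means "no guard")
// and the headers it #includes. Purely illustrative.
struct Header
{
    std::string guard;
    std::vector<std::string> includes;
};

struct Preprocessor
{
    std::map<std::string, Header> files;
    std::set<std::string> defined;   // macros defined so far in this translation unit
    std::vector<std::string> trace;  // headers whose body was actually processed
    int max_depth = 200;             // assumed value for the nesting limit

    // Models one #include directive. Returns false if the nesting limit is exceeded.
    bool include(const std::string& name, int depth = 1)
    {
        if (depth > max_depth)
            return false;

        auto it = files.find(name);
        assert(it != files.end() && "unknown header in this toy model");
        const Header& h = it->second;

        if (!h.guard.empty())
        {
            if (defined.count(h.guard))
                return true;          // #ifndef is false: skip to #endif, expand to nothing
            defined.insert(h.guard);  // #define GUARD
        }

        trace.push_back(name);
        for (const std::string& inc : h.includes)
            if (!include(inc, depth + 1))
                return false;
        return true;
    }
};

// Builds the mutually including a.h / b.h pair, with or without guards.
Preprocessor make_example(bool with_guards)
{
    Preprocessor pp;
    pp.files["a.h"] = Header{with_guards ? "A_H" : "", {"b.h"}};
    pp.files["b.h"] = Header{with_guards ? "B_H" : "", {"a.h"}};
    return pp;
}
```

With guards, include("a.h") succeeds and trace holds a.h followed by b.h, each processed exactly once. Without guards, the same call keeps nesting until max_depth is exceeded and then reports failure, which mirrors the compiler error described earlier.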

Thus, include guards do protect against mutual inclusion. However, they can't help with dependencies between the definitions of your classes in mutually-including files:

//================================================
// a.h

#ifndef A_H
#define A_H

#include "b.h"

struct A
{
};

#endif // A_H

//================================================
// b.h

#ifndef B_H
#define B_H

#include "a.h"

struct B
{
    A* pA;
};

#endif // B_H

//================================================
// main.cpp
//
// Good luck getting this to compile...

#include "a.h"
int main()
{
    ...
}

Given the above headers, main.cpp will not compile.

Why is this happening?

To see what's going on, it is enough to go through steps 1-4 again.

It is easy to see that the first three steps and most of the fourth step are unaffected by this change (just read through them to get convinced). However, something different happens at the end of step 4: after replacing the #include "a.h" directive in b.h with the empty string, the preprocessor will start parsing the content of b.h and, in particular, the definition of B. Unfortunately, the definition of B mentions class A, which has never been met before exactly because of the inclusion guards!

Declaring a member variable of a type which has not been previously declared is, of course, an error, and the compiler will politely point that out.
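As a rough sketch, this is the order in which the compiler ends up seeing the declarations of main.cpp once preprocessing is done. The comments marking where each piece comes from are illustrative, and SHOW_ERROR is a hypothetical macro (not part of the original code) that keeps the snippet compilable as written; defining it reproduces the error.

```cpp
// Approximately what main.cpp looks like after preprocessing.
// SHOW_ERROR is a hypothetical switch used only in this sketch.

// --- a.h starts: A_H is defined, then #include "b.h" is processed ---
// --- b.h starts: B_H is defined, then #include "a.h" expands to nothing ---

struct B
{
#ifdef SHOW_ERROR
    A* pA; // error: the name A has not been declared at this point
#endif
};

// --- back in a.h, right after its #include "b.h" directive ---

struct A
{
};
```

The definition of B comes first even though main.cpp only wrote #include "a.h", which is why A is unknown when B's member is declared.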

What do I need to do to solve my problem?

You need forward declarations.

In fact, the definition of class A is not required in order to define class B, because a pointer to A is being declared as a member variable, and not an object of type A. Since pointers have fixed size, the compiler won't need to know the exact layout of A nor to compute its size in order to properly define class B. Hence, it is enough to forward-declare class A in b.h and make the compiler aware of its existence:

//================================================
// b.h

#ifndef B_H
#define B_H

// Forward declaration of A: no need to #include "a.h"
struct A;

struct B
{
    A* pA;
};

#endif // B_H

Your main.cpp will now certainly compile. A couple of remarks:

  1. Breaking the mutual inclusion by replacing the #include directive in b.h with a forward declaration is not only enough to express the dependency of B on A: using forward declarations whenever possible/practical is also considered good programming practice, because it helps avoid unnecessary inclusions, thus reducing the overall compilation time. However, once a.h also stops including b.h (which it does not need here), main.cpp will have to #include both a.h and b.h (if the latter is needed at all), because b.h is no longer indirectly #included through a.h;
  2. While a forward declaration of class A is enough for the compiler to declare pointers to that class (or to use it in any other context where incomplete types are acceptable), dereferencing pointers to A (for instance to invoke a member function) or computing its size are illegal operations on incomplete types: if that is needed, the full definition of A needs to be available to the compiler, which means the header file that defines it must be included. This is why class definitions and the implementation of their member functions are usually split into a header file and an implementation file for that class (class templates are an exception to this rule): implementation files, which are never #included by other files in the project, can safely #include all the necessary headers to make definitions visible. Header files, on the other hand, won't #include other header files unless they really need to do so (for instance, to make the definition of a base class visible), and will use forward-declarations whenever possible/practical.
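The second remark can be sketched in a single file, with comments marking which file each part would normally live in. The member function value_of_a and the data member value are hypothetical additions made only for this illustration.

```cpp
#include <cassert>

// ---- b.h (sketch): only a pointer to A is needed, so a forward declaration suffices
struct A;

struct B
{
    A* pA;
    int value_of_a() const; // declared here, defined in the implementation file
};

// ---- a.h (sketch): the full definition of A
struct A
{
    int value; // hypothetical member
};

// ---- b.cpp (sketch): would #include "b.h" and "a.h", so A is complete from here on
int B::value_of_a() const
{
    assert(pA != nullptr);
    return pA->value; // dereferencing requires the complete type A
}
```

Moving the body of value_of_a into the definition of B in b.h would fail to compile, because A is still incomplete there.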

SECOND QUESTION:

Why aren't include guards preventing multiple definitions?

They are.

What they are not protecting you from is multiple definitions in separate translation units. This is also explained in this Q&A on StackOverflow.

To see that, try removing the include guards and compiling the following, modified version of source1.cpp (or source2.cpp, for that matter):

//================================================
// source1.cpp
//
// Good luck getting this to compile...

#include "header.h"
#include "header.h"

int main()
{
    ...
}

The compiler will certainly complain here about f() being redefined. That's obvious: its definition is being included twice! However, the above source1.cpp will compile without problems when header.h contains the proper include guards. That's expected.

Still, even when the include guards are present and the compiler stops bothering you with error messages, the linker will insist that multiple definitions were found when merging the object code obtained from the compilation of source1.cpp and source2.cpp, and will refuse to generate your executable.

Why is this happening?

Basically, each .cpp file (the technical term in this context is translation unit) in your project is compiled separately and independently. When parsing a .cpp file, the preprocessor will process all the #include directives and expand all macro invocations it encounters, and the output of this pure text processing will be given as input to the compiler to be translated into object code. Once the compiler is done producing the object code for one translation unit, it will proceed with the next one, and all the macro definitions that have been encountered while processing the previous translation unit will be forgotten.

In fact, compiling a project with n translation units (.cpp files) is like executing the same program (the compiler) n times, each time with a different input: different executions of the same program won't share the state of the previous program execution(s). Thus, each translation is performed independently and the preprocessor symbols encountered while compiling one translation unit will not be remembered when compiling other translation units (if you think about it for a moment, you will easily realize that this is actually a desirable behavior).

Therefore, even though include guards help you prevent recursive mutual inclusions and redundant inclusions of the same header in one translation unit, they can't detect whether the same definition is included in different translation units.

Yet, when merging the object code generated from the compilation of all the .cpp files of your project, the linker will see that the same symbol is defined more than once, and this violates the One Definition Rule. Per Paragraph 3.2/3 of the C++11 Standard:

Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program; no diagnostic required. The definition can appear explicitly in the program, it can be found in the standard or a user-defined library, or (when appropriate) it is implicitly defined (see 12.1, 12.4 and 12.8). An inline function shall be defined in every translation unit in which it is odr-used.

Hence, the linker will emit an error and refuse to generate the executable of your program.

What do I need to do to solve my problem?

If you want to keep your function definition in a header file that is #included by multiple translation units (notice that no problem will arise if your header is #included by just one translation unit), you need to use the inline keyword.

Otherwise, you need to keep only the declaration of your function in header.h, putting its definition (body) into one separate .cpp file only (this is the classical approach).
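A minimal sketch of the classical approach, shown in one listing for brevity. The file name header.cpp is an assumption; any single .cpp file in the project can hold the definition.

```cpp
// ---- header.h (sketch)
#ifndef HEADER_H
#define HEADER_H

int f(); // declaration only: safe to #include from any number of .cpp files

#endif // HEADER_H

// ---- header.cpp (hypothetical file name): the one and only definition
int f()
{
    return 0;
}
```

Both source1.cpp and source2.cpp can then #include "header.h" and call f(), while the linker sees exactly one definition of the symbol.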

The inline keyword represents a non-binding request to the compiler to inline the function's body directly at the call site, rather than setting up a stack frame for a regular function call. Although the compiler doesn't have to fulfill your request, the inline keyword does succeed in telling the linker to tolerate multiple symbol definitions. According to Paragraph 3.2/5 of the C++11 Standard:

There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14), non-static function template (14.5.6), static data member of a class template (14.5.1.3), member function of a class template (14.5.1.1), or template specialization for which some template parameters are not specified (14.7, 14.5.5) in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements [...]

The above Paragraph basically lists all the definitions which are commonly put in header files, because they can be safely included in multiple translation units. All other definitions with external linkage, instead, belong in source files.
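Applied to the header from the question, the inline variant might look like the sketch below. Only the keyword is new; the rest is the original header.h.

```cpp
// ---- header.h (sketch): may be #included by source1.cpp and source2.cpp alike
#ifndef HEADER_H
#define HEADER_H

// inline allows one identical definition of f in each translation unit
// that includes this header; the program still ends up with a single f.
inline int f()
{
    return 0;
}

#endif // HEADER_H
```

The include guards are still needed here: they keep the definition from appearing twice within the same translation unit.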

Using the static keyword instead of the inline keyword also results in suppressing linker errors by giving your function internal linkage, thus making each translation unit hold a private copy of that function (and of its local static variables). However, this eventually results in a larger executable, and the use of inline should be preferred in general.

An alternative way of achieving the same result as with the static keyword is to put function f() in an unnamed namespace. Per Paragraph 3.5/4 of the C++11 Standard:

An unnamed namespace or a namespace declared directly or indirectly within an unnamed namespace has internal linkage. All other namespaces have external linkage. A name having namespace scope that has not been given internal linkage above has the same linkage as the enclosing namespace if it is the name of:

— a variable; or

— a function; or

— a named class (Clause 9), or an unnamed class defined in a typedef declaration in which the class has the typedef name for linkage purposes (7.1.3); or

— a named enumeration (7.2), or an unnamed enumeration defined in a typedef declaration in which the enumeration has the typedef name for linkage purposes (7.1.3); or

— an enumerator belonging to an enumeration with linkage; or

— a template.

For the same reason mentioned above, the inline keyword should be preferred.
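For comparison, here is a sketch of the two internal-linkage alternatives in one header. The function g is a hypothetical second function, introduced only so that both forms fit in the same listing.

```cpp
// ---- header.h (sketch)
#ifndef HEADER_H
#define HEADER_H

// Alternative 1: static gives f internal linkage,
// so every translation unit gets its own private copy.
static int f()
{
    return 0;
}

// Alternative 2: an unnamed namespace has the same effect for g
// (g is a hypothetical name used only in this sketch).
namespace
{
    int g()
    {
        return 0;
    }
}

#endif // HEADER_H
```

Either form links cleanly when the header is included from several .cpp files, at the cost of one copy of the function per translation unit.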

Ladonna answered 16/2, 2013 at 11:55 Comment(12)
Nice. Somewhere around the discussion of the two flavors of ODR, I would point out that the quoted 3.2/3 lists the definitions which we commonly put in header files and all other definitions with external linkage belong in source files. And then a plain language checklist for "which sort of ODR applies to my definition?"Extrauterine
@aschepler: Do you mean 3.2/4 ("A type T must be complete if...") or rather 3.2/5 ("There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14), [...], and provided the definitions satisfy the following requirements [...]")? I think it would be useful to mention both, on the other hand it would be hard to do so shortly enough, and with a long explanation the focus will shift away from include guards, which is the subject of this Q&A. Maybe a new FAQ entry linked to this one?Ladonna
I meant the "There can be more than one definition of...". It's labeled "3.2/3" in this post, but it's strange that there are two different paragraphs labeled "3.2/3".Extrauterine
@aschepler: That was indeed a typo, should have been 3.2/5. I fixed it and added a remark. Thank you.Ladonna
@AndyProwl -- The usual answer for that is sociopathy. Don't let it get you down. Great post ... +1Aftertaste
To decide if a header is being read multiple times in a large project, can I go with gcc -MMD and produce a .d file and watch out for multiple occurrences of the same header file?Unveil
I would appreciate if you can also comment on #pragma once. I have large project to analyze, and apparently, include guards are there in one file, however, gcc -MMD is listing a header file more than once. I then put a #pragma once in the header, and gcc -MMD no longer lists it more than once. What are your comments on this behavior?Unveil
@venkrao: Unfortunately I do not know that GCC option, so I cannot comment on it. What I can tell about #pragma once is that you should prefer it to traditional include guards - it is less error-prone and more concise.Ladonna
I want you to know that your answer was so long I actually considered whether it was worth the effort to scroll all the way back up to up vote it... I did.Nador
@Andrew: Thank you, I'm glad you found the energy :DLadonna
@AndyProwl wanna put +100Lydie
@AndyProwl thank you for taking your time and writing such a helpful and extensive explanation, +1Clarkin

fiorentinoing's answer is echoed in Git 2.24 (Q4 2019), where a similar code cleanup is taking place in the Git codebase.

See commit 2fe4439 (03 Oct 2019) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit a4c5d9f, 11 Oct 2019)

treewide: remove duplicate #include directives

Found with:

git grep '^#include ' '*.c' | sort | uniq -d
Boxhaul answered 13/10, 2019 at 16:42 Comment(0)

First of all you should be 100% sure that you have no duplicates in "include guards".

With this command

grep -rh "#ifndef" * 2>&1 | sort | uniq -c | sort -rn | awk '{print $1 " " $3}' | grep -v "^1 "

you will list every include guard, count how many times each guard name occurs (the lines are sorted first, because uniq -c only counts adjacent duplicates), sort the results by count, print only the counter and the guard name, and drop the names that are truly unique.

HINT: this is equivalent to getting the list of duplicated include guard names

Fluid answered 21/11, 2018 at 13:34 Comment(0)
