What are the rules about using an underscore in a C++ identifier?
K

6

1062

It's common in C++ to name member variables with some kind of prefix to denote the fact that they're member variables, rather than local variables or parameters. If you've come from an MFC background, you'll probably use m_foo. I've also seen myFoo occasionally.

C# (or possibly just .NET) seems to recommend using just an underscore, as in _foo. Is this allowed by the C++ standard?

Kantos answered 23/10, 2008 at 7:2 Comment(2)
The glibc manual page about that can be found at gnu.org/software/libc/manual/html_node/Reserved-Names.html Edit: see also opengroup.org/onlinepubs/009695399/functions/xsh_chap02_02.htmlAnemophilous
Just to note that ignorance of these rules does not necessarily mean that your code will fail to compile or run, but it is likely that your code will not be portable to different compilers and versions, since it cannot be guaranteed that there will not be name clashes. To back this up, I know of a certain implementation of an important system that has been using the underscore-plus-capital-letter naming convention everywhere. There were no errors due to this. Of course, it is bad practice.Exsiccate
S
961

The rules (which did not change in C++11):

  • Reserved in any scope, including for use as implementation macros:
    • identifiers beginning with an underscore followed immediately by an uppercase letter
    • identifiers containing adjacent underscores (or "double underscore")
  • Reserved in the global namespace:
    • identifiers beginning with an underscore
  • Also, everything in the std namespace is reserved. (You are allowed to add template specializations, though.)
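To tie the summary back to the question, here is a minimal sketch of how these rules sort the common member-naming styles. The Widget class and its member names are hypothetical and exist only to illustrate the naming; the commented-out lines are the ones the rules above reserve.

```cpp
// Hypothetical class; only the spelling of the member names matters here.
struct Widget {
    int m_size = 1;   // OK: MFC-style prefix
    int size_  = 2;   // OK: trailing underscore
    int _size  = 3;   // OK here: underscore + lowercase letter, and not in the global namespace
    // int _Size;     // reserved in any scope: underscore followed by an uppercase letter
    // int my__size;  // reserved in any scope: contains adjacent underscores

    constexpr int total() const { return m_size + size_ + _size; }
};

// int _count;        // reserved: leading underscore in the global namespace
```

So _foo as a member name appears to be allowed, while the same spelling at global scope, or _Foo anywhere, is not.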

From the 2003 C++ Standard:

17.4.3.1.2 Global names [lib.global.names]

Certain sets of names and function signatures are always reserved to the implementation:

  • Each name that contains a double underscore (__) or begins with an underscore followed by an uppercase letter (2.11) is reserved to the implementation for any use.
  • Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.165

165) Such names are also reserved in namespace ::std (17.4.3.1).

The C++ language is based on the C language (1.1/2, C++03), and C99 is a normative reference (1.2/1, C++03), so it's useful to know the restrictions from the 1999 C Standard (although they do not apply to C++ directly):

7.1.3 Reserved identifiers

Each header declares or defines all identifiers listed in its associated subclause, and optionally declares or defines identifiers listed in its associated future library directions subclause and identifiers which are always reserved either for any use or for use as file scope identifiers.

  • All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.
  • All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.
  • Each macro name in any of the following subclauses (including the future library directions) is reserved for use as specified if any of its associated headers is included; unless explicitly stated otherwise (see 7.1.4).
  • All identifiers with external linkage in any of the following subclauses (including the future library directions) are always reserved for use as identifiers with external linkage.154
  • Each identifier with file scope listed in any of the following subclauses (including the future library directions) is reserved for use as a macro name and as an identifier with file scope in the same name space if any of its associated headers is included.

No other identifiers are reserved. If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined.

If the program removes (with #undef) any macro definition of an identifier in the first group listed above, the behavior is undefined.

154) The list of reserved identifiers with external linkage includes errno, math_errhandling, setjmp, and va_end.

Other restrictions might apply. For example, the POSIX standard reserves a lot of identifiers that are likely to show up in normal code:

  • Names beginning with a capital E followed by a digit or uppercase letter may be used for additional error code names.
  • Names that begin with either is or to followed by a lowercase letter may be used for additional character testing and conversion functions.
  • Names that begin with LC_ followed by an uppercase letter may be used for additional macros specifying locale attributes.
  • Names of all existing mathematics functions suffixed with f or l are reserved for corresponding functions that operate on float and long double arguments, respectively.
  • Names that begin with SIG followed by an uppercase letter are reserved for additional signal names.
  • Names that begin with SIG_ followed by an uppercase letter are reserved for additional signal actions.
  • Names beginning with str, mem, or wcs followed by a lowercase letter are reserved for additional string and array functions.
  • Names beginning with PRI or SCN followed by any lowercase letter or X are reserved for additional format specifier macros.
  • Names that end with _t are reserved for additional type names.

While using these names for your own purposes right now might not cause a problem, they do raise the possibility of conflict with future versions of that standard.


Personally I just don't start identifiers with underscores. New addition to my rule: Don't use double underscores anywhere, which is easy as I rarely use underscore.

After doing research on this article I no longer end my identifiers with _t as this is reserved by the POSIX standard.

The rule about any identifier ending with _t surprised me a lot. I think it comes from the POSIX standard (not sure yet); I am looking for clarification and an official chapter and verse. This is from the GNU libtool manual, listing reserved names.

CesarB provided the following link to the POSIX 2004 reserved symbols and notes 'that many other reserved prefixes and suffixes ... can be found there'. The POSIX 2008 reserved symbols are defined here. The restrictions are somewhat more nuanced than those above.

Sawbuck answered 23/10, 2008 at 7:2 Comment(48)
Just a note - with the exception of numbering, what Martin quoted from the draft standard is exactly what's in the C++03 standard (17.4.3.1.2).Westing
Your summary doesn't say the same thing as the quote from the StandardGrandioso
global names are different from "any identifier"Win
@Adam Mitz: Global names also covers MACROS. Which will splatter your identifiers into a mush. This is what I was trying to convey.Regeniaregensburg
Here is the official chapter and verse, please add to your already excellent answer: opengroup.org/onlinepubs/009695399/functions/xsh_chap02_02.html (and notice that many other reserved prefixes and suffixes you didn't mention can be found there).Anemophilous
The C++ standard doesn't "import" the C one, does it? They import certain headers, but not the language as a whole, or naming rules, as far as I know. But yeah, the _t one surprised me as well. But since it's C, it can only apply to the global ns. Should be safe to use _t inside classes as I read itEdison
@jalf: The C++ standard is defined in terms of the C standard. Basically it says the C++ is C with these differences and additions.Regeniaregensburg
Martin, in the answer you say "This at least means they are not macros.." which i read as "global names are not macros", which i also think they are not. macros are not members of ::, and are thus not global. but in the comment you say "global names also covers MACROS".Doley
what is your final opinion on that? i've seen you added that thing into the answer after you did your comment. so do you have the same opinion as me with that macros are not global names?Doley
Where does the C++ standard distinguish between things reserved "for the compiler" and things reserved "for the OS and libraries", please? I've seen where it reserves names to the implementation, but not where it specifies any distinction between a "compiler", "OS" and "libraries" as components of the implementation.Nutter
The C++ Standard doesn't "import" the C Standard. It references the C Standard. The C++ library introduction says "The library also makes available the facilities of the Standard C Library". It does that by including headers of the C Standard library with appropriate changes, but not by "importing" it. The C++ Standard has an own set of rules that describes the reserved names. If a name reserved in C should be reserved in C++, that is the place to say this. But the C++ Standard doesn't say so. So i don't believe that things reserved in C are reserved in C++ - but i could well be wrong.Doley
This is what I found about the "_t" issue: n1256 (C99 TC3) says: "Typedef names beginning with int or uint and ending with _t" are reserved. I think that still allows using names like "foo_t" - but i think these are then reserved by POSIX.Doley
From the C++ standard 1.1. <quote>C++ is a general purpose programming language based on the C programming language as described in ISO/IEC 9899:1990 Programming languages — C (1.2). In addition to the facilities provided by C, C++ provides additional data types</quote>. My reading of this is that anything reserved in C is also reserved in C++ unless otherwise explicitly stated otherwise.Regeniaregensburg
As noted in the main article the '_t' suffix is reserved only by the POSIX standard not the C standard.Regeniaregensburg
So 'tolerance' is reserved by POSIX as it starts with 'to' + a lowercase letter? I bet a lot of code breaks this rule!Congruent
@Sjoerd: Probably. Though I am sure that you will be fine as long as lerance does not become a real verb that can be applied to characters. Also note it is only reserved in global scope (C) or the standard namespace (C++) so you can have function variables with this name without breaking the rule.Regeniaregensburg
@ReubenMorais: No. Read the Posix documentation.Regeniaregensburg
GNU getopt_long() is an offender of all rules: it defines macros no_argument, required_argument and optional_argument.Posturize
@MaximYegorushkin: No rules broken. These identifiers are reserved for the implementation. getopt_long() is part of the GNU implementation of compilers and standard libraries.Regeniaregensburg
@LokiAstari, "The C++ standard is defined in terms of the C standard. Basically it says the C++ is C with these differences and additions." Nonsense! C++ only references the C standard in [basic.fundamental] and the library. If what you say is true, where does C++ say that _Bool and _Imaginary don't exist in C++? The C++ language is defined explicitly, not in terms of "edits" to C, otherwise the standard could be much shorter!Psychosurgery
@JonathanWakely: I was referring to the second paragraph in the standard: <quote>C++ is a general purpose programming language based on the C programming language as described in ISO/IEC 9899:1999 Programming languages — C (hereinafter referred to as the C standard). In addition to the facilities provided by C, C++ provides additional data types, classes, templates, exceptions, namespaces, operator overloading, function name overloading, references, free store management operators, and additional library facilities.</quote>Regeniaregensburg
If you interpreted my statement above to mean something different, I apologize for being inexact.Regeniaregensburg
@LokiAstari, that's a very general statement describing the scope of the language, it doesn't mean everything in C is imported into C++. The C++ language (not library) is precisely defined by its own standard, not by reference to another, except for one reference in [basic.fundamental].Psychosurgery
@JonathanWakely: <quote>In addition to the facilities provided by C, C++ provides additional ....</quote>. But you also have to take the comment in the context of the discussion as a whole. We are talking about "Reserved Names" or more particularly "underscores". Thus what I was trying to convey is that reserved names in C are also reserved in C++. litb disagrees with that interpretation and I know he reads the standard very carefully. But this is a conversation resolved over a year ago.Regeniaregensburg
In C++ I only see [lex.name] and for global names [global.names]. Can you explain how the fact that C++ is based on the C standard and C99 is a normative reference make C99 rules apply to C++. thanksOvercloud
see [intro.refs] from the standard it describes what that means. See here to get a copyRegeniaregensburg
@LokiAstari: I think your statement is backward. One of the facilities of C is that you can use the identifiers it doesn't reserve; so if we're going to consider it literally relevant that C++ includes "the facilities provided by C", then the identifiers reserved by C++ would actually have to be (at most) a subset of those reserved by C, not a superset. (But in fact, you and I both know that C++ does reserve some identifiers that C does not, so apparently the "facilities provided by C" statement is not literally relevant.)Sadducee
@ruakh: I provide above the quote from the C standard 7.1.3 Reserved identifiers. Please re-read.Regeniaregensburg
@LokiAstari: The problem is -- what would the question be? "Does C++ leave everything undefined that C leaves undefined?" is too tendentious (I assume you can't be going that far), whereas "Are all identifiers reserved in C, also reserved in C++?" would be closed as a dupe of this one. Should I just quote the section of the C++ spec that you quote, and ask what its normative consequences are?Sadducee
It might be useful to know that most of the POSIX reserved symbols are only reserved when including the corresponding include file, i.e. "int stringptr" is "legal" until you include <string.h>.Operculum
@LokiAstari I understand that these kind of standards are necessary for C++. But for instance in Java, there are only a few reserved field names (e.g. serialVersionUID), and certainly no standards like variables ending in _t are reserved, because the language was designed such that everything is in a namespace. Are you saying that any language that can be compiled to machine code on multiple platforms would need to have these reserved variable name standards?Meyeroff
I wonder if there would be any problem specifying that a particular prefix was reserved for macros defined by future language versions, with a proviso that implementations must either process them in accordance with a C standard or leave them undefined. That would make it possible for code using certain new features to work on old compilers by defining macros to emulate them. For example, if __CPP_EITHER(x,y) took two expressions or statements and allowed a compiler to choose between them in arbitrary fashion (hopefully depending upon which could be compiled more efficiently), then...Stedfast
...code using that directive could work on existing implementations by simply #ifndef __CPP_EITHER/#define __CPP_EITHER(x,y) x/#endif, but an implementation that understood the directive could use it to improve code generation in cases where it could tell y would be more efficient than x (in cases where it couldn't tell, it could simply use x).Stedfast
@Congruent roughly, yes. It says that any implementation can define a new ctype function tofoo for any identifier foo beginning with a letter, including lerance. If that happens and it causes a clash with your own global, well, you were warned. The practical impact to you is small, but it gives POSIX and implementers breathing room to add stuff without endless quibbling.Chemmy
The rules may be better updated to reflect the fact that reserved name rules are moved from library (Clause 17) to core language (Clause 2) in current C++ standard working draft.Melchior
It should be noted that compilers will not check whether any of these reservation rules are violated, so if reserved names are used in code it may work today but break (potentially in a subtle way) the next time some innocuous-seeming upgrade or patch is applied.Militate
If these rules are broken, does it cause undefined behaviour?Hankypanky
@MaxBarraclough Yes. Which could mean nothing happens. See Section 5.10 Identifiers. Paragraph 3 In addition, some identifiers are reserved for use by C++ implementations and **shall not be used otherwise**; no diagnostic is required.Regeniaregensburg
@MaxBarraclough The important term here is Shall Not. If you look at C++ Section 3 Terms and definitions For the purposes of this document, the terms and definitions given in ISO/IEC 2382-1:1993, the terms, definitions, and symbols given in ISO 80000-2:2009, and the following apply. You can search for these terms here: iso.org/obp/ui => is required to be not .Regeniaregensburg
@MaxBarraclough Thus if you break this condition your code is non conforming. If we then read Section 4 General principles paragraph 2.3 If a program contains a violation of a rule for which no diagnostic is required, this document places no requirement on implementations with respect to that program.Regeniaregensburg
@MaxBarraclough And finally. Looking at Section 3.27 undefined behavior behavior for which this document imposes no requirements.Regeniaregensburg
@MartinYork: What are the requirements for a conforming C program? In every version of the Standard I've seen, violation of a constraint would mean a program isn't strictly conforming, but implementations are allowed to document extensions that waive constraints, and a program that runs on such an implementation would be conforming even though it violates a constraint.Stedfast
@Stedfast I rarely use C so I don't know.Regeniaregensburg
@MartinYork: Does the C++ Standard define a concept of conformance for programs, or merely implementations? I seem to recall the prologue states that any reference to things programs may or may not do is purely meant to be interpreted with regards to the requirements for implementations.Stedfast
@MartinYork: The distinction is important because implementations are allowed to extend the language to expand the range of programs they can process usefully, and such expansion can include programs that violate constraints. Violating a constraint doesn't make a program non-conforming (since there is no such concept), but instead means that implementations need not process the program meaningfully if they don't wish to do so.Stedfast
@Stedfast Why are you asking in the comments (this is not the correct place for this discussion). Seems like you should ask this as a question. People with knowledge will then try and answer.Regeniaregensburg
I don't believe this is spelled out in the standard, but does an identifier with a "triple underscore" (___) always count as having a double underscore? I... believe it should? But empirical evidence on my end shows that some people may find a triple underscore to be acceptable.Dill
But the actual wording is: Each identifier that contains a double underscore __ If you have a triple it contains a double!Regeniaregensburg
C
222

The rules to avoid collision of names are both in the C++ standard (see Stroustrup book) and mentioned by C++ gurus (Sutter, etc.).

Personal rule

Because I did not want to deal with cases, and wanted a simple rule, I have designed a personal one that is both simple and correct:

When naming a symbol, you will avoid collision with compiler/OS/standard libraries if you:

  • never start a symbol with an underscore
  • never name a symbol with two consecutive underscores inside.

Of course, putting your code in a unique namespace helps to avoid collision, too (but won't protect against evil macros)
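To make the macro caveat concrete, here is a small sketch. BUFFER_SIZE is a hypothetical macro standing in for one leaked by some third-party header, and the app namespace is likewise made up; the point is only that the preprocessor runs before namespaces exist.

```cpp
// Hypothetical macro, standing in for one leaked by a third-party header.
#define BUFFER_SIZE 16

namespace app {
    // constexpr int BUFFER_SIZE = 64;
    // ^ would not compile: the preprocessor rewrites it to
    //   `constexpr int 16 = 64;` before the namespace is ever parsed.

    constexpr int buffer_size = 64;  // a different spelling sidesteps the macro
}

constexpr int from_macro = BUFFER_SIZE;           // the macro expands to 16
constexpr int from_namespace = app::buffer_size;  // the namespaced constant, 64
```

This is why the two underscore rules matter even for namespaced code: an implementation is free to define macros with reserved spellings, and those would rewrite your identifiers in exactly this way.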

Some examples

(I use macros because they are the most code-polluting of C/C++ symbols, but it could be anything from variable name to class name)

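#define _WRONG          // reserved: underscore followed by an uppercase letter
#define __WRONG_AGAIN   // reserved: starts with two underscores
#define RIGHT_          // fine: a trailing underscore is not reserved
#define WRONG__WRONG    // reserved: two consecutive underscores in the middle
#define RIGHT_RIGHT     // fine: single underscores only
#define RIGHT_x_RIGHT   // fine: single underscores only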

Extracts from C++0x draft

From the n3242.pdf file (I expect the final standard text to be similar):

17.6.3.3.2 Global names [global.names]

Certain sets of names and function signatures are always reserved to the implementation:

— Each name that contains a double underscore __ or begins with an underscore followed by an uppercase letter (2.12) is reserved to the implementation for any use.

— Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.

But also:

17.6.3.3.5 User-defined literal suffixes [usrlit.suffix]

Literal suffix identifiers that do not start with an underscore are reserved for future standardization.

This last clause is confusing, unless you consider that a name starting with one underscore and followed by a lowercase letter would be Ok if not defined in the global namespace...
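As an illustration of that reading, here is a sketch of a user-defined literal whose suffix starts with an underscore followed by a lowercase letter and is declared inside a namespace. The units namespace, the _km suffix and distance_m are all hypothetical names, and this assumes a C++11 or later compiler.

```cpp
namespace units {
    // Hypothetical suffix: a leading underscore followed by a lowercase letter,
    // declared outside the global namespace.
    constexpr long double operator""_km(long double kilometres) {
        return kilometres * 1000.0L;  // convert to metres
    }
}

// Example use: bring the suffix into scope locally and apply it to a literal.
constexpr long double distance_m() {
    using namespace units;
    return 1.5_km;
}
```

Suffixes without the underscore (such as the s in std::chrono literals) are the ones kept for the standard library, which is what the quoted clause says.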

Crawly answered 23/10, 2008 at 7:27 Comment(8)
@Meysam : __WRONG_AGAIN__ contains two consecutive underscores (two at the beginning, and two at the end), so this is wrong according to the standard.Crawly
@BЈовић : WRONG__WRONG contains two consecutive underscores (two in the middle), so this is wrong according to the standardCrawly
putting your code in an unique namespace helps to avoid collision, too: but this is still not enough, since the identifier may collide with a keyword regardless of scope (e.g. __attribute__ for GCC).Centrality
Why is there any problem of having two consecutive underscores in the middle according to the standard? User-defined literal suffixes apply to literal values like 1234567L or 4.0f; IIRC this refers to http://en.cppreference.com/w/cpp/language/user_literalAntiperistalsis
Why is there any problem of having two consecutive underscores in the middle according to the standard? Because the standard says those are reserved. This is not advice on good or bad style. It's a decision from the standard. Why did they decide this? I guess the first compilers already used such conventions informally before standardization.Crawly
I was unable to find the [global.names] clause or something similar in the current draft of the standard (eel.is/c++draft). It seems to have been removed.Ginnie
@Crawly it was originally so that compilers would always have an easy way to mangle names. In modern times it's not as useful, but retained for backwards compatibility.Gnash
@CoffeeTableEspresso: I'm still puzzled as to why any implementation would require that no source-code names contain double underscores. Even if an existing implementation exported double-underscore names itself and forbade them in source code, such an implementation could add support for such names without breaking linker compatibility with any existing object files by e.g. specifying that any run of N underscores in a source-code name would be replaced by 2N+1 underscores in the linker name.Stedfast
K
50

From MSDN:

Use of two sequential underscore characters ( __ ) at the beginning of an identifier, or a single leading underscore followed by a capital letter, is reserved for C++ implementations in all scopes. You should avoid using one leading underscore followed by a lowercase letter for names with file scope because of possible conflicts with current or future reserved identifiers.

This means that you can use a single underscore as a member variable prefix, as long as it's followed by a lower-case letter.

This is apparently taken from section 17.4.3.1.2 of the C++ standard, but I can't find an original source for the full standard online.

See also this question.

Kantos answered 23/10, 2008 at 7:6 Comment(9)
I found a similar text in n3092.pdf (the draft of C++0x standard) at section: "17.6.3.3.2 Global names"Crawly
Interestingly, this seems to be the only answer which has direct, concise answer to the question.Byre
@hyde: Actually, it isn't, since it's skipping the rule not to have any identifiers with a leading underscore in the global namespace. See Roger's answer. I'd be very wary of citations of MS VC docs as an authority on the C++ standard.Wellbalanced
@Wellbalanced I was referring to "you can use a single underscore as a member variable prefix, as long as it's followed by a lower-case letter" in this answer, which answers the question on the question text directly and concisely, without being drowned in a wall of text.Byre
First, I still consider the lack of any hint that the same rule does not apply to the global namespace a failure. What's worse, though, is that adjacent underscores are forbidden not only at the beginning of, but anywhere in, an identifier. So this answer isn't merely omitting a fact, but actually makes at least one actively wrong claim. As I said, referring to the MSVC docs is something I wouldn't do unless the question is solely about VC.Wellbalanced
@sbi: The internal-double-underscore rule was designed to reserve such identifiers for type-mangled names, but I would think names with double underscores could have been accommodated by saying that any occurrences of __ generated by type-based mangling would be __x, and then saying that any occurrences of __ in the specified name would be replaced with __y before such mangling.Stedfast
What about single underscore as a complete member variable name?Sweeten
@Wellbalanced There is irony in that VC complies to ISO C++ which reserves names with single underscore as well, renaming some of posix functions at same time, e.g. _dup() instead of dup()Heresiarch
While this answer is not wrong, it also is not about the actual question: This answer ONLY applies to the implementation of MSVC. The c++-standard says that __ is reserved for implementations, MSVC says that they only use it at the beginning of identifiers and you are free to use it elsewhere. Writing such code would mean it is not portable but compliant with MSVC.Broadleaf
O
29

As for the other part of the question, it's common to put the underscore at the end of the variable name to not clash with anything internal.

I do this even inside classes and namespaces because I then only have to remember one rule (compared to "at the end of the name in global scope, and the beginning of the name everywhere else").
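A minimal sketch of that convention, using a hypothetical Counter class. The trailing underscore also lets the constructor parameter keep the natural name without shadowing the member.

```cpp
// Hypothetical class showing the trailing-underscore member convention.
class Counter {
public:
    // The parameter can use the plain name; the member carries the underscore.
    constexpr explicit Counter(int count) : count_(count) {}

    constexpr int count() const { return count_; }

private:
    int count_;  // never collides with the reserved-name rules, in any scope
};
```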

Obsessive answered 14/11, 2008 at 20:3 Comment(0)
A
2

Yes, underscores may be used anywhere in an identifier. I believe the rules are: any of a-z, A-Z, _ in the first character and those +0-9 for the following characters.
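The character rule described above can be sketched as a small checker. This is only an approximation under stated assumptions: it handles plain ASCII, ignores keywords and universal character names, needs C++17 for std::string_view, and says nothing about whether a name is reserved. The namespace and function names are made up for the example.

```cpp
#include <string_view>

namespace ident {
    constexpr bool is_start_char(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    constexpr bool is_digit_char(char c) { return c >= '0' && c <= '9'; }

    // True if `s` is non-empty, starts with a letter or underscore, and
    // continues with letters, digits or underscores (ASCII only).
    constexpr bool is_basic_identifier(std::string_view s) {
        if (s.empty() || !is_start_char(s[0])) return false;
        for (char c : s) {
            if (!is_start_char(c) && !is_digit_char(c)) return false;
        }
        return true;
    }
}
```

Note that ident::is_basic_identifier("_Foo") is true: the name is syntactically fine, which is a separate question from whether it is reserved.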

Underscore prefixes are common in C code -- a single underscore means "private", and double underscores are usually reserved for use by the compiler.

Althorn answered 23/10, 2008 at 7:5 Comment(5)
They are common in libraries. They should not be common in user code.Regeniaregensburg
People do write libraries in C, you know.Althorn
"Yes, underscores may be used anywhere in an identifier." This is wrong for global identifiers. See Roger's answer.Wellbalanced
@Wellbalanced According to the C and C++ standards, yes, semantically, global identifiers with leading underscores are reserved. They are syntactically valid identifiers though, and the compiler won't stop you from naming a function _Foo, though by doing so you're relying on nonstandard implementation details and thus risk having your code broken by future versions of the language/standard library implementation/OS.Provoke
@BenW: TTBOMK, the C++ standard simply says that global identifiers starting with an underscore are not allowed, without making any distinction between syntax and semantic. (Also any identifiers starting with an underscore followed by a capital letter, and any identifiers with two consecutive underscores.)Wellbalanced
F
0

Firstly, the rules in current working draft are laid out in [lex.name] p3:

In addition, some identifiers appearing as a token or preprocessing-token are reserved for use by C++ implementations and shall not be used otherwise; no diagnostic is required.

  • Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
  • Each identifier that begins with an underscore is reserved to the implementation for use as a name in the global namespace.

Furthermore, the standard library reserves all names defined in namespace std and some zombie names; see [reserved.names.general].
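The two bullets from [lex.name] can be turned into a rough classifier. This is a sketch, not a complete check: the enum and function names are my own, it assumes ASCII and C++17, and it deliberately ignores keywords, names in std, macro names such as assert, and anything POSIX reserves.

```cpp
#include <cstddef>
#include <string_view>

enum class Reserved { no, global_namespace_only, everywhere };

// Applies only the two underscore rules quoted from [lex.name].
constexpr Reserved classify(std::string_view id) {
    bool has_double_underscore = false;
    for (std::size_t i = 0; i + 1 < id.size(); ++i) {
        if (id[i] == '_' && id[i + 1] == '_') has_double_underscore = true;
    }
    const bool underscore_then_upper =
        id.size() >= 2 && id[0] == '_' && id[1] >= 'A' && id[1] <= 'Z';

    if (has_double_underscore || underscore_then_upper) return Reserved::everywhere;
    if (!id.empty() && id[0] == '_') return Reserved::global_namespace_only;
    return Reserved::no;
}
```

For example, classify("_x") gives Reserved::global_namespace_only, while classify("x__") and classify("_X") give Reserved::everywhere. A triple underscore contains a double one, so it also lands in the last group.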

What about POSIX?

As the accepted answer has pointed out, there may be other parts of the implementation, like the POSIX standard, which limit the identifiers you can use.

Each identifier with file scope described in the header section is reserved for use as an identifier with file scope in the same name space if the header is included.

ANY Header [reserves] Suffix _t

- POSIX 2008 Standard, 2.2.2

In C++, almost all problems associated with POSIX can be avoided through namespaces. This is also why the C++ standard can add tons of symbols like std::enable_if_t without breaking POSIX compatibility.
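A short sketch of what that looks like in practice. The app namespace, index_t, tolerance and clamp_index are hypothetical names chosen because, going by the list in the accepted answer, POSIX may claim them at file scope (the _t suffix, and to followed by a lowercase letter). A macro with the same spelling would still get in the way, so this is not bulletproof.

```cpp
namespace app {
    // `_t` suffix, kept out of the global namespace.
    using index_t = unsigned long;

    // Starts with "to" followed by a lowercase letter, also namespaced.
    constexpr index_t tolerance = 5;

    // Clamp an index to the tolerance, just to give the names something to do.
    constexpr index_t clamp_index(index_t i) {
        return i > tolerance ? tolerance : i;
    }
}
```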

Visualization

int x;      // OK
int x_;     // OK
int _x;     // RESERVED
int x__;    // RESERVED (OK in C)
int __x;    // RESERVED
int _X;     // RESERVED
int assert; // RESERVED (macro name)
int x_t;    // RESERVED (only by POSIX)

namespace {
int y;      // OK
int y_;     // OK
int _y;     // OK
int y__;    // RESERVED (OK in C, ignoring namespaces)
int __y;    // RESERVED
int _Y;     // RESERVED
int assert; // RESERVED (macro name)
int y_t;    // OK
}

The above rules for y apply to both named and unnamed namespaces. Either way, in the following namespace, the rules of the global namespace no longer apply (see [namespace.unnamed]).

The above rules for y also apply to identifiers in classes, functions, etc.; anything but global scope.

Even though assert isn't used like a function-style macro here, the name is reserved. This is also why proposal P2884 contemplates making it a keyword in C++26, with some success so far.

Recommended Practice

To be safe, always avoid double underscores, and always avoid names with leading underscores. The latter are okay in some cases, but it's difficult to memorize these rules, and it's better to be safe than sorry.

What about _ in itself?

Some people use _ to indicate that some variable or function parameter isn't used. However, you can avoid this with:

void foo(T _) { /* ... */ }
// replace with:
void foo(T) { /* ... */ }

std::scoped_lock _{mutex};
// replace with:
std::scoped_lock lock{mutex};

You can also cast a parameter p to void like (void)p, if this is about silencing warnings about p being unused, and you need C compatibility. See Why cast unused return values to void?.
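For comparison, here are the options side by side in one sketch. The function names are made up, and the [[maybe_unused]] variant is my addition rather than something discussed above; it needs C++17, and the (void) cast inside a constexpr function needs C++14.

```cpp
// Option 1: leave the parameter unnamed (a comment can keep the old name).
constexpr int handle_unnamed(int value, int /*flags*/) { return value; }

// Option 2 (C++17, an addition to the text above): mark it [[maybe_unused]].
constexpr int handle_attr(int value, [[maybe_unused]] int flags) { return value; }

// Option 3: cast to void, which also works in C.
constexpr int handle_cast(int value, int flags) {
    (void)flags;  // silences the unused-parameter warning
    return value;
}
```

None of the three needs a parameter called _, so the question of whether a lone underscore is reserved never comes up.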

Fall answered 3/9, 2023 at 20:27 Comment(0)
