What are all the common undefined behaviours that a C++ programmer should know about? [closed]

Asked 15/12, 2008 at 6:55 Answered 26/6, 2012 at 16:10

Solved c++undefined undefined-behavior c++-faq

200

What are all the common undefined behaviours that a C++ programmer should know about?

Say, like:

a[i] = i++;

Callean answered 15/12, 2008 at 6:55 Comment(11)

Are you sure. That looks well defined. – Musette 15/12, 2008 at 7:10

6.2.2 Evaluation Order [expr.evaluation] in The C++ Programming Language says so. I don't have any other reference. – Callean 15/12, 2008 at 7:24

He's right.. just looked at 6.2.2 in The C++ Programming Language and it says v[i] = i++ is undefined – Indocile 15/12, 2008 at 7:24

I would imagine it's because the compiler may execute the i++ before or after calculating the memory location of v[i]. Sure, i is always going to be assigned there, but it could write to either v[i] or v[i+1] depending on the order of operations. – Ulrikeulster 15/12, 2008 at 7:44

Martin, the variable will get updated some time after it is read on the right side. That may be before or after the variable is read to evaluate the left side. – Stegman 15/12, 2008 at 7:47

@Rob: isn't that what I had just said? – Ulrikeulster 15/12, 2008 at 7:49

As an example: it could either go: (pseudo asm): "lea edi, [v + i]; mov [edi], i; inc i" or "mov eax, i; inc i; lea edi, [v + i]; mov [edi], eax" – Evan Teran – Ulrikeulster 15/12, 2008 at 8:5

All that The C++ Programming Language says is "The order of operations of subexpressions within an expression is undefined. In particular, you cannot assume that the expression is evaluated left to right." – Indocile 15/12, 2008 at 8:14

OK. After reading n2521 Section 5.2.6 I think I have it. See my answer below. – Musette 15/12, 2008 at 8:46

It seems to be a rite-of-passage these days for a moderator to force-close a popular question, even ones (like this one) that are gray-area (and thus should be decided by the community at large, not a single overzealous moderator). – Whipperin 26/7, 2013 at 21:21

Exactly, this seems to me a perfect example of good contribution to SO. – Salema 26/10, 2013 at 21:42

232

Pointer

Dereferencing a NULL pointer
Dereferencing a pointer returned by a "new" allocation of size zero
Using pointers to objects whose lifetime has ended (for instance, stack allocated objects or deleted objects)
Dereferencing a pointer that has not yet been definitely initialized
Performing pointer arithmetic that yields a result outside the boundaries (either above or below) of an array.
Dereferencing the pointer at a location beyond the end of an array.
Converting pointers to objects of incompatible types
Using memcpy to copy overlapping buffers.

Buffer overflows

Reading or writing to an object or array at an offset that is negative, or beyond the size of that object (stack/heap overflow)

Integer Overflows

Signed integer overflow
Evaluating an expression that is not mathematically defined
Shifting values by a negative amount, whether left or right (what is implementation-defined prior to C++20 is right-shifting a negative value)
Shifting values by an amount greater than or equal to the number of bits in the number (e.g. int64_t i = 1; i <<= 72 is undefined)

Types, Cast and Const

Casting a numeric value into a value that can't be represented by the target type (either directly or via static_cast)
Using an automatic variable before it has been definitely assigned (e.g., int i; i++; cout << i;)
Using the value of any object of type other than volatile sig_atomic_t at the receipt of a signal
Attempting to modify a string literal or any other const object during its lifetime
Concatenating a narrow with a wide string literal during preprocessing

Function and Template

Not returning a value from a value-returning function (directly or by flowing off from a try-block)
Multiple different definitions for the same entity (class, template, enumeration, inline function, static member function, etc.)
Infinite recursion in the instantiation of templates
Calling a function with parameters or linkage that differ from those the function is defined with.

OOP

Cascading destructions of objects with static storage duration
The result of assigning to partially overlapping objects
Recursively re-entering a function during the initialization of its static objects
Making virtual function calls to pure virtual functions of an object from its constructor or destructor
Referring to nonstatic members of objects that have not been constructed or have already been destructed

Source file and Preprocessing

A non-empty source file that doesn't end with a newline, or ends with a backslash (prior to C++11)
A backslash followed by a character that is not part of the specified escape codes in a character or string constant (this is implementation-defined in C++11).
Exceeding implementation limits (number of nested blocks, number of functions in a program, available stack space ...)
Preprocessor numeric values that can't be represented by a long int
Preprocessing directive on the left side of a function-like macro definition
Dynamically generating the defined token in a #if expression

To be classified

Calling exit during the destruction of an object with static storage duration
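
Several entries in the "Integer Overflows" group can be checked for before the operation runs. The following is only a minimal sketch: the helper names (addWouldOverflow, checkedAdd, shiftCountIsValid) are hypothetical and not from the answer, and the shift check assumes int has no padding bits.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: true when a + b would overflow int.
// The comparisons are arranged so that they never overflow themselves.
bool addWouldOverflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}

// Hypothetical helper: adds only after the precondition has been checked.
int checkedAdd(int a, int b) {
    assert(!addWouldOverflow(a, b));
    return a + b;
}

// Hypothetical helper: true when `count` is a valid shift count for an int,
// that is, non-negative and smaller than the width of int in bits.
bool shiftCountIsValid(int count) {
    const int widthInBits = static_cast<int>(sizeof(int) * CHAR_BIT);
    return count >= 0 && count < widthInBits;
}
```

A caller would test addWouldOverflow(a, b) or shiftCountIsValid(n) first and take an error path instead of performing the operation.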

Trump answered 15/12, 2008 at 6:55 Comment(8)

Hm... Infinity (x / 0) and NaN (0 / 0) were covered by IEEE 754; if C++ was designed later, why does it record x / 0 as undefined? – Yacano 5/4, 2011 at 13:22

Re: "A backslash followed by a character that is not part of the specified escape codes in a character or string constant." That's UB in C89 (§3.1.3.4) and C++03 (which incorporates C89), but not in C99. C99 says that "the result is not a token and a diagnostic is required" (§6.4.4.4). Presumably C++0x (which incorporates C99) will be the same. – Astronomy 7/6, 2011 at 4:31

The C99 standard has a list of undefined behaviors in appendix J.2. It would take some work to adapt this list to C++. You'd have to change the references to the correct C++ clauses rather than the C99 clauses, remove anything irrelevant, and also check whether all those things really are undefined in C++ as well as C. But it provides a start. – Hancock 7/6, 2011 at 8:13

Quite a useful and impressive list. I need to add this one, as I was just burnt by it: enum size is not defined; it only needs to be large enough to contain an int. "Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined, but shall be capable of representing the values of all the members of the enumeration." – Tolu 9/8, 2012 at 15:30

@Yacano - not all floating point units are IEEE 754 compatible. If C++ required IEEE 754 compliance, compilers would need to test and handle the case where the RHS is zero via an explicit check. By making the behaviour undefined, the compiler can avoid that overhead by saying "if you use a non-IEEE 754 FPU, you won't get IEEE 754 FPU behaviour". – Fiscus 1/9, 2012 at 14:18

"Shifting values by an amount greater than the log2-size of the container (e.g. __int64 i = (37 << 72) is undefined)". This is a bit misleading, and not always true. That is only undefined behavior if sizeof(int) * CHAR_BIT <= 72 is true. The object you put the result of 37 << 72 in is irrelevant; the expression itself is what invokes undefined behavior. For instance, uint8_t n = (1 << 15); is perfectly fine. You first evaluate 1 << 15, which is 2**15. The result is then implicitly converted from an int to a uint8_t, which means that n == 0 (due to modular arithmetic) – Bedivere 21/2, 2013 at 17:0

"Evaluating an expression whose result is not in the range of the corresponding types" .... integer overflow is well-defined for UNSIGNED integral types, just not signed ones. – Hoagland 9/8, 2013 at 20:6

I was just about to make the same comment as @nacitarsevaht. – Harlie 19/8, 2013 at 22:7

The order that function parameters are evaluated is unspecified behavior. (This won't make your program crash, explode, or order pizza... unlike undefined behavior.)

The only requirement is that all parameters must be fully evaluated before the function is called.

This:

// The simple obvious one.
callFunc(getA(),getB());

Can be equivalent to this:

int a = getA();
int b = getB();
callFunc(a,b);

Or this:

int b = getB();
int a = getA();
callFunc(a,b);

It can be either; it's up to the compiler. The result can matter, depending on the side effects.
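
A minimal sketch of how the difference can be observed. The bodies of getA, getB and callFunc below are hypothetical stand-ins for the functions named above, and callLog is an assumed global used only to record which getter ran first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumed global log; each getter appends its own name when it runs.
static std::vector<std::string> callLog;

int getA() { callLog.push_back("getA"); return 1; }
int getB() { callLog.push_back("getB"); return 2; }
int callFunc(int a, int b) { return a * 10 + b; }

// Order left to the compiler: afterwards callLog holds either
// {"getA", "getB"} or {"getB", "getA"}.
int unspecifiedOrder() {
    callLog.clear();
    return callFunc(getA(), getB());
}

// Order forced by separate statements: getA() always runs before getB().
int forcedOrder() {
    callLog.clear();
    const int a = getA();
    const int b = getB();
    assert(callLog.size() == 2);
    return callFunc(a, b);
}
```

Both functions return the same value here because the getters do not depend on each other; the order only becomes visible through the side effect on callLog.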

Musette answered 15/12, 2008 at 7:15 Comment(4)

The order is unspecified, not undefined. – Stegman 15/12, 2008 at 7:55

I hate this one :) I lost a day of work once tracking down one of these cases... anyways learned my lesson and haven't fallen again fortunately – Ravid 15/12, 2008 at 8:29

@Rob: I would argue with you about the change in meaning here, but I know the standards committee is very picky on the exact definition of these two words. So I'll just change it :-) – Musette 15/12, 2008 at 8:34

I got lucky on this one. I got bitten by it when I was in college and had a professor who took one look at it and told me my problem in about 5 seconds. No telling how much time I would have wasted debugging otherwise. – Phenix 15/12, 2008 at 16:6

The compiler is free to reorder the evaluation of the parts of an expression (assuming the meaning is unchanged).

From the original question:

a[i] = i++;

// This expression has three parts:
(a) a[i]
(b) i++
(c) Assign (b) to (a)

// (c) is guaranteed to happen after (a) and (b)
// But (a) and (b) can be done in either order.
// See n2521 Section 5.17
// (b) increments i but returns the original value.
// See n2521 Section 5.2.6
// Thus this expression can be written as:

int rhs  = i++;
int& lhs = a[i];
lhs = rhs;

// or
int& lhs = a[i];
int rhs  = i++;
lhs = rhs;

Double Checked locking. And one easy mistake to make.

A* a = new A("plop");

// Looks simple enough.
// But this can be split into three parts.
(a) allocate Memory
(b) Call constructor
(c) Assign value to 'a'

// No problem here:
// The compiler is allowed to do this:
(a) allocate Memory
(c) Assign value to 'a'
(b) Call constructor.
// This is because the whole thing is between two sequence points.

// So what is the big deal.
// Simple Double checked lock. (I know there are many other problems with this).
if (a == NULL) // (Point B)
{
    Lock   lock(mutex);
    if (a == NULL)
    {
        a = new A("Plop");  // (Point A).
    }
}
a->doStuff();

// Think of this situation.
// Thread 1: Reaches point A. Executes (a)(c)
// Thread 1: Is about to do (b) and gets unscheduled.
// Thread 2: Reaches point B. It can now skip the if block
//           Remember (c) has been done thus 'a' is not NULL.
//           But the memory has not been initialized.
//           Thread 2 now executes doStuff() on an uninitialized variable.

// The solution to this problem is to move the assignment of 'a'
// To the other side of the sequence point.
if (a == NULL) // (Point B)
{
    Lock   lock(mutex);
    if (a == NULL)
    {
        A* tmp = new A("Plop");  // (Point A).
        a = tmp;
    }
}
a->doStuff();

// Of course there are still other problems because of C++ support for
// threads. But hopefully these are addressed in the next standard.
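
For comparison, a minimal sketch of lazy initialization that avoids the hand-written lock, assuming a C++11 or later compiler. From C++11 on, the initialization of a function-local static is guaranteed to run exactly once even when several threads reach it at the same time. The class A and the function instance() below are hypothetical stand-ins, not the code discussed above.

```cpp
#include <string>

// Hypothetical stand-in for the class A used in the example above.
class A {
public:
    explicit A(const std::string& name) : name_(name) {}
    const std::string& doStuff() const { return name_; }
private:
    std::string name_;
};

// The object is constructed the first time control passes through the
// declaration. Since C++11 concurrent callers wait until construction has
// finished, so no caller can see a partially constructed A.
A& instance() {
    static A a("Plop");
    return a;
}
```

Callers write instance().doStuff() instead of testing a shared pointer against NULL.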

Musette answered 15/12, 2008 at 7:35 Comment(18)

what is mean by sequence point? – Callean 15/12, 2008 at 8:25

en.wikipedia.org/wiki/Sequence_point – Musette 15/12, 2008 at 8:36

Ooh... that's nasty, especially since I've seen that exact structure recommended in Java – Unbend 15/12, 2008 at 14:44

Note that some compilers do define the behaviour in this situation. In VC++ 2005+, for example, if a is volatile, the needed memory barriers are set up to prevent instruction reordering so that double-checked locking works. – Cobalt 23/6, 2009 at 16:11

Martin York: <i>// (c) is guaranteed to happen after (a) and (b)</i> Is it? Admittedly in that particular example the only scenario where it could matter would be if 'i' was a volatile variable mapped to a hardware register, and a[i] (old value of 'i') was aliased to it, but is there any guarantee that the increment will happen before a sequence point? – Zennie 14/8, 2010 at 20:18

@supercat: Yes it is guaranteed. Both sides of the = must be evaluated before it can be evaluated. – Musette 12/10, 2010 at 14:22

@Martin York: The expression "i=i++;" is commonly given as an example of undefined behavior in C (I know the behavior is defined in some other languages like Java). If the side effects of the increment are not guaranteed to be over and done with by the time an "i=i++;" assignment occurs, why would they be guaranteed to be complete by the time "a[i]=i++;" occurs? – Zennie 12/10, 2010 at 15:13

@supercat: OK I see where you are coming from. You are correct the effect of the operator ++ (on i) is not guaranteed to be done by that point. – Musette 12/10, 2010 at 15:32

a[i] = i++; is undefined behavior anyway. The compiler is free to do more than just reorder it, it can ignore it or do the thing with the nose. i is modified on the RHS, and on the LHS it is read other than for the purpose of determining the value to be written. Game over (unless i is a class type with operator++ overload, then any object modifications inside that overload are safely wrapped in sequence points, and it's just unspecified order not UB). – Hancock 7/6, 2011 at 8:18

@Steve: Absolutely correct. I was just trying to explain why it needs to be undefined (and some probable implementation explanations). – Musette 7/6, 2011 at 14:23

@Martin: not sure I agree that this explains why it has to be undefined. There are other situations in which the order of operations is unspecified, without invoking UB. For example in the case I mentioned where i is a class type that imitates an integer with suitable operator++(int) and operator int(), your two either/or options are both still possible, but there's no UB. That said, maybe in saying "undefined behavior", the questioner means any situation that behavior isn't completely nailed down, as opposed to the standard's definition of UB. – Hancock 7/6, 2011 at 16:22

Your double-checked lock example is vicious. ಥ_ಥ – Worshipful 24/9, 2011 at 23:30

Hi, I was wondering what other problems there are with the second version of the singleton implementation. It seems perfect. – Midgard 20/9, 2013 at 15:2

@zoujyjs: If you want to create a singleton with pointers you are doing it wrong. See C++ Singleton design pattern – Musette 20/9, 2013 at 15:16

Yeah, I read about that also in "Effective C++" Item 04. And that one is guaranteed thread-safe after C++0x. But why not pointers? And if not, what if I just return the dereference of the pointer? Because a singleton is supposed to be there all through the life of a process, I don't need to destroy it with an explicit pointer. – Midgard 20/9, 2013 at 15:31

Particularly, I want to know if the code above is considered a fairly good implementation of the DCLP. – Midgard 20/9, 2013 at 15:32

@zoujyjs: No, it's not good in C++03. There are issues with threading and double-checked locking in C++ that make it impossible to do correctly; there is a famous paper by Sutter on it. I believe it is fixed for C++11 but you will need to use new features not documented here. – Musette 20/9, 2013 at 18:39

@zoujyjs: Here it is: C++ and The Perils of Double-Checked Locking Actually by Meyers and Alexandrescu – Musette 20/9, 2013 at 18:49

Assigning to a constant after stripping constness using const_cast<>:

const int i = 10; 
int *p =  const_cast<int*>( &i );
*p = 1234; //Undefined
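
By contrast, casting away const is well defined when the object itself was never declared const. A minimal sketch of that case, with a hypothetical function name:

```cpp
// Well defined: `value` is not a const object; only the pointer used to
// reach it was const-qualified.
int modifyThroughConstCast() {
    int value = 10;
    const int* view = &value;         // read-only view of a non-const object
    int* p = const_cast<int*>(view);  // remove the qualifier again
    *p = 1234;                        // OK: the underlying object is modifiable
    return value;
}
```

The undefined behaviour in the example above comes from the declaration const int i, not from the cast as such.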

Callean answered 15/12, 2008 at 7:36 Comment(0)

My favourite is "Infinite recursion in the instantiation of templates" because I believe it's the only one where the undefined behaviour occurs at compile time.

Consensus answered 15/12, 2008 at 8:17 Comment(5)

Done this before, but I don't see how it's undefined. It's quite obvious you're doing an infinite recursion in afterthought. – Ravid 15/12, 2008 at 8:25

The problem is that the compiler cannot examine your code and decide precisely whether it will suffer from infinite recursion or not. It's an instance of the halting problem. See: stackoverflow.com/questions/235984/… – Consensus 15/12, 2008 at 9:17

Yeah its definitely a halting problem – Ravid 16/12, 2008 at 1:44

it made my system crash because of swapping caused by too little memory. – Ey 28/12, 2008 at 11:13

Preprocessor constants that don't fit into an int is also compile time. – Lap 17/8, 2010 at 21:1

Besides undefined behaviour, there is also the equally nasty implementation-defined behaviour.

Undefined behaviour occurs when a program does something the result of which is not specified by the standard.

Implementation-defined behaviour is an action by a program the result of which is not defined by the standard, but which the implementation is required to document. An example is "Multibyte character literals", from Stack Overflow question Is there a C compiler that fails to compile this?.

Implementation-defined behaviour only bites you when you start porting (but upgrading to a new version of the compiler is also porting!)

Tuberculate answered 15/12, 2008 at 14:49 Comment(0)

Variables may only be updated once in an expression (technically once between sequence points).

int i =1;
i = ++i;

// Undefined. Assignment to 'i' twice in the same expression.

Musette answered 15/12, 2008 at 7:22 Comment(2)

In fact at least once between two sequence points. – Latticed 12/8, 2010 at 13:10

@Prasoon: I think you meant : at most once between two sequence points. :-) – Mystagogue 7/6, 2011 at 9:24

A basic understanding of the various environmental limits. The full list is in section 5.2.4.1 of the C specification. Here are a few:

127 parameters in one function definition
127 arguments in one function call
127 parameters in one macro definition
127 arguments in one macro invocation
4095 characters in a logical source line
4095 characters in a character string literal or wide string literal (after concatenation)
65535 bytes in an object (in a hosted environment only)
15 nesting levels for #included files
1023 case labels for a switch statement (excluding those for any nested switch statements)

I was actually a bit surprised at the limit of 1023 case labels for a switch statement; I can foresee that being exceeded fairly easily by generated code such as lexers and parsers.

If these limits are exceeded, you have undefined behavior (crashes, security flaws, etc...).

Right, I know this is from the C specification, but C++ shares these basic supports.

Peseta answered 19/3, 2011 at 5:3 Comment(2)

If you hit these limits, you've got more problems than undefined behavior. – Yacano 22/4, 2011 at 2:0

You could EASILY exceed 65535 bytes in an object, such as a std::vector – Cutlerr 13/9, 2013 at 4:15

The only type for which C++ guarantees a size is char. And the size is 1. The size of all other types is platform dependent.
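
What the standard does give for the other types are minimums rather than exact sizes. A minimal sketch that checks them at compile time, assuming a C++11 or later compiler; bitsIn is a hypothetical helper, not a standard facility.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: storage size of T in bits.
template <typename T>
constexpr std::size_t bitsIn() { return sizeof(T) * CHAR_BIT; }

// sizeof(char) is 1 by definition, but a char (byte) may have more than 8 bits.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Minimum widths implied by the required value ranges.
static_assert(bitsIn<int>() >= 16, "int is at least 16 bits");
static_assert(bitsIn<long>() >= 32, "long is at least 32 bits");
static_assert(bitsIn<long long>() >= 64, "long long is at least 64 bits");
```

Code that needs an exact width can add its own static_assert, so an unsupported platform fails to compile instead of misbehaving at run time.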

Heins answered 15/12, 2008 at 7:6 Comment(9)

Isn't that what <cstdint> is for? It defines types such as uint16_t et cetera. – Lurie 15/12, 2008 at 7:54

Yes, but the size of most types, say long, is not well defined. – Heins 15/12, 2008 at 8:4

also cstdint isn't part of the current c++ standard yet. see boost/stdint.hpp for a currently portable solution. – Ulrikeulster 15/12, 2008 at 8:7

That's not undefined behaviour. The standard says that conforming platform defines the sizes, rather than the standard defining them. – Consensus 15/12, 2008 at 8:16

Also note that the standard does not define how much 1 byte is. It is at least 8 bits, but anything above that is allowed, so a C++ byte is not necessarily equal to a real-life byte. Anyway, I vote you up as this does not deserve a down-vote. – Bodycheck 2/2, 2010 at 8:37

@JaredPar: This is not strictly true. A long is guaranteed to be at least 32 bits. – Boutin 26/6, 2012 at 16:23

@JohnDibling I looked through that post and the linked ones but I couldn't find a definitive source for it being 32 bits. One user proclaimed such but there was no reference to a spec line which made the guarantee – Heins 29/6, 2012 at 17:57

@JaredPar: It's a complex post with a lot of threads of conversation, so I summed it all up here. The bottom line is this: "5. In order to represent -2147483647 and +2147483647 in binary, you need 32 bits." – Boutin 29/6, 2012 at 18:17

For similar reasons, an int is at least 16 bits, and a long long is at least 64 (added to the standard in C++11). – Bedivere 21/2, 2013 at 17:26

Namespace-level objects in different compilation units should never depend on each other for initialization, because their initialization order is undefined.
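
One common way to avoid the dependency is to wrap each object in a function that holds it as a local static, so it is constructed the first time it is used. The sketch below is only an illustration under that assumption; appName, Logger and logger are hypothetical names.

```cpp
#include <string>

// Instead of a namespace-level `const std::string appName`, expose the object
// through a function. The static is constructed on the first call.
const std::string& appName() {
    static const std::string name("demo");
    return name;
}

// A second object whose construction depends on the first.
struct Logger {
    std::string prefix;
    Logger() : prefix(appName() + ": ") {}
};

// Even if logger() is first called from another compilation unit's
// initializer, appName() constructs its string before Logger reads it.
Logger& logger() {
    static Logger instance;
    return instance;
}
```

The order of destruction at program exit still needs care with this approach.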

Callean answered 22/12, 2009 at 5:28 Comment(0)

Using memcpy to copy between overlapping memory regions. For example:

char a[256] = {};
memcpy(a, a, sizeof(a));

The behavior is undefined according to the C Standard, which is subsumed by the C++03 Standard.

7.21.2.1 The memcpy function

Synopsis

1 #include <string.h>
void *memcpy(void * restrict s1, const void * restrict s2, size_t n);

Description

2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

Returns

3 The memcpy function returns the value of s1.

7.21.2.2 The memmove function

Synopsis

1 #include <string.h>
void *memmove(void *s1, const void *s2, size_t n);

Description

2 The memmove function copies n characters from the object pointed to by s2 into the object pointed to by s1. Copying takes place as if the n characters from the object pointed to by s2 are first copied into a temporary array of n characters that does not overlap the objects pointed to by s1 and s2, and then the n characters from the temporary array are copied into the object pointed to by s1.

Returns

3 The memmove function returns the value of s1.
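
A minimal sketch of the overlapping case handled with memmove instead. The helper shiftRightByOne is hypothetical, and it assumes the caller's buffer has room for count + 1 bytes.

```cpp
#include <cstddef>
#include <cstring>

// Moves the first `count` bytes of `buf` one position to the right.
// Source [buf, buf + count) and destination [buf + 1, buf + 1 + count)
// overlap, so memcpy would be undefined here; memmove is specified to
// behave as if it copied through a temporary array.
// Precondition (assumed): `buf` points to at least count + 1 bytes.
void shiftRightByOne(char* buf, std::size_t count) {
    std::memmove(buf + 1, buf, count);
}
```

For example, a six-byte buffer holding "abcd" holds "aabcd" after shiftRightByOne(buf, 4).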

Boutin answered 26/6, 2012 at 16:10 Comment(0)
