What are all the common undefined behaviours that a C++ programmer should know about?
Say, like:
a[i] = i++;
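Dereferencing a NULL pointer
Using memcpy to copy overlapping buffers.
Shifting a value by more bits than its type holds (int64_t i = 1; i <<= 72 is undefined)
Using an uninitialized variable (int i; i++; cout << i;)
... volatile or sig_atomic_t at the receipt of a signal
... long int
... #if expression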
expressionsizeof(int) * CHAR_BIT <= 72
is true. The object you put the result of 37 << 72
in is irrelevant, the expression itself is what invokes undefined behavior. For instance, uint8_t n = (1 << 15);
is perfectly fine. You first evaluate 1 << 15
, which is 2**15. The result is the then implicitly cast from an int
to a uint8_t
, which means that n == 0
(due to modular arithmetic) –
Bedivere The order that function parameters are evaluated is unspecified behavior. (This won't make your program crash, explode, or order pizza... unlike undefined behavior.)
The only requirement is that all parameters must be fully evaluated before the function is called.
This:
// The simple obvious one.
callFunc(getA(),getB());
Can be equivalent to this:
int a = getA();
int b = getB();
callFunc(a,b);
Or this:
int b = getB();
int a = getA();
callFunc(a,b);
It can be either; it's up to the compiler. The result can matter, depending on the side effects.
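To make the difference visible, here is a minimal sketch in which getA() and getB() are hypothetical stand-ins that record the order in which they ran. Only the version that uses separate statements has a guaranteed order; the single-call version may record either order depending on the compiler.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for getA() and getB(): each records when it ran.
static std::vector<char> callOrder;

int getA() { callOrder.push_back('A'); return 1; }
int getB() { callOrder.push_back('B'); return 2; }

int callFunc(int a, int b) { return a * 10 + b; }

// Portable version: separate statements fix the order as A, then B.
int callInFixedOrder() {
    callOrder.clear();
    const int a = getA();
    const int b = getB();
    return callFunc(a, b);
}

// The order of getA() and getB() here is up to the compiler.
int callInUnspecifiedOrder() {
    callOrder.clear();
    return callFunc(getA(), getB());
}
```

If the side effects of the two calls interact, prefer the form used in callInFixedOrder().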
The compiler is free to re-order the evaluation of the parts of an expression (assuming the meaning is unchanged).
From the original question:
a[i] = i++;
// This expression has three parts:
// (a) a[i]
// (b) i++
// (c) Assign (b) to (a)
// (c) is guaranteed to happen after (a) and (b)
// But (a) and (b) can be done in either order.
// See n2521 Section 5.17
// (b) increments i but returns the original value.
// See n2521 Section 5.2.6
// Thus this expression can be written as:
int rhs = i++;
int& lhs = a[i];
lhs = rhs;
// or
int& lhs = a[i];
int rhs = i++;
lhs = rhs;
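Whichever reading the compiler picks, the ambiguity goes away once the read and the increment are separate statements. A minimal sketch of both intended meanings follows; the function names are hypothetical, and the caller is assumed to keep the index in bounds.

```cpp
#include <cassert>

// Meaning 1: store the old value of i at the old index, then advance.
// Returns the new value of i.
int storeAtOldIndex(int* a, int i) {
    a[i] = i;
    ++i;
    return i;
}

// Meaning 2: advance first, then store the old value at the new index.
// Returns the new value of i.
int storeAtNewIndex(int* a, int i) {
    const int old = i;
    ++i;
    a[i] = old;
    return i;
}
```

Each statement ends before the next begins, so there is only one possible order of evaluation.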
Double-checked locking, and one easy mistake to make.
A* a = new A("plop");
// Looks simple enough.
// But this can be split into three parts:
// (a) Allocate memory
// (b) Call constructor
// (c) Assign value to 'a'
// No problem here.
// But the compiler is allowed to do this:
// (a) Allocate memory
// (c) Assign value to 'a'
// (b) Call constructor
// This is because the whole thing is between two sequence points.
// So what is the big deal?
// Simple double-checked lock. (I know there are many other problems with this.)
if (a == NULL) // (Point B)
{
    Lock lock(mutex);
    if (a == NULL)
    {
        a = new A("Plop"); // (Point A).
    }
}
a->doStuff();
// Think of this situation.
// Thread 1: Reaches point A. Executes (a)(c)
// Thread 1: Is about to do (b) and gets unscheduled.
// Thread 2: Reaches point B. It can now skip the if block.
// Remember (c) has been done, thus 'a' is not NULL.
// But the memory has not been initialized.
// Thread 2 now executes doStuff() on an uninitialized object.
// The solution to this problem is to move the assignment of 'a'
// to the other side of the sequence point.
if (a == NULL) // (Point B)
{
    Lock lock(mutex);
    if (a == NULL)
    {
        A* tmp = new A("Plop"); // (Point A).
        a = tmp;
    }
}
a->doStuff();
// Of course there are still other problems because of C++'s support for
// threads. But hopefully these are addressed in the next standard.
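For readers on C++11 or later, here is one possible sketch of the same pattern using std::atomic. Note that a plain pointer, even when assigned through a temporary, carries no cross-thread ordering guarantee in the C++11 memory model. The struct A below is a hypothetical stand-in for the class used above, and getInstance() is an illustrative name, not something from the original answer.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for the class A used above.
struct A {
    explicit A(const std::string& n) : name(n) {}
    std::string doStuff() const { return name; }
    std::string name;
};

static std::atomic<A*> instance{nullptr};
static std::mutex instanceMutex;

// The object is intentionally never deleted in this sketch.
A* getInstance() {
    // Acquire load: a thread that sees a non-null pointer also sees
    // the fully constructed object it points to.
    A* p = instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(instanceMutex);
        p = instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new A("Plop");
            // Release store: publishes the pointer only after construction.
            instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

In many cases a function-local static object is simpler, since C++11 guarantees its initialization is thread-safe.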
... = must be evaluated before it can be evaluated. – Musette
a[i] = i++; is undefined behavior anyway. The compiler is free to do more than just reorder it, it can ignore it or do the thing with the nose. i is modified on the RHS, and on the LHS it is read other than for the purpose of determining the value to be written. Game over (unless i is a class type with an operator++ overload, then any object modifications inside that overload are safely wrapped in sequence points, and it's just unspecified order, not UB). – Hancock
... i is a class type that imitates an integer with suitable operator++(int) and operator int(), your two either/or options are both still possible, but there's no UB. That said, maybe in saying "undefined behavior", the questioner means any situation where the behavior isn't completely nailed down, as opposed to the standard's definition of UB. – Hancock
Assigning to a constant after stripping constness using const_cast<>:
const int i = 10;
int *p = const_cast<int*>( &i );
*p = 1234; //Undefined
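By contrast, writing through a const_cast is well-defined when the underlying object was not declared const in the first place. A minimal sketch, with hypothetical function names:

```cpp
#include <cassert>

// Takes a const reference but writes through it. This is only well-defined
// when the referenced object was not declared const.
void setTo1234(const int& ref) {
    int* p = const_cast<int*>(&ref);
    *p = 1234;
}

int demoConstCast() {
    int mutableValue = 10;      // not declared const
    setTo1234(mutableValue);    // fine: the underlying object is mutable
    return mutableValue;
}
```

Passing an object declared const int to setTo1234() would reproduce the undefined case.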
My favourite is "Infinite recursion in the instantiation of templates" because I believe it's the only one where the undefined behaviour occurs at compile time.
Besides undefined behaviour, there is also the equally nasty implementation-defined behaviour.
Undefined behaviour occurs when a program does something the result of which is not specified by the standard.
Implementation-defined behaviour is an action by a program the result of which is not defined by the standard, but which the implementation is required to document. An example is "Multibyte character literals", from the Stack Overflow question "Is there a C compiler that fails to compile this?".
Implementation-defined behaviour only bites you when you start porting (but upgrading to a new version of the compiler is also porting!)
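A scalar variable may be modified at most once between sequence points (the C++03 rule). C++11 replaced sequence points with sequenced-before rules, under which this particular expression became well-defined, although i = i++ stayed undefined until C++17.
int i = 1;
i = ++i;
// Undefined in C++03: 'i' is modified twice without an intervening sequence point.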
A basic understanding of the various environmental limits helps. The full list is in section 5.2.4.1 of the C specification. Here are a few;
I was actually a bit surprised at the limit of 1023 case labels for a switch statement; I can foresee that being exceeded fairly easily by generated code such as lexers and parsers.
If these limits are exceeded, you have undefined behavior (crashes, security flaws, etc...).
Right, I know this is from the C specification, but C++ shares these basic limits.
The only type for which C++ guarantees a size is char, and that size is 1. The size of all other types is platform dependent.
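Although exact sizes vary, the standard does give minimum widths, and assumptions about them can be checked at compile time. A small sketch, assuming a C++11 compiler (for static_assert and long long); bitsInInt() is a hypothetical helper added only for illustration:

```cpp
#include <cassert>
#include <climits>

// These hold on every conforming implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");
static_assert(sizeof(int) * CHAR_BIT >= 16, "int has at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long has at least 64 bits");

// Hypothetical helper: storage width of int, in bits, on this platform.
int bitsInInt() { return static_cast<int>(sizeof(int) * CHAR_BIT); }
```

Code that needs an exact width can add its own static_assert, or use the fixed-width types from <cstdint> where the platform provides them.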
... long is guaranteed to be at least 32 bits. – Boutin
... int is at least 16 bits, and a long long is at least 64 (added to the standard in C++11). – Bedivere
Namespace-level objects in different compilation units should never depend on each other for initialization, because their relative initialization order is not specified by the standard.
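One common way around this is the "construct on first use" idiom: wrap each object in a function that holds it as a local static. The sketch below uses two hypothetical strings, baseName() and derivedName(), to stand in for objects that would otherwise live at namespace scope in different compilation units.

```cpp
#include <cassert>
#include <string>

// Constructed the first time it is requested, not at an unspecified
// point during program startup.
const std::string& baseName() {
    static const std::string value = "base";
    return value;
}

// Safe even if called while another compilation unit's namespace-level
// objects are being initialized: baseName() builds its object on demand.
const std::string& derivedName() {
    static const std::string value = baseName() + "-derived";
    return value;
}
```

The trade-off is a function call at each use and, if other static objects use these during shutdown, a similar ordering question at destruction time.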
Using memcpy to copy between overlapping memory regions. For example:
char a[256] = {};
memcpy(a, a, sizeof(a));
The behavior is undefined according to the C Standard, which is subsumed by the C++03 Standard.
Synopsis
1 #include <string.h>
void *memcpy(void * restrict s1, const void * restrict s2, size_t n);
Description
2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
Returns
3 The memcpy function returns the value of s1.

Synopsis
1 #include <string.h>
void *memmove(void *s1, const void *s2, size_t n);
Description
2 The memmove function copies n characters from the object pointed to by s2 into the object pointed to by s1. Copying takes place as if the n characters from the object pointed to by s2 are first copied into a temporary array of n characters that does not overlap the objects pointed to by s1 and s2, and then the n characters from the temporary array are copied into the object pointed to by s1.
Returns
3 The memmove function returns the value of s1.
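As a small illustration of the defined alternative, the sketch below shifts part of a buffer onto itself with memmove. The function name shiftRightByTwo() is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Copies the first four characters of "abcdef" two places to the right.
// Source [0,4) and destination [2,6) overlap, so memmove is required;
// memcpy would be undefined here.
std::string shiftRightByTwo() {
    char buf[] = "abcdef";          // 6 characters plus the terminator
    std::memmove(buf + 2, buf, 4);  // terminator at index 6 is untouched
    return std::string(buf);        // "ababcd"
}
```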