5. Common pitfalls when using arrays.
5.1 Pitfall: Trusting type-unsafe linking.
OK, you’ve been told, or have found out yourself, that globals (namespace
scope variables that can be accessed outside the translation unit) are
Evil™. But did you know how truly Evil™ they are? Consider the
program below, consisting of two files [main.cpp] and [numbers.cpp]:
// [main.cpp]
#include <iostream>

extern int* numbers;

int main()
{
    using namespace std;
    for( int i = 0; i < 42; ++i )
    {
        cout << (i > 0? ", " : "") << numbers[i];
    }
    cout << endl;
}

// [numbers.cpp]
int numbers[42] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
In Windows 7 this compiles and links fine with both MinGW g++ 4.4.1 and
Visual C++ 10.0.
Since the types don't match, the program crashes when you run it.
Formal explanation: the program has Undefined Behavior (UB), so instead
of crashing it can just hang, or perhaps do nothing, or it
can send threatening e-mails to the presidents of the USA, Russia, India,
China and Switzerland, and make Nasal Daemons fly out of your nose.
In-practice explanation: in main.cpp the array is treated as a pointer, placed
at the same address as the array. For a 32-bit executable this means that the
first int value in the array is treated as a pointer. I.e., in main.cpp the
numbers variable contains, or appears to contain, (int*)1. This causes the
program to access memory down at the very bottom of the address space, which is
conventionally reserved and trap-causing. Result: you get a crash.
The compilers are fully within their rights to not diagnose this error,
because C++11 §3.5/10 says, about the requirement of compatible types
for the declarations,
[N3290 §3.5/10]
A violation of this rule on type identity does not require a diagnostic.
The same paragraph details the variation that is allowed:
… declarations for an array object can specify array types that
differ by the presence or absence of a major array bound (8.3.4).
This allowed variation does not include declaring a name as an array in one
translation unit, and as a pointer in another translation unit.
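A minimal sketch of a consistent declaration, shown as a single translation unit so that it compiles on its own. In the two-file program the extern line would replace the one in main.cpp, or better, live in a shared header that both files include. The helper sum_first is hypothetical and only there for illustration.

```cpp
// Declaration: an array of unknown bound. Differing from the definition only
// by the absence of the major array bound is the variation the rule allows.
extern int numbers[];

// Hypothetical helper (not in the original program): sums the first n items.
int sum_first( int n )
{
    int sum = 0;
    for( int i = 0; i < n; ++i ) { sum += numbers[i]; }
    return sum;
}

// Definition, as in numbers.cpp. Items without an initializer are zero.
int numbers[42] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
```

With the declaration and the definition both naming an array type, indexing numbers[i] reads the array's items instead of reinterpreting the first item as an address.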
5.2 Pitfall: Doing premature optimization (memset & friends).
Not written yet
5.3 Pitfall: Using the C idiom to get number of elements.
With deep C experience it’s natural to write …
#define N_ITEMS( array ) (sizeof( array )/sizeof( array[0] ))
Since an array decays to pointer to first element where needed, the
expression sizeof(a)/sizeof(a[0]) can also be written as
sizeof(a)/sizeof(*a). It means the same, and no matter how it’s
written it is the C idiom for finding the number of elements of an array.
Main pitfall: the C idiom is not typesafe. For example, the code
…
#include <stdio.h>

#define N_ITEMS( array ) (sizeof( array )/sizeof( *array ))

void display( int const a[7] )
{
    int const n = N_ITEMS( a ); // Oops.
    printf( "%d elements.\n", n );
}

int main()
{
    int const moohaha[] = {1, 2, 3, 4, 5, 6, 7};

    // The cast makes the argument type match %d (N_ITEMS yields a size_t).
    printf( "%d elements, calling display...\n", (int) N_ITEMS( moohaha ) );
    display( moohaha );
}
passes a pointer to N_ITEMS, and therefore most likely produces a wrong
result. Compiled as a 32-bit executable in Windows 7 it produces …
7 elements, calling display...
1 elements.
The reason:
- The compiler rewrites int const a[7] to just int const a[].
- The compiler rewrites int const a[] to int const* a.
- N_ITEMS is therefore invoked with a pointer.
- For a 32-bit executable sizeof(array) (size of a pointer) is then 4.
- sizeof(*array) is equivalent to sizeof(int), which for a 32-bit executable is also 4.
Hence the result 4/4 = 1.
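The parameter rewriting can be checked at compile time. The following is a minimal sketch, assuming C++11 for static_assert and std::is_same; the constant items_seen_in_display is a hypothetical name introduced only for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

void display( int const a[7] );     // Declaration only; never called here.

// Parameter adjustment: the two function types are identical.
static_assert(
    std::is_same< decltype( display ), void( int const* ) >::value,
    "an array parameter is really a pointer parameter"
    );

// What N_ITEMS( a ) evaluates to inside display: pointer size over int size.
// The value is platform dependent (the 32-bit build in the text gives 4/4).
constexpr std::size_t items_seen_in_display = sizeof( int const* )/sizeof( int );
```

The 7 in the parameter declaration plays no role in the function's type, which is why the idiom cannot recover it.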
In order to detect this error at run time you can do …
#include <assert.h>
#include <typeinfo>

#define N_ITEMS( array ) ( \
    assert(( \
        "N_ITEMS requires an actual array as argument", \
        typeid( array ) != typeid( &*array ) \
        )), \
    sizeof( array )/sizeof( *array ) \
    )
7 elements, calling display...
Assertion failed: ( "N_ITEMS requires an actual array as argument", typeid( a ) != typeid( &*a ) ), file runtime_detection.cpp, line 16
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
The runtime error detection is better than no detection, but it wastes a little
processor time, and perhaps much more programmer time. Better with detection at
compile time! And if you're happy to not support arrays of local types with C++98,
then you can do that:
#include <stddef.h>
typedef ptrdiff_t Size;
template< class Type, Size n >
Size n_items( Type (&)[n] ) { return n; }
#define N_ITEMS( array ) n_items( array )
Compiling this definition substituted into the first complete program, with g++,
I got …
M:\count> g++ compile_time_detection.cpp
compile_time_detection.cpp: In function 'void display(const int*)':
compile_time_detection.cpp:14: error: no matching function for call to 'n_items(const int*&)'
M:\count> _
How it works: the array is passed by reference to n_items, and so it does
not decay to pointer to first element, and the function can just return the
number of elements specified by the type.
With C++11 you can use this also for arrays of local type, and it's the type safe
C++ idiom for finding the number of elements of an array.
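The same pass-by-reference idea can be applied to a function's own parameter, so that the size survives the call. A minimal sketch, assuming C++98 or later; display_count is a hypothetical name and not part of the program above.

```cpp
#include <stddef.h>

typedef ptrdiff_t Size;

template< class Type, Size n >
Size n_items( Type (&)[n] ) { return n; }

// Hypothetical replacement for display: the parameter is a reference to an
// array of n ints, so no decay happens and n is deduced from the argument.
template< Size n >
Size display_count( int const (&a)[n] )
{
    return n_items( a );
}
```

Called with the moohaha array from the earlier program, display_count deduces n as 7, whereas passing a plain pointer fails to compile.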
5.4 C++11 - C++20 pitfall: Using a constexpr array size function.
With C++11 and later, it's natural to implement an array size function as follows:
#include <cstddef>      // std::size_t

// Similar in C++03, but not constexpr.
template< class Type, std::size_t N >
constexpr std::size_t size( Type (&)[N] ) { return N; }
This yields the number of elements in an array as a compile time constant. This function has even been standardized as std::size in C++17.
For example, size() can be used to declare an array of the same size as another:
// Example 1
void foo()
{
    int const x[] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 4};
    int y[ size(x) ] = {};
}
But consider this code using the constexpr version:
// Example 2
template< class Collection >
void foo( Collection const& c )
{
    constexpr int n = size( c ); // error prior to C++23
    // ...
}

int main()
{
    int x[42];
    foo( x );
}
The pitfall: until C++23, using the reference c in a constant expression is not allowed, and all major compilers reject this code. From the C++20 standard, [expr.const] p5.12:
An expression E is a core constant expression unless the evaluation of E, following the rules of the abstract machine, would evaluate one of the following:
- [...]
- an id-expression that refers to a variable or data member of reference type unless the reference has a preceding initialization and either
  - it is usable in constant expressions or
  - its lifetime began within the evaluation of E;
c is neither usable in constant expressions nor did its lifetime begin within the evaluation of constexpr int n = ..., so evaluating c is not a core constant expression. These restrictions have been lifted for C++23 by P2280: Using unknown pointers and references in constant expressions. In C++23, c is treated as a reference binding to an unspecified object ([expr.const] p8).
5.4.1 Workaround: C++20-compatible constexpr size function
Using std::extent (on Collection, or on decltype( c ) with the reference removed)
is not a viable workaround, because it fails if Collection is not an array.
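To make that concrete, here is a small sketch, assuming C++11; extent_of is a hypothetical helper name. It applies std::extent to Collection rather than to decltype( c ), since the latter is a reference type, for which std::extent yields 0.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: the size is taken from the type alone.
template< class Collection >
constexpr std::size_t extent_of( Collection const& )
{
    return std::extent< Collection >::value;
}
```

For an int x[42] this reports 42, but for a std::array<int, 43> it quietly reports 0, because std::extent only looks at built-in array types.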
To deal with collections that can be non-arrays one needs the overloadability of a
size function, but also, for compile time use, one needs a compile time
representation of the array size. And the classic C++03 solution, which works fine
also in C++11 and C++14, is to let the function report its result not as a value
but via its function result type. For example like this:
// Example 3 - OK (not ideal, but portable and safe)
#include <array>
#include <cstddef>

// No implementation, these functions are never evaluated.
template< class Type, std::size_t N >
auto static_n_items( Type (&)[N] )
    -> char(&)[N]; // return a reference to an array of N chars

template< class Type, std::size_t N >
auto static_n_items( std::array<Type, N> const& )
    -> char(&)[N];

#define STATIC_N_ITEMS( c ) ( sizeof( static_n_items( c )) )

template< class Collection >
void foo( Collection const& c )
{
    constexpr std::size_t n = STATIC_N_ITEMS( c );
    // ...
}

int main()
{
    int x[42];
    std::array<int, 43> y;
    foo( x );
    foo( y );
}
About the choice of return type for static_n_items: this code doesn't use
std::integral_constant because with std::integral_constant the result is
represented directly as a constexpr value, reintroducing the original problem.
About the naming: part of this solution to the constexpr-invalid-due-to-reference
problem is to make the choice of compile time constant explicit.
Until C++23, a macro like the STATIC_N_ITEMS above yields portability,
e.g. to the clang and Visual C++ compilers, retaining type safety.
Related: macros do not respect scopes, so to avoid name collisions it can be a
good idea to use a name prefix, e.g. MYLIB_STATIC_N_ITEMS.
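If the macro is unwelcome, the size can also be read from the collection's type with a class template, which never touches the reference c. This is a different technique from the function-result-type one above, sketched here under the assumption of C++11; the names static_size and count_items are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Primary template left undefined: unsupported collections fail to compile.
template< class Collection >
struct static_size;

template< class Type, std::size_t N >
struct static_size< Type[N] >
    : std::integral_constant< std::size_t, N > {};

template< class Type, std::size_t N >
struct static_size< std::array<Type, N> >
    : std::integral_constant< std::size_t, N > {};

template< class Collection >
std::size_t count_items( Collection const& )
{
    // Only the type Collection is used, so no reference is evaluated.
    constexpr std::size_t n = static_size< Collection >::value;
    return n;
}
```

The trade-off is that every supported collection type needs its own specialization, where the function-based version gets by with an overload.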
std::arrays, std::vectors and gsl::spans - I would frankly expect an FAQ on how to use arrays in C++ to say "By now, you can start considering just, well, not using them." – Zr