Here's an attempt to provide a complete guide to the sizeof
operator and its many quirks. Warning: this post may contain heavy "language-lawyering".
Formal syntax and valid forms
sizeof is a keyword in C and the syntax is defined in C17 6.5.3 as:
sizeof unary-expression
sizeof ( type-name )
Meaning that there are two possible ways to use it: sizeof op or sizeof(op). In the former case, the operand has to be an expression (for example sizeof my_variable) and in the latter case it has to be a type (for example sizeof(int)).
When we use sizeof, we almost always use parentheses. Always using parentheses is considered good practice (and Linus Torvalds famously once had one of his usual childish tantrums about it). But which form of sizeof we use depends on whether we pass an expression or a type. So even when we put parentheses around an expression, we are not using the second form but the first. Example:
int x;
printf("%zu\n", sizeof(x));
In this case we are passing an expression to sizeof. The expression is (x) and the parentheses are regular ("primary expression") parentheses that we may use around any expression in C; they do not belong to the sizeof operator in this case.
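As a minimal sketch of the two forms side by side (the function name sizeof_forms_agree is made up for illustration and is not from the standard):

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if all three spellings give the same size. */
int sizeof_forms_agree(void)
{
    int x = 0;
    size_t a = sizeof x;     /* first form: the operand is the expression x   */
    size_t b = sizeof(x);    /* still the first form: the operand is (x)      */
    size_t c = sizeof(int);  /* second form: a parenthesized type name        */
    /* size_t d = sizeof int;   does not compile: a type name needs the ( )   */
    return a == b && b == c;
}
```

All three sizes are the same here because x has type int; only the way the operand is parsed differs.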
"The operand to sizeof cannot be a type-cast" - precedence and associativity or...?
Following the above explanation, whenever we write sizeof (int) * p, this gets interpreted as the second form with a type name. Why?
Why isn't obvious at all; this is in fact dang subtle. It is easy to get tricked by "operator precedence tables" like the one you link. It states that the cast operator, like sizeof, is a unary operator with right-to-left associativity. But this isn't actually true when digging through the dirty details of the C grammar.
There is actually no such thing as a precedence table in the C standard, nor does it define associativity explicitly. Instead operator precedence is decided (as complicated as humanly possible) by a long chain of syntax definitions in chapter 6.5. In each sub chapter, the operator group refers to the previous and sometimes next operator group in the formal syntax, thereby stating that the current group has lower precedence than the previous. For 6.5.3 unary operators, it goes like:
unary-expression:
    postfix-expression
    ++ unary-expression
    -- unary-expression
    unary-operator cast-expression
    sizeof unary-expression
    sizeof ( type-name )
    _Alignof ( type-name )
unary-operator: one of
    & * + - ~ !
Translated from standardese to English, this grammar goo is to be read roughly as:
"Here is the group of unary expressions. They are the prefix ++ and -- operators, or one of the unary operators (listed separately), or sizeof in the two different forms, or _Alignof. They may follow a postfix expression, meaning that any postfix expression (or operator groups even higher up the syntax chain) has higher precedence than the unary operators. They may be followed by a cast expression, which thereby has lower precedence than the unary operators."
So depending on how you put it, there's actually a subtle error in the link or maybe they could have explained this better (I'm not sure if I even just managed myself, so I don't blame them really). Outside the formal C standard, the concept of "right-to-left associativity" doesn't work unless the cast operator is listed as part of the unary operators in that table even though it actually has lower precedence in the grammar.
So anyway, the sizeof ( type-name ) operator is a unary expression and takes precedence in the grammar above the cast operator. And that's why the compiler will not treat this as the two operators sizeof and (cast), but as the operator sizeof(type) followed by the binary multiplication operator.
And so sizeof (int) * p turns into the equivalent of (sizeof(int)) * p, that is sizeof with binary multiplication, which is probably nonsense. Perhaps the actual intent here was to dereference a pointer p, cast the result and then take the size.
We could however write something like sizeof ((int)*p) and then the parsing order is: parentheses, then (because of unary operator right-to-left associativity) dereference, then cast, then sizeof.
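A small sketch of the two parses, with made-up function names. In the first function p is deliberately an integer, so that the multiplication compiles at all; with a pointer p, sizeof (int) * p would be rejected because a pointer cannot be an operand of binary *.

```c
#include <stddef.h>

/* p is an integer: `sizeof (int) * p` is sizeof(int) multiplied by p. */
size_t parsed_as_multiplication(size_t p)
{
    return sizeof (int) * p;    /* same as (sizeof(int)) * p */
}

/* p is a pointer: the outer parentheses make the cast part of the operand. */
size_t size_of_cast_result(const char *p)
{
    return sizeof ((int)*p);    /* operand has type int; *p is not evaluated */
}
```

The second function yields sizeof(int) regardless of what p points at, since only the type of the operand matters.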
What is the type returned by sizeof?
It returns a special large, unsigned integer type size_t (C17 6.5.3.4/5) generally regarded as "large enough" to hold the largest object allowed in the system. The type is commonly used whenever we wish to take the size of something, like when iterating through an array.
For example you might see some code on SO in the form for(size_t i=0; i<n; i++) when iterating through an array, since this is the most correct type "large enough" to contain the size of an array. (int might be too small, and besides it is signed and we can't have negative sizes.)
size_t is defined in stddef.h and is also made available by several other standard headers such as stdio.h. It can hold values up to SIZE_MAX, defined in stdint.h.
size_t is printed with printf by using the %zu conversion specifier, hence my previous example printf("%zu\n", sizeof(x));.
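To tie these pieces together, here is a short sketch using size_t as a loop index, %zu for printing and SIZE_MAX as the upper limit. The names sum_ints and print_sizes are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Sums n elements, using size_t as the index type. */
long sum_ints(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Prints an element count and the largest value a size_t can hold. */
void print_sizes(void)
{
    int values[] = { 1, 2, 3, 4 };
    size_t count = sizeof values / sizeof values[0];
    printf("%zu elements, sum %ld\n", count, sum_ints(values, count));
    printf("SIZE_MAX is %zu\n", (size_t)SIZE_MAX);
}
```

The value printed for SIZE_MAX depends on the implementation, so no particular number should be expected.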
Compile-time or run-time?
sizeof is normally a compile-time operator, meaning that the operand does not get evaluated. The one exception is variable-length arrays (VLA), where the size is simply not known at compile-time.
C17 6.5.3.4/2:
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
Most of the time this doesn't matter. However, we can cook up some artificial example like this:
#include <stdio.h>

int main (void)
{
    int size;
    scanf("%d",&size); // enter 2
    int arr[5][size];
    printf("%zu ", sizeof(size++)); // size++ not executed
    printf("%d ", size); // print 2
    printf("%zu ", sizeof(arr[size++])); // size++ is executed
    printf("%d ", size); // print 3
}
When I try this out and enter 2, it prints 4 2 8 3:
- 4 because that's the size of an int on this system.
- 2 because the operand size++ was not executed/evaluated.
- 8 because 2 * sizeof(int) is 8.
- 3 because the operand arr[size++] was executed/evaluated, since arr[n] results in a VLA operand.
Which operands get evaluated and which do not is well-defined and guaranteed.
Hence a popular trick int* ptr = malloc(n * sizeof *ptr);. If *ptr were evaluated, we would be dereferencing an uninitialized pointer, which would be undefined behavior. But since it is guaranteed not to get evaluated, the trick is safe.
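A minimal sketch of that trick in context. The helper name make_filled is hypothetical, and the multiplication is not checked for overflow to keep the example short.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates n ints set to value, or returns NULL
   if the allocation fails. The caller is responsible for free(). */
int *make_filled(size_t n, int value)
{
    int *ptr = malloc(n * sizeof *ptr);  /* only the type of *ptr (int) is used */
    if (ptr == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        ptr[i] = value;
    return ptr;
}
```

A side benefit of writing sizeof *ptr instead of sizeof(int) is that the allocation stays correct if the element type of ptr is changed later.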
An exception to "array decay"
sizeof is one of the few operators that is an exception to the rule of "array decay":
C17 6.3.2.1/3
Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue.
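A rough illustration of what this exception means in practice, using invented function names: sizeof sees the whole array where the array is declared, but only a pointer once the array has been passed to a function.

```c
#include <stddef.h>

/* Inside a callee the argument is a pointer, so sizeof yields a pointer size. */
size_t size_seen_in_callee(const int *arr)
{
    return sizeof arr;                   /* sizeof(const int *) */
}

/* Where the array is declared there is no decay under sizeof. */
size_t size_seen_at_definition(void)
{
    int arr[10];
    return sizeof arr;                   /* 10 * sizeof(int) */
}

/* The usual element-count idiom depends on that exception. */
size_t element_count(void)
{
    int arr[10];
    return sizeof arr / sizeof arr[0];   /* 10 */
}
```

This is why the element-count idiom only works in the scope where the array itself is visible, not on a pointer parameter.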
sizeof is used in C's definition of a byte
The size of a byte in C is defined as per C17 3.6:
3.6 byte
addressable unit of data storage large enough to hold any member of the basic character set of the execution environment
and then 6.5.3.4/4:
When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
For this reason it doesn't make much sense to write things like malloc(n * sizeof(char)) because sizeof(char) is by definition guaranteed to always be 1.
(The number of bits in a char is however not guaranteed to be 8.)
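As a sketch of both points, the following uses _Static_assert (C11 and later) to check the guarantees at compile time and CHAR_BIT from limits.h for the number of bits per byte. The function names make_buffer and bits_in are made up for the example.

```c
#include <limits.h>
#include <stdlib.h>

_Static_assert(sizeof(char) == 1, "guaranteed by C17 6.5.3.4/4");
_Static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits, possibly more");

/* Allocates n chars; multiplying by sizeof(char) would add nothing. */
char *make_buffer(size_t n)
{
    return malloc(n);
}

/* Converts a size in bytes to a size in bits on this implementation. */
size_t bits_in(size_t bytes)
{
    return bytes * CHAR_BIT;
}
```

On most mainstream systems CHAR_BIT is 8, but portable code should use the macro rather than assume it.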
*p is fixed to be a char which will always have size 1. BTW: The result of the sizeof operator is of type size_t which is printed using a %zu format specifier, not %i. – Rainie
sizeof((int)*p), the operand of sizeof isn't a type cast; it is ((int)*p), i.e. a type cast inside parentheses. That's a difference, which is very relevant for how sizeof interprets its arguments. – Scrivener
sizeof is always evaluated at compile time except for variable-length arrays, even for c + c where the result is an int: sizeof(c + c) == sizeof(int) while sizeof(3.14 * 2) == sizeof(double) – Bummer
sizeof NULL returns 8 I assume this is when you use the extra ()s in macro definitions. #define NULL ((void* )0x0) – Alduino
double (can be a variable as well), then the returned value will also be double – Alduino
sizeof, and that sizeof X ≠ sizeof (X) (at least in some cases)? – Scrivener
sizeof first expects (type-name) and if such is not present it expects an expression or a single object, just like everywhere else. – Alduino
sizeof NULL may be valid and evaluate to 8 for you, and the macro NULL may expand to ((void *)0) for you, but the C language does not guarantee any of these things. It does not guarantee even that sizeof(NULL) evaluates to the size of any pointer type. It sounds like you may have been reading your system headers, but if you want to write portable software then don't do that. Rely on documentation instead, especially the language specification. – Rubbery