How is the sizeof operator evaluated
A

4

8

My project demands a complete understanding of how the sizeof operator works. The C standard specification in this regard is vague and it will be dangerous to rely on my interpretations of it. I am particularly interested in when and how the sizeof ought to be processed.

  1. My previous knowledge suggested that it is a compile-time operator, which I never questioned, because I never abused sizeof too much.

    However:

    int size = 0;
    scanf("%i", &size);
    printf("%i\n", sizeof(int[size]));
    

    This for instance cannot be evaluated at compile time by any meaning.

    char c = '\0';
    char*p = &c;
    printf("%i\n", sizeof(*p));
    

    I do not remember the exact code that produces U/B, but here, *p is an actual expression (RTL unary dereference). By presumption, does it mean that sizeof(c+c) is a way to force compile-time evaluation by means of the expression or will it be optimized by the compiler?

  2. Does sizeof return a value of type int, is it a size_t (ULL on my platform), or is it implementation-defined?

  3. This article states that "The operand to sizeof cannot be a type-cast", which is incorrect. Type-casting has the same precedence as the sizeof operator, meaning in a situation where both are used, they are simply evaluated right to left. sizeof(int) * p probably does not work, because if the operand is a type in braces, this is handled first, but sizeof((int)*p) works just fine.

I am asking for a little technical elaboration on how sizeof is implemented. That can be of use to anyone who doesn't want to spread misinformation, inaccuracies or as in my case - work on a project that is directly dependent on it.

Alduino answered 2/2, 2023 at 12:42 Comment(10)
Your second snippet can easily be evaluated at compile time. The type of *p is fixed to be a char which will always have size 1. BTW: The result of sizeof operator is of type size_t which is printed using a %zu format specifier, not %i.Rainie
@Rainie That is true (regarding your first remark) That was not the correct example I think.Alduino
Regarding your point (3), the article is correct: in your example sizeof((int)*p), the operand of sizeof isn't a type cast; it is ((int)*p), i.e. a type cast inside parentheses. That's a difference, which is very relevant for how sizeof interprets its arguments.Scrivener
sizeof is always evaluated at compile time except for variable-length arrays, even for c + c where the result is an int: sizeof(c + c) == sizeof(int) while sizeof(3.14 * 2) == sizeof(double)Bummer
For what it's worth, this question hinges on the, IMO, gigantic difference between VLA's and every other type in C. It is certainly why they were not in the language at first; I believe it also explains why they are optional, and still not as universally popular among all C programmers.Isosteric
@KonradRudolph They should make it clear that this behavior is only because of the sizeof-specific syntax which will result in a syntax with different meaning. All in all, since sizeof NULL returns 8 I assume this is when you use the extra ()s in macro definitions. #define NULL ((void* )0x0)Alduino
@DavidRanieri That is true, so long as one of the operands is a double (can be a variable as well), then the returned value will also be doubleAlduino
@Alduino The note that you are citing does (attempt to) make this clear. However, it is not meant to be understood in isolation. You also need to read and understand en.cppreference.com/w/c/language/sizeof. In particular, are you aware that there are two forms of sizeof, and that sizeof X ≠ sizeof (X) (at least in some cases)?Scrivener
@KonradRudolph I simply assume that sizeof first expects (type-name) and if such is not present it expects an expression or a single object, just like everywhere else.Alduino
@Edenia, note that sizeof NULL may be valid and evaluate to 8 for you, and the macro NULL may expand to ((void *)0) for you, but the C language does not guarantee any of these things. It does not guarantee even that sizeof(NULL) evaluates to the size of any pointer type. It sounds like you may have been reading your system headers, but if you want to write portable software then don't do that. Rely on documentation instead, especially the language specification.Rubbery
C
12

1. My previous knowledge suggested that it is a compile-time operator, which I never questioned, because I never abused sizeof too much…

C 2018 6.5.3.4 2 specifies the behavior of sizeof and says:

… If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

In your example with sizeof(int[size]), the type of int[size] is a variable length array type, so the operand is evaluated (see footnote 1), effectively computing the size during program execution.

In your example with sizeof(*p), the type of *p is not a variable length array type, so the operand is not evaluated. The fact that p may point to an object of automatic storage duration that is created during program execution is irrelevant; the type of *p is known during compilation, so *p is not evaluated, and the result of sizeof is an integer constant.

2. Does sizeof return a value of type int, is it a size_t (ULL on my platform), or is it implementation-defined.

C 2018 6.5.3.4 5 says “The value of the result of both operators [sizeof and _Alignof] is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).”

3. This article states that "The operand to sizeof cannot be a type-cast", which is incorrect. Type-casting has the same precedence as the sizeof operator, meaning in a situation where both are used, they are simply evaluated right to left. sizeof(int) * p probably does not work, because if the operand is a type in braces, this is handled first, but sizeof((int)*p) works just fine.

The article means the operand cannot directly be a cast-expression (C 2018 6.5.4) in the form ( type-name ) cast-expression, due to how the formal grammar of C is structured. Formally, an expression operand to sizeof is a unary-expression (6.5.3) in the grammar, and a unary-expression can, through a chain of grammar productions, be a cast-expression inside parentheses.

Footnote

1 We often think of a type-name (a specification of a type, such as int [size]) as more of a passive declaration than an executable statement or expression, but C 2018 6.8 4 tells us “There is also an implicit full expression in which the non-constant size expressions for a variably modified type are evaluated…”

Carrollcarronade answered 2/2, 2023 at 12:51 Comment(3)
By the way, is it safe to say that the value returned from an operation has the type of the bigger of the two operands, instead of displaying truth tables everywhere?Alduino
No, @Edenia, that is not safe to take as a general rule. For example, it does not hold for arithmetic operations when both operands have integer types with conversion rank less than that of int (basically, types smaller than int). It is not always true of shift operations, it is never true of valid pointer difference operations, and it's not clear how it even applies to pointer addition and the indexing operator ([]).Rubbery
I just tested, a char + a short returns an int basically. You are right it cannot produce a value less than that of 4 bytes. It's just funny I can test that using sizeof.Alduino
A
8

The semantics of sizeof() per the (draft) C11 standard:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

Note "If the type of the operand is a variable length array type, the operand is evaluated". This means that the size of a VLA is computed at run time.

"otherwise, the operand is not evaluated and the result is an integer constant" means the result is evaluated at compile time.

The return type is size_t. Full stop:

The value of the result of both operators (sizeof() and _Alignof()) is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).

Note that the type is size_t. Don't use unsigned long nor unsigned long long nor anything else. Always use size_t.

Apterygial answered 2/2, 2023 at 12:51 Comment(0)
B
5

You're overthinking things a bit.

Yes, when the operand of sizeof is a variable-length array expression, then that has to be evaluated at run time - otherwise, it's a compile-time operation and the operand is not evaluated.

printf("%i\n", sizeof(*p));

I do not remember the exact code that produces U/B, but here, *p is an actual expression (RTL unary dereference).

Doesn't matter - the expression *p is not evaluated as part of the sizeof operation. All that matters is the type of *p, which is known at translation. This is a perfectly valid idiom for dynamic memory allocation:

size_t size = some_value();
int *p = malloc( sizeof *p * size );

By presumption, does it mean that sizeof(c+c) is a way to force compile-time evaluation by means of the expression or will it be optimized by the compiler?

Again, the expression c+c won't be evaluated - all that matters is the type.

Does sizeof return a value of type int, is it a size_t (ULL on my platform), or is it implementation-defined.

size_t. That's stated explicitly in the language definition:

6.5.3.4 The sizeof and _Alignof operators
...
5 The value of the result of both operators is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).
C 2011 Online Draft

This article states that "The operand to sizeof cannot be a type-cast", which is incorrect. Type-casting has the same precedence as the sizeof operator, meaning in a situation where both are used, they are simply evaluated right to left. sizeof(int) * p probably does not work, because if the operand is a type in braces, this is handled first, but sizeof((int)*p) works just fine.

What that article is saying is that an operand that's a cast-expression won't be parsed correctly. The syntax for sizeof is

unary-expression:
    ...
    sizeof unary-expression
    sizeof ( type-name )

and the syntax for a cast-expression is

cast-expression:
    unary-expression
    ( type-name ) cast-expression

If you write an expression like

sizeof (int) *p;

it won't be parsed as

sizeof ((int) *p);

Instead, it will be parsed as

(sizeof (int)) *p;

and interpreted as a multiplicative-expression:

multiplicative-expression * cast-expression

IOW, the compiler will think you're trying to multiply the result of sizeof (int) by the value of p (which should result in a diagnostic, since p is a pointer). If you wrap the cast-expression in parentheses, then it's parsed correctly.

Type-casting has the same precedence as the sizeof operator

That is not correct. Unary expressions (including sizeof expressions) have higher precedence than cast expressions. That's why sizeof (int) *p is parsed as (sizeof (int)) *p.

Biped answered 2/2, 2023 at 15:13 Comment(2)
According to the article they are in the same precedence group. Type-casting is also unary. What am I missing?Alduino
@Edenia: Cast expressions are not unary expressions - they are described in a different subclause (6.5.4) from unary expressions (6.5.3) and have a separate production rule in the grammar. The subclauses under 6.5 are listed in descending order of precedence (i.e, primary expressions in 6.5.1 have higher precedence than postfix operators in 6.5.2, which have higher precedence than unary operators in 6.5.3, etc.).Biped
K
3

Here's an attempt to provide a complete guide to the sizeof operator and its many quirks. Warning: this post may contain heavy "language-lawyering".


Formal syntax and valid forms

sizeof is a keyword in C and the syntax is defined in C17 6.5.3 as:

sizeof unary-expression
sizeof ( type-name )

Meaning that there are two possible ways to use it: sizeof op or sizeof(op). In the former case, the operand has to be an expression (for example sizeof my_variable) and in the latter case it has to be a type (for example sizeof(int)).

When we use sizeof, we almost always use parentheses. Always using parentheses is considered good practice (and Linus Torvalds famously once had one of his usual childish tantrums about it). But which form of sizeof we use depends on whether we pass an expression or a type. So even when we use parentheses around an expression, we are actually using the first form, not the second. Example:

int x;
printf("%zu\n", sizeof(x));

In this case we are passing an expression to sizeof. The expression is (x), and the parentheses are regular ("primary expression") parentheses that we may use around any expression in C; they do not belong to the sizeof operator in this case.


"The operand to sizeof cannot be a type-cast" - precedence and associativity or...?

Following the above explanation, whenever we write sizeof (int) * p, this gets interpreted as the second form with a type name. Why?

The reason is not obvious at all; it is in fact dang subtle. It is easy to get tricked by "operator precedence tables" like the one you link. It states that the cast operator, like sizeof, is a unary operator with right-to-left associativity. But this isn't actually true when digging through the dirty details of C grammar.

There is actually no such thing as a precedence table in the C standard, nor does it define associativity explicitly. Instead operator precedence is decided (as complicated as humanly possible) by a long chain of syntax definitions in chapter 6.5. In each sub chapter, the operator group refers to the previous and sometimes next operator group in the formal syntax, thereby stating that the current group has lower precedence than the previous. For 6.5.3 unary operators, it goes like:

unary-expression:

postfix-expression
++ unary-expression
-- unary-expression
unary-operator cast-expression
sizeof unary-expression
sizeof ( type-name )
_Alignof ( type-name )

unary-operator: one of
& * + - ~ !

Translated from standardese to English, this grammar goo is to be read roughly as:

"Here is the group of unary expressions. They are the prefix ++ and -- operators, or one of the unary operators (listed separately), or sizeof in the two different forms, or _Alignof. They may follow a postfix expression, meaning that any postfix expression (or operator groups even higher up the syntax chain) has higher precedence than the unary operators. They may be followed by a cast expression, which thereby has lower precedence than the unary operators."

So depending on how you put it, there's actually a subtle error in the link or maybe they could have explained this better (I'm not sure if I even just managed myself, so I don't blame them really). Outside the formal C standard, the concept of "right-to-left associativity" doesn't work unless the cast operator is listed as part of the unary operators in that table even though it actually has lower precedence in the grammar.

So anyway, the sizeof (type-name) operator is a unary expression and takes precedence in the grammar above the cast operator. And that's why the compiler will not treat this as the two operators sizeof and (cast), but as the operator sizeof(type) followed by the binary multiplication operator.

And so sizeof (int) * p becomes equivalent to (sizeof(int)) * p, sizeof with binary multiplication, which is probably nonsense; perhaps the actual intent here was to dereference a pointer p, cast the result and then take the size.

We could however write something like sizeof ((int)*p) and then the parsing order is: parentheses, then (because of unary operator right-to-left associativity) dereference, then cast, then sizeof.


What is the type returned by sizeof?

It returns a special large, unsigned integer type size_t (C17 6.5.3.4/5) generally regarded as "large enough" to hold the largest object allowed in the system. The type is commonly used whenever we wish to take the size of something, like when iterating through an array.

For example you might see some code on SO in the form for(size_t i=0; i<n; i++) when iterating through an array, since this is the most correct type "large enough" to contain the size of an array. (int might be too small and besides it is signed too and we can't have negative sizes.)

size_t is defined in stddef.h and also by several other standard headers such as stdio.h. It can hold values up to SIZE_MAX, defined in stdint.h.

size_t is printed with printf by using the %zu conversion specifier, hence my previous example printf("%zu\n", sizeof(x));.


Compile-time or run-time?

sizeof is normally a compile-time operator, meaning that the operand does not get evaluated. The one exception is variable-length arrays (VLA), where the size is simply not known at compile-time.

C17 6.5.3.4/2:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

Most of the time this doesn't matter. However, we can cook up some artificial example like this:

#include <stdio.h>

int main (void)
{
  int size;
  scanf("%d",&size); // enter 2
  int arr[5][size];

  printf("%zu ", sizeof(size++)); // size++ not executed
  printf("%d ", size); // print 2

  printf("%zu ", sizeof(arr[size++])); // size++ is executed
  printf("%d ", size);
}

When I try this out and enter 2, it prints 4 2 8 3:

  • 4 because that's the size of an int on this system.
  • 2 because the operand size++ was not executed/evaluated.
  • 8 because 2 * sizeof(int) is 8.
  • 3 because the operand arr[size++] was executed/evaluated, since arr[n] results in a VLA operand.

Which operands get evaluated and which do not is well-defined and guaranteed.

Hence a popular trick int* ptr = malloc(n * sizeof *ptr);. If *ptr were evaluated, we would be dereferencing an uninitialized pointer, which would be undefined behavior. But since it is guaranteed not to get evaluated, the trick is safe.


An exception to "array decay"

sizeof is one of the few operators whose operand is exempt from the rule of "array decay":

C17 6.3.2.1/3

Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue.


sizeof is used in C's definition of a byte

The size of a byte in C is defined as per C17 3.6

3.6
byte
addressable unit of data storage large enough to hold any member of the basic character set of the execution environment

and then 6.5.3.4/4:

When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

For this reason it doesn't make much sense to write things like malloc(n * sizeof(char)) because sizeof(char) is by definition guaranteed to always be 1.

(The number of bits in a char is however not guaranteed to be 8.)

Kirst answered 2/2, 2023 at 16:25 Comment(6)
very well done for the elaborate unraveling of this otherwise not immediately comprehensible subject. I actually did not know that expressions passed to sizeof aren't later evaluated (not including VLAs). I cannot say that I see why they couldn't be, but what I can say is that this is at least one instance that proves sizeof should NOT be treated as a function (That's a common misconception in beginners and I don't agree with Linus on that part). On a different note - what would be the rule that dictates which is the type returned by an expression passed as an operand to sizeof ?Alduino
Also, for sizeof(*p) it still has to evaluate the deref expression for it to display the size of the object pointed by p so I suspect my understanding of "evaluation" is also confused.Alduino
"In the former case, the operand has to be an expression (for example sizeof my_variable)" I guess it is also worth adding that it expects a UNARY expression (either a single operand or a unary subexpression), unless braces are present. Because sizeof 1+1 is 5 not 4Alduino
@Alduino Any sub-expression that is an operand to sizeof (or any other integer constant expression evaluated at compile-time) is treated just like any other C expression in terms of type promotion or allowed types to certain operators etc. As for "evaluation" it generally means "going to get executed" - sizeof need not "evaluate" the operand as in determining if it has any side effects etc since it won't get executed (except for the VLA case). The compiler knows the sizes of all types and objects at compile-time, so as long as it knows what type something has, it can determine the size.Kirst
@Alduino sizeof 1+1 is a simple operator precedence situation, similar to -1+1 or *ptr + 1. The only things that make sizeof an oddball operator are the two valid forms, as well as the out-of-standard concepts of precedence tables and associativity where it doesn't quite fit in, in relation to the cast operator.Kirst
And btw the formal definition of evaluation is as per 5.1.2.3: "Evaluation of an expression in general includes both value computations and initiation of side effects. Value computation for an lvalue expression includes determining the identity of the designated object." The C11 concept of value computation is perhaps not helpful here (either), but I guess you could say that sizeof performs value computation but without initiating side effects.Kirst
