What is the correct output of sizeof("string")?

5

39

On a microcontroller, in order to avoid loading settings from a previous firmware build, I also store the compilation time, which is checked at loading.

The microcontroller project is built with 'mikroC PRO for ARM' from MikroElektronika.

Because it is easier to debug there, I wrote the code with MinGW on my PC and, after checking it left and right, put it into mikroC.

The code using that check failed to work properly. After an evening of frustrating debugging, I found that sizeof("...") yields different values on the two platforms, which caused a buffer overflow.

But now I don't know whose fault it is.

To re-create the problem, use the following code:

#include <stdio.h>

#define SAVEFILECHECK_COMPILE_DATE __DATE__ " " __TIME__

char strA[sizeof(SAVEFILECHECK_COMPILE_DATE)];
char strB[] = SAVEFILECHECK_COMPILE_DATE;

int main(void)
{
    printf("sizeof(#def): %d\n", (int)sizeof(SAVEFILECHECK_COMPILE_DATE));
    printf("sizeof(strA): %d\n", (int)sizeof(strA));
    printf("sizeof(strB): %d\n", (int)sizeof(strB));
    return 0;
}

On MinGW it returns (as expected):

sizeof(#def): 21
sizeof(strA): 21
sizeof(strB): 21

However, on 'mikroC PRO for ARM' it returns:

sizeof(#def): 20
sizeof(strA): 20
sizeof(strB): 21

This difference caused a buffer overflow down the line (overwriting byte zero of a pointer – ouch).

21 is the answer I expect: 20 chars and the '\0' terminator.

Is this one of the 'it depends' things in C or is there a violation of the sizeof operator behavior?

Acanthopterygian answered 15/1 at 12:45 Comment(23)
Looks like a mikroC bug to me. (And I'd trust GCC over a noname C compiler anytime.)Nurseryman
__DATE__ and __TIME__ are not strings of well-defined size. The result may depend on the current time/date and the preferred system locale. Just print the contents of those strings to see what actually happens.Fafnir
@MarekR For OP they have different sizes during the same compilation.Nurseryman
@MarekR I think you mean "%zu", "%zd" is for ssize_tCantharides
Also print the string itself along with the size. That might give a hint. Tell us what happens.Iceberg
Can't repro this on compilers available on godbolt: godbolt.org/z/c65qYoPY8Fafnir
Could not reproduce either with multiple compilers. It looks like a compiler bug.Iceberg
@Iceberg It’s unlikely that the compiler forgets to write the \0 byte at the end of the string literals: such a bug would have been caught immediately. It’s probably simply not accounted for by sizeof.Protection
@KonradRudolph if the \0 is not accounted for by sizeof, isn't that a compiler bug?Iceberg
@Iceberg Absolutely, this is 100% a compiler bug. I’m just saying that printing the literal value is probably not going to be informative.Protection
So does sizeof("string") (incorrectly) produce 6 on this microC compiler?Sapwood
I agree with other commentators that this is almost certainly a compiler bug. But it may be a good language-lawyer question (if that tag were added). The C Standard(s) that I have looked at specify that the nul terminator is added to string literals in Translation Phase 7, immediately before they are used to initialize the corresponding arrays. However, I can't see any explicit mention in the Standard(s) about when the sizeof operator is evaluated. I'm guessing that the string literal expression must be treated as an array but I can't find that written anywhere ...Gillan
... the dodgy compiler seems to get it right when explicitly initialising an array with the literal, in the char strB[] = SAVEFILECHECK_COMPILE_DATE; line.Gillan
A small addition: char strC[sizeof(SAVEFILECHECK_COMPILE_DATE)] = SAVEFILECHECK_COMPILE_DATE; compiles without warning in MinGW, but mikroC reports errors 388 and 325, both "Too many initializers".Acanthopterygian
That last addition, while interesting, isn't really surprising (at least, to me). What's happening is that the sizeof(SAVEFILECHECK_COMPILE_DATE) is being evaluated before the nul terminator is added and the literal is converted to a static array. But I'm not 100% certain that the Standard actually specifies that is incorrect behaviour.Gillan
@MarekR __DATE__ and __TIME__ are well-defined by the C standard.Wormeaten
Just to add some grist to the mill: The Standard states that, in sizeof expr, the expr is not evaluated (unless it's a VLA). So, for an 'unevaluated string literal', does the Standard require adding a nul suffix?Gillan
@AdrianMole sizeof is part of the compilation, after pre-processing is done. So the string literal will be concatenated and null terminated before that, even if the operand of sizeof is not evaluated.Wormeaten
@Wormeaten - Where, in any C Standard, is the operation order of "Phase 7" defined? All that I can find says that the addition of the terminating nul is done in Phase 7, as is the evaluation of any sizeof operator. Where is the order of determination specified?Gillan
@AdrianMole Who says that the evaluation of the sizeof operator is done in translation phase 7?Wormeaten
@Wormeaten It must be in TP 7. Phases 1 thru 5 are essentially lexical pre-processing, TP 6 is just concatenation of adjacent string literals and TP 8 is external linkage.Gillan
@AdrianMole: The NUL processing has to be done before sizeof processing, because up to that point the string literal is just a character sequence and a "sequence" isn't a meaningful operand for sizeof. After NUL processing, there's an array, which is a valid operand for sizeof.Enthrall
Would it be accounted for if you manually add "\0" to the string?Antic
43

This is all 100% standardized. C17 6.10.8.1:

__DATE__ The date of translation of the preprocessing translation unit: a character string literal of the form "Mmm dd yyyy" ... and the first character of dd is a space character if the value is less than 10.
...
__TIME__ The time of translation of the preprocessing translation unit: a character string literal of the form "hh:mm:ss"

  • "Mmm dd yyyy" = 11
  • "hh:mm:ss" = 8
  • " " (the space you used for string literal concatenation) = 1
  • Null termination = 1

11 + 8 + 1 + 1 = 21
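
The arithmetic above can be checked directly. This is only a minimal sketch: the macro name BUILD_STAMP and the three helper functions are made up for illustration, and the sizes in the comments are what a conforming compiler should give.

```c
#include <stddef.h>

/* Hypothetical macro, built the same way as the one in the question. */
#define BUILD_STAMP __DATE__ " " __TIME__

/* "Mmm dd yyyy" is 11 characters, plus the terminating null byte. */
size_t date_size(void) { return sizeof(__DATE__); }

/* "hh:mm:ss" is 8 characters, plus the terminating null byte. */
size_t time_size(void) { return sizeof(__TIME__); }

/* 11 + 1 + 8 characters after concatenation, plus ONE terminating null byte. */
size_t stamp_size(void) { return sizeof(BUILD_STAMP); }
```

Note that the concatenated literal gets a single terminator, which is why the total is 21 and not 12 + 2 + 9.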

As for sizeof, a string literal is an array. Whenever you pass a declared array to sizeof, the array does not "decay" into a pointer to the first element, so sizeof will report the size of the array in bytes. In case of string literals, this includes the null termination, C17 6.4.5:

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.

(Translation phase 6 is also mentioned, which is the string literal concatenation phase. I.e., string literal concatenation is guaranteed to happen before null termination is added.)
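
To illustrate the difference between the array and a pointer to it, here is a small sketch (the function names are invented for the example). On a conforming compiler, the first function returns 4, the second returns the size of a pointer on the target, and the third returns 5.

```c
#include <stddef.h>

/* The literal is an array of 4 char: 'a', 'b', 'c' and the null terminator.
   As the operand of sizeof it does not decay to a pointer. */
size_t literal_size(void) { return sizeof("abc"); }

/* Here the array has already decayed in the initialization,
   so sizeof sees only a pointer object. */
size_t pointer_size(void)
{
    const char *p = "abc";
    return sizeof(p);
}

/* Adjacent literals are joined in phase 6, and only then is one
   null terminator appended in phase 7: 4 characters + 1. */
size_t concatenated_size(void) { return sizeof("ab" "cd"); }
```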

So it would appear that mikroC PRO is non-conforming/bugged. There are lots of questionable embedded systems compilers out there, for sure.

Wormeaten answered 15/1 at 14:15 Comment(3)
This answer and the one by dbush are good. I should have put the content of the string in the question, sorry. Thank you for your time.Acanthopterygian
@AdrianMole C17 6.5.3.4p2 (sizeof), 6.5.1p4 (string lit expr), 6.4.5p6 (string lit lex, excerpted in the answer). The literal is not "converted" to an array, rather it is an array from the first point it has any semantic meaning, just like 3 is an int, rather than being converted to one.Patty
@AdrianMole Expanded the quote from 6.4.5 with one sentence more... Hint: if you are looking for quotes from the standard, start with the quoted chapters given and read the context there.Wormeaten
16

Is this one of the 'it depends' things in C or is there a violation of the sizeof operator behavior?

The behavior is fully defined in the C Standard. Below are the relevant quotes from the C99 published standard, which were identical except for the section numbers in the C90 (ANSI C) version and have not been modified in essence in more recent versions, up to and including the upcoming C23 version:

The __DATE__ and __TIME__ macros are specified by

6.10.8 Mandatory macros

__DATE__ The date of translation of the preprocessing translation unit: a character string literal of the form "Mmm dd yyyy", where the names of the months are the same as those generated by the asctime function, and the first character of dd is a space character if the value is less than 10. If the date of translation is not available, an implementation-defined valid date shall be supplied.
__TIME__ The time of translation of the preprocessing translation unit: a character string literal of the form "hh:mm:ss" as in the time generated by the asctime function. If the time of translation is not available, an implementation-defined valid time shall be supplied.

From the above, if the time of translation is available, the macro SAVEFILECHECK_COMPILE_DATE expands to 3 string literals for a total of 11+1+8 = 20 characters, hence 21 bytes including the null terminator. If the time of translation is not available, implementation-defined valid dates and times must be used, hence the behavior must be the same.

5.1.1.2 Translation phases

  6. Adjacent string literal tokens are concatenated.
  7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

Hence the fact that the argument to sizeof is made of 3 adjacent string literals is irrelevant: all occurrences of the sizeof operator in your examples get a single string literal argument in phase 7. Then:

6.5.3.4 The sizeof operator

4  When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. When applied to an operand that has array type, the result is the total number of bytes in the array.

Therefore all 3 outputs in your example must show 21 bytes. You have found a bug in the mikroC compiler: you should report it and find a workaround for your current projects.
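
One possible workaround, sketched here under the assumption (taken from the strB line in the question) that mikroC reports the right size for an array initialized from the literal: size every buffer from that array instead of from the literal itself. The names build_stamp, saved_stamp, stamp_matches and stamp_store are hypothetical.

```c
#include <string.h>

#define SAVEFILECHECK_COMPILE_DATE __DATE__ " " __TIME__

/* Array initialized from the literal; the question reports 21 bytes
   for this form on both compilers. */
static const char build_stamp[] = SAVEFILECHECK_COMPILE_DATE;

/* Hypothetical storage for the stamp loaded from the saved settings,
   sized from the array rather than from the literal. */
char saved_stamp[sizeof(build_stamp)];

/* Hypothetical helper: nonzero if the saved stamp belongs to this build. */
int stamp_matches(void)
{
    return memcmp(saved_stamp, build_stamp, sizeof(saved_stamp)) == 0;
}

/* Hypothetical helper: record the stamp of this build. */
void stamp_store(void)
{
    memcpy(saved_stamp, build_stamp, sizeof(saved_stamp));
}
```

Because both memcmp and memcpy take their length from the destination array, source and destination cannot disagree about the size.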

Underpinnings answered 15/1 at 14:43 Comment(8)
Workaround in my case was to only use the sizeof(destination) in memcpy and memcmp. The terminator is simply not stored and not needed.Acanthopterygian
@AdrianMole: 6.4.5 string literals In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence. This array is the argument of sizeof and the terminating null byte is part of the array, thus is counted.Underpinnings
@Underpinnings Yeah, I know that one. But where, in the Standard, does it say when the sizeof operator is evaluated? Other than for VLAs, it's a compile-time evaluation.Gillan
@AdrianMole: evaluation of compile time constants is performed by the compiler during or after expression parsing in phase 7. Evaluating the expression sizeof(SAVEFILECHECK_COMPILE_DATE) requires the argument to be determined before the sizeof operator can be applied, so this array including its null terminator must be constructed (even virtually) before sizeof can be evaluated.Underpinnings
I tend to agree with that. But the Standard doesn't specify the order of "token processing" (i.e., the "real meat" of compilation) versus the conversion of string literals into static arrays. I'm not trying to be awkward here ... just being pedantic, as the language-lawyer tag requires. I'm gonna sleep on this and then, maybe I'll post an answer.Gillan
Ultimately, I think it's a (minor) defect in the C Standard. It would be so much easier if the addition of the null terminator were placed in Translation Phase 6 (along with concatenation of adjacent literals). But it ain't!Gillan
@AdrianMole: I agree the order of operations in phase 7 is not precise enough.Underpinnings
@chqrlie: It's precise enough, although it could be presented more clearly. sizeof needs an array, not a "character sequence". The only array that ever exists for a string literal already includes the NUL.Enthrall
11

As others have noted, the behavior of sizeof on a string literal has long been standardized as yielding a value one larger than the length of the string it represents, rather than the size of the smallest character array that could be initialized using that string literal. That said, if one wishes to make code compatible even with compilers that adopt the latter interpretation, I'd suggest using an expression like (1-(sizeof "")+(sizeof "stringLiteral of interest")), which would allow code to operate correctly with the quirky compilers without sacrificing compatibility with standard ones.
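
Wrapped in a macro, the expression might look like the sketch below. STRLIT_SIZE, greeting and greeting_size are invented names, and the behavior on the quirky compiler is inferred from the question, not tested.

```c
#include <stddef.h>

/* Hypothetical macro. On a conforming compiler sizeof("") is 1, so the
   correction term (1 - sizeof("")) is 0. On a compiler that leaves out
   the terminator, sizeof("") is 0 and the correction term adds the
   missing byte back. */
#define STRLIT_SIZE(lit) (1 - sizeof("") + sizeof(lit))

/* The result is still an integer constant expression,
   so it can size an array at file scope. */
char greeting[STRLIT_SIZE("hello")] = "hello";

size_t greeting_size(void) { return sizeof(greeting); }
```

The subtraction is written first so that the unsigned arithmetic never goes below zero.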

Folkway answered 15/1 at 21:56 Comment(8)
That's clever as heck, and ugly as sin. In this case "quirky"=="buggy" and I'm not sure it's worth doing a workaround for it.Poltroonery
@MarkRansom: If one has a project that was written for a quirky compiler, and which has been performing a useful task for years, keeping code compatible with the old compiler during migration to newer tools may make it easier to ensure that such migration doesn't have unintended effects.Folkway
Hmm. the behavior of sizeof on a string literal has long been standardized Citation required - this question has the language-lawyer tag.Gillan
@AdrianMole yes the question has the language-lawyer tag, but that was added later by someone other than the OP. I agree a citation would be nice, but that has been the behavior for as long as I've been programming C and C++.Poltroonery
@AdrianMole: Lundin's answer cites the relevant text about a string literal being converted to a character array. My point was that although one isn't likely to encounter a compiler that processes strings in any way other than what the Standard indicates if one hasn't already done so, if one does encounter a compiler that does so, then such a compiler must obviously exist; even if one needs to do something unusual to make code compatible with such a compiler, that doesn't imply that the code can't also be compatible with less unusual ones.Folkway
@MarkRansom Yeah, but still ... the fact that (almost) every compiler does it this way (and the sensible way, clearly), doesn't mean that a compiler that doesn't do it this way is wrong.Gillan
@AdrianMole: While it might possibly be useful for a compiler to include build options for compatibility with code that relies upon the non-standard behavior, it's been decades since such behavior would be considered anything other than "wrong" in any other context.Folkway
@AdrianMole I just dug out my old copy of the Annotated ANSI C Standard from 1990. 6.1.4 states "In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals." The result of sizeof isn't explicitly called out, but it seems clear to me that it should include the zero terminator.Poltroonery
5

This is a compiler bug. String literals, whether they consist of a single quoted sequence or multiple adjacent quoted sequences, are stored as static arrays which always contain a terminating null byte. That's not happening here, where it should.

This is specified in section 6.4.5p6 of the C standard regarding string literals:

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence.

This means that sizeof(SAVEFILECHECK_COMPILE_DATE) should count both the characters in the string and the terminating null byte, but the compiler for some reason isn't including the null byte.
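
A compiler with this defect can be caught at build time with the old negative-array-size trick, which also works on pre-C11 compilers that lack _Static_assert. This is a sketch only; the typedef and function names are made up.

```c
/* Hypothetical compile-time guard: if sizeof("") is not 1, the array
   size becomes -1 and the build stops with an error right here. */
typedef char assert_sizeof_counts_terminator[(sizeof("") == 1) ? 1 : -1];

/* Hypothetical run-time check of the same property, e.g. for a self-test:
   returns 1 when the terminator is counted, 0 otherwise. */
int sizeof_counts_terminator(void)
{
    return sizeof("x") == 2;
}
```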

Bunnie answered 15/1 at 14:18 Comment(0)
-4
#include <stdio.h>

int main(){
    printf("%zu\n", sizeof("aa"));
}

Interestingly, in this case "aa" does not decay to a pointer but acts as a char array. Since the array has 3 elements (including the zero terminator), the output is 3.

The following macro defines a string (an array of char):

#include <stdio.h>

#define SAVEFILECHECK_COMPILE_DATE __DATE__ " " __TIME__

int main(){
    printf("%zu\n", sizeof(SAVEFILECHECK_COMPILE_DATE));
}

The string is different every time you compile, because of __DATE__ and __TIME__.

My current result is 21, but it may change.

Same is valid for C++.

Canonist answered 15/1 at 14:15 Comment(4)
"My current result is 21, but it may change." No it may not.Wormeaten
It depends on the compiler and the date/time formats.Canonist
No it does not, given that we speak of the C language. You either have a conforming C compiler or you have something else.Wormeaten
I used sizeof() so it would not matter if the length of the string changes and no wild numbers appear in the code.Acanthopterygian
