Unexpected printf output [duplicate]
I just discovered very weird behavior from the C compiler. It's very simple code. I tried it in many online C compilers, but the result is always the same, which is driving me insane.

#include <stdio.h>

int main()
{
    char Buffer[10] = "0123456789";
    char ID[5] = "abcde";
    printf("%s",ID);

    return 0;
}

Take your time and try to predict the output of the printf call. If you're a human like me, then I think the most obvious answer is "abcde", which is not correct! But if you somehow figured out that it is "abcde0123456789", then you're consuming electricity to live.

How, just how, is that possible? I'm only selecting the ID array to be printed, so WHY is the Buffer one printed with it too? It doesn't make sense. Even the ID array isn't big enough to fit all that data. I'm really losing my mind here.

Jehoash answered 4/9, 2022 at 23:3 Comment(8)
You always need enough space for the string terminator. Neither of your arrays is large enough to store all the characters you want and also the string terminator. You need sizes of 11 and 6 respectively.Plaque
The order in which the arrays are stored in memory is indeterminate. Without a null terminator, there is nothing stopping printf from continuing into the bytes in memory past the abcde array, which in this case is the 0123456789 array.Bedizen
Yeah, but if nothing stops the printf function from continuing, then why doesn't it print the entire system memory? xDJehoash
@Jehoash : "why it doesn't print the entire system memory" -- because printf will likely encounter a null byte (\0) sooner or later after walking off the end of your array.Twum
@absurdum probably but it should at least print a random length of a string until it encounters the null terminator not ALWAYS just after the Buffer array that I declared.Jehoash
@Jehoash It's very likely that Buffer is at the top of the local stack, and behind that, there's the return address from the call to your main function. It's also likely that this return address, as a number, is small enough to contain at least one zero in its high byte. Which acts as a terminator for printf (depending on endianness, this zero might be in memory before or after the other bytes). Then, there may be alignment issues, and there may be a "stack canary" between return address and variables, and none of this is guaranteed - but it's quite likely to find \0 somewhere.Aleasealeatory
I just discovered very weird behavior from the C compiler. Well, no. You have discovered a superficially weird — but actually perfectly normal — behavior of your undefined program. printf deals in properly-formed strings. You went out of your way to construct and give it something that wasn't a properly-formed string. Weird behavior is to be expected, and there's nothing too surprising about the particular weird behavior you saw.Pentagram
@Jehoash but it should at least... not ALWAYS just after... This is what I call the "confounding expectations" rule of undefined behavior. We sometimes casually say that "undefined behavior is random", but it's not really random. Undefined behavior can do anything. If you expected it to do the same thing every time, that's wrong, because it might do a different thing every time. But if you expected it to do something different every time, that's also wrong, because it might just happen to do the same thing every time.Pentagram
A
15

The format specification %s expects a pointer to a string: a sequence of characters terminated by the null character '\0'.

However, both arrays

char Buffer[10] = "0123456789";
char ID[5] = "abcde";

do not contain strings. So the call of printf invokes undefined behavior.

You should write

char Buffer[] = "0123456789";
char ID[] = "abcde";

or

char Buffer[11] = "0123456789";
char ID[6] = "abcde";

Note that string literals are stored as character arrays with an additional terminating null character '\0'.

For example this declaration

char ID[] = "abcde";

in fact is equivalent to

char ID[] = { 'a', 'b', 'c', 'd', 'e', '\0' };

and this declaration

char ID[5] = "abcde";

is equivalent to

char ID[5] = { 'a', 'b', 'c', 'd', 'e' };

That is, in the last case the null character '\0' is not used as an initializer of the array ID.
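
To make the difference between the two declarations checkable, here is a minimal sketch. The helper name contains_string is made up for this illustration; it only relies on the standard memchr, which never reads more than the given number of bytes and is therefore safe on an unterminated array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a '\0' occurs within the first
   `size` bytes of `buf`, i.e. if the array really holds a C string,
   and 0 otherwise. memchr examines at most `size` bytes, so the
   search stays inside the array. */
int contains_string(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

Called as contains_string(ID, sizeof ID), this should yield 0 for char ID[5] = "abcde"; and 1 for char ID[] = "abcde";.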

If you want to output a character array that does not contain a string, you can use the precision field, for example

printf( "%.5s\n", ID );

or

printf( "%.*s\n", 5, ID );

or

printf( "%.*s\n", ( int )sizeof( ID ), ID );
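
The same precision trick works with snprintf when you want the text in a buffer rather than on stdout. The following is only a sketch; the function name format_bytes and its parameters are assumptions made for the example, not part of any library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the first `count` bytes of `data` into
   `out` using "%.*s". The precision caps how many bytes are read, so
   `data` does not need a terminating '\0'. Returns snprintf's result:
   the number of characters the full output needs, or a negative value
   on error. */
int format_bytes(char *out, size_t out_size, const char *data, size_t count)
{
    return snprintf(out, out_size, "%.*s", (int)count, data);
}
```

With char ID[5] = "abcde"; and a sufficiently large out buffer, format_bytes(out, sizeof out, ID, sizeof ID) leaves a properly terminated "abcde" in out.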

Also bear in mind that, unlike in C, in C++ a declaration such as

char ID[5] = "abcde";

is invalid. In C++ you may not ignore the terminating zero character '\0' of a string literal used as an initializer. Otherwise the number of initializers will exceed the number of initialized array elements.

Alden answered 4/9, 2022 at 23:10 Comment(3)
Ok I only have to add an extra byte or char (I really don't know the difference so far) so that the compiler or myself can put a null terminator.Jehoash
It may be worth noting here that if the strings do not need to be modified, char *ID = "abcde"; or const char *ID = "abcde"; would also serve the needed purpose.Shirl
@Jehoash Better yet, don't "add an extra byte" at all. Just say char ID[] = "abcde";, and let the computer do the counting. Counting characters is the kind of nuisancey task that computers are great at.Pentagram
S
9

The behavior of printf is undefined because ID is not a string, that is, a null-terminated char array. Given the %s format specifier, printf relies on the null terminator to know where to stop printing. Since there is no null byte in ID, printf overruns the array into adjacent memory looking for one and prints whatever it finds there. It just so happens that the other char array, Buffer, occupies that region, so that is what gets printed. It could be something else entirely, including the expected result if by chance the first byte after ID happened to be a null byte. Note the definition of undefined behavior:

Behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements.

Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment [...]


The majority of the compilers and respective versions I tested indeed behave as you describe and print both arrays in sequence, not all of them though. It's not a pattern you can rely on, as you can see here:

https://godbolt.org/z/1E396Y3KG (gcc with optimization)

Or here:

https://godbolt.org/z/roa6GxWvr (msvc)

The result is not always abcde0123456789.


As for the reason ID has no null terminator ('\0'): there is not enough room for it. If you declare the array with one extra element, the compiler adds the terminator automatically:

char ID[6] = "abcde"; //will automatically append \0 to the char array
        ^

Omitting the size is actually better practice: the compiler deduces the needed size without you having to count the characters, which makes it less error-prone:

char ID[] = "abcde";
Strongminded answered 4/9, 2022 at 23:7 Comment(2)
Ok I understand, I have to add 1 to the actual length of the array and \0 to the end of the string, but in my case the result is the same in every machine, which means the result is predictable not random.Jehoash
@Jehoash The statement "the result is the same in every machine" is not accurate, as I've shown you in the linked code.Strongminded
P
4

If you are going to go to the effort of counting the number of characters in strings, a better notation would show the world that you are aware of the required '\0':

#define NUMBER_OF_DIGITS 10

char Buffer[ NUMBER_OF_DIGITS + 1 ] = "0123456789";
char ID[ 5 + 1 ] = "abcde";

Be aware:

sizeof ID  !=  strlen( ID );
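
To put numbers on that inequality, here is a small sketch. The function names id_size and id_length are invented for the example, and the array is declared with the size omitted so the compiler does the counting.

```c
#include <stddef.h>
#include <string.h>

/* Size omitted: the compiler counts five characters plus the '\0'. */
static const char ID[] = "abcde";

/* Bytes of storage the array occupies, terminator included. */
size_t id_size(void)
{
    return sizeof ID;
}

/* Number of characters before the terminator. */
size_t id_length(void)
{
    return strlen(ID);
}
```

Here id_size() returns 6 while id_length() returns 5: sizeof counts the terminator, strlen does not.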
Pediform answered 5/9, 2022 at 0:38 Comment(0)
L
3

The printf() format %s assumes a NUL-terminated string. But you declared ID[5] with 5 printable characters, and since the size you specified leaves no room for it, the array has no NUL byte at the end. This caused printf() to overrun the allocated space of ID, and by just dumb luck it ran into the storage of Buffer. Don't do that. That invoked the demons of undefined behavior and you got "lucky" that the result was only unexpected output. Incidentally, Buffer too is initialized without a terminating NUL byte, so your printed string was terminated by whatever happened to follow Buffer in memory. Both arrays are local variables, so that is most likely something else in main's stack frame rather than anything in the data segment.

Lardy answered 4/9, 2022 at 23:10 Comment(2)
I would have liked to get an undefined behavior, but the result in every compiler I tried was the same. Anyway, I got it now xD and thank you for your help.Jehoash
@Jehoash you did get undefined behaviour. "Undefined behaviour" can appear to look like the results you were expecting.Nymph
S
3

Other answers are flawless and explain everything perfectly, but I'd like to show you a more practical example since you can have so much fun by playing with C. Have a look at this:

Memory address array1: 0x7ffd58160dea
Memory address array2: 0x7ffd58160de5

Content of array1: 0123456789�
Content of array2: abcde0123456789�

Memory address] Memory content:
0x7ffd58160de5] a
0x7ffd58160de6] b
0x7ffd58160de7] c
0x7ffd58160de8] d
0x7ffd58160de9] e
0x7ffd58160dea] 0
0x7ffd58160deb] 1
0x7ffd58160dec] 2
0x7ffd58160ded] 3
0x7ffd58160dee] 4
0x7ffd58160def] 5
0x7ffd58160df0] 6
0x7ffd58160df1] 7
0x7ffd58160df2] 8
0x7ffd58160df3] 9
0x7ffd58160df4] 
0x7ffd58160df5] 

Let's consider 2 char arrays array1 and array2 with different lengths and without the terminator character '\0'.

The following code finds the lower of the two arrays' addresses ((char*)&array1 < (char*)&array2), saves it in startingPtr, then prints the next 100 chars (bytes) of memory starting from startingPtr, showing both the address and the content:

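#include <stdio.h>

int main()
{
    char array1[10] = "0123456789";   // exactly 10 chars: no room for '\0'
    char array2[5] = "abcde";         // exactly 5 chars: no room for '\0'
    char* startingPtr;

    // %p expects a void* argument, hence the casts
    printf("Memory address array1: %p\nMemory address array2: %p\n", (void*)&array1, (void*)&array2);

    // Deliberate undefined behavior: neither array is null-terminated
    printf("\nContent of array1: %s\nContent of array2: %s\n", array1, array2);

    // Get which one has the lower address
    // (comparing pointers to different objects is itself not strictly portable)
    if ((char*)&array1 < (char*)&array2)
        startingPtr = (char*)&array1;
    else startingPtr = (char*)&array2;

    // Print memory content, starting from the memory address of the first array
    // Deliberate undefined behavior: this reads past the end of both arrays
    printf("\nMemory address] Memory content:\n");
    for (int i = 0; i < 100; i++)
    {
        printf("%p] %c\n", (void*)(startingPtr + i), *(startingPtr + i));
    }

    return 0;
}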

Check the output there, with different compilers:

As you can notice, the output can be different for a bunch of reasons (depending on the compiler, the machine, virtual memory, etc.).

But the reason you can sometimes see the content of both arrays is that the compiler may place the two variables next to each other, at contiguous memory addresses. Therefore printf with %s, which expects a properly formed "string" (i.e. a char buffer with the terminator character at the end), behaves as if your buffer were longer than 10 or 5 characters, and also prints the characters that follow.

However, that's definitely not reliable, since it's undefined behaviour.
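
If you want to look at neighbouring bytes without relying on undefined behaviour, one option is to put both arrays into a single struct and dump that object's bytes. This is only a sketch under assumptions: struct Record and dump_record are names invented for this example, and the standard allows padding between members, although common compilers insert none between char arrays.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical layout: both arrays are members of one object, so every
   byte from the start of `id` to the end of `buffer` belongs to it. */
struct Record {
    char id[5];
    char buffer[10];
};

/* Copies the object representation of `rec` into `out` as text,
   replacing non-printable bytes (such as any padding) with '.'.
   `out` must have room for sizeof(struct Record) + 1 bytes. */
void dump_record(const struct Record *rec, char *out)
{
    const unsigned char *bytes = (const unsigned char *)rec;
    size_t i;

    for (i = 0; i < sizeof *rec; i++)
        out[i] = isprint(bytes[i]) ? (char)bytes[i] : '.';
    out[sizeof *rec] = '\0';
}
```

Reading an object through an unsigned char pointer is well-defined, so the dump stays inside the rules even though neither member is null-terminated. When the members are filled with "abcde" and "0123456789", the dump starts with abcde and contains 0123456789.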


NB: the notation *(array + index) is one of the many ways you can access array elements. Since an array used in an expression is converted to a pointer to its first element, it means "get the value stored at memory address array + index" and is equivalent to array[index].
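
As a tiny illustration of that equivalence, here is a sketch with a made-up helper name, element_at, that reads an element using explicit pointer arithmetic only.

```c
#include <stddef.h>

/* Hypothetical helper: reads the element at `index` using explicit
   pointer arithmetic. The caller must keep `index` within the
   array's bounds. */
char element_at(const char *array, size_t index)
{
    return *(array + index);
}
```

For char ID[] = "abcde";, element_at(ID, 2) and ID[2] both give 'c'.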

Sculptress answered 5/9, 2022 at 0:11 Comment(0)
B
2

In C, a string must end with a null terminator, so storing a string requires one more character than its visible length. Your string "abcde" therefore requires 6 characters: 5 + 1 extra. So, the following will work:

char Buffer[11] = "0123456789";
char ID[6] = "abcde";

The compiler will add '\0' automatically, since it is part of every double-quoted string literal.

Bedlamite answered 4/9, 2022 at 23:10 Comment(0)
