What happens when a char array gets initialized from a string literal?
D

3

5

As I understand it, the following code works like so:

char* cptr = "Hello World";

"Hello World" lives in the .rodata section of the program's memory. The string literal "Hello World" returns a pointer to the base address of the string, that is, the address of the first element in the so-called "array". Since the chars are laid out sequentially in memory, that element is the 'H'. This is my little diagram of how I visualize the string literal stored in memory:

0x4 : 'H'
0x5 : 'e'
0x6 : 'l'
0x7 : 'l'
0x8 : 'o'
0x9 : ' '
0xa : 'W'
0xb : 'o'
0xc : 'r'
0xd : 'l'
0xe : 'd'
0xf : '\0'

So the declaration above becomes:

char* cptr = 0x4;

Now cptr points to the string literal. I'm just making up the addresses.

0xa1 : 0x4

Now how does this code work?

char cString[] = "Hello World";

I am assuming that as in the previous situation "Hello World" also degrades to the address of 'H' and 0x4.

char cString[] = 0x4;

I am reading the = as an overloaded assignment operator when it is used to initialize a char array. As I understand it, only at initialization of a C-string, it copies char by char, starting at the given base address, into the C-string until it hits a '\0' as the last char copied. It also allocates enough memory for all the chars. Because overloaded operators are really just functions, I assume that its internal implementation is similar to strcpy().

I would like one of the more experienced C programmers to confirm my assumptions of how this code works. This is my visualization of the C-string after the chars from the string literal get copied into it:

0xb4 : 'H'
0xb5 : 'e'
0xb6 : 'l'
0xb7 : 'l'
0xb8 : 'o'
0xb9 : ' '
0xba : 'W'
0xbb : 'o'
0xbc : 'r'
0xbd : 'l'
0xbe : 'd'
0xbf : '\0'

Once again, the addresses are arbitrary, the point is that the C-string in the stack is distinct from the string literal in the .rodata section in memory.

What am I trying to do? I am trying to use a char pointer to temporarily hold the base address of the string literal, and use that same char pointer (base address of string literal) to initialize the C-string.

char* cptr = "Hello World";
char cString[] = cptr;

I assume that "Hello World" evaluates to its base address, 0x4. So this code ought to look like this:

char* cptr = 0x4;
char cString[] = 0x4;

I assume that it should be no different from char cString[] = "Hello World"; since "Hello World" evaluates to its base address, and that is what is stored in the char pointer!

However, gcc gives me an error:

error: invalid initializer
char cString[] = cptr;
                 ^
  1. How come you can't use a char pointer as a temporary placeholder to store the base address of a string literal?
  2. How does this code work? Are my assumptions correct?
  3. Does using a string literal in the code return the base address to the "array" where the chars are stored in the memory?
Donelladonelle answered 19/6, 2018 at 22:59 Comment(3)
Arrays are not pointers. There are many, many explanations of this here on SO, and elsewhere on the net. Yes, you can assign a pointer so it points at an array. No, you can't assign an array so that it points at anything (because it's not a pointer). The initialization char cString[] = "Hello World" does not involve any pointers -- the symbol cString ends up being the address of an array of 12 characters, initialized with the characters from "Hello World". - Ac
You write "so-called 'array'" as if there were something doubtful about it. There is not. The term "array" is well defined in C, and the standard explicitly specifies that part of the process of translating C source files into programs is to add a null byte to those of each string literal, and to use the resulting byte sequences to initialize static arrays just long enough to accommodate them. There's no "as if" or "like" -- string literals appearing in C source code correspond to bona fide arrays in programs. - Hahn
I do not understand "bona fide". My point is that the string literal appears to be an array stored in the .rodata section of the program. However, it is not an array as I, the programmer, would declare it explicitly in the code: char array[]; - Donelladonelle
O
6

Your understanding of memory layout is more or less correct. But the problem you are having is one of initialization semantics in C.

The = symbol in a declaration here is NOT the assignment operator. Instead, it is syntax that specifies the initializer for a variable being instantiated. In the general case, T x = y; is not the same as T x; x = y;.

There is a language rule that a character array can be initialized from a string literal. (The string literal is not "evaluated to its base address" in this context). There is not a language rule that an array can be initialized from a pointer to the elements intended to be copied into the array.

Why are the rules like this? "Historical reasons".

Olives answered 19/6, 2018 at 23:9 Comment(0)
C
3

I am assuming that as in the previous situation "Hello World" also degrades to the address of 'H' and 0x4.

Not really: cString[] gets a completely new address in memory. The compiler allocates 12 chars for it, and initializes them with the content of the "Hello World" string literal.

I assume that "Hello World" evaluates to its base address, 0x4. Does using a string literal in the code return the base address to the "array" where the chars are stored in the memory?

cString may be converted to char* later on, yielding its base address, but it remains an array in the regular contexts. In particular, if you invoke sizeof(cString) you would get the size of the array, not the size of the pointer.

How come you can't use a char pointer as a temporary placeholder to store the base address of a string literal?

You can. However, once a string literal is assigned to char *, it stops being a string literal, at least as far as the compiler is concerned. It becomes a char * pointer, no different from other char * pointers.

Note that modern C compilers combine identical string literals as an optimization, so if you write

#define HELLO_WORLD "Hello World"
...
char* cptr = HELLO_WORLD;
char cString[] = HELLO_WORLD;

and turn optimization on, the compiler would eliminate duplicate copies of the string literal.

Carse answered 19/6, 2018 at 23:10 Comment(0)
D
3

The second definition char cString[] = "Hello World"; is a shorthand for this equivalent definition:

char cString[12] = { 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0' };

If this definition occurs at global scope or with static storage duration, cString will be in the .data segment with the initial contents in the executable image. If it occurs in the scope of a function with automatic storage, the compiler will allocate automatic storage for the array (reserving space on the stack frame or equivalent) and generate code to perform the initialization at run-time.

Dacoity answered 19/6, 2018 at 23:23 Comment(2)
In the abstract machine, there is a string literal which is copied into cString. In terms of observable behaviour the two codes you give are the same. The options for implementing it could be described as an optimization issue. - Olives
@M.M: true, but "copying the string literal" is somewhat ambiguous: it does not precisely describe the semantics of another example. char cString[12] = { "Hello" }; is equivalent to char cString[12] = { 'H', 'e', 'l', 'l', 'o', '\0', '\0', '\0', '\0', '\0', '\0', '\0' };, not to char cString[12]; strcpy(cString, "Hello"); - Dacoity
