Is a char array more efficient than a char pointer in C?

I'm trying to understand the under-the-hood difference between these two char declarations:

char* str1;
char str2[10];

I understand that using char* gives a pointer to the first character in str1, while char[10] results in an array with a length of 10 bytes (because char is a single byte... I recall that in some encodings it can be more but let's just assume one byte to keep it simple).

My question is when actually assigning values to them the data obviously has to be stored somewhere and in the case of char[10] we're telling the compiler upfront to allocate ten bytes, whereas in the case of char* we're really just saying allocate a pointer to a single byte. But what happens if the string we assign to str1 is more than a single byte, how is that allocated? How much more work is needed to appropriately allocate that? Plus, what happens if we want to reassign str1 to be something longer than what was previously assigned, how is that allocated?

Because of the uncertainty from the compiler's point of view when dealing with char pointers, is it more efficient to use a char array when I either know the length ahead of time or want to limit the length to start with?

Nimrod answered 29/9, 2021 at 12:47 Comment(10)
If you mean "What allocation happens in char *s = "Hello";", there is none. The string literal "Hello" is embedded into the executable, you just make s point to it. Generally if you want to see how this stuff works you should look at the generated assembly.Lombardo
Where did you get this syntax char[10] str2;?Seiter
Have a look at section 6 of the comp.lang.c FAQ.Chromatin
Large allocations may be done with malloc to avoid stack overflow. Small allocations may be done on the stack for better cache performance.Foliose
@haccks, I guess from C#?Phira
@Lombardo What if those declarations are part of a struct in a library that will be used later by some program?Nimrod
@Seiter It's from C, I had it wrong. Updated to correct syntaxNimrod
@Lombardo is it written in the standard how static string literals are stored?Krill
@Krill I'd at least say it's common practice for implementations to embed string literals in the executable, but you have a point, it's probably not written in the standard (I haven't checked but I'd imagine something like that wouldn't be forced or anything).Lombardo
@Nimrod Sorry, I'm having trouble understanding/visualizing what you're saying. Could you clarify?Lombardo
F
7

When discussing performance in general, allocation, access time and copy time are separate things. You seem mostly concerned about allocation.

But there are lots of misconceptions here. Arrays are used for storing. Pointers are used to point at things stored elsewhere. You cannot store any data in a pointer, you can only store an address to data allocated elsewhere.

So asking whether pointers or arrays are more efficient is pretty much nonsense, because they are separate things. It is similar to asking "should I live in my house at a street address or should I live in the sign stating my street address".

I understand that using char* gives a pointer to the first character in str1

No, it gives a pointer to a single character which is allocated somewhere else. Though it doesn't point anywhere meaningful until you assign an address to it. In case of arrays, it will typically get set to point at the first character of the array.

I recall that in some encodings it can be more

No, a character is by definition always 1 byte. Some exotic systems might have 16 bits per byte or such, though. This is of no concern unless you program exotic DSPs and the like. As for other character encodings, there's wchar_t, which is a different topic entirely.

whereas in the case of char* we're really just saying allocate a pointer to a single byte

No, we tell it to allocate room for the pointer itself, which is typically between 2 and 8 bytes in size depending on the address bus width of the specific system.

But what happens if the string we assign to str1 is more than a single byte, how is that allocated?

However you like. You can assign it to a read-only string literal, or a static storage duration variable, or a local automatic storage variable, or dynamically allocated variables. The pointer itself doesn't know or care.
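
As a rough illustration of that list, the sketch below points one pointer at four differently allocated strings in turn. The function name and the strings are made up for the example, and a const char * is used because one of the targets is a string literal, which must not be modified.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char static_buf[] = "static";      /* static storage duration */

/* Hypothetical demo: returns the summed lengths of the four strings. */
size_t point_at_four_kinds(void)
{
    char auto_buf[] = "automatic";         /* automatic storage (typically the stack) */
    char *heap_buf = malloc(8);            /* dynamically allocated */
    assert(heap_buf != NULL);              /* sketch: treat allocation failure as fatal */
    strcpy(heap_buf, "dynamic");           /* 7 characters + terminator = 8 bytes */

    const char *p;
    size_t total = 0;

    p = "literal";  total += strlen(p);    /* read-only string literal */
    p = static_buf; total += strlen(p);
    p = auto_buf;   total += strlen(p);
    p = heap_buf;   total += strlen(p);

    free(heap_buf);                        /* only the malloc'd block needs freeing */
    return total;
}
```

The pointer is assigned four times and nothing is allocated by those assignments; the only allocation work is whatever created the four targets.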

How much more work is needed to appropriately allocate that?

It depends on what you want to allocate.

Because of the uncertainty from the compiler's point of view when dealing with char pointers

What uncertainty is that? Pointers are pointers and the compiler doesn't treat them much differently than other variables.

is it more efficient to use a char array when I either know the length ahead of time or want to limit the length to start with?

You need to use an array, because data cannot be stored in thin air. Again, data cannot be stored "in pointers".

Fouquet answered 29/9, 2021 at 13:13 Comment(1)
Alright I think this answer helped me to understand where I was getting confused and why my question may not have made the most sense. ThanksNimrod
W
6

But what happens if the string we assign to str1 is more than a single byte, how is that allocated?

str1 ultimately has to point to another array of char - whether it's allocated automatically, such as

char buffer[10];
char *str1 = buffer; // equivalent to &buffer[0]

or dynamically:

char *str1 = malloc( sizeof *str1 * 10 );

or through some other method. All str1 stores is the address of a char object somewhere in memory. You're not actually saving anything to str1, you're saving it to whatever str1 points to. Assume the following declarations:

char *str;
char buffer[10];

We have something like this in memory:

      char *            char
      +---+             +---+
 str: | ? |     buffer: | ? | buffer[0]
      +---+             +---+
                        | ? | buffer[1]
                        +---+
                         ...
                        +---+
                        | ? | buffer[9]
                        +---+

First, we assign the address of the first element of buffer to str:

str = buffer;

Now our picture looks like this:

      char *            char
      +---+             +---+
 str: |   | --> buffer: | ? | buffer[0]
      +---+             +---+
                        | ? | buffer[1]
                        +---+
                         ...
                        +---+
                        | ? | buffer[9]
                        +---+

Now we can store a string in buffer using str:

strcpy( str, "foo" );

giving us

      char *            char
      +---+             +---+
 str: |   | --> buffer: |'f'| buffer[0]
      +---+             +---+
                        |'o'| buffer[1]
                        +---+
                        |'o'| buffer[2] 
                        +---+
                        | 0 | buffer[3]
                        +---+
                         ...
                        +---+
                        | ? | buffer[9]
                        +---+

"So," you're asking yourself, "why do we bother with the pointer? Why not just store the string to buffer directly? Wouldn't that be more efficient?"

Yes, normally we would just store to buffer directly and avoid the overhead of the pointer if that was an option. Sometimes, however, it isn't an option. We work through pointers in the following situations:

  • The array was allocated dynamically - in this case we have no option but to go through a pointer:
    char *str = malloc( sizeof *str * 10 );
    strcpy( str, "foo" );
    
  • The array was passed as an argument to a function - because of the decay rule, when you pass an array as a function argument what the function actually receives is a pointer to the first element (this is true of all array types, not just character arrays):
    void foo( char *str, size_t max_size )
    {
      strncpy( str, "this is a test", max_size );
      str[max_size-1] = 0;
    }
    
  • We're using a pointer to iterate through an array of char [] or char *:
    char table[][10] = { "foo", "bar", "bletch", "blurga", "" };
    ...
    char (*p)[10] = table;   // pointer to a 10-char row, so p++ advances one whole row
    while ( strlen( *p ) )
      printf( "%s\n", *p++ );
    ...
    
    Of course, we could just use array notation and not bother with the pointer at all:
    size_t i = 0;
    while ( strlen( table[i] ) )
      printf( "%s\n", table[i++] );
    
    Sometimes using array notation makes more sense, sometimes using a pointer makes more sense - depends on the problem at hand.

  1. Unless it is the operand of the sizeof or unary & operator, or it is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of `T`" will be converted, or "decay", to an expression of type "pointer to `T`" and the value of the expression will be the address of the first element of the array.
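
To make the function-argument case concrete, here is a small sketch in which the same function fills both a local array and a malloc'd block. The names copy_bounded and fill_array_and_heap are hypothetical, modeled on foo above; because of decay the function receives only a pointer, which is why the size has to be passed separately.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, truncating to fit max_size bytes. */
void copy_bounded(char *dst, const char *src, size_t max_size)
{
    assert(dst != NULL && max_size > 0);
    strncpy(dst, src, max_size);
    dst[max_size - 1] = '\0';   /* strncpy does not terminate on truncation */
}

/* Returns 1 if both the array and the heap block received the same text. */
int fill_array_and_heap(void)
{
    char buffer[10];                      /* array: decays to char * at the call */
    char *heap = malloc(10);              /* already a pointer */
    assert(heap != NULL);                 /* sketch: treat allocation failure as fatal */

    copy_bounded(buffer, "this is a test", sizeof buffer);
    copy_bounded(heap, "this is a test", 10);

    int same = strcmp(buffer, heap) == 0 && strcmp(buffer, "this is a") == 0;
    free(heap);
    return same;
}
```

Note that sizeof buffer works at the call site because buffer is still an array there; inside copy_bounded, sizeof dst would only give the size of a pointer.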
Wildeyed answered 29/9, 2021 at 13:56 Comment(2)
Good. Clean and detailed explanation.Seiter
Thank you for such a detail explanation, I finally understood all the implications after learning c++ and scouring through the internet for 2 weeks.Barbusse
S
3

char* str1; declares str1 as a pointer to a char data type. It doesn't allocate memory for a byte. But the compiler allocates sizeof(char*) bytes for this variable.

str1 can be used to point to any char object. For example, the first character of a string literal or of a char array terminated with \0.

I don't know what you mean by Is a char array more efficient than a char pointer. Both data types are different and have different use cases. Asking this question sounds like asking Is an int type more efficient than a double type? It doesn't make any sense.

On the other hand, char[10] str2; is not valid C syntax. I guess you mean char str2[10]; and this declares str2 as an array of 10 char. This variable can store 10 char values.

str1 and str2 are two different data types.

Seiter answered 29/9, 2021 at 12:56 Comment(0)
H
2

Ok, let's go through your question piece by piece.

char* str1; //this is a pointer
char str2[10]; //that is an array of 10 characters
char[10] str2; //that is compilation error
               //possibly you mistook C for Java

I understand that using char* gives a pointer to the first character in str1

char* str1 is a pointer to a character. It can point to contiguous memory location, e.g. a C-style string, but doesn't have to. It might point to a single character as well.

while char[10] results in an array with a length of 10 bytes (because char is a single byte... I recall that in some encodings it can be more but let's just assume one byte to keep it simple).

Yes, that is correct. Depending on where it is defined, the array can be located on the stack, or in the data segment (if it's a global); that's an implementation detail though.

My question is when actually assigning values to them the data obviously has to be stored somewhere and in the case of char[10] we're telling the compiler upfront to allocate ten bytes (...)

That's generally right.

But what happens if the string we assign to str1 is more than a single byte, how is that allocated? How much more work is needed to appropriately allocate that? Plus, what happens if we want to reassign str1 to be something longer than what was previously assigned, how is that allocated?

It really depends on the case. Given the following case:

char s1[] = "foo";
char s2[] = "bar";
char* ptr;
ptr = &s1[1]; //points to first o
ptr = &s2[2]; //points to r

nothing is really allocated. Simply the contents of ptr changes, the same way an integer would. Note that it can be dereferenced/passed as a C-style string in this case.

However, in the following one, it cannot:

char c1 = 'a';
char c2 = 'b';
char* ptr;
ptr = &c1; //points to a
ptr = &c2; //points to b

Now, in case of immediate strings:

const char* s = "foo"; //should be const char* actually

the string is stored in the binary most likely as a global const and the s points to its start. A mental model for it might be similar to:

//globals
const char someCompilerGeneratedName[] = "foo";

//then the pointer:
const char* s = &someCompilerGeneratedName[0];

//Note that arrays decay to pointers, 
//i.e. array name denotes address of its 1st element
//the one below is equivalent:
const char* s = someCompilerGeneratedName;

Now, the pointer can also point to dynamically allocated memory. But it does not have to.

So the following code

char single = 'c';
char* c1 = malloc(10*sizeof(char));
char* c2;

c2 = c1;
c2 = &single;

is perfectly valid.

From performance standpoint: measure first. There is no easy answer here.

Now if you're asking about heap vs stack allocations, that's another story. But I'd say: measure first. Heap allocations are generally believed to be slower (often they are), but oftentimes their overhead is negligible anyway.

Also, keep in mind that

*(p+2) = //whatever else

is equivalent to:

p[2] = //whatever else

so sometimes it might be just the case of readability.

Humbug answered 29/9, 2021 at 13:18 Comment(2)
"Yes, that is correct. Depending on where it is defined...", (under the second quoted section) Do you mean to say that char[10] is right? (Above that you say it is a compiler error.)Valma
@Valma I assumed char[10] to be a type signature outside the code snippet.Humbug
S
2

Arrays used in expressions are implicitly converted (with rare exceptions) to pointers to their first elements. So for example if you write

char str2[10] = "Hello";
char* str1 = str2;

then these calls of puts

puts( str2 );
puts( str1 );

will be equally efficient, just as it is equally efficient to write

for ( size_t i = 0; str2[i] != '\0'; i++ )
{
    putchar( str2[i] );
}

and

for ( size_t i = 0; str1[i] != '\0'; i++ )
{
    putchar( str1[i] );
}

A difference can occur in these declarations

char str2[10] = "Hello";
char* str1 = "Hello";

In the first case the array str2 is initialized by a string literal and you may change the stored string as for example

str2[0] = 'h';

In the second case the pointer str1 points to a string literal that has static storage duration and may not be changed. So if you write

str1[0] = 'h';

then this statement will invoke undefined behavior.
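
A hedged sketch of the distinction: the array holds its own writable copy of the characters, while a pointer to a literal should be treated as read-only, which declaring it const char * lets the compiler enforce. The function name is illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo: returns 1 if the array copy was modified as expected. */
int modify_array_copy(void)
{
    char str2[10] = "Hello";        /* array initialized with a copy of the literal */
    const char *str1 = "Hello";     /* points at the literal itself; const blocks writes */

    str2[0] = 'h';                  /* fine: str2 is ordinary writable storage */
    /* str1[0] = 'h'; */            /* rejected by the compiler with const; UB without it */

    assert(strcmp(str2, "hello") == 0);   /* the array changed */
    assert(strcmp(str1, "Hello") == 0);   /* the literal did not */
    return str2[0] == 'h' && str1[0] == 'H';
}
```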

On the other hand, if you write the following function

char * f( void )
{
    char str2[10] = "Hello";
    return str2;
}

then the returned pointer will be invalid because the declared array will not be alive after exiting the function.

But this function

char * f( void )
{
    char* str1 = "Hello";
    return str1;
}

will be correct because the string literal having static storage duration will be alive after exiting the function.
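
If the function has to produce a string that is not a literal, one common pattern is to let the caller own the storage and pass in the buffer together with its size, so that nothing local is returned. The sketch below uses a hypothetical greet function to show the idea; it is one option among several (returning malloc'd memory is another).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a greeting into the caller's buffer and returns it. */
char *greet(char *out, size_t out_size, const char *name)
{
    assert(out != NULL && out_size > 0);
    /* snprintf writes at most out_size bytes and always terminates the result. */
    snprintf(out, out_size, "Hello, %s", name);
    return out;     /* still valid after return: the storage belongs to the caller */
}
```

A caller would write something like char buf[16]; puts( greet( buf, sizeof buf, "C" ) ); and the array buf stays alive for as long as the caller's scope does.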

Also if you declare

char* str1 = "Hello";

then the expression sizeof( str1 ) will yield the size of the pointer, which is typically either 4 or 8 depending on the system used.

But if you write

char str2[10] = "Hello";

then the expression sizeof( str2 ) will yield the size of the array that is equal to 10.

However, the function calls

strlen( str1 );

and

strlen( str2 );

are equally efficient, and both will return the value 5, which is the length of the string "Hello".
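
The sizeof and strlen results can be checked with a few assertions. This is only a sketch; the function name is made up, and the pointer size is compared against sizeof(char *) rather than a fixed number because it varies between systems.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demo: returns the size of the array for inspection. */
size_t size_vs_length(void)
{
    const char *str1 = "Hello";
    char str2[10] = "Hello";

    assert(sizeof str1 == sizeof(char *));   /* size of the pointer itself */
    assert(sizeof str2 == 10);               /* size of the whole array */
    assert(strlen(str1) == 5);               /* both count characters up to '\0' */
    assert(strlen(str2) == 5);

    return sizeof str2;
}
```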

Scratches answered 29/9, 2021 at 15:45 Comment(0)
S
1

There are enough answers explaining the miscellaneous aspects.

When C++ first came with the STL libraries, among others string, we had a speed and memory problem: a great many strings.

So I made my own implementation of string with both:

char* ptr_to_actual_content;
char small_content[16];

ptr_to_actual_content = size < sizeof(small_content) ? small_content : malloc(size);

Since no extra allocation now happened for small strings, the performance and speed gain was unbelievably huge. (By the way, NO memory leaks.)
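
A minimal sketch of that idea in plain C, not the actual implementation described above: the struct name small_str, the two functions, and the exact size check are assumptions for illustration, with the 16-byte inline buffer taken from the snippet.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *ptr;               /* points at small or at a malloc'd block */
    char small[16];          /* inline storage for short strings */
} small_str;

/* Initialize s with a copy of text; returns 0 on success, -1 on allocation failure. */
int small_str_init(small_str *s, const char *text)
{
    assert(s != NULL && text != NULL);
    size_t size = strlen(text) + 1;              /* include the terminator */
    s->ptr = size <= sizeof s->small ? s->small : malloc(size);
    if (s->ptr == NULL)
        return -1;
    memcpy(s->ptr, text, size);
    return 0;
}

/* Release heap storage, if any was used. */
void small_str_free(small_str *s)
{
    if (s->ptr != s->small)
        free(s->ptr);
    s->ptr = NULL;
}
```

One caveat of this layout: because ptr may point into the struct itself, a small_str cannot simply be copied by assignment; a copy would need its own init call.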

Shwa answered 29/9, 2021 at 16:4 Comment(0)
L
1

Is a char array more efficient than a char pointer in C?

This is really an impossible question to answer. There is no inherent efficiency of an array or a pointer. One or the other may have better or worse efficiency for a particular operation, under a particular compiler, for a particular processor architecture. But there are no absolute guarantees.

What's equally important is that one or the other may have better or worse functionality for the problem at hand. Or better or worse convenience for you, the programmer. These factors are hugely important, too, likely more important than raw efficiency.

My advice to you is that you really learn the different ways of using arrays and pointers in C, and — most importantly — really understand the concept of the "correspondence between arrays and pointers". Only after gaining that understanding, I think, will you be in a position to make any meaningful distinctions between which one might be "more efficient" for a particular problem. Realize, too, that the actual efficiency differences, if any, might be immeasurably slight.

In terms of raw allocation, it's always going to be faster to allocate an array (which mostly happens at compile time, and has almost no run-time overhead [footnote]) than to call malloc to dynamically allocate some memory to point to. (But, much of the time, there will be an overwhelming preference towards dynamic memory allocation anyway, because of the intolerable nuisance of fixed, compile-time limits.)

In terms of copying, there will not generally be any difference whatsoever between any of the calls

memcpy(a, p, n);
memcpy(p, a, n);
memcpy(a1, a2, n);
memcpy(p1, p2, n);

for arrays a and pointers p. On the other hand, the best way to speed up memory copying is to not copy memory around at all, so in one sense pointers can be hugely more efficient, in that they let you do things like

p = a;

and

p1 = p2;

instead (which you obviously can't do for arrays).
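
The difference between copying the data and copying only the address can be sketched as follows. The function and variable names are illustrative; the point is that the pointer refers to the original storage, while memcpy produces an independent copy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo: returns 1 if p aliases a while copy stays independent. */
int alias_versus_copy(void)
{
    char a[8] = "abc";
    char copy[8];
    char *p;

    memcpy(copy, a, sizeof a);   /* copies all 8 bytes */
    p = a;                       /* copies only an address; no characters move */

    a[0] = 'X';                  /* change the original */

    assert(p[0] == 'X');         /* p sees the change: it refers to a's storage */
    assert(copy[0] == 'a');      /* the copy does not */
    return p == a && copy[0] == 'a';
}
```

The cheapness of p = a comes with a trade-off: both names now refer to the same characters, so a change through one is visible through the other.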

Finally, there's the question of raw access. Down at the machine language level, there is going to be a significant difference between the instructions required to do

x = a[i];

versus

y = p[i];

But I can't tell you which will be faster, because it tends to vary from processor to processor and compiler to compiler.

People used to worry whether it was faster to iterate over an entire array using "array style":

for(i = 0; i < n; i++)
    sum += a[i];

or "pointer style":

for(p = a; p < &a[n]; p++)
    sum += *p;

Once upon a time, one of these was likely to be significantly more efficient — although, again, the answer depended on the processor architecture, and tended to vary over time. Today, I believe that optimizing compilers are smart enough to choose the most efficient machine code to emit regardless of which way you write the C code.
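
For anyone who wants to check this on their own compiler, the two styles can be written as a pair of functions and compared, for example by inspecting the generated assembly or by timing them. The function names are made up for this sketch, and no claim is made here about which one any particular compiler handles better.

```c
#include <assert.h>
#include <stddef.h>

/* "Array style": index from 0 to n-1. */
long sum_indexed(const int *a, size_t n)
{
    assert(a != NULL);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* "Pointer style": walk a pointer from the first element to one past the last. */
long sum_pointer(const int *a, size_t n)
{
    assert(a != NULL);
    long sum = 0;
    for (const int *p = a; p < a + n; p++)
        sum += *p;
    return sum;
}
```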

Finally, as is always the case when asking "Which is more efficient?", the only way to find an actual answer is to code it up, for your problem statement and using your compiler and your processor, and perform careful measurements. There are so many factors that go into the efficiency question that it's usually impossible to make accurate predictions.


Footnote: I said that array allocation "mostly happens at compile time, and has almost no run-time overhead". There's one exception, which concerns large arrays which are local to functions. Those can take significant time to "allocate" if the OS has to assign a bunch more stack space at the last minute, and they can also fail if they're too big. So large, local arrays are generally disrecommended.

Lauren answered 30/9, 2021 at 12:24 Comment(0)
