Asked 21/2, 2010 at 11:56 Answered 2/3, 2021 at 21:21

Solved c arrays pointers performance memory-access

Memory access through pointers is said to be more efficient than memory access through an array. I am learning C and the above is stated in K&R. Specifically they say

Any operation that can be achieved by array subscripting can also be done with pointers. The pointer version will in general be faster

I dis-assembled the following code using visual C++.(Mine is a 686 processor. I have disabled all optimizations.)

int a[10], *p = a, temp;

void foo()
{
    temp = a[0];
    temp = *p;
}

To my surprise I see that memory access through a pointer takes 3 instructions to the two taken by memory access through an array. Below is the corresponding code.

; 5    : temp = a[0];

    mov eax, DWORD PTR _a
    mov DWORD PTR _temp, eax

; 6    : temp = *p;

    mov eax, DWORD PTR _p
    mov ecx, DWORD PTR [eax]
    mov DWORD PTR _temp, ecx

Please help me understand. What am I missing here??

As pointed out by many answers and comments I had used a compile time constant as the array index thus making it arguably easier for the access through an array. Below is the assembly code with a variable as the index. I now have equal number of instructions for access through pointer and arrays. My broader questions still holds good. The memory access through a pointer is not lending itself as being more efficient.

; 7    :        temp = a[i];

    mov eax, DWORD PTR _i
    mov ecx, DWORD PTR _a[eax*4]
    mov DWORD PTR _temp, ecx

; 8    : 
; 9    :    
; 10   :        temp = *p;

    mov eax, DWORD PTR _p
    mov ecx, DWORD PTR [eax]
    mov DWORD PTR _temp, ecx

Christachristabel answered 21/2, 2010 at 11:56 Comment(5)

While i'm weak in asm, i think you should try "temp = a[k];" - where k is a variable not known at compile time. – Heilungkiang 21/2, 2010 at 12:1

@Drakosha, Thanks. I hadn't realised that. Doing as you suggested resulted in 3 instructions for memory access through arrays too(fetch index, fetch value of array element, store in temp). But I am still unable to see the efficiency. :-( – Christachristabel 21/2, 2010 at 12:57

Someone mentioned that K&R said this in the context of a long iteration. That is, in a[i++] vs *p++. I think that then is quite different to what your question asks. – Prescript 21/2, 2010 at 14:56

@Johannes, K&R hasn't really stated this in the context of an iteration. It has been mentioned in the introductory part of the sub-topic"5.3 Array and Pointers". However, the lack of that context is no inhibition to discuss from the long iteration angle. Also as you can see below, the discussion has also veered towards the said angle. – Christachristabel 22/2, 2010 at 8:14

what did you use btw to generate assembly here? – Foresaid 11/8, 2017 at 10:22

Memory access through pointers is said to be more efficient than memory access through an array.

That may have been true in the past when compilers were relatively stupid beasts. You only need to look at some of the code output by gcc in high optimisation modes to know that it is no longer true. Some of that code is very hard to understand but, once you do, its brilliance is evident.

A decent compiler will generate the same code for pointer accesses and array accesses and you should probably not be worrying about that level of performance. The people that write compilers know far more about their target architectures than we mere mortals. Concentrate more on the macro level when optimising your code (algorithm selection and so on) and trust in your tool-makers to do their job.

In fact, I'm surprised the compiler didn't optimise the entire

temp = a[0];

line out of existence since temp is over-written in the very next line with a different value and a is in no way marked volatile.

I remember an urban myth from long ago about a benchmark for the latest VAX Fortran compiler (showing my age here) that outperformed its competitors by several orders of magnitude.

Turns out the compiler figured out that the result from the benchmark calculation wasn't used anywhere so it optimised the entire calculation loop into oblivion. Hence the substantial improvement in run speed.

Update: The reason that optimised code is more efficient in your particular case is because of the way you find the location. a will be at a fixed location decided at link/load time and the reference to it will be fixed up at the same time. So a[0] or indeed a[any constant] will be at a fixed location.

And p itself will also be at a fixed location for the same reason. But *p (the contents of p) is variable and therefore will have an extra lookup involved to find the correct memory location.

You'll probably find that having yet another variable x set to 0 (not const) and using a[x] would also introduce extra calculations.

In one of your comments, you state:

Doing as you suggested resulted in 3 instructions for memory access through arrays too (fetch index, fetch value of array element, store in temp). But I am still unable to see the efficiency. :-(

My response to that is that you very likely won't see an efficiency in using pointers. Modern compilers are more than up to the task of figuring out that array operations and pointer operations can be turned into the same underlying machine code.

In fact, without optimisation turned on, pointer code can be less efficient. Consider the following translations:

int *pa, i, a[10];

for (i = 0; i < 10; i++)
    a[i] = 100;
/*
    movl    $0, -16(%ebp)              ; this is i, init to 0
L2:
    cmpl    $9, -16(%ebp)              ; from 0 to 9
    jg      L3
    movl    -16(%ebp), %eax            ; load i into register
    movl    $100, -72(%ebp,%eax,4)     ; store 100 based on array/i
    leal    -16(%ebp), %eax            ; get address of i
    incl    (%eax)                     ; increment
    jmp     L2                         ; and loop
L3:
*/

for (pa = a; pa < a + 10; pa++)
    *pa = 100;
/*
    leal    -72(%ebp), %eax
    movl    %eax, -12(%ebp)            ; this is pa, init to &a[0]
L5:
    leal    -72(%ebp), %eax
    addl    $40, %eax
    cmpl    -12(%ebp), %eax            ; is pa at &(a[10])
    jbe     L6                         ; yes, stop
    movl    -12(%ebp), %eax            ; get pa
    movl    $100, (%eax)               ; store 100
    leal    -12(%ebp), %eax            ; get pa
    addl    $4, (%eax)                 ; add 4 (sizeof int)
    jmp     L5                         ; loop around
L6:
*/

From that example, you can actually see that the pointer example is longer, and unnecessarily so. It loads pa into %eax multiple times without it changing and indeed alternates %eax between pa and &(a[10]). The default optimisation here is basically none at all.

When you switch up to optimisation level 2, the code you get is:

    xorl    %eax, %eax
L5:
    movl    $100, %edx
    movl    %edx, -56(%ebp,%eax,4)
    incl    %eax
    cmpl    $9, %eax
    jle     L5

for the array version, and:

    leal    -56(%ebp), %eax
    leal    -16(%ebp), %edx
    jmp     L14
L16:
    movl    $100, (%eax)
    addl    $4, %eax
L14:
    cmpl    %eax, %edx
    ja      L16

for the pointer version.

I'm not going to do an analysis on clock cycles here (since it's too much work and I'm basically lazy) but I will point out one thing. There's not a huge difference in the code for both versions in terms of assembler instructions and, given the speeds that modern CPUs actually run at, you won't notice a difference unless you're doing billions of these operations. I always tend to prefer writing code for readability and only worrying about performance if it becomes an issue.

As an aside, that statement you reference:

5.3 Pointers and Arrays: The pointer version will in general be faster but, at least to the uninitiated, somewhat harder to grasp immediately.

dates back to the earliest versions of K&R, including my ancient 1978 one where functions are still written:

getint(pn)
int *pn;
{
    ...
}

Compilers have come an awfully long way since back then.

Beneficial answered 21/2, 2010 at 12:1 Comment(8)

Not an urban myth, this happened to me: we had a test for platform ports, part of a published conformance test set, intended to demonstrate that there was a certain "depth" of call stack available, by making a certain number of recursive calls. GCC made a tail-call optimisation. Oops. Writing optimisation-proof benchmarks is never as easy as you think... – Verdellverderer 21/2, 2010 at 12:15

I had specifically disabled all optimizations to try to see in assembly as to how pointers are more efficient. I am adding this info to my question just in case. – Christachristabel 21/2, 2010 at 12:34

In fact i'm surprised anyway about what K&R are saying apparently. I would say in general, array access is always at least as fast, no matter how the optimization level is. – Prescript 21/2, 2010 at 13:48

"You'll probably find that having yet another variable x set to 0 (not const) and using a[x] would also introduce extra calculations.". You are right. As @Droksha had commented earlier this makes the number of assembly instructions equal for access through array and pointer. I have added this info in the question. – Christachristabel 21/2, 2010 at 14:7

@Johannes: K&R will be thinking of iteration, where in a naive implementation it requires fewer operations to do *p++ than to do a[i++] – Lap 21/2, 2010 at 14:37

@Porculus, ah i see now. I thought it's just about a[i] vs *(p+i). – Prescript 21/2, 2010 at 14:55

@paxdiablo, thanks for the detailed follow up. Just to clarify about my intentions. I was not attempting to optimize any code based on memory access through a pointer instead of an array. I was trying to see for myself the "The pointer version will in general be faster" statement in K&R. – Christachristabel 22/2, 2010 at 8:19

I understand that, @Abhijith. My contention is that you won't see it, simply because it's no longer true. It would only be true for those compilers that calculated base+offset for every index access within an array and those compilers should be thrown immediately on the scrap heap :-) – Beneficial 22/2, 2010 at 8:52

If you're programming embedded platforms, you quickly learn that the pointer method is a lot faster than using an index.

struct bar a[10], *p;

void foo()
{
    int i;

    // slow loop
    for (i = 0; i < 10; ++i)
        printf( a[i].value);

    // faster loop
    for (p = a; p < &a[10]; ++p)
        printf( p->value);
}

The slow loop has to calculate a + (i * sizeof(struct bar)) each time through, whereas the second just has to add sizeof(struct bar) to p each time through. The multiply operation uses more clock cycles than the add on many processors.

You really start to see improvements if you reference a[i] multiple times inside the loop. Some compilers don't cache that address, so it may be recalculated multiple times inside the loop.

Try updating your sample to use a struct and reference multiple elements.

Bethesda answered 21/2, 2010 at 15:49 Comment(3)

That would actually be down to the compiler, really, rather than the target environment. Any good compiler would turn that first loop into the same code as the second. Of course, if you're using an old 8051 compiler (for example) that doesn't do the insane levels of optimization that gcc is capable of then, yes, the first may well be slower. – Beneficial 22/2, 2010 at 0:10

In some cases, pointers will be faster; in others, array accesses will be faster. For example, a compiler for an 8-bit micro might know that a particular array won't ever cross a 256-byte boundary, and generate code to take advantage of this. Code using a pointer would have to handle the upper and lower halves of the pointer with every step. – Roberto 22/11, 2010 at 22:55

How true this is. I coded essentially a data buffer on an 8051, and unwittingly coded an array with a variable. Seems like no big deal on an x86 processor. The whole buffer process required me to double the MCU processor speed just to accomodate the array syntax using a variable. I just ended up hardcoding values instead of the for-loop, and VOILA reduce the processor speed by 100%.... Embedded development is a strange beast compared to writing C for PCs... – Jacintajacinth 13/1, 2018 at 21:36

Pointers naturally express simple induction variables while subscripts somewhat intrinsically require more sophisticated compiler optimizations

In many cases just using a subscripted expression requires that an extra layer be added to the problem. A loop that increments a subscript i can be though of as a state machine, and the expression a[i] technically requires, each time it is used, that i be multiplied times the size of each element and added to the base address.

In order to transform that access pattern to use pointers the compiler must analyze the entire loop and determine that, say, each element is being accessed. Then the compiler can replace the multiple instances of multiplying the subscript by the element size with a simple increment of the previous loop value. This process combines optimizations called common subexpression elimination and induction variable strength reduction.

When writing with pointers, the entire optimization process is not necessary because the programmer will typically just step through the array to start with.

Sometimes the compiler can do the optimization and sometimes it can't. It's more common in recent years to have a sophisticated compiler at hand, so pointer-based code is not always faster.

Because arrrays must usually be contiguous, another advantage for pointers is in creating incrementally allocated composite structures.

Matronly answered 22/2, 2010 at 0:9 Comment(0)

In the first case, the compiler directly knows the address of the array (which is also the address of the first element) and accesses it. In the second case, he knows the address of the pointer and reads the pointer's value, which points to that memory location. That's actually one additional indirection, so it's presumably slower here.

Ceasar answered 21/2, 2010 at 12:0 Comment(3)

This is true if this is not "optimized out" by the compiler as per Paxdiablo's answer. – Swartz 21/2, 2010 at 12:8

Or, if the processor doesn't have an array access addressing mode, because if it does it can do the math in the same time in either case. At which point you're just down to additional register pressure from having to store the base pointer and the offset. – Smattering 21/2, 2010 at 12:37

@dangerstat, but that exactly disproves the claim made by K&R that array access "in general" will be faster. So it's rather the exact other way around. – Prescript 21/2, 2010 at 13:50

The speed is gained in loops, most of all. When you use an array, you would use a counter which you increment. To calculate the position, the system multiplies this counter with the size of the array element, then adds the address of the first element to get the address. With pointers, all you need to do to go to the next element is to increase the current pointer with the size of the element to get the next one, assuming all elements are next to each other in-memory.

Pointer arithmetic thus takes a bit less calculations when doing loops. Also, having pointers to the right element is faster than using an index within an array.

Modern development is slowly getting rid of many pointer operations, though. Processors are getting faster and faster and arrays are easier to manage than pointers. Also, arrays tend to reduce the amount of bugs in code. Array will allow index checks, making sure you're not accessing data outside the array.

Veliavelick answered 21/2, 2010 at 12:19 Comment(3)

"To calculate the position, the system multiplies this counter with the size of the array element, then adds the address of the first element to get the address." This entire sequence of steps are attained by a single assembly instruction( mov ecx, DWORD PTR _a[eax*4]) atleast in x86 thus granting no advantage to access through pointers as far as I can see. – Christachristabel 21/2, 2010 at 13:40

On other platforms (non-x86), you'll have to multiply the index by the size of the array elements as a separate op. You might be able to simply left shift (if the size is a power of 2), but you'll have to multiply in other cases. – Bethesda 21/2, 2010 at 18:54

alex .. can u explain how this happens with Example ? – Weather 12/9, 2010 at 19:8

As paxdiablo said, Any new compiler will make them very similar.

Even more, I saw situations where array was faster then pointers. This was on a DSP processor which uses vector operations.

In this case, using arrays was similar to using restrict pointers. Because by using two arrays the compiler -implicitly- knows that they don't point to the same location. But if you deal with 2 pointer, the compiler may think that they point to same location and will skip pipe lining.

for example:

int a[10],b[10],c[10];
int *pa=a, *pb=b, *pc=c;
int i;

// fill a and b.
fill_arrays(a,b);

// set c[i] = a[i]+b[i];
for (i = 0; i<10; i++)
{
   c[i] = a[i] + b[i];
}

// set *pc++ = *pa++ + *pb++;
for (i = 0; i<10; i++)
{
   *pc++ = *pa++ + *pb++;
}

In case 1, the compiler will easily do pipe-lining of adding a and b, and storing value to c.

In case 2, the compiler will not pipe-line, because he might be overwriting a or b while saving to C.

Iolanthe answered 21/2, 2010 at 12:21 Comment(1)

Good point about the use of restrict - pointer aliasing is pretty much the 'elephant in the room'. I think restrict is now a keyword in C++0x which is an excellent decision IMO – Sample 21/2, 2010 at 13:13

This is a very old question and has been answered, as such I do not need to answer! However, I didn't notice a simple answer so am providing one.

ANSWER: An indirect access (pointer/array) "might" add one additional instruction to load the (base) address, but all accesses after that (elements in case of array / members in case of pointer to struct) should be just one instruction because it is mere addition of an offset to the (base) address that is already loaded. Thus, in a way it is going to be as good as direct access. As such, in majority of the cases, access through array/pointer is equivalent and element accesses are also as good as a direct access to a variable.

Ex. if I have an array (or pointer) with 10 elements or a struct with 10 members (accessed through a pointer to the struct), and I am accessing an element/member, the one possible additional instruction is required only once at the beginning. All the element/member accesses should be just one instruction after that.

Bergius answered 2/8, 2013 at 21:46 Comment(0)

You're getting good answers to your question here, but since you are learning, it is worth pointing out that efficiencies at that level are seldom noticeable.

When you are tuning a program for maximum performance, you should give at least as much attention to finding and fixing larger issues in the structure of the program. After those have been fixed, low-level optimizations can make a further difference.

Here's an example of how this can be done.

Urbano answered 21/2, 2010 at 21:27 Comment(0)

Pointers used to be faster than arrays. Certainly back when the C language was designed pointers were quite a bit faster. But these days, optimizers can usually do a better job optimizing arrays than it can with pointers because arrays are more restricted.

Instruction sets of modern processors have also been designed to help optimize array access.

So the bottom line is that arrays are often faster these days, especially when used in loops with index variables.

Of course you would still want to use pointers for things like linked lists, but the old time optimization of walking a pointer through an array rather than using an index variable is now likely to be a dis-optimization.

Upheave answered 21/2, 2010 at 21:37 Comment(0)

"The pointer version will in general be faster" means that in most cases it's easier for the compiler to generate more efficient code having a pointer (which just needs to be dereferenced) than having an array and subscript (which means that the compiler needs to shift the address from the start of the array). With the modern processors and optimizing compilers, however, array access in the typical case is not slower than pointer access.

Specifically in your case, you would need to switch on the optimization, in order to get the same result.

Nathan answered 21/2, 2010 at 12:4 Comment(0)

Since 0 is defined as a constant, a[0] is a constant too, and the compiler knows where it is at compile time. In the "normal" case, the compiler would have to compute the element address from a base + offset (with offset being scaled according to the element size).

OTOH, p is a variable, and the indirection requires an extra move.

On a general basis, array index is internally handled as pointer arithmetic anyway, so I'm not sure to see the point that the K&R was trying to make.

Psychodrama answered 21/2, 2010 at 12:5 Comment(0)

Since most people has already given detailed answers, I'll just give an intuitive example. If you use array and pointer in larger scale, the efficiency of using pointer will be more significant. For example, if you want to sort a large long int data set by sorting it into several sub set and then merge them.

long int * testData = calloc(N, sizeof(long int));

For daily 8G ram machines in 2017, we can set N to as large as 400000000, which means you will use roughly 1.5G memory for this original data set. And if you are using MPI, you can separate your data quickly by using

MPI_Scatterv(testData, partitionLength, partitionIndex, MPI_LONG, MPI_IN_PLACE, N/number_of_thread, MPI_LONG, 0, MPI_COMM_WORLD);

You can simply treat partitionLength as a pointer which stores N/number_of_thread as length for each identical part, and treat partitionIndex as a pointer which stores N/number_of_threads staring index incrementally. Assume you have an 4-core CPU and you only separate your job in 4 threads. MPI will definitely do the job in a quick sense by the references. But if you are using array, this routine have to run a pointer arithmetic on the array to find the partition point first. Which is not as direct as pointer. Also, when you merge the partitioned data set, you might want to use K-way merge to accelerate. You need a temp space to store the four sorted data set. Here, if you use pointer, you only need to store 4 addresses. However, if you use array, it will store 4 entire sub arrays, which is not efficient. Sometimes, if you are not using MPI_Barrier to make sure your program is thread safe, MPI might even complain that your memory implementation is bad. I got an 32G machine for sorting 400000000 long values on 8 thread by array method and pointer method, I got 11.054980s and 13.182739s correspondingly. And if I increase the size to 1000000000, my sorting program will not be executed successfully if I'm using array. That's why lots of people use pointers for every data structures except scalars in C.

Gahl answered 3/5, 2017 at 22:55 Comment(0)

i am a little surprised about the ptr faster than array discussion, where the evidence that this is not the case is given initially by the asm code from Abhijith.

mov eax, dord ptr _a; // load directly value from adress _a

mov eax, dword ptr _p; // load adress/value of p into eax

and

mov ecx, dword ptr [eax]; // use loaded adress to access value and put into ecx

An array represents a fixed adress so the cpu can directly access it, not so with the ptr it needs to be dereferenced in order for the cpu to access the value!

The second batch of code is not comareable as the array offset must be calulated, in order to do that for the ptr you would also need at least 1/2 more instructions!

Anything a compiler can infer during compile time(fixed adresses, offsets etc.) is key to performant code. Comparing iterative code and assigning to vars:

Array:

; 2791 : tmp = buf_ai[ l ];

mov eax, DWORD PTR _l$[ebp]
mov ecx, DWORD PTR _buf_ai$[ebp+eax*4]
mov DWORD PTR _tmp$[ebp], ecx

PTR

; 2796 : tmp2 = *p;

mov eax, DWORD PTR _p$[ebp]
mov ecx, DWORD PTR [eax]
mov DWORD PTR _tmp2$[ebp], ecx

plus

; 2801 : ++p;

mov eax, DWORD PTR _p$[ebp]
add eax, 4
mov DWORD PTR _p$[ebp], eax

It's simply for ptr load adress first than use it compared to Array use adress and get value simultaneously!

best regards

Debbi answered 6/6, 2018 at 11:12 Comment(0)

Effiency of arrays vs pointers: the case of vectorization

If you are using a compiler like gcc, then it can make a lot of sense to use arrays over points to profit from the gain of auto-vectorization:

Basic block vectorization, aka SLP, is enabled by the flag -ftree-slp-vectorize, and requires the same platform dependent flags as loop vectorization. Basic block SLP is enabled by default at -O3 and when -ftree-vectorize is enabled.

Unvectorizable Loops

Examples of loops that currently cannot be vectorized:

Example 1: Uncountable loop:


while (*p != NULL) {
  *q++ = *p++;
}

Vectorizable Loops

"feature" indicates the vectorization capabilities demonstrated by the example.

Example 1:

int a[256], b[256], c[256];
foo () {
  int i;

  for (i=0; i<256; i++){
    a[i] = b[i] + c[i];
  }
}

Bottom Line

So while many will tell you that pointer or array is better, the best is, like always:

To compile your code with the best possible flags
Inspect the produced bytecode using a compiler explorer
Finally benchmark the real runtime speed.

Unbelief answered 2/3, 2021 at 21:21 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++