In C, which is faster: accessing an array by index, or accessing through a pointer? By faster I mean, which one takes fewer clock cycles? The array is not a constant array.
templatetypedef has summed it up. To add some support to his response, take these example functions:
unsigned int fun1 ( unsigned int *x )
{
    unsigned int ra,rb;
    rb=0;
    for(ra=0;ra<1000;ra++) rb+=*x++;
    return(rb);
}

unsigned int fun2 ( unsigned int *x )
{
    unsigned int ra,rb;
    rb=0;
    for(ra=0;ra<1000;ra++) rb+=x[ra];
    return(rb);
}
Now gcc produced this:
00000000 fun1:
   0: e52d4004  push {r4}        ; (str r4, [sp, #-4]!)
   4: e1a03000  mov r3, r0
   8: e2804efa  add r4, r0, #4000  ; 0xfa0
   c: e3a00000  mov r0, #0
  10: e1a02003  mov r2, r3
  14: e492c004  ldr ip, [r2], #4
  18: e5931004  ldr r1, [r3, #4]
  1c: e2823004  add r3, r2, #4
  20: e080000c  add r0, r0, ip
  24: e1530004  cmp r3, r4
  28: e0800001  add r0, r0, r1
  2c: 1afffff7  bne 10
  30: e49d4004  pop {r4}         ; (ldr r4, [sp], #4)
  34: e12fff1e  bx lr

00000038 fun2:
  38: e3a03000  mov r3, #0
  3c: e1a02003  mov r2, r3
  40: e790c003  ldr ip, [r0, r3]
  44: e2833004  add r3, r3, #4
  48: e7901003  ldr r1, [r0, r3]
  4c: e2833004  add r3, r3, #4
  50: e082200c  add r2, r2, ip
  54: e3530efa  cmp r3, #4000    ; 0xfa0
  58: e0822001  add r2, r2, r1
  5c: 1afffff7  bne 40
  60: e1a00002  mov r0, r2
  64: e12fff1e  bx lr
The code is different, but I am surprised at the missed opportunities for optimization.
Clang/llvm produced this:
00000000 fun1:
   0: e3a01000  mov r1, #0
   4: e3a02ffa  mov r2, #1000    ; 0x3e8
   8: e1a03001  mov r3, r1
   c: e2522001  subs r2, r2, #1
  10: e490c004  ldr ip, [r0], #4
  14: e08c3003  add r3, ip, r3
  18: e2c11000  sbc r1, r1, #0
  1c: e182c001  orr ip, r2, r1
  20: e35c0000  cmp ip, #0
  24: 1afffff8  bne c
  28: e1a00003  mov r0, r3
  2c: e12fff1e  bx lr

00000030 fun2:
  30: e3a01000  mov r1, #0
  34: e3a02ffa  mov r2, #1000    ; 0x3e8
  38: e1a03001  mov r3, r1
  3c: e2522001  subs r2, r2, #1
  40: e490c004  ldr ip, [r0], #4
  44: e08c3003  add r3, ip, r3
  48: e2c11000  sbc r1, r1, #0
  4c: e182c001  orr ip, r2, r1
  50: e35c0000  cmp ip, #0
  54: 1afffff8  bne 3c
  58: e1a00003  mov r0, r3
  5c: e12fff1e  bx lr
You might notice that the compiler produced the exact same code, pointer or offset, and that by changing compilers I was better off than by changing pointer vs array indexing. I think llvm could have done a little better; I will need to study this some more to understand what my code did to cause this.
EDIT:
I was hoping to get the compiler to at a minimum use the ldr rd,[rs],#4 instruction, which favors pointers, and hoped the compiler would see that it could destroy the array address, treating it like a pointer rather than an offset into an array (and use the above instruction, which is basically what clang/llvm did). Or, if it did the array thing, that it would use the ldr rd,[rm,rn] instruction. Basically I was hoping one of the compilers would generate one of these solutions:
funa:
    mov r1,#0
    mov r2,#1000
funa_loop:
    ldr r3,[r0],#4
    add r1,r1,r3
    subs r2,r2,#1
    bne funa_loop
    mov r0,r1
    bx lr

funb:
    mov r1,#0
    mov r2,#0
funb_loop:
    ldr r3,[r0,r2]
    add r1,r1,r3
    add r2,r2,#4
    cmp r2,#4000
    bne funb_loop
    mov r0,r1
    bx lr

func:
    mov r1,#0
    mov r2,#4000
func_loop:
    subs r2,r2,#4
    ldr r3,[r0,r2]
    add r1,r1,r3
    bne func_loop
    mov r0,r1
    bx lr
Didn't quite get there, but got pretty close. This was a fun exercise. Note that the above is all ARM assembler.
In general (not my specific C code example, and not necessarily ARM), a number of the popular architectures have a load from a register-based address (ldr r0,[r1]) and a load with a register index/offset (ldr r0,[r1,r2]), where the address is the sum of the two registers: ideally one register holds the base address of the array and the other the index/offset. The former load lends itself to pointers, the latter to arrays. If your C program is NOT going to change or move the pointer or index, then both cases amount to a static address which is computed, followed by a normal load, so both array and pointer should produce the same instructions. The more interesting case is changing the pointer/index:
Pointer:

    ldr r0,[r1]
    ...
    add r1,r1,some number

Array index:

    ldr r0,[r1,r2]
    ...
    add r2,r2,some number
(replace the load with a store and the add with a sub as needed)
Some architectures do not have a three-register, register-index load instruction, so there you have to do something like:
array index:

    mov r2,r1
    ...
    ldr r0,[r2]
    ...
    add r2,r2,some number
Or, depending on the compiler, it can get really bad, especially if you compile for debugging or without optimizations, and assuming you don't have a three-register add:
array index:

    mov r2,#0
    ...
    mov r3,r1
    add r3,r2
    ldr r4,[r3]
    ...
    add r2,some number
So it is quite possible that the two approaches are equal. As seen on the ARM, the compiler can combine the two pointer instructions into one (within limits for the immediate), making that a little faster. The array-index solution burns more registers, and depending on the number of registers available on the architecture, that pushes you toward swapping registers out to the stack sooner and more often than you would with pointers, slowing you down even more. If you don't mind destroying the base address, the bottom line is that the pointer solution might give you an advantage from a performance perspective. It has a lot to do with your code and the compiler.

For me, readability comes into play first, and I feel arrays are easier to read and follow. Second: do I need to preserve that pointer, to free a malloc or to go through that memory again, etc.? If so, I will probably use an array with an index. If it is a one-time pass and I don't care about destroying the base address, I will use a pointer. As you saw above with the compiler-generated code, if performance is critical, hand-code the solution in assembler anyway (based on approaches suggested by letting the compilers try it first).
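The preserve-the-base vs destroy-the-base trade-off above can be sketched in C. This is a hypothetical helper (sum_and_free is a made-up name, not from the answer): a one-time pass walks a copy of the pointer, so the base stays intact for free().

```c
#include <stdlib.h>

/* Hypothetical helper: sum n words, then release the buffer.
   The walking pointer is a copy, so the base survives for free(). */
unsigned int sum_and_free(unsigned int *base, unsigned int n)
{
    unsigned int sum = 0;
    unsigned int *p = base;   /* walk a copy, preserve base */
    while (n--)
        sum += *p++;          /* pointer walk: ldr rd,[rs],#4 style */
    free(base);               /* base still valid here */
    return sum;
}
```

If no later use of the base is needed, walking `base` itself directly is equally fine; the compiler output above suggests either form can end up identical.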
It's completely system-dependent which one is faster, but the two are functionally equivalent to one another and I'd be really surprised if one actually was faster. That is, the code
myArr[index]
is completely equivalent to
*(&myArr[0] + index)
Similarly, writing
*ptr
is equivalent to writing
ptr[0]
Most compilers are smart enough to figure this out, so I'd be amazed if one was faster than another.
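These identities can be checked directly; a minimal sketch (the array contents are arbitrary, and the function name is made up):

```c
#include <assert.h>

/* a[i] is defined as *(a + i), so each pair below is the same access. */
static void check_identities(void)
{
    int myArr[4] = {10, 20, 30, 40};
    int index = 2;
    int *ptr = myArr;

    assert(myArr[index] == *(&myArr[0] + index));  /* index vs base+offset */
    assert(*ptr == ptr[0]);                        /* deref vs subscript 0 */
}
```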
More importantly, though, you probably shouldn't be too worried about this. Worry about optimizations after you have everything else working. If you find that array accesses really are killing you, then consider finding a faster alternative. Otherwise, don't worry about it; it's infinitely more valuable to have clean, readable, maintainable code than it is to have optimized code unless you have a pressing need for optimization.
Simple index operations compile to the same machine code on every compiler I've ever touched. Indexing is usually recommended for readability.
More complex cases that involve different logic for pointer access vs array indexing need to be examined on a case-by-case basis. If you are in doubt, profile your code - as always.
There's no meaningful answer to your question. Language-level operations have no specific "speed" associated with them. By themselves, they can't be "faster" or "slower".
Only CPU instructions can be faster or slower, and only CPU instructions can consume CPU cycles. To carry this concept of "speed" back from CPU instructions to the language-level operations they were generated from, in the general case you need to know the context: the same language-level operation can generate totally different CPU instructions in different contexts (not to mention that it also depends on compiler settings, etc.).
In other words, post the actual code. As an abstract context-less question it simply makes no sense.
At the lowest level, these operations mostly tend to compile to the same thing. If you're really interested, you should get your C compiler to generate assembly output (such as with gcc -S) so you can check, especially since it depends, at a bare minimum, on:
- your target platform.
- your compiler.
- your optimisation level.
You'll find that, even if there were a difference (which is doubtful), that level of micro-optimisation is mostly not worth the effort you put into it. You're better off doing macro-optimisations such as improved algorithms, since that's the sort of thing that offers more return on investment.
In these sorts of situations, where the effect is likely to be minimal, I always optimise for readability.
Explicitly eliminating common subexpressions might work for you. Whether there is a difference depends on the architecture (x86 vs. RISC) and on optimizer quality.
When I write a routine which has to run through an array or indexed structure, I compute a pointer to the base of the array/structure member and use that to address it. The basic case:
struct SOMETHING list[100];

int find_something (...)
{
    int i;

    i=0;
    while (i<(sizeof(list)/sizeof(struct SOMETHING)))
    {
        if (list[i].active && list[i].last_access+60<current_time) return i;
        ++i;
    }
    return -1;
}
can be refined to (i.e., helping the compiler to produce better code):
int find_something (...)
{
    int i;
    struct SOMETHING *pList;

    i=0;
    while (i<(sizeof(list)/sizeof(struct SOMETHING)))
    {
        pList=&list[i];
        if (pList->active && pList->last_access+60<current_time) return i;
        ++i;
    }
    return -1;
}
This is just to illustrate; with code this simple the compiler would probably generate the pointer implicitly, but if the routine is more complex, that might not be the case. Using "list[i]." as in the first example, you'd run (on the x86) the risk (RISC haha) of the compiler not having enough registers to generate and store the address once, instead generating it for every single reference. For the x86 case, a local variable is necessary to store the pointer, and few compilers will create stack variables unless explicitly directed to. On RISC the compiler has lots of registers at its disposal and will usually decide that it is worthwhile to create (and keep) the pointer once for every iteration.
The loop can be refined further:
pList=list;
i=0;
while (i<(sizeof(list)/sizeof(struct SOMETHING)))
{
    if (pList->active && pList->last_access+60<current_time) return i;
    pList+=1;
    ++i;
}
This construction is devoid of any address calculation overhead. "pList+=1" (others might prefer "++pList") causes a constant value (equal to the size of an individual row/member) to be added to pList.
And further:
pList=list;
pEndList=&list[sizeof(list)/sizeof(struct SOMETHING)];
while (pList!=pEndList)
{
    if (pList->active && pList->last_access+60<current_time) return pList-list;
    pList+=1;
}
This eliminates the index increment entirely, replacing it with one multiplication outside the loop (computing pEndList) and one division inside the loop (the pointer subtraction pList-list, executed just once, in the return construct).
Now, before all you don't-optimizers out there start screaming bloody murder: my point is that which constructs are acceptable is determined by the size and complexity of the function in which they reside. I would probably not consider this construct in a 300-line function that is complex enough to begin with, but in a situation such as the above? If the searches are a significant part of overall processing? If the speed-ups are large enough?
So why not? Pros and cons. It's always pros and cons. Making the best of them. Absolutes? Rarely (if ever).
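For reference, the final pointer-walk variant can be made self-contained like this; the struct fields and the 60-second timeout follow the snippets above, while the field types and the use of globals are assumptions for illustration:

```c
#include <time.h>

struct SOMETHING { int active; time_t last_access; };

static struct SOMETHING list[100];
static time_t current_time;   /* assumed to be maintained elsewhere */

int find_something (void)
{
    struct SOMETHING *pList = list;
    struct SOMETHING *pEndList = &list[sizeof(list)/sizeof(struct SOMETHING)];

    while (pList != pEndList)
    {
        if (pList->active && pList->last_access + 60 < current_time)
            return (int)(pList - list);   /* one division, executed once */
        pList += 1;
    }
    return -1;
}
```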
When accessing an array through an index, you are actually performing two operations: an addition (adding the index to the base array address), then a memory access (actually reading or writing what is at the resulting address). I suppose that when you are talking about "accessing by pointer" then you mean that you already have the pointer to the target element. So, logically, using the pointer saves the "addition" part, and thus should be faster, or at least no slower.
However...
As a rough approximation, in a modern computer the memory access is much more expensive than an addition (especially if it falls out of the caches), so the difference, if any, will be slight. On some architectures (e.g. x86 or PowerPC), the addition and memory access can be combined into a single opcode. Things will also be different depending on whether the array address is a compile-time constant (i.e. the array is not constant data, but is declared as a global variable) vs. a block obtained with malloc(). Using an array may help the compiler find better code than a generic pointer allows (in particular when the restrict keyword is used). The context has a huge influence (e.g. how many free registers are there at that point?).
So:
- There is no absolute answer to your question. You have to try and make measures.
- If there is a detectable difference (chances are that there will be none), it is hard to predict in which direction, and it depends on a huge set of external factors, including the specific compiler version and optimization flags, processor architecture and model, memory layout and so on.
- You will not be able to make any reliable optimization gain without having some rather in-depth knowledge of assembly, and a bit of theory of compilation.
- You should concentrate on making a correct code first, and then only worry about optimization; and there is no performance issue until it has been duly measured in realistic conditions.
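The restrict point above is worth a small illustration; add_arrays is a made-up function name, and whether the compiler actually exploits the no-aliasing promise (e.g. by vectorizing or hoisting loads) depends on the compiler and optimization level:

```c
#include <stddef.h>

/* restrict promises the compiler that dst, a and b never alias,
   so it need not reload a[i] and b[i] after each store to dst[i]. */
void add_arrays(float *restrict dst,
                const float *restrict a,
                const float *restrict b,
                size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Without restrict, the compiler must assume a store through dst could change *a or *b, which blocks exactly the kind of address-calculation hoisting this answer discusses.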
Same. It's all O(1), and clock time is negligible. You're basically accessing the memory address.