In C, is accessing an array by index faster, or is accessing by pointer faster?
In C, is accessing an array by index faster, or is accessing by pointer faster? By faster I mean: which one takes fewer clock cycles? The array is not a constant array.

Aconite answered 8/2, 2011 at 23:45 Comment(1)
If you have a pointer to the exact location of an element, then pointers might be faster, because indexing has to calculate the address of the element based on the base address.Catamnesis

Which one is faster is completely system-dependent, but the two are functionally equivalent to one another and I'd be really surprised if one actually were faster. That is, the code

myArr[index]

is completely equivalent to

*(&myArr[0] + index)

Similarly, writing

*ptr

is equivalent to writing

ptr[0]

Most compilers are smart enough to figure this out, so I'd be amazed if one was faster than another.
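As a quick sanity check (my illustration, not part of the original answer): the C standard defines a[i] as *(a + i), so these assertions hold by definition, which is exactly why compilers treat the forms as interchangeable:

#include <assert.h>

int main(void)
{
    int myArr[4] = {10, 20, 30, 40};
    int index = 2;

    assert(myArr[index] == *(myArr + index));     /* a[i] is *(a + i) */
    assert(myArr[index] == *(&myArr[0] + index)); /* the form quoted above */
    assert(&myArr[index] == myArr + index);       /* same object, same address */
    return 0;
}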

More importantly, though, you probably shouldn't be too worried about this. Worry about optimizations after you have everything else working. If you find that array accesses really are killing you, then consider finding a faster alternative. Otherwise, don't worry about it; it's infinitely more valuable to have clean, readable, maintainable code than it is to have optimized code unless you have a pressing need for optimization.

Oho answered 8/2, 2011 at 23:49 Comment(2)
"It's infinitely more valuable to have clean, readable, maintainable code than it is to have optimized code unless you have a pressing need for optimization" and then optimization is - what? A very sweeping statement which is more fitting to someone who has believed everything his profs have said at university than someone who has many years of experience. Everything is a balance of pros and cons and the sooner all our new colleagues learn this the sooner our clients will benefit from it.Absolution
"premature optimization is the root of all evil"Sworn

templatetypedef has summed it up. To add some support to his response, take these example functions:

/* sums 1000 words by walking (and destroying) the pointer */
unsigned int fun1 ( unsigned int *x )
{
    unsigned int ra,rb;

    rb=0;
    for(ra=0;ra<1000;ra++) rb+=*x++;
    return(rb);
}

/* sums 1000 words by indexing from a preserved base address */
unsigned int fun2 ( unsigned int *x )
{
    unsigned int ra,rb;
    rb=0;
    for(ra=0;ra<1000;ra++) rb+=x[ra];
    return(rb);
}

Now gcc produced this:

00000000 fun1:
   0:   e52d4004    push    {r4}        ; (str r4, [sp, #-4]!)
   4:   e1a03000    mov r3, r0
   8:   e2804efa    add r4, r0, #4000   ; 0xfa0
   c:   e3a00000    mov r0, #0
  10:   e1a02003    mov r2, r3
  14:   e492c004    ldr ip, [r2], #4
  18:   e5931004    ldr r1, [r3, #4]
  1c:   e2823004    add r3, r2, #4
  20:   e080000c    add r0, r0, ip
  24:   e1530004    cmp r3, r4
  28:   e0800001    add r0, r0, r1
  2c:   1afffff7    bne 10 
  30:   e49d4004    pop {r4}        ; (ldr r4, [sp], #4)
  34:   e12fff1e    bx  lr

00000038 fun2:
  38:   e3a03000    mov r3, #0
  3c:   e1a02003    mov r2, r3
  40:   e790c003    ldr ip, [r0, r3]
  44:   e2833004    add r3, r3, #4
  48:   e7901003    ldr r1, [r0, r3]
  4c:   e2833004    add r3, r3, #4
  50:   e082200c    add r2, r2, ip
  54:   e3530efa    cmp r3, #4000   ; 0xfa0
  58:   e0822001    add r2, r2, r1
  5c:   1afffff7    bne 40 
  60:   e1a00002    mov r0, r2
  64:   e12fff1e    bx  lr

The code is different, but I am surprised at the missed opportunities for optimization.

Clang/llvm produced this:


00000000 fun1:
   0:   e3a01000    mov r1, #0
   4:   e3a02ffa    mov r2, #1000   ; 0x3e8
   8:   e1a03001    mov r3, r1
   c:   e2522001    subs    r2, r2, #1
  10:   e490c004    ldr ip, [r0], #4
  14:   e08c3003    add r3, ip, r3
  18:   e2c11000    sbc r1, r1, #0
  1c:   e182c001    orr ip, r2, r1
  20:   e35c0000    cmp ip, #0
  24:   1afffff8    bne c 
  28:   e1a00003    mov r0, r3
  2c:   e12fff1e    bx  lr

00000030 fun2:
  30:   e3a01000    mov r1, #0
  34:   e3a02ffa    mov r2, #1000   ; 0x3e8
  38:   e1a03001    mov r3, r1
  3c:   e2522001    subs    r2, r2, #1
  40:   e490c004    ldr ip, [r0], #4
  44:   e08c3003    add r3, ip, r3
  48:   e2c11000    sbc r1, r1, #0
  4c:   e182c001    orr ip, r2, r1
  50:   e35c0000    cmp ip, #0
  54:   1afffff8    bne 3c
  58:   e1a00003    mov r0, r3
  5c:   e12fff1e    bx  lr

You might notice that the compiler produced the exact same code, pointer or offset, and that changing compilers made more of a difference than changing pointer vs array indexing. I think llvm could have done a little better; I will need to study this some more to understand what my code did to cause this.

EDIT:

I was hoping to get the compiler to at a minimum use the ldr rd,[rs],#4 instruction, which favors pointers, and hoped the compiler would see that it could destroy the array address, thus treating it like a pointer rather than an offset into an array (and use the above instruction, which is basically what clang/llvm did). Or, if it did the array thing, that it would use the ldr rd,[rm,rn] instruction. Basically, I was hoping one of the compilers would generate one of these solutions:


funa:
    mov r1,#0
    mov r2,#1000
funa_loop:
    ldr r3,[r0],#4
    add r1,r1,r3
    subs r2,r2,#1
    bne funa_loop
    mov r0,r1
    bx lr

funb:
    mov r1,#0
    mov r2,#0
funb_loop:
    ldr r3,[r0,r2]
    add r1,r1,r3
    add r2,r2,#4
    cmp r2,#0x4000
    bne funb_loop
    mov r0,r1
    bx lr

func:
    mov r1,#0
    mov r2,#4000
    subs r2,r2,#4
func_loop:
    beq func_done
    ldr r3,[r0,r2]
    add r1,r1,r3
    subs r2,r2,#4
    b func_loop
func_done:
    mov r0,r1
    bx lr

Didn't quite get there, but got pretty close. This was a fun exercise. Note that the above is all ARM assembler.

In general (not my specific C code example, and not necessarily ARM), on a number of the popular architectures you will have a load from a register-based address (ldr r0,[r1]) and a load with a register index/offset (ldr r0,[r1,r2]) where the address is the sum of the two registers. Ideally one register is the base address of the array and the second is the index/offset. The former lends itself to pointers, the latter to arrays. If your C program is NOT going to change or move the pointer or index, then in both cases the address is static: it is computed once, a normal load is used, and both array and pointer should produce the same instructions. The more interesting case is changing the pointer/index:

Pointer

ldr r0,[r1]
...
add r1,r1,some number

Array index

ldr r0,[r1,r2]
...
add r2,r2,some number

(replace the load with a store and the add with a sub as needed)

Some architectures do not have a three-register register-index instruction, so there you have to do something like:

array index:
mov r2,r1
...
ldr r0,[r2]
...
add r2,r2,some number

Or, depending on the compiler, it can get really bad, especially if you compile for debugging or without optimizations, and assuming you don't have a three-register add:

array index:
mov r2,#0
...
mov r3,r1
add r3,r2
ldr r4,[r3]
...
add r2,some number

So it is quite possible that the two approaches are equal. As seen on the ARM, the compiler can combine the two pointer instructions into one (within limits for the immediate), making that version a little faster. The array-index solution burns more registers, and depending on the number of available registers for the architecture, that pushes you toward swapping registers out to the stack sooner and more often than you would with pointers, slowing you down even more. If you don't mind destroying the base address, the bottom line is that the pointer solution might give you an advantage from a performance perspective. It has a lot to do with your code and the compiler. For me, readability comes into play first, and I feel arrays are easier to read and follow; second, do I need to preserve that pointer to free a malloc or to go through that memory again? If so, I will probably use an array with an index; if it is a one-time pass and I don't care about destroying the base address, I will use a pointer. As you saw above with the compiler-generated code, if performance is critical, hand code the solution in assembler anyway (based on suggested approaches from letting the compilers try it first).
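To make the free-the-malloc point concrete (my sketch, not part of the original answer): walk a copy of the pointer so the base address survives for free():

#include <stdlib.h>

unsigned int sum_and_free ( void )
{
    unsigned int *base, *p, rb;

    base = calloc(1000, sizeof(unsigned int)); /* keep this to free later */
    if (base == NULL) return 0;

    rb = 0;
    for (p = base; p < base + 1000; p++) rb += *p; /* destroy the copy, not the base */

    free(base); /* still possible because the base address was preserved */
    return rb;
}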

Nuzzi answered 9/2, 2011 at 15:1 Comment(6)
A very good answer where different approaches were tested against each other. A- for an ambitious approach but you did not do it for arrays as the question specified. I'm also missing a comparison of relative execution times. I'll give you a 1 though!Absolution
Clock cycles is as loaded a question as anything else. The instructions with memory cycles are probably going to cost more, so if one loop has 8 instructions and another 7, but the one with 8 also has an extra memory cycle or two, that may really change the results. Cache differences, etc.; it is a mess. Even within a single processor family, or even the same processor core, the speeds are going to vary widely based on factors outside the chip.Nuzzi
Read Michael Abrash's Zen of Assembly Language or others. Even if you think the code looks faster you need to time it and test it. For a generalized faster-or-slower question without a mountain of details as to what it is running on, a generalized answer (this will often cost you more, this is often easier on the compiler to implement) is about the best you can do. And unless you have a detailed reason, and usually a closed platform, you don't want more than a generalized performance solution.Nuzzi
If you look at templatetypedef's (excellent) answer, I did respond to that and the original question, arrays vs pointers. *ptr vs ptr[0] is not worth coding, we know the answer; ptr[index] vs *(ptr + index), or ptr[index++] vs *ptr++, are the use cases where a performance question that can be seen in the compiler output is interesting. I didn't cover *(ptr + index) vs ptr[index] because, like *ptr vs ptr[0], they are the same. So if you have another use case that is not covered by the question or templatetypedef, I could talk to that.Nuzzi
As I've written in a comment here: in the same way that it is possible to write poor C code and get poorly performing machine code it is also possible to write good C code and get fast and efficient machine code. Many "don't-optimizers" (or whatever they should be called) seem to think that it doesn't matter what source code is written as the compiler will produce the same machine code anyway. This is pure rubbish. As to hand-writing assembly code I think that in general it is best left to the compiler. However, I can still help the compiler to do a better job.Absolution
Thank you, I understand a lot now!Aconite

Simple index operations compile to the same machine code on every compiler I've ever touched. Indexing is usually recommended for readability.

More complex cases that involve different logic for pointer access vs array indexing need to be examined on a case-by-case basis. If you are in doubt, profile your code - as always.
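If you do profile, a minimal harness might look like this (my sketch: it assumes the fun1/fun2 definitions from the answer above and uses only the standard clock() function):

#include <stdio.h>
#include <time.h>

unsigned int fun1(unsigned int *x); /* pointer version, defined above */
unsigned int fun2(unsigned int *x); /* indexed version, defined above */

static unsigned int data[1000];

int main(void)
{
    volatile unsigned int sink = 0; /* keeps the calls from being optimized away */
    clock_t t0;
    long i;

    t0 = clock();
    for (i = 0; i < 100000; i++) sink += fun1(data);
    printf("fun1: %ld ticks\n", (long)(clock() - t0));

    t0 = clock();
    for (i = 0; i < 100000; i++) sink += fun2(data);
    printf("fun2: %ld ticks\n", (long)(clock() - t0));

    return 0;
}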

Housebroken answered 8/2, 2011 at 23:48 Comment(0)

There's no meaningful answer to your question. Language-level operations have no specific "speed" associated with them. By themselves, they can't be "faster" or "slower".

Only CPU instructions can be faster or slower, and only CPU instructions can consume CPU cycles. In order to somehow carry over this concept of "speed" from CPU instructions back to the language-level operations they were generated from, in the general case you'll need to know the context. This is so because the same language-level operation can generate totally different CPU instructions in different contexts (not even mentioning that it might also depend on the compiler settings, etc.).
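To illustrate the context point (my example, not the answerer's): with optimizations enabled, a typical compiler handles the same a[i] expression very differently in these two settings:

int a[100];

int get_fifth(void)
{
    return a[5];     /* constant index: typically a single load from a fixed address */
}

int sum_all(void)
{
    int i, s = 0;
    for (i = 0; i < 100; i++)
        s += a[i];   /* loop index: typically strength-reduced to a moving
                        pointer, unrolled, or vectorized */
    return s;
}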

In other words, post the actual code. As an abstract context-less question it simply makes no sense.

Poirier answered 8/2, 2011 at 23:54 Comment(6)
This answer is so uninformed that it shames our profession. Harsh words but I stand behind them 100%. The compiler is not a random instruction generator as you appear to be implying but often an excellent quality tool that is imperative to our work. However, if you feed it bloated garbage source code it will faithfully produce bloated garbage machine code that performs poorly. Judging from your answer, this appears to be your general experience. If you feed it quality source code it will faithfully produce quality machine code that is fast. Perhaps you were unaware of this relationship?Absolution
@Olof Forshell: Are you trolling or what? What you posted in your comment is some kind of irrelevant nonsense (talk about shame). All I said that the actual machine code will drastically depend on the context. Like when accessing all array elements sequentially in a cycle, both index access and "sliding-pointer" access will often produce the same code in many modern compilers. When accessing individual array elements outside of any mass-access context, the code might easily turn out to be different for pointer and index access.Poirier
Where and how the "bloated garbage source code" came into the picture, and how it could possibly be relevant to the question and/or to my answer, is totally unclear to me. I suspect that someone's forgetting to drink their glass of juice in the morning might be the root of the problem here.Poirier
You said that "language-level operations have no specific "speed"" and I say yes they do. A poorly constructed piece of C code will cause the compiler to generate poor and slow machine code. Ergo the opposite must be true as well. As to the "same language-level operation can generate totally different CPU-instructions in different contexts" I say that generally speaking they will produce more or less the same code however the context may add instructions not apart from the addressing itself. The optimization level however, can produce very different sequences, but that is not a "context."Absolution
Let me clarify if I was unclear: language-level operations have everything to do with execution speed in that the compiler uses them to produce the executable code. I know of no commercial or free compiler that produces good machine code from poor source code. That does not mean that no such compiler exists, perhaps you've come across them? A speed optimization level will generally produce faster code than that for debug (low) optimization. There are very definitely meaningful answers to the question.Absolution
Oops I meant "the context may add instructions that are not part of the addressing itself"Absolution

At the lowest level, these operations mostly tend to compile to the same thing. If you're really interested, you should get your C compiler to generate assembly output (such as with gcc -S; see the example after the list below) so you can check, especially since it depends, at a bare minimum, on:

  • your target platform.
  • your compiler.
  • your optimisation level.
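For example, assuming the fun1/fun2 sources from the answer above are saved as sum.c, a minimal comparison could be:

gcc -O2 -S sum.c     # writes sum.s; compare fun1 and fun2 by eye
clang -O2 -S sum.c   # the same flags work for clang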

You'll find that, even if there were a difference (which is doubtful), that level of micro-optimisation is mostly not worth the effort you put into it. You're better off doing macro-optimisations such as improved algorithms, since that's the sort of thing that offers more return on investment.

In these sorts of situations, where the effect is likely to be minimal, I always optimise for readability.

Rosettarosette answered 8/2, 2011 at 23:49 Comment(8)
"You'll find that, even if there was a difference (which is doubtful), that level of micro-optimisation is not worth the effort you put into it" is a very absolute statement and I'm sure that it will fall like a deck of cards if challenged. Should you really be offering your (hardly informed) opinion here? I say that the correct answer is that "in some - though not all - cases it is well worth the effort and in the process you will have learned something useful for future projects."Absolution
@Olof, you're right, I normally despise absolutes, so I changed it. However, my opinion is anything but uninformed. Short of a brain-dead compiler, pointers and array indexes will have exactly the same underlying assembly and the benefits of macro optimisations vastly outweigh a couple of clock cycles. That's from long experience.Rosettarosette
"exactly" ... hmm. A not entirely hypothetical situation: assume your function is supposed to return the index of the first structure member where active is FALSE. The structure has one million members and is randomly loaded to 2/3. This is a fairly simple algorithm where a local optimization inside the loop can mean a halved or better average search time. Should this optimization not be taken because macro-optimisations should be performed but there are none because the algorithm is too simple?Absolution
@Olof: That depends on the optimisation (you haven't stated what it is so it's a little hard to evaluate). If it's something simple like common subexpression elimination or constant folding, that's best left to the compiler (unless, as I stated, you have a brain-dead compiler). If your optimisation is changing the algorithm based on the fact it's 2/3 loaded, or maintaining the first FALSE index separately to the list to be changed only on certain updates (falsifying one before that point or truifying that particular one), that is a macro optimisation.Rosettarosette
Common subexpression elimination is relatively often performed by RISC compilers that have more "free" registers to allocate. For x86-32 (don't know about -64) it is relatively less often performed since optimizations must be juggled with a quarter as many registers as RISC (8 vs 32). To say that the compiler is "brain-dead" seems to me to be somewhat of an over-simplification: it does the best it can with what it has to work with, I guess. My answer further down on the page illustrates an optimization using explicit common subexpression elimination and how it can be evolved further.Absolution
The x86-32 general purpose registers are eax, ebx, ecx, edx, esi, edi, ebp and esp. My OpenWatcom compiler uses them all except esp. On RISC two or more may have special purposes such as stack pointer, often-used constants etc.Absolution
Not sure where the register requirement comes into it there. CSE doesn't require registers, it's just the act of memorising a calculation so that you only do it once, not every time in a loop. While register storage of the result would be even more optimal, it's not required. Memory storage of CSE is still a big performance boost.Rosettarosette
It's not the calculation that needs to be memorized, it's the result. To calculate a result you need registers. If it's used often enough the result will be stored in a register on RISC processors. The same applies for x86 if the code isn't too big. Otherwise it will be re-calculated. An x86 compiler will not allocate a stack location to store it unless explicitly instructed to by the programmer. That's why explicit CSE may be necessary for x86 programs.Absolution

Explicitly eliminating common subexpressions might work for you. Whether it makes a difference depends on whether you are using an x86 or a RISC architecture, and on optimizer quality.

When I write a routine which has to run through an array or indexed structure, I compute a pointer to the base of the array/structure member and use that to address it. The basic case

struct SOMETHING list[100];

int find_something (...)
{
  int i;

  i=0;
  while (i<(sizeof(list)/sizeof(struct SOMETHING)))
  {
    if (list[i].active && list[i].last_access+60<current_time) return i;

    ++i;
  }
  return -1;
}

can be refined to (i.e. helping the compiler to produce better code):

int find_something (...)
{
  int i;
  struct SOMETHING *pList;

  i=0;
  while (i<(sizeof(list)/sizeof(struct SOMETHING)))
  {
    pList=&list[i];
    if (pList->active && pList->last_access+60<current_time) return i;

    ++i;
  }
  return -1;
}

This is just to illustrate; with code this simple the compiler would probably generate the pointer implicitly, but if the routine is more complex that might not be the case. Using "list[i]." as in the first example, you'd run (on the x86) the risk (RISC, haha) of the compiler not having enough registers to generate and store the address once, instead generating it for every single reference. For the x86 case a local variable is necessary to store the pointer, and few compilers will create stack variables unless explicitly directed to. On RISC the compiler has lots of registers at its disposal and will usually decide that it is worthwhile to create (and keep) the pointer once for every iteration.

The loop can be refined further:

  pList=list;
  i=0;
  while (i<(sizeof(list)/sizeof(struct SOMETHING)))
  {
    if (pList->active && pList->last_access+60<current_time) return i;

    pList+=1;    
    ++i;
  }

This construction is devoid of any address calculation overhead. "pList+=1" (others might prefer "++pList") causes a constant value (equal to the size of an individual row/member) to be added to pList.

And further:

  pList=list;
  pEndList=&list[sizeof(list)/sizeof(struct SOMETHING)];
  while (pList!=pEndList)
  {
    if (pList->active && pList->last_access+60<current_time) return pList-list;

    pList+=1;    
  }

This eliminates the index increment, replacing it with one multiplication outside the loop and one division executed just once, in the return construct.
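For reference, a self-contained version of that final refinement might look like this (a sketch: the struct fields and current_time are stand-ins taken from the fragments above):

#include <time.h>

struct SOMETHING { int active; time_t last_access; };

static struct SOMETHING list[100];
static time_t current_time;

int find_something (void)
{
  struct SOMETHING *pList, *pEndList;

  pList=list;
  pEndList=&list[sizeof(list)/sizeof(struct SOMETHING)];
  while (pList!=pEndList)
  {
    if (pList->active && pList->last_access+60<current_time)
      return (int)(pList-list);   /* the one division, done once */

    pList+=1;
  }
  return -1;
}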

Now, before all you don't-optimizers out there start screaming bloody murder: my point is that what constructs are acceptable is determined by the size and complexity of the function in which they reside. I would probably not consider this construct in a 300-line function that is complex enough to begin with, but in a situation such as the above? If the searches are a significant part of overall processing? If the speed-ups are large enough?

So why not? Pros and cons. It's always pros and cons. Making the best of them. Absolutes? Rarely (if ever).

Absolution answered 9/2, 2011 at 16:14 Comment(0)

When accessing an array through an index, you are actually performing two operations: an addition (adding the index to the base array address), then a memory access (actually reading or writing what is at the resulting address). I suppose that when you are talking about "accessing by pointer" then you mean that you already have the pointer to the target element. So, logically, using the pointer saves the "addition" part, and thus should be faster, or at least no slower.

However...

As a rough approximation, in a modern computer the memory access is much more expensive than an addition (especially if it falls out of the caches), so the difference, if any, will be slight. On some architectures (e.g. x86 or PowerPC), the addition and memory access can be combined into a single opcode. Things will also be different depending on whether the array address is a compile-time constant (i.e. the array is not constant data, but is declared as a global variable, vs a block obtained with malloc()). Using an array may help the compiler find better code than with a generic pointer (in particular when the restrict keyword is used). The context has a huge influence (e.g. how many free registers are there at that point?).
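As an illustration of the restrict point (mine, not the answerer's): restrict promises the compiler that the pointers do not alias, letting it keep loads in registers or vectorize, much as it could with two distinct arrays:

void scale (float *restrict dst, const float *restrict src, int n, float k)
{
    int i;

    /* without restrict, the compiler must assume dst and src may overlap
       and re-load src[i] after every store to dst[i] */
    for (i = 0; i < n; i++)
        dst[i] = k * src[i];
}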

So:

  • There is no absolute answer to your question. You have to try it and take measurements.
  • If there is a detectable difference (chances are that there will be none), it is hard to predict in which direction, and it depends on a huge set of external factors, including the specific compiler version and optimization flags, processor architecture and model, memory layout and so on.
  • You will not be able to make any reliable optimization gain without some rather in-depth knowledge of assembly and a bit of compilation theory.
  • You should concentrate on writing correct code first, and only then worry about optimization; there is no performance issue until it has been duly measured in realistic conditions.
Acetylate answered 9/2, 2011 at 12:51 Comment(1)
Given an investigative approach, it is entirely possible to learn what sort of machine code the compiler will generate for different C constructs, if not the exact sequence (at least I can't). It is therefore also possible to choose the highest-performing construct and be assured that the performance will be there come execution time. A developer who does not investigate these matters will not be competent to say that it is not possible nor will he or she understand that they should refrain from offering advice except perhaps "I do not know. Isn't there a way you can investigate this?"Absolution

Same. It's all O(1), and clock time is negligible. You're basically accessing the memory address.

Doralynn answered 8/2, 2011 at 23:47 Comment(5)
Two operations being O(1) says nothing about their relative speed; both 1 and 1000000 are O(1).Oho
Given the abstract nature of the OP's question, this (and AndreyT's) answer is the only correct answer. You cannot answer this without going into assembly. You say what you know, that it is O(1). Therein lies the beauty of theoretical computer science.Doralynn
How do you know that it's O(1)? What if my compiler compiled a[i] to a loop that incremented a pointer i times? That would make it O(n). If you want to be pedantic...Housebroken
@Oyster: Ah, yes, the famous case of memory being implemented as a linked-list... :)Doralynn
"the only correct answer" - that's a VERY broad statement which I'm not sure anyone could defend, not even you.Absolution