I have a situation where some of the address space is sensitive in that you read it you crash as there is nobody there to respond to that address.
pop {r3,pc}
bx r0
0: e8bd8008 pop {r3, pc}
4: e12fff10 bx r0
8: bd08 pop {r3, pc}
a: 4700 bx r0
The bx was not created by the compiler as an instruction, instead it is the result of a 32 bit constant that didnt fit as an immediate in a single instruction so a pc relative load is setup. This is basically the literal pool. And it happens to have bits that resemble a bx.
Can easily write a test program to generate the issue.
unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
return(more_fun(0x12344700)+1);
}
00000000 <fun>:
0: b510 push {r4, lr}
2: 4802 ldr r0, [pc, #8] ; (c <fun+0xc>)
4: f7ff fffe bl 0 <more_fun>
8: 3001 adds r0, #1
a: bd10 pop {r4, pc}
c: 12344700 eorsne r4, r4, #0, 14
What appears to be happening is the processor is waiting on data coming back from the pop (ldm) moves onto the next instruction bx r0 in this case, and starts a prefetch at the address in r0. Which hangs the ARM.
As humans we see the pop as an unconditional branch, but the processor does not it keeps going through the pipe.
Prefetching and branch prediction are nothing new (we have the branch predictor off in this case), decades old, and not limited to ARM, but the number of instruction sets that have the PC as GPR and instructions that to some extent treat it as non-special are few.
I am looking for a gcc command line option to prevent this. I cant imagine we are the first ones to see this.
I can of course do this
-march=armv4t
00000000 <fun>:
0: b510 push {r4, lr}
2: 4803 ldr r0, [pc, #12] ; (10 <fun+0x10>)
4: f7ff fffe bl 0 <more_fun>
8: 3001 adds r0, #1
a: bc10 pop {r4}
c: bc02 pop {r1}
e: 4708 bx r1
10: 12344700 eorsne r4, r4, #0, 14
preventing the problem
Note, not limited to thumb mode, gcc can produce arm code as well for something like this with the literal pool after the pop.
unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
return(more_fun(0xe12fff10)+1);
}
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: e59f0008 ldr r0, [pc, #8] ; 14 <fun+0x14>
8: ebfffffe bl 0 <more_fun>
c: e2800001 add r0, r0, #1
10: e8bd8010 pop {r4, pc}
14: e12fff10 bx r0
Hoping someone knows a generic or arm specific option to do an armv4t like return (pop {r4,lr}; bx lr in arm mode for example) without the baggage or puts a branch to self immediately after a pop pc (seems to solve the problem the pipe is not confused about b as an unconditional branch.
EDIT
ldr pc,[something]
bx rn
also causes a prefetch. which is not going to fall under -march=armv4t. gcc intentionally generates ldrls pc,[]; b somewhere for switch statements and that is fine. Didnt inspect the backend to see if there are other ldr pc,[] instructions generated.
EDIT
Looks like ARM did report this as an Errata (erratum 720247, "Speculative Instruction fetches can be made anywhere in the memory map"), wish I had known that before we spent a month on it...