This is just a demonstration of what fuz answered and others discussed about the "what's up with the blx" question.
The linker adds the trampoline/veneer, not gcc. There is no need to use your exact code or even your target to demonstrate getting the linker (intentionally or otherwise) to create a veneer.
None of the binaries generated here are intended to run on hardware; I am just making the tools do what I ask.
Let the C compiler give you some hints.
unsigned int fun ( void )
{
return(0);
}
arm-none-eabi-gcc -O2 -mthumb -c so.c -o so.o
arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: 2000 movs r0, #0
2: 4770 bx lr
arm-none-eabi-gcc -O2 -mthumb -S so.c
cat so.s
Edited to the relevant parts:
.global fun
.syntax unified
.code 16
.thumb_func
.type fun, %function
fun:
movs r0, #0
bx lr
I assume that for some (very old) historical reason gcc does overkill by using both .type ... %function and .thumb_func. You only need one; having both does not hurt. (You will probably see .thumb instead of .code 16 elsewhere; this is interesting, but not relevant.)
Let's call it from C:
extern unsigned int fun ( void );
unsigned int more_fun ( void )
{
return(fun()+1);
}
I am intentionally not making it look like an interrupt handler. Using the return value lets me avoid tail call optimization.
This builds and links; again, not a real program, just making the tools do what I ask.
arm-none-eabi-gcc -O2 -c -mthumb so.c -o so.o
arm-none-eabi-gcc -O2 -c -mthumb x.c -o x.o
arm-none-eabi-ld x.o so.o -o so.elf
arm-none-eabi-objdump -d so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00008000 <more_fun>:
8000: b510 push {r4, lr}
8002: f000 f805 bl 8010 <fun>
8006: 3001 adds r0, #1
8008: bc10 pop {r4}
800a: bc02 pop {r1}
800c: 4708 bx r1
800e: 46c0 nop @ (mov r8, r8)
00008010 <fun>:
8010: 2000 movs r0, #0
8012: 4770 bx lr
(Some may recognize why this looks a little strange, quick path to veneers)
Both are thumb mode and close to each other so a simple pc-relative bl, no veneer needed.
Now let's break it. In C, build so.c as ARM (no -mthumb) and keep x.c as Thumb:
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-gcc -O2 -c -mthumb x.c -o x.o
arm-none-eabi-ld x.o so.o -o so.elf
arm-none-eabi-objdump -d so.elf
00008000 <more_fun>:
8000: b510 push {r4, lr}
8002: f000 f809 bl 8018 <__fun_from_thumb>
8006: 3001 adds r0, #1
8008: bc10 pop {r4}
800a: bc02 pop {r1}
800c: 4708 bx r1
800e: 46c0 nop @ (mov r8, r8)
00008010 <fun>:
8010: e3a00000 mov r0, #0
8014: e12fff1e bx lr
00008018 <__fun_from_thumb>:
8018: 4778 bx pc
801a: e7fd b.n 8018 <__fun_from_thumb>
801c: eafffffb b 8010 <fun>
I wonder what linker you are using and, as a result, why it used blx and not bl, and why it produced a label with veneer in the name, compared with what binutils ld does. What linker are you using?
Arm code and even just a branch to arm mode will fault on a cortex-m. And that appears to be what happened here.
For fun:
extern unsigned int more_fun ( void );
unsigned int fun ( void )
{
return(more_fun()+1);
}
00008000 <more_fun>:
8000: b510 push {r4, lr}
8002: f000 f811 bl 8028 <__fun_from_thumb>
8006: 3001 adds r0, #1
8008: bc10 pop {r4}
800a: bc02 pop {r1}
800c: 4708 bx r1
800e: 46c0 nop @ (mov r8, r8)
00008010 <fun>:
8010: e92d4010 push {r4, lr}
8014: eb000005 bl 8030 <__more_fun_from_arm>
8018: e8bd4010 pop {r4, lr}
801c: e2800001 add r0, r0, #1
8020: e12fff1e bx lr
8024: 00000000 andeq r0, r0, r0
00008028 <__fun_from_thumb>:
8028: 4778 bx pc
802a: e7fd b.n 8028 <__fun_from_thumb>
802c: eafffff7 b 8010 <fun>
00008030 <__more_fun_from_arm>:
8030: e59fc000 ldr r12, [pc] @ 8038 <__more_fun_from_arm+0x8>
8034: e12fff1c bx r12
8038: 00008001 .word 0x00008001
803c: 00000000 .word 0x00000000
Make both of them have to use a trampoline.
Switch to asm:
.global fun
fun:
mov r0, #0
bx lr
And actually it gets WORSE than what you had.
00008000 <more_fun>:
8000: b510 push {r4, lr}
8002: f000 f805 bl 8010 <fun>
8006: 3001 adds r0, #1
8008: bc10 pop {r4}
800a: bc02 pop {r1}
800c: 4708 bx r1
800e: 46c0 nop @ (mov r8, r8)
00008010 <fun>:
8010: e3a00000 mov r0, #0
8014: e12fff1e bx lr
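We did not tell the linker that fun is a function, so it just assumes it is the same mode as the caller and emits a plain bl with no veneer. But the assembler generated ARM instructions at fun, so the Thumb caller lands on ARM code, which will hopefully fault.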
.global fun
.type fun, %function
fun:
mov r0, #0
bx lr
00008000 <more_fun>:
8000: b510 push {r4, lr}
8002: f000 f809 bl 8018 <__fun_from_thumb>
8006: 3001 adds r0, #1
8008: bc10 pop {r4}
800a: bc02 pop {r1}
800c: 4708 bx r1
800e: 46c0 nop @ (mov r8, r8)
00008010 <fun>:
8010: e3a00000 mov r0, #0
8014: e12fff1e bx lr
00008018 <__fun_from_thumb>:
8018: 4778 bx pc
801a: e7fd b.n 8018 <__fun_from_thumb>
801c: eafffffb b 8010 <fun>
Not as scary but still will fault. And note this faults using the recommended solution to this problem!
.global fun
.thumb_func
fun:
mov r0, #0
bx lr
Wow, okay, decades with these tools and learned something new today:
00008000 <more_fun>:
8000: b510 push {r4, lr}
8002: f000 f805 bl 8010 <fun>
8006: 3001 adds r0, #1
8008: bc10 pop {r4}
800a: bc02 pop {r1}
800c: 4708 bx r1
800e: 46c0 nop @ (mov r8, r8)
00008010 <fun>:
8010: 2000 movs r0, #0
8012: 4770 bx lr
I fully expected it to error out with you are not in thumb mode, instead...it put me in thumb mode. Not sure I like that, but...
I avoided the cortex-m7 because I thought it was doing that, let's see.
.cpu cortex-m7
.global fun
.type fun, %function
fun:
mov r0, #0
bx lr
Logs:
arm-none-eabi-as so.s -o so.o
so.s: Assembler messages:
so.s:6: Error: attempt to use an ARM instruction on a Thumb-only processor -- `mov r0,#0'
so.s:7: Error: attempt to use an ARM instruction on a Thumb-only processor -- `bx lr'
Well, that is interesting, and it is the desired behavior: the assembler keeps me from failing.
.cpu cortex-m7
.thumb
.global fun
fun:
mov r0, #0
bx lr
00008000 <more_fun>:
8000: b510 push {r4, lr}
8002: f000 f805 bl 8010 <fun>
8006: 3001 adds r0, #1
8008: bc10 pop {r4}
800a: bc02 pop {r1}
800c: 4708 bx r1
800e: 46c0 nop @ (mov r8, r8)
00008010 <fun>:
8010: 2000 movs r0, #0
8012: 4770 bx lr
Now this worked, without .thumb_func or .type %function. With .cpu cortex-m7 and .thumb the assembler generated Thumb instructions (for this binutils), and we saw earlier that without declaring the label as a function the linker just assumes it is the same mode and does not add a trampoline. It feels like this fixed it, but you should still declare the label as a function (on 32-bit ARM, that is, not 64-bit).
Please never think in terms of ADD ONE to an address; think OR ONE. If you add one to a properly created label, which already has the lsbit set, you get an lsbit of 0 and fail. If you OR one, then a label you created wrong gets its address fixed, and a label you created right is not broken.
.globl _start
.thumb
_start:
.word one
.word two
.word three
.type two, %function
.thumb_func
one:
nop
two:
nop
.thumb_func
four:
three:
nop
Disassembly of section .text:
00008000 <_start>:
8000: 0000800d .word 0x0000800d
8004: 0000800f .word 0x0000800f
8008: 00008010 .word 0x00008010
0000800c <one>:
800c: 46c0 nop @ (mov r8, r8)
0000800e <two>:
800e: 46c0 nop @ (mov r8, r8)
00008010 <four>:
8010: 46c0 nop @ (mov r8, r8)
I am now wondering what linker you are using.
Edit
extern unsigned int more_fun ( void );
unsigned int fun ( void )
{
return(more_fun()+1);
}
extern unsigned int fun ( void );
unsigned int more_fun ( void )
{
return(fun()+1);
}
arm-none-eabi-gcc -nostdlib -nostartfiles -ffreestanding -O2 -c so.c -o so.o
arm-none-eabi-gcc -nostdlib -nostartfiles -ffreestanding -O2 -c -mthumb x.c -o x.o
arm-none-eabi-gcc -nostdlib -nostartfiles -ffreestanding so.o x.o -o so.elf
arm-none-eabi-objdump -d so.elf
Disassembly of section .text:
00008000 <fun>:
8000: e92d4010 push {r4, lr}
8004: eb000007 bl 8028 <__more_fun_from_arm>
8008: e8bd4010 pop {r4, lr}
800c: e2800001 add r0, r0, #1
8010: e12fff1e bx lr
00008014 <more_fun>:
8014: b510 push {r4, lr}
8016: f000 f80d bl 8034 <__fun_from_thumb>
801a: 3001 adds r0, #1
801c: bc10 pop {r4}
801e: bc02 pop {r1}
8020: 4708 bx r1
8022: 46c0 nop @ (mov r8, r8)
8024: 0000 movs r0, r0
...
00008028 <__more_fun_from_arm>:
8028: e59fc000 ldr r12, [pc] @ 8030 <__more_fun_from_arm+0x8>
802c: e12fff1c bx r12
8030: 00008015 .word 0x00008015
00008034 <__fun_from_thumb>:
8034: 4778 bx pc
8036: e7fd b.n 8034 <__fun_from_thumb>
8038: eafffff0 b 8000 <fun>
803c: 00000000 andeq r0, r0, r0
Even after grepping through binutils, I do not see it generating the word veneer in a label. Maybe the debugger somehow magically knows and generates that string. Otherwise I am not sure what linker you are using and, from that, why it may be generating a blx instead of a bl.
BLX label is available but not for cortex-m7 core. You are compiling for the wrong target! – Avifauna
.type asm_func, %function to your assembly file to fix this problem. – Denotative
ldr sp is to set up the stack pointer in some custom way. ARM cores already do this automatically by loading it from flash, so some manner of stack is already set up prior to this. It would seem that some register was already pushed to the default stack. Then you change the stack pointer in your function. Then the program attempts to pop something but the stack you just set up is empty. In comes the hard fault. Comment out the ldr line, all problems gone? – Faught
ldr sp in the source, then in the disassembly ldr pc. Besides, isn't the sp called msp in ARM... I guess the OP didn't post the actual code but something else, just to confuse... – Faught
.thumb directive in your asm file in case you forgot to do so before. – Denotative
asm_func(); is C code and in this case brings in calling convention which in turn brings in stacking. So you can't use C code like that without setting up the stack first, as evident from your own disassembly. You could perhaps call the function from inline asm instead so that nothing gets stacked by accident. – Faught