Bad BLX instruction generated when calling asm function from C function (gcc on STM32H753)

Context is: STM32H753 bare-metal software compiled with arm-none-eabi-gcc.

The reset handler is implemented in C and located in Flash memory:

void reset_handler_c(void)
{
   asm_func();
}

The asm function is implemented in a .s file and located in RAM:

.global asm_func

asm_func:
  ldr sp,=xxx
  bl entry_point
  bx lr

(As it is, it does not make a lot of sense but it is obviously a simplified example just to reproduce the issue)

The asm generated is the following:

[Image not preserved: disassembly of the call to asm_func, showing the BLX]

Problem is: the BLX instruction can take only a register as parameter and a hardfault is generated at execution. Extract from STM32H7 programming manual:

[Image not preserved: BLX description from the STM32H7 programming manual]

Now, if I call a C function instead, the BLX is replaced by a BL, which is correct:

[Image not preserved: disassembly of the call to a C function, showing a BL]

Any idea why gcc is generating this weird BLX instruction?

EDIT: the compilation options are -mcpu=cortex-m7 -std=c99 -mfpu=fpv5-d16 -mfloat-abi=hard -mthumb -O1 ...

Hippy answered 11/4, 2024 at 9:39 Comment(17)
Do you use the correct gcc flags? Show us the invocation. BLX label is available but not for cortex-m7 core. You are compiling for the wrong target!Avifauna
Edited the question; I think the target is correct.Hippy
Add .type asm_func, %function to your assembly file to fix this problem.Denotative
I might misunderstand something here, but the bug seems pretty plain to me. The purpose of ldr sp is to set up the stack pointer in some custom way. ARM cores already do this automatically by loading it from flash, so some manner of stack is already set up prior to this. It would seem that some register was already pushed to the default stack. Then you change the stack pointer in your function. Then the program attempts to pop something but the stack you just set up is empty. In comes the hard fault. Comment out the ldr line, all problems gone?Faught
A rule of thumb from other non-ARM cores: C programming is not available until the sp has been set.Faught
@Faught it is on arm cores as SP is set by the hardwareAvifauna
@Avifauna Yeah but apart from the reasoning about setting up the sp, I'm not following the question because it says ldr sp in the source, then in the disassembly ldr pc. Besides, isn't the sp called msp in ARM... I guess the OP didn't post the actual code but something else, just to confuse...Faught
The purpose of the asm function is to set the SP (at a different location than the SP from the vector table). As I said, I tried to build a minimal reproducible exampleHippy
The problem comes from the BLX instruction that is invalid (if I understood correctly), I don't think it is linked to the stack pointer manipulation.Hippy
You're right, changing the SP is dangerous but it is done by an OS, a bootloader etc...Hippy
@GuillaumePetitjean Have you tried setting the symbol type? Did that fix your issue? You may also need to supply a .thumb directive in your asm file in case you forgot to do so before.Denotative
@Denotative yes it worked, I accepted your answer , thank youHippy
@GuillaumePetitjean The problem is that asm_func(); is C code and in this case brings in calling convention which in turn brings in stacking. So you can't use C code like that without setting up the stack first, as evident from your own disassembly. You could perhaps call the function from inline asm instead so that nothing gets stacked by accident.Faught
Note gcc does not insert the veneer, it is the linker. This is interesting as I have only seen binutils insert a bl, not a blx, for a veneer. As answered though, the problem is not the blx; it is the mode switch to arm on a thumb-only machine.Crack
note for thumb code you can use .thumb_func before the label rather than the more generic .type functionname, %function. there does not appear to be a .arm_func though so it is a personal choice.Crack
for C code you ensure that when you compile you specify -mthumb so that the generated code marks the labels as thumb functions so that the linker can properly link the code.Crack
the arm manuals are the standard/reference. st may have some but you should always get the architectural reference manual and technical reference manuals from arm (the programmers reference from arm is as much of a problem as it solves, so be very careful with that one or just avoid it).Crack

If you call a function that's in another section or object, the function address is not known ahead of time. A relocation is generated and fixed up by the linker at link time. To patch in the correct function call for the relocation, the linker needs to know whether the function you call is an ARM or a Thumb function. It knows this by inspecting the least-significant bit of the address of the symbol you called. If it's set, it generates code to call a Thumb function. If it's clear, it generates code to call an ARM function. This is what went wrong in your case: the LSB of the address is clear, hence a BLX instruction to call an ARM function was generated.
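
As a rough model of the decision described above (a sketch only, not the linker's actual code; the function names are made up for illustration), the check on the symbol value could be written in C like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a symbol value with bit 0 set denotes a Thumb function. */
bool symbol_is_thumb(uint32_t symbol_value)
{
    return (symbol_value & 1u) != 0u;
}

/* Hypothetical helper: the address the code actually lives at,
 * i.e. the symbol value with the Thumb bit stripped. */
uint32_t symbol_code_address(uint32_t symbol_value)
{
    return symbol_value & ~UINT32_C(1);
}
```

Under this model, a symbol value of 0x8011 is a Thumb function whose code starts at 0x8010, while a symbol value of 0x8010 is taken to be ARM code, which is what leads to a mode-switching call.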

The least significant bit of the address needs to be set by the assembler for thumb functions for this to work. However, there's a bit of a problem here: setting the least significant bit is the right thing for function symbols, but wrong for all other symbols. Say for example you're placing a look-up table in the text section and want to access it through a symbol. If the assembler was to set the LSB in the look-up table's symbol, it would introduce off-by-one errors when you tried to access the table. For this reason, the assembler only sets the LSB when you declare the symbol to be a function-type symbol. For this to happen, you need to issue an appropriate .type directive in the translation unit where the symbol is defined:

.type asm_func, %function

With the symbol type declared correctly, the assembler will set the LSB correctly and the linker will generate the correct type of function call.

It's a good habit to do this for every symbol referring to a function, regardless of architecture. This'll fix a number of diffuse problems you might encounter otherwise.

Also make sure your assembly file is indeed assembled for thumb mode by issuing a

.thumb

directive as the first thing in the source code. I assume you have already done so. Changing the target will not change what mode code is emitted, but it may cause the assembler to refuse assembling code in ARM mode, which is at least a visible failure at build time instead of a silent failure at runtime.

Denotative answered 11/4, 2024 at 10:49 Comment(9)
it does not solve the problem. The illegal instruction is still generated.Avifauna
@Avifauna Assuming OPs assembly file is assembled for thumb mode (by means of a .thumb directive), this will indeed solve the problem. Had the same issue before. Changing compiler options or the target will not change any of that because whether a BL or BLX instruction is generated is up to the linker, who does not change its behaviour based on the target selected.Denotative
It indeed solved the issue. I thought about the thumb / address least significant bit but since the instruction looked invalid (BLX with label and not with register) I didn't think it was related.Hippy
Indeed this is correct, but also, it is so wrong. The best answer is that the tooling would just assume thumb for everything. It is Cortex-M CPU. There is no possibility of the 32-bit ARM legacy. I will not blame the tool vendors. It is an accident of historical ISA, and the historical cost of transistors vs modern day. Why on earth is linking concerned with Thumb/ARM, if we have set a CPU that is only Thumb2? These relocations are baked into the ARM elf format, but really there are ways the tools could make this work. Still a great pragmatic answer.Trumantrumann
@artlessnoise That's the way the EABI is specified. File a complaint with ARM if you think it should work differently. I think it's very useful that the linker always works the same way, regardless of whether you need thumb interwork or not.Denotative
@artlessnoise Also note that the distinction between function and data symbols and the +1 for function symbols is still needed even if only Thumb is used as e.g. the BLX Rd instruction checks if the LSB is set before calling a function. So the linker must know which symbols refer to (Thumb) functions to generate correct function pointers in any case.Denotative
I don't think this is the case for a Cortex-M CPU. The EABI baggage is also due to history. Even with more modern Cortex-A, you would use Thumb-2 as 32-bit ARM has no advantage. Interwork only makes sense with Thumb-1 where the registers were limited. Thumb-2 can do anything ARM 32-bit can do, but with a more concise instruction stream, not burdened with the large condition fields. It takes more silicon to implement Thumb-2, but that has not been a concern for the last 10 years. Yes, I was implying that the tools get rid of interworking, it is a concept for another time (and the root of this issue)Trumantrumann
@artlessnoise If you think the thumb bit is ignored for ARMv6-M / ARMv7-M class processors, you are unfortunately wrong. Check the appropriate Architecture Reference Manuals for details. Note that UNIX systems running 32 bit ARM code still default to ARM-mode code, so there's still quite a bit of use. I do agree that Thumb code is generally superior, but if the CPU supports both execution states, the ABI should do so, too.Denotative
ARM could make this a configurable behaviour in newer cores, because it is effectively useless for new designs with new software systems and is only for legacy support. The manual may say one thing about Cortex-M cores, but they would have to implement logic that would have no tangible benefit. On the Cortex-M, I would find it bizarre (and malicious) if they actually examine this bit. There are no other possible execution modes. Yes, designs exist where interworking is required and tools should support this; but they (tools and systems) should also give the option to ignore it.Trumantrumann

This is just a demonstration of what fuz answered and others discussed with respect to what's up with the blx thing.

The linker adds the trampoline/veneer not gcc. No need to use your exact code or even target to demonstrate getting the linker (intentionally or otherwise) to create a veneer.

None of the binaries generated are intended to run on hardware, just making the tools do what I ask.

Let the C compiler give you some hints.

unsigned int fun ( void )
{
    return(0);
}

arm-none-eabi-gcc -O2 -mthumb -c so.c -o so.o
arm-none-eabi-objdump -d so.o

so.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <fun>:
   0:   2000        movs    r0, #0
   2:   4770        bx  lr

arm-none-eabi-gcc -O2 -mthumb -S so.c
cat so.s

Edited to the relevant parts:

    .global fun
    .syntax unified
    .code   16
    .thumb_func
    .type   fun, %function
fun:
    movs    r0, #0
    bx  lr

I assume for some (very old) historical reason gcc does overkill by using both .type ... %function and .thumb_func. You only need one; using both does not hurt. (You will probably see .thumb instead of .code 16, this is interesting, but not relevant).

Let's call it from C:

extern unsigned int fun ( void );

unsigned int more_fun ( void )
{
    return(fun()+1);
}

Intentionally not making it look like an interrupt thing. Using return values I can avoid tail call optimization.

Can build and link, again not a real program, just making the tools do what I ask.

arm-none-eabi-gcc -O2 -c -mthumb so.c -o so.o
arm-none-eabi-gcc -O2 -c -mthumb x.c -o x.o
arm-none-eabi-ld x.o so.o -o so.elf
arm-none-eabi-objdump -d so.elf

so.elf:     file format elf32-littlearm


Disassembly of section .text:

00008000 <more_fun>:
    8000:   b510        push    {r4, lr}
    8002:   f000 f805   bl  8010 <fun>
    8006:   3001        adds    r0, #1
    8008:   bc10        pop {r4}
    800a:   bc02        pop {r1}
    800c:   4708        bx  r1
    800e:   46c0        nop         @ (mov r8, r8)

00008010 <fun>:
    8010:   2000        movs    r0, #0
    8012:   4770        bx  lr

(Some may recognize why this looks a little strange, quick path to veneers)

Both are thumb mode and close to each other so a simple pc-relative bl, no veneer needed.
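
To check the pc-relative claim against the listing, here is a small sketch in C that decodes the target of a 32-bit Thumb bl from its two halfwords. The function name is made up for illustration, and it assumes the halfwords are passed in the order objdump prints them:

```c
#include <assert.h>
#include <stdint.h>

/* Decode the target of a 32-bit Thumb BL located at addr.
 * hw1 = 11110 S imm10, hw2 = 11 J1 1 J2 imm11 */
uint32_t thumb_bl_target(uint32_t addr, uint16_t hw1, uint16_t hw2)
{
    /* Refuse anything that is not a BL (a BLX has bit 12 of hw2 clear). */
    assert((hw1 & 0xF800u) == 0xF000u && (hw2 & 0xD000u) == 0xD000u);

    uint32_t s     = (hw1 >> 10) & 1u;
    uint32_t imm10 = hw1 & 0x3FFu;
    uint32_t j1    = (hw2 >> 13) & 1u;
    uint32_t j2    = (hw2 >> 11) & 1u;
    uint32_t imm11 = hw2 & 0x7FFu;

    /* I1 = NOT(J1 XOR S), I2 = NOT(J2 XOR S) */
    uint32_t i1 = (~(j1 ^ s)) & 1u;
    uint32_t i2 = (~(j2 ^ s)) & 1u;

    /* 25-bit signed offset S:I1:I2:imm10:imm11:0 */
    uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1);
    if (s)
        imm |= 0xFE000000u; /* sign-extend to 32 bits */

    /* In Thumb state the PC reads as the instruction address plus 4. */
    return addr + 4u + imm;
}
```

For the instruction at 0x8002 above, thumb_bl_target(0x8002, 0xf000, 0xf805) gives 0x8010, the address of fun, with no mode information in the encoding at all.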

Now let's break it. In C:

arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-gcc -O2 -c -mthumb x.c -o x.o
arm-none-eabi-ld x.o so.o -o so.elf
arm-none-eabi-objdump -d so.elf

00008000 <more_fun>:
    8000:   b510        push    {r4, lr}
    8002:   f000 f809   bl  8018 <__fun_from_thumb>
    8006:   3001        adds    r0, #1
    8008:   bc10        pop {r4}
    800a:   bc02        pop {r1}
    800c:   4708        bx  r1
    800e:   46c0        nop         @ (mov r8, r8)

00008010 <fun>:
    8010:   e3a00000    mov r0, #0
    8014:   e12fff1e    bx  lr

00008018 <__fun_from_thumb>:
    8018:   4778        bx  pc
    801a:   e7fd        b.n 8018 <__fun_from_thumb>
    801c:   eafffffb    b   8010 <fun>

I wonder what linker you are using and, as a result, why it used blx and not bl, and a label with veneer in the name, versus what binutils ld does. What linker are you using?

Arm code and even just a branch to arm mode will fault on a cortex-m. And that appears to be what happened here.
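
To make the veneer above concrete, here is a sketch in C that follows its two steps: the Thumb bx pc, then the ARM b. The helper names are made up for illustration, and it assumes the veneer starts on a 4-byte boundary, as it does in this listing:

```c
#include <assert.h>
#include <stdint.h>

/* Value a Thumb instruction at addr sees when it reads the PC. */
uint32_t thumb_pc_read(uint32_t addr)
{
    return addr + 4u;
}

/* Decode the target of an ARM-state B or BL located at addr. */
uint32_t arm_branch_target(uint32_t addr, uint32_t insn)
{
    /* Bits 27:25 are 101 for both B and BL. */
    assert((insn & 0x0E000000u) == 0x0A000000u);

    /* imm24 is a signed word offset; sign-extend it without relying
     * on implementation-defined shifts of negative values. */
    uint32_t imm24 = insn & 0x00FFFFFFu;
    int32_t words = (int32_t)(imm24 ^ 0x00800000u) - 0x00800000;

    /* In ARM state the PC reads as the instruction address plus 8. */
    return addr + 8u + (uint32_t)words * 4u;
}
```

The bx pc at 0x8018 branches to 0x801c, and because bit 0 of that value is clear the core is asked to switch to ARM state. The word there, eafffffb, then decodes as a branch to 0x8010. A Cortex-M has no ARM state, so the bx pc is already the point of failure.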

For fun:

extern unsigned int more_fun ( void );
unsigned int fun ( void )
{
    return(more_fun()+1);
}

00008000 <more_fun>:
    8000:   b510        push    {r4, lr}
    8002:   f000 f811   bl  8028 <__fun_from_thumb>
    8006:   3001        adds    r0, #1
    8008:   bc10        pop {r4}
    800a:   bc02        pop {r1}
    800c:   4708        bx  r1
    800e:   46c0        nop         @ (mov r8, r8)

00008010 <fun>:
    8010:   e92d4010    push    {r4, lr}
    8014:   eb000005    bl  8030 <__more_fun_from_arm>
    8018:   e8bd4010    pop {r4, lr}
    801c:   e2800001    add r0, r0, #1
    8020:   e12fff1e    bx  lr
    8024:   00000000    andeq   r0, r0, r0

00008028 <__fun_from_thumb>:
    8028:   4778        bx  pc
    802a:   e7fd        b.n 8028 <__fun_from_thumb>
    802c:   eafffff7    b   8010 <fun>

00008030 <__more_fun_from_arm>:
    8030:   e59fc000    ldr r12, [pc]   @ 8038 <__more_fun_from_arm+0x8>
    8034:   e12fff1c    bx  r12
    8038:   00008001    .word   0x00008001
    803c:   00000000    .word   0x00000000

Make both of them have to use a trampoline.

Switch to asm:

    .global fun
fun:
    mov r0, #0
    bx  lr

And actually it gets WORSE than what you had.

00008000 <more_fun>:
    8000:   b510        push    {r4, lr}
    8002:   f000 f805   bl  8010 <fun>
    8006:   3001        adds    r0, #1
    8008:   bc10        pop {r4}
    800a:   bc02        pop {r1}
    800c:   4708        bx  r1
    800e:   46c0        nop         @ (mov r8, r8)

00008010 <fun>:
    8010:   e3a00000    mov r0, #0
    8014:   e12fff1e    bx  lr

We did not tell the linker that fun is a function, so it just assumes it is the same mode as the caller and emits a plain bl. But the assembler generated arm instructions, which will be executed as thumb and will hopefully fault.

    .global fun
    .type fun, %function
fun:
    mov r0, #0
    bx  lr


00008000 <more_fun>:
    8000:   b510        push    {r4, lr}
    8002:   f000 f809   bl  8018 <__fun_from_thumb>
    8006:   3001        adds    r0, #1
    8008:   bc10        pop {r4}
    800a:   bc02        pop {r1}
    800c:   4708        bx  r1
    800e:   46c0        nop         @ (mov r8, r8)

00008010 <fun>:
    8010:   e3a00000    mov r0, #0
    8014:   e12fff1e    bx  lr

00008018 <__fun_from_thumb>:
    8018:   4778        bx  pc
    801a:   e7fd        b.n 8018 <__fun_from_thumb>
    801c:   eafffffb    b   8010 <fun>

Not as scary but still will fault. And note this faults using the recommended solution to this problem!

    .global fun
    .thumb_func
fun:
    mov r0, #0
    bx  lr

Wow, okay, decades with these tools and learned something new today:

00008000 <more_fun>:
    8000:   b510        push    {r4, lr}
    8002:   f000 f805   bl  8010 <fun>
    8006:   3001        adds    r0, #1
    8008:   bc10        pop {r4}
    800a:   bc02        pop {r1}
    800c:   4708        bx  r1
    800e:   46c0        nop         @ (mov r8, r8)

00008010 <fun>:
    8010:   2000        movs    r0, #0
    8012:   4770        bx  lr

I fully expected it to error out with you are not in thumb mode, instead...it put me in thumb mode. Not sure I like that, but...

I avoided the cortex-m7 because I thought it was doing that, let's see.

    .cpu cortex-m7
    .global fun
    .type fun, %function
fun:
    mov r0, #0
    bx  lr

Logs:

arm-none-eabi-as so.s -o so.o
so.s: Assembler messages:
so.s:6: Error: attempt to use an ARM instruction on a Thumb-only processor -- `mov r0,#0'
so.s:7: Error: attempt to use an ARM instruction on a Thumb-only processor -- `bx lr'

Well that is interesting, but as desired, it keeps me from failure.

    .cpu cortex-m7
    .thumb
    .global fun
fun:
    mov r0, #0
    bx  lr

00008000 <more_fun>:
    8000:   b510        push    {r4, lr}
    8002:   f000 f805   bl  8010 <fun>
    8006:   3001        adds    r0, #1
    8008:   bc10        pop {r4}
    800a:   bc02        pop {r1}
    800c:   4708        bx  r1
    800e:   46c0        nop         @ (mov r8, r8)

00008010 <fun>:
    8010:   2000        movs    r0, #0
    8012:   4770        bx  lr

Now this worked, without .thumb_func or .type %function. The .thumb put the assembler in thumb mode (and .cpu cortex-m7 rejects arm instructions with this binutils), and we saw that without declaring the label as a function the linker just assumes it is the same mode and does not trampoline. It feels like this fixed it, but you should still declare the label a function (on 32-bit arm, not 64-bit).

Please never think in terms of ADD ONE to an address; think OR ONE. If you add one to a properly created label you will get an lsbit of 0 and fail. If you OR one, then if the label was created wrong it will fix the address, and if it was created right it will not break it.
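
A minimal sketch of the difference in C (the function names are made up for illustration), using label values like the ones in the listing below:

```c
#include <stdint.h>

/* OR one: idempotent, safe whether or not the Thumb bit is already set. */
uint32_t thumb_ptr_or(uint32_t label)
{
    return label | 1u;
}

/* ADD one: only correct if the Thumb bit was clear to begin with. */
uint32_t thumb_ptr_add(uint32_t label)
{
    return label + 1u;
}
```

For a label that already carries the bit, such as 0x800d, thumb_ptr_or returns 0x800d unchanged while thumb_ptr_add returns 0x800e, which has the lsbit clear and points into the next instruction. For a label without the bit, such as 0x8010, both return 0x8011.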

.globl _start
.thumb

_start:
.word one
.word two
.word three

.type two, %function


.thumb_func
one:
    nop
    
two:
    nop

.thumb_func
four:
three:
    nop



Disassembly of section .text:

00008000 <_start>:
    8000:   0000800d    .word   0x0000800d
    8004:   0000800f    .word   0x0000800f
    8008:   00008010    .word   0x00008010

0000800c <one>:
    800c:   46c0        nop         @ (mov r8, r8)

0000800e <two>:
    800e:   46c0        nop         @ (mov r8, r8)

00008010 <four>:
    8010:   46c0        nop         @ (mov r8, r8)

I am now wondering what linker you are using.

Edit

extern unsigned int more_fun ( void );

unsigned int fun ( void )
{
    return(more_fun()+1);
}


extern unsigned int fun ( void );

unsigned int more_fun ( void )
{
    return(fun()+1);
}


arm-none-eabi-gcc -nostdlib -nostartfiles -ffreestanding -O2 -c so.c -o so.o
arm-none-eabi-gcc -nostdlib -nostartfiles -ffreestanding -O2 -c -mthumb x.c -o x.o
arm-none-eabi-gcc -nostdlib -nostartfiles -ffreestanding so.o x.o -o so.elf
arm-none-eabi-objdump -d so.elf


Disassembly of section .text:

00008000 <fun>:
    8000:   e92d4010    push    {r4, lr}
    8004:   eb000007    bl  8028 <__more_fun_from_arm>
    8008:   e8bd4010    pop {r4, lr}
    800c:   e2800001    add r0, r0, #1
    8010:   e12fff1e    bx  lr

00008014 <more_fun>:
    8014:   b510        push    {r4, lr}
    8016:   f000 f80d   bl  8034 <__fun_from_thumb>
    801a:   3001        adds    r0, #1
    801c:   bc10        pop {r4}
    801e:   bc02        pop {r1}
    8020:   4708        bx  r1
    8022:   46c0        nop         @ (mov r8, r8)
    8024:   0000        movs    r0, r0
    ...

00008028 <__more_fun_from_arm>:
    8028:   e59fc000    ldr r12, [pc]   @ 8030 <__more_fun_from_arm+0x8>
    802c:   e12fff1c    bx  r12
    8030:   00008015    .word   0x00008015

00008034 <__fun_from_thumb>:
    8034:   4778        bx  pc
    8036:   e7fd        b.n 8034 <__fun_from_thumb>
    8038:   eafffff0    b   8000 <fun>
    803c:   00000000    andeq   r0, r0, r0

As well as grepping through binutils, I do not see it generating the word veneer in a label. Maybe the debugger somehow magically knows and generates that string. Otherwise not sure what linker and from that why it may be generating a blx instead of bl.

Crack answered 11/4, 2024 at 17:37 Comment(4)
I'm using arm-none-eabi-gcc as linker. nothing fancy really.Hippy
gcc (the binary) is neither a compiler nor a linker; it is a driver: it calls a compiler, then an assembler, then, unless disabled, a linker. It does not appear that gcc is calling GNU ld. You can have it show you which commands and command lines it is running.Crack
by default it uses binutils assembler to turn the output of the compiler into an object and it uses binutils linker to link objects together into a final binary.Crack
yes it calls GNU ld : I just checked.Hippy
