Really Minimal STM32 Application: linker failure

I'm building a tiny microcontroller application with only the bare essentials, for self-educational purposes. This way, I can refresh my knowledge of topics like the linkerscript, the startup code, ...


EDIT:
I got quite a lot of comments pointing out that the "absolute minimal STM32-application" shown below is no good. You are absolutely right when noticing that the vector table is not complete, the .bss-section is not taken care of, the peripheral addresses are not complete, ... Please allow me to explain why.

  1. It has never been the purpose of the author to write a complete and useful application in this particular chapter. His purpose was to explain step-by-step how a linkerscript works, how startup code works, what the boot procedure of an STM32 looks like, ... purely for educational purposes. I can appreciate this approach, and learned a lot.

  2. The example I have put below is taken from the middle of the chapter in question. The chapter keeps adding more parts to the linkerscript and startup code (for example initialization of .bss-section) as it goes forward.
    The reason I put files here from the middle of his chapter, is because I got stuck at a particular error message. I want to get that fixed before continuing.

  3. The chapter in question is somewhere at the end of his book. It is intended for the more experienced or curious reader who wants to gain deeper knowledge about topics most people don't even consider (most people use the standard linkerscript and startup code given by the manufacturer without ever reading it).

Keeping this in mind, please let us focus on the technical issue at hand (as described below in the error messages). Please also accept my sincere apologies that I didn't clarify the intentions of the writer earlier. But I've done it now, so we can move on ;-)


 

1. Absolute minimal STM32-application

The tutorial I'm following is chapter 20 from this book: "Mastering STM32" (https://leanpub.com/mastering-stm32). The book explains how to make a tiny microcontroller application with two files: main.c and linkerscript.ld. As I'm not using an IDE (like Eclipse), I also added build.bat and clean.bat to generate the compilation commands. So my project folder looks like this:

[image: project folder containing main.c, linkerscript.ld, build.bat and clean.bat]

Before I continue, I should perhaps give some more details about my system:

  • OS: Windows 10, 64-bit

  • Microcontroller: NUCLEO-F401RE board with STM32F401RE microcontroller.

  • Compiler: arm-none-eabi-gcc version 6.3.1 20170620 (release) [ARM/embedded-6-branch revision 249437].

The main file looks like this:

/* ------------------------------------------------------------ */
/*                     Minimal application                      */
/*                      for NUCLEO-F401RE                       */
/* ------------------------------------------------------------ */
typedef unsigned long uint32_t;

/* Memory and peripheral start addresses (common to all STM32 MCUs) */
#define FLASH_BASE      0x08000000
#define SRAM_BASE       0x20000000
#define PERIPH_BASE     0x40000000

/* Work out end of RAM address as initial stack pointer
 * (specific of a given STM32 MCU) */
#define SRAM_SIZE       96*1024 //STM32F401RE has 96 KB of RAM
#define SRAM_END        (SRAM_BASE + SRAM_SIZE)

/* RCC peripheral addresses applicable to GPIOA
 * (specific of a given STM32 MCU) */
#define RCC_BASE        (PERIPH_BASE + 0x23800)
#define RCC_APB1ENR     ((uint32_t*)(RCC_BASE + 0x30))

/* GPIOA peripheral addresses
 * (specific of a given STM32 MCU) */
#define GPIOA_BASE      (PERIPH_BASE + 0x20000)
#define GPIOA_MODER     ((uint32_t*)(GPIOA_BASE + 0x00))
#define GPIOA_ODR       ((uint32_t*)(GPIOA_BASE + 0x14))

/* Function headers */
int main(void);
void delay(uint32_t count);

/* Minimal vector table */
uint32_t *vector_table[] __attribute__((section(".isr_vector"))) = {
    (uint32_t*)SRAM_END,    // initial stack pointer (MSP)
    (uint32_t*)main         // main as Reset_Handler
};

/* Main function */
int main() {
    /* Enable clock on GPIOA peripheral */
    *RCC_APB1ENR = 0x1;

    /* Configure the PA5 as output pull-up */
    *GPIOA_MODER |= 0x400;  // Sets MODER[11:10] = 0x1

    while(1) {    // Always true
        *GPIOA_ODR = 0x20;
        delay(200000);
        *GPIOA_ODR = 0x0;
        delay(200000);
    }
}

void delay(uint32_t count) {
    while(count--);
}
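
The two magic numbers in main() (0x400 and 0x20) follow from the pin number. A small sketch of that arithmetic, assuming the F4-style GPIO layout in which MODER holds two bits per pin and ODR one bit per pin; the helper names are made up for illustration and are not part of the book's code:

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* MODER uses two bits per pin; the value 01 selects general-purpose output. */
uint32_t moder_output_bits(unsigned pin)
{
    return (uint32_t)1u << (pin * 2u);
}

/* ODR uses one bit per pin. */
uint32_t odr_bit(unsigned pin)
{
    return (uint32_t)1u << pin;
}
```

For PA5 this gives 0x400 for MODER[11:10] and 0x20 for the ODR bit, the same constants used above.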

 
The linkerscript looks like this:

/* ------------------------------------------------------------ */
/*                        Linkerscript                          */
/*                      for NUCLEO-F401RE                       */
/* ------------------------------------------------------------ */

/* Memory layout for STM32F401RE */
MEMORY
{
    FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
    SRAM (xrw) : ORIGIN = 0x20000000, LENGTH = 96K
}

/* The ENTRY(..) directive overrides the default entry point symbol _start.
 * Here we define the main-routine as the entry point.
 * In fact, the ENTRY(..) directive is meaningless for embedded chips,
 * but it is informative for debuggers. */
ENTRY(main)

SECTIONS
{
    /* Program code into FLASH */
    .text : ALIGN(4)
    {
        *(.isr_vector)          /* Vector table */
        *(.text)                /* Program code */
        *(.text*)               /* Merge all .text.* sections inside the .text section */
        KEEP(*(.isr_vector))    /* Don't allow other tools to strip this off */
    } >FLASH


    _sidata = LOADADDR(.data);  /* Used by startup code to initialize data */

    .data : ALIGN(4)
    {
        . = ALIGN(4);
        _sdata = .;             /* Create a global symbol at data start */

        *(.data)
        *(.data*)

        . = ALIGN(4);
        _edata = .;             /* Define a global symbol at data end */
    } >SRAM AT >FLASH

}

 
The build.bat file calls the compiler on main.c, and next the linker:

@echo off
setlocal EnableDelayedExpansion

echo.
echo ----------------------------------------------------------------
echo.             )\     ***************************
echo.   ( =_=_=_=^<  ^|    * build NUCLEO-F401RE     *     
echo.             )(     ***************************
echo.             ""                        
echo.                                       
echo.
echo.   Call the compiler on main.c
echo.
@arm-none-eabi-gcc main.c -o main.o -c -MMD -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -O0 -g3 -Wall -fmessage-length=0 -Werror-implicit-function-declaration -Wno-comment -Wno-unused-function -ffunction-sections -fdata-sections
echo.
echo.   Call the linker
echo.
@arm-none-eabi-gcc main.o -o myApp.elf -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -specs=nosys.specs -specs=nano.specs -T linkerscript.ld -Wl,-Map=output.map -Wl,--gc-sections
echo.
echo.   Post build
echo.
@arm-none-eabi-objcopy -O binary myApp.elf myApp.bin
arm-none-eabi-size myApp.elf
echo.
echo ----------------------------------------------------------------

 
The clean.bat file removes all the compiler output:

@echo off
setlocal EnableDelayedExpansion

echo ----------------------------------------------------------------
echo.        __         **************    
echo.      __\ \___     *   clean    *    
echo.      \ _ _ _ \    **************    
echo.       \_`_`_`_\                     
echo.                                     
del /f /q main.o
del /f /q main.d
del /f /q myApp.bin
del /f /q myApp.elf
del /f /q output.map
echo ----------------------------------------------------------------

Building this works. I get the following output:

C:\Users\Kristof\myProject>build

----------------------------------------------------------------
             )\     ***************************
   ( =_=_=_=<  |    * build NUCLEO-F401RE     *
             )(     ***************************
             ""


   Call the compiler on main.c


   Call the linker


   Post build

   text    data     bss     dec     hex filename
    112       0       0     112      70 myApp.elf

----------------------------------------------------------------

 

2. Proper startup code

Maybe you have noticed that the minimal application didn't have proper startup code to initialize the global variables in the .data-section. Chapter 20.2.2 .data and .bss Sections initialization from the "Mastering STM32" book explains how to do this.

As I follow along, my main.c file now looks like this:

/* ------------------------------------------------------------ */
/*                     Minimal application                      */
/*                      for NUCLEO-F401RE                       */
/* ------------------------------------------------------------ */
typedef unsigned long uint32_t;

/* Memory and peripheral start addresses (common to all STM32 MCUs) */
#define FLASH_BASE      0x08000000
#define SRAM_BASE       0x20000000
#define PERIPH_BASE     0x40000000

/* Work out end of RAM address as initial stack pointer
 * (specific of a given STM32 MCU) */
#define SRAM_SIZE       96*1024 //STM32F401RE has 96 KB of RAM
#define SRAM_END        (SRAM_BASE + SRAM_SIZE)

/* RCC peripheral addresses applicable to GPIOA
 * (specific of a given STM32 MCU) */
#define RCC_BASE        (PERIPH_BASE + 0x23800)
#define RCC_APB1ENR     ((uint32_t*)(RCC_BASE + 0x30))

/* GPIOA peripheral addresses
 * (specific of a given STM32 MCU) */
#define GPIOA_BASE      (PERIPH_BASE + 0x20000)
#define GPIOA_MODER     ((uint32_t*)(GPIOA_BASE + 0x00))
#define GPIOA_ODR       ((uint32_t*)(GPIOA_BASE + 0x14))

/* Function headers */
void __initialize_data(uint32_t*, uint32_t*, uint32_t*);
void _start (void);
int main(void);
void delay(uint32_t count);

/* Minimal vector table */
uint32_t *vector_table[] __attribute__((section(".isr_vector"))) = {
    (uint32_t*)SRAM_END,    // initial stack pointer (MSP)
    (uint32_t*)_start       // _start as Reset_Handler
};

/* Variables defined in linkerscript */
extern uint32_t _sidata;
extern uint32_t _sdata;
extern uint32_t _edata;

volatile uint32_t dataVar = 0x3f;

/* Data initialization */
inline void __initialize_data(uint32_t* flash_begin, uint32_t* data_begin, uint32_t* data_end) {
    uint32_t *p = data_begin;
    while(p < data_end)
        *p++ = *flash_begin++;
}

/* Entry point */
void __attribute__((noreturn,weak)) _start (void) {
    __initialize_data(&_sidata, &_sdata, &_edata);
    main();

    for(;;);
}

/* Main function */
int main() {
    /* Enable clock on GPIOA peripheral */
    *RCC_APB1ENR = 0x1;

    /* Configure the PA5 as output pull-up */
    *GPIOA_MODER |= 0x400;  // Sets MODER[11:10] = 0x1

    while(dataVar == 0x3f) {    // Always true
        *GPIOA_ODR = 0x20;
        delay(200000);
        *GPIOA_ODR = 0x0;
        delay(200000);
    }
}

void delay(uint32_t count) {
    while(count--);
}

I've added the initialization code just above the main(..) function. The linkerscript has also some modification:

/* ------------------------------------------------------------ */
/*                        Linkerscript                          */
/*                      for NUCLEO-F401RE                       */
/* ------------------------------------------------------------ */

/* Memory layout for STM32F401RE */
MEMORY
{
    FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
    SRAM (xrw) : ORIGIN = 0x20000000, LENGTH = 96K
}

/* The ENTRY(..) directive overrides the default entry point symbol _start.
 * In fact, the ENTRY(..) directive is meaningless for embedded chips,
 * but it is informative for debuggers. */
ENTRY(_start)

SECTIONS
{
    /* Program code into FLASH */
    .text : ALIGN(4)
    {
        *(.isr_vector)          /* Vector table */
        *(.text)                /* Program code */
        *(.text*)               /* Merge all .text.* sections inside the .text section */
        KEEP(*(.isr_vector))    /* Don't allow other tools to strip this off */
    } >FLASH


    _sidata = LOADADDR(.data);  /* Used by startup code to initialize data */

    .data : ALIGN(4)
    {
        . = ALIGN(4);
        _sdata = .;             /* Create a global symbol at data start */

        *(.data)
        *(.data*)

        . = ALIGN(4);
        _edata = .;             /* Define a global symbol at data end */
    } >SRAM AT >FLASH

}

The little application doesn't build anymore. Compiling main.c to main.o still works, but the link step fails:

C:\Users\Kristof\myProject>build

----------------------------------------------------------------
             )\     ***************************
   ( =_=_=_=<  |    * build NUCLEO-F401RE     *
             )(     ***************************
             ""


   Call the compiler on main.c


   Call the linker

c:/gnu_arm_embedded_toolchain/bin/../lib/gcc/arm-none-eabi/6.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m/fpv4-sp/hard/crt0.o: In function `_start':
(.text+0x64): undefined reference to `__bss_start__'
c:/gnu_arm_embedded_toolchain/bin/../lib/gcc/arm-none-eabi/6.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m/fpv4-sp/hard/crt0.o: In function `_start':
(.text+0x68): undefined reference to `__bss_end__'
collect2.exe: error: ld returned 1 exit status

   Post build

arm-none-eabi-objcopy: 'myApp.elf': No such file
arm-none-eabi-size: 'myApp.elf': No such file

----------------------------------------------------------------

 

3. What I've tried

I've omitted this part, otherwise this question gets too long ;-)

 

4. Solution

@berendi provided the solution. Thank you @berendi! Apparently I need to add the flags -nostdlib and -ffreestanding to gcc and the linker. The build.bat file now looks like this:

@echo off
setlocal EnableDelayedExpansion

echo.
echo ----------------------------------------------------------------
echo.             )\     ***************************
echo.   ( =_=_=_=^<  ^|    * build NUCLEO-F401RE     *     
echo.             )(     ***************************
echo.             ""                        
echo.                                       
echo.
echo.   Call the compiler on main.c
echo.
@arm-none-eabi-gcc main.c -o main.o -c -MMD -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -O0 -g3 -Wall -fmessage-length=0 -Werror-implicit-function-declaration -Wno-comment -Wno-unused-function -ffunction-sections -fdata-sections -ffreestanding -nostdlib
echo.
echo.   Call the linker
echo.
@arm-none-eabi-gcc main.o -o myApp.elf -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -specs=nosys.specs -specs=nano.specs -T linkerscript.ld -Wl,-Map=output.map -Wl,--gc-sections -ffreestanding -nostdlib
echo.
echo.   Post build
echo.
@arm-none-eabi-objcopy -O binary myApp.elf myApp.bin
arm-none-eabi-size myApp.elf
echo.
echo ----------------------------------------------------------------

Now it works! In his answer, @berendi also gives a few interesting remarks about the main.c file. I've applied most of them:

  1. Missing volatile keyword

  2. Empty loop

  3. Missing Memory Barrier (did I put the memory barrier in the correct place?)

  4. Missing delay after RCC enable

  5. Misleading symbolic name (apparently it should be RCC_AHB1ENR instead of RCC_APB1ENR).

  6. The vector table: this part I've skipped. Right now I don't really need a HardFault_Handler, MemManage_Handler, ... as this is just a tiny test for educational purposes.
    Nevertheless, I did notice that @berendi put a few interesting modifications in the way he declares the vector table. But I'm not entirely grasping what he's doing exactly.

The main.c file now looks like this:

/* ------------------------------------------------------------ */
/*                     Minimal application                      */
/*                      for NUCLEO-F401RE                       */
/* ------------------------------------------------------------ */
typedef unsigned long uint32_t;

/**
  \brief   Data Synchronization Barrier
  \details Acts as a special kind of Data Memory Barrier.
           It completes when all explicit memory accesses before this instruction complete.
 */
__attribute__((always_inline)) static inline void __DSB(void)
{
  __asm volatile ("dsb 0xF":::"memory");
}


/* Memory and peripheral start addresses (common to all STM32 MCUs) */
#define FLASH_BASE      0x08000000
#define SRAM_BASE       0x20000000
#define PERIPH_BASE     0x40000000

/* Work out end of RAM address as initial stack pointer
 * (specific of a given STM32 MCU) */
#define SRAM_SIZE       96*1024 //STM32F401RE has 96 KB of RAM
#define SRAM_END        (SRAM_BASE + SRAM_SIZE)

/* RCC peripheral addresses applicable to GPIOA
 * (specific of a given STM32 MCU) */
#define RCC_BASE        (PERIPH_BASE + 0x23800)
#define RCC_AHB1ENR     ((volatile uint32_t*)(RCC_BASE + 0x30))

/* GPIOA peripheral addresses
 * (specific of a given STM32 MCU) */
#define GPIOA_BASE      (PERIPH_BASE + 0x20000)
#define GPIOA_MODER     ((volatile uint32_t*)(GPIOA_BASE + 0x00))
#define GPIOA_ODR       ((volatile uint32_t*)(GPIOA_BASE + 0x14))

/* Function headers */
void __initialize_data(uint32_t*, uint32_t*, uint32_t*);
void _start (void);
int main(void);
void delay(uint32_t count);

/* Minimal vector table */
uint32_t *vector_table[] __attribute__((section(".isr_vector"))) = {
    (uint32_t*)SRAM_END,    // initial stack pointer (MSP)
    (uint32_t*)_start       // _start as Reset_Handler
};

/* Variables defined in linkerscript */
extern uint32_t _sidata;
extern uint32_t _sdata;
extern uint32_t _edata;

volatile uint32_t dataVar = 0x3f;

/* Data initialization */
inline void __initialize_data(uint32_t* flash_begin, uint32_t* data_begin, uint32_t* data_end) {
    uint32_t *p = data_begin;
    while(p < data_end)
        *p++ = *flash_begin++;
}

/* Entry point */
void __attribute__((noreturn,weak)) _start (void) {
    __initialize_data(&_sidata, &_sdata, &_edata);
    asm volatile("":::"memory"); // <- Did I put this instruction at the right spot?
    main();

    for(;;);
}

/* Main function */
int main() {
    /* Enable clock on GPIOA peripheral */
    *RCC_AHB1ENR = 0x1;
    __DSB();

    /* Configure the PA5 as output pull-up */
    *GPIOA_MODER |= 0x400;  // Sets MODER[11:10] = 0x1

    while(dataVar == 0x3f) {    // Always true
        *GPIOA_ODR = 0x20;
        delay(200000);
        *GPIOA_ODR = 0x0;
        delay(200000);
    }
}

void delay(uint32_t count) {
    while(count--){
        asm volatile("");
    }
}

PS: The book "Mastering STM32" from Carmine Noviello is an absolute masterpiece. You should read it!   => https://leanpub.com/mastering-stm32

Denazify answered 17/4, 2018 at 18:24 Comment(11)
Hi @PeterJ_01. You are absolutely right. I usually use makefiles. The reason I use a bat-file here is that I just wanted to compile 1 file and link it into an executable. That's two commands. I could easily do that in a bat-file :-) - Denazify
Use CMSIS definitions - or write tens of thousands of definitions by hand. Do not learn ARMs from a low-level book. Come back to it when you start to understand it. There is no reason to write your own startup files unless you really need to. - Anthropomorphosis
Hi @PeterJ_01, I get what you mean. Of course, I wouldn't write thousands of definitions by hand. I also use CMSIS in normal projects. But this is just a tiny project for (self-)educational purpose. It's fun to get a blinking LED with just a single code-file and a tiny linkerscript. - Denazify
IMO waste of time. I can give more exercises if you wish. For F303 processor place the stack at the beginning of the CCM RAM. Place the .bss & data into the CCMRAM. Create sections .bbssram and .datastam in the SRAM memory. Write (or modify) the startup code to properly initialize all 4 - Anthropomorphosis
Have you tried adding one or more of the following command line options when linking, -nostartfiles, -nodefaultlibs, or -nostdlib? - Davon
Contact the book's author and send him/her a link to your question here. If I was the author I'd sure as heck help a diligent reader like you! - Irreverence
the definitions are not the issue here (well the pointer stuff can lead to issues). nostartfiles, etc is not really an issue here depends on the compiler version. using C exclusively rather than simpler asm to do the .data and .bss initialization that will make your life easier. - Houseleek
batch files, script, not relevant. you didn't show your build commands for the latter example right? using gcc to link just feels wrong, but without it getting libgcc stuff in there takes some more work. you are linking in two bootstraps, that is the issue you are having. avoiding .data requirements makes for much easier management of baremetal code. likewise assuming .bss is zeroed, should never read before you write anyway, a bad habit if you do, the tools are starting to warn about that now which is a good thing. - Houseleek
decide if you want to define stuff in the MEMORY section or the SECTIONS section (get rid of the (rx) (xrw), that's not helping you in the long run). if you want a complicated SECTIONS (or that section at all) then do the work there. - Houseleek
These examples were derived from code that minimizes language and tool issues, I can (already have in the past) post it if desired. Linker scripts are an artform of themselves, the documentation is not as good as it needs to be so much hacking is required. - Houseleek
@Denazify The vector table in my answer is an array of function pointers. This way, I only have to cast the stack pointer to another type, and I can simply list the handler functions. The memory barrier is at the right place. - Cayes

You can tell gcc not to use the library.

The Compiler

By default, gcc assumes that you are using a standard C library, and can emit code that calls some functions. For example, when optimizations are enabled, it detects loops that copy a piece of memory, and may substitute them with a call to memcpy(). Disable it with -ffreestanding.
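
As far as I know, GCC may still emit calls to memcpy(), memmove(), memset() and memcmp() even with -ffreestanding, so a project linked with -nostdlib may have to supply them itself. A minimal sketch of such a routine, under a made-up name (tiny_memcpy) so that it doesn't clash with the host library when you try it on a PC:

```c
#include <stddef.h>

/* Byte-wise copy. tiny_memcpy is a hypothetical name for this sketch;
 * on the target the routine would have to be called memcpy. */
void *tiny_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```

This is only meant to show the shape of the routine; an optimized build would normally copy whole words where alignment allows.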

The Linker

The linker assumes as well that you want to link your program with the C library and startup code. The library startup code is responsible for initializing the library and the program execution environment. It has a function named _start() which has to be called after reset. One of its functions is to fill the .bss segment (see below) with zero. If the symbols that delimit .bss are not defined, then the library's _start() cannot be linked. Had you named your startup function anything else but _start(), then the library startup would have been silently dropped by the linker as an unused function, and the code could have been linked.

You can tell the linker not to link any standard library or startup code with -nostdlib, then the library supplied startup function name would not conflict with yours, and you would get a linker error every time you accidentally invoked a library function.

Missing volatile

Your register definitions are missing the volatile qualifier. Without it, once optimization is enabled, subsequent writes to *GPIOA_ODR can be optimized out: the compiler is free to treat them as "invariant code" and move them out of the loop. Changing the type in the register definitions to (volatile uint32_t*) would fix that.
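
A minimal sketch of the volatile access pattern, using an ordinary variable as a stand-in for the register so that it can be tried on a PC. The names fake_odr, FAKE_ODR and blink_once are made up for this illustration:

```c
#include <stdint.h>

/* Stand-in for GPIOA_ODR so the sketch runs on a PC (hypothetical name). */
uint32_t fake_odr;

/* Access through a volatile-qualified pointer: the compiler has to perform
 * every read and write, in program order, even with optimization enabled. */
#define FAKE_ODR ((volatile uint32_t *)&fake_odr)

uint32_t blink_once(void)
{
    *FAKE_ODR = 0x20;   /* LED on */
    *FAKE_ODR = 0x0;    /* LED off; without volatile the first write could be dropped */
    return *FAKE_ODR;   /* read back the last value written */
}
```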

Empty loop

The optimizer can recognize that the delay loop does nothing, and eliminate it completely to speed up execution. Add an empty but non-removable asm volatile(""); instruction to the delay loop.

Missing Memory Barrier

You are initializing the .data section that holds dataVar in a C function. The *p in __initialize_data() is effectively an alias for dataVar, and the compiler has no way to know it. The optimizer could theoretically rearrange the test of dataVar before __initialize_data(). Even if dataVar is volatile, *p is not, therefore ordering is not guaranteed.

After the data initialization loop, you should tell the compiler that program variables are changed by a mechanism unknown to the compiler:

asm volatile("":::"memory");

It's an old-fashioned gcc extension; the latest C standards might have defined a portable way to do this (which would not be recognized by older gcc versions).
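
One candidate for such a portable form is C11's atomic_signal_fence() from <stdatomic.h>, which, as far as I know, GCC treats as a compiler-only barrier that emits no instruction. A sketch, assuming a C11 toolchain that ships <stdatomic.h>; copy_words is a made-up stand-in for __initialize_data():

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for __initialize_data(): copies words until
 * dst reaches dst_end, then places a compiler barrier. */
void copy_words(const uint32_t *src, uint32_t *dst, uint32_t *dst_end)
{
    while (dst < dst_end)
        *dst++ = *src++;

    /* Compiler barrier: memory accesses after this point are not
     * reordered before the copy above. */
    atomic_signal_fence(memory_order_seq_cst);
}
```

Whether this is preferable to the asm statement is a matter of taste; the asm form works on older compilers too.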

Missing delay after RCC enable

The errata sheet says:

A delay between an RCC peripheral clock enable and the effective peripheral enabling should be taken into account in order to manage the peripheral read/write to registers.

This delay depends on the peripheral mapping:

• If the peripheral is mapped on AHB: the delay should be equal to 2 AHB cycles.

• If the peripheral is mapped on APB: the delay should be equal to 1 + (AHB/APB prescaler) cycles.

Workarounds

  1. Use the DSB instruction to stall the Cortex®-M4 CPU pipeline until the instruction is completed.

Therefore, insert a

__DSB();

after *RCC_APB1ENR = 0x1; (which should be called something else)
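
The two rules quoted from the errata reduce to a small calculation. A sketch with made-up names; the prescaler argument is the AHB/APB ratio (1, 2, 4, ...):

```c
/* Hypothetical names, for illustration only. */
enum rcc_bus { RCC_BUS_AHB, RCC_BUS_APB };

/* Cycles to wait after setting a clock-enable bit, following the
 * errata text quoted above. */
unsigned rcc_enable_delay_cycles(enum rcc_bus bus, unsigned ahb_apb_prescaler)
{
    if (bus == RCC_BUS_AHB)
        return 2u;                      /* 2 AHB cycles */
    return 1u + ahb_apb_prescaler;      /* 1 + (AHB/APB prescaler) cycles */
}
```

GPIOA is enabled through an AHB1 register, so the first rule applies here, and the DSB workaround covers it without counting cycles by hand.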

Misleading symbolic name

Although the address for enabling GPIOA in RCC seems to be correct, the register is called RCC_AHB1ENR in the documentation. It will confuse people trying to understand your code.

The Vector Table

Although technically you can get away with having only a stack pointer and a reset handler in it, I too would recommend having a few more entries, at least the fault handlers, for simple troubleshooting.

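/* __stack (top of stack, from the linker script) and the handler
 * functions are assumed to be declared earlier in the file. */
__attribute__ ((section(".isr_vector"),used))
void (* const _vectors[]) (void) = {
  (void (*const)(void))(&__stack),
  Reset_Handler,
  NMI_Handler,
  HardFault_Handler,
  MemManage_Handler,
  BusFault_Handler,
  UsageFault_Handler
};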
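
To show what that declaration does: it is an array of const pointers to void functions, and only the first slot (the initial stack pointer) needs a cast. A PC-runnable sketch, with the section attribute left out and a made-up STACK_TOP constant standing in for &__stack:

```c
#include <stdint.h>

typedef void (*handler_t)(void);

/* Made-up stack top for illustration: 0x20000000 + 96 KB. */
#define STACK_TOP 0x20018000u

/* Empty placeholder handlers so the sketch links on a PC. */
void Reset_Handler(void)     { }
void HardFault_Handler(void) { }

/* On the target this array would also carry
 * __attribute__((section(".isr_vector"), used)). */
const handler_t vectors[] = {
    (handler_t)(uintptr_t)STACK_TOP,  /* slot 0: initial MSP, not really a function */
    Reset_Handler,                    /* slot 1: reset vector */
    0,                                /* slot 2: NMI, left empty in this sketch */
    HardFault_Handler                 /* slot 3: HardFault */
};
```

Because the element type is already a function pointer, the handlers can be listed by name without any cast.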

The Linker Script

At the bare minimum, it must define a section for your vector table, and the code. A program must have a start address and some code, static data is optional. The rest depends on what kind of data your program is using. You could technically omit them from the linker script if there are no data of a particular type.

  • .rodata: read-only data, const arrays and structs go here. They remain in flash. (simple const variables are usually put in the code)
  • .data: initialized variables, everything you declare with an = sign, and without const.
  • .bss: variables that C zero-initializes, i.e. global and static ones without an explicit initializer.

As you don't need .rodata or .bss now, it's fine.
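
Once the program does use zero-initialized variables, the startup code has to clear .bss before main() runs. A sketch of that loop, assuming the linker script is extended with two word-aligned symbols marking the section (on the target they might be called _sbss and _ebss; those names are an assumption, your script defines its own):

```c
#include <stdint.h>

/* Clears [bss_begin, bss_end); both pointers must be word-aligned.
 * zero_bss is a hypothetical name for this sketch. */
void zero_bss(uint32_t *bss_begin, uint32_t *bss_end)
{
    uint32_t *p = bss_begin;

    while (p < bss_end)
        *p++ = 0;
}
```

In _start() it would be called next to __initialize_data(), for example as zero_bss(&_sbss, &_ebss), followed by the same compiler barrier.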

Cayes answered 18/4, 2018 at 9:19 Comment(6)
Great answer! Also lots of thanks for all the extra information you provide. I've added the -nostdlib flag to both the compilation and linking command, but I still get the same error. That's pretty weird ... - Denazify
Hi @berendi, I've tried adding the flags -ffreestanding, -nostdlib to the compiler and -Wl,-nostartfiles, -Wl,-nostdlib and -Wl,--ffreestanding to the linker. But now I get another error message. Please see the EDIT at the bottom of my question for more details. Thank you so much for your help :-) - Denazify
@Denazify looks like the linker is complaining about an option which is recognized as something else. I'll check when I'm back at my pc, if you can't figure it out. - Cayes
Hi @berendi. Thank you very much. The easiest way for you to check it, is to copy-paste the files main.c, linkerscript.ld, build.bat and clean.bat. Please run the build and let me know if you experience the same error message :-) - Denazify
@Denazify you don't need the -Wl, prefix, just pass -nostdlib -ffreestanding to gcc on the second invocation. You don't need -nostartfiles, because it's implied by -nostdlib. Also note the further issues with volatile and the delay loop I've added. - Cayes
Thank you so much! It works! I've applied your modifications as much as possible, and I've written it all down at the bottom of my question. Thank you once again for all the time and effort you've spent to help me :-) You're so very kind! - Denazify

Linker scripts in general are an artform; they are their own programming language, and GNU's are certainly a bit of a nightmare. Separate the task of figuring out the linker script from the task of making a working binary: once you can see that the linker script is doing what you want, write the bootstrap code that uses it. Take advantage of the toolchain.

The example the author used was derived from code written specifically as baremetal examples that maximize success. That code avoids common language and toolchain issues, yet is portable across many versions of the toolchain and easy to port to other toolchains (minimal reliance on the toolchain, in particular on the linker script, which leads to the bootstrap). The author of the book used that code but added risk to it, making it a less reliable example.

Avoiding .data specifically and not relying on .bss to be zeroed when you write baremetal code goes a very long way toward long term success.

It was also modified such that optimization would prevent the code from working (well, from blinking at a rate you can see).

A somewhat minimal example linker script for binutils, which you can modify to work toward .data and .bss initialization, looks generically like this:

test.ld

MEMORY
{
    bob : ORIGIN = 0x8000, LENGTH = 0x1000
    ted : ORIGIN = 0xA000, LENGTH = 0x1000
}

SECTIONS
{
   .text : { *(.text*) } > bob
   __data_rom_start__ = .;
   .data : {
    __data_start__ = .;
    *(.data*)
   } > ted AT > bob
   __data_end__ = .;
   __data_size__ = __data_end__ - __data_start__;
   .bss  : {
   __bss_start__ = .;
   *(.bss*)
   } > ted
   __bss_end__ = .;
   __bss_size__ = __bss_end__ - __bss_start__;
}

(Note: memory names don't have to be rom, ram, flash, data or whatever. Here bob is program space and ted is memory; change the addresses as desired.)

To see what is going on, link a simple example (or your own code) against it; you need some .data and some .bss (and some .text).

vectors.s

.thumb
.globl _start
_start:
.word 0x20001000
.word reset
.thumb_func
reset:
    bl notmain
    b .

.globl bss_start
bss_start: .word __bss_start__
.globl bss_end
bss_end: .word __bss_end__
.word __bss_size__
.globl data_rom_start
data_rom_start:
.word __data_rom_start__
.globl data_start
data_start:
.word __data_start__
.globl data_end
data_end:
.word __data_end__
.word __data_size__

so.c

unsigned int a=1;
unsigned int b=2;
unsigned int c;
unsigned int d;
unsigned int e;

unsigned int notmain ( void )
{
    return(a+b+c+d+e);
}

build

arm-none-eabi-as vectors.s -o vectors.o
arm-none-eabi-gcc -O2 -c -mthumb so.c -o so.o
arm-none-eabi-ld -T test.ld vectors.o so.o -o vectors.elf
arm-none-eabi-objdump -D vectors.elf

The code so far is not specific to arm-none-whatever or arm-linux-whatever versions of the toolchain. If/when you need libgcc items you can use gcc instead of ld, but you have to be careful when doing that... or provide the path to libgcc and use ld.

What we get from this code is linker script debugging on the cheap:

Disassembly of section .text:

00008000 <_start>:
    8000:   20001000    andcs   r1, r0, r0
    8004:   00008009    andeq   r8, r0, r9

00008008 <reset>:
    8008:   f000 f810   bl  802c <notmain>
    800c:   e7fe        b.n 800c <reset+0x4>

0000800e <bss_start>:
    800e:   0000a008    andeq   sl, r0, r8

00008012 <bss_end>:
    8012:   0000a014    andeq   sl, r0, r4, lsl r0
    8016:   0000000c    andeq   r0, r0, ip

0000801a <data_rom_start>:
    801a:   00008058    andeq   r8, r0, r8, asr r0

0000801e <data_start>:
    801e:   0000a000    andeq   sl, r0, r0

00008022 <data_end>:
    8022:   0000a008    andeq   sl, r0, r8
    8026:   00000008    andeq   r0, r0, r8
    ...

We care about the 32-bit values being created. The andeq disassembly appears because the disassembler is trying to decode those values as instructions, which they are not. The reset instructions are real; the rest are 32-bit values we are generating. You might be able to use readelf instead, but it is worth getting used to disassembling: ensuring the vector table is correct is step one, and that is easy to see in the disassembly. Using the disassembler as a habit can then lead to using it as above to show you what the linker generated.

If you don't get the linker script variables right you won't be able to write a successful bootstrap; if you don't have a good way to see what the linker is producing you will fail on a regular basis.

Yes, you could have exposed them in C and not assembly, the toolchain would still help you there.

You can work toward this now that you can see what the linker is doing:

.thumb
.globl _start
_start:
.word 0x20001000
.word reset
.thumb_func
reset:
    ldr r0,=__bss_start__
    ldr r1,=__bss_size__
    @ zero this

    ldr r0,=__data_rom_start__
    ldr r1,=__data_start__
    ldr r2,=__data_size__
    @ copy this

    bl notmain
    b .

giving something like this:

00008000 <_start>:
    8000:   20001000    andcs   r1, r0, r0
    8004:   00008009    andeq   r8, r0, r9

00008008 <reset>:
    8008:   4803        ldr r0, [pc, #12]   ; (8018 <reset+0x10>)
    800a:   4904        ldr r1, [pc, #16]   ; (801c <reset+0x14>)
    800c:   4804        ldr r0, [pc, #16]   ; (8020 <reset+0x18>)
    800e:   4905        ldr r1, [pc, #20]   ; (8024 <reset+0x1c>)
    8010:   4a05        ldr r2, [pc, #20]   ; (8028 <reset+0x20>)
    8012:   f000 f80b   bl  802c <notmain>
    8016:   e7fe        b.n 8016 <reset+0xe>
    8018:   0000a008    andeq   sl, r0, r8
    801c:   0000000c    andeq   r0, r0, ip
    8020:   00008058    andeq   r8, r0, r8, asr r0
    8024:   0000a000    andeq   sl, r0, r0
    8028:   00000008    andeq   r0, r0, r8

0000802c <notmain>:
    802c:   4b06        ldr r3, [pc, #24]   ; (8048 <notmain+0x1c>)
    802e:   6818        ldr r0, [r3, #0]
    8030:   685b        ldr r3, [r3, #4]
    8032:   18c0        adds    r0, r0, r3

If you then align the items in the linker script, the copy/zero code gets even simpler: you can work in whole registers (one, up to some number N) rather than dealing with bytes or halfwords. You can use ldr/str, ldrd/strd (if available) or ldm/stm, with no need for ldrb/strb or ldrh/strh, and tight, simple loops of a few lines complete the job.

I highly recommend you do not use C for your bootstrap.

Note that ld linker script variables are very sensitive to position (inside or outside the curly braces).

The above linker script is somewhat typical of what you will find in stock linker scripts: a defined start and end. Sometimes the size is computed in the linker script, sometimes the bootstrap code computes the size, and sometimes the bootstrap code just loops until the address equals the end value. Which one you get depends on how the overall system design divides the work between the two.
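The two conventions (loop until an end address, or loop over a size) can be modelled on a host machine in C. This is a sketch of the loop logic only, assuming word-aligned regions whose sizes are multiples of 4 bytes. The function names are hypothetical, ordinary arrays would stand in for the memory regions, and on the target the pointers and sizes would come from linker script symbols such as __bss_start__ and __data_size__. It is not meant to replace the assembly bootstrap recommended above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Style 1: the linker script provides a start and an end address.
 * Store whole words until the pointer equals the end. This relies on
 * both addresses being word aligned, so that end is actually reached. */
void zero_until_end(uint32_t *start, const uint32_t *end)
{
    while (start != end) {
        *start++ = 0;
    }
}

/* Style 2: the linker script provides a start address and a size in
 * bytes. Convert the size to a word count and count down. */
void copy_by_size(uint32_t *dst, const uint32_t *src, size_t size_bytes)
{
    /* Only valid when the section size is a whole number of words. */
    assert(size_bytes % sizeof(uint32_t) == 0);

    for (size_t words = size_bytes / sizeof(uint32_t); words != 0; words--) {
        *dst++ = *src++;
    }
}
```

With the sizes from the listing above, zero_until_end would run over three words (0xc bytes of .bss) and copy_by_size over two words (0x8 bytes of .data). Neither loop needs byte or halfword accesses.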

Your specific issue, BTW, is that you linked in two bootstraps. At the time I wrote this I didn't see your command line(s) in the question; those would tell us more. That is why you are seeing bss_start, etc.: things that you didn't put in your linker script but that are often found in the stock ones that come with a pre-built toolchain (similar to the above but more complicated).

It could be that using gcc instead of ld, without the various -nostartfiles-style options, pulled in crt0.o. Just try ld instead of gcc and see what changes. You would have failed with the original example too had it been something like this, though, so I don't think that is the issue here. If you used the same command lines, the failure should have shown up on both examples, not just the latter.

Houseleek answered 18/4, 2018 at 0:56 Comment(5)
A nightmare? It is a very simple definition language. BTW I did not DV.Anthropomorphosis
Hi @old_timer. Thank you very much for your answer, and all the effort you put in it! I feel sorry someone has downvoted it - especially because that person didn't clarify why in a comment.Denazify
The initial downvotes came from my first two drafts; I gave up, then went back in and took a different approach. It is very rare, once every two years or so, that we get someone who is actually interested in understanding and learning like yourself. Most folks want the drink-the-kool-aid answer and don't care about the poison (they just want the fish, they don't want to learn to fish). IMO you appear to want to learn to fish, so I have a personal interest in your success. Someone has to know how this stuff works so devices, compilers, etc. can continue to operate in the future.Houseleek
Please don't give up bare metal, even if you just do it for fun... It will help all of the other programming you do, in whatever language on whatever platform.Houseleek
@Houseleek arduino generation :(Anthropomorphosis
H
2

The book you're reading has led you astray. Discard it and start learning from another source.

I see at least four major problems with what it has told you to do:

  1. The linker script and _start function you included are missing a number of important sections, and will either malfunction or fail to link many executables. Most notably, they lack any handling for BSS (zero-filled) sections.

  2. The vector table in main.c is beyond "minimal"; it lacks the required definitions for even the standard ARM interrupt vectors. Without these, debugging hardfaults will become very difficult, as the microcontroller will treat random code following the vector table as an interrupt vector when a fault occurs, which will probably lead to a secondary fault as it fails to load code from that "address".

  3. The startup functions given by your book bypass the libc startup functions. This will cause some portions of the standard C library, as well as any C++ code, to fail to work correctly.

  4. You are defining peripheral addresses yourself in main.c. These addresses are all defined in standard ST header files (e.g. <stm32f4xx.h>), so there is no need to define them yourself.
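To make item 2 concrete, a vector table that covers the standard Cortex-M system exceptions could look roughly like the sketch below. It assumes a Cortex-M3/M4-class core (on a Cortex-M0, entries 4 to 6 and 12 are reserved), uses the 0x20001000 stack top from the other answer as a placeholder, and the handler names are only illustrative. Placing the table at the start of flash needs a section attribute that matches the linker script, which is left out here.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*handler_t)(void);

/* Placeholder stack top; the real value depends on the part's RAM. */
#define STACK_TOP 0x20001000u

/* Illustrative handlers: an unexpected exception parks the core in a
 * loop where a debugger can find it, instead of running random code. */
void Default_Handler(void) { for (;;) { } }
void Reset_Handler(void)   { /* would zero .bss, copy .data, call main */ }

/* Entry 0 is the initial stack pointer, entries 1..15 are the system
 * exceptions. Device interrupts (IRQ0 and up) would follow entry 15. */
const handler_t vector_table[16] = {
    (handler_t)(uintptr_t)STACK_TOP, /*  0: initial stack pointer */
    Reset_Handler,                   /*  1: Reset                 */
    Default_Handler,                 /*  2: NMI                   */
    Default_Handler,                 /*  3: HardFault             */
    Default_Handler,                 /*  4: MemManage             */
    Default_Handler,                 /*  5: BusFault              */
    Default_Handler,                 /*  6: UsageFault            */
    0, 0, 0, 0,                      /*  7-10: reserved           */
    Default_Handler,                 /* 11: SVCall                */
    Default_Handler,                 /* 12: Debug monitor         */
    0,                               /* 13: reserved              */
    Default_Handler,                 /* 14: PendSV                */
    Default_Handler,                 /* 15: SysTick               */
};

static_assert(sizeof vector_table / sizeof vector_table[0] == 16,
              "stack pointer plus 15 system exception slots");
```

With a table like this, a HardFault lands in Default_Handler rather than in whatever bytes happen to follow a two-entry table.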

As a starter, I would recommend that you refer to the startup code provided by ST in any of their examples. These will all include a complete linker script and startup code.

Homochromous answered 17/4, 2018 at 18:47 Comment(7)
Anything that requires a C library is a LONG way from minimal or "Absolute Minimal". If you look closely, .bss is covered; you need to know where to look (they took both approaches, a little overkill). ST header files are FAR from minimal as well and carry some heavy baggage. So please don't lead the OP astray onto a more complicated path that is less likely to succeed.Houseleek
@Houseleek 1) With -Wl,-gc-sections enabled, the C library is exactly as big as the part of it you pull in. For a minimal Cortex-M application, that's a couple of hundred bytes at most, most of it in initialization code which is expected by the compiler.Homochromous
@Houseleek 2) The ST header files have no overhead. They only define structures and macros -- most of which are identical or very similar to what the OP was already defining. You may be thinking of the ST peripheral library, which is a separate matter.Homochromous
Have you actually looked at those headers? It's not the size that is the issue. As for C library files, how huge it gets depends on the library call. The more code you pull in, the less likely you are to succeed, for something called an absolute minimum application.Houseleek
@Houseleek I am intimately familiar with the ST part-family headers. Again: They are headers, not libraries. They contain no code, only declarations.Homochromous
I am also intimately familiar, you are missing the point, the OP is looking for success here, not more obstacles...Houseleek
@Houseleek misspelling a register name in the homegrown definition is not part of the recipe for successCayes
J
2

As old_timer hinted in the comments, using gcc to link is a problem.

If you change the linker call in your batch file to use ld, it links without error. Try the following:

echo.
echo.   Call the linker
echo.
@arm-none-eabi-ld main.o -o myApp.elf -T linkerscript.ld
Jacquijacquie answered 18/4, 2018 at 17:32 Comment(4)
Hi @DKrueger, thank you for this advice. What exactly makes the difference? I remember reading some time ago that linking with gcc is better than with ld, but I don't remember why.Denazify
@Denazify gcc is a frontend to ld, it figures out which libraries need to be linked (based on instruction set options, floating point etc), and passes their path to ld. In your case, these libraries are not needed, you can use gcc -nostdlib to link without them, or just call ld directly.Cayes
As @berendi said, gcc automatically selects extra things to add to the linker. This is convenient for desktop apps where the same items would need to be passed to ld every time. But your project doesn't need them. And as you noticed, it automatically pulls in crt0, which prevents your implementation of _start from being linked. So instead of adding a bunch of flags to gcc to tell it to exclude the extras, you can simply use ld and specify exactly what is linked.Jacquijacquie
Hi @berendi and DKrueger, thank you very much for clarifying this :-)Denazify
