I'm building a tiny microcontroller with only the bare essentials for self-educational purposes. This way, I can refresh my knowledge about topics like the linkerscript, the startup code, ...
EDIT:
I got quite a lot of comments pointing out that the "absolute minimal STM32-application" shown below is no good. You are absolutely right when noticing that the vector table is not complete, the .bss
-section is not taken care of, the peripheral addresses are not complete, ... Please allow me to explain why.
It has never been the purpose of the author to write a complete and useful application in this particular chapter. His purpose was to explain step-by-step how a linkerscript works, how startup code works, what the boot procedure of an STM32 looks like, ... purely for educational purposes. I can appreciate this approach, and learned a lot.
The example I have put below is taken from the middle of the chapter in question. The chapter keeps adding more parts to the linkerscript and startup code (for example initialization of
.bss
-section) as it goes forward.
The reason I put files here from the middle of his chapter, is because I got stuck at a particular error message. I want to get that fixed before continuing.The chapter in question is somewhere at the end of his book. It is intended for the more experienced or curious reader who wants to gain deeper knowledge about topics most people don't even consider (most people use the standard linkerscript and startup code given by the manufacturer without ever reading it).
Keeping this in mind, please let us focus on the technical issue at hand (as described below in the error messages). Please also accept my sincere apologies that I didn't clarify the intentions of the writer earlier. But I've done it now, so we can move on ;-)
1. Absolute minimal STM32-application
The tutorial I'm following is chapter 20 from this book: "Mastering STM32" (https://leanpub.com/mastering-stm32). The book explains how to make a tiny microcontroller application with two files: main.c
and linkerscript.ld
. As I'm not using an IDE (like Eclipse), I also added build.bat
and clean.bat
to generate the compilation commands. So my project folder looks like this:
Before I continue, I should perhaps give some more details about my system:
OS: Windows 10, 64-bit
Microcontroller: NUCLEO-F401RE board with STM32F401RE microcontroller.
Compiler:
arm-none-eabi-gcc
version 6.3.1 20170620 (release) [ARM/embedded-6-branch revision 249437].
The main file looks like this:
/* ------------------------------------------------------------ */
/* Minimal application */
/* for NUCLEO-F401RE */
/* ------------------------------------------------------------ */
typedef unsigned long uint32_t;
/* Memory and peripheral start addresses (common to all STM32 MCUs) */
#define FLASH_BASE 0x08000000
#define SRAM_BASE 0x20000000
#define PERIPH_BASE 0x40000000
/* Work out end of RAM address as initial stack pointer
* (specific of a given STM32 MCU) */
#define SRAM_SIZE 96*1024 //STM32F401RE has 96 KB of RAM
#define SRAM_END (SRAM_BASE + SRAM_SIZE)
/* RCC peripheral addresses applicable to GPIOA
* (specific of a given STM32 MCU) */
#define RCC_BASE (PERIPH_BASE + 0x23800)
#define RCC_APB1ENR ((uint32_t*)(RCC_BASE + 0x30))
/* GPIOA peripheral addresses
* (specific of a given STM32 MCU) */
#define GPIOA_BASE (PERIPH_BASE + 0x20000)
#define GPIOA_MODER ((uint32_t*)(GPIOA_BASE + 0x00))
#define GPIOA_ODR ((uint32_t*)(GPIOA_BASE + 0x14))
/* Function headers */
int main(void);
void delay(uint32_t count);
/* Minimal vector table */
uint32_t *vector_table[] __attribute__((section(".isr_vector"))) = {
(uint32_t*)SRAM_END, // initial stack pointer (MSP)
(uint32_t*)main // main as Reset_Handler
};
/* Main function */
int main() {
/* Enable clock on GPIOA peripheral */
*RCC_APB1ENR = 0x1;
/* Configure the PA5 as output pull-up */
*GPIOA_MODER |= 0x400; // Sets MODER[11:10] = 0x1
while(1) { // Always true
*GPIOA_ODR = 0x20;
delay(200000);
*GPIOA_ODR = 0x0;
delay(200000);
}
}
void delay(uint32_t count) {
while(count--);
}
The linkerscript looks like this:
/* ------------------------------------------------------------ */
/* Linkerscript */
/* for NUCLEO-F401RE */
/* ------------------------------------------------------------ */
/* Memory layout for STM32F401RE */
MEMORY
{
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
SRAM (xrw) : ORIGIN = 0x20000000, LENGTH = 96K
}
/* The ENTRY(..) directive overrides the default entry point symbol _start.
* Here we define the main-routine as the entry point.
* In fact, the ENTRY(..) directive is meaningless for embedded chips,
* but it is informative for debuggers. */
ENTRY(main)
SECTIONS
{
/* Program code into FLASH */
.text : ALIGN(4)
{
*(.isr_vector) /* Vector table */
*(.text) /* Program code */
*(.text*) /* Merge all .text.* sections inside the .text section */
KEEP(*(.isr_vector)) /* Don't allow other tools to strip this off */
} >FLASH
_sidata = LOADADDR(.data); /* Used by startup code to initialize data */
.data : ALIGN(4)
{
. = ALIGN(4);
_sdata = .; /* Create a global symbol at data start */
*(.data)
*(.data*)
. = ALIGN(4);
_edata = .; /* Define a global symbol at data end */
} >SRAM AT >FLASH
}
The build.bat
file calls the compiler on main.c, and next the linker:
@echo off
setlocal EnableDelayedExpansion
echo.
echo ----------------------------------------------------------------
echo. )\ ***************************
echo. ( =_=_=_=^< ^| * build NUCLEO-F401RE *
echo. )( ***************************
echo. ""
echo.
echo.
echo. Call the compiler on main.c
echo.
@arm-none-eabi-gcc main.c -o main.o -c -MMD -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -O0 -g3 -Wall -fmessage-length=0 -Werror-implicit-function-declaration -Wno-comment -Wno-unused-function -ffunction-sections -fdata-sections
echo.
echo. Call the linker
echo.
@arm-none-eabi-gcc main.o -o myApp.elf -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -specs=nosys.specs -specs=nano.specs -T linkerscript.ld -Wl,-Map=output.map -Wl,--gc-sections
echo.
echo. Post build
echo.
@arm-none-eabi-objcopy -O binary myApp.elf myApp.bin
arm-none-eabi-size myApp.elf
echo.
echo ----------------------------------------------------------------
The clean.bat
file removes all the compiler output:
@echo off
setlocal EnableDelayedExpansion
echo ----------------------------------------------------------------
echo. __ **************
echo. __\ \___ * clean *
echo. \ _ _ _ \ **************
echo. \_`_`_`_\
echo.
del /f /q main.o
del /f /q main.d
del /f /q myApp.bin
del /f /q myApp.elf
del /f /q output.map
echo ----------------------------------------------------------------
Building this works. I get the following output:
C:\Users\Kristof\myProject>build
----------------------------------------------------------------
)\ ***************************
( =_=_=_=< | * build NUCLEO-F401RE *
)( ***************************
""
Call the compiler on main.c
Call the linker
Post build
text data bss dec hex filename
112 0 0 112 70 myApp.elf
----------------------------------------------------------------
2. Proper startup code
Maybe you have noticed that the minimal application didn't have proper startup code to initialize the global variables in the .data-section. Chapter 20.2.2 .data and .bss Sections initialization from the "Mastering STM32" book explains how to do this.
As I follow along, my main.c
file now looks like this:
/* ------------------------------------------------------------ */
/* Minimal application */
/* for NUCLEO-F401RE */
/* ------------------------------------------------------------ */
typedef unsigned long uint32_t;
/* Memory and peripheral start addresses (common to all STM32 MCUs) */
#define FLASH_BASE 0x08000000
#define SRAM_BASE 0x20000000
#define PERIPH_BASE 0x40000000
/* Work out end of RAM address as initial stack pointer
* (specific of a given STM32 MCU) */
#define SRAM_SIZE 96*1024 //STM32F401RE has 96 KB of RAM
#define SRAM_END (SRAM_BASE + SRAM_SIZE)
/* RCC peripheral addresses applicable to GPIOA
* (specific of a given STM32 MCU) */
#define RCC_BASE (PERIPH_BASE + 0x23800)
#define RCC_APB1ENR ((uint32_t*)(RCC_BASE + 0x30))
/* GPIOA peripheral addresses
* (specific of a given STM32 MCU) */
#define GPIOA_BASE (PERIPH_BASE + 0x20000)
#define GPIOA_MODER ((uint32_t*)(GPIOA_BASE + 0x00))
#define GPIOA_ODR ((uint32_t*)(GPIOA_BASE + 0x14))
/* Function headers */
void __initialize_data(uint32_t*, uint32_t*, uint32_t*);
void _start (void);
int main(void);
void delay(uint32_t count);
/* Minimal vector table */
uint32_t *vector_table[] __attribute__((section(".isr_vector"))) = {
(uint32_t*)SRAM_END, // initial stack pointer (MSP)
(uint32_t*)_start // _start as Reset_Handler
};
/* Variables defined in linkerscript */
extern uint32_t _sidata;
extern uint32_t _sdata;
extern uint32_t _edata;
volatile uint32_t dataVar = 0x3f;
/* Data initialization */
inline void __initialize_data(uint32_t* flash_begin, uint32_t* data_begin, uint32_t* data_end) {
uint32_t *p = data_begin;
while(p < data_end)
*p++ = *flash_begin++;
}
/* Entry point */
void __attribute__((noreturn,weak)) _start (void) {
__initialize_data(&_sidata, &_sdata, &_edata);
main();
for(;;);
}
/* Main function */
int main() {
/* Enable clock on GPIOA peripheral */
*RCC_APB1ENR = 0x1;
/* Configure the PA5 as output pull-up */
*GPIOA_MODER |= 0x400; // Sets MODER[11:10] = 0x1
while(dataVar == 0x3f) { // Always true
*GPIOA_ODR = 0x20;
delay(200000);
*GPIOA_ODR = 0x0;
delay(200000);
}
}
void delay(uint32_t count) {
while(count--);
}
I've added the initialization code just above the main(..)
function. The linkerscript has also some modification:
/* ------------------------------------------------------------ */
/* Linkerscript */
/* for NUCLEO-F401RE */
/* ------------------------------------------------------------ */
/* Memory layout for STM32F401RE */
MEMORY
{
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
SRAM (xrw) : ORIGIN = 0x20000000, LENGTH = 96K
}
/* The ENTRY(..) directive overrides the default entry point symbol _start.
* In fact, the ENTRY(..) directive is meaningless for embedded chips,
* but it is informative for debuggers. */
ENTRY(_start)
SECTIONS
{
/* Program code into FLASH */
.text : ALIGN(4)
{
*(.isr_vector) /* Vector table */
*(.text) /* Program code */
*(.text*) /* Merge all .text.* sections inside the .text section */
KEEP(*(.isr_vector)) /* Don't allow other tools to strip this off */
} >FLASH
_sidata = LOADADDR(.data); /* Used by startup code to initialize data */
.data : ALIGN(4)
{
. = ALIGN(4);
_sdata = .; /* Create a global symbol at data start */
*(.data)
*(.data*)
. = ALIGN(4);
_edata = .; /* Define a global symbol at data end */
} >SRAM AT >FLASH
}
The little application doesn't compile anymore. Actually, the compilation from main.c
to main.o
is still okay. But the linking process gets stuck:
C:\Users\Kristof\myProject>build
----------------------------------------------------------------
)\ ***************************
( =_=_=_=< | * build NUCLEO-F401RE *
)( ***************************
""
Call the compiler on main.c
Call the linker
c:/gnu_arm_embedded_toolchain/bin/../lib/gcc/arm-none-eabi/6.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m/fpv4-sp/hard/crt0.o: In function `_start':
(.text+0x64): undefined reference to `__bss_start__'
c:/gnu_arm_embedded_toolchain/bin/../lib/gcc/arm-none-eabi/6.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m/fpv4-sp/hard/crt0.o: In function `_start':
(.text+0x68): undefined reference to `__bss_end__'
collect2.exe: error: ld returned 1 exit status
Post build
arm-none-eabi-objcopy: 'myApp.elf': No such file
arm-none-eabi-size: 'myApp.elf': No such file
----------------------------------------------------------------
3. What I've tried
I've omitted this part, otherwise this question gets too long ;-)
4. Solution
@berendi provided the solution. Thank you @berendi! Apparently I need to add the flags -nostdlib
and -ffreestanding
to gcc and the linker. The build.bat
file now looks like this:
@echo off
setlocal EnableDelayedExpansion
echo.
echo ----------------------------------------------------------------
echo. )\ ***************************
echo. ( =_=_=_=^< ^| * build NUCLEO-F401RE *
echo. )( ***************************
echo. ""
echo.
echo.
echo. Call the compiler on main.c
echo.
@arm-none-eabi-gcc main.c -o main.o -c -MMD -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -O0 -g3 -Wall -fmessage-length=0 -Werror-implicit-function-declaration -Wno-comment -Wno-unused-function -ffunction-sections -fdata-sections -ffreestanding -nostdlib
echo.
echo. Call the linker
echo.
@arm-none-eabi-gcc main.o -o myApp.elf -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -specs=nosys.specs -specs=nano.specs -T linkerscript.ld -Wl,-Map=output.map -Wl,--gc-sections -ffreestanding -nostdlib
echo.
echo. Post build
echo.
@arm-none-eabi-objcopy -O binary myApp.elf myApp.bin
arm-none-eabi-size myApp.elf
echo.
echo ----------------------------------------------------------------
Now it works!
In his answer, @berendi also gives a few interesting remarks about the main.c
file. I've applied most of them:
Missing
volatile
keywordEmpty loop
Missing Memory Barrier (did I put the memory barrier in the correct place?)
Missing delay after RCC enable
Misleading symbolic name (apparently it should be
RCC_AHB1ENR
instead ofRCC_APB1ENR
).The vector table: this part I've skipped. Right now I don't really need a
HardFault_Handler
,MemManage_Handler
, ... as this is just a tiny test for educational purposes.
Nevertheless, I did notice that @berendi put a few interesting modifications in the way he declares the vector table. But I'm not entirely grasping what he's doing exactly.
The main.c
file now looks like this:
/* ------------------------------------------------------------ */
/* Minimal application */
/* for NUCLEO-F401RE */
/* ------------------------------------------------------------ */
typedef unsigned long uint32_t;
/**
\brief Data Synchronization Barrier
\details Acts as a special kind of Data Memory Barrier.
It completes when all explicit memory accesses before this instruction complete.
*/
__attribute__((always_inline)) static inline void __DSB(void)
{
__asm volatile ("dsb 0xF":::"memory");
}
/* Memory and peripheral start addresses (common to all STM32 MCUs) */
#define FLASH_BASE 0x08000000
#define SRAM_BASE 0x20000000
#define PERIPH_BASE 0x40000000
/* Work out end of RAM address as initial stack pointer
* (specific of a given STM32 MCU) */
#define SRAM_SIZE 96*1024 //STM32F401RE has 96 KB of RAM
#define SRAM_END (SRAM_BASE + SRAM_SIZE)
/* RCC peripheral addresses applicable to GPIOA
* (specific of a given STM32 MCU) */
#define RCC_BASE (PERIPH_BASE + 0x23800)
#define RCC_AHB1ENR ((volatile uint32_t*)(RCC_BASE + 0x30))
/* GPIOA peripheral addresses
* (specific of a given STM32 MCU) */
#define GPIOA_BASE (PERIPH_BASE + 0x20000)
#define GPIOA_MODER ((volatile uint32_t*)(GPIOA_BASE + 0x00))
#define GPIOA_ODR ((volatile uint32_t*)(GPIOA_BASE + 0x14))
/* Function headers */
void __initialize_data(uint32_t*, uint32_t*, uint32_t*);
void _start (void);
int main(void);
void delay(uint32_t count);
/* Minimal vector table */
uint32_t *vector_table[] __attribute__((section(".isr_vector"))) = {
(uint32_t*)SRAM_END, // initial stack pointer (MSP)
(uint32_t*)_start // _start as Reset_Handler
};
/* Variables defined in linkerscript */
extern uint32_t _sidata;
extern uint32_t _sdata;
extern uint32_t _edata;
volatile uint32_t dataVar = 0x3f;
/* Data initialization */
inline void __initialize_data(uint32_t* flash_begin, uint32_t* data_begin, uint32_t* data_end) {
uint32_t *p = data_begin;
while(p < data_end)
*p++ = *flash_begin++;
}
/* Entry point */
void __attribute__((noreturn,weak)) _start (void) {
__initialize_data(&_sidata, &_sdata, &_edata);
asm volatile("":::"memory"); // <- Did I put this instruction at the right spot?
main();
for(;;);
}
/* Main function */
int main() {
/* Enable clock on GPIOA peripheral */
*RCC_AHB1ENR = 0x1;
__DSB();
/* Configure the PA5 as output pull-up */
*GPIOA_MODER |= 0x400; // Sets MODER[11:10] = 0x1
while(dataVar == 0x3f) { // Always true
*GPIOA_ODR = 0x20;
delay(200000);
*GPIOA_ODR = 0x0;
delay(200000);
}
}
void delay(uint32_t count) {
while(count--){
asm volatile("");
}
}
PS: The book "Mastering STM32" from Carmine Noviello is an absolute masterpiece. You should read it! => https://leanpub.com/mastering-stm32
-nostartfiles
,-nodefaultlibs
, or-nostdlib
? – Davon