The GNU ld (linker script) manual, Section 3.5.5 "Source Code Reference", has some really important information on how to access linker script "variables" (which are actually just integer addresses) in C source code. I have used this information extensively to work with linker script variables, and I wrote this answer about it: How to get value of variable defined in ld linker script from C.
However, since this is a bit esoteric, it is easy to get it wrong and try to access a linker script variable's value instead of its address. The manual (link above) says:
This means that you cannot access the value of a linker script defined symbol - it has no value - all you can do is access the address of a linker script defined symbol.
Hence when you are using a linker script defined symbol in source code you should always take the address of the symbol, and never attempt to use its value.
The question: So, if you do attempt to access a linker script variable's value, is this "undefined behavior"?
Quick refresher:
Imagine that in your linker script (e.g. STM32F103RBTx_FLASH.ld) you have:
/* Specify the memory areas */
MEMORY
{
    FLASH (rx)  : ORIGIN = 0x8000000, LENGTH = 128K
    RAM (xrw)   : ORIGIN = 0x20000000, LENGTH = 20K
}
/* Some custom variables (addresses) I intend to access from my C source code */
__flash_start__ = ORIGIN(FLASH);
__flash_end__ = ORIGIN(FLASH) + LENGTH(FLASH);
__ram_start__ = ORIGIN(RAM);
__ram_end__ = ORIGIN(RAM) + LENGTH(RAM);
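For reference, here is the arithmetic behind those four symbols as a small host-side sketch. The macro names and the `region_end` helper are my own, purely for illustration; they are not part of the linker script or of any library. It assumes only that ld's `K` suffix multiplies by 1024, so `__flash_end__` comes out as 0x8020000 and `__ram_end__` as 0x20005000 (each one byte past the last valid address of its region).

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the MEMORY block above; ld's K suffix means * 1024. */
#define FLASH_ORIGIN 0x08000000u
#define FLASH_LENGTH (128u * 1024u)
#define RAM_ORIGIN   0x20000000u
#define RAM_LENGTH   (20u * 1024u)

/* Hypothetical helper mirroring the expression ORIGIN(x) + LENGTH(x). */
uint32_t region_end(uint32_t origin, uint32_t length)
{
    /* Guard against wrap-around in the 32-bit address space. */
    assert(length <= UINT32_MAX - origin);
    return origin + length;
}
```

These are the numbers you should expect when you print the addresses of the end symbols on the target.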
And in your C source code you do:
// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);
// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; [] is required to access linker script variables (addresses) as though they were normal variables
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);
// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);
Sample printed output (this is real output: it was actually compiled, run, and printed by an STM32 MCU):
__flash_start__ addr = 0x8000000
__flash_start__ addr = 0x8000000
__flash_start__ addr = 0x20080000 <== NOTICE, LIKE I SAID ABOVE: this one is completely wrong (even though it compiles and runs)! <== Update Mar. 2020: actually, see my answer; this is just fine and right too, it just does something different.
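To make the difference between the three forms concrete, here is a minimal host-side sketch. It cannot use a real linker script symbol, so it assumes an ordinary global, `fake_flash_start` (a made-up name), as a stand-in for the storage at the start of flash. Its initial value is made up too, chosen only to mirror the third line of the output above. Forms 1 and 2 correspond to `symbol_address()`; form 3 corresponds to `symbol_contents()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the first word of flash. On the real target the
 * symbol is placed by the linker script; here it is an ordinary object so
 * the sketch can run on a host. The initial value is invented. */
uint32_t fake_flash_start = 0x20080000u;

/* Forms 1 and 2: the address the symbol resolves to. */
uintptr_t symbol_address(void)
{
    return (uintptr_t)&fake_flash_start;
}

/* Form 3: the 32-bit word stored AT that address. */
uint32_t symbol_contents(void)
{
    return fake_flash_start;
}

/* Dereferencing the address yields the same word as form 3. */
uint32_t contents_via_address(void)
{
    return *(uint32_t *)symbol_address();
}
```

Under that assumption, form 3 is not reading garbage: it reads whatever word is stored at the symbol's address, which is a different thing from the address itself.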
Update:
Response to @Eric Postpischil's 1st comment:
The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.
Yes, but that's not my question. I'm not sure if you're picking up the subtlety of my question. Take a look at the examples I provide. It is true you can access this location just fine, but make sure you understand how you do so, and then my question will become apparent. Look especially at example 3 above, which is wrong even though to a C programmer it looks right. To read a uint32_t, for example, at __flash_start__, you'd do this:
extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)&__flash_start__); // correct, even though it *looks like* you're taking the address (&) of an address (__flash_start__)
OR this:
extern uint32_t __flash_start__[];
uint32_t u32 = *((uint32_t *)__flash_start__); // also correct, and my preferred way of doing it because it looks more correct to the trained "C-programmer" eye
But most definitely NOT this:
extern uint32_t __flash_start__;
uint32_t u32 = __flash_start__; // incorrect; <==UPDATE: THIS IS ALSO CORRECT! (and more straight-forward too, actually; see comment discussion under this question)
and NOT this:
extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)__flash_start__); // incorrect, but *looks* right
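One way to sidestep the confusion is to keep the "address" step and the "read" step visibly separate. The sketch below is only an illustration: `read_u32_at` and `fake_region` are hypothetical names of my own, and the array contents are invented so the code can run on a host. On the target you would pass the symbol's address instead, e.g. `read_u32_at(__flash_start__)` with the `[]` declaration, or `read_u32_at(&__flash_start__)` with the scalar one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for memory placed by the linker script;
 * the contents are invented for illustration. */
const uint32_t fake_region[2] = { 0x20080000u, 0x08000101u };

/* Hypothetical helper: read the 32-bit word stored at `addr`.
 * memcpy avoids alignment and aliasing concerns on the host. */
uint32_t read_u32_at(const void *addr)
{
    uint32_t value;
    assert(addr != NULL);
    memcpy(&value, addr, sizeof value);
    return value;
}
```

The caller always supplies an address, so there is no temptation to treat the stored word as if it were a pointer, which is what the last "incorrect" example above does.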
… Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__. – Rosabelle
… extern uint32_t __flash_start__; uint32_t u32 = *((uint32_t *)&__flash_start__); is a correct way to read a uint32_t that exists at __flash_start__ and extern uint32_t __flash_start__; uint32_t u32 = __flash_start__; is not? Apple LLVM 10.0.0 with clang-1000.11.45.5 generates identical assembly code for them. – Rosabelle
… __flash_start__ has no value--literally--accessing its "value" is illegal and has no meaning, according to the manual (sourceware.org/binutils/docs-2.32/ld/…), and the address of __flash_start__ is 0x8000000. Take a close read in the manual. sourceware.org/binutils/docs-2.32/ld/…. This is an esoteric concept that requires a very close study. Note also that the "Sample printed output" from my question is real output, compiled, run, and copied-pasted, not just typed up. – Michikomichon
… If __flash_start__ is a valid memory address, and you have ensured there is storage for a uint32_t at that address, and it is a properly aligned address for a uint32_t, then it is okay to access __flash_start__ in C as if it were a uint32_t. That would not be defined by the C standard, but by the GNU tools. – Rosabelle
… extern uint32_t __app_start__; uint32_t u32_1 = __app_start__; uint32_t u32_2 = *((uint32_t *)&__app_start__); printf("u32_1 = 0x%lX\n", u32_1); printf("u32_2 = 0x%lX\n", u32_2);, and both u32_1 (what you were saying--which I initially thought was wrong) and u32_2 (which I always knew to be correct) produced the same result, so you are correct. Output: u32_1 = 0x20080000 and u32_2 = 0x20080000. – Michikomichon