Is accessing the "value" of a linker script variable undefined behavior in C?

The GNU ld (linker script) manual, Section 3.5.5 "Source Code Reference", has some really important information on how to access linker script "variables" (which are actually just integer addresses) from C source code. I have relied on that information to use linker script variables extensively, and I wrote this answer based on it: How to get value of variable defined in ld linker script from C.

However, it is easy to get this wrong and try to access a linker script variable's value instead of its address, since the distinction is a bit esoteric. The manual (linked above) says:

This means that you cannot access the value of a linker script defined symbol - it has no value - all you can do is access the address of a linker script defined symbol.

Hence when you are using a linker script defined symbol in source code you should always take the address of the symbol, and never attempt to use its value.

The question: So, if you do attempt to access a linker script variable's value, is this "undefined behavior"?

Quick refresher:

Imagine in linker script (ex: STM32F103RBTx_FLASH.ld) you have:

/* Specify the memory areas */
MEMORY
{
    FLASH (rx)      : ORIGIN = 0x8000000,  LENGTH = 128K
    RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 20K
}

/* Some custom variables (addresses) I intend to access from my C source code */
__flash_start__ = ORIGIN(FLASH);
__flash_end__ = ORIGIN(FLASH) + LENGTH(FLASH);
__ram_start__ = ORIGIN(RAM);
__ram_end__ = ORIGIN(RAM) + LENGTH(RAM);

And in your C source code you do:

// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);

// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; [] is required to access linker script variables (addresses) as though they were normal variables
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);

// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);

Sample printed output

(this is real output: it was actually compiled, run, and printed by an STM32 mcu):

  1. __flash_start__ addr = 0x8000000
  2. __flash_start__ addr = 0x8000000
  3. __flash_start__ addr = 0x20080000 <== as noted above, this is not the address, even though it compiles and runs
     (Update Mar. 2020: see my answer. This is also valid; it reads the contents stored at that address rather than the address itself.)

Update:

Response to @Eric Postpischil's 1st comment:

The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.

Yes, but that's not my question. I'm not sure if you're picking up the subtlety of my question. Take a look at the examples I provide. It is true you can access this location just fine, but make sure you understand how you do so, and then my question will become apparent. Look especially at example 3 above, which is wrong even though to a C programmer it looks right. To read a uint32_t, for ex, at __flash_start__, you'd do this:

extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)&__flash_start__); // correct, even though it *looks like* you're taking the address (&) of an address (__flash_start__)

OR this:

extern uint32_t __flash_start__[];
uint32_t u32 = *((uint32_t *)__flash_start__); // also correct, and my preferred way of doing it because it looks more correct to the trained "C-programmer" eye

But most definitely NOT this:

extern uint32_t __flash_start__;
uint32_t u32 = __flash_start__; // incorrect; <==UPDATE: THIS IS ALSO CORRECT! (and more straight-forward too, actually; see comment discussion under this question)

and NOT this:

extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)__flash_start__); // incorrect, but *looks* right

Related:

  1. Why do STM32 gcc linker scripts automatically discard all input sections from these standard libraries: libc.a, libm.a, libgcc.a?
  2. [My answer] How to get value of variable defined in ld linker script from C
Michikomichon answered 10/4, 2019 at 22:32 Comment(9)
The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__. – Rosabelle
I'm not sure if you're picking up the subtlety of my question, and I needed more space to respond, so I've responded to your comment directly in the bottom of my question above. – Michikomichon
Why do you think extern uint32_t __flash_start__; uint32_t u32 = *((uint32_t *)&__flash_start__); is a correct way to read a uint32_t that exists at __flash_start__ and extern uint32_t __flash_start__; uint32_t u32 = __flash_start__; is not? Apple LLVM 10.0.0 with clang-1000.11.45.5 generates identical assembly code for them. – Rosabelle
Because __flash_start__ has no value--literally--accessing its "value" is illegal and has no meaning, according to the manual (sourceware.org/binutils/docs-2.32/ld/…), and the address of __flash_start__ is 0x8000000. Take a close read in the manual. sourceware.org/binutils/docs-2.32/ld/…. This is an esoteric concept that requires a very close study. Note also that the "Sample printed output" from my question is real output, compiled, run, and copied-pasted, not just typed up. – Michikomichon
That documentation is not written very well, and you are taking the first sentence too literally. What is really happening here is that the linker’s notion of the “value” of a symbol and a programming language’s notion of the “value” of an identifier are different things. To the linker, the value of a symbol is simply a number associated with it. In a programming language, the value is a number (or other element in the set of values of some type) stored in the (sometimes notional) storage associated with the identifier.… – Rosabelle
… The documentation is advising you that the linker’s value of a symbol appears inside a language like C as the address associated with the identifier, rather than the contents of its storage. It goes too far when it tells you to “never attempt to use its value.” It is correct that merely defining a linker symbol does not reserve the necessary storage for a programming language object, and therefore merely having a linker symbol does not provide you storage you can access. However if you ensure storage is allocated by some other means, then, sure,… – Rosabelle
… it can work as a programming language object. There is no general prohibition on using a linker symbol as an identifier in C, including accessing its C value, if you have properly allocated storage and otherwise satisfied the requirements for this. If the linker value of __flash_start__ is a valid memory address, and you have ensure there is storage for a uint32_t at that address, and it is a properly aligned address for a uint32_t, then it is okay to access __flash_start__ in C as if it were a uint32_t. That would not be defined by the C standard, but by the GNU tools. – Rosabelle
You are correct. I tested the following: extern uint32_t __app_start__; uint32_t u32_1 = __app_start__; uint32_t u32_2 = *((uint32_t *)&__app_start__); printf("u32_1 = 0x%lX\n", u32_1); printf("u32_2 = 0x%lX\n", u32_2);, and both u32_1 (what you were saying--which I initially thought was wrong) and u32_2 (which I always knew to be correct) produced the same result, so you are correct. Output: u32_1 = 0x20080000 and u32_2 = 0x20080000. – Michikomichon
I just wrote an answer to wrap up my findings. This was rather confusing, but also enlightening in the end. Thanks. – Michikomichon

Shorter answer:

Accessing the "value" of a linker script variable is NOT undefined behavior, and is fine to do, so long as what you want is the actual data stored at that location in memory. It is not the way to get the address of that memory, nor the "value" assigned to the variable in the linker script, which C code sees only as an address in memory and not as a value.

Yeah, that's kind of confusing, so re-read that 3 times carefully. Essentially, if you want to access the value of a linker script variable just ensure your linker script is set up to prevent anything you don't want from ending up in that memory address so that whatever you DO want there is in fact there. This way, reading the value at that memory address will provide you something useful you expect to be there.

BUT, if you're using linker script variables to store some sort of "values" in and of themselves, the way to grab the "values" of these linker script variables in C is to read their addresses, because the "value" you assign to a variable in a linker script IS SEEN BY THE C COMPILER AS THE "ADDRESS" of that linker script variable, since linker scripts are designed to manipulate memory and memory addresses, NOT traditional C variables.
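As a minimal sketch of that idea, assuming the `__flash_start__` and `__flash_end__` symbols from the linker script in the question: the number assigned in the script is recovered in C by converting the symbol's address to an integer. The names `LINKER_SYM_VALUE`, `span_bytes`, and `FLASH_SIZE_BYTES` are hypothetical helpers invented for this illustration, not part of any toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* Symbols assigned in the linker script. They reserve no storage by
 * themselves; only their addresses carry the numbers set in the script. */
extern uint32_t __flash_start__[];
extern uint32_t __flash_end__[];

/* Hypothetical helper macro: the "value" given to a symbol in the linker
 * script is the symbol's address as seen from C. */
#define LINKER_SYM_VALUE(sym) ((uintptr_t)(sym))

/* Hypothetical helper: byte distance between two addresses. */
static inline size_t span_bytes(const void *start, const void *end)
{
    return (size_t)((uintptr_t)end - (uintptr_t)start);
}

/* For the script in the question this would equal LENGTH(FLASH).
 * Only expand it in a program that is linked with that script. */
#define FLASH_SIZE_BYTES span_bytes(__flash_start__, __flash_end__)
```

Because `span_bytes()` takes plain pointers, it can be tried on a PC with any two addresses (for example the start and one-past-the-end of a local buffer), while the two macros only make sense in a build that uses the linker script.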

Here are some really valuable and correct comments from under my question, which I think are worth posting in this answer so they never get lost. Please go upvote Eric Postpischil's comments under my question above.

The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.
– Eric Postpischil

That documentation is not written very well, and you are taking the first sentence too literally. What is really happening here is that the linker’s notion of the “value” of a symbol and a programming language’s notion of the “value” of an identifier are different things. To the linker, the value of a symbol is simply a number associated with it. In a programming language, the value is a number (or other element in the set of values of some type) stored in the (sometimes notional) storage associated with the identifier. The documentation is advising you that the linker’s value of a symbol appears inside a language like C as the address associated with the identifier, rather than the contents of its storage...

THIS PART IS REALLY IMPORTANT and we should get the GNU linker script manual updated:

It goes too far when it tells you to “never attempt to use its value.”

It is correct that merely defining a linker symbol does not reserve the necessary storage for a programming language object, and therefore merely having a linker symbol does not provide you storage you can access. However if you ensure storage is allocated by some other means, then, sure, it can work as a programming language object. There is no general prohibition on using a linker symbol as an identifier in C, including accessing its C value, if you have properly allocated storage and otherwise satisfied the requirements for this. If the linker value of __flash_start__ is a valid memory address, and you have ensure there is storage for a uint32_t at that address, and it is a properly aligned address for a uint32_t, then it is okay to access __flash_start__ in C as if it were a uint32_t. That would not be defined by the C standard, but by the GNU tools.
– Eric Postpischil
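One of the conditions in that comment, proper alignment for a uint32_t, can be checked in code. The following is only a sketch; `is_aligned_for_u32` is a hypothetical helper name, and it says nothing about whether storage actually exists at the address, which still has to be guaranteed by the linker script.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if `addr` is suitably aligned for a uint32_t
 * access. It does NOT check that readable storage exists at `addr`. */
static inline bool is_aligned_for_u32(const void *addr)
{
    return ((uintptr_t)addr % _Alignof(uint32_t)) == 0;
}
```

On the target, a call such as `is_aligned_for_u32(&__flash_start__)` could be placed in a startup assertion before the symbol is read as a uint32_t.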

Long answer:

I said in the question:

// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);

// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; [] is required to access linker script variables (addresses) as though they were normal variables
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);

// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);

(See discussion under the question for how I came to this).

Looking specifically at #3 above:

Well, actually, if your goal is to read the address of __flash_start__, which is 0x8000000 in this case, then yes, this is completely wrong. But it is NOT undefined behavior! What it actually does is read the contents (value) stored at that address (0x8000000) as a uint32_t. In other words, it simply reads the first 4 bytes of the FLASH memory region and interprets them as a uint32_t. The contents (the uint32_t value at this address) just so happen to be 0x20080000 in this case.

To further prove this point, the following are exactly identical:

// Read the actual *contents* of the `__flash_start__` address as a 4-byte value!

// forward declaration to make a variable defined in the linker script
// accessible in the C code
extern uint32_t __flash_start__; 

// These 2 read techniques do the exact same thing.
uint32_t u32_1 = __flash_start__;                 // technique 1
uint32_t u32_2 = *((uint32_t *)&__flash_start__); // technique 2
printf("u32_1 = 0x%lX\n", u32_1);
printf("u32_2 = 0x%lX\n", u32_2);

The output is:

u32_1 = 0x20080000
u32_2 = 0x20080000

Notice they produce the same result. They each are producing a valid uint32_t-type value which is stored at address 0x8000000.
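The same equivalence can be tried on a PC without any linker script. In this sketch an ordinary variable, `fake_flash_word` (a hypothetical stand-in, initialized to the value I happened to read on my board), plays the role of the uint32_t stored at `__flash_start__`. The function names are made up for the illustration.

```c
#include <stdint.h>

/* Stand-in for a linker script symbol whose address holds a real uint32_t.
 * On the target this would be declared `extern uint32_t __flash_start__;`. */
static uint32_t fake_flash_word = 0x20080000u;

/* Technique 1: use the identifier directly. */
static uint32_t read_direct(void)
{
    return fake_flash_word;
}

/* Technique 2: take the address, then dereference it. */
static uint32_t read_via_address(void)
{
    return *((uint32_t *)&fake_flash_word);
}
```

Both functions read the same object, so they return the same value. The only thing the linker script changes is where that object lives.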

As it turns out, the u32_1 technique shown above is simply the more straightforward and direct way of reading the value, and again, it is not undefined behavior. Rather, it correctly reads the value stored at (the contents of) that address.

I seem to be talking in circles. Anyway, mind blown, but I get it now. I was convinced before I was supposed to use the u32_2 technique shown above only, but it turns out they are both just fine, and again, the u32_1 technique is clearly more straight-forward (there I go talking in circles again). :)

Cheers.


Digging deeper: Where did the 0x20080000 value stored right at the start of my FLASH memory come from?

One more little tidbit. I actually ran this test code on an STM32F777 mcu, which has 512KiB of RAM. Since RAM starts at address 0x20000000, the end of RAM is at 0x20000000 + 512K = 0x20080000. This also happens to be the value stored at the very start of the Vector Table, because Programming Manual PM0253 Rev 4, pg. 42, "Figure 10. Vector table" shows that the first 4 bytes of the Vector Table contain the "Initial SP [Stack Pointer] value".
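The arithmetic can be double-checked with a few lines of C. This is just a sketch: `KIB` and `region_end` are hypothetical helpers that mirror the linker script's `K` suffix (which scales by 1024) and the expression `ORIGIN(...) + LENGTH(...)`.

```c
#include <stdint.h>

/* In a linker script, the suffix K scales a constant by 1024. */
#define KIB(n) ((uint32_t)(n) * 1024u)

/* Hypothetical helper mirroring ORIGIN(region) + LENGTH(region). */
static inline uint32_t region_end(uint32_t origin, uint32_t length)
{
    return origin + length;
}
```

With these, `region_end(0x20000000u, KIB(512))` gives 0x20080000 for the STM32F777, and the 20K RAM region in the question's linker script ends at 0x20005000.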

I know that the Vector Table sits right at the start of the program memory, which is located in Flash, so that means that 0x20080000 is my initial stack pointer value. This makes sense, because the Reset_Handler is the start of the program (and its vector just so happens to be the 2nd 4-byte value at the start of the Vector Table, by the way), and the first thing it does, as shown in my "startup_stm32f777xx.s" startup assembly file, is set the stack pointer (sp) to _estack:

Reset_Handler:  
  ldr   sp, =_estack      /* set stack pointer */

Furthermore, _estack is defined in my linker script as follows:

/* Highest address of the user mode stack */
_estack = ORIGIN(RAM) + LENGTH(RAM);    /* end of RAM */

So there you have it! The first 4-byte value in my Vector Table, right at the start of Flash, is set to be the initial stack pointer value, which is defined as _estack right in my linker script file, and _estack is the address at the end of my RAM, which is 0x20000000 + 512K = 0x20080000. So, it all makes sense! I've just proven I read the right value!
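To make the layout concrete, here is a small sketch of the first two Vector Table entries described above. The struct and function names are hypothetical, chosen only for this illustration; only the order of the two words comes from the figure in the programming manual.

```c
#include <stdint.h>

/* First two entries of the Vector Table, in the order described above. */
struct vector_table_head {
    uint32_t initial_sp;    /* word 0: initial stack pointer value */
    uint32_t reset_handler; /* word 1: Reset_Handler vector        */
};

/* Hypothetical helper: copy out the first two words of a vector table.
 * `table` must point to at least two readable, aligned uint32_t values. */
static inline struct vector_table_head read_vector_table_head(const uint32_t *table)
{
    struct vector_table_head head = { table[0], table[1] };
    return head;
}
```

On my board, with `extern uint32_t __flash_start__;`, calling `read_vector_table_head(&__flash_start__)` should yield an `initial_sp` of 0x20080000, matching the output above. On a PC the function can be exercised with a two-element array standing in for Flash.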

See also:

  1. [my answer] How to get value of variable defined in ld linker script from C
Michikomichon answered 11/4, 2019 at 0:22 Comment(0)
