Reading a 64-bit volatile variable on Cortex-M3

Asked 7/3, 2017 at 14:38 Answered 31/8, 2020 at 21:3

I have a 64 bit integer variable on a 32 bit Cortex-M3 ARM controller (STM32L1), which can be modified asynchronously by an interrupt handler.

volatile uint64_t v;
void some_interrupt_handler() {
    v = v + something;
}

Obviously, I need to read it in a way that prevents getting inconsistent, half-updated values.

Here is my first attempt:

static inline uint64_t read_volatile_uint64(volatile uint64_t *x) {
    uint64_t y;
    __disable_irq();
    y = *x;
    __enable_irq();
    return y;
}

The CMSIS inline functions __disable_irq() and __enable_irq() have an unfortunate side effect: they force a memory barrier on the compiler. So I've tried to come up with something more fine-grained:

static inline uint64_t read_volatile_uint64(volatile uint64_t *x) {
    uint64_t y;
    asm (   "cpsid i\n"
            "ldrd %[value], %[addr]\n"
            "cpsie i\n"
            : [value]"=r"(y) : [addr]"m"(*x));
    return y;
}

It still disables interrupts, which is not desirable, so I'm wondering if there's a way of doing it without resorting to cpsid. The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors, Third Edition, by Joseph Yiu says:

If an interrupt request arrives when the processor is executing a multiple cycle instruction, such as an integer divide, the instruction could be abandoned and restarted after the interrupt handler completes. This behavior also applies to load double-word (LDRD) and store double-word (STRD) instructions.

Does it mean that I'll be fine by simply writing this?

static inline uint64_t read_volatile_uint64(volatile uint64_t *x) {
    uint64_t y;
    asm (   "ldrd %[value], %[addr]\n"
            : [value]"=&r"(y) : [addr]"m"(*x));
    return y;
}

(Using "=&r" to work around ARM errata 602117)

Is there some library or builtin function that does the same portably? I've tried atomic_load() in stdatomic.h, but it fails with undefined reference to '__atomic_load_8'.

Indebtedness answered 7/3, 2017 at 14:38 Comment(10)

If the other side always updates all 64 bits at once, then just using ldrd should work (without messing with interrupt enable/disable), yes? Have one side use strd and the other ldrd. Or you could try strex/ldrex if you don't want to use strd/ldrd. – Norge 7/3, 2017 at 15:9

Using strd does not help when ldrd can be interrupted, and strex checking would introduce additional delays and complexities, since I'd need separate semaphores. – Indebtedness 7/3, 2017 at 15:22

well you can do some sort of a ping/pong mailbox deal where you indicate which one you read last, the interrupt modifies the other and then you swap... – Norge 7/3, 2017 at 15:47

Note that if it is an aligned access, and depending on the width of the bus, ldrd wouldn't be able to be interrupted. If it is really a 32-bit bus that is serialized somewhere, sure... reading up to see if/how ldrd is interrupted (vs. how ldm is). – Norge 7/3, 2017 at 15:49

I think only the application side would need to use ldrex/strex the interrupt could simply strd... – Norge 7/3, 2017 at 15:50

On exception return, the instruction that generated the sequence of accesses is re-executed and so any accesses that had already been performed before the exception was taken might be repeated. – Norge 7/3, 2017 at 15:51

ldrd will restart if it is interrupted, so you will never get half a value. Another mechanism is to read the high word, then the low word, then read the high word again and compare it to the first high value. If they differ, retry. Note that this only works for an interrupt that increments (or decrements) and a mainline that reads. It should work with a ring buffer as well. – Kriemhild 7/3, 2017 at 23:19

So I'm feeling left a little high and dry on this question. I thought ldrexd/strexd was the prescribed architectural way to do this on a Thumb-2 architecture. As for strexd causing "additional delays and complexity", I'm not sure what that means. Most of the time there will be no delay, and the complexity is just a few instructions to make the test and loop when there has been overlap. Besides, do you really want to tempt fate by picking some other method? – Confront 9/3, 2017 at 20:0

There is no ldrexd/strexd on armv7-m. – Indebtedness 10/3, 2017 at 8:32

@berendi -- Sorry you are so right. And that does leave you with having to create a semaphore to control access to the shared double word. But I'm not sure I would trust any technique that didn't use ldrex/strex. – Confront 10/3, 2017 at 16:59

Yes, using a simple ldrd is safe in this application since it will be restarted (not resumed) if interrupted, hence it will appear atomic from the interrupt handler's point of view.

This holds more generally for all load instructions except those that are exception-continuable, which are a very restricted subset:

only ldm, pop, vldm, and vpop can be continuable
an instruction inside an it-block is never continuable
an ldm/pop whose first loaded register is also the base register (e.g. ldm r0, { r0, r1 }) is never continuable

This gives plenty of options for atomically reading a multi-word variable that's modified by an interrupt handler on the same core. If the data you wish to read is not a contiguous array of words then you can do something like:

1:      ldrex   %[val0], [%[ptr]]       // can also be byte/halfword
        ... more loads here ...
        strex   %[retry], %[val0], [%[ptr]]
        cbz     %[retry], 2f
        b       1b
2:

It doesn't really matter which word (or byte/halfword) you use for the ldrex/strex since an exception will perform an implicit clrex.

The other direction, writing a variable that's read by an interrupt handler is a lot harder. I'm not 100% sure but I think the only stores that are guaranteed to appear atomic to an interrupt handler are those that are "single-copy atomic", i.e. single byte, aligned halfword, and aligned word. Anything bigger would require disabling interrupts or using some clever lock-free structure.
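One example of such a lock-free structure is a two-slot ("ping/pong") buffer, along the lines of the mailbox idea from the comments. The following is only a minimal sketch, assuming a single core, a single mainline writer, and readers that run only in interrupt handlers that preempt the writer. The names slot, active, shared_write and shared_read_from_isr are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names: two slots plus a one-word index naming the valid slot. */
static volatile uint64_t slot[2];
static volatile uint32_t active;   /* 0 or 1; an aligned word store is single-copy atomic */

/* Mainline writer: fill the slot the ISR is not looking at, then publish it. */
void shared_write(uint64_t value)
{
    uint32_t next = active ^ 1u;
    slot[next] = value;   /* may be split into two word stores; the ISR does not read this slot yet */
    active = next;        /* one aligned word store switches the ISR over to the new value */
}

/* ISR reader: assumed never to be preempted by the writer, so slot[active] is stable here. */
uint64_t shared_read_from_isr(void)
{
    return slot[active];
}
```

If the interrupt fires in the middle of shared_write(), active still names the old slot, so the handler sees the previous complete value rather than a torn one.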

Lineation answered 31/8, 2020 at 21:3 Comment(0)

Atomicity is not guaranteed for LDRD according to the ARMv7-M Architecture Reference Manual (section A3.5.1):

The only ARMv7-M explicit accesses made by the ARM processor which exhibit single-copy atomicity are:

• All byte transactions

• All halfword transactions to 16-bit aligned locations

• All word transactions to 32-bit aligned locations

LDM, LDC, LDRD, STM, STC, STRD, PUSH and POP operations are seen to be a sequence of 32-bit transactions aligned to 32 bits. Each of these 32-bit transactions are guaranteed to exhibit single-copy atomicity. Sub-sequences of two or more 32-bit transactions from the sequence also do not exhibit single-copy atomicity.

What you can do is use a byte to indicate to the ISR you're reading it.

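#include <stdint.h>

volatile uint8_t flag;
volatile uint64_t doubleword;

uint64_t non_isr(void) {
    uint64_t foo;
    do {
        flag = 1;             /* tell the ISR a read is in progress */
        foo = doubleword;
    } while (flag > 1);       /* the ISR ran during the read: retry */
    flag = 0;
    return foo;
}

void isr(void) {
    if (flag == 1)
        flag++;               /* tell the reader its copy may be torn */
    doubleword = new_value;   /* new_value: placeholder for whatever the ISR computes */
}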

Source (login required): http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0403e.b/index.html

Libbielibbna answered 10/4, 2017 at 5:56 Comment(3)

While the bus access performed by an ldrd is not atomic, it will be restarted if interrupted (since it is not exception-continuable like ldm), and therefore it is atomic w.r.t. interrupts hence safe in this application. – Lineation 31/8, 2020 at 20:51

@Lineation You can also use the exclusive instructions. – Libbielibbna 1/9, 2020 at 5:37

There are lots of ways to implement this interaction, but he asked whether a simple ldrd suffices and it does indeed. Using exclusives is a more general solution to make an arbitrary sequence of loads restart if interrupted, but for the special case of reading a 64-bit integer it is excessively complicated. – Lineation 2/9, 2020 at 14:58

I was also trying to use a 64-bit (2 x 32-bit) system_tick, but on an STM32L4xx (ARM Cortex-M4). I found that when I used just "volatile uint64_t system_tick", the compiler emitted an LDRD instruction, which may have been enough, since an LDRD that is interrupted after reading the first word is supposed to be restarted, so both words are read again.

I asked the tech at IAR software support and he responded that I should use C11 atomics:

#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#ifdef __STDC_NO_ATOMICS__
static_assert(__STDC_NO_ATOMICS__ != 1, "C11 atomics are required");
#endif

volatile atomic_uint_fast64_t system_tick;
/**
 * \brief Increment system_tick
 * \retval none
 */
void HAL_IncTick(void)
{
    system_tick++;
}

/**
 * \brief Read 64-bit system_tick
 * \retval system_tick
 */
uint64_t HAL_GetSystemTick(void)
{
    return system_tick;
}

/**
 * \brief Read 32 least significant bits of system_tick
 * \retval (uint32_t) system_tick
 */
uint32_t HAL_GetTick(void)
{
    return (uint32_t)system_tick;
}

But what I found was that a colossal amount of code was added to make the read "atomic".

Way back in the day of 8-bit micro-controllers, the trick was to read the high byte, read the low byte, then read the high byte until the high byte was the same twice - proving that there was no rollover created by the ISR. So if you are against disabling IRQ, reading system_tick, then enabling IRQ, try this trick:

/**
 * \brief Read 64-bit system_tick
 * \retval system_tick
 */
uint64_t HAL_GetSystemTick(void)
{
    uint64_t tick;

    do {
        tick = system_tick;
    } while ((uint32_t)(system_tick >> 32) != (uint32_t)(tick >> 32));

    return tick;
}

The idea is that if the most significant word does not roll over, then the whole 64-bit system_tick must be valid. If HAL_IncTick() did anything more than a simple increment, this assertion would not be possible.
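One caveat, as far as I can tell: the loop above is only sound if the first read of system_tick fetches the high word before the low word (or reads both in one restartable LDRD), and C does not say in which order the two halves of a volatile uint64_t are read. A minimal sketch that makes the order explicit, assuming the tick is stored as two separate 32-bit words; tick_hi, tick_lo, tick_increment and tick_read are hypothetical names, not part of the HAL:

```c
#include <stdint.h>

/* Hypothetical storage: the tick kept as two aligned 32-bit halves,
 * so that every individual read is a single word access. */
static volatile uint32_t tick_hi;
static volatile uint32_t tick_lo;

/* ISR side: a plain increment with carry into the high word. */
void tick_increment(void)
{
    uint32_t lo = tick_lo + 1u;
    tick_lo = lo;
    if (lo == 0u) {
        tick_hi = tick_hi + 1u;
    }
}

/* Mainline side: read high, then low, then high again.
 * If the high word did not change, no carry happened in between,
 * so the low word belongs to that high word. */
uint64_t tick_read(void)
{
    uint32_t hi, lo;
    do {
        hi = tick_hi;
        lo = tick_lo;
    } while (hi != tick_hi);
    return ((uint64_t)hi << 32) | lo;
}
```

As with the original loop, this relies on the ISR doing nothing but a simple increment.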

Cyte answered 25/9, 2019 at 19:37 Comment(0)
