Which variable types/sizes are atomic on STM32 microcontrollers?

Asked 12/10, 2018 at 17:43 Answered 19/10, 2018 at 7:16

Here are the data types on STM32 microcontrollers: http://www.keil.com/support/man/docs/armcc/armcc_chr1359125009502.htm.

These microcontrollers use 32-bit ARM core processors.

Which data types have automatic atomic read and atomic write access?

I'm pretty sure all 32-bit data types do (since the processor is 32-bits), and all 64-bit data types do NOT (since it would take at least 2 processor operations to read or write a 64-bit word), but what about bool (1 byte), and uint16_t/int16_t (2 bytes)?

Context: I'm sharing variables between multiple threads (single core, but multiple threads, or "tasks" as they are called, in FreeRTOS) on the STM32 and need to know if I need to enforce atomic access by turning off interrupts, using mutexes, etc.

UPDATE:

Referring to this sample code:

volatile bool shared_bool;
volatile uint8_t shared_u8;
volatile uint16_t shared_u16;
volatile uint32_t shared_u32;
volatile uint64_t shared_u64;
volatile float shared_f; // 32-bits
volatile double shared_d; // 64-bits

// Task (thread) 1
while (true)
{
    // Write to the values in this thread.
    //
    // What I write to each variable will vary. Since other threads are reading
    // these values, I need to ensure my *writes* are atomic, or else I must
    // use a mutex to prevent another thread from reading a variable in the
    // middle of this thread's writing.
    shared_bool = true;
    shared_u8 = 129;
    shared_u16 = 10108;
    shared_u32 = 130890;
    shared_f = 1083.108;
    shared_d = 382.10830;
}

// Task (thread) 2
while (true)
{
    // Read from the values in this thread.
    //
    // What thread 1 writes into these values can change at any time, so I need
    // to ensure my *reads* are atomic, or else I'll need to use a mutex to
    // prevent the other thread from writing to a variable in the midst of
    // reading it in this thread.
    if (shared_bool == whatever)
    {
        // do something
    }
    if (shared_u8 == whatever)
    {
        // do something
    }
    if (shared_u16 == whatever)
    {
        // do something
    }
    if (shared_u32 == whatever)
    {
        // do something
    }
    if (shared_u64 == whatever)
    {
        // do something
    }
    if (shared_f == whatever)
    {
        // do something
    }
    if (shared_d == whatever)
    {
        // do something
    }
}

In the code above, which variables can I do this for without using a mutex? My suspicion is as follows:

volatile bool: safe--no mutex required
volatile uint8_t: safe--no mutex required
volatile uint16_t: safe--no mutex required
volatile uint32_t: safe--no mutex required
volatile uint64_t: UNSAFE--YOU MUST USE A Critical section or MUTEX!
volatile float: safe--no mutex required
volatile double: UNSAFE--YOU MUST USE A Critical section or MUTEX!

Example critical section with FreeRTOS:

https://www.freertos.org/taskENTER_CRITICAL_taskEXIT_CRITICAL.html

// Force atomic access with these critical section atomic access guards.
taskENTER_CRITICAL();
// do the (now guaranteed to be safe) read or write here
taskEXIT_CRITICAL();
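As a minimal sketch of how such guards might wrap a 64-bit variable: the helper names `shared_u64_write()` and `shared_u64_read()` and the `ENTER_CRITICAL()`/`EXIT_CRITICAL()` macros below are hypothetical. The macros are defined as no-ops so the snippet compiles off-target; on a FreeRTOS build they would presumably map to taskENTER_CRITICAL() and taskEXIT_CRITICAL().

```c
#include <stdint.h>

/* Hypothetical stand-ins so this compiles without FreeRTOS. On target,
 * these would map to taskENTER_CRITICAL() / taskEXIT_CRITICAL(). */
#ifndef ENTER_CRITICAL
#define ENTER_CRITICAL() ((void)0)
#define EXIT_CRITICAL()  ((void)0)
#endif

static volatile uint64_t shared_u64;

/* Write all 8 bytes without another task seeing a half-updated value. */
static void shared_u64_write(uint64_t value)
{
    ENTER_CRITICAL();
    shared_u64 = value;
    EXIT_CRITICAL();
}

/* Copy the value out under the guard, then work on the local copy. */
static uint64_t shared_u64_read(void)
{
    uint64_t copy;
    ENTER_CRITICAL();
    copy = shared_u64;
    EXIT_CRITICAL();
    return copy;
}
```

Keeping the guarded region down to a single copy keeps the time with interrupts masked as short as possible.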

Related, but not answering my question:

Atomic operations in ARM
ARM: Is writing/reading from int atomic?
(My own question and answer on atomicity in 8-bit AVR [and Arduino] microcontrollers): https://mcmap.net/q/18816/-c-decrementing-an-element-of-a-single-byte-volatile-array-is-not-atomic-why-also-how-do-i-force-atomicity-in-atmel-avr-mcus-arduino
https://stm32f4-discovery.net/2015/06/how-to-properly-enabledisable-interrupts-in-arm-cortex-m/

Keefe answered 12/10, 2018 at 17:43 Comment(6)

This would be what the ARM instruction set manual for your particular chip is for? – Ruggles 12/10, 2018 at 18:1

Possible duplicate of ARM: Is writing/reading from int atomic? – Purpleness 12/10, 2018 at 18:28

You have to look at the assembly code. – Ribald 12/10, 2018 at 18:35

Are you trying to defend against two cores operating on the same data, or being interrupted in the middle of a write to yield to the other thread on the same core? – Purpleness 12/10, 2018 at 18:39

The latter: "being interrupted in the middle of a write to yield to the other thread on the same core". – Keefe 12/10, 2018 at 18:44

Consider it to be code running on an STM32F767ZI with FreeRTOS to handle multi-threading. – Keefe 12/10, 2018 at 18:47

For the final, definitive answer to this question, jump straight down to the section below titled "Final answer to my question".

UPDATE 30 Oct. 2018: I was accidentally referencing the (slightly) wrong documents (but which said the exact same thing), so I've fixed them in my answer here. See "Notes about the 30 Oct. 2018 changes" at bottom of this answer for details.

I definitely don't understand every word here, but the ARM v7-M Architecture Reference Manual (Online source; PDF file direct download) (NOT the Technical Reference Manual [TRM], since it doesn't discuss atomicity) validates my assumptions:

So...I think my 7 assumptions at the bottom of my question are all correct. [30 Oct. 2018: Yes, that is correct. See below for details.]

UPDATE 29 Oct. 2018:

One more little tidbit: FreeRTOS is sure on this

...and it's used in thousands of safety-critical applications world-wide.

Richard Barry, FreeRTOS founder, expert, and core developer, states in tasks.c in two different places (ex: here in the official FreeRTOS V11.0.1 release) that:

/* A critical section is not required because the variables are of type BaseType_t. */

And, for most (all?) 32-bit microcontrollers, such as STM32F4 ARM Cortex-M4 with floating point unit (hence the folder name ARM_CM4F), you can see here in FreeRTOS-Kernel/portable/GCC/ARM_CM4F/portmacro.h that BaseType_t is typedefed as long, and UBaseType_t is typedefed as unsigned long:

typedef long             BaseType_t;
typedef unsigned long    UBaseType_t;

...and in the code where the above "critical section is not required" comments are, the variables in question are of type UBaseType_t. Furthermore, long for these chips is int32_t (4 bytes), and unsigned long is uint32_t (4 bytes). So, this means that Richard Barry is saying that 4-byte reads and writes are atomic on these 32-bit microcontrollers. This means that he, at least, is 100% sure 4-byte reads and writes are atomic on STM32. He doesn't mention smaller-byte reads, but for 4-byte reads he is conclusively sure. I have to assume that 4-byte variables being the native processor width, and also, word-aligned, is critical to this being true.

Note that the FreeRTOS version number is found in task.h, here. Here are the two code and comment snippets from tasks.c in FreeRTOS V11.0.1 where he states that a critical section is not required because the variables are of type BaseType_t (or UBaseType_t):

void vTaskSuspendAll( void )
{
    traceENTER_vTaskSuspendAll();

    #if ( configNUMBER_OF_CORES == 1 )
    {
        /* A critical section is not required as the variable is of type
         * BaseType_t.  Please read Richard Barry's reply in the following link to a
         * post in the FreeRTOS support forum before reporting this as a bug! -
         * https://goo.gl/wu4acr */

        /* portSOFTWARE_BARRIER() is only implemented for emulated/simulated ports that
         * do not otherwise exhibit real time behaviour. */
        portSOFTWARE_BARRIER();

        /* The scheduler is suspended if uxSchedulerSuspended is non-zero.  An increment
         * is used to allow calls to vTaskSuspendAll() to nest. */
        ++uxSchedulerSuspended;

        /* Enforces ordering for ports and optimised compilers that may otherwise place
         * the above increment elsewhere. */
        portMEMORY_BARRIER();
    }
...

UBaseType_t uxTaskGetNumberOfTasks( void )
{
    traceENTER_uxTaskGetNumberOfTasks();

    /* A critical section is not required because the variables are of type
     * BaseType_t. */
    traceRETURN_uxTaskGetNumberOfTasks( uxCurrentNumberOfTasks );

    return uxCurrentNumberOfTasks;
}

The short goo.gl link in the first comment above leads to this full link: FreeRTOS Support Archive: Concerns about the atomicity of vTaskSuspendAll(). The key here is that Richard is relying on each individual 4-byte read or write being naturally atomic on this hardware.

Final answer to my question: all types <= 4 bytes (all bolded types in the list of 9 rows below) are atomic.

Furthermore, upon closer inspection of the TRM on p141 as shown in my screenshot above, the key sentences I'd like to point out are:

In ARMv7-M, the single-copy atomic processor accesses are:
• all byte accesses.
• all halfword accesses to halfword-aligned locations.
• all word accesses to word-aligned locations.

And, per this link, the following is true for "basic data types implemented in ARM C and C++" (ie: on STM32):

bool/_Bool is "byte-aligned" (1-byte-aligned)
int8_t/uint8_t is "byte-aligned" (1-byte-aligned)
int16_t/uint16_t is "halfword-aligned" (2-byte-aligned)
int32_t/uint32_t is "word-aligned" (4-byte-aligned)
int64_t/uint64_t is "doubleword-aligned" (8-byte-aligned) <-- NOT GUARANTEED ATOMIC
float is "word-aligned" (4-byte-aligned)
double is "doubleword-aligned" (8-byte-aligned) <-- NOT GUARANTEED ATOMIC
long double is "doubleword-aligned" (8-byte-aligned) <-- NOT GUARANTEED ATOMIC
all pointers are "word-aligned" (4-byte-aligned)

This means that I now have and understand the evidence I need to conclusively state that all bolded rows just above have automatic atomic read and write access (but NOT increment/decrement of course, which is multiple operations). This is the final answer to my question. The only exception to this atomicity might be in packed structs I think, in which case these otherwise-naturally-aligned data types may not be naturally aligned.
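To illustrate the packed-struct caveat, here is a small sketch, assuming a GCC- or Clang-style compiler (`__attribute__((packed))` is a compiler extension) and C11 `_Static_assert`. The struct names and the `is_aligned()` helper are hypothetical, made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler pads so that u32 sits on a 4-byte boundary. */
struct natural_example {
    uint8_t  u8;
    uint32_t u32;
};

/* Packed layout (GCC/Clang extension): no padding, so u32 lands at offset 1. */
struct __attribute__((packed)) packed_example {
    uint8_t  u8;
    uint32_t u32;
};

/* Compile-time checks: the build fails if the layout is not as described. */
_Static_assert(offsetof(struct natural_example, u32) % 4 == 0,
               "u32 should be word-aligned in the natural struct");
_Static_assert(offsetof(struct packed_example, u32) == 1,
               "u32 should be misaligned in the packed struct");

/* Returns 1 if the object at p is aligned to `alignment` bytes, else 0. */
static int is_aligned(const volatile void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

A word access to `u32` in the packed struct is not to a word-aligned location, so the single-copy atomicity rule quoted above would not cover it.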

Also note that when reading the Technical Reference Manual, "single-copy atomicity" apparently just means "single-core-CPU atomicity", or "atomicity on a single-CPU-core architecture." This is in contrast to "multi-copy atomicity", which refers to a "multiprocessing system", or multi-core-CPU architecture. Wikipedia states "multiprocessing is the use of two or more central processing units (CPUs) within a single computer system" (https://en.wikipedia.org/wiki/Multiprocessing).

My architecture in question, STM32F767ZI (with ARM Cortex-M7 core), is a single-core architecture, so apparently "single-copy atomicity", as I've quoted above from the TRM, applies.

Notes about the 30 Oct. 2018 changes:

I had this reference: ARMv7 TRM (Technical Reference Manual). However, this is wrong in 2 ways: 1) This isn't a TRM at all! The TRM is a short (~200 pgs) Technical Reference Manual. This, however, is the "Architecture Reference Manual", NOT the TRM. It is a much longer and more generic document, as Architecture reference manuals are on the order of ~1000~2000 pgs it turns out. 2) This is for the ARMv7-A and ARMv7-R processors, but the manual I need for the STM32 mcu in question is for the ARMv7-M processor.
Here is the correct link to the ARM Cortex-M7 Processor Technical Reference Manual. Online: https://developer.arm.com/docs/ddi0489/latest. PDF: https://static.docs.arm.com/ddi0489/d/DDI0489D_cortex_m7_trm.pdf.
The correct TRM just above, on p99 (5-36) says, "For more information on atomicity, see the ARM®v7-M Architecture Reference Manual." So, here is that manual. Online download link: https://developer.arm.com/products/architecture/cpu-architecture/m-profile/docs/ddi0403/latest/armv7-m-architecture-reference-manual. PDF: https://static.docs.arm.com/ddi0489/d/DDI0489D_cortex_m7_trm.pdf. It discusses atomicity on p79-80 (A3-79 to A3-80).

To create atomic access guards (usually by turning off interrupts when reads and writes are not atomic) see:

Keefe answered 12/10, 2018 at 19:26 Comment(27)

In a single-core environment, execution can't be interrupted in the middle of an instruction, so any C construct that builds down to one instruction is atomic. 32-bit ARM doesn't have single instructions that can manipulate more than 32 bits of memory at once, so that sets an obvious upper bound on what can be atomic: notably, 64-bit manipulations can't. – Purpleness 12/10, 2018 at 19:37

However, there are still C operations that will compile to more than one instruction even if they manipulate 32 bits or less (like a += 1 with int a), and you need to be careful with these. A less obvious example is if you use a structure with unaligned fields: your compiler will need to generate at least two loads and two stores to handle reading/writing them. It would also be possible that copying a struct that fits in 32 bits could use more than one instruction at some optimization levels. For numeric variables, neither is usually a concern, though. – Purpleness 12/10, 2018 at 19:39

@Purpleness wrong. Some of the instructions can be interrupted for example division. – Whitefaced 13/10, 2018 at 14:28

@P__J__, teach me something and show me an architecture that does that. – Purpleness 13/10, 2018 at 14:36

@P__J__, what exactly happens when a division is interrupted? Do you get corrupted state, or is state rolled back such that being interrupted in the middle is completely indistinguishable from being interrupted just before? – Purpleness 13/10, 2018 at 17:55

TRM - Technical Reference Manual – Whitefaced 13/10, 2018 at 17:58

the stm32 is most definitely not an ARMv7-AR...you are looking at the wrong manual. – Georgia 30/10, 2018 at 17:43

@old_timer, that's probably the most useful feedback I've received on this question thus far. :) Thank you for pointing that out. I'm going to see if I can find the right TRM now. It looks like you downvoted my question. Please explain why. – Keefe 30/10, 2018 at 18:4

That's the most unhelpful thing I've ever heard. Sounds like it's coming from an old-timer. There are something like 6000+ pgs of documentation for this chip. This is exactly what Stack Overflow is for. I'm not afraid to read a manual, but it's views like yours that make Stack Overflow an elitist place instead of a place where valuable and hard-to-find knowledge can be passed on. When I am an old-timer someday, and someone puts effort into this like I have, I will give them a link to a manual, provide a helpful response, and upvote their thoughtful question. – Keefe 30/10, 2018 at 18:33

@old_timer, I've updated my answer with the proper links to the correct Technical Reference Manual (which, it turns out, I don't need in this case), and the correct Architecture Reference Manual. I hope you reconsider your votes on this question and answer, and in the future, vote based on correctness, not on ability to decipher dozens of manuals, knowledge of which manuals exist, and knowledge of where to read in the 6000~8000 pgs of cryptic manuals. Prior to asking this question I was neither aware of ARM TRMs nor Architecture Manuals, & I had already downloaded 6000 pgs of STM32 manuals. – Keefe 30/10, 2018 at 20:1

Links are bad in both (stackoverflow) questions and answers as they change over time relative to the question or answer. – Georgia 30/10, 2018 at 20:55

as you can see from a simple mouser search or st or other the stm32 family covers from the cortex-m0,m0+,m3,m4,m7 and soon m23 and on and on...The m0 and m0+ are armv6-m based which you know from the documentation for the part you are using if you have one of them and the cortex-m3/m4/m7 are armv7-m based as you know from the documentation from st on the part you are using (should never start without the documentation for the part). This is advice if I simply gave you a list of instructions thats like giving you a fish without teaching how to catch one. – Georgia 30/10, 2018 at 20:59

so chip docs usually two minimum with various names based on the vendor, datasheet usually has at least the electrical and pinout, sometimes has the programmer info as well. sometimes others are called reference manuals or users guides. arm based parts like the huge stm32 family will tell you what core is used, you go to arms website or sometimes at st, and get the trm, in that it tells you which architecture and you get that document, bare minimum set of documents before day one of programming one of these boards. – Georgia 30/10, 2018 at 21:1

99.9999% of bare metal programming is reading documentation, if you want to do this work then 6000 pages is nothing you just learn to search through it and narrow in on what you are after, sometimes that is faster than just reading it..(usually for well written manuals) – Georgia 30/10, 2018 at 21:2

Lastly assume that no processor or no modern processor has atomic access. Then if you find one then good for you. Also this is purchased IP, so most of the logic you are asking about is not arms it is ST or other purchased IP they used to interface to the arm's busses. The arm bus documentation is on arms website look for axi/amba/ahb, the trm should hopefully say which flavor in a vague way but the busses are mostly the same in concept, send out an address wait for that to be acked then either data comes back eventually or you then write on the write bus – Georgia 30/10, 2018 at 21:6

the memories, flash, peripherals are all chip vendor not arm, that doesnt mean arm cant put something into their logic to isolate transactions, and some have a feature for this usually for bit modification in a port for example. But this is not available in all cores and the chip vendor can choose to not enable this feature. At least the feature I am talking about which is not the one you found. – Georgia 30/10, 2018 at 21:7

There is no reason to expect there to be a global answer to your question that covers such a broad range of different products that span what a decade? The specific chip should have been part of the original question. – Georgia 30/10, 2018 at 21:10

lastly the chip vendors get the source code to the core, so in addition to the features that are documented as options for that core, the chip vendors may or may not make modifications. At the end of the day it comes down to what do you think you need atomic functions for and maybe you dont. (note ldrex/strex are not swp replacements, be very careful reading on how to use them, in this day and age atomic operations are bad design (performance and other negative affects.), you solve the problem other ways). – Georgia 30/10, 2018 at 21:15

sadly it is rare that the chip vendors tell you specifically which version of the core they used as you want the right rev of documentation as well as newer revs to compare with. With most of these cortex-ms there are cpuid registers that along with the arm documentation can tell you which core is really there, now what features the vendor compiled in are not necessarily detectable. – Georgia 30/10, 2018 at 21:16

at the end of the day though your question sounds like a freertos question not an arm/processor question. – Georgia 30/10, 2018 at 21:20

Let us continue this discussion in chat. – Keefe 1/11, 2018 at 23:31

Despite the confusing name, "single-copy atomicity" really does mean it's atomic across all cores: when you store, a load on any other core will return either the old or the new value, nothing else. I don't know how they came up with that term; I haven't seen it elsewhere. It's in contrast to "multi-copy atomicity" which would be more or less a global total order on all stores, and which they apparently discuss in the manual only for the purpose of saying "we don't do that". – Rachelrachele 28/3, 2022 at 3:41

Thanks for your thorough explanation! Is it the case too with bitfields? Imagine I wanna set one or several bits from my bitfield, is the operation guaranteed to be atomic if the bitfield is aligned and of correct dimension? – Plymouth 4/4, 2023 at 10:1

@Getter, I don't know. I've never really used bitfields. I just use regular types in structs. If I want to toggle bits I just use macros which do bitshifting and stuff, like bitRead(), bitSet(), bitClear(), and bitWrite(), shown in my answer here. – Keefe 4/4, 2023 at 16:55

@Getter...and those operations are not atomic. They must be protected with atomic access guards, just like increment (++) and decrement (--) operations, unless you use the C _Atomic types or C++ std::atomic<> types, which make increment and decrement atomic. In C++, std::atomic<> types also have atomic |= and &= operations (see here), but I'm not sure about that in C. – Keefe 4/4, 2023 at 17:21

(Update: for C it may depend on the compiler): "Implementations are recommended to ensure that the representation of _Atomic(T) in C is same as that of std::atomic<T> in C++ for every possible type T. The mechanisms used to ensure atomicity and memory ordering should be compatible." (see: en.cppreference.com/w/cpp/atomic/atomic) – Keefe 4/4, 2023 at 17:27

Thanks a lot GabrielStaples! – Plymouth 5/4, 2023 at 6:27

It depends on what you mean by atomic.

If it is not a simple load or store operation, for example

a += 1;

then no type is atomic.
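A rough sketch of why, and of one possible alternative: the code below spells out the separate steps behind `a += 1` and contrasts them with C11's `atomic_fetch_add`. It assumes a toolchain that provides `<stdatomic.h>`; the variable and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

static volatile uint32_t plain_counter;      /* loads and stores are atomic, += is not */
static atomic_uint_fast32_t atomic_counter;  /* read-modify-write is atomic */

/* What `plain_counter += 1` amounts to: three separate steps. A context
 * switch between any two of them can lose another task's update. */
static void plain_increment(void)
{
    uint32_t tmp = plain_counter;  /* 1. read   */
    tmp = tmp + 1;                 /* 2. modify */
    plain_counter = tmp;           /* 3. write  */
}

/* C11 alternative: the whole read-modify-write is one indivisible operation. */
static uint32_t atomic_increment(void)
{
    /* atomic_fetch_add returns the value held before the addition. */
    return (uint32_t)atomic_fetch_add(&atomic_counter, 1) + 1;
}
```

Where `<stdatomic.h>` is not available, wrapping the increment in a critical section is the usual fallback.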

If it is a simple store or load operation, 32-bit, 16-bit and 8-bit data types are atomic. If the value in the register has to be normalized, 8- and 16-bit stores and loads may not be atomic.

If your hardware supports bit-banding, then bit operations (set and reset) in the memory areas that support bit-banding are atomic when bit-banding is used.
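For reference, a sketch of the address arithmetic behind bit-banding, using the SRAM bit-band region and alias base addresses that ARM documents for Cortex-M3/M4. Not every core or part implements bit-banding, so check your chip's reference manual first. The helper name `bitband_alias_addr()` is hypothetical.

```c
#include <stdint.h>

#define SRAM_BITBAND_BASE   0x20000000u  /* start of the SRAM bit-band region */
#define SRAM_BITBAND_ALIAS  0x22000000u  /* start of its alias region */

/* Hypothetical helper: compute the alias-region word address that maps to
 * bit `bit` (0..7) of the byte at `byte_addr` in the SRAM bit-band region. */
static uint32_t bitband_alias_addr(uint32_t byte_addr, uint32_t bit)
{
    uint32_t byte_offset = byte_addr - SRAM_BITBAND_BASE;
    return SRAM_BITBAND_ALIAS + (byte_offset * 32u) + (bit * 4u);
}
```

On a target that supports it, writing 1 or 0 to `*(volatile uint32_t *)bitband_alias_addr(addr, bit)` sets or clears that single bit. The address is only meaningful on such hardware, so it should not be dereferenced elsewhere.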

Note.

If your code does not allow unaligned operations, 8- and 16-bit operations may not be atomic.

Whitefaced answered 12/10, 2018 at 17:54 Comment(11)

Thanks for your answer. Please see my updated question and see if you can verify my suspicions more explicitly. – Keefe 12/10, 2018 at 18:24

Incidentally, a += 1 is two operations and is not atomic. – Purpleness 12/10, 2018 at 18:28

Agreed. I learned this the hard way a few years back on an 8-bit AVR processor by incrementing (not an atomic operation) an otherwise atomic-read-write-capable 8-bit variable. – Keefe 12/10, 2018 at 18:29

@Purpleness no, only if the operation is RMW, otherwise it is atomic. It may be not coherent (cache) but atomic. – Whitefaced 12/10, 2018 at 18:30

@Purpleness Incidentally, a+=1 is at least three operations, not two. – Whitefaced 12/10, 2018 at 18:34

If that operation was atomic, multiple cores attempting it at the same time would succeed. That's not the case, if you have two cores doing this in a loop you are certain to lose some increments. – Purpleness 12/10, 2018 at 18:36

In addition to that, it's impossible to have unaligned 8-bit accesses on ARM. – Purpleness 12/10, 2018 at 18:36

As @Purpleness says, a += 1 is not atomic. Here's my previous experience with that one: stackoverflow.com/questions/36381932/… – Keefe 12/10, 2018 at 18:40

Incrementing/decrementing is never atomic: https://mcmap.net/q/18816/-c-decrementing-an-element-of-a-single-byte-volatile-array-is-not-atomic-why-also-how-do-i-force-atomicity-in-atmel-avr-mcus-arduino – Keefe 12/10, 2018 at 18:43

@Purpleness they are atomic. The other core has to wait for the access. The problem is coherence, as the cores work on the cached data. This is another problem, and other measures have to be taken. But it is outside the scope of this question. – Whitefaced 12/10, 2018 at 19:0

@P__J__, feel free to expand your answer to be "outside the scope of the question," as you see it. The more knowledge you can provide, the better. – Keefe 12/10, 2018 at 22:7

Atomic "arithmetic" can be done in CPU core registers!

It can operate on any type from one to four bytes, depending on the architecture and instruction set.

BUT modifying any variable located in memory takes at least 3 steps: RMW = Read memory to register, Modify register, and Write register to memory.

Therefore atomic modification is possible only if you control the use of the CPU registers, which means using pure assembler rather than a C or C++ compiler.

When you use a C/C++ compiler, it places global or global static variables in memory, so C/C++ doesn't provide any atomic actions or types.

Note: you can use, for example, the "FPU registers" for atomic modification (if you really need it), but you must hide from the compiler and the RTOS that the architecture has an FPU.

Rotten answered 19/10, 2018 at 7:16 Comment(0)
