Unaligned access causes error on ARM Cortex-M4

Asked 16/8, 2013 at 8:39 Answered 14/11, 2022 at 1:42

Solved c++ c arm memory-alignment cortex-m

I have an object whose address is not 4-byte aligned. This causes a HardFault in the CPU when an STRD instruction stores two registers.

This is the generated code:

   00000000 <_ZN8BaseAreaC1EPcmm>:
   0:   b510            push    {r4, lr}
   2:   4604            mov     r4, r0
   4:   6042            str     r2, [r0, #4]
   6:   e9c4 3102       strd    r3, r1, [r4, #8]
   a:   2001            movs    r0, #1
   c:   7420            strb    r0, [r4, #16]
   e:   b921            cbnz    r1, 1a <_ZN8BaseAreaC1EPcmm+0x1a>

These are the register values when execution is at line "4: 6042...":

R0   08738B82  R8          0  
R1   08738BAE  R9          0  
R2          0  R10  082723E0  
R3       2FCC  R11         0  
R4   08738B82  R12         0  
R5   20007630  R13  2000CB38

As shown, the base register for the store instructions is not 4-byte aligned. The instruction STR r2, [r0, #4] executes fine, but the next instruction, STRD r3, r1, [r4, #8], causes a HardFault. If I manually change register R4 to 08738B80, it does not HardFault.

This is the C++ code that generates the above asm:

BaseArea::BaseArea(char * const pAddress, unsigned long startOffset, unsigned long endOffset) : 
m_pAddress(pAddress), m_start(startOffset), m_end(endOffset), m_eAreaType(BASE_AREA) {

And m_start is the first variable in the class and has the same address as this (0x08738B82), m_end follows after on 0x08738B86.

How do I get the object aligned on a 4-byte boundary? Does anyone have another solution to this?

Zelikow answered 16/8, 2013 at 8:39 Comment(4)

Are you actually programming in assembler, or is this code generated by e.g. a C compiler? – Ragwort 16/8, 2013 at 8:42

It may also help if you tell us what compiler you are using (e.g. gcc, armcc, etc) – Herbivorous 16/8, 2013 at 8:47

Can you please post the structure of BaseArea, and also where the constructor is being called? – Herbivorous 16/8, 2013 at 9:32

How are you instantiating the object? Maybe it's part of a packed struct or something like that? Otherwise I think this registers as a toolchain bug, because the language guarantees that objects will be allocated at addresses that fulfill their alignment requirements (except for over-aligned objects but that is not the case here). In any case you can force a specific alignment using alignas (C++11) or a compiler specific equivalent when creating the object. – Gaskin 13/2, 2018 at 10:5
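A minimal sketch of the alignas suggestion above, assuming the object is constructed into a raw byte buffer with placement new. `Area`, `storage`, and `make_area` are hypothetical names for illustration, not the asker's actual BaseArea code:

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical stand-in for the question's class (not the real BaseArea).
struct Area {
    unsigned long start;
    unsigned long end;
};

// A plain unsigned char array only guarantees 1-byte alignment.
// alignas(Area) raises it to whatever alignment Area requires.
alignas(Area) unsigned char storage[sizeof(Area)];

// Construct an Area inside the aligned buffer with placement new.
inline Area* make_area(unsigned long start, unsigned long end) {
    // Sanity check: the buffer really satisfies Area's alignment.
    assert(reinterpret_cast<std::uintptr_t>(storage) % alignof(Area) == 0);
    return new (storage) Area{start, end};
}
```

If the address instead comes from pointer arithmetic on a larger block, alignas on the block alone is not enough; the computed offset also has to be a multiple of the required alignment.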

On ARM-based systems you frequently cannot address a 32-bit word that is not aligned to a 4-byte boundary (as your error is telling you). On x86 you can access non-aligned data, however there is a huge hit on performance. Where an ARM part does support unaligned accesses (e.g. single word normal load), there is a performance penalty and there should be a configurable exception trap.

Example of boundary error on ARM (here), TLDR: storing a pointer to an unsigned char and then attempting to convert it to a double * (a pointer to double).

To solve your problem, you would need to request a block of memory that is 4-byte aligned, copy the non-aligned bytes into it, and pad it as needed so that the data is 4-byte aligned (that is, perform data structure alignment manually). Then you can interpret the object as 4-byte aligned at its new address.
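One way to sketch the copy step, assuming the data is a small trivially copyable record. `Range` and `load_range` are hypothetical names, and the two-field layout is an assumption made only for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical two-field record, used only for illustration.
struct Range {
    std::uint32_t start;
    std::uint32_t end;
};

// Copy a Range out of a byte pointer that may not be 4-byte aligned.
// std::memcpy places no alignment requirement on its arguments; the
// local object it fills is aligned by the compiler.
inline Range load_range(const unsigned char* src) {
    assert(src != nullptr);
    Range r;
    std::memcpy(&r, src, sizeof r);
    return r;
}
```

The returned copy can then be used with ordinary member access, since it lives at a properly aligned address.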

From TurboJ in comments, the explicit error:

Cortex-M3 and M4 allow unaligned access by default. But they do not allow unaligned access with the STRD instruction, hence the fault.

You may also find it helpful to look into this for forcing data structure alignment on ARM.

Ephram answered 16/8, 2013 at 8:39 Comment(9)

on x86 the performance hit is only when your access crosses a 64B line boundary (or - god forbid - a page boundary). Merely accessing a couple of unaligned bytes inside a line doesn't matter, you just get the entire line cached. – Malinin 16/8, 2013 at 9:46

@Leeor, I put my initial answer as a community wiki since ARM is not a field I am too familiar with (letting other people contribute towards one collective, strong answer). There are quite a few details I am sure that I am overlooking or not stating, feel free to edit the original post respectively. – Ephram 16/8, 2013 at 9:49

Thanks! The problem was that the unaligned address originated from a number of calculations that included sizeof() and the address was aligned to 2-byte. I will modify that part of the code to align on 4-byte. – Zelikow 16/8, 2013 at 13:56
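A minimal sketch of rounding a computed address up to a 4-byte boundary, which is one way to make the fix described in the comment above. `align_up` is a hypothetical helper name, and the sketch assumes the alignment is a power of two:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round addr up to the next multiple of alignment.
// alignment must be a nonzero power of two.
inline std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

With the address from the question, align_up(0x08738B82, 4) yields 0x08738B84, while an already aligned address such as 0x08738B80 is returned unchanged.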

I'm still curious why the first STR with one register works but not the STR with 2 registers. Any ideas on that? Are there any compiler options for these alignment issues? – Zelikow 16/8, 2013 at 13:58

@Leeor: OP stated that this is code generated by a C++ compiler, so I'm not sure how that last link in your answer is relevant. – Hermelindahermeneutic 16/8, 2013 at 14:15

Cortex-M3 and M4 allow unaligned access by default. But they do not allow unaligned access with the STRD instruction, hence the fault. – Connally 16/8, 2013 at 15:3

ARMs are not microcoded, x86 have been and/or are. arm and x86 are going to have same/similar punishments for unaligned accesses, in both cases it depends on the bus width as to whether it has to do one or two complete cycles. The cache isn't the issue with the performance hit, cached or not, arm or x86, the performance hit is there if two transfers are required rather than one. The cache can simply multiply the punishment by some amount. – Bicollateral 16/8, 2013 at 18:26

@dwelch - the penalty is not merely doing another access, in some cases a split makes some internal checks more complicated (coherence, forwarding, locking, etc..), so an out-of-order CPU (say, an i7) may penalize these loads to be performed in-order. That's a significant problem as it serializes otherwise parallel operations. Arm-M3/4 won't have this specific problem (vs. properly aligned accesses) – Malinin 21/8, 2013 at 18:58

there are penalties, some worse than others depending on the architecture/platform and the situation at the time of the instruction/transfer. – Bicollateral 21/8, 2013 at 19:8

Following is true for ARM architecture at least (verified on cortex M0):

When using load and store instructions, the address we access must be divisible by the number of bytes we are trying to read or write, or we will get a hard fault exception.

eg:

LDR r0, =0x1001
LDR r1, [r0]

The second line in the above code will cause a hard fault, since we are trying to read 4 bytes but the memory address is not divisible by 4.

If we change the second line in the above code to the following:

LDRB r1, [r0]   ; Load 1 byte from address

The above line will not produce a hard fault, since we are trying to access 1 byte (1 byte can be accessed from any memory location).

Also notice the following example:

LDR r0, =0x1002
LDRH r1, [r0]   ; Load half word from 0x1002

The above line will not produce a hard fault, since the memory access is 2 bytes and the address is divisible by 2.
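The rule stated in this answer can be sketched as a simple divisibility check. This is only a host-side model of the rule as described here for Cortex-M0, not a query of the hardware, and `is_naturally_aligned` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// True when addr is a multiple of access_size (1, 2, or 4 bytes in the
// examples above). Models the "address divisible by access size" rule.
inline bool is_naturally_aligned(std::uintptr_t addr, std::uintptr_t access_size) {
    assert(access_size != 0);
    return addr % access_size == 0;
}
```

Applied to the three examples: a 4-byte access at 0x1001 fails the check, while a 1-byte access at 0x1001 and a 2-byte access at 0x1002 both pass.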

Ramonramona answered 26/8, 2014 at 2:54 Comment(1)

Looks like, for Cortex-M3 at least, LDR and STR instructions support unaligned access to non-word-aligned addresses and generate alignment faults only when the UNALIGN_TRP bit is '1' in the Configuration Control Register. – Globular 25/7, 2015 at 12:40

As you've discovered, Cortex-M4 supports 4-byte unaligned access but not 8-byte unaligned access. The latter is explained in the documentation of the UFSR.UNALIGNED bit:

UNALIGNED - Indicates an unaligned access operation occurred. Unaligned multiple word accesses, such as accessing a uint64_t that is not 8-byte aligned, will always generate this fault. With the exception of Cortex-M0 MCUs, whether or not unaligned accesses below 4 bytes generate a fault is also configurable.

The 8-byte access can be an STRD instruction (as in your example) or simply accessing a uint64_t.

Reticule answered 14/11, 2022 at 1:42 Comment(0)
