Do the ARM instructions ldrex/strex have to operate on cache aligned data?


On Intel, the arguments to CMPXCHG must be cache line aligned (since Intel uses MESI to implement CAS).

On ARM, ldrex and strex operate on exclusive reservation granules.

To be clear, does this then mean on ARM the data being operated upon does not have to be cache line aligned?

Sloppy answered 8/7, 2012 at 12:23 Comment(0)


It says so right in the ARM Architecture Reference Manual A.3.2.1 "Unaligned data access". LDREX and STREX require word alignment. Which makes sense, because an unaligned data access can span exclusive reservation granules.
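To make the alignment argument concrete, here is a minimal sketch in C. It assumes the granule size is a power of two, 2^a bytes with a between 3 and 11, as in the manual's tagged-address definition quoted in the comments of this thread. The function names and the `erg_bits` parameter are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when addr is aligned to a 32-bit ARM word (two low bits clear). */
static bool is_word_aligned(uintptr_t addr)
{
    return (addr & 0x3u) == 0;
}

/* Tag of an address: the address with its low erg_bits bits dropped.
 * erg_bits is a hypothetical name for the manual's "a" (3..11). */
static uintptr_t erg_tag(uintptr_t addr, unsigned erg_bits)
{
    assert(erg_bits >= 3 && erg_bits <= 11);
    return addr >> erg_bits;
}

/* True when a size-byte access starting at addr touches two granules. */
static bool spans_granules(uintptr_t addr, size_t size, unsigned erg_bits)
{
    assert(size > 0);
    return erg_tag(addr, erg_bits) != erg_tag(addr + size - 1, erg_bits);
}
```

With the smallest granule (8 bytes, `erg_bits` of 3), a word-aligned 4-byte access never spans two granules, while an unaligned one such as address 0x1006 does.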

Sorensen answered 8/7, 2012 at 14:4 Comment(12)

I read ERG length is between 8 and 2048 bytes, in multiples of two. If ERG length is say 10 bytes, you would cross the ERG boundary with an aligned access. Is ERG length something other than multiples of two? – Sloppy 8/7, 2012 at 14:18

word alignment and cache line aligned are two different things – Hegelian 8/7, 2012 at 14:30

in the x86 world a "word" is 16 bits, two bytes. in arm a "word" is 32 bits, so word aligned means the two lsbits of the address are zero – Hegelian 8/7, 2012 at 14:31

@BlankXavier A.3.4.3 says that the ERG size is a power of two. – Sorensen 8/7, 2012 at 14:40

@dwelch: on Intel I cache-line align the CAS targets and pad their cache line so they're not disturbed by other activity nor do they disturb others. On ARM, I was doing the same (force of habit) but then ran into the problem of needing to align against cache-line AND ERG boundary, and wanting to compute that value in a #define (which is impossible) so people could if they wished use the stack for allocation. My concern here really is not normal alignment (e.g. word alignment) but cache-line alignment (a la Intel): is it necessary on ARM? The answer is no (although word alignment is). – Sloppy 8/7, 2012 at 14:46

@Chen: can you provide a ref to your source? I have this link infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/… where I find "The ERG is implementation defined, in the range 8-2048 bytes, in multiples of two bytes." – Sloppy 8/7, 2012 at 14:48

@BlankXavier I thought I already gave the reference: A.3.4.3 of the ARM Architecture Reference Manual. "Tagged_address = Memory_address[31:a]. The value of a in this assignment is IMPLEMENTATION DEFINED, between a minimum value of 3 and a maximum value of 11. The size of the tagged memory block is called the Exclusives Reservation Granule." – Sorensen 8/7, 2012 at 15:2

@BlankXavier Finding a URL for the ARM Architecture Reference Manual is left as an exercise. – Sorensen 8/7, 2012 at 15:3

@Chen: I think I have the URL for the ARM ARM on arm.com - the reason I ask for a ref is because it is not obvious how to find that subsection, or that it exists in the on-line version; and a google.com site:arm.com search does not find that string or substrings of it. – Sloppy 8/7, 2012 at 15:6

@BlankXavier I had no trouble searching on the text. – Sorensen 8/7, 2012 at 15:12

@Chen: I limited that particular substring search to arm.com... :-/ also I was looking for the ARM ARM; I saw when searching references to the errata but dismissed them... anyways, I have it now. Thank you! – Sloppy 8/7, 2012 at 15:18

there are multiple arm-arms, should use the one most closely related to the family/core. Also will need the trm, and the amba/axi spec. start by searching for ldrex or strex (in the arm arm or trm) and then maybe the word exclusive or shared. – Hegelian 8/7, 2012 at 15:36


Exclusive access restrictions

The following restrictions apply to exclusive accesses:

• The size and length of an exclusive write with a given ID must be the same as the size and length of the preceding exclusive read with the same ID.

• The address of an exclusive access must be aligned to the total number of bytes in the transaction.

• The address for the exclusive read and the exclusive write must be identical.

• The ARID field of the read portion of the exclusive access must match the AWID of the write portion.

• The control signals for the read and write portions of the exclusive access must be identical.

• The number of bytes to be transferred in an exclusive access burst must be a power of 2, that is, 1, 2, 4, 8, 16, 32, 64, or 128 bytes.

• The maximum number of bytes that can be transferred in an exclusive burst is 128.

• The value of the ARCACHE[3:0] or AWCACHE[3:0] signals must guarantee that the slave that is monitoring the exclusive access sees the transaction. For example, an exclusive access being monitored by a slave must not have an ARCACHE[3:0] or AWCACHE[3:0] value that indicates that the transaction is cacheable.

Failure to observe these restrictions causes Unpredictable behavior.

The above is from the AMBA/AXI spec. You will find that AWLOCK/ARLOCK is ignored by some vendors (meaning ldrex/strex won't work outside the core). I have some code that demonstrates this, or at least will if you find a system that doesn't support exclusive access.

https://github.com/dwelch67/raspberrypi/tree/master/extest
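The size and alignment restrictions quoted from the AXI spec above can be sketched as a small check in C. This covers only the three rules that can be tested from an address and a byte count (power of two, at most 128 bytes, address aligned to the total size). The function name `axi_exclusive_ok` is hypothetical and is not taken from the spec or from the linked repository.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when an exclusive access of total_bytes at addr satisfies
 * the quoted size and alignment rules. Hypothetical helper for illustration. */
static bool axi_exclusive_ok(uint64_t addr, uint32_t total_bytes)
{
    /* Must be a power of two: 1, 2, 4, ... */
    bool power_of_two = total_bytes != 0 &&
                        (total_bytes & (total_bytes - 1)) == 0;

    /* ... and no larger than 128 bytes. */
    if (!power_of_two || total_bytes > 128) {
        return false;
    }

    /* The address must be aligned to the total number of bytes. */
    return (addr & (uint64_t)(total_bytes - 1)) == 0;
}
```

For a single 4-byte ldrex/strex pair this reduces to the word-alignment requirement: `axi_exclusive_ok(0x1000, 4)` holds, `axi_exclusive_ok(0x1002, 4)` does not.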

Depending on the task and how portable you want to be, you may need to provide swp and ldrex/strex solutions surrounded by ifdefs and/or use the plethora of registers available (at runtime) to tell you what instructions are or are not supported by the core you are running on. (You may find in at least one case that neither swp nor ldrex/strex is supported.)
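A rough sketch of the ifdef approach, assuming GCC or Clang: it selects between ldrex/strex and swp at compile time using the ACLE macro `__ARM_FEATURE_LDREX` (bit value 4 indicates word-sized exclusives). The function name `swap_u32` is hypothetical, the ARM paths are untested and contain no memory barriers, and the last branch falls back to a compiler builtin so the sketch still builds on other targets. It is not the code from the repository linked above.

```c
#include <stdint.h>

/* Atomically store v to *p and return the previous value.
 * Hypothetical helper; the inline-assembly paths are illustrative only. */
static uint32_t swap_u32(volatile uint32_t *p, uint32_t v)
{
#if defined(__arm__) && defined(__ARM_FEATURE_LDREX) && (__ARM_FEATURE_LDREX & 4)
    /* Core advertises word-sized ldrex/strex: retry until strex succeeds. */
    uint32_t old, failed;
    do {
        __asm__ volatile("ldrex %0, [%2]\n\t"
                         "strex %1, %3, [%2]"
                         : "=&r"(old), "=&r"(failed)
                         : "r"(p), "r"(v)
                         : "memory");
    } while (failed);
    return old;
#elif defined(__arm__) && !defined(__thumb__)
    /* Older ARM-state cores without exclusives: use swp. */
    uint32_t old;
    __asm__ volatile("swp %0, %1, [%2]"
                     : "=&r"(old)
                     : "r"(v), "r"(p)
                     : "memory");
    return old;
#else
    /* Anything else (including non-ARM hosts): let the compiler decide.
     * On cores with neither instruction this may need library support. */
    return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
#endif
}
```

Compile-time selection only reflects what the compiler was told about the target. As noted above, whether exclusives actually work on a given system may also depend on the bus fabric, so a runtime test can still be needed.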

Hegelian answered 8/7, 2012 at 15:32 Comment(0)