Range of immediate values in ARMv8 A64 assembly

My understanding is that immediate parameters in ARMv8 A64 assembly can be 12 bits long. If that is the case, why does this line of assembly code:

AND X12, X10, 0xFEF 

produce this error when assembled with gcc:

Error:  immediate out of range at operand 3 -- `AND X12, X10, 0xFEF'

Interestingly enough, this line of assembly code compiles fine:

ADD X12, X10, 0xFEF

I'm using aarch64-linux-gnu-gcc (Linaro GCC 2014.11) 4.9.3 (prerelease)

Jerid asked 18/6, 2015 at 0:37

Unlike A32's "flexible second operand", there is no common immediate format in A64. For immediate-operand data-processing instructions (ignoring the boring and straightforward ones like shifts),

  • Arithmetic instructions (add{s}, sub{s}, cmp, cmn) take a 12-bit unsigned immediate with an optional 12-bit left shift.
  • Move instructions (movz, movn, movk) take a 16-bit immediate optionally shifted to any 16-bit-aligned position within the register.
  • Address calculations (adr, adrp) take a 21-bit signed immediate, although there's no actual syntax to specify it directly - to do so you'd have to resort to assembler expression trickery to generate an appropriate "label".
  • Logical instructions (and{s}, orr, eor, tst) take a "bitmask immediate", which I'm not sure I can even explain, so I'll just quote the mind-bogglingly complicated definition:

Such an immediate is a 32-bit or 64-bit pattern viewed as a vector of identical elements of size e = 2, 4, 8, 16, 32, or 64 bits. Each element contains the same sub-pattern: a single run of 1 to e-1 non-zero bits, rotated by 0 to e-1 bits. This mechanism can generate 5,334 unique 64-bit patterns (as 2,667 pairs of pattern and their bitwise inverse).

Levi answered 18/6, 2015 at 1:32
This explanation makes a little more sense: "The logical immediate instructions accept a bitmask immediate bimm32 or bimm64. Such an immediate consists EITHER of a single consecutive sequence with at least one non-zero bit, and at least one zero bit, within an element of 2, 4, 8, 16, 32 or 64 bits; the element then being replicated across the register width, or the bitwise inverse of such a value." - Jerid
The ARM bitmask immediate field is 13 bits (from what I can tell). Does anybody know precisely how those bits are interpreted (i.e., the algorithm for converting these 13 bits into a 32- or 64-bit value)? Why isn't that algorithm easy to find? - Jerid
I found some code here that may be helpful: llvm.org/docs/doxygen/html/… - Jerid
@Jerid As with everything, the full, authoritative definition can be found in the instruction pseudocode in the ARM ARM (free to download, but you have to sign up to accept the license). In this case it's the DecodeBitMasks() function in the pseudocode appendix (page J8-5588 in issue A.f). - Levi
Immediates for bitwise instructions aren't that hard to at least summarize: a repeating pattern, where within one element the set bits have to be contiguous. - Tredecillion
There's a fifth one, used for floating-point immediates. It's weird. - Preceptive

Here is a piece of code that dumps all legal bitmask immediates, following the mechanism quoted in Notlikethat's answer. Hopefully it helps in understanding how the rule for generating bitmask immediates works.

#include <stdio.h>
#include <stdint.h>

// Dumps all legal bitmask immediates for ARM64
// Total number of unique 64-bit patterns: 
//   1*2 + 3*4 + 7*8 + 15*16 + 31*32 + 63*64 = 5334

const char *uint64_to_binary(uint64_t x) {
  static char b[65];
  unsigned i;
  for (i = 0; i < 64; i++, x <<= 1)
    b[i] = (0x8000000000000000ULL & x)? '1' : '0';
  b[64] = '\0';
  return b;
}

int main() {
  uint64_t result;
  unsigned size, length, rotation, e;
  for (size = 2; size <= 64; size *= 2)
    for (length = 1; length < size; ++length) {
      result = 0xffffffffffffffffULL >> (64 - length);
      for (e = size; e < 64; e *= 2)
        result |= result << e;
      for (rotation = 0; rotation < size; ++rotation) {
        printf("0x%016llx %s (size=%u, length=%u, rotation=%u)\n",
            (unsigned long long)result, uint64_to_binary(result),
            size, length, rotation);
        result = (result >> 63) | (result << 1);
      }
    }
  return 0;
}
Jailbird answered 21/10, 2015 at 16:54
You can run the output of this through sort to make it easier to read. - Ponzo

An alternative explanation of bitmask immediates, now that it is morning and I finally understand the "mind-bogglingly complicated" definition (see Notlikethat's answer). Maybe it will be easier for some to understand.

It is X>0 consecutive zeros followed by Y>0 consecutive ones, where X+Y is a power of 2, repeated to fill the whole argument and then rotated arbitrarily.

Also note that optional shifts in other immediate formats are by exact amounts of bits, not "up to". That is, the 16-bit immediates can be shifted by 0, 16, 32 or 48 bits exactly, while 12-bit immediates only by 0 or 12 bits.

Craddock answered 22/11, 2021 at 10:20
Interestingly, and x13, x13, #0 and #-1 aren't encodable. That's good: they didn't waste coding space on useless immediates. 0 as an operand for bitwise booleans is either a NOP or a zeroing operation, and eor x13, x13, x13 or sub can still produce a zero with a data dependency on the input (std::memory_order_consume) in one instruction. That is so rarely needed that it would have been fine to need two instructions for it. eor with -1 is just a NOT, which there's an instruction for. orr with -1 would produce -1 with a data dependency; that can be done in 2 instructions. - Tredecillion
That makes sense: a 6-bit field can encode numbers from 0..63 or 1..64, but not 0..64. Even with one of the encodings made special, there is still enough room to encode 1..63, which handles bit ranges from a single 1 (rest 0s) to a single 0 (rest 1s). I didn't check if that's how the machine code actually works. - Tredecillion
