How to force NASM to encode [1 + rax*2] as disp32 + index*2 instead of disp8 + base + index?
To efficiently do x = x*10 + 1, it's probably optimal to use

lea   eax, [rax + rax*4]   ; x*=5
lea   eax, [1 + rax*2]     ; x = x*2 + 1

3-component LEA has higher latency on modern Intel CPUs, e.g. 3 cycles vs. 1 on Sandybridge-family, so disp32 + index*2 is faster than disp8 + base + index*1 on SnB-family, i.e. most of the mainstream x86 CPUs we care about optimizing for. (This mostly only applies to LEA, not loads/stores, because LEA runs on ALU execution units, not the AGUs in most modern x86 CPUs.) AMD CPUs have slower LEA with 3 components or scale > 1 (http://agner.org/optimize/)
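As a quick sanity check on the arithmetic (not on the timing), the two-LEA sequence can be modelled in a few lines of Python. This is only a sketch: `times10_plus1` is a hypothetical name, and it assumes the input is a 32-bit value, since both LEAs write `eax`.

```python
MASK32 = 0xFFFFFFFF  # both LEAs write eax, so results wrap at 32 bits

def times10_plus1(x):
    """Model the two LEAs: x = x + x*4, then x = 1 + x*2 (mod 2**32)."""
    x = (x + x * 4) & MASK32   # lea eax, [rax + rax*4]   ; x *= 5
    x = (1 + x * 2) & MASK32   # lea eax, [1 + rax*2]     ; x = x*2 + 1
    return x

# The composition matches x*10 + 1 with 32-bit wraparound.
for x in (0, 1, 7, 123456789, 0xFFFFFFFF):
    assert times10_plus1(x) == (x * 10 + 1) & MASK32
```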

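But NASM and YASM will optimize for code-size by using [1 + rax + rax*1] for the 2nd LEA, which only needs a disp8 instead of a disp32. (An addressing mode with an index but no base register can only be encoded with a disp32; with a base register, the displacement can shrink to a disp8 or be omitted.)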

i.e. they always split reg*2 into base+index, because that's never worse for code-size.

I can force using a disp32 with lea eax, [dword 1 + rax*2], but that doesn't stop NASM or YASM from splitting the addressing mode. The NASM manual doesn't seem to document a way to use the strict keyword on the scale factor, and [1 + strict rax*2] doesn't assemble. Is there a way to use strict or some other syntax to force the desired encoding of an addressing mode?


nasm -O0 to disable optimizations doesn't work. Apparently that only controls multi-pass branch-displacement optimization, not all optimizations NASM makes. Of course you don't want to do that in the first place for a whole source file, even if it did work. I still get

8d 84 00 01 00 00 00    lea    eax,[rax+rax*1+0x1]

The only workaround I can think of is to encode it manually with db, which is quite inconvenient. For the record, the manual encoding is:

db 0x8d, 0x04, 0x45  ; opcode, modrm, SIB  for lea eax, [disp32 + rax*2]
dd 1                 ; disp32
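To avoid working out the db bytes by hand for other registers, the same encoding can be generated with a small script. The following Python is only a sketch under stated assumptions: `lea_disp32_index` is a hypothetical helper, it covers just the eight legacy registers (so no REX prefix is emitted), and it assumes a 32-bit destination with 64-bit address registers, as in the example above.

```python
import struct

# Register numbers for the low eight registers (no REX prefix needed).
REG = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}
SCALE_BITS = {1: 0, 2: 1, 4: 2, 8: 3}

def lea_disp32_index(dst, index, scale, disp):
    """Encode lea e<dst>, [disp32 + r<index>*scale] with no base register."""
    if index == "sp":
        raise ValueError("rsp cannot be used as an index register")
    # ModRM: mod=00, reg=destination, r/m=100 means a SIB byte follows.
    modrm = (0b00 << 6) | (REG[dst] << 3) | 0b100
    # SIB: scale in bits 7:6, index in bits 5:3, base=101 means no base + disp32.
    sib = (SCALE_BITS[scale] << 6) | (REG[index] << 3) | 0b101
    return bytes([0x8D, modrm, sib]) + struct.pack("<i", disp)

print(lea_disp32_index("ax", "ax", 2, 1).hex(" "))  # 8d 04 45 01 00 00 00
```

The printed bytes can be pasted into a db line; the script does not replace assembler syntax for the encoding, it only automates the manual workaround.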

The scale factor is encoded in the high 2 bits of the SIB byte. I assembled lea eax, [dword 1 + rax*4] to get the machine code for the right registers, because NASM's optimization only works for *2. The SIB was 0x85, and decrementing that 2-bit field at the top of the byte reduced the scale factor from 4 to 2.
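The SIB bit layout described above can be checked with a short Python sketch. The function names are hypothetical; the layout (scale in bits 7:6, index in bits 5:3, base in bits 2:0) is the standard x86 one.

```python
def sib_fields(sib):
    """Split a SIB byte into (scale factor, index number, base number)."""
    scale = 1 << (sib >> 6)      # 2-bit field holds log2 of the scale factor
    index = (sib >> 3) & 7
    base = sib & 7
    return scale, index, base

def halve_scale(sib):
    """Decrement the 2-bit scale field in bits 7:6, halving the scale factor."""
    if sib >> 6 == 0:
        raise ValueError("scale is already 1")
    return sib - 0x40

print(sib_fields(0x85))          # (4, 0, 5)
print(hex(halve_scale(0x85)))    # 0x45
```

Here index 0 is rax, and base 5 combined with mod=00 in the ModRM byte means "no base register, disp32 follows".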


But the question is: how to write it in a nicely readable way that makes it easy to change registers, and get NASM to encode the addressing mode for you? (I suppose a giant macro could do this with text processing and manual db encoding, but that's not really the answer I'm looking for. I don't actually need this for anything right now, I mostly want to know if NASM or YASM has syntax to force this.)

Other optimizations I'm aware of, like mov rax, 1 assembling to 5-byte mov eax,1 are pure wins on all CPUs unless you want longer instructions to get padding without NOPs, and can be disabled with mov rax, strict dword 1 to get the 7-byte sign-extended encoding, or strict qword for 10-byte imm64.
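For reference, the three mov encodings mentioned here can be written out byte by byte. This Python sketch uses hypothetical helper names and hard-codes rax/eax as the destination; it only illustrates the sizes.

```python
import struct

def mov_eax_imm32(value):
    """mov eax, imm32: opcode B8 + 4-byte immediate (zero-extends into rax)."""
    return b"\xB8" + struct.pack("<I", value)

def mov_rax_simm32(value):
    """mov rax, strict dword imm: REX.W C7 /0 + sign-extended 4-byte immediate."""
    return b"\x48\xC7\xC0" + struct.pack("<i", value)

def mov_rax_imm64(value):
    """mov rax, strict qword imm: REX.W B8 + 8-byte immediate."""
    return b"\x48\xB8" + struct.pack("<Q", value)

for enc in (mov_eax_imm32(1), mov_rax_simm32(1), mov_rax_imm64(1)):
    print(len(enc), enc.hex(" "))
```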


gas doesn't do this or most other optimizations (only sizes of immediates and branch displacements): lea 1(,%rax,2), %eax assembles to
8d 04 45 01 00 00 00 lea eax,[rax*2+0x1], and same for the .intel_syntax noprefix version.

Answers for MASM or other assemblers would also be interesting, though.

Insensate answered 18/2, 2018 at 3:44

NOSPLIT:

Similarly, NASM will split [eax*2] into [eax+eax] because that allows the offset field to be absent and space to be saved; in fact, it will also split [eax*2+offset] into [eax+eax+offset].
You can combat this behaviour by the use of the NOSPLIT keyword: [nosplit eax*2] will force [eax*2+0] to be generated literally.
[nosplit eax*1] also has the same effect. In another way, a split EA form [0, eax*2] can be used, too. However, NOSPLIT in [nosplit eax+eax] will be ignored because user's intention here is considered as [eax+eax].

lea eax, [NOSPLIT 1+rax*2]
lea eax, [1+rax*2]

00000000  8D044501000000    lea eax,[rax*2+0x1]
00000007  8D440001          lea eax,[rax+rax+0x1]
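To see why the two listings differ in length, the bytes can be decoded by hand or with a small script. The Python below is a rough sketch, not a general disassembler: `decode_lea_mem` is a hypothetical helper that handles only the two forms shown here (SIB with no base plus disp32, and SIB with base plus disp8), with no REX prefix.

```python
import struct

REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_lea_mem(code):
    """Decode the memory operand of an 8D /r LEA that uses a SIB byte."""
    if code[0] != 0x8D:
        raise ValueError("not an LEA opcode")
    modrm, sib = code[1], code[2]
    mod, rm = modrm >> 6, modrm & 7
    if rm != 0b100:
        raise ValueError("this sketch only handles operands with a SIB byte")
    scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
    if index == 0b100:
        raise ValueError("index=100 (no index register) is not covered here")
    if mod == 0b00 and base == 0b101:
        # No base register: a 4-byte displacement is mandatory.
        disp = struct.unpack_from("<i", code, 3)[0]
        return f"{REGS64[index]}*{scale}{disp:+#x}"
    if mod == 0b01:
        # Base + index with a 1-byte sign-extended displacement.
        disp = struct.unpack_from("<b", code, 3)[0]
        return f"{REGS64[base]}+{REGS64[index]}*{scale}{disp:+#x}"
    raise ValueError("addressing form not covered by this sketch")

print(decode_lea_mem(bytes.fromhex("8D044501000000")))  # rax*2+0x1
print(decode_lea_mem(bytes.fromhex("8D440001")))        # rax+rax*1+0x1
```

The NOSPLIT form is 7 bytes because dropping the base register forces a disp32; the split form is 4 bytes because base+index allows a disp8.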
Jimmyjimsonweed answered 18/2, 2018 at 12:16
Comments (4):
Thanks, I thought I remembered seeing syntax for this mentioned somewhere. I missed it when searching today because I assumed it would involve strict. (And I didn't search super hard because I wanted to write up the performance part on SO :P) - Insensate
You are welcome @Peter. The NASM doc is missing a keyword glossary IMO; I had to look at the source code. I was looking at what the other assemblers do: TASM doesn't optimize, YASM has NOSPLIT, MASM 5 or older shouldn't optimize, and for newer MASM I don't know (not sure if I can find it without Visual Studio and make it work on Debian). - Jimmyjimsonweed
Hmm, yeah, a glossary would be useful as well as the existing index, where you have to know what you're looking for. In hindsight, it would have made sense to look at the "effective address" section of the manual. - Insensate
@MargaretBloom €ASM allows switching off the optimization with the keyword operand SCALE=VERBATIM. - Nevus
