MOVing between two memory addresses
O

6

33

I'm trying to learn assembly (so bear with me) and I'm getting a compile error on this line:

mov byte [t_last], [t_cur]

The error is

error: invalid combination of opcode and operands

I suspect that the cause of this error is simply that it's not possible for a mov instruction to move between two memory addresses, but after half an hour of googling I haven't been able to confirm this. Is this the case?

Also, assuming I'm right, that means I need to use a register as an intermediate point for copying memory:

mov cl, [t_cur]
mov [t_last], cl

What's the recommended register to use (or should I use the stack instead)?

Owing answered 19/8, 2009 at 10:46 Comment(4)
Sometimes it's better to go to the source instead of googling. Here, for example, is the Intel 64 & IA-32 instruction set reference, A-M, where you can see the operand combinations for mov: intel.com/Assets/PDF/manual/253666.pdf - Wearing
There are exceptions to the rule that an instruction cannot take two memory operands; see here. - Cuzco
Another question about multiple memory operands is here: #52574054 - Cere
Basically a duplicate of "Why isn't movl from memory to memory allowed?", which explains some CPU-architecture / ISA-design reasons why not. - Heckman
A
40

Your suspicion is correct: a mov instruction can't move from memory to memory.

Any general-purpose register will do. Remember to PUSH the register first if it may hold a value that is still needed, and to POP it to restore that value once you're done.
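As a rough illustration of the save/copy/restore pattern described above, here is a minimal sketch. It assumes NASM targeting 64-bit Linux (the question doesn't say which mode or OS is in use), and the initial values of t_cur and t_last as well as the value placed in RCX are made up for the example.

```nasm
; Sketch only. Assumed build: nasm -f elf64 copy.asm && ld -o copy copy.o
default rel                        ; RIP-relative addressing for [t_cur] / [t_last]

section .data
t_cur:   db 0x9a                   ; source byte (example value)
t_last:  db 0x7f                   ; destination byte (example value)

section .text
global _start
_start:
    mov   rcx, 0x1122334455667788  ; pretend RCX holds something we still need
    push  rcx                      ; save RCX before using CL as scratch
    mov   cl, [t_cur]              ; load the source byte into the scratch register
    mov   [t_last], cl             ; store it to the destination
    pop   rcx                      ; restore the original RCX

    movzx edi, byte [t_last]       ; exit status = copied byte, so the result is observable
    mov   eax, 60                  ; Linux x86-64 syscall number for exit
    syscall
```

If the register holds nothing you need afterwards, the push/pop pair can simply be dropped, leaving just the two mov instructions.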

Apophysis answered 19/8, 2009 at 10:48 Comment(2)
Is there any advantage to using a register over pushing the data itself onto the stack? - Owing
Pushing onto and later popping from the stack adds two additional memory accesses. - Clausewitz
T
7

It's really simple in 16-bit code; just do the following:

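     push     di                          ; save the registers we are about to clobber
     push     si
     push     cx
     mov      cx,(number of bytes to move)
     lea      di,(destination address)    ; rep movsb writes to ES:DI
     lea      si,(source address)         ; ... and reads from DS:SI
     rep      movsb                       ; copy CX bytes; assumes DF = 0 (use cld first if unsure)
     pop      cx                          ; restore in reverse order
     pop      si
     pop      di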

Note: the pushes & pops are only necessary if you need to preserve the contents of the registers.

Turf answered 10/7, 2012 at 0:49 Comment(3)
+1, since in some circumstances it's good to know all the tools in your toolbox. movsb/movsw are 1-byte opcodes (2 bytes with the rep prefix), IIRC - Canzona
Depending on the architecture, you can use pusha instead of pushing all the registers individually and popa instead of popping them all. - Main
This works in 32-bit and 64-bit mode as well, except that it uses the registers of that mode's address size (ECX/ESI/EDI or RCX/RSI/RDI). - Tabatha
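To make the last comment concrete, here is a hedged sketch of the same rep movsb pattern in 64-bit mode. It assumes NASM on 64-bit Linux; the buffer names src_buf and dst_buf and the message text are invented for the example and do not come from the answer.

```nasm
; Sketch only. Assumed build: nasm -f elf64 repmovs.asm && ld -o repmovs repmovs.o
default rel

section .data
src_buf: db "hello, world", 10     ; example source data
len      equ $ - src_buf           ; number of bytes to copy

section .bss
dst_buf: resb len                  ; destination buffer of the same size

section .text
global _start
_start:
    cld                            ; DF = 0 so RSI/RDI count upward
    mov   ecx, len                 ; RCX = byte count (writing ECX zero-extends into RCX)
    lea   rdi, [dst_buf]           ; RDI = destination address
    lea   rsi, [src_buf]           ; RSI = source address
    rep   movsb                    ; copy RCX bytes from [RSI] to [RDI]

    mov   eax, 1                   ; Linux syscall: write
    mov   edi, 1                   ; fd 1 = stdout
    lea   rsi, [dst_buf]           ; print the copy, not the original
    mov   edx, len
    syscall

    xor   edi, edi                 ; exit status 0
    mov   eax, 60                  ; Linux syscall: exit
    syscall
```

No registers are saved here because nothing in them is needed afterwards; inside a function you would only need to preserve whichever of them your calling convention treats as callee-saved.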
H
7

That's correct: x86 machine code can't encode an instruction with two explicit memory operands (arbitrary addresses specified in []).

Whats the recommended register

Any register you don't need to save/restore.

In all the mainstream 32-bit and 64-bit calling conventions, EAX, ECX, and EDX are call-clobbered, so AL, CL, and DL are good choices. For a byte or word copy, you typically want a movzx load into a 32-bit register, then an 8-bit or 16-bit store. This avoids a false dependency on the old value of the register. Only use a narrow 16 or 8-bit mov load if you actively want to merge into the low bits of another value. x86's movzx is the analogue of instructions like ARM ldrb.

    movzx   ecx,  byte [rdi]       ; load CL, zero-extending into RCX
    mov    [rdi+10], cl

In 64-bit mode, SIL, DIL, r8b, r9b and so on are also fine choices, but require a REX prefix in the machine code for the store so there's a minor code-size reason to avoid them.

Generally avoid writing AH, BH, CH, or DH for performance reasons, unless you understand how partial registers behave on the CPUs you care about and are sure that false dependencies or partial-register merging stalls won't be a problem (or won't happen at all) in your code.


(or should I use the stack instead)?

First of all, you can't push a single byte at all, so there's no way you could do a byte load / byte store from the stack. For a word, dword, or qword (depending on CPU mode), you could push [src] / pop [dst], but that's a lot slower than copying via a register. It introduces an extra store/reload store-forwarding latency before the data can be read from the final destination, and takes more uops.

The exception is when the desired destination is itself somewhere on the stack and you can't optimize that local variable into a register. In that case push [src] is just fine: it copies the value there and allocates stack space for it in one step.
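For completeness, this is roughly what the push [src] / pop [dst] copy looks like. It is a sketch assuming NASM on 64-bit Linux, where the operand has to be a qword (or a word); the labels src and dst and the test value are made up for the example.

```nasm
; Sketch only. Assumed build: nasm -f elf64 pushpop.asm && ld -o pushpop pushpop.o
default rel

section .data
src:  dq 0x1122334455667788        ; example source value
dst:  dq 0                         ; destination, initially zero

section .text
global _start
_start:
    push  qword [src]              ; load 8 bytes from src and store them on the stack
    pop   qword [dst]              ; reload them from the stack and store them to dst

    mov   rax, [src]               ; verify the copy
    xor   edi, edi                 ; EDI = 0 (done before cmp, since xor changes flags)
    cmp   rax, [dst]
    setne dil                      ; exit status 0 if equal, 1 if not
    mov   eax, 60                  ; Linux syscall: exit
    syscall
```

This works without a scratch register, but it goes through memory twice, which is why copying via a register is the better default.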

See https://agner.org/optimize/ and other x86 performance links in the x86 tag wiki.

Heckman answered 25/11, 2018 at 18:42 Comment(0)
G
6

It is technically possible to move from memory to memory.

Try using MOVS (move string): set [E]SI and [E]DI to the source and destination addresses, then use MOVSB, MOVSW, etc., depending on whether you want to transfer byte(s), word(s), etc.

    mov si, t_cur    ; Load SI with address of 't_cur'
    mov di, t_last   ; Load DI with address of 't_last'
    movsb            ; Move byte from [SI] to [DI]

    ; Some dummy data
    t_cur    db 0x9a ; DB tells NASM that we want to declare a byte
    t_last   db 0x7f ; (See above)

Note however that this is less efficient than executing MOV twice, but it does execute the copy in a single instruction.

Here's how MOVS should be used, and how it works: https://www.felixcloutier.com/x86/movs:movsb:movsw:movsd:movsq

The instruction MOVS is almost never used on its own, and is for the most part used in conjunction with a REP prefix.

Modern CPUs have fairly efficient implementations of rep movs that are close to the speed of a loop using AVX vector load/store instructions.

    ; - Assuming that 't_src' and 't_dst' are valid pointers
    mov esi, t_src  ; Load ESI with the address of 't_src'
    mov edi, t_dst  ; Load EDI with the address of 't_dst'
    mov ecx, 48     ; Load [ER]CX with the count (let's say 48 dwords = 192 bytes)
    rep movsd       ; Repeat copying until ECX == 0

Logically the copy happens as 48 copies of 4-byte dword chunks, but in practice modern CPUs (fast strings / ERMSB) use 16- or 32-byte chunks internally for efficiency.

This manual explains how REP should be used, and how it works: https://www.felixcloutier.com/x86/rep:repe:repz:repne:repnz

Garrulity answered 16/6, 2019 at 18:33 Comment(2)
+1 for explaining what movsb is and linking the manual, unlike other existing answers that propose movs. I made some edits to your answer to not actually recommend doing this, because there's no benefit vs. mov-load + mov-store with a temp register like AL. (Except maybe atomicity wrt. interrupts on a uniprocessor system, but that's a very specific use-case and not something that helps in general.) But anyway, welcome to Stack Overflow :) - Heckman
You also have to make sure the segment registers DS and ES are set appropriately. That may not be the case when writing DOS programs that use multiple segments, where ES != DS. The move copies from DS:(E)SI to ES:(E)DI. The OP didn't mention the OS, so one may not be able to guarantee DS == ES. - Galah
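The segment-register point in the last comment can be sketched as follows. This assumes a 16-bit DOS .COM program assembled with NASM (the question doesn't state the target, so that is purely an assumption), and it reuses the t_cur / t_last names and values from the answer.

```nasm
; Sketch only. Assumed build: nasm -f bin copy.asm -o copy.com
bits 16
org 0x100                   ; .COM programs are loaded at offset 0x100

start:
    push  ds
    pop   es                ; make ES = DS, since MOVSB writes to ES:DI
    cld                     ; DF = 0 so SI/DI count upward
    mov   si, t_cur         ; SI = offset of the source byte
    mov   di, t_last        ; DI = offset of the destination byte
    movsb                   ; copy one byte from DS:SI to ES:DI

    mov   al, [t_last]      ; return the copied byte as the exit code
    mov   ah, 0x4c          ; DOS function 4Ch: terminate with return code in AL
    int   0x21

; Data goes after the terminate call so it is never executed as code
t_cur:   db 0x9a            ; DB declares a byte
t_last:  db 0x7f
```

In a .COM program DOS already starts with DS == ES, so the push ds / pop es pair is redundant there; it matters in multi-segment programs where ES may point somewhere else.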
D
4

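There's also the MOVS instruction for moving data from memory to memory:

MOV SI, OFFSET variable1   ; SI = address of the source
MOV DI, OFFSET variable2   ; DI = address of the destination
MOVSB                      ; copy one byte from DS:[SI] to ES:[DI]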
Divinadivination answered 19/8, 2009 at 11:1 Comment(4)
Will work, but it requires extra care: you need to save the si and di registers. I guess it's not worth it for copying one byte. - Intercollegiate
The string commands on x86 can be considered obsolete. Never use them. They are never faster than copying "by hand", but in most cases much slower. - Clausewitz
@hirschhornsalz, sorry to necromance, but do you have any detailed info about the string commands being essentially obsolete? - Schear
String operations with rep are fast again thanks to new features like ERMS and FSRM. See clapdrop's answer below. - Difference
V
-1

Just want to discuss "memory barrier" with you. In C code

a = b; // Takes data from b and puts it in a

would be compiled to

mov %eax, b # suppose %eax is used as the temp
mov a, %eax

The system cannot guarantee the atomicity of the assignment. That's why we need an rmb (read barrier).

Vadnee answered 17/7, 2013 at 7:34 Comment(2)
x86 can't atomically copy from memory to memory. Barriers don't create atomicity, they only stop reordering (compile time or run-time or both, depending on the barrier). - Heckman
@YuvalKeysar: your edit left one bug unfixed (which I hadn't noticed before): in AT&T syntax, destination comes 2nd. This asm actually stores EAX into b, then loads EAX from a. This answer just needs to be deleted, IMO, because the discussion about barriers is nonsense. - Heckman
