Is it possible to call a relative address with each instruction at most 3 bytes long, in 32-bit mode?

Asked 29/7, 2019 at 21:23 Answered 30/7, 2019 at 4:19

I'm working on an exercise in x86 assembly (using NASM) that has the niche requirement of limiting each instruction to a maximum of 3 bytes.

I'd like to call a label, but the normal way to do this (shown in the code example) always results in an instruction size of 5 bytes. I'm trying to find out if there's a series of instructions, 3 bytes or less each, that can accomplish this.

I've attempted to load the label address into a register and then call that register, but it seems like the address is then interpreted as an absolute address, instead of a relative one.

I looked around to see if there's a way to force call to interpret the address in the register as a relative address, but couldn't find anything. I have thought about simulating a call by pushing a return address to the stack and using jmp rel8, but am unsure how to get the absolute address of where I want to return to.

Here is the normal way to do what I want:

[BITS 32]

call func     ; this results in a 5-byte call rel32 instruction
; series of instructions here that I would like to return to

func:
  ; some operations here
  ret

I have tried things like this:

[BITS 32]

mov eax, func          ; 5-byte  mov r32, imm32
call eax               ; 2-byte  call r32
          ; this fails, seems to interpret func's relative address as an absolute
 ...   ; series of instructions here that I would like to return to

func:
  ; some operations here
  ret

I have a feeling there may be a way to do this using some sort of LEA magic, but I'm relatively new to assembly so I couldn't figure it out.

Any tips are appreciated!

Trilateration answered 29/7, 2019 at 21:23 Comment(12)

mov eax, func; call eax indeed uses an absolute address, but that should still work unless you load the code somewhere other than where it was assembled for. In that case you want position-independent code. However, mov eax, func is again 5 bytes, exceeding your limit. – Mesarch 29/7, 2019 at 21:26

There is no such thing as relative indirect near call. – Bridesmaid 29/7, 2019 at 21:27

In 32 bit mode the usual trick to get the current EIP for position independence unfortunately relies on a call that is exactly the one you want to avoid. Depending on your exact requirements and environment (e.g. executable stack) you can cheat by creating longer instructions dynamically :) – Mesarch 29/7, 2019 at 21:31

Also the 16 bit call which would fit in 3 bytes unfortunately zeroes the top 16 bits of EIP, it does not simply limit the offset to 16 bits which would work for us. And you'd need a prefix so that would again be more than 3 bytes. Eh. – Mesarch 29/7, 2019 at 21:32

What is the requirement about instructions being 3 bytes or less for? – Bridesmaid 29/7, 2019 at 21:38

@MichaelPetch The concept behind the instruction size limit came from an old university practice set I found online, I expanded on it a bit and ran into this problem. Was just curious if it is in fact (realistically) impossible to accomplish. – Trilateration 29/7, 2019 at 21:44

How will this code be assembled and built? Is this targeting Linux? – Bridesmaid 29/7, 2019 at 21:49

mov eax, func ; call eax is correct. If that doesn't work (after linking), you're building your code wrong. Related (maybe duplicate): Shorter x86 call instruction - mov eax, func is a win when amortized over two 2-byte call eax instructions. – Exegetics 29/7, 2019 at 23:55

@PeterCordes : mov eax, func is a 5 byte instruction. The OP is doing this where the maximum instruction size is 3 bytes. This question isn't a duplicate of the one you show. The OP isn't trying to find the shortest encoding, but an encoding where no one instruction exceeds a 3 byte encoding but still can call a function. – Bridesmaid 30/7, 2019 at 0:5

I'm voting to close this question as off-topic because it's code golf. – Provost 30/7, 2019 at 1:55

@gaskini: I don't understand your reasoning for saying call eax fails. Obviously mov eax, immediate is only encodeable as 5-byte mov r32, imm32. After putting the absolute address into a register, of course call needs to be absolute. (And that's all x86 has available). I edited your example to comment the instruction lengths. What's missing is a 3-byte encoding for lea r32, [EIP + rel8] (because there's no such addressing mode, and 32-bit mode doesn't have PC-relative address generation other than call). – Exegetics 30/7, 2019 at 4:0

I don't think a call with a rel32 (instead of an address) in a register would be useful for anything outside of this code-golf problem, and you'd still have to generate the relative offset in a register somehow. – Exegetics 30/7, 2019 at 4:1

There is no such thing as relative indirect near CALL. You will have to find some other mechanism to do the call to the label func. One method I can think of is building the absolute address in a register and doing an absolute indirect call through the register:

It is unclear what the target of your code is. This assumes you are generating a 32-bit Linux program. I use a linker script to compute the individual bytes of the target label's address. The program uses those bytes to build the target address in EAX and then performs an indirect near call via EAX. Two methods of building the address are presented.

A linker script link.ld that breaks a label's address into individual bytes:

SECTIONS
{
  . = 0x8048000;
  func_b0 =  func & 0x000000ff;
  func_b1 = (func & 0x0000ff00) >> 8;
  func_b2 = (func & 0x00ff0000) >> 16;
  func_b3 = (func & 0xff000000) >> 24;
}

Assembly code file myprog.asm:

[BITS 32]
global func
extern func_b0, func_b1, func_b2, func_b3

_start:
    ; Method 1
    mov al, func_b3            ; EAX = ######b3
    mov ah, func_b2            ; EAX = ####b2b3
    bswap eax                  ; EAX = b3b2####
    mov ah, func_b1            ; EAX = b3b2b1##
    mov al, func_b0            ; EAX = b3b2b1b0
    call eax

    ; Method 2
    mov ah, func_b3            ; EAX = ####b3##
    mov al, func_b2            ; EAX = ####b3b2
    shl eax, 16                ; EAX = b3b20000
    mov ah, func_b1            ; EAX = b3b2b100
    mov al, func_b0            ; EAX = b3b2b1b0
    call eax

    ; series of instructions here that I would like to return to
    xor eax, eax
    mov ebx, eax               ; EBX = 0 return value
    inc eax                    ; EAX = 1 exit system call
    int 0x80                   ; Do exit system call

func:
    ; some operations here
    ret

Assemble and link with:

nasm -f elf32 -F dwarf myprog.asm -o myprog.o
gcc -m32 -nostartfiles -g -Tlink.ld myprog.o -o myprog

If you run objdump -Mintel -Dx on the resulting executable, the information of interest should look similar to:

00000020 g       *ABS*  00000000 func_b0
00000004 g       *ABS*  00000000 func_b2
08048020 g       .text  00000000 func
00000080 g       *ABS*  00000000 func_b1
00000008 g       *ABS*  00000000 func_b3

...

08048000 <_start>:
 8048000:       b0 08                   mov    al,0x8
 8048002:       b4 04                   mov    ah,0x4
 8048004:       0f c8                   bswap  eax
 8048006:       b4 80                   mov    ah,0x80
 8048008:       b0 20                   mov    al,0x20
 804800a:       ff d0                   call   eax
 804800c:       b4 08                   mov    ah,0x8
 804800e:       b0 04                   mov    al,0x4
 8048010:       c1 e0 10                shl    eax,0x10
 8048013:       b4 80                   mov    ah,0x80
 8048015:       b0 20                   mov    al,0x20
 8048017:       ff d0                   call   eax
 8048019:       31 c0                   xor    eax,eax
 804801b:       89 c3                   mov    ebx,eax
 804801d:       40                      inc    eax
 804801e:       cd 80                   int    0x80

08048020 <func>:
 8048020:       c3                      ret
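
As a sanity check on the listing above, here is a small Python sketch (not part of the build) that mirrors the linker script's byte split and emits the machine code for both methods. The helper names split_bytes, method1 and method2 are made up for this illustration; the opcode bytes are the ones that appear in the objdump output, and the address 0x08048020 is simply where func landed in that listing.

```python
def split_bytes(addr):
    """Mirror link.ld: return (b0, b1, b2, b3), least significant byte first."""
    return (
        addr & 0x000000FF,
        (addr & 0x0000FF00) >> 8,
        (addr & 0x00FF0000) >> 16,
        (addr & 0xFF000000) >> 24,
    )

# Opcode bytes taken from the disassembly above (32-bit mode).
MOV_AL_IMM8 = 0xB0          # mov al, imm8
MOV_AH_IMM8 = 0xB4          # mov ah, imm8
BSWAP_EAX = (0x0F, 0xC8)    # bswap eax
SHL_EAX_IMM8 = (0xC1, 0xE0) # shl eax, imm8 (imm8 follows)
CALL_EAX = (0xFF, 0xD0)     # call eax

def method1(addr):
    """Instruction bytes for the bswap variant, one bytes object per instruction."""
    b0, b1, b2, b3 = split_bytes(addr)
    return [
        bytes([MOV_AL_IMM8, b3]),
        bytes([MOV_AH_IMM8, b2]),
        bytes(BSWAP_EAX),
        bytes([MOV_AH_IMM8, b1]),
        bytes([MOV_AL_IMM8, b0]),
        bytes(CALL_EAX),
    ]

def method2(addr):
    """Instruction bytes for the shl variant."""
    b0, b1, b2, b3 = split_bytes(addr)
    return [
        bytes([MOV_AH_IMM8, b3]),
        bytes([MOV_AL_IMM8, b2]),
        bytes(SHL_EAX_IMM8 + (16,)),
        bytes([MOV_AH_IMM8, b1]),
        bytes([MOV_AL_IMM8, b0]),
        bytes(CALL_EAX),
    ]

if __name__ == "__main__":
    func = 0x08048020  # address of func in the listing above
    for name, seq in (("Method 1", method1(func)), ("Method 2", method2(func))):
        print(name)
        for insn in seq:
            print(f"  {insn.hex(' '):<10} ({len(insn)} bytes)")
```

No instruction in either sequence is longer than 3 bytes; the only 3-byte one is shl eax, 16 in Method 2.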

Bridesmaid answered 29/7, 2019 at 23:47 Comment(0)

In 32-bit x86, the only way to read your current instruction pointer is to do a call instruction and read the stack. Unless you have the address of a suitable gadget already in a register, you will have to use an immediate relative offset, which is a 5-byte instruction.

(In 64-bit x86, you can also use lea rax, [rip], but that is a 7-byte instruction.)

However, it might be possible to cheat here. If the code that calls your NASM binary always calls your code with something like call edi, then you can just calculate from that register. It's a hack, but so is restricting yourself to 3-byte instructions.

By the way, for a little trick, this is how you can load 32-bit constants in 3-byte (or 2-byte) instructions (loading 0xDEADBEEF as an example):

mov al, 0xDE
mov ah, 0xAD
bswap eax
mov ah, 0xBE
mov al, 0xEF
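
To see why the order of the five instructions works regardless of what was in EAX beforehand, here is a rough Python model of the register updates. It is only a sketch: set_al, set_ah, bswap and load_const are names invented for this illustration, and the starting value 0x12345678 stands in for arbitrary junk.

```python
def set_al(eax, imm8):
    """mov al, imm8: replace bits 0..7, keep the rest."""
    return (eax & 0xFFFFFF00) | (imm8 & 0xFF)

def set_ah(eax, imm8):
    """mov ah, imm8: replace bits 8..15, keep the rest."""
    return (eax & 0xFFFF00FF) | ((imm8 & 0xFF) << 8)

def bswap(eax):
    """bswap eax: reverse the order of the four bytes."""
    return int.from_bytes(eax.to_bytes(4, "little"), "big")

def load_const(value, eax=0x12345678):
    """Replay the sequence on a register that starts out holding junk."""
    eax = set_al(eax, (value >> 24) & 0xFF)  # top byte goes into AL first
    eax = set_ah(eax, (value >> 16) & 0xFF)  # second byte into AH
    eax = bswap(eax)                         # move both into the upper half
    eax = set_ah(eax, (value >> 8) & 0xFF)   # overwrite leftover junk in AH
    eax = set_al(eax, value & 0xFF)          # overwrite leftover junk in AL
    return eax

if __name__ == "__main__":
    print(hex(load_const(0xDEADBEEF)))
```

After bswap the old upper bytes end up in AH and AL, which is exactly where the last two mov instructions overwrite them, so no xor or zeroing is needed first.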

Tenement answered 29/7, 2019 at 22:9 Comment(2)

Not the only way to read EIP. The only sane way yes, but if you're running in kernel mode then an int instruction will push a return address onto the stack for you. And you can create an interrupt handler that returns that address using only small instructions. – Exegetics 30/7, 2019 at 4:3

@PeterCordes Just thought of another way: on a 64-bit OS, sysret used by the kernel requires the return address be in rcx. Do a system call that the kernel ends with sysret instead of iretq, then read rcx. Crazy but effective. – Tenement 30/7, 2019 at 18:47

In 64-bit code, 2-byte syscall will set RCX = RIP (which the kernel usually uses for sysret), so under most OSes you can make an invalid system call to get RCX=RIP. (e.g. by setting EAX or RAX to -1 with 3-byte or eax,-1, so under Linux syscall will return with RAX = -ENOSYS.) Credit to @Myria for this idea.

Whether this method works depends on the OS: an OS can always return with iret after doing anything it wants to the registers, so it would be possible to design a kernel ABI where this doesn't work. AFAIK it should work under any of the mainstream OSes, but only in long mode. AMD CPUs support syscall in 32-bit mode, but it works differently there.

In 32-bit code, the only normal/sane way to read EIP is with a call instruction. So it's generally impossible to create position-independent code without using 5-byte call rel32 to get your own address.

(Even self-modifying code would eventually execute a call rel32).

Other answers show ways to jump to a given absolute address using only small instructions. But the target address isn't relative to the address of the machine code, except insofar as the absolute address of the machine code is also known so you can calculate the jump distance at build time.

The same machine code would jump to the same address if loaded somewhere else, not to the same offset relative to its own address.
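
The difference can be put into numbers with a short Python sketch. This is only an illustration under assumed values: the link base 0x08048000 and func's offset 0x20 are borrowed from the listing in the other answer, the second load address is arbitrary, and the function names are invented here.

```python
MASK32 = 0xFFFFFFFF
CALL_REL32_LEN = 5  # E8 + 4-byte displacement

def rel32_for(call_addr, target):
    """Displacement the assembler stores for `call rel32` located at call_addr."""
    return (target - (call_addr + CALL_REL32_LEN)) & MASK32

def call_rel32_target(call_addr, rel32):
    """Where `call rel32` lands: address of the next instruction plus rel32."""
    return (call_addr + CALL_REL32_LEN + rel32) & MASK32

LINK_BASE = 0x08048000  # address the code was linked for (assumed)
FUNC_OFFSET = 0x20      # offset of func from the start of the code (assumed)

def targets(load_base):
    """Return (relative_target, absolute_target) when the code runs at load_base."""
    # The displacement is fixed at build time, with the call at offset 0.
    rel = rel32_for(LINK_BASE, LINK_BASE + FUNC_OFFSET)
    relative = call_rel32_target(load_base, rel)
    # The byte-by-byte constant built into EAX is also fixed at build time.
    absolute = LINK_BASE + FUNC_OFFSET
    return relative, absolute

if __name__ == "__main__":
    for base in (LINK_BASE, 0x10000000):
        rel_t, abs_t = targets(base)
        print(f"loaded at {base:#010x}: call rel32 -> {rel_t:#010x}, call eax -> {abs_t:#010x}")
```

At the link address both forms reach func. At any other load address only the rel32 form still lands 0x20 bytes past the start of the code; the constant in EAX keeps pointing at the old location.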

Perhaps that's all your exercise was asking for.

If not, since we've ruled out sane ways to write fully-PIC code, we need to consider insane ways.

Interrupts also push EIP onto the (kernel) stack, where an interrupt handler could access it.

If you're writing a kernel that can include interrupt handlers, you can include one that puts your current address into a register (for example EAX) by reading it from the stack with short instructions (like 3-byte mov eax, [ebp+4] or whatever after setting up a stack-frame).

Then your normal code can invoke that interrupt handler with int 0x81 or whatever (a 2-byte instruction: CD ib).

Setting up an interrupt-descriptor table should be possible if necessary: we can construct any value in registers using mov r8, imm8 and shifts, as shown in the other answers. Combining that with 2- or 3-byte mov r/m32, r32 or 3-byte mov r/m8, imm8, we can store anything to any absolute address we choose by constructing the address (and optionally the value) in a register. This is one-time setup that lets later code query its own address with a compact "system call" instead of a call rel32.

Actually installing an IDT is possible with 3-byte lidt (0F 01 /3 with a simple addressing mode that uses ModRM + no extra byte). Or query the current location with sidt (same-length encoding).

iret is just 1-byte 0xCF. I don't think any of the necessary system-setup instructions have a minimum length of more than 3 bytes.
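
For reference, the lengths can be tallied with a few lines of Python. The table below is a hand-assembled sketch, not assembler output: the register choices ([eax], [ebx], the immediate 0x12) are arbitrary examples, and the encodings assume 32-bit mode with the simplest ModRM forms.

```python
# Mnemonic -> hand-assembled bytes (32-bit mode, simple addressing modes).
SHORT = {
    "int 0x81":             "CD 81",
    "push ebp":             "55",
    "mov ebp, esp":         "89 E5",
    "mov eax, [ebp+4]":     "8B 45 04",
    "pop ebp":              "5D",
    "iret":                 "CF",
    "lidt [eax]":           "0F 01 18",
    "sidt [eax]":           "0F 01 08",
    "mov [ebx], eax":       "89 03",
    "mov byte [ebx], 0x12": "C6 03 12",
}

# For contrast: the two instructions the question is trying to avoid
# (displacement and immediate shown as zeros).
LONG = {
    "call rel32":     "E8 00 00 00 00",
    "mov eax, imm32": "B8 00 00 00 00",
}

def length(hex_bytes):
    """Number of bytes in a space-separated hex string."""
    return len(bytes.fromhex(hex_bytes))

if __name__ == "__main__":
    for table in (SHORT, LONG):
        for mnemonic, enc in table.items():
            print(f"{mnemonic:<22} {enc:<16} {length(enc)} bytes")
```

Addressing modes that need a SIB byte or a 32-bit displacement would push some of these past 3 bytes, which is why the addresses have to be built in a register first.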

Exegetics answered 30/7, 2019 at 4:19 Comment(2)

How do you get the address of the handler to put into the IDT? – Tidemark 31/7, 2019 at 1:48

@prl: you choose a fixed constant address for the handler itself, and for the IDT. This infrastructure then supports PIC code, but is not itself PIC. – Exegetics 31/7, 2019 at 1:49
