x86-32 / x86-64 polyglot machine-code fragment that detects 64bit mode at run-time?

Is it possible for the same bytes of machine code to figure out whether they're running in 32 or 64 bit mode, and then do different things?

i.e. write polyglot machine code.

Normally you can detect at build time with #ifdef macros. Or in C, you could write an if() with a compile-time constant as the condition, and have the compiler optimize away the other side of it.

This is only useful for weird cases, like maybe code injection, or just to see if it's possible.


See also: polyglot ARM / x86 machine code that branches to different addresses depending on which architecture is decoding the bytes.

Adila answered 27/6, 2016 at 21:27 Comment(0)

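The easiest way is to use the one-byte inc opcodes (0x40..0x47), which are repurposed as REX prefixes in 64bit mode. xor eax,eax sets ZF. In 32bit mode, 0x40 then decodes as inc eax, which clears ZF, so the jz falls through. In 64bit mode the same byte is a REX prefix on the jz; a REX prefix has no effect on jcc and ZF is still set, so the branch is taken:

xor    eax,eax       ; eax = 0, which sets ZF
db  0x40             ; 32bit: inc eax (clears ZF).   64bit: useless REX prefix
jz   .64bit_mode     ; REX jcc  works fine: taken only in 64bit mode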

See also a 3-way polyglot that returns 16, 32, or 64 according to the mode it executes in: Determine your language's version on codegolf.SE.


Reminder: normally you don't want this as part of a compiled binary. Detect mode at build time so any decision based on this can optimize away instead of being done at runtime. e.g. with #ifdef __x86_64__ and/or sizeof(void*) (but don't forget that the ILP32 x32 ABI has 32-bit pointers in long mode).
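As a rough sketch of that build-time approach, assuming GCC or Clang (which predefine `__x86_64__`, `__i386__` and `__ILP32__`): the names `TARGET_MODE_BITS`, `POINTER_BITS`, `TARGET_IS_X32` and `describe_target` are made up for this example and don't come from any library. It keeps "which CPU mode" and "how wide are pointers" as separate questions, because x32 answers them differently.

```c
#include <limits.h>   /* CHAR_BIT */

/* CPU mode the code is compiled for, using GCC/Clang predefined macros. */
#if defined(__x86_64__)
#  define TARGET_MODE_BITS 64   /* long mode, including the x32 ABI */
#elif defined(__i386__)
#  define TARGET_MODE_BITS 32
#else
#  define TARGET_MODE_BITS 0    /* not x86, or a compiler without these macros */
#endif

/* Pointer width is a separate question: x32 gives 32 here in 64-bit mode. */
#define POINTER_BITS ((int)(sizeof(void *) * CHAR_BIT))

/* x32: 64-bit mode with 32-bit pointers. */
#if defined(__x86_64__) && defined(__ILP32__)
#  define TARGET_IS_X32 1
#else
#  define TARGET_IS_X32 0
#endif

/* Every condition here is a compile-time constant, so an optimizing
 * compiler can keep just the one branch that applies to this build. */
const char *describe_target(void)
{
    if (TARGET_MODE_BITS == 64)
        return TARGET_IS_X32 ? "x86-64 (x32 ABI, 32-bit pointers)" : "x86-64";
    if (TARGET_MODE_BITS == 32)
        return "i386";
    return "not x86";
}
```

Calling `describe_target()` from `main` and printing the result is enough to try it. Building the same file with `-m32` and `-m64` should give different strings without any runtime mode check.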


Here's a full Linux/NASM program that uses syscall to exit(1) if run as 64bit, or int 0x80 to exit(0) if run as 32bit.

The explicit BITS 32 and BITS 64 directives ensure that it assembles to the same machine code whether it's built as a 32-bit or a 64-bit object file. (And yes, I checked with objdump -d to show the raw machine-code bytes.)

Even so, I used db 0x40 instead of inc eax, to make it clearer what's special.

BITS 32
global _start
_start:
        xor    eax,eax          ; eax = 0, which sets ZF
        db 0x40                 ; 32bit: inc eax (clears ZF).  64bit: useless REX prefix
        jz      .64bit_mode     ; REX jcc  still works: taken only in 64bit mode

        ;jmp .64bit_mode   ; uncomment to test that the 64bit code does fault in a 32bit binary

.32bit_mode:
        xor     ebx,ebx
        mov     eax, 1          ; exit(0)
        int     0x80


BITS 64
.64bit_mode:
        lea  rdx, [rel _start]      ; An instruction that won't assemble in 32-bit mode.
        ;; arbitrary 64bit code here

        mov  edi, 1
        mov  eax, 231    ;  exit_group(1).
        syscall          ; This does SIGILL if this is run in 32bit mode on Intel CPUs

;;;;; Or as a callable function:
BITS 32
am_i_32bit:  ;; returns false only in 64bit mode
        xor     eax,eax

        db 0x40                 ; 32bit: inc eax
                                ; 64bit: REX.W=0
        ;nop                     ; REX nop  is  REX xchg eax,eax
        ret                     ; REX ret works normally, too

Tested and working. I built it twice to get different ELF metadata around the same machine code.

$ yasm -felf64 -Worphan-labels -gdwarf2 x86-polyglot-32-64.asm && ld -o x86-polyglot.64bit x86-polyglot-32-64.o
$ yasm -felf32 -Worphan-labels -gdwarf2 x86-polyglot-32-64.asm && ld -melf_i386 -o x86-polyglot.32bit x86-polyglot-32-64.o
$ ./x86-polyglot.32bit && echo 32bit || echo 64bit
32bit
$ ./x86-polyglot.64bit && echo 32bit || echo 64bit
64bit

(build commands from Assembling 32-bit binaries on a 64-bit system (GNU toolchain), linked from the FAQ section in the tag wiki).
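For experimenting from C, the same byte sequence as am_i_32bit can be wrapped in GNU-style inline asm. This is only a sketch, assuming GCC or Clang on x86; the function names are hypothetical, and the build-time helper exists purely to cross-check the probe. In a normally compiled binary the two always agree, which is exactly why the runtime check is pointless there.

```c
#include <assert.h>

/* Runtime probe: 1 in 64-bit mode, 0 in 32-bit mode,
 * -1 when not built for x86 with GNU-style inline asm. */
int runtime_is_64bit_mode(void)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    unsigned int eax;
    __asm__ volatile(
        "xorl %%eax, %%eax\n\t"  /* eax = 0 (also sets ZF) */
        ".byte 0x40\n\t"         /* 32-bit: inc eax.  64-bit: REX prefix on the nop */
        "nop"                    /* REX nop is still a nop */
        : "=a"(eax)
        :
        : "cc");
    return eax == 0;             /* eax stayed 0 only if 0x40 was a prefix */
#else
    return -1;
#endif
}

/* What the compiler already knows, used only to cross-check the probe. */
int build_time_is_64bit_mode(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    return 1;
#elif defined(__i386__) && defined(__GNUC__)
    return 0;
#else
    return -1;
#endif
}

/* Hypothetical self-check: the two answers should agree in any normal build. */
void check_probe_agrees_with_build(void)
{
    assert(runtime_is_64bit_mode() == build_time_is_64bit_mode());
}
```

The probe only becomes interesting when the bytes end up executing in a mode the compiler didn't target, e.g. after being copied out and injected elsewhere.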

Adila answered 27/6, 2016 at 21:27 Comment(8)
Minor correction: syscall is valid on most AMD cpus in 32 bit mode. - Shayna
@Jester: Thanks, I was wondering why it disassembled without complaint in 32bit mode, and assembled in an earlier version of the code. But it did work for me (on an Intel Merom) to confirm that I got a SIGILL from running the wrong branch in 32bit mode. (the lea just decodes to a dec and an lea with a different but still valid addressing mode.) Anyway, fixed the comment :) - Adila
AMD invented syscall and Intel invented sysenter. Of course AMD kept syscall when creating the 64 bit mode so when Intel adopted that, they also got syscall. - Shayna
@Jester: Right, but I thought they only added it as part of AMD64, not as a new instruction in 32bit mode as well. (I usually only look at Intel's insn set reference, where it just says "invalid" in compat/legacy mode, with no footnotes, so I guess that's how I arrived at that conclusion.) - Adila
Haven't checked history, but I think the syscall/sysenter came before 64 bit. - Shayna
@Jester: Actually I'm not sure about that. They definitely modified the behaviour of syscall while designing AMD64. e.g. feedback from Linux kernel devs is what led to saving RFLAGS into R11. (and masking RFLAGS on entry into the kernel, to close a window of vulnerability). See this post for links to relevant mailing list archives. If syscall existed in 32bit hardware before those posts, it must have different behaviour in legacy mode. - Adila
Apparently the 32 bit syscall specification has been published as SYSCALL and SYSRET Instruction Specification Application Note, order# 21086 in May 1997, three years before the mailing list discussion you linked for the 64 bit code, and the 32 bit AMD K6-2 series processors supported it since 1998. PS: yes, it works differently in 32 bit mode. - Shayna
See also marcinchmiel.com/articles/2017-07/polyglot-assembly-101 for using REX in a polyglot. - Regeneration
