Can you enter x64 32-bit "long compatibility sub-mode" outside of kernel mode?
This might be an exact duplicate of "Is it possible to execute 32-bit code in 64-bit process by doing mode-switching?", but that question is from a year ago and has only one answer, which doesn't give any source code. I'm hoping for more detailed answers.

I'm running 64-bit Linux (Ubuntu 12.04, if it matters). Here's some code that allocates a page, writes some 64-bit code into it, and executes that code.

#include <assert.h>
#include <malloc.h>    // memalign
#include <stdio.h>
#include <string.h>    // memcpy
#include <sys/mman.h>  // mprotect
#include <unistd.h>    // sysconf

unsigned char test_function[] = { 0xC3 };  // RET
int main()
{
    long pagesize = sysconf(_SC_PAGE_SIZE);
    unsigned char *buffer = memalign(pagesize, pagesize);
    assert(buffer != NULL);
    void (*func)() = (void (*)())buffer;

    memcpy(buffer, test_function, sizeof test_function);

    // func();  // will segfault: the page is not executable yet
    int rc = mprotect(buffer, pagesize, PROT_EXEC);
    assert(rc == 0);
    func();  // works fine
    return 0;
}

Now, purely for entertainment value, I'd like to do the same thing but with buffer containing arbitrary 32-bit (ia32) code instead of 64-bit code. This page implies that you can execute 32-bit code on a 64-bit processor by entering "long compatibility sub-mode": with long mode active (EFER.LMA=1), the CS segment descriptor must have L=0 and D=1. I am willing to wrap my 32-bit code in a prologue/epilogue that performs this setup.

But can I do this setup, in Linux, in usermode? (BSD/Darwin answers will also be accepted.) This is where I start to get really hazy on the concepts. I think the solution involves adding a new segment descriptor to the GDT (or is it the LDT?), and then switching to that segment via an lcall instruction. But can all that be done in usermode?

Here's a sample function that should return 4 when successfully run in compatibility sub-mode, and 8 when run in long mode. My goal is to get the instruction pointer to take this codepath and come out the other side with %rax=4, without ever dropping into kernel mode (or doing so only via documented system calls).

unsigned char behave_differently_depending_on_processor_mode[] = {
    0x89, 0xE0,  // movl %esp, %eax
    0x56,        // push %{e,r}si
    0x29, 0xE0,  // subl %esp, %eax
    0x5E,        // pop %{e,r}si
    0xC3         // ret
};
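
As a baseline, here is a minimal sketch of a harness that runs these bytes in ordinary long mode, where the push is 8 bytes wide and the function should return 8. It assumes x86-64 Linux with GCC or Clang; the names probe_code and call_machine_code are hypothetical, and the helper returns -1 on other platforms or if the page cannot be made executable.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for MAP_ANONYMOUS under strict -std modes
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

// Same bytes as behave_differently_depending_on_processor_mode.
static const unsigned char probe_code[] = {
    0x89, 0xE0,  // movl %esp, %eax
    0x56,        // push %{e,r}si
    0x29, 0xE0,  // subl %esp, %eax
    0x5E,        // pop %{e,r}si
    0xC3         // ret
};

// Hypothetical helper: copies len bytes of machine code into a fresh page,
// makes the page executable, calls it as int(void), and returns the result.
// Returns -1 if the page cannot be set up or the target is not x86-64.
static int call_machine_code(const unsigned char *code, size_t len)
{
#if defined(__x86_64__)
    const size_t size = 4096;
    if (len > size)
        return -1;
    void *page = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    memcpy(page, code, len);
    if (mprotect(page, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, size);
        return -1;
    }
    int (*fn)(void) = (int (*)(void))page;
    int result = fn();  // runs in 64-bit long mode
    munmap(page, size);
    return result;
#else
    (void)code;
    (void)len;
    return -1;
#endif
}
```

Calling call_machine_code(probe_code, sizeof probe_code) from main gives the long-mode result; the goal of the question is to make the same bytes yield 4 instead.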
Poulin answered 3/10, 2012 at 20:31 Comment(6)
Isn't that related to the x32 ABI (en.wikipedia.org/wiki/X32_ABI)? Or is it the same question as the one answered in https://mcmap.net/q/19861/-allocating-a-data-page-in-linux-with-nx-bit-turned-off ? - Porche
My use of mprotect solves the question answered in https://mcmap.net/q/19861/-allocating-a-data-page-in-linux-with-nx-bit-turned-off (how to get a new executable page); my main question is about compatibility sub-mode. I believe the x32 ABI is completely irrelevant: x32 is just a wacky ABI used by wacky systems in the regular 64-bit long mode, whereas what I want to do is actually switch the decoder into 32-bit compatibility sub-mode. (In other words, my question is not related to the ABI at all; it's related to the processor mode.) - Poulin
One thing not mentioned in the other question is that, in order for this to work, your buffer has to be in the low 4GB of virtual memory, since the rest is not available in 32-bit mode. Unless you can guarantee that, your code would be unreliable at best. The instructions for setting both the GDT and LDT are kernel-only, so unless the kernel already has a 32-bit code segment at a known location, or provides access to the LDT, this is impossible. I don't know whether Linux provides either of these things though, so I can't give you a straight answer. - Suribachi
@Suribachi That's a good point about the low 4GB (well, really the lowest 2GB or the highest 2GB). Referring to static data would get ugly fast. But most x86 code ends up being position-independent without even trying, so let's just pretend that I can ensure that my code is PIC, somehow. Re "access to the LDT", Darwin has i386_set_ldt and Linux has modify_ldt, but I don't understand what they do. - Poulin
PIC isn't enough: your code and data, including the stack, have to be in that area, because everything, including the instruction pointer, will be truncated to 32 bits. I was referring to the physical ability to do that. For example, I know that OS X reserves that area in 64-bit processes, so this would be impossible there. As for the LDT, you would need to set descriptors for both code and data. See here for information on the descriptor format. Once you get that set up and your data in the right place, you just need to lcall and set your data descriptors (ds, ss). - Suribachi
On Linux, you can use mmap with the flags MAP_ANONYMOUS|MAP_32BIT to acquire a page in low (high) memory. Re the LDT, I'm still holding out hope that someone will give me teh codes. I think I sort of understand what the actual lgdt and lldt instructions do, but I think modify_ldt is something higher-level that doesn't wipe out the entire table; you can actually add new entries to the existing LDT somehow. Perhaps I should study this sample code. - Poulin

Yes, you can. It's even doable using fully supported interfaces. Use modify_ldt to install a 32-bit code segment into the LDT, then set up a far pointer to your 32-bit code, then do an indirect far jump to it using ljmpl *(%eax) in AT&T notation.
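
The first two steps can be sketched as follows. This is a minimal sketch, assuming x86-64 Linux with the kernel headers installed (struct user_desc and MODIFY_LDT_CONTENTS_CODE come from <asm/ldt.h>); the names far_ptr32, install_code32_selector and make_far_ptr32 are hypothetical. It stops short of the far jump itself, which also needs the target code and the stack to be addressable with 32 bits.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for syscall()
#endif
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#if defined(__linux__) && defined(__x86_64__)
#include <asm/ldt.h>  // struct user_desc, MODIFY_LDT_CONTENTS_CODE
#endif

// Memory operand of an indirect far jump (ljmpl *mem): a 32-bit offset
// followed by a 16-bit segment selector.
struct far_ptr32 {
    uint32_t offset;
    uint16_t selector;
} __attribute__((packed));

// Hypothetical helper: installs a flat 32-bit code segment in LDT slot
// `entry` and returns the matching selector, or -1 on failure or on
// platforms other than x86-64 Linux.
static int install_code32_selector(unsigned entry)
{
#if defined(__linux__) && defined(__x86_64__)
    struct user_desc desc;
    memset(&desc, 0, sizeof desc);
    desc.entry_number   = entry;
    desc.base_addr      = 0;
    desc.limit          = 0xfffff;                  // with 4 KiB pages: 4 GiB
    desc.seg_32bit      = 1;                        // D=1; L stays 0
    desc.contents       = MODIFY_LDT_CONTENTS_CODE; // code segment
    desc.limit_in_pages = 1;
    // func 1 writes a single LDT entry.
    if (syscall(SYS_modify_ldt, 1, &desc, sizeof desc) != 0)
        return -1;
    // Selector = index << 3 | TI (bit 2, LDT) | RPL 3 (user mode).
    return (int)((entry << 3) | 0x4 | 0x3);
#else
    (void)entry;
    return -1;
#endif
}

// Hypothetical helper: builds the far pointer. The offset is only
// meaningful if `target` lies below 4 GiB.
static struct far_ptr32 make_far_ptr32(const void *target, int selector)
{
    struct far_ptr32 p;
    p.offset   = (uint32_t)(uintptr_t)target;
    p.selector = (uint16_t)selector;
    return p;
}
```

With entry 0 the selector comes out as 0x7. The jump would then take the address of such a far_ptr32 as its memory operand.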

You'll face all kinds of snafus, though. The high bits of your stack pointer are likely to get destroyed. You probably need a data segment if you actually want to run real code. And you'll need to do another far jump to get back to 64-bit mode.
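
One way to sidestep the stack-pointer problem is to give the 32-bit code its own stack (and code page) in low memory before jumping, using the MAP_32BIT flag mentioned in the comments on the question. A minimal sketch, assuming x86-64 Linux with glibc; the helper name map_low is hypothetical, and it returns NULL where MAP_32BIT is unavailable.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for MAP_ANONYMOUS and MAP_32BIT
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

// Hypothetical helper: maps `size` bytes that are addressable with a
// 32-bit pointer. Returns NULL on failure.
static void *map_low(size_t size, int prot)
{
#ifdef MAP_32BIT
    void *p = mmap(NULL, size, prot,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    // Defensive check: reject anything that is not entirely below 4 GiB.
    if ((uint64_t)(uintptr_t)p + size > (1ULL << 32)) {
        munmap(p, size);
        return NULL;
    }
    return p;
#else
    (void)size;
    (void)prot;
    return NULL;
#endif
}
```

Since the stack grows downward, the initial stack pointer for the 32-bit code would be the end of such a mapping (base plus size), not its base.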

A fully worked-out example is in my linux-clock-tests, in test_vsyscall.cc; look in intcc32. (It's a little broken on any released kernel: int 0xcc will crash. You should change that to something else more clever, like "nop".)

Boost answered 13/11, 2012 at 4:38 Comment(2)
In theory you can switch between 64-bit and 32-bit mode using the OS's existing GDT entries. Also, later in that comment thread, Ross Ridge points out that modify_ldt doesn't support setting the L bit, so it may not work. (At least, not for 32-bit code that wants to switch from compat to long mode.) I haven't tested the code in this answer. - Yockey
If you're going from 64->32, you'd use ljmpq *(%rax), or maybe ljmpl, but there's no reason to use a 32-bit addressing mode to load the m16:64 or m16:32 far jump target. - Yockey
