Can you enter x64 32-bit "long compatibility sub-mode" outside of kernel mode?

This might be an exact duplicate of "Is it possible to execute 32-bit code in 64-bit process by doing mode-switching?", but that question is from a year ago and only has one answer, which doesn't give any source code. I'm hoping for more detailed answers.

I'm running 64-bit Linux (Ubuntu 12.04, if it matters). Here's some code that allocates a page, writes some 64-bit code into it, and executes that code.

#include <assert.h>
#include <malloc.h>    // memalign
#include <stdio.h>
#include <string.h>    // memcpy
#include <sys/mman.h>  // mprotect
#include <unistd.h>    // sysconf

unsigned char test_function[] = { 0xC3 };  // RET

int main(void)
{
    long pagesize = sysconf(_SC_PAGE_SIZE);
    unsigned char *buffer = memalign(pagesize, pagesize);
    void (*func)(void) = (void (*)(void))buffer;

    assert(buffer != NULL);
    memcpy(buffer, test_function, sizeof test_function);

    // func();  // would segfault: the page is not executable yet
    int rc = mprotect(buffer, pagesize, PROT_READ | PROT_EXEC);
    assert(rc == 0);
    func();  // works fine
    return 0;
}

Now, purely for entertainment value, I'd like to do the same thing but with buffer containing arbitrary 32-bit (IA-32) code instead of 64-bit code. This page implies that you can execute 32-bit code on a 64-bit processor by entering "long compatibility sub-mode", which requires EFER.LMA=1 (already the case under a 64-bit kernel) together with a CS segment descriptor that has L=0, D=1. I am willing to wrap my 32-bit code in a prologue/epilogue that performs this setup.
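To make the L and D bits concrete, here is a minimal sketch, assuming the legacy 8-byte code/data descriptor layout described in the Intel SDM, that packs a flat code-segment descriptor. `make_code_descriptor` is a hypothetical helper, not a kernel or libc API; only L (bit 53) and D (bit 54) distinguish a compatibility-mode segment from a 64-bit one.

```c
#include <stdint.h>

// Hypothetical helper: pack a legacy 8-byte x86 code-segment descriptor.
// Type is fixed to execute/read code, S=1 (code/data), P=1, G=1 (4 KiB units).
static uint64_t make_code_descriptor(uint32_t base, uint32_t limit,
                                     unsigned dpl, int l_bit, int d_bit)
{
    uint64_t desc = 0;
    desc |= (uint64_t)(limit & 0xFFFF);             // limit 15:0  -> bits 0-15
    desc |= (uint64_t)(base & 0xFFFFFF) << 16;      // base 23:0   -> bits 16-39
    desc |= (uint64_t)0xA << 40;                    // type: execute/read code
    desc |= (uint64_t)1 << 44;                      // S = 1: code/data segment
    desc |= (uint64_t)(dpl & 3) << 45;              // DPL (3 = user mode)
    desc |= (uint64_t)1 << 47;                      // P = 1: present
    desc |= (uint64_t)((limit >> 16) & 0xF) << 48;  // limit 19:16 -> bits 48-51
    desc |= (uint64_t)(l_bit & 1) << 53;            // L: 64-bit code segment
    desc |= (uint64_t)(d_bit & 1) << 54;            // D: 32-bit default operand size
    desc |= (uint64_t)1 << 55;                      // G = 1: limit counted in pages
    desc |= (uint64_t)(base >> 24) << 56;           // base 31:24  -> bits 56-63
    return desc;
}
```

With base 0, limit 0xFFFFF and DPL 3, L=0/D=1 yields 0x00CFFA000000FFFF (a flat 32-bit user code segment), while L=1/D=0 yields 0x00AFFA000000FFFF (a 64-bit one); only byte 6 differs.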

But can I do this setup, in Linux, in usermode? (BSD/Darwin answers will also be accepted.) This is where I start to get really hazy on the concepts. I think the solution involves adding a new segment descriptor to the GDT (or is it the LDT?), and then switching to that segment via an lcall instruction. But can all that be done in usermode?

Here's a sample function that should return 4 when successfully run in compatibility sub-mode, and 8 when run in long mode. My goal is to get the instruction pointer to take this codepath and come out the other side with %rax=4, without ever dropping into kernel mode (or doing so only via documented system calls).

unsigned char behave_differently_depending_on_processor_mode[] = {
    0x89, 0xE0,  // movl %esp, %eax
    0x56,        // push %{e,r}si
    0x29, 0xE0,  // subl %esp, %eax
    0x5E,        // pop %{e,r}si
    0xC3         // ret
};
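As a baseline, the sample can simply be run as ordinary 64-bit code, which should return 8: in long mode `push` moves RSP by 8, and `subl %esp, %eax` computes the old ESP minus the new one. This is a sketch assuming Linux/x86-64; `run_sample_in_long_mode` is a hypothetical name, and it uses `mmap` instead of `memalign` so the page can be unmapped afterwards.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static const unsigned char behave_differently_depending_on_processor_mode[] = {
    0x89, 0xE0,  // movl %esp, %eax
    0x56,        // push %rsi  (long mode: RSP -= 8)
    0x29, 0xE0,  // subl %esp, %eax
    0x5E,        // pop %rsi
    0xC3         // ret
};

// Copy the bytes into a fresh page, make it executable, and call it as a
// normal 64-bit function. Returns -1 if mmap or mprotect fails.
static int run_sample_in_long_mode(void)
{
    size_t pagesize = (size_t)sysconf(_SC_PAGE_SIZE);
    void *page = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    memcpy(page, behave_differently_depending_on_processor_mode,
           sizeof behave_differently_depending_on_processor_mode);
    if (mprotect(page, pagesize, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, pagesize);
        return -1;
    }
    int (*fn)(void) = (int (*)(void))page;
    int result = fn();  // EAX = ESP before the push minus ESP after it
    munmap(page, pagesize);
    return result;
}
```

Getting 4 out of the same bytes is exactly what requires the mode switch.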

Yes, you can, and it's even doable using fully supported interfaces. Use modify_ldt to install a 32-bit code segment into the LDT, set up a far pointer to your 32-bit code, and then do an indirect far jump to it using ljmpl *(%eax) in AT&T notation.
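The first two steps can be sketched in C, assuming Linux/x86-64 headers. glibc has no `modify_ldt` wrapper, so this goes through `syscall(SYS_modify_ldt, ...)`. `install_ldt_code32` and `struct far_ptr32` are hypothetical names; the jump itself still has to be written in assembly.

```c
#define _GNU_SOURCE
#include <asm/ldt.h>      // struct user_desc, MODIFY_LDT_CONTENTS_CODE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>  // SYS_modify_ldt
#include <unistd.h>       // syscall

// m16:32 far pointer, the memory layout `ljmpl *(mem)` expects:
// a 32-bit offset followed by a 16-bit selector.
struct __attribute__((packed)) far_ptr32 {
    uint32_t offset;
    uint16_t selector;
};

// Install a flat 4 GiB, 32-bit (D=1) code segment in LDT slot `entry`.
// Linux forces DPL=3 and L=0 for LDT entries.
// Returns the selector, or -1 with errno set (for example ENOSYS if the
// kernel lacks CONFIG_MODIFY_LDT_SYSCALL, EPERM under a seccomp filter).
static int install_ldt_code32(unsigned entry)
{
    struct user_desc desc;
    memset(&desc, 0, sizeof desc);
    desc.entry_number = entry;
    desc.base_addr = 0;
    desc.limit = 0xFFFFF;        // counted in pages, so the whole 4 GiB
    desc.limit_in_pages = 1;
    desc.seg_32bit = 1;          // D = 1: 32-bit default operand size
    desc.contents = MODIFY_LDT_CONTENTS_CODE;
    desc.read_exec_only = 0;     // execute/read
    desc.seg_not_present = 0;

    // func 0x11 writes one entry ("new mode" in modify_ldt(2)); 0 on success.
    if (syscall(SYS_modify_ldt, 0x11, &desc, sizeof desc) != 0)
        return -1;
    // Selector: index << 3, TI bit (4) selects the LDT, RPL = 3.
    return (int)((entry << 3) | 4 | 3);
}
```

Slot 0 gives selector 7. A far pointer such as `{ .offset = (uint32_t)(uintptr_t)code32, .selector = 7 }` is only meaningful if `code32` lives below 4 GiB, since the offset is 32 bits wide.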

You'll face all kinds of snafus, though. The high bits of your stack pointer are likely to get destroyed. You probably need a data segment if you actually want to run real code. And you'll need to do another far jump to get back to 64-bit mode.
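Two of those snafus can be prepared for from C. This is a sketch assuming Linux/x86-64: `MAP_32BIT` places a mapping in the low 2 GiB, so a stack (or trampoline code) there survives RSP/RIP being truncated to 32 bits, and reading CS yields the selector that the far jump back to 64-bit mode must use (typically 0x33 on x86-64 Linux). Both helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

// Map `len` bytes with protection `prot` in the low 2 GiB of the address
// space (MAP_32BIT is Linux/x86-64 only). Returns NULL on failure.
static void *alloc_low(size_t len, int prot)
{
    void *p = mmap(NULL, len, prot,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

// Read the current (64-bit) code segment selector, to be saved before
// switching so the 32-bit side can far-jump back to it.
static uint16_t current_cs(void)
{
    uint16_t cs;
    __asm__ volatile ("mov %%cs, %0" : "=r"(cs));
    return cs;
}
```

A 32-bit stack would then start at `alloc_low(len, PROT_READ | PROT_WRITE) + len`, loaded into ESP by the assembly prologue after saving the 64-bit RSP.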

A fully worked-out example is in my linux-clock-tests repository, in test_vsyscall.cc. (It's a little broken on any released kernel: int cc will crash. You should change that to something more clever, like nop. Look in intcc32.)
