Linux Kernel: System call hooking example
M

5

78

I'm trying to write some simple test code as a demonstration of hooking the system call table.

"sys_call_table" is no longer exported in 2.6, so I'm just grabbing the address from the System.map file, and I can see it is correct (Looking through the memory at the address I found, I can see the pointers to the system calls).

However, when I try to modify this table, the kernel gives an "Oops" with "unable to handle kernel paging request at virtual address c061e4f4" and the machine reboots.

This is CentOS 5.4 running 2.6.18-164.10.1.el5. Is there some sort of protection, or do I just have a bug? I know it comes with SELinux, and I've tried putting it into permissive mode, but it doesn't make a difference.

Here's my code:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/unistd.h>

void **sys_call_table;

asmlinkage int (*original_call) (const char*, int, int);

asmlinkage int our_sys_open(const char* file, int flags, int mode)
{
   printk("A file was opened\n");
   return original_call(file, flags, mode);
}

int init_module()
{
    // sys_call_table address in System.map
    sys_call_table = (void*)0xc061e4e0;
    original_call = sys_call_table[__NR_open];

    // Hook: Crashes here
    sys_call_table[__NR_open] = our_sys_open;
}

void cleanup_module()
{
   // Restore the original call
   sys_call_table[__NR_open] = original_call;
}
Male answered 20/1, 2010 at 17:8 Comment(6)
Have you experimented with LD_PRELOAD or ptrace? Do they not satisfy what you are trying to do?Dispersion
Not really, the purpose of the exercise is to load a kernel module that will hook a system call for the entire system. It doesn't really matter what it does at that point.Male
Please note that for teaching purposes it might be OK to look into this, but it has both technical and licensing problems. Don't use this in the real world!Gallnut
what could be use case of this code? Can I hook any linux system call this way?Quiddity
@robert.berger, what? Care to expand on that a little?Compendium
@Tyler, I was thinking the same thing. I assume OP is building his own kernel to generate the System map file anyway.Ava
M
67

I finally found the answer myself.

https://web.archive.org/web/20190921093739/http://www.linuxforums.org/forum/linux-kernel/133982-cannot-modify-sys_call_table.html

The kernel was changed at some point so that the system call table is read only.

cypherpunk:

Even though this is late, the solution may interest others too. In the entry.S file you will find:

.section .rodata,"a"
#include "syscall_table_32.S"

sys_call_table -> ReadOnly. You have to recompile the kernel if you want to "hack" around with sys_call_table...

The link also has an example of changing the memory to be writable.

nasekomoe:

Hi everybody. Thanks for replies. I solved the problem long ago by modifying access to memory pages. I have implemented two functions that do it for my upper level code:

#include <asm/cacheflush.h>
#ifdef KERN_2_6_24
#include <asm/semaphore.h>
int set_page_rw(long unsigned int _addr)
{
    struct page *pg;
    pgprot_t prot;
    pg = virt_to_page(_addr);
    prot.pgprot = VM_READ | VM_WRITE;
    return change_page_attr(pg, 1, prot);
}

int set_page_ro(long unsigned int _addr)
{
    struct page *pg;
    pgprot_t prot;
    pg = virt_to_page(_addr);
    prot.pgprot = VM_READ;
    return change_page_attr(pg, 1, prot);
}

#else
#include <linux/semaphore.h>
int set_page_rw(long unsigned int _addr)
{
    return set_memory_rw(_addr, 1);
}

int set_page_ro(long unsigned int _addr)
{
    return set_memory_ro(_addr, 1);
}

#endif // KERN_2_6_24

Here's a modified version of the original code that works for me.

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/unistd.h>
#include <asm/semaphore.h>
#include <asm/cacheflush.h>

void **sys_call_table;

asmlinkage int (*original_call) (const char*, int, int);

asmlinkage int our_sys_open(const char* file, int flags, int mode)
{
   printk("A file was opened\n");
   return original_call(file, flags, mode);
}

int set_page_rw(long unsigned int _addr)
{
   struct page *pg;
   pgprot_t prot;
   pg = virt_to_page(_addr);
   prot.pgprot = VM_READ | VM_WRITE;
   return change_page_attr(pg, 1, prot);
}

int init_module()
{
    // sys_call_table address in System.map
    sys_call_table = (void*)0xc061e4e0;
    original_call = sys_call_table[__NR_open];

    // Make the page holding the table writable before patching it
    set_page_rw((unsigned long)sys_call_table);
    sys_call_table[__NR_open] = our_sys_open;
    return 0;
}

void cleanup_module()
{
   // Restore the original call
   sys_call_table[__NR_open] = original_call;
}
Male answered 20/1, 2010 at 18:6 Comment(4)
Note that in the provided link, Linuxerlive claims that change_page_attr will not work for kernels > 2.6.24, because it's deprecated.Male
+1 for documenting the solution that you came to for others to see.Chandra
Note that when you call set_memory_rw() and the address is not page aligned, you'll get this: WARNING: at arch/x86/mm/pageattr.c:877 change_page_attr_set_clr+0x343/0x530() (Not tainted). I'm using 2.6.32, still working out a solution (as the memory still appears to be read-only after I call this on it).Zayin
Awesome response to your own question. Very detailed. +1 fer sure. Cheers man.Leavenworth
Z
27

Thanks Stephen, your research here was helpful to me. I had a few problems, though, as I was trying this on a 2.6.32 kernel, and getting WARNING: at arch/x86/mm/pageattr.c:877 change_page_attr_set_clr+0x343/0x530() (Not tainted) followed by a kernel OOPS about not being able to write to the memory address.

The comment above the mentioned line states:

// People should not be passing in unaligned addresses

The following modified code works:

int set_page_rw(long unsigned int _addr)
{
    return set_memory_rw(PAGE_ALIGN(_addr) - PAGE_SIZE, 1);
}

int set_page_ro(long unsigned int _addr)
{
    return set_memory_ro(PAGE_ALIGN(_addr) - PAGE_SIZE, 1);
}
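
The expression PAGE_ALIGN(_addr) - PAGE_SIZE rounds the address up to the next page boundary and then steps back one page. For an unaligned address that lands on the start of the containing page, but for an address that is already page aligned it lands on the previous page. The user-space sketch below compares it with plain masking. The DEMO_* macros are stand-ins written for this example and assume 4 KiB pages; in the kernel the equivalents are PAGE_SIZE, PAGE_MASK and PAGE_ALIGN().

```c
/* Stand-ins for the kernel macros, assuming 4 KiB pages. */
#define DEMO_PAGE_SIZE     4096UL
#define DEMO_PAGE_MASK     (~(DEMO_PAGE_SIZE - 1))
#define DEMO_PAGE_ALIGN(a) (((a) + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK)

/* Start of the page that contains addr. */
static unsigned long page_start(unsigned long addr)
{
    return addr & DEMO_PAGE_MASK;
}

/* Round up to the next boundary, then step back one page. */
static unsigned long align_minus_page(unsigned long addr)
{
    return DEMO_PAGE_ALIGN(addr) - DEMO_PAGE_SIZE;
}
```

With the table address from the question, 0xc061e4e0, both helpers return 0xc061e000. With an already aligned input such as 0xc061e000, page_start() returns 0xc061e000 while align_minus_page() returns 0xc061d000, so masking with PAGE_MASK is the safer form if the address might be aligned.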

Note that this still doesn't actually set the page as read/write in some situations. The static_protections() function, which is called inside of set_memory_rw(), removes the _PAGE_RW flag if:

  • It's in the BIOS area
  • The address is inside .rodata
  • CONFIG_DEBUG_RODATA is set and the kernel is set to read-only

I found this out after debugging why I still got "unable to handle kernel paging request" when trying to modify the address of kernel functions. I was eventually able to solve that problem by finding the page table entry for the address myself and manually setting it to writable. Thankfully, the lookup_address() function is exported in version 2.6.26+. Here is the code I wrote to do that:

void set_addr_rw(unsigned long addr) {

    unsigned int level;
    pte_t *pte = lookup_address(addr, &level);

    /* lookup_address() returns NULL if the address is not mapped */
    if (pte && !(pte->pte & _PAGE_RW)) pte->pte |= _PAGE_RW;

}

void set_addr_ro(unsigned long addr) {

    unsigned int level;
    pte_t *pte = lookup_address(addr, &level);

    if (pte) pte->pte = pte->pte &~_PAGE_RW;

}

Finally, while Mark's answer is technically correct, it will cause problems when run inside Xen. If you want to disable write-protect, use the read/write cr0 functions. I macro them like this:

#define GPF_DISABLE write_cr0(read_cr0() & (~ 0x10000))
#define GPF_ENABLE write_cr0(read_cr0() | 0x10000)
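
The constant 0x10000 is simply bit 16 of CR0, the WP (write protect) flag. The bit arithmetic can be checked in user space without touching the real register. The helper names and DEMO_CR0_WP below are made up for this sketch, and the value passed in is just a sample number, not something read from a CPU.

```c
/* Bit 16 of CR0 is WP; this has the same value as 0x10000. */
#define DEMO_CR0_WP (1UL << 16)

/* Return the given CR0 value with write protection turned off. */
static unsigned long cr0_clear_wp(unsigned long cr0)
{
    return cr0 & ~DEMO_CR0_WP;
}

/* Return the given CR0 value with write protection turned on. */
static unsigned long cr0_set_wp(unsigned long cr0)
{
    return cr0 | DEMO_CR0_WP;
}

/* Non-zero if the WP bit is set in the given CR0 value. */
static int cr0_wp_enabled(unsigned long cr0)
{
    return (cr0 & DEMO_CR0_WP) != 0;
}
```

For the sample value 0x80050033, cr0_clear_wp() gives 0x80040033 and cr0_set_wp() on that result gives 0x80050033 again; every other bit is left alone.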

Hope this helps anyone else who stumbles upon this question.

Zayin answered 19/7, 2011 at 3:47 Comment(6)
Hi, with regard to your comment about Mark's answer, just curious: what's the problem it causes when ran inside Xen?Stereotype
On the xen kernels I've tried, it causes a "general protection fault". If you'll notice, xen defines its own xen_write_cr0() function that doesn't disable write-protect, as the hypervisor handles that, and the guest OS doesn't have that sort of access to the CPU registers.Zayin
Corey, thank you very much for sharing your findings...wish I could upvote 100 more times!Bleary
You can find it here: github.com/cormander/tpe-lkm/blob/… Note that I have those functions twice, for different kernel versions.Zayin
Thanks for the code, solved my problem. Why did you use unsigned long for the parameter addr. This causes a lot of warnings. I used void** as the type for the addr parameter. Is there a special reason for using unsigned long?Teletypesetter
Memory addresses are unsigned long. I'm not getting any compiler warnings.Zayin
B
21

Note that the following will also work instead of using change_page_attr and cannot be deprecated:

static void disable_page_protection(void) {

    unsigned long value;
    asm volatile("mov %%cr0,%0" : "=r" (value));
    if (value & 0x00010000) {
            value &= ~0x00010000;
            asm volatile("mov %0,%%cr0": : "r" (value));
    }
}

static void enable_page_protection(void) {

    unsigned long value;
    asm volatile("mov %%cr0,%0" : "=r" (value));
    if (!(value & 0x00010000)) {
            value |= 0x00010000;
            asm volatile("mov %0,%%cr0": : "r" (value));
    }
}
Benedick answered 22/10, 2010 at 21:1 Comment(5)
what kind of vodou is done here? what loa spirit is called by 0x00010000 spell?Inkstand
@Inkstand cr0 is a control register. Bit 16 (WP, write protect) controls whether code running in ring 0 has to honor read-only page protections: clear it and pages being "read only" no longer matters for kernel writes. You can do this in kernel space because the code runs at privilege level (ring) 0. Normal programs cannot do this to themselves. So basically, turn off write protection, trample over "read only" memory, turn it back on again, voila. You can't deprecate this because it's part of the kernel design, being monolithic, that all modules run in ring 0.Bromidic
If you are going to do this you should disable interrupts, cli, before modifying cr0 and reenable interrupts, sti, once you are done. See vulnfactory.org/blog/2011/08/12/wp-safe-or-not for details.Skindeep
Does modifying cr0 in this way imply its acting on the current page?Ava
If you modify cr0, it applies to the whole CPU, so every instruction that CPU executes while the bit is cleared runs with these protections disabled, regardless of the address. (en.wikipedia.org/wiki/Control_register)Overplay
O
17

If you are dealing with kernel 3.4 or later (it may also work with earlier kernels; I didn't test it), I would recommend a smarter way to acquire the system call table location.

For example

#include <linux/module.h>
#include <linux/kallsyms.h>

static unsigned long **p_sys_call_table;
/* Acquire the system call table address (run this inside a function, e.g. your module's init) */
p_sys_call_table = (void *) kallsyms_lookup_name("sys_call_table");

That's it. No addresses, it works fine with every kernel I've tested.

You can use a non-exported kernel function from your module the same way:

static int (*ref_access_remote_vm)(struct mm_struct *mm, unsigned long addr,
                void *buf, int len, int write);
ref_access_remote_vm = (void *)kallsyms_lookup_name("access_remote_vm");

Enjoy!
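
If you would rather look the address up from user space and pass it to the module, the text format of /proc/kallsyms is easy to scan: each line is a hex address, a one-character type, and the symbol name. Below is a minimal sketch of such a lookup. find_symbol() is a hypothetical helper written for this example, not a kernel or libc function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan text in /proc/kallsyms format ("<hex address> <type> <name> ...")
 * and return the address of the first symbol called `name`, or 0 if the
 * symbol is not present.
 */
static unsigned long long find_symbol(FILE *in, const char *name)
{
    char line[512];

    while (fgets(line, sizeof(line), in)) {
        unsigned long long addr;
        char type;
        char sym[256];

        /* Skip lines that do not have the three expected fields. */
        if (sscanf(line, "%llx %c %255s", &addr, &type, sym) != 3)
            continue;
        if (strcmp(sym, name) == 0)
            return addr;
    }
    return 0;
}
```

In practice you would call it as find_symbol(fopen("/proc/kallsyms", "r"), "sys_call_table") after checking the fopen() result. Run it as root: unprivileged readers usually see all addresses as zeros, and a return value of 0 is also what you get when the symbol is missing from the file.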

Oxysalt answered 29/8, 2016 at 8:51 Comment(3)
Is it that kallsyms_lookup_name will search in both code and data segments ?Pacifism
Hum, I thought this was only possible if you have KALLSYMS_ALL=yes in your .config when you compiled the kernel. I do not know if it works if you do not have the symbol present in /proc/kallsyms.Zoroastrian
Of all the answers on the Internet, only this worked for me! Copying the address of sys_call_table from System.map generates a page fault oops in Kernel.Instinct
T
1

As others have hinted, the whole story is a bit different now on modern kernels. I'll be covering x86-64 here; for syscall hijacking on modern arm64, refer to this other answer of mine.

Latest x86 kernels

See also this other answer of mine.

Since Linux v6.9, commit 1e3ad78334a69b36e107232e337f9d693dcc9df2 introduced a security mitigation against speculative execution on x86 that removed the use of syscall tables. This has been backported on other stable kernels too (notably v6.8.5+, v6.6.26+, v6.1.85+, v5.15.154+).

It is not possible to simply hijack the entries of sys_call_table anymore, but much nicer ways to do the same thing still exist. One of those is kprobes.

In short, something like the following should work:

#include <linux/kprobes.h>
#include <linux/ptrace.h>
// ...

static int sys_read_kprobe_pre_handler(struct kprobe *p, struct pt_regs *regs)
{
    pr_info("read(%ld, 0x%lx, 0x%lx)\n", regs->di, regs->si, regs->dx);
    return 0;
}

struct kprobe syscall_kprobe = {
    .symbol_name = "__x64_sys_read",
    .pre_handler = sys_read_kprobe_pre_handler,
};

static int __init my_module_init(void)
{
    int err;

    err = register_kprobe(&syscall_kprobe);
    if (err) {
        pr_err("register_kprobe() failed: %d\n", err);
        return err;
    }

    return 0;
}

static void __exit my_module_exit(void)
{
    unregister_kprobe(&syscall_kprobe);
}

Older x86 kernels (< v6.9)

Now this is plain and simple syscall table hijacking. Non-invasive hooking can still be done in a much nicer way using kprobes as described in the section above.

Since Linux v4.17, x86 (both 64 and 32 bit) now uses syscall wrappers that take a struct pt_regs * as the only argument (see commit 1, commit 2). You can see arch/x86/include/asm/syscall.h for the definitions.
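
With the wrapper convention, a hook has to pull its arguments out of the saved registers itself. On x86-64 the six syscall arguments arrive in rdi, rsi, rdx, r10, r8 and r9, which show up as the di, si, dx, r10, r8 and r9 fields of struct pt_regs. The user-space sketch below mirrors that mapping. struct demo_regs and syscall_arg() are hypothetical stand-ins for illustration; the real structure is defined in arch/x86/include/asm/ptrace.h and has many more fields.

```c
/* Hypothetical stand-in holding only the six argument registers. */
struct demo_regs {
    unsigned long di, si, dx, r10, r8, r9;
};

/*
 * Return syscall argument n (0..5) following the x86-64 syscall
 * convention, or 0 if n is out of range.
 */
static unsigned long syscall_arg(const struct demo_regs *regs, unsigned int n)
{
    switch (n) {
    case 0: return regs->di;
    case 1: return regs->si;
    case 2: return regs->dx;
    case 3: return regs->r10; /* r10, not rcx, carries the 4th argument */
    case 4: return regs->r8;
    case 5: return regs->r9;
    default: return 0;
    }
}
```

For read(fd, buf, count) that means fd is argument 0 (di), buf is argument 1 (si) and count is argument 2 (dx). Pointer arguments such as a path name are user-space addresses, so a real module would need to copy them in (for example with strncpy_from_user()) before reading them.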

Additionally, as others have described already in different answers, the simplest way to modify sys_call_table is to temporarily disable CR0 WP (Write-Protect) bit, which could be done using read_cr0() and write_cr0(). However, since Linux v5.3, [native_]write_cr0 will check sensitive bits that should never change (like WP) and refuse to change them (commit). In order to work around this, we need to write CR0 manually using inline assembly.

Here is a working kernel module (tested on Linux 5.10 and 5.18) that does syscall hijacking on modern Linux x86-64 considering the above caveats and assuming that you already know the address of sys_call_table (if you also want to find that in the module, see Proper way of getting the address of non-exported kernel symbols in a Linux kernel module):

// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/**
 * Test syscall table hijacking on x86-64. This module will replace the `read`
 * syscall with a simple wrapper which logs every invocation of `read` using
 * printk().
 *
 * Tested on Linux x86-64 v5.10, v5.18.
 *
 * Usage:
 *
 *     sudo cat /proc/kallsyms | grep sys_call_table # grab address
 *     sudo insmod syscall_hijack.ko sys_call_table_addr=0x<address_here>
 */

#include <linux/init.h>          // module_{init,exit}()
#include <linux/module.h>        // THIS_MODULE, MODULE_VERSION, ...
#include <linux/kernel.h>        // printk(), pr_*()
#include <asm/special_insns.h>   // {read,write}_cr0()
#include <asm/processor-flags.h> // X86_CR0_WP
#include <asm/unistd.h>          // __NR_*

#ifdef pr_fmt
#undef pr_fmt
#endif
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

typedef long (*sys_call_ptr_t)(const struct pt_regs *);

static sys_call_ptr_t *real_sys_call_table;
static sys_call_ptr_t original_read;

static unsigned long sys_call_table_addr;
module_param(sys_call_table_addr, ulong, 0);
MODULE_PARM_DESC(sys_call_table_addr, "Address of sys_call_table");

// Since Linux v5.3 [native_]write_cr0 won't change "sensitive" CR0 bits, need
// to re-implement this ourselves.
static void write_cr0_unsafe(unsigned long val)
{
    asm volatile("mov %0,%%cr0": "+r" (val) : : "memory");
}

static long myread(const struct pt_regs *regs)
{
    pr_info("read(%ld, 0x%lx, 0x%lx)\n", regs->di, regs->si, regs->dx);
    return original_read(regs);
}

static int __init modinit(void)
{
    unsigned long old_cr0;

    real_sys_call_table = (typeof(real_sys_call_table))sys_call_table_addr;

    pr_info("init\n");

    // Temporarily disable CR0 WP to be able to write to read-only pages
    old_cr0 = read_cr0();
    write_cr0_unsafe(old_cr0 & ~(X86_CR0_WP));

    // Overwrite syscall and save original to be restored later
    original_read = real_sys_call_table[__NR_read];
    real_sys_call_table[__NR_read] = myread;

    // Restore CR0 WP
    write_cr0_unsafe(old_cr0);
    pr_info("init done\n");

    return 0;
}

static void __exit modexit(void)
{
    unsigned long old_cr0;

    pr_info("exit\n");

    old_cr0 = read_cr0();
    write_cr0_unsafe(old_cr0 & ~(X86_CR0_WP));

    // Restore original syscall
    real_sys_call_table[__NR_read] = original_read;

    write_cr0_unsafe(old_cr0);

    pr_info("goodbye\n");
}

module_init(modinit);
module_exit(modexit);
MODULE_VERSION("0.1");
MODULE_DESCRIPTION("Test syscall table hijacking on x86-64.");
MODULE_AUTHOR("Marco Bonelli");
MODULE_LICENSE("Dual MIT/GPL");
Theosophy answered 19/6, 2022 at 15:5 Comment(4)
it does not work on my android 12 linux kernel 4.14.180. Marco, I sent you a friend request on facebook.Derangement
@Derangement this is meant for 5.10 x86-64, I have not tested it on older kernels, and I have not tested it on ARM/ARM64. If you are having some trouble you should post a question and explain your issue in detail.Theosophy
Do you have a good link to research how to parse the pt_regs*? Let's say I'd like to hook the "mkdir" syscall and have my driver log the path that was given as a parameter.Kibbutz
@TalHadad research? Open the kernel source code and look at the struct. What do you mean with "research"?Theosophy
