eBPF: raw_tracepoint arguments

I am getting into eBPF programming and want to use raw tracepoints, but I do not really understand how to use them or how to access the arguments correctly. I would appreciate any help and pointers to documentation.

My questions:

  1. How do I get the arguments from the syscall by using a raw_tracepoint instead of a tracepoint?
  2. BTW: What is the uint16_t common_type; of a raw tracepoint?

System: Ubuntu 20.04 with kernel 5.4 generic, x86_64

Explanation/Example:

I started with the "normal tracepoint" sys_enter_kill, where I can create the struct with arguments from sudo cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_kill/format:


// sudo cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_kill/format
// name: sys_enter_kill
// ID: 184
// format:
//  field:unsigned short common_type;   offset:0;   size:2; signed:0;
//  field:unsigned char common_flags;   offset:2;   size:1; signed:0;
//  field:unsigned char common_preempt_count;   offset:3;   size:1;signed:0;
//  field:int common_pid;   offset:4;   size:4; signed:1;

//  field:int __syscall_nr; offset:8;   size:4; signed:1;
//  field:pid_t pid;    offset:16;  size:8; signed:0;
//  field:int sig;  offset:24;  size:8; signed:0;

struct syscalls_enter_kill_args
{
    unsigned short common_type;
    unsigned char common_flags;
    unsigned char common_preempt_count;
    int common_pid;

    long syscall_nr;
    long pid;
    long sig;
};

SEC("tracepoint/xxx")
int main_entry(struct syscalls_enter_kill_args *ctx)
{
    if(ctx->sig != 9)
        return 0;
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 pid = pid_tgid;
    bpf_printk("Catched function call; PID = : %d.\n", pid);
    return 0;
}

This simple BPF program just outputs some text whenever kill is called with SIGKILL; it does not log SIGINT, SIGQUIT, and so on. Now I want to implement the same functionality with the raw tracepoint sys_enter.

// sudo cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/format
// name: sys_enter
// ID: 22
// format:
//  field:unsigned short common_type;   offset:0;   size:2; signed:0;
//  field:unsigned char common_flags;   offset:2;   size:1; signed:0;
//  field:unsigned char common_preempt_count;   offset:3;   size:1;signed:0;
//  field:int common_pid;   offset:4;   size:4; signed:1;

//  field:long id;  offset:8;   size:8; signed:1;
//  field:unsigned long args[6];    offset:16;  size:48;    signed:0;


struct sys_enter_args
{
    uint16_t common_type;
    uint8_t common_flags;
    uint8_t common_preempt_count;
    int32_t common_pid;
    int64_t id;
    uint64_t args[6];   // 8 bytes each (size:48 in the format above)
};

SEC("raw_tracepoint/xxx")
int main_entry_raw(struct sys_enter_args *ctx)
{
    if(ctx->id != SYS_kill)    // 62
        return 0;

    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 pid = pid_tgid;
    bpf_printk("Catched function call; PID = : %d.\n", pid);
    bpf_printk("  type: %u\n", ctx->common_type);
    bpf_printk("  id: %u\n", ctx->id);

    uint64_t* args = ctx->args;
    uint64_t arg3 = 0;
    bpf_probe_read(&arg3, sizeof(uint64_t), args + 3);
    bpf_printk("  Arg3: %u \n", arg3);
}

I thought I might get the signal (SIGKILL/SIGINT/SIGQUIT/...) from args[] by following field:int sig; offset:24; size:8; signed:0; that is, offset 24 => element 3 and size 8 => type u64 = unsigned long. However, this yields no useful values. So how do I get the value of the signal, which I can access in the tracepoint, in the raw_tracepoint as well?
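A note on the offset arithmetic, as a minimal userspace sketch rather than eBPF code: in the sys_enter record quoted above, args[] starts at offset 16, so byte offset 24 would correspond to args[1], not args[3]. The names sys_enter_record, check_layout_matches_format_file and args_index_for_offset are hypothetical and introduced only for illustration; the field layout is the only part taken from the format file. The sketch checks the layout of the formatted record only and makes no claim about what a raw_tracepoint program receives as ctx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the record described by
 * /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/format.
 * The struct name is hypothetical; only the layout comes from the file. */
struct sys_enter_record {
    uint16_t common_type;          /* offset 0,  size 2  */
    uint8_t  common_flags;         /* offset 2,  size 1  */
    uint8_t  common_preempt_count; /* offset 3,  size 1  */
    int32_t  common_pid;           /* offset 4,  size 4  */
    int64_t  id;                   /* offset 8,  size 8  */
    uint64_t args[6];              /* offset 16, size 48 */
};

/* Verify that the compiler lays the struct out as the format file says. */
static void check_layout_matches_format_file(void)
{
    assert(offsetof(struct sys_enter_record, id) == 8);
    assert(offsetof(struct sys_enter_record, args) == 16);
    assert(sizeof(struct sys_enter_record) == 64);
}

/* Translate a byte offset inside the record into an index into args[].
 * Returns -1 if the offset is not the start of an args[] element. */
static int args_index_for_offset(size_t offset)
{
    size_t base = offsetof(struct sys_enter_record, args);
    size_t end = base + sizeof(((struct sys_enter_record *)0)->args);

    if (offset < base || offset >= end)
        return -1;
    if ((offset - base) % sizeof(uint64_t) != 0)
        return -1;
    return (int)((offset - base) / sizeof(uint64_t));
}
```

With this layout, args + 3 sits at byte offset 40, while byte offset 24 is args[1].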

Thanks for help!

Spireme answered 10/1, 2022 at 13:1 Comment(0)

I think I worked it out, based on this article.

The ctx of a raw_tracepoint program is a struct bpf_raw_tracepoint_args, which is defined in bpf.h as:

struct bpf_raw_tracepoint_args {
    __u64 args[0];
};

So it is basically just an array of numbers/pointers. The meaning of these arguments depends on how the tracepoint prototype is defined. Looking at the source code where the tracepoint is defined, we find:

TRACE_EVENT_FN(sys_enter,

    TP_PROTO(struct pt_regs *regs, long id),

    TP_ARGS(regs, id),

    TP_STRUCT__entry(
        __field(    long,       id      )
        __array(    unsigned long,  args,   6   )
    ),

    TP_fast_assign(
        __entry->id = id;
        syscall_get_arguments(current, regs, __entry->args);
    ),

    TP_printk("NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)",
          __entry->id,
          __entry->args[0], __entry->args[1], __entry->args[2],
          __entry->args[3], __entry->args[4], __entry->args[5]),

    syscall_regfunc, syscall_unregfunc
);

Let's focus on TP_PROTO(struct pt_regs *regs, long id): it means that args[0] is struct pt_regs *regs and args[1] is long id. struct pt_regs is a copy of the CPU registers at the time sys_enter was called, and id is the ID of the syscall.
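The mapping from ctx->args[] to the TP_PROTO parameters can be sketched in plain userspace C. This is only a mock-up under stated assumptions: mock_pt_regs, mock_raw_tracepoint_args, sys_enter_proto and decode_sys_enter are hypothetical stand-ins invented for illustration, not kernel or libbpf types, and a real eBPF program would use struct bpf_raw_tracepoint_args and struct pt_regs from the kernel headers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct pt_regs (only two fields). */
struct mock_pt_regs {
    unsigned long di;
    unsigned long si;
};

/* Hypothetical stand-in for struct bpf_raw_tracepoint_args. The real struct
 * declares args[0]; a fixed size of 2 is used here so it can be
 * initialized directly in userspace. */
struct mock_raw_tracepoint_args {
    uint64_t args[2];
};

/* Mirrors TP_PROTO(struct pt_regs *regs, long id). */
struct sys_enter_proto {
    const struct mock_pt_regs *regs; /* taken from args[0] */
    long id;                         /* taken from args[1] */
};

/* Interpret the raw 64-bit slots according to the tracepoint prototype. */
static struct sys_enter_proto
decode_sys_enter(const struct mock_raw_tracepoint_args *ctx)
{
    struct sys_enter_proto p;

    assert(ctx != NULL);
    p.regs = (const struct mock_pt_regs *)(uintptr_t)ctx->args[0];
    p.id = (long)ctx->args[1];
    return p;
}
```

Filling a mock ctx with the address of a register set and the value 62 yields a regs pointer to that register set and id == 62, which is the same decoding an eBPF program performs on the real ctx.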

We can get to the arguments of the syscall by extracting them from the CPU registers. The System V ABI specifies which parameter is passed in which CPU register. To make our lives easier, libbpf defines PT_REGS_PARM{1..5} macros in bpf_tracing.h.
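As a rough userspace illustration of that register mapping, the sketch below assumes the x86_64 Linux syscall convention (arguments in rdi, rsi, rdx, r10, r8, r9). The type mock_x86_64_regs and the helper syscall_arg are hypothetical names made up for this example; only the field names di, si, dx, cx, r10, r8 and r9 follow the kernel's struct pt_regs. Note that the fourth syscall argument travels in r10, whereas the fourth argument of an ordinary function call uses rcx, so a macro written for function calls may not fit the fourth syscall argument.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the x86_64 register set saved at syscall entry. */
struct mock_x86_64_regs {
    unsigned long di, si, dx, cx, r10, r8, r9;
};

/* Return syscall argument n (0-based) following the x86_64 Linux syscall
 * convention: rdi, rsi, rdx, r10, r8, r9. */
static unsigned long syscall_arg(const struct mock_x86_64_regs *regs,
                                 unsigned int n)
{
    assert(n < 6);
    switch (n) {
    case 0:  return regs->di;
    case 1:  return regs->si;
    case 2:  return regs->dx;
    case 3:  return regs->r10; /* syscalls use r10 here, not cx */
    case 4:  return regs->r8;
    default: return regs->r9;
    }
}
```

For kill(pid, sig) this means pid is syscall_arg(regs, 0), the di register, and sig is syscall_arg(regs, 1), the si register.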

So I believe this should be a correct program:

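SEC("raw_tracepoint/sys_enter")
int main_entry_raw(struct bpf_raw_tracepoint_args *ctx)
{
    // TP_PROTO(struct pt_regs *regs, long id): args[0] = regs, args[1] = id
    unsigned long syscall_id = ctx->args[1];
    struct pt_regs *regs;

    if (syscall_id != SYS_kill)    // 62 on x86_64
        return 0;

    regs = (struct pt_regs *)ctx->args[0];

    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 pid = pid_tgid;
    bpf_printk("Catched function call; PID = : %d.\n", pid);
    bpf_printk("  id: %lu\n", syscall_id);

    // kill(pid, sig): the signal is the second syscall argument.
    // regs points to kernel memory, so read the register through
    // bpf_probe_read() and pass the address of the field, not its value.
    uint64_t sig = 0;
    bpf_probe_read(&sig, sizeof(sig), &PT_REGS_PARM2(regs));
    bpf_printk("  sig: %llu\n", sig);
    return 0;
}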
Ger answered 10/1, 2022 at 20:10 Comment(5)
Thank you very much for working it out! :D There is only one thing to change: I misunderstood the relation between the raw tracepoint and the tracepoint of the syscall. With the raw tracepoint, the format of the tracepoint sys_enter_kill is not relevant any more, but instead the call format of the syscall itself. So, we need argument 2 instead of argument 3 to read the argument "signal" from the syscall following man7.org/linux/man-pages/man2/kill.2.html - Spireme
` uint64_t arg2 = 0; arg2 = PT_REGS_PARM2_CORE(regs); if(arg2 != 9) return 0; ` - Spireme
Another question: Why did you mark regs as volatile? - Spireme
I based the example on the examples given in sysdig.com/blog/the-art-of-writing-ebpf-programs-a-primer. At one point they use volatile because they are not using the assigned variable, but it is not relevant in our case, will remove it. - Ger
Thank you very much :) - Spireme
