Can eBPF modify the return value or parameters of a syscall?

Asked 24/3, 2017 at 15:46 Answered 17/3, 2021 at 23:6

To simulate some behavior I would like to attach a probe to a syscall and modify the return value when certain parameters are passed. Alternatively, it would also be enough to modify the parameters of the function before they are processed.

Is this possible with BPF?

Polenta answered 24/3, 2017 at 15:46 Comment(2)

systemtap can do it – Leucocratic 2/8, 2017 at 1:50

If anyone comes here interested in using eBPF for networking, there is XDP which allows some control over the packets. Forwarding example – Bronchi 17/2, 2018 at 12:57

I believe that attaching eBPF to kprobes/kretprobes gives you read access to function arguments and return values, but that you cannot tamper with them. I am NOT 100% sure; good places to ask for confirmation would be the IO Visor project mailing list or IRC channel (#iovisor at irc.oftc.net).

As an alternative solution, I know you can at least change the return value of a syscall with strace, with the -e option. Quoting the manual page:

-e inject=set[:error=errno|:retval=value][:signal=sig][:when=expr]
       Perform syscall tampering for the specified set of syscalls.

Also, there was a presentation on this, and fault injection, at Fosdem 2017, if it is of any interest to you. Here is one example command from the slides:

strace -P precious.txt -efault=unlink:retval=0 unlink precious.txt

Edit: As stated by Ben, eBPF on kprobes and tracepoints is definitively read only, for tracing and monitoring use cases. I also got confirmation about this on IRC.

Enolaenormity answered 26/3, 2017 at 14:31 Comment(0)

Within kernel probes (kprobes), the eBPF virtual machine has read-only access to the syscall parameters and return value.

However, the eBPF program will have a return code of its own. It is possible to apply a seccomp profile that traps BPF (NOT eBPF; thanks @qeole) return codes and interrupts the system call during execution.

The allowed runtime modifications are:

SECCOMP_RET_KILL: Immediate kill with SIGSYS
SECCOMP_RET_TRAP: Send a catchable SIGSYS, giving a chance to emulate the syscall
SECCOMP_RET_ERRNO: Force errno value
SECCOMP_RET_TRACE: Yield decision to ptracer, or fail with errno ENOSYS if no tracer is attached
SECCOMP_RET_ALLOW: Allow

https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt
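As a rough illustration of SECCOMP_RET_ERRNO, here is a minimal sketch of a classic BPF filter that makes one syscall fail with a chosen errno. The helper name install_errno_filter is hypothetical, and the sketch deliberately omits the architecture check (seccomp_data.arch) that a real filter should perform.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: make syscall number `nr` fail with errno `err`
 * for the calling thread. Returns 0 on success, -1 on failure.
 * Assumption: no seccomp_data.arch check, so this is a sketch only. */
int install_errno_filter(unsigned int nr, unsigned int err)
{
    struct sock_filter filter[] = {
        /* Load the syscall number from struct seccomp_data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* Match: fall through to the ERRNO return. No match: skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (err & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1UL, 0UL, 0UL, 0UL) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0 ? 0 : -1;
}
```

After install_errno_filter(SYS_getppid, EPERM) succeeds, a raw syscall(SYS_getppid) in the same thread should return -1 with errno set to EPERM. The filter cannot be removed once installed.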

The SECCOMP_RET_TRACE method enables modifying the system call performed, arguments, or return value. This is architecture dependent and modification of mandatory external references may cause an ENOSYS error.

It does so by passing execution up to a waiting userspace ptrace, which has the ability to modify the traced process memory, registers, and file descriptors.

The tracer needs to call ptrace and then waitpid. An example:

ptrace(PTRACE_SETOPTIONS, tracee_pid, 0, PTRACE_O_TRACESECCOMP);
waitpid(tracee_pid, &status, 0);

http://man7.org/linux/man-pages/man2/ptrace.2.html

When waitpid returns, depending on the contents of status, one can retrieve the seccomp return value using the PTRACE_GETEVENTMSG ptrace operation. This will retrieve the seccomp SECCOMP_RET_DATA value, which is a 16-bit field set by the BPF program. Example:

ptrace(PTRACE_GETEVENTMSG, tracee_pid, 0, &data);
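The two calls above can be put together in a small end-to-end sketch. It forks a tracee that installs a SECCOMP_RET_TRACE filter for one syscall, and the tracer reads the tag back with PTRACE_GETEVENTMSG. The helper name read_seccomp_tag is hypothetical, the filter skips the architecture check, and error handling is kept minimal.

```c
#define _GNU_SOURCE
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the tracee tags syscall `nr` with
 * SECCOMP_RET_TRACE | tag; returns the tag seen by the tracer, or -1. */
long read_seccomp_tag(unsigned int nr, unsigned int tag)
{
    int status;
    unsigned long data = 0;
    long result = -1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sock_filter filter[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE | (tag & SECCOMP_RET_DATA)),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
            .filter = filter,
        };
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);                 /* let the tracer set its options */
        prctl(PR_SET_NO_NEW_PRIVS, 1UL, 0UL, 0UL, 0UL);
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
        syscall(nr);                    /* triggers the seccomp event */
        _exit(0);
    }

    /* First stop: the tracee's own SIGSTOP. */
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESECCOMP);
    ptrace(PTRACE_CONT, pid, NULL, NULL);

    /* Second stop: expected to be the PTRACE_EVENT_SECCOMP stop. */
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;
    if (status >> 8 == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8)) &&
        ptrace(PTRACE_GETEVENTMSG, pid, NULL, &data) == 0)
        result = (long)data;

    kill(pid, SIGKILL);                 /* the sketch does not resume the tracee */
    waitpid(pid, &status, 0);
    return result;
}
```

For example, read_seccomp_tag(SYS_getppid, 42) should return 42 when ptrace and seccomp are both permitted in the environment.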

Syscall arguments can be modified before the tracee continues: register arguments through the tracee's registers, and pointer arguments by writing to the memory they reference. PTRACE_SYSCALL resumes the tracee and stops it again at the next syscall entry or exit. Syscall return values can be modified by the tracer before resuming execution; the traced program won't be able to see that the syscall return values have been modified.
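To make the return-value rewrite concrete, here is a minimal sketch that uses plain PTRACE_SYSCALL stops (no seccomp filter) to replace the result of getppid() in a child. It is x86-64 only, because the register layout (orig_rax, rax) is architecture dependent; the helper name tracee_sees_fake_getppid is hypothetical, and on other architectures it simply returns -1.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, x86-64 only. Returns 1 if the tracee saw `fake`
 * as the result of getppid(), 0 if it did not, -1 on error or on
 * unsupported architectures. */
int tracee_sees_fake_getppid(long fake)
{
#if defined(__x86_64__)
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);                 /* wait for the tracer */
        _exit(syscall(SYS_getppid) == fake ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;

    for (;;) {
        struct user_regs_struct regs;

        /* Resume until the next syscall entry or exit stop. */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) != 0) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
            return -1;
        }
        if (waitpid(pid, &status, 0) != pid || WIFSIGNALED(status))
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status) == 0 ? 1 : 0;
        if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) != 0)
            continue;
        /* At syscall entry rax holds -ENOSYS; any other value means
         * this is the exit stop, where rax is the return value. */
        if (regs.orig_rax == SYS_getppid && (long long)regs.rax != -ENOSYS) {
            regs.rax = (unsigned long long)fake;
            ptrace(PTRACE_SETREGS, pid, NULL, &regs);
        }
    }
#else
    (void)fake;
    return -1;
#endif
}
```

The entry/exit distinction relies on the x86-64 convention that rax reads as -ENOSYS at the entry stop; a more careful tracer would set PTRACE_O_TRACESYSGOOD and track the stops explicitly.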

An example implementation: Filter and Modify System Calls with seccomp and ptrace

Cuneate answered 7/4, 2017 at 1:59 Comment(2)

Minor correction: seccomp only supports the “classic” version of BPF. There is no support for eBPF (“extended BPF”) at this time. – Enolaenormity 7/4, 2017 at 7:9

Updated! Thanks Qeole. – Cuneate 8/4, 2017 at 2:56

It is possible to modify some user space memory using eBPF. As stated in the bpf.h header file:

 * int bpf_probe_write_user(void *dst, const void *src, u32 len)
 *  Description
 *      Attempt in a safe way to write *len* bytes from the buffer
 *      *src* to *dst* in memory. It only works for threads that are in
 *      user context, and *dst* must be a valid user space address.
 *
 *      This helper should not be used to implement any kind of
 *      security mechanism because of TOC-TOU attacks, but rather to
 *      debug, divert, and manipulate execution of semi-cooperative
 *      processes.
 *
 *      Keep in mind that this feature is meant for experiments, and it
 *      has a risk of crashing the system and running programs.
 *      Therefore, when an eBPF program using this helper is attached,
 *      a warning including PID and process name is printed to kernel
 *      logs.
 *  Return
 *      0 on success, or a negative error in case of failure.

Also, quoting from the BPF design Q&A:

Tracing BPF programs can overwrite the user memory of the current task with bpf_probe_write_user(). Every time such program is loaded the kernel will print warning message, so this helper is only useful for experiments and prototypes. Tracing BPF programs are root only.

Your eBPF program may write data into user space memory locations. Note that you still cannot modify kernel structures from within your eBPF program.

Jacquelinejacquelyn answered 2/3, 2020 at 8:14 Comment(0)

It is possible to inject errors into a system call invocation using eBPF: https://lwn.net/Articles/740146/

There is a BPF helper called bpf_override_return(), which can override the return value of the probed function. This is an example using bcc as the front end: https://github.com/iovisor/bcc/blob/master/tools/inject.py

According to the Linux manual page:

bpf_override_return() is only available if the kernel was compiled with the CONFIG_BPF_KPROBE_OVERRIDE configuration option, and in this case it only works on functions tagged with ALLOW_ERROR_INJECTION in the kernel code.

Also, the helper is only available for the architectures having the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing, x86 architecture is the only one to support this feature.

It is possible to add a function to the error injection framework. More information can be found here: https://github.com/iovisor/bcc/issues/2485

Degrading answered 17/3, 2021 at 23:6 Comment(3)

Thanks! Important caveat, this only works for a small number of whitelisted functions in the kernel. – Anglicism 18/3, 2021 at 14:29

Thanks @GeorgSchölly :) I have updated the answer about the white list. But I haven't found the exact default white list for the error injection feature. It seems to support most of the syscall functions in the kernel. – Degrading 18/3, 2021 at 15:1

It looks like it's just a handful of functions, most of them related to the btrfs file system. These functions are annotated with ALLOW_ERROR_INJECTION in the source code. – Anglicism 26/3, 2021 at 8:38
