How to trace just system call events with ftrace without showing any other functions in the Linux kernel?

For example, to monitor all mkdir calls made, the best I could come up with was:

#!/bin/sh
set -eux

d=debug/tracing

mkdir -p debug
if ! mountpoint -q debug; then
  mount -t debugfs nodev debug
fi

# Stop tracing.
echo 0 > "${d}/tracing_on"

# Clear previous traces.
echo > "${d}/trace"

# Enable tracing mkdir
echo sys_enter_mkdir > "${d}/set_event"

# Set tracer type.
echo function > "${d}/current_tracer"

# Filter only sys_mkdir as a workaround.
echo SyS_mkdir > "${d}/set_ftrace_filter"

# Start tracing.
echo 1 > "${d}/tracing_on"

# Generate two mkdir calls.
rm -rf /tmp/a
rm -rf /tmp/b
mkdir /tmp/a
mkdir /tmp/b

# View the trace.
cat "${d}/trace"

# Stop tracing.
echo 0 > "${d}/tracing_on"

umount debug

Running it with sudo gives:

# tracer: function
#
# entries-in-buffer/entries-written: 4/4   #P:16
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
           mkdir-31254 [015] .... 2010985.576760: sys_mkdir(pathname: 7ffc54b32c77, mode: 1ff)
           mkdir-31254 [015] .... 2010985.576763: SyS_mkdir <-tracesys_phase2
           mkdir-31255 [007] .... 2010985.578363: sys_mkdir(pathname: 7fff02d90c77, mode: 1ff)
           mkdir-31255 [007] .... 2010985.578365: SyS_mkdir <-tracesys_phase2

My problem with this is that it outputs two lines for each syscall:

  • sys_mkdir, which is the event that I want
  • SyS_mkdir, which is the filtered-function workaround and which I don't want to see
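Short of changing the tracer, the unwanted lines can at least be hidden after the fact, because function-tracer entries end in a "<-caller" field while event entries do not. A minimal post-processing sketch, assuming every function-tracer line has the "FUNC <-PARENT" shape shown above; drop_function_lines is a hypothetical helper name. It only filters the text: the kernel still records every function hit.

```shell
# Hypothetical helper: read ftrace output on stdin and drop the lines produced
# by the function tracer, whose last field is "<-PARENT"
# (e.g. "SyS_mkdir <-tracesys_phase2"). Event lines and headers pass through.
drop_function_lines() {
  awk '$NF !~ /^<-/'
}
```

For example: drop_function_lines < "${d}/trace"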

If I instead try to do:

echo > "${d}/set_ftrace_filter"

or don't touch that file at all, then it shows a ton of functions and makes it hard to find the syscall at all.

Is there a nicer way to disable regular functions, and keep just syscall events?

I guess I could use just SyS_mkdir and disable the syscall event, but it feels cleaner to use the more specific event. Also:

  • the event shows arguments, which is nicer.
  • syscall function names change across kernel versions. E.g., it is already __x64_sys_mkdir instead of SyS_mkdir on Linux v4.18.
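Since the function name moves around between versions, one way to keep the set_ftrace_filter workaround portable is to look the name up in available_filter_functions (the list of functions the function tracer can filter on, found in the same tracing directory) rather than hard-coding it. A minimal sketch; mkdir_syscall_funcs is a hypothetical helper, and the three candidate names are only the ones mentioned in this question, not an exhaustive list for every kernel or architecture.

```shell
# Hypothetical helper: print which of the known mkdir syscall entry-point
# names appear in an ftrace available_filter_functions file.
# $1: path to that file, e.g. "${d}/available_filter_functions".
# Candidate names (an assumption, not exhaustive): SyS_mkdir (seen on 4.15),
# __x64_sys_mkdir (seen on x86-64 with 4.18), plain sys_mkdir.
mkdir_syscall_funcs() {
  # Whole-name match; lines for module functions carry a " [module]" suffix,
  # so only the first field is printed.
  grep -E '^(SyS_mkdir|__x64_sys_mkdir|sys_mkdir)( |$)' "$1" | cut -d' ' -f1
}
```

For example: mkdir_syscall_funcs "${d}/available_filter_functions" > "${d}/set_ftrace_filter". If nothing matches, that redirect leaves the filter empty, which means every function is traced, so it is worth checking the result first.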

Tested on Ubuntu 18.04, Linux kernel 4.15.

Injure answered 11/10, 2018 at 16:9 Comment(4)
Is there a reason you cannot use the syscall tracing events directly? i.e. echo 1 > /sys/kernel/debug/tracing/events/syscalls/sys_enter_mkdir/enable - Laborer
@Laborer I can use that, but from my experiments I thought it was equivalent to the echo sys_enter_mkdir > "${d}/set_event" line I currently have. - Injure
Ahh, yeah, but then you don't need to set the function tracer, and can keep the nop tracer, so you don't need to set any filters. - Laborer
@Laborer OK, I'll give it a try later on. Feel free to answer. - Injure

Use the nop tracer

As proposed by sruffell, all we have to do is use the nop tracer instead of function: that disables the function traces, but not the events.
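sruffell's suggestion boils down to two writes: nop into current_tracer and 1 into the event's own enable file under events/syscalls/. A minimal sketch wrapping them in a pair of shell functions; the names syscall_event_on and syscall_event_off are hypothetical, while the files they write (tracing_on, trace, current_tracer, events/syscalls/EVENT/enable) are the ones already used in the question and comments. They take the tracing directory as an argument, so they work with either the debug/tracing mount used here or /sys/kernel/debug/tracing.

```shell
# Hypothetical helpers. $1: tracing directory (e.g. debug/tracing),
# $2: syscall event name (e.g. sys_enter_mkdir).
syscall_event_on() {
  echo 0 > "$1/tracing_on"                   # pause while reconfiguring
  echo > "$1/trace"                          # clear previous traces
  echo nop > "$1/current_tracer"             # no function tracing at all
  echo 1 > "$1/events/syscalls/$2/enable"    # enable only this event
  echo 1 > "$1/tracing_on"                   # start tracing
}

syscall_event_off() {
  echo 0 > "$1/tracing_on"                   # stop tracing
  echo 0 > "$1/events/syscalls/$2/enable"    # disable the event again
}
```

Run as root, e.g. syscall_event_on debug/tracing sys_enter_mkdir, then mkdir /tmp/a, cat debug/tracing/trace, and finally syscall_event_off debug/tracing sys_enter_mkdir. One difference from set_event worth knowing: writing set_event with > first disables all other events, whereas the per-event enable file leaves other events untouched.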

Run with sudo:

#!/bin/sh
set -eux

d=debug/tracing

mkdir -p debug
if ! mountpoint -q debug; then
  mount -t debugfs nodev debug
fi

# Stop tracing.
echo 0 > "${d}/tracing_on"

# Clear previous traces.
echo > "${d}/trace"

# Find the tracer name.
cat "${d}/available_tracers"

# Disable tracing functions, show only system call events.
echo nop > "${d}/current_tracer"

# Find the event name.
grep mkdir "${d}/available_events"

# Enable tracing mkdir.
# Both statements below enable the same event through different interfaces.
# Note: writing set_event with '>' first disables all other events.
# https://www.kernel.org/doc/html/v4.18/trace/events.html
echo sys_enter_mkdir > "${d}/set_event"
# echo 1 > "${d}/events/syscalls/sys_enter_mkdir/enable"

# Start tracing.
echo 1 > "${d}/tracing_on"

# Generate two mkdir calls by two different processes.
rm -rf /tmp/a /tmp/b
mkdir /tmp/a
mkdir /tmp/b

# View the trace.
cat "${d}/trace"

# Stop tracing.
echo 0 > "${d}/tracing_on"

umount debug

This gives the desired output:

   mkdir-26064 [007] .... 2014370.909743: sys_mkdir(pathname: 7fffbd461c77, mode: 1ff)
   mkdir-26065 [014] .... 2014370.911615: sys_mkdir(pathname: 7ffea53bac77, mode: 1ff)
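In these event lines pathname is the user-space address of the string, not the path itself, so from this output alone the PID and mode are the directly usable fields (mode 1ff is 0777 in octal). A minimal sketch that extracts those two with sed, assuming the layout above (TASK-PID first, the event text at the end of the line); mkdir_events is a hypothetical name, and the hex patterns also tolerate a 0x prefix in case a kernel version prints one.

```shell
# Hypothetical helper: read trace output on stdin and print "PID MODE" for
# each sys_mkdir event line; all other lines are ignored.
mkdir_events() {
  sed -n 's/^ *[^ ]*-\([0-9][0-9]*\) .* sys_mkdir(pathname: [0-9a-fx]*, mode: \([0-9a-fx]*\))$/\1 \2/p'
}
```

For example: mkdir_events < "${d}/trace"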

Alternative, less clean solution

This also works, but is definitely less nice: in the question's script, replace:

echo SyS_mkdir > "${d}/set_ftrace_filter"

with:

echo '*' > "${d}/set_ftrace_notrace"

This turns off tracing for all functions, as described in the docs: https://www.kernel.org/doc/html/v4.18/trace/ftrace.html#the-file-system

set_ftrace_notrace:

This has an effect opposite to that of set_ftrace_filter. Any function that is added here will not be traced. If a function exists in both set_ftrace_filter and set_ftrace_notrace, the function will not be traced.
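The alternative works because of the precedence rule quoted above, combined with the fact (visible in the question) that an empty set_ftrace_filter traces every function: with '*' in set_ftrace_notrace, no function survives, so only the enabled events remain. The following is a simplified shell model of that rule, not the kernel's implementation; the name would_trace is hypothetical, and shell case globbing stands in for ftrace's own pattern matching (its module and command syntax is not modeled).

```shell
# Simplified model (not kernel code) of how set_ftrace_filter and
# set_ftrace_notrace combine for one function name.
# $1: function name, $2: file of filter patterns, $3: file of notrace patterns.
# Returns 0 if the function would be traced, 1 otherwise.
would_trace() {
  # notrace wins over filter, so check it first.
  while IFS= read -r pat; do
    [ -n "$pat" ] || continue
    case $1 in $pat) return 1 ;; esac
  done < "$3"
  # An empty filter means "trace every function".
  grep -q . "$2" || return 0
  while IFS= read -r pat; do
    [ -n "$pat" ] || continue
    case $1 in $pat) return 0 ;; esac
  done < "$2"
  return 1
}
```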

Injure answered 11/10, 2018 at 17:0 Comment(0)

In addition, it's worth mentioning another concise way to get such info. One can do something like:

stap -e 'probe syscall.mkdir { printf("%s[%d] -> %s(%s)\n", execname(), pid(), name, argstr) }'

The output:

systemd-journal[318] -> mkdir("/var/log/journal/c8d2562a041649cdbfd1ac5e24dbe0db", 0755)
systemd-journal[318] -> mkdir("/var/log/journal/c8d2562a041649cdbfd1ac5e24dbe0db", 0755)
mkdir[4870] -> mkdir("wtf", 0777)
...

Another way:

stap -e 'probe kernel.function("sys_mkdir") { printf("%s[%d] (%s)\n", execname(), pid(), $$parms) }'

The output:

systemd-journal[318] (pathname=0x55b74f7ab8b0 mode=0x1ed)
systemd-journal[318] (pathname=0x55b74f7ab8b0 mode=0x1ed)
mkdir[8532] (pathname=0x7ffcf30af761 mode=0x1ff)
...

You can customize the output as you like.

P.S. SystemTap is based on kprobes. The SystemTap architecture documentation helps in understanding its internals.

More about SystemTap.

Heartsease answered 12/10, 2018 at 8:59 Comment(2)
It would also be good to say what the underlying kernel interface used by SystemTap is: ftrace, or something else? - Injure
@CiroSantilli新疆改造中心六四事件法轮功 Edited. - Heartsease

© 2022 - 2024 — McMap. All rights reserved.