How to trace per-file IO operations in Linux?
Asked Answered
F

7

15

I need to track read system calls for specific files, and I'm currently doing this by parsing the output of strace. Since read operates on file descriptors I have to keep track of the current mapping between fd and path. Additionally, seek has to be monitored to keep the current position up-to-date in the trace.

Is there a better way to get per-application, per-file-path IO traces in Linux?

Freshwater answered 8/3, 2012 at 7:26 Comment(0)
H
6

First, you probably don't need to keep track because mapping between fd and path is available in /proc/PID/fd/.

Second, maybe you should use the LD_PRELOAD trick and overload in C open, seek and read system call. There are some article here and there about how to overload malloc/free.

I guess it won't be too different to apply the same kind of trick for those system calls. It needs to be implemented in C, but it should take far less code and be more precise than parsing strace output.

Hexad answered 8/3, 2012 at 7:53 Comment(1)
Agreed. I am now using LD_PRELOAD as the alternative, but was hoping there was some out-of-the-box solutions. Thanks.Freshwater
S
13

You could wait for the files to be opened so you can learn the fd and attach strace after the process launch like this:

strace -p pid -e trace=file -e read=fd

Suppose answered 8/3, 2012 at 7:55 Comment(3)
This can be tough if you have a multithreaded program; e.g., most anything in java.Gentlefolk
To learn the fd corresponding to the file path, look into /proc/$PID/fd/Melly
-y option prints paths associated with file descriptor argumentsRene
H
6

First, you probably don't need to keep track because mapping between fd and path is available in /proc/PID/fd/.

Second, maybe you should use the LD_PRELOAD trick and overload in C open, seek and read system call. There are some article here and there about how to overload malloc/free.

I guess it won't be too different to apply the same kind of trick for those system calls. It needs to be implemented in C, but it should take far less code and be more precise than parsing strace output.

Hexad answered 8/3, 2012 at 7:53 Comment(1)
Agreed. I am now using LD_PRELOAD as the alternative, but was hoping there was some out-of-the-box solutions. Thanks.Freshwater
B
6

systemtap - a kind of DTrace reimplementation for Linux - could be of help here.

As with strace you only have the fd, but with the scripting ability it is easy to maintain the filename for an fd (unless with fun stuff like dup). There is the example script iotime that illustates it.

#! /usr/bin/env stap

/*
 * Copyright (C) 2006-2007 Red Hat Inc.
 * 
 * This copyrighted material is made available to anyone wishing to use,
 * modify, copy, or redistribute it subject to the terms and conditions
 * of the GNU General Public License v.2.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 *
 * Print out the amount of time spent in the read and write systemcall
 * when each file opened by the process is closed. Note that the systemtap 
 * script needs to be running before the open operations occur for
 * the script to record data.
 *
 * This script could be used to to find out which files are slow to load
 * on a machine. e.g.
 *
 * stap iotime.stp -c 'firefox'
 *
 * Output format is:
 * timestamp pid (executabable) info_type path ...
 *
 * 200283135 2573 (cupsd) access /etc/printcap read: 0 write: 7063
 * 200283143 2573 (cupsd) iotime /etc/printcap time: 69
 *
 */

global start
global time_io

function timestamp:long() { return gettimeofday_us() - start }

function proc:string() { return sprintf("%d (%s)", pid(), execname()) }

probe begin { start = gettimeofday_us() }

global filehandles, fileread, filewrite

probe syscall.open.return {
  filename = user_string($filename)
  if ($return != -1) {
    filehandles[pid(), $return] = filename
  } else {
    printf("%d %s access %s fail\n", timestamp(), proc(), filename)
  }
}

probe syscall.read.return {
  p = pid()
  fd = $fd
  bytes = $return
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    fileread[p, fd] += bytes
  time_io[p, fd] <<< time
}

probe syscall.write.return {
  p = pid()
  fd = $fd
  bytes = $return
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    filewrite[p, fd] += bytes
  time_io[p, fd] <<< time
}

probe syscall.close {
  if ([pid(), $fd] in filehandles) {
    printf("%d %s access %s read: %d write: %d\n",
           timestamp(), proc(), filehandles[pid(), $fd],
           fileread[pid(), $fd], filewrite[pid(), $fd])
    if (@count(time_io[pid(), $fd]))
      printf("%d %s iotime %s time: %d\n",  timestamp(), proc(),
             filehandles[pid(), $fd], @sum(time_io[pid(), $fd]))
   }
  delete fileread[pid(), $fd]
  delete filewrite[pid(), $fd]
  delete filehandles[pid(), $fd]
  delete time_io[pid(),$fd]
}

It only works up to a certain number of files because the hash map is size limited.

Bydgoszcz answered 18/3, 2012 at 7:52 Comment(0)
G
2

strace now has new options to track file descriptors:

--decode-fds=set
                   Decode various information associated with file descriptors.  The default is decode-fds=none.  set can include the following elements:

                   path    Print file paths.
                   socket  Print socket protocol-specific information,
                   dev     Print character/block device numbers.
                   pidfd   Print PIDs associated with pidfd file descriptors.

This is useful as file descriptors are reused after being closed, and /proc/$PID/fd only provides one snapshot in time, which is useless when debugging something in realtime.

Sample output, note how file names are displayed in angular brackets and FD 3 is reused for all of /etc/ld.so.cache, /lib/x86_64-linux-gnu/libc.so.6, /usr/lib/locale/locale-archive, /home/florian/hello.

$ strace -e trace=desc --decode-fds=all cat hello 1>/dev/null
execve("/usr/bin/cat", ["cat", "hello"], 0x7fff42e20710 /* 102 vars */) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3</etc/ld.so.cache>
newfstatat(3</etc/ld.so.cache>, "", {st_mode=S_IFREG|0644, st_size=167234, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 167234, PROT_READ, MAP_PRIVATE, 3</etc/ld.so.cache>, 0) = 0x7f22edeee000
close(3</etc/ld.so.cache>)              = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3</usr/lib/x86_64-linux-gnu/libc-2.33.so>
read(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\206\2\0\0\0\0\0"..., 832) = 832
pread64(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, "\4\0\0\0 \0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0"..., 48, 848) = 48
pread64(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0+H)\227\201T\214\233\304R\352\306\3379\220%"..., 68, 896) = 68
newfstatat(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, "", {st_mode=S_IFREG|0755, st_size=1983576, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f22edeec000
pread64(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 2012056, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, 0) = 0x7f22edd00000
mmap(0x7f22edd26000, 1486848, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, 0x26000) = 0x7f22edd26000
mmap(0x7f22ede91000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, 0x191000) = 0x7f22ede91000
mmap(0x7f22ededd000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, 0x1dc000) = 0x7f22ededd000
mmap(0x7f22edee3000, 33688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f22edee3000
close(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f22edcfe000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3</usr/lib/locale/locale-archive>
newfstatat(3</usr/lib/locale/locale-archive>, "", {st_mode=S_IFREG|0644, st_size=6055600, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 6055600, PROT_READ, MAP_PRIVATE, 3</usr/lib/locale/locale-archive>, 0) = 0x7f22ed737000
close(3</usr/lib/locale/locale-archive>) = 0
fstat(1</dev/null<char 1:3>>, {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}) = 0
openat(AT_FDCWD, "hello", O_RDONLY)     = 3</home/florian/hello>
fstat(3</home/florian/hello>, {st_mode=S_IFREG|0664, st_size=6, ...}) = 0
fadvise64(3</home/florian/hello>, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f22edef5000
read(3</home/florian/hello>, "world\n", 131072) = 6
write(1</dev/null<char 1:3>>, "world\n", 6) = 6
read(3</home/florian/hello>, "", 131072) = 0
close(3</home/florian/hello>)           = 0
close(1</dev/null<char 1:3>>)           = 0
close(2</dev/pts/5<char 136:5>>)        = 0
+++ exited with 0 +++
Gpo answered 19/5, 2021 at 9:35 Comment(0)
E
1

I think overloading open, seek and read is a good solution. But just FYI if you want to parse and analyze the strace output programmatically, I did something similar before and put my code in github: https://github.com/johnlcf/Stana/wiki

(I did that because I have to analyze the strace result of program ran by others, which is not easy to ask them to do LD_PRELOAD.)

Earhart answered 8/3, 2012 at 9:45 Comment(0)
P
0

Probably the least ugly way to do this is to use fanotify. Fanotify is a Linux kernel facility that allows cheaply watching filesystem events. I'm not sure if it allows filtering by PID, but it does pass the PID to your program so you can check if it's the one you're interested in.

Here's a nice code sample: http://bazaar.launchpad.net/~pitti/fatrace/trunk/view/head:/fatrace.c

However, it seems to be under-documented at the moment. All the docs I could find are http://www.spinics.net/lists/linux-man/msg02302.html and http://lkml.indiana.edu/hypermail/linux/kernel/0811.1/01668.html

Photoactinic answered 21/8, 2012 at 9:55 Comment(0)
P
-1

Parsing command-line utils like strace is cumbersome; you could use ptrace() syscall instead. See man ptrace for details.

Photoactinic answered 21/8, 2012 at 9:59 Comment(1)
So, parsing a utillity that uses ptrace is cumbersome, but writing another one from scratch is not?..Melly

© 2022 - 2024 — McMap. All rights reserved.