How should strace be used?
Asked Answered
W

12

300

A colleague once told me that the last option when everything has failed to debug on Linux was to use strace.

I tried to learn the science behind this strange tool, but I am not a system admin guru and I didn’t really get results.

So,

  • What is it exactly and what does it do?
  • How and in which cases should it be used?
  • How should the output be understood and processed?

In brief, in simple words, how does this stuff work?

Woadwaxen answered 6/10, 2008 at 16:7 Comment(3)
strace -p <pid> will tell you what's going on right now with your program..... quick and easy than GDBTrottier
I personally find man strace really easy to read and useful. (P.S. haven't knew about strace before yesterday and not a Linux expert)Kier
"strace is a system call tracer" - it just shows you what kernel functions are being called (with their arguments) as a result of your program.Dopester
N
204

Strace Overview
strace can be seen as a light weight debugger. It allows a programmer / user to quickly find out how a program is interacting with the OS. It does this by monitoring system calls and signals.

Uses
Good for when you don't have source code or don't want to be bothered to really go through it.
Also, useful for your own code if you don't feel like opening up GDB, but are just interested in understanding external interaction.

A good little introduction
Here is a gentle introduction to using strace to debug process hangs: strace introduction

Nectareous answered 6/10, 2008 at 16:16 Comment(5)
So what if you use something below the layer that strace monitors?Beehive
In that case @Beehive check ltrace https://mcmap.net/q/82551/-how-should-strace-be-usedGoggleeyed
It's great for debugging low-level programs that exist only / mostly to make interesting system calls, or experiment with new options for syscalls to see what your OS does. It basically saves the trouble of writing logging / error-check code for one-off experiments. (Or if you're writing in asm or something where there's a good chance you accidentally passed the wrong args or even call number.) strace is much faster than GDB because it looks up errno codes for you, e.g. -EFAULT (oops, read-only buffer) or -ENOENT (oops, ran from the wrong directory where the relative path didn't work).)Mikiso
Link seems to be dead @John Mulder. Do you mind updating it ? would love to take a look into itOstiary
Couldn’t find the original page, but I changed the link to another introductory howtoNectareous
S
69

In simple words, strace traces all system calls issued by a program along with their return codes. Think things such as file/socket operations and a lot more obscure ones.

It is most useful if you have some working knowledge of C since here system calls would more accurately stand for standard C library calls.

Let's say your program is /usr/local/bin/cough. Simply use:

strace /usr/local/bin/cough <any required argument for cough here>

or

strace -o <out_file> /usr/local/bin/cough <any required argument for cough here>

to write into 'out_file'.

All strace output will go to stderr (beware, the sheer volume of it often asks for a redirection to a file). In the simplest cases, your program will abort with an error and you'll be able to see what where its last interactions with the OS in strace output.

More information should be available with:

man strace
Sigmund answered 6/10, 2008 at 16:17 Comment(0)
N
38

strace lists all system calls done by the process it's applied to. If you don't know what system calls mean, you won't be able to get much mileage from it.

Nevertheless, if your problem involves files or paths or environment values, running strace on the problematic program and redirecting the output to a file and then grepping that file for your path/file/env string may help you see what your program is actually attempting to do, as distinct from what you expected it to.

Nucleate answered 6/10, 2008 at 16:11 Comment(4)
And for non-trivial programs this is often like drinking from a fire-hose, so you have your work cut out for you going through the results...Kendrakendrah
strace <prog_name> to trace a program. strace -o <out_file> <prog_name> to out put to a fileRica
strace prog 2>&1 | grep ^open\(Beautician
Or just: strace -e open myprog OR for all file related sys calls : strace -e file myprogNumberless
J
19

Strace stands out as a tool for investigating production systems where you can't afford to run these programs under a debugger. In particular, we have used strace in the following two situations:

  • Program foo seems to be in deadlock and has become unresponsive. This could be a target for gdb; however, we haven't always had the source code or sometimes were dealing with scripted languages that weren't straight-forward to run under a debugger. In this case, you run strace on an already running program and you will get the list of system calls being made. This is particularly useful if you are investigating a client/server application or an application that interacts with a database
  • Investigating why a program is slow. In particular, we had just moved to a new distributed file system and the new throughput of the system was very slow. You can specify strace with the '-T' option which will tell you how much time was spent in each system call. This helped to determine why the file system was causing things to slow down.

For an example of analyzing using strace see my answer to this question.

Jaquiss answered 7/10, 2008 at 5:4 Comment(0)
U
17

I use strace all the time to debug permission issues. The technique goes like this:

$ strace -e trace=open,stat,read,write gnome-calculator

Where gnome-calculator is the command that you want to run.

Uretic answered 4/5, 2015 at 15:33 Comment(0)
S
10

strace -tfp PID will monitor the PID process's system calls, thus we can debug/monitor our process/program status.

Schoonmaker answered 19/6, 2014 at 14:44 Comment(0)
S
9

Minimal runnable example

If a concept is not clear, there is a simpler example that you haven't seen that explains it.

In this case, that example is the Linux x86_64 assembly freestanding (no libc) hello world:

hello.S

.text
.global _start
_start:
    /* write */
    mov $1, %rax    /* syscall number */
    mov $1, %rdi    /* stdout */
    mov $msg, %rsi  /* buffer */
    mov $len, %rdx  /* buffer len */
    syscall

    /* exit */
    mov $60, %rax   /* syscall number */
    mov $0, %rdi    /* exit status */
    syscall
msg:
    .ascii "hello\n"
len = . - msg

GitHub upstream.

Assemble and run:

as -o hello.o hello.S
ld -o hello.out hello.o
./hello.out

Outputs the expected:

hello

Now let's use strace on that example:

env -i ASDF=qwer strace -o strace.log -s999 -v ./hello.out arg0 arg1
cat strace.log

We use:

strace.log now contains:

execve("./hello.out", ["./hello.out", "arg0", "arg1"], ["ASDF=qwer"]) = 0
write(1, "hello\n", 6)                  = 6
exit(0)                                 = ?
+++ exited with 0 +++

With such a minimal example, every single character of the output is self evident:

  • execve line: shows how strace executed hello.out, including CLI arguments and environment as documented at man execve

  • write line: shows the write system call that we made. 6 is the length of the string "hello\n".

    = 6 is the return value of the system call, which as documented in man 2 write is the number of bytes written.

  • exit line: shows the exit system call that we've made. There is no return value, since the program quit!

More complex examples

The application of strace is of course to see which system calls complex programs are actually doing to help debug / optimize your program.

Notably, most system calls that you are likely to encounter in Linux have glibc wrappers, many of them from POSIX.

Internally, the glibc wrappers use inline assembly more or less like this: How to invoke a system call via syscall or sysenter in inline assembly?

The next example you should study is a POSIX write hello world:

main.c

#define _XOPEN_SOURCE 700
#include <unistd.h>

int main(void) {
    char *msg = "hello\n";
    write(1, msg, 6);
    return 0;
}

Compile and run:

gcc -std=c99 -Wall -Wextra -pedantic -o main.out main.c
./main.out

This time, you will see that a bunch of system calls are being made by glibc before main to setup a nice environment for main.

This is because we are now not using a freestanding program, but rather a more common glibc program, which allows for libc functionality.

Then, at the every end, strace.log contains:

write(1, "hello\n", 6)                  = 6
exit_group(0)                           = ?
+++ exited with 0 +++

So we conclude that the write POSIX function uses, surprise!, the Linux write system call.

We also observe that return 0 leads to an exit_group call instead of exit. Ha, I didn't know about this one! This is why strace is so cool. man exit_group then explains:

This system call is equivalent to exit(2) except that it terminates not only the calling thread, but all threads in the calling process's thread group.

And here is another example where I studied which system call dlopen uses: https://unix.stackexchange.com/questions/226524/what-system-call-is-used-to-load-libraries-in-linux/462710#462710

Tested in Ubuntu 16.04, GCC 6.4.0, Linux kernel 4.4.0.

Somatoplasm answered 28/3, 2019 at 12:7 Comment(0)
R
8

Strace can be used as a debugging tool, or as a primitive profiler.

As a debugger, you can see how given system calls were called, executed and what they return. This is very important, as it allows you to see not only that a program failed, but WHY a program failed. Usually it's just a result of lousy coding not catching all the possible outcomes of a program. Other times it's just hardcoded paths to files. Without strace you get to guess what went wrong where and how. With strace you get a breakdown of a syscall, usually just looking at a return value tells you a lot.

Profiling is another use. You can use it to time execution of each syscalls individually, or as an aggregate. While this might not be enough to fix your problems, it will at least greatly narrow down the list of potential suspects. If you see a lot of fopen/close pairs on a single file, you probably unnecessairly open and close files every execution of a loop, instead of opening and closing it outside of a loop.

Ltrace is strace's close cousin, also very useful. You must learn to differenciate where your bottleneck is. If a total execution is 8 seconds, and you spend only 0.05secs on system calls, then stracing the program is not going to do you much good, the problem is in your code, which is usually a logic problem, or the program actually needs to take that long to run.

The biggest problem with strace/ltrace is reading their output. If you don't know how the calls are made, or at least the names of syscalls/functions, it's going to be difficult to decipher the meaning. Knowing what the functions return can also be very beneficial, especially for different error codes. While it's a pain to decipher, they sometimes really return a pearl of knowledge; once I saw a situation where I ran out of inodes, but not out of free space, thus all the usual utilities didn't give me any warning, I just couldn't make a new file. Reading the error code from strace's output pointed me in the right direction.

Regnant answered 3/8, 2009 at 18:21 Comment(0)
Q
4

Strace is a tool that tells you how your application interacts with your operating system.

It does this by telling you what OS system calls your application uses and with what parameters it calls them.

So for instance you see what files your program tries to open, and weather the call succeeds.

You can debug all sorts of problems with this tool. For instance if application says that it cannot find library that you know you have installed you strace would tell you where the application is looking for that file.

And that is just a tip of the iceberg.

Quinquennial answered 6/10, 2008 at 16:22 Comment(1)
this is very precise.Goggleeyed
N
4

strace is a good tool for learning how your program makes various system calls (requests to the kernel) and also reports the ones that have failed along with the error value associated with that failure. Not all failures are bugs. For example, a code that is trying to search for a file may get a ENOENT (No such file or directory) error but that may be an acceptable scenario in the logic of the code.

One good use case of using strace is to debug race conditions during temporary file creation. For example a program that may be creating files by appending the process ID (PID) to some predecided string may face problems in multi-threaded scenarios. [A PID+TID (process id + thread id) or a better system call such as mkstemp will fix this].

It is also good for debugging crashes. You may find this (my) article on strace and debugging crashes useful.

Nattie answered 3/8, 2009 at 17:27 Comment(0)
R
4

Here's some examples of how I use strace to dig into websites. Hope this is helpful.

Check for time to first byte like so:

time php index.php > timeTrace.txt

See what percentage of actions are doing what. Lots of lstat and fstat could be an indication that it's time to clear the cache:

strace -s 200 -c php index.php > traceLstat.txt

Outputs a trace.txt so you can see exactly what calls are being made.

strace -Tt -o Fulltrace.txt php index.php

Use this to check on whether anything took between .1 to .9 of a second to load:

cat Fulltrace.txt | grep "[<]0.[1-9]" > traceSlowest.txt

See what missing files or directories got caught in the strace. This will output a lot of stuff involving our system - the only relevant bits involve the customer's files:

strace -vv php index.php 2>&1 | sed -n '/= -1/p' > traceFailures.txt
Refractive answered 17/9, 2019 at 13:34 Comment(0)
G
1

I liked some of the answers where it reads strace checks how you interacts with your operating system.

This is exactly what we can see. The system calls. If you compare strace and ltrace the difference is more obvious.

$>strace -c cd
Desktop  Documents  Downloads  examples.desktop  Music  Pictures  Public  Templates  Videos
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         7           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0        11           close
  0.00    0.000000           0        10           fstat
  0.00    0.000000           0        17           mmap
  0.00    0.000000           0        12           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           ioctl
  0.00    0.000000           0         8         8 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           getdents
  0.00    0.000000           0         2         2 statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         9           openat
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    93        10 total

On the other hand there is ltrace that traces functions.

$>ltrace -c cd
Desktop  Documents  Downloads  examples.desktop  Music  Pictures  Public  Templates  Videos
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 15.52    0.004946         329        15 memcpy
 13.34    0.004249          94        45 __ctype_get_mb_cur_max
 12.87    0.004099        2049         2 fclose
 12.12    0.003861          83        46 strlen
 10.96    0.003491         109        32 __errno_location
 10.37    0.003303         117        28 readdir
  8.41    0.002679         133        20 strcoll
  5.62    0.001791         111        16 __overflow
  3.24    0.001032         114         9 fwrite_unlocked
  1.26    0.000400         100         4 __freading
  1.17    0.000372          41         9 getenv
  0.70    0.000222         111         2 fflush
  0.67    0.000214         107         2 __fpending
  0.64    0.000203         101         2 fileno
  0.62    0.000196         196         1 closedir
  0.43    0.000138         138         1 setlocale
  0.36    0.000114         114         1 _setjmp
  0.31    0.000098          98         1 realloc
  0.25    0.000080          80         1 bindtextdomain
  0.21    0.000068          68         1 opendir
  0.19    0.000062          62         1 strrchr
  0.18    0.000056          56         1 isatty
  0.16    0.000051          51         1 ioctl
  0.15    0.000047          47         1 getopt_long
  0.14    0.000045          45         1 textdomain
  0.13    0.000042          42         1 __cxa_atexit
------ ----------- ----------- --------- --------------------
100.00    0.031859                   244 total

Although I checked the manuals several time, I haven't found the origin of the name strace but it is likely system-call trace, since this is obvious.

There are three bigger notes to say about strace.

Note 1: Both these functions strace and ltrace are using the system call ptrace. So ptrace system call is effectively how strace works.

The ptrace() system call provides a means by which one process (the "tracer") may observe and control the execution of another process (the "tracee"), and examine and change the tracee's memory and registers. It is primarily used to implement breakpoint debugging and system call tracing.

Note 2: There are different parameters you can use with strace, since strace can be very verbose. I like to experiment with -c which is like a summary of things. Based on -c you can select one system-call like -e trace=open where you will see only that call. This can be interesting if you are examining what files will be opened during the command you are tracing. And of course, you can use the grep for the same purpose but note you need to redirect like this 2>&1 | grep etc to understand that config files are referenced when the command was issued.

Note 3: I find this very important note. You are not limited to a specific architecture. strace will blow you mind, since it can trace over binaries of different architectures. enter image description here

Goggleeyed answered 24/8, 2018 at 21:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.