How can I monitor what's being put into the standard out buffer and break when a specific string is deposited in the pipe?

In Linux, with C/C++ code, using gdb, how can you add a gdb breakpoint to scan the incoming strings in order to break on a particular string?

I don't have access to a specific library's source code, but I want to break as soon as that library sends a specific string to standard out, so I can go back up the stack and investigate the part of my code that is calling the library. Of course, I don't want to wait until a buffer flush occurs. Can this be done? Perhaps with a routine in libstdc++?

Rhaetia answered 22/11, 2011 at 23:34 Comment(0)

This question might be a good starting point: how can I put a breakpoint on "something is printed to the terminal" in gdb?

So you could at least break whenever something is written to stdout. The method basically involves setting a breakpoint on the write syscall with a condition that the first argument is 1 (i.e. STDOUT). In the comments, there is also a hint as to how you could inspect the string parameter of the write call as well.

x86 32-bit mode

I came up with the following and tested it with gdb 7.0.1-debian. It seems to work quite well. $esp + 8 holds a pointer to the string passed to write, so you first read that stack slot as an int, then cast the value to a pointer to char. $esp + 4 holds the file descriptor to write to (1 for STDOUT).

(gdb) break write if 1 == *(int*)($esp + 4) && (int)strcmp((char*)*(int*)($esp + 8), "your string") == 0

x86 64-bit mode

If your process is running in x86-64 mode, the parameters are passed in the scratch registers %rdi and %rsi:

(gdb) break write if 1 == $rdi && (int)strcmp((char*)($rsi), "your string") == 0

Note that one level of indirection is removed since we're using scratch registers rather than variables on the stack.

Variants

Functions other than strcmp can be used in the above snippets:

  • strncmp is useful if you want to match only the first n characters of the string being written
  • strstr can be used to find matches within a string, since you can't always be certain that the string you're looking for is at the beginning of the string being passed to write.

Edit: I enjoyed this question and finding its subsequent answer. I decided to do a blog post about it.

Panties answered 22/11, 2011 at 23:59 Comment(3)
This answer saved me again today. I'm gonna create a bounty just so I can give you more cred' for this answer. - Rhaetia
Ha, that's awesome. No worries mate! - Panties
This makes the program go to background for me, then after fg it says: Error in testing breakpoint condition: 'strcmp' has unknown return type; cast the call to its declared return type - Mithridate

catch + strstr condition

The cool thing about this method is that it does not depend on glibc write being used: it traces the actual system call.

Furthermore, it is more resilient to printf() buffering, as it might even catch strings that are printed across multiple printf() calls.

x86_64 version:

define stdout
    catch syscall write
    commands
        printf "rsi = %s\n", $rsi
        bt
    end
    condition $bpnum $rdi == 1 && strstr((char *)$rsi, "$arg0") != NULL
end
stdout qwer

Test program:

#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    write(STDOUT_FILENO, "asdf1", 5);
    write(STDOUT_FILENO, "qwer1", 5);
    write(STDOUT_FILENO, "zxcv1", 5);
    write(STDOUT_FILENO, "qwer2", 5);
    printf("as");
    printf("df");
    printf("qw");
    printf("er");
    printf("zx");
    printf("cv");
    fflush(stdout);
    return EXIT_SUCCESS;
}

Outcome: breaks at:

  • qwer1
  • qwer2
  • fflush. The previous printf calls didn't actually print anything; they were buffered! The write syscall only happened on the fflush.

Notes:

  • $bpnum thanks to Tromey at: https://sourceware.org/bugzilla/show_bug.cgi?id=18727
  • rdi: first argument of the syscall; for write it is the file descriptor, and 1 is stdout. (The syscall number itself is passed in rax; write happens to be number 1 on x86_64 as well.)
  • rsi: second argument of the syscall; for write it points to the buffer
  • strstr: standard C function call, searches for submatches, returns NULL if none found

Tested in Ubuntu 17.10, gdb 8.0.1.

strace

Another option if you are feeling interactive:

setarch "$(uname -m)" -R strace -i ./stdout.out |& grep '\] write'

Sample output:

[00007ffff7b00870] write(1, "a\nb\n", 4a

Now copy that address and paste it into:

setarch "$(uname -m)" -R strace -i ./stdout.out |& grep -E '\] write\(1, "a'

The advantage of this method is that you can use the usual UNIX tools to manipulate strace output, and it does not require deep GDB-fu.

Explanation:

Easing answered 27/7, 2015 at 20:59 Comment(3)
This was absolutely eye-opening! Worked like a charm. Better than other solutions based on strstr, strcmp and so on, because these "Python" functions do not have to be already present in your program. - Prescriptible
For novices like me: write the script in script.gdb, start gdb with $ gdb /path/yourprogram, type (gdb) source script.gdb, and run your program with (gdb) run. - Prescriptible
Sorry, I just Mindf**ed, this post isn't using those functions, but I cannot find the correct post, so for others, just look here: sourceware.org/gdb/current/onlinedocs/gdb/… - Prescriptible

Anthony's answer is awesome. Following his answer, I tried out another solution on Windows (x86-64). I know this question is about GDB on Linux, but I think this solution is a useful supplement to this kind of question and might help others.

Solution on Windows

On Linux, a call to printf results in a call to the write API, and because Linux is an open source OS, we can debug inside that API. The API is different on Windows, which provides its own API, WriteFile. Because Windows is a commercial, closed-source OS, breakpoints cannot be added inside its APIs in the same way.

However, some of the VC source code ships with Visual Studio, so we can find the place in that source where WriteFile is finally called and set a breakpoint there. After debugging the sample code, I found that printf ends up calling _write_nolock, in which WriteFile is called. The function is located in:

your_VS_folder\VC\crt\src\write.c

The prototype is:

/* now define version that doesn't lock/unlock, validate fh */
int __cdecl _write_nolock (
        int fh,
        const void *buf,
        unsigned cnt
        )

Compared to the write API on Linux:

#include <unistd.h>

ssize_t write(int fd, const void *buf, size_t count); 

The parameters are exactly the same, so we can set a conditional breakpoint in _write_nolock by following the solutions above, with only some differences in detail.

Portable Solution for Both Win32 and x64

Luckily, Visual Studio lets us use parameter names directly when setting a breakpoint condition on both Win32 and x64, so the condition is easy to write:

  1. Add a breakpoint in _write_nolock

    NOTICE: There is a small difference between Win32 and x64. On Win32 we can use the function name alone to set the breakpoint location. That won't work on x64, because at the function's entry the parameters are not yet initialized, so the parameter names cannot be used in the breakpoint condition.

    Fortunately there is a workaround: set the breakpoint at a location inside the function rather than by function name, e.g. at the first line of the function, where the parameters are already initialized. (That is, use the filename plus line number, or open the file and set the breakpoint on the first line of the function body rather than on its entry.)

  2. Restrict the condition:

    fh == 1 && strstr((char *)buf, "Hello World") != 0
    

NOTICE: there is still a problem here. I tested two different ways of writing to stdout: printf and std::cout. printf passes the whole string to _write_nolock at once, but std::cout passes it one character at a time, which means the function is called strlen("your string") times. In that case the condition is never satisfied.

Win32 Solution

Of course, we can also use the same method Anthony provided: set the breakpoint condition using registers.

For a Win32 program, the solution is almost the same as with GDB on Linux. You might notice the __cdecl decoration in the prototype of _write_nolock. This calling convention means:

  • Argument-passing order is Right to left.
  • Calling function pops the arguments from the stack.
  • Name-decoration convention: Underscore character (_) is prefixed to names.
  • No case translation performed.

There is a description here, and Microsoft's website has an example that shows the registers and the stack. The result can be found here.

The breakpoint condition is then easy to set:

  1. Set a breakpoint in _write_nolock.
  2. Restrict the condition:

    *(int *)($esp + 4) == 1 && strstr(*(char **)($esp + 8), "Hello") != 0
    

It is the same method as on Linux. The first condition makes sure the string is written to stdout; the second matches the specified string.

x64 Solution

Two important changes from x86 to x64 are the 64-bit addressing capability and a flat set of 16 64-bit general-purpose registers. With more registers available, x64 uses only a __fastcall-style calling convention: the first four integer arguments are passed in registers, and arguments five and higher are passed on the stack.

You can refer to the Parameter Passing page on Microsoft's website. The four registers (in order, left to right) are RCX, RDX, R8 and R9, so the condition is easy to write:

  1. Set a breakpoint in _write_nolock.

    NOTICE: unlike the portable solution above, here we can set the breakpoint on the function itself rather than on its first line, because all the registers are already initialized at the function's entry.

  2. Restrict condition:

    $rcx == 1 && strstr((char *)$rdx, "Hello") != 0
    

The Win32 condition needs a cast and a dereference because $esp holds the stack pointer, which for all intents and purposes is a void*, and the arguments live in memory at offsets from it. On x64 the registers hold the argument values directly, so the extra level of indirection is no longer needed.

Post

I also enjoyed this question very much, so I translated Anthony's post into Chinese and added my answer to it as a supplement. The post can be found here. Thanks to @anthony-arnold for his permission.

Castigate answered 17/1, 2014 at 6:48 Comment(0)

Anthony's answer is very interesting and it definitely gives some results. Yet I think it does not account for the buffering done by printf. Indeed, in "Difference between write() and printf()", you can read that: "printf doesn't necessarily call write every time. Rather, printf buffers its output."

STDIO WRAPPER SOLUTION

Hence I came up with another solution: create a helper library that you can preload to wrap the printf-like functions. You can then set breakpoints in this library's source and backtrace to get information about the program you are debugging.

It works on Linux and targets libc's stdio. I do not know whether it works for C++ iostreams, and if the program calls write directly, it will miss that output.

Here is the wrapper that hijacks printf (io_helper.c):

#include <string.h>
#include <stdio.h>
#include <stdarg.h>

#define MAX_SIZE 0xFFFF

int printf(const char *format, ...){
    char target_str[MAX_SIZE];
    int i=0, ret;

    va_list args1, args2;

    /* RESOLVE THE STRING FORMATTING (vsnprintf truncates instead of overflowing) */
    va_start(args1, format);
    vsnprintf(target_str, sizeof target_str, format, args1);
    va_end(args1);

    if (strstr(target_str, "Hello World")){ /* SEARCH FOR YOUR STRING */
       i++; /* BREAK HERE */
    }

    /* OUTPUT THE STRING AS THE PROGRAM INTENDED TO */
    va_start(args2, format);
    ret = vprintf(format, args2);
    va_end(args2);
    return ret; /* like printf: number of characters written */
}

int puts(const char *s) 
{   
   return printf("%s\n",s);
}

I added puts because gcc tends to replace printf with puts when it can, so the wrapper routes it back to printf.

Next you just compile it to a shared library.

gcc -shared -fPIC io_helper.c -o libio_helper.so -g

And you load it before running gdb.

LD_PRELOAD=$PWD/libio_helper.so gdb test

Where test is the program you are debugging.

Then you can break with break io_helper.c:19 because you compiled the library with -g.

EXPLANATIONS

We are lucky here: printf and its relatives (fprintf, sprintf, ...) only collect the variadic arguments and call their 'v' equivalent (vprintf in our case). That part is easy to do ourselves, leaving the real work to libc's 'v' function. To get at the variadic arguments of printf, we just have to use va_start and va_end.

The main advantage of this method is that when you break, you are certain to be in the part of the program that outputs your target string, and not looking at a leftover in a buffer. You also make no assumptions about the hardware. The drawback is that you assume the program uses the libc stdio functions for its output.

Herve answered 19/5, 2016 at 1:51 Comment(0)
