How does reverse debugging work?
Asked Answered
C

8

91

GDB has a new version out that supports reverse debug (see http://www.gnu.org/software/gdb/news/reversible.html). I got to wondering how that works.

To get reverse debug to work it seems to me that you need to store the entire machine state including memory for each step. This would make performance incredibly slow, not to mention using a lot of memory. How are these problems solved?

Cila answered 24/9, 2009 at 8:35 Comment(3)
I imagine you could get by with storing state deltas rather than the entire state, but it still seems like it might be costly.Vachon
related question #523119Falstaffian
Saving deltas can work very well indeed, and is really necessary for an efficient full-system reversible solution.Jackass
S
139

I'm a gdb maintainer and one of the authors of the new reverse debugging. I'd be happy to talk about how it works. As several people have speculated, you need to save enough machine state that you can restore later. There are a number of schemes, one of which is to simply save the registers or memory locations that are modified by each machine instruction. Then, to "undo" that instruction, you just revert the data in those registers or memory locations.

Yes, it is expensive, but modern cpus are so fast that when you are interactive anyway (doing stepping or breakpoints), you don't really notice it that much.

Skeg answered 8/10, 2009 at 3:37 Comment(8)
But does reverse debugging only allow you to roll back next and step commands that you typed, or does it allow you to undo any number of instructions? For instance, if I set a breakpoint on an instruction and let it run until then, can I then roll back to the previous instruction, even though I skipped over it?Cila
> But does reverse debugging only allow you to roll back next and step commands that you typed or does it allow you to undo any number of instructions You can undo any number of instructions. Youre not restricted to, for instance only stopping at the points where you stopped when you were going forward. You can set a new breakpoint and run backwards to it > For instance if I set a breakpoint on an instruction and let it run until then, can I then roll back to the previous instruction, even though I skipped over it Yes So long as you turned on recording mode before you ran to the breakpointSkeg
Sorry for the unformatted text, don't know what's up with that.Skeg
Most syscalls modify data in a buffer. You just record the contents of the boffer before and after.Skeg
@Michael Snyder Wow this is amazing. Thanks for posting on SO.Floyfloyd
I'm worried that reverse-debugging can undo time and make us go back to the 60s or 70s. I don't want to wear bell-bottoms and grow my hair long again.Adamantine
And the syscalls that modify state in the OS? Does that just fail to work properly? What about when it modifies an opaque handle?Omegaomelet
@MichaelSnyder Is the reverse-debugging algorithm just to push on some stack each value which is overwritten ? (so a push before each mov) An alternative would be to re-execute the whole code until the next instruction is the one we want to revert, and hash the whole memory to check the state at this instruction is the desired one.Varicelloid
J
13

Note that you must not forget the use of simulators, virtual machines, and hardware recorders to implement reverse execution.

Another solution to implement it is to trace execution on physical hardware, such as is done by GreenHills and Lauterbach in their hardware-based debuggers. Based on this fixed trace of the action of each instruction, you can then move to any point in the trace by removing the effects of each instruction in turn. Note that this assumes that you can trace all things that affect the state visible in the debugger.

Another way is to use a checkpoint + re-execution method, which is used by VmWare Workstation 6.5 and Virtutech Simics 3.0 (and later), and which seems to be coming with Visual Studio 2010. Here, you use a virtual machine or a simulator to get a level of indirection on the execution of a system. You regularly dump the entire state to disk or memory, and then rely on the simulator being able to deterministically re-execute the exact same program path.

Simplified, it works like this: say that you are at time T in the execution of a system. To go to time T-1, you pick up some checkpoint from point t < T, and then execute (T-t-1) cycles to end up one cycle before where you were. This can be made to work very well, and apply even for workloads that do disk IO, consist of kernel-level code, and performs device driver work. The key is to have a simulator that contains the entire target system, with all its processors, devices, memories, and IOs. See the gdb mailinglist and the discussion following that on the gdb mailing list for more details. I use this approach myself quite regularly to debug tricky code, especially in device drivers and early OS boots.

Another source of information is a Virtutech white paper on checkpointing (which I wrote, in full disclosure).

Jackass answered 20/10, 2009 at 19:20 Comment(3)
Also see jakob.engbloms.se/archives/1547 and its two following blog posts for a more thorough walk-through of reverse debugging techniques.Jackass
How about the ability of "setting save points" instead of implementing reverse stepping. So, you debug, and at some point you can select the current step as "save point", and later on you have the ability to jump back to that save point and step forward again, editing your variables if necessary. Sort of like "snapshots" for VMs, or "restore points" for OSes.Propene
The problem I see with the use of virtual machines is that you cannot reproduce the behavior of external dependencies (e.g. server HTTP responses) deterministically.Laodicean
C
11

Although this question is old, most of the answers are too, and as remains an interesting topic, I'm posting a 2015 answer. Chapters 1 and 2 of my MSc thesis, Combining reverse debugging and live programming towards visual thinking in computer programming, covers some of the historical approaches to reverse debugging (especially focused on the snapshot-(or checkpoint)-and-replay approach), and explains the difference between it and omniscient debugging:

The computer, having forward-executed the program up to some point, should really be able to provide us with information about it. Such an improvement is possible, and is found in what are called omniscient debuggers. They are usually classified as reverse debuggers, although they might more accurately be described as "history logging" debuggers, as they merely record information during execution to view or query later, rather than allow the programmer to actually step backwards in time in an executing program. "Omniscient" comes from the fact that the entire state history of the program, having been recorded, is available to the debugger after execution. There is then no need to rerun the program, and no need for manual code instrumentation.

Software-based omniscient debugging started with the 1969 EXDAMS system where it was called "debug-time history-playback". The GNU debugger, GDB, has supported omniscient debugging since 2009, with its 'process record and replay' feature. TotalView, UndoDB and Chronon appear to be the best omniscient debuggers currently available, but are commercial systems. TOD, for Java, appears to be the best open-source alternative, which makes use of partial deterministic replay, as well as partial trace capturing and a distributed database to enable the recording of the large volumes of information involved.

Debuggers that do not merely allow navigation of a recording, but are actually able to step backwards in execution time, also exist. They can more accurately be described as back-in-time, time-travel, bidirectional or reverse debuggers.

The first such system was the 1981 COPE prototype ...

Conatus answered 8/6, 2015 at 12:19 Comment(0)
S
9

During an EclipseCon session we also asked how they do this with the Chronon Debugger for Java. That one does not allow you to actually step back, but can play back a recorded program execution in such a way that it feels like reverse debugging. (The main difference is that you cannot change the running program in the Chronon debugger, while you can do that in most other Java debuggers.)

If I understood it correctly, it manipulates the byte code of the running program, such that every change of an internal state of the program is recorded. External states don't need to be recorded additionally. If they influence your program in some way, then you must have an internal variable matching that external state (and therefore that internal variable is enough).

During playback time they can then basically recreate every state of the running program from the recorded state changes.

Interestingly the state changes are much smaller than one would expect on first look. So if you have a conditional "if" statement, you would think that you need at least one bit to record whether the program took the then- or the else-statement. In many cases you can avoid even that, like in the case that those different branches contain a return value. Then it is enough to record only the return value (which would be needed anyway) and to recalculate the decision about the executed branch from the return value itself.

Saddlery answered 2/4, 2012 at 14:15 Comment(0)
I
6

mozilla rr is a more robust alternative to GDB reverse debugging

https://github.com/mozilla/rr

GDB's target record-full built-in record and replay is severely limited, notably it stores too much state data and so has a very limited record length. Previously it had no support for AVX instructions: gdb reverse debugging fails with "Process record does not support instruction 0xf0d at address" but that seems to have been fixed.

Upsides of rr:

  • much more reliable currently. I have tested it relatively long runs of several complex software.
  • also offers a GDB interface with gdbserver protocol, making it a great replacement
  • small performance drop for most programs, I haven't noticed it myself without doing measurements
  • the generated traces are small on disk because only very few non-deterministic events are recorded, I've never had to worry about their size so far

rr achieves this by first running the program in a way that records what happened on every single non-deterministic event such as a thread switch.

Then during the second replay run, it uses that trace file, which is surprisingly small, to reconstruct exactly what happened on the original non-deterministic run but in a deterministic way, either forwards or backwards.

rr was originally developed by Mozilla to help them reproduce timing bugs that showed up on their nightly testing the following day. But the reverse debugging aspect is also fundamental for when you have a bug that only happens hours inside execution, since you often want to step back to examine what previous state led to the later failure.

The following example showcases some of its features, notably the reverse-next, reverse-step and reverse-continue commands.

Install on Ubuntu 18.04:

sudo apt-get install rr linux-tools-common linux-tools-generic linux-cloud-tools-generic
sudo cpupower frequency-set -g performance
# Overcome "rr needs /proc/sys/kernel/perf_event_paranoid <= 1, but it is 3."
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Test program:

reverse.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int f() {
    int i;
    i = 0;
    i = 1;
    i = 2;
    return i;
}

int main(void) {
    int i;

    i = 0;
    i = 1;
    i = 2;

    /* Local call. */
    f();

    printf("i = %d\n", i);

    /* Is randomness completely removed?
     * Recently fixed: https://github.com/mozilla/rr/issues/2088 */
    i = time(NULL);
    printf("time(NULL) = %d\n", i);

    return EXIT_SUCCESS;
}

compile and run:

gcc -O0 -ggdb3 -o reverse.out -std=c89 -Wextra reverse.c
rr record ./reverse.out
rr replay

Now you are left inside a GDB session, and you can properly reverse debug:

(rr) break main
Breakpoint 1 at 0x55da250e96b0: file a.c, line 16.
(rr) continue
Continuing.

Breakpoint 1, main () at a.c:16
16          i = 0;
(rr) next
17          i = 1;
(rr) print i
$1 = 0
(rr) next
18          i = 2;
(rr) print i
$2 = 1
(rr) reverse-next
17          i = 1;
(rr) print i
$3 = 0
(rr) next
18          i = 2;
(rr) print i
$4 = 1
(rr) next
21          f();
(rr) step
f () at a.c:7
7           i = 0;
(rr) reverse-step
main () at a.c:21
21          f();
(rr) next
23          printf("i = %d\n", i);
(rr) next
i = 2
27          i = time(NULL);
(rr) reverse-next
23          printf("i = %d\n", i);
(rr) next
i = 2
27          i = time(NULL);
(rr) next
28          printf("time(NULL) = %d\n", i);
(rr) print i
$5 = 1509245372
(rr) reverse-next
27          i = time(NULL);
(rr) next
28          printf("time(NULL) = %d\n", i);
(rr) print i
$6 = 1509245372
(rr) reverse-continue
Continuing.

Breakpoint 1, main () at a.c:16
16          i = 0;

When debugging complex software, you will likely run up to a crash point, and then fall inside a deep frame. In that case, don't forget that to reverse-next on higher frames, you must first:

reverse-finish

up to that frame, just doing the usual up is not enough.

The most serious limitations of rr in my opinion are:

UndoDB is a commercial alternative to rr: https://undo.io Both are trace / replay based, but I'm not sure how they compare in terms of features and performance.

Illusionary answered 30/10, 2018 at 11:24 Comment(3)
Do you know how I can do this with ddd? ThanksDeutoplasm
@Deutoplasm I'm not sure, but likely. First try to connect ddd to gdbserver. If that works, it should work with rr as well.Illusionary
@Deutoplasm however, don't use ddd, use gdb dashboard ;-) https://mcmap.net/q/16866/-gdb-split-view-with-code/… This will definitely work since it is just regular GDB.Illusionary
S
4

Nathan Fellman wrote:

But does reverse debugging only allow you to roll back next and step commands that you typed, or does it allow you to undo any number of instructions?

You can undo any number of instructions. You're not restricted to, for instance, only stopping at the points where you stopped when you were going forward. You can set a new breakpoint and run backwards to it.

For instance, if I set a breakpoint on an instruction and let it run until then, can I then roll back to the previous instruction, even though I skipped over it?

Yes. So long as you turned on recording mode before you ran to the breakpoint.

Skeg answered 8/10, 2009 at 22:47 Comment(1)
A crucial part of any reverse solution is that you turn it on at some point, and can only reverse up until that point. There is no magic that can run a machine in true reverse and find out what happened previously without some kind of record of what happened.Jackass
R
2

Here is how another reverse-debugger called ODB works. Extract:

Omniscient Debugging is the idea of collecting "time stamps" at each "point of interest" (setting a value, making a method call, throwing/catching an exception) in a program and then allowing the programmer to use those time stamps to explore the history of that program run.

The ODB ... inserts code into the program's classes as they are loaded and when the program runs, the events are recorded.

I'm guessing the gdb one works in the same kind of way.

Rudolph answered 24/9, 2009 at 9:1 Comment(4)
So would this require directives in the code to tell the compiler and debugger where those interesting points are?Cila
No. There's a Java Web Start demo on www.LambdaCS.com/debugger/debugger.html which shows you how it works. It looks like a normal program. That's ODB anyway, don't know about gdb. It's very cool though :)Rudolph
Note that the gdb solution DOES NOT change the target program in any way. If you have to instrument a program to debug it, you have a fair chance of the problem disappearing due to timing difference and other disturbances. All commercial revexec tools are based on some form of external record that does not change the code of the program itself.Jackass
@jakobengblom2: I think you're putting too much emphasis on the difference between changing the target by writing into its memory, emulating execution, or simply adding hardware breakpoints. They all change timing. In fact, target instrumentation probably changes timing the least.Otero
R
2

Reverse debugging means you can run the program backwards, which is very useful to track down the cause of a problem.

You don't need to store the complete machine state for each step, only the changes. It is probably still quite expensive.

Reedreedbird answered 24/9, 2009 at 10:24 Comment(2)
I see, but you still need to break execution at each change to save the changes.Cila
Yes, that's correct, but machines are quite fast now, and in human terms I don't believe the slow down is intollerable. It is comparable to valgrind, maybe not as slow as valgrind.Skeg

© 2022 - 2024 — McMap. All rights reserved.