How does a debugger work?
Asked Answered
G

7

221

I keep wondering how does a debugger work? Particulary the one that can be 'attached' to already running executable. I understand that compiler translates code to machine language, but then how does debugger 'know' what it is being attached to?

Guyer answered 19/10, 2008 at 19:20 Comment(2)
Eli's article has moved to eli.thegreenplace.net/2011/01/23/how-debuggers-work-part-1Danndanna
@Danndanna This article is interesting but only talks about the API-level abstraction for debugging on Linux. I guess OP wants to know more about under the hood.Coccyx
P
126

The details of how a debugger works will depend on what you are debugging, and what the OS is. For native debugging on Windows you can find some details on MSDN: Win32 Debugging API.

The user tells the debugger which process to attach to, either by name or by process ID. If it is a name then the debugger will look up the process ID, and initiate the debug session via a system call; under Windows this would be DebugActiveProcess.

Once attached, the debugger will enter an event loop much like for any UI, but instead of events coming from the windowing system, the OS will generate events based on what happens in the process being debugged – for example an exception occurring. See WaitForDebugEvent.

The debugger is able to read and write the target process' virtual memory, and even adjust its register values through APIs provided by the OS. See the list of debugging functions for Windows.

The debugger is able to use information from symbol files to translate from addresses to variable names and locations in the source code. The symbol file information is a separate set of APIs and isn't a core part of the OS as such. On Windows this is through the Debug Interface Access SDK.

If you are debugging a managed environment (.NET, Java, etc.) the process will typically look similar, but the details are different, as the virtual machine environment provides the debug API rather than the underlying OS.

Punkah answered 19/10, 2008 at 19:32 Comment(6)
This question may sound stupid but how does the OS keep track if a specific address inside the program is reached. E.g. a breackpoint is set on address 0x7710cafe. As the instruction pointer changes the OS (or maybe the CPU) will have to compare the instruction pointer with all break point addresses, or am I mistaken? How does this work ..?Mikaela
@StefanFalk I wrote an answer that addresses some of the lower-level details (on x86).Varitype
Could you explain how exactly the variable names map to addresses? Does the application use the same memory addresses for variables every time it is run? I had always assumed that it just found mapped from available memory but had never really thought about whether the bytes would map directly to the same spot in the app's memory space. It seems like that would be a major security issue.Impressible
@JamesJoshuaStreet I would imagine that that is a detail that is specific to the debugger.Jaundiced
This answer reveals something. But I think op is more interested in some low level details rather than some API abstractions.Coccyx
What is the purpose of the gdb? Is it a wrapper over the OS Debug APIs for debugging the c/c++ code.Muscat
V
100

As I understand it:

For software breakpoints on x86, the debugger replaces the first byte of the instruction with CC (int3). This is done with WriteProcessMemory on Windows. When the CPU gets to that instruction, and executes the int3, this causes the CPU to generate a debug exception. The OS receives this interrupt, realizes the process is being debugged, and notifies the debugger process that the breakpoint was hit.

After the breakpoint is hit and the process is stopped, the debugger looks in its list of breakpoints, and replaces the CC with the byte that was there originally. The debugger sets TF, the Trap Flag in EFLAGS (by modifying the CONTEXT), and continues the process. The Trap Flag causes the CPU to automatically generate a single-step exception (INT 1) on the next instruction.

When the process being debugged stops the next time, the debugger again replaces the first byte of the breakpoint instruction with CC, and the process continues.

I'm not sure if this is exactly how it's implemented by all debuggers, but I've written a Win32 program that manages to debug itself using this mechanism. Completely useless, but educational.

Varitype answered 13/2, 2014 at 6:30 Comment(0)
D
36

In Linux, debugging a process begins with the ptrace(2) system call. This article has a great tutorial on how to use ptrace to implement some simple debugging constructs.

Derringer answered 19/10, 2008 at 20:52 Comment(4)
Does the (2) tell us something more (or less) than "ptrace is a system call"?Vermis
@eSKay, no not really. The (2) is the manual section number. See en.wikipedia.org/wiki/Man_page#Manual_sections for a description of the manual sections.Derringer
@AdamRosenfield Except for the fact that section 2 is specifically "System Calls". So indirectly, yes, it tells us that ptrace is a system call.Varitype
More practically, the (2) tells us that we can type man 2 ptrace and get the right manpage -- not important here because there's no other ptrace to disambiguate, but for compare man printf with man 3 printf on Linux.Pironi
M
10

If you're on a Windows OS, a great resource for this would be "Debugging Applications for Microsoft .NET and Microsoft Windows" by John Robbins:

(or even the older edition: "Debugging Applications")

The book has has a chapter on how a debugger works that includes code for a couple of simple (but working) debuggers.

Since I'm not familiar with details of Unix/Linux debugging, this stuff may not apply at all to other OS's. But I'd guess that as an introduction to a very complex subject the concepts - if not the details and APIs - should 'port' to most any OS.

Mediatorial answered 19/10, 2008 at 20:41 Comment(0)
T
10

I think there are two main questions to answer here:

1. How the debugger knows that an exception occurred?

When an exception occurs in a process that’s being debugged, the debugger gets notified by the OS before any user exception handlers defined in the target process are given a chance to respond to the exception. If the debugger chooses not to handle this (first-chance) exception notification, the exception dispatching sequence proceeds further and the target thread is then given a chance to handle the exception if it wants to do so. If the SEH exception is not handled by the target process, the debugger is then sent another debug event, called a second-chance notification, to inform it that an unhandled exception occurred in the target process. Source

enter image description here


2. How the debugger knows how to stop on a breakpoint?

The simplified answer is: When you put a break-point into the program, the debugger replaces your code at that point with a int3 instruction which is a software interrupt. As an effect the program is suspended and the debugger is called.

Toluate answered 9/5, 2019 at 13:29 Comment(0)
B
7

Another valuable source to understand debugging is Intel CPU manual (Intel® 64 and IA-32 Architectures Software Developer’s Manual). In the volume 3A, chapter 16, it introduced the hardware support of debugging, such as special exceptions and hardware debugging registers. Following is from that chapter:

T (trap) flag, TSS — Generates a debug exception (#DB) when an attempt is made to switch to a task with the T flag set in its TSS.

I am not sure whether Window or Linux use this flag or not, but it is very interesting to read that chapter.

Hope this helps someone.

Bolen answered 1/3, 2011 at 23:44 Comment(0)
O
2

My understanding is that when you compile an application or DLL file, whatever it compiles to contains symbols representing the functions and the variables.

When you have a debug build, these symbols are far more detailed than when it's a release build, thus allowing the debugger to give you more information. When you attach the debugger to a process, it looks at which functions are currently being accessed and resolves all the available debugging symbols from here (since it knows what the internals of the compiled file looks like, it can acertain what might be in the memory, with contents of ints, floats, strings, etc.). Like the first poster said, this information and how these symbols work greatly depends on the environment and the language.

Oubliette answered 19/10, 2008 at 19:42 Comment(1)
This is just about symbols. There's far, far more to debugging than symbols.Varitype

© 2022 - 2024 — McMap. All rights reserved.