Can anyone help me interpret this MSVC debug-mode disassembly from a simple Hello World?

I got the following simple C++ code:

#include <stdio.h>
int main(void)
{
    ::printf("\nHello,debugger!\n");
}

And from WinDbg, I got the following disassembly code:

SimpleDemo!main:
01111380 55              push    ebp
01111381 8bec            mov     ebp,esp
01111383 81ecc0000000    sub     esp,0C0h
01111389 53              push    ebx
0111138a 56              push    esi
0111138b 57              push    edi
0111138c 8dbd40ffffff    lea     edi,[ebp-0C0h]
01111392 b930000000      mov     ecx,30h
01111397 b8cccccccc      mov     eax,0CCCCCCCCh
0111139c f3ab            rep stos dword ptr es:[edi]
0111139e 8bf4            mov     esi,esp
011113a0 683c571101      push    offset SimpleDemo!`string' (0111573c)
011113a5 ff15b0821101    call    dword ptr [SimpleDemo!_imp__printf (011182b0)]
011113ab 83c404          add     esp,4
011113ae 3bf4            cmp     esi,esp
011113b0 e877fdffff      call    SimpleDemo!ILT+295(__RTC_CheckEsp) (0111112c)
011113b5 33c0            xor     eax,eax
011113b7 5f              pop     edi
011113b8 5e              pop     esi
011113b9 5b              pop     ebx
011113ba 81c4c0000000    add     esp,0C0h
011113c0 3bec            cmp     ebp,esp
011113c2 e865fdffff      call    SimpleDemo!ILT+295(__RTC_CheckEsp) (0111112c)
011113c7 8be5            mov     esp,ebp
011113c9 5d              pop     ebp
011113ca c3              ret

I'm having some difficulty fully understanding it. What are the SimpleDemo!ILT things doing here?

What's the point of the instruction comparing ebp and esp at 011113c0?

Since I don't have any local variables in the main() function, why is there still a sub esp,0C0h at location 01111383?

Many thanks.

Update 1

Though I still don't know what ILT means, __RTC_CheckEsp is for runtime checks. This code can be eliminated by placing the following pragma before the main() function.

#pragma runtime_checks( "su", off )
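For context, here is a minimal sketch of how the pragma can be scoped to a single function. The function name say_hello is made up for illustration, and the _MSC_VER guard is my own addition so that other compilers simply skip the MSVC-specific pragma. The pragma has to sit outside the function body, and the restore form resets the checks to whatever the compiler options asked for.

```cpp
#include <cstdio>

// MSVC only: switch off the stack-frame ("s") and uninitialized-use ("u")
// runtime checks for the functions defined after this point.
#ifdef _MSC_VER
#pragma runtime_checks("su", off)
#endif

// Hypothetical helper standing in for the body of main() from the question.
int say_hello()
{
    // printf returns the number of characters written (negative on error).
    return std::printf("\nHello,debugger!\n");
}

// MSVC only: restore the checks selected on the command line (/RTC options).
#ifdef _MSC_VER
#pragma runtime_checks("su", restore)
#endif
```

Functions defined after the restore line get the __RTC_CheckEsp calls and the 0xCC fill again, assuming /RTC is enabled for the build.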

Reference:

http://msdn.microsoft.com/en-us/library/8wtf2dfz.aspx

http://msdn.microsoft.com/en-us/library/6kasb93x.aspx

Update 2

The sub esp,0C0h instruction allocates 0C0h bytes of extra space on the stack. Then EAX is loaded with 0xCCCCCCCC, which is 4 bytes; since ECX=30h and 4*30h=0C0h, the instruction rep stos dword ptr es:[edi] fills exactly that extra space with 0xCC. But what is this extra space on the stack for? Is it some kind of safety belt? I also notice that if I turn off the runtime checks as Update 1 shows, there is still such extra space on the stack, though much smaller, and it is not filled with 0xCC.
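To double-check the arithmetic, here is a small C++ model of what rep stos does with those register values. It is only a sketch: the function names are invented for illustration and a std::vector stands in for the stack frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Models `rep stos dword ptr es:[edi]`: store the dword in EAX (value)
// ECX (count) times into consecutive memory.
std::vector<std::uint32_t> rep_stos_dwords(std::uint32_t value, std::size_t count)
{
    return std::vector<std::uint32_t>(count, value);
}

// Number of bytes covered by the fill: 4 bytes per dword.
std::size_t filled_bytes(const std::vector<std::uint32_t>& frame)
{
    return frame.size() * sizeof(std::uint32_t);
}

// True if every single byte of the filled region is 0xCC.
bool all_bytes_are_cc(const std::vector<std::uint32_t>& frame)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(frame.data());
    for (std::size_t i = 0; i < filled_bytes(frame); ++i) {
        if (p[i] != 0xCC) {
            return false;
        }
    }
    return true;
}
```

Calling rep_stos_dwords(0xCCCCCCCC, 0x30) gives a region of 0x30 * 4 = 0xC0 bytes, which matches the size reserved by sub esp,0C0h.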

The assembly code without runtime checks looks like this:

SimpleDemo!main:
00231250 55              push    ebp
00231251 8bec            mov     ebp,esp
00231253 83ec40          sub     esp,40h <-- Still extra space allocated from stack, but smaller
00231256 53              push    ebx
00231257 56              push    esi
00231258 57              push    edi
00231259 683c472300      push    offset SimpleDemo!`string' (0023473c)
0023125e ff1538722300    call    dword ptr [SimpleDemo!_imp__printf (00237238)]
00231264 83c404          add     esp,4
00231267 33c0            xor     eax,eax
00231269 5f              pop     edi
0023126a 5e              pop     esi
0023126b 5b              pop     ebx
0023126c 8be5            mov     esp,ebp
0023126e 5d              pop     ebp
0023126f c3              ret
Squilgee answered 26/10, 2010 at 14:7 Comment(0)

Most of the instructions are part of MSVC runtime checking, enabled by default for debug builds. Just calling printf and returning 0 in an optimized build takes much less code (see the Godbolt compiler explorer). Other compilers (like GCC and clang) don't do things like comparing the stack pointer after calls, or poisoning stack memory with a recognizable 0xCC pattern to detect use of uninitialized variables, so their debug builds are like MSVC debug mode without its extra runtime checks.

I've annotated the assembler; hopefully that will help you a bit. Lines starting with 'd' are debug code lines, and lines starting with 'r' are runtime check code lines. I've also put in what I think a debug version with no runtime checks and a release version would look like.

  ; The ebp register is used to access local variables that are stored on the stack, 
  ; this is known as a stack frame. Before we start doing anything, we need to save 
  ; the stack frame of the calling function so it can be restored when we finish.
  push    ebp                   
  ; These two instructions create our stack frame, in this case, 192 bytes
  ; This space, although not used in this case, is useful for edit-and-continue. If you
  ; break the program and add code which requires a local variable, the space is 
  ; available for it. This is much simpler than trying to relocate stack variables, 
  ; especially if you have pointers to stack variables.
  mov     ebp,esp             
d sub     esp,0C0h              
  ; C/C++ functions shouldn't alter these three registers in 32-bit calling conventions,
  ; so save them. These are stored below our stack frame (the stack moves down in memory)
r push    ebx
r push    esi
r push    edi                   
  ; This puts the address of the stack frame bottom (lowest address) into edi...
d lea     edi,[ebp-0C0h]        
  ; ...and then fill the stack frame with the uninitialised data value (ecx = number of
  ; dwords, eax = value to store)
d mov     ecx,30h
d mov     eax,0CCCCCCCCh     
d rep stos dword ptr es:[edi]   
  ; Stack checking code: the stack pointer is stored in esi
r mov     esi,esp               
  ; This is the first parameter to printf. Parameters are pushed onto the stack 
  ; in reverse order (i.e. last parameter pushed first) before calling the function.
  push    offset SimpleDemo!`string' 
  ; This is the call to printf. Note the call is indirect, the target address is
  ; specified in the memory address SimpleDemo!_imp__printf, which is filled in when
  ; the executable is loaded into RAM.
  call    dword ptr [SimpleDemo!_imp__printf] 
  ; In C/C++, the caller is responsible for removing the parameters. This is because
  ; the caller is the only code that knows how many parameters were put on the stack
  ; (thanks to the '...' parameter type)
  add     esp,4                 
  ; More stack checking code - this sets the zero flag if the stack pointer is pointing
  ; where we expect it to be pointing. 
r cmp     esi,esp               
  ; ILT - Import Lookup Table? This is a statically linked function which throws an
  ; exception/error if the zero flag is cleared (i.e. the stack pointer is pointing
  ; somewhere unexpected)
r call    SimpleDemo!ILT+295(__RTC_CheckEsp) 
  ; The return value is stored in eax by convention
  xor     eax,eax               
  ; Restore the values we shouldn't have altered
r pop     edi
r pop     esi
r pop     ebx                   
  ; Destroy the stack frame
r add     esp,0C0h              
  ; More stack checking code - this sets the zero flag if the stack pointer is pointing
  ; where we expect it to be pointing. 
r cmp     ebp,esp               
  ; see above
r call    SimpleDemo!ILT+295(__RTC_CheckEsp) 
  ; This is the usual way to destroy the stack frame, but here it's not really necessary
  ; since ebp==esp
  mov     esp,ebp               
  ; Restore the caller's stack frame
  pop     ebp                   
  ; And exit
  ret                           
  
      ; Debug only, no runtime checks  
      push    ebp                   
      mov     ebp,esp             
    d sub     esp,0C0h              
    d lea     edi,[ebp-0C0h]        
    d mov     ecx,30h
    d mov     eax,0CCCCCCCCh     
    d rep stos dword ptr es:[edi]   
      push    offset SimpleDemo!`string' 
      call    dword ptr [SimpleDemo!_imp__printf] 
      add     esp,4                 
      xor     eax,eax               
      mov     esp,ebp               
      pop     ebp                   
      ret                             
      ; Release mode (The optimiser is clever enough to drop the frame pointer setup with no VLAs or other complications)
      push    offset SimpleDemo!`string' 
      call    dword ptr [SimpleDemo!_imp__printf] 
      add     esp,4                 
      xor     eax,eax               
      ret
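
The mov esi,esp / cmp esi,esp pair around the call can be modelled in plain C++. This is a rough simulation, not the real runtime code: SimStack, Convention and call_with_esp_check are names made up for the sketch, and the starting stack address is arbitrary. It shows the kind of bug the check is meant to catch, namely a calling-convention mismatch where caller and callee disagree about who removes the arguments.

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit stack pointer (the stack grows downwards).
struct SimStack {
    std::uint32_t esp;
    void push_dword() { esp -= 4; }                       // push <arg>
    void add_esp(std::uint32_t bytes) { esp += bytes; }   // add esp,N
};

enum class Convention { Cdecl, Stdcall };

// Simulated callee: a stdcall function pops its own arguments (ret N),
// a cdecl function leaves them for the caller.
void simulate_callee(SimStack& s, Convention actual, std::uint32_t arg_dwords)
{
    if (actual == Convention::Stdcall) {
        s.add_esp(4 * arg_dwords);
    }
}

// Returns true when the stack pointer after the call equals the saved one,
// i.e. the case where `cmp esi,esp` sets the zero flag and the check passes.
bool call_with_esp_check(Convention assumed, Convention actual, std::uint32_t arg_dwords)
{
    SimStack s{0x0012FF00u};               // arbitrary starting value
    const std::uint32_t saved = s.esp;     // mov esi,esp
    for (std::uint32_t i = 0; i < arg_dwords; ++i) {
        s.push_dword();                    // push the arguments
    }
    simulate_callee(s, actual, arg_dwords);
    if (assumed == Convention::Cdecl) {
        s.add_esp(4 * arg_dwords);         // caller cleanup: add esp,4
    }
    return saved == s.esp;                 // cmp esi,esp
}
```

With matching conventions the check passes. If the caller assumes cdecl but the callee is really stdcall (for example through a badly cast function pointer), the arguments get removed twice and the comparison fails, which is where the real __RTC_CheckEsp would report an error.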
Childhood answered 26/10, 2010 at 15:37 Comment(5)
Outing myself as an assembler noob: So WHERE is the Hello,debugger! string? I can't see any hex encoded string or something like that. Magic? ;-)Forestation
@moontear: The application is split into sections, each with a specific purpose and name. .text holds code, .bss holds uninitialised data, and there are others. The compiler put the above assembler into the .text section and the "Hello,debugger!" string into a read-only data section (.rdata with MSVC). The compiler created a name for the string, SimpleDemo!`string', a bit like a variable name, and the address of that "variable" is pushed onto the stack. In the debugger, displaying the memory from the address of SimpleDemo!`string' onwards would display the text.Childhood
@Marko: I think I am, but then that might just be my programming telling me I am.Childhood
super awesome answer. went above and beyond. wish i could give you more sweet, sweet internet points.Tricia
Even setting up EBP as a frame pointer with push ebp / mov ebp, esp and then restoring it are unnecessary, and x86 MSVC -O2 doesn't do it (godbolt.org/z/1rn9MY551). You could label even those instructions as d debug if you wanted, although compilers will still use EBP as a frame pointer in some functions (notably with C99 VLAs or alloca, or if over-aligning the stack)Pazit

First, your code's main() is improperly formed: it doesn't return the int you promised it would return. Correcting this defect, we get:

#include <stdio.h>
int main(int argc, char *argv[])
{
    ::printf("\nHello,debugger!\n");
    return 0;
}

Additionally, nowadays it is very strange to see #include <stdio.h> in a C++ program. I believe you want #include <cstdio>.

In all cases, space must be made on the stack for arguments and for return values. main()'s return value requires stack space. main()'s context, which has to be saved during the call to printf(), requires stack space. printf()'s arguments require stack space. printf()'s return value requires stack space. That's what the 0C0h byte stack frame is doing.

The first thing that happens is the incoming base pointer is copied to the top of the stack. Then the new stack pointer is copied into the base pointer. We'll be checking later to be sure that the stack winds up back where it started from (because you have runtime checking turned on). Then we build the (0C0h bytes long) stack frame to hold our context and printf()'s arguments during the call to printf(). We jump to printf(). When we get back, we hop over the return value which you didn't check in your code (the only thing left on its frame) and make sure the stack after the call is in the same place it was before the call. We pop our context back off the stack. We then check that the final stack pointer matches the value we saved way up at the front. Then we pop the prior value of the base pointer off the very top of the stack and return.

Slough answered 26/10, 2010 at 14:41 Comment(5)
Thanks for your reply. But I don't agree that the 0C0H space is used for holding contexts and return values. See my update 2. And the return value is returned through the EAX register.Squilgee
Since you brought it up, there's no problem leaving out the return from main() in a C++ program - it's equivalent to having a return 0; at the end.Definitely
@Michael Burr: Agree for every function except main(). I've worked on too many systems with broken loaders that will return random stack to the calling environment if main() doesn't explicitly return something.Slough
Just a note — from C++11, omitting return statement in main is the same as returning 0 according to the C++ Standard.Disenthrall
@EricTowers: Perhaps your previous experience with falling off the end of main returning garbage was in C89 systems? An implicit return 0; at the end of int main() was new in C99, borrowed back from C++ which has always had it. Otherwise the systems you used were just buggy. All the mainstream C and C++ compilers get it right, and embedded systems with "freestanding" not "hosted" implementations usually have a void main so we don't need to consider most non-mainstream embedded compilers for this.Pazit

That is code that is inserted by the compiler when you build with runtime checking (/RTC). Disable those options and it should be clearer. /GZ could also be causing this depending on your VS version.

Strongwilled answered 26/10, 2010 at 14:13 Comment(0)

For the record, I suspect that ILT means "Incremental Linking Thunk".

The way incremental linking (and Edit&Continue) works is the following: the linker adds a layer of indirection for every call via thunks which are grouped at the beginning of executable, and adds a huge reserved space after them. This way, when you're relinking the updated executable it can just put any new/changed code into the reserved area and patch only the affected thunks, without changing the rest of the code.
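
The indirection idea can be sketched in C++ with a table of function pointers. This is only an analogy: real ILT entries are jmp instructions emitted by the linker, not data pointers, and every name below (thunk_table, call_via_thunk, patch_thunk, answer_v1, answer_v2) is invented for the illustration.

```cpp
#include <array>
#include <cstddef>

using Fn = int (*)();

// Two versions of the "same" function: before and after an edit.
int answer_v1() { return 1; }
int answer_v2() { return 2; }

// Hypothetical stand-in for the thunk area: one slot per function.
std::array<Fn, 1> thunk_table = { &answer_v1 };

// Callers never reference the function directly, only its slot.
int call_via_thunk(std::size_t slot)
{
    return thunk_table[slot]();
}

// "Relinking": point the slot at the new code, leave all callers untouched.
void patch_thunk(std::size_t slot, Fn replacement)
{
    thunk_table[slot] = replacement;
}
```

After patch_thunk(0, &answer_v2), every existing call_via_thunk(0) reaches the new code without the call sites being rewritten, which is the property that lets the linker patch only the affected thunks.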

Timelag answered 5/3, 2013 at 17:46 Comment(0)

The 40 bytes is the worst case stack allocation for any called or subsequently called function. This is explained in glorious detail here.

What is this space reserved on the top of the stack for? First, space is created for any local variables. In this case, FunctionWith6Params() has two. However, those two local variables only account for 0x10 bytes. What’s the deal with the rest of the space created on the top of the stack?
On the x64 platform, when code prepares the stack for calling another function, it does not use push instructions to put the parameters on the stack as is commonly the case in x86 code. Instead, the stack pointer typically remains fixed for a particular function. The compiler looks at all of the functions the code in the current function calls, it finds the one with the maximum number of parameters, and then creates enough space on the stack to accommodate those parameters. In this example, FunctionWith6Params() calls printf() passing it 8 parameters. Since that is the called function with the maximum number of parameters, the compiler creates 8 slots on the stack. The top four slots on the stack will then be the home space used by any functions FunctionWith6Params() calls.
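
The rule described in the quote can be written down as a small calculation. This is a simplified sketch under stated assumptions: 8-byte slots, at least 4 home slots whenever the function calls anything (Windows x64 convention), and no attempt to model locals, saved registers or the 16-byte alignment of the final frame. The function names are made up.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t kSlotBytes = 8;  // one stack slot per parameter on x64
constexpr std::size_t kHomeSlots = 4;  // home (shadow) space for the 4 register parameters

// Slots reserved for outgoing calls, given the parameter count of every
// function the current function calls. A leaf function reserves none.
std::size_t outgoing_slots(const std::vector<std::size_t>& callee_param_counts)
{
    if (callee_param_counts.empty()) {
        return 0;
    }
    std::size_t slots = kHomeSlots;
    for (std::size_t n : callee_param_counts) {
        slots = std::max(slots, n);
    }
    return slots;
}

// The same area expressed in bytes.
std::size_t outgoing_area_bytes(const std::vector<std::size_t>& callee_param_counts)
{
    return outgoing_slots(callee_param_counts) * kSlotBytes;
}
```

For the quoted example, a single call to printf() with 8 parameters gives 8 slots, i.e. 64 bytes of outgoing argument area, while a function that only makes calls with one or two parameters still reserves the 4 home slots.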
Slough answered 27/10, 2010 at 3:2 Comment(1)
Thanks Eric. That blog entry is helpful. I am familiar with the x86 stack operation. But it seems x64 is quite different. I have never expected a compiler would do such optimization.Squilgee
