I've compiled the following using the Visual Studio C++ 2008 SP1, x64 C++ compiler:
I'm curious: why did the compiler add those nop instructions after those calls?
PS1. I would understand if the 2nd and 3rd nops were there to align the code on a 4-byte boundary, but the 1st nop breaks that assumption.
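The alignment argument in PS1 is easy to check mechanically. Below is a minimal sketch (the function names and the addresses used with it are hypothetical, since the original listing isn't reproduced here) that computes how many padding bytes an address needs to reach a 4-byte, or any power-of-two, boundary.

```cpp
#include <cassert>
#include <cstdint>

// Number of padding bytes needed to move `addr` up to the next multiple of
// `align` (0 if it is already aligned). `align` must be a power of two.
std::uint64_t padding_to(std::uint64_t addr, std::uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (addr & (align - 1))) & (align - 1);
}

// True if `addr` already sits on an `align`-byte boundary.
bool is_aligned(std::uint64_t addr, std::uint64_t align) {
    return padding_to(addr, align) == 0;
}
```

For instance, padding_to(0x14000100B, 4) is 1: an instruction starting at an address ending in B sits 3 bytes past a 4-byte boundary, so a nop placed in front of it did not align it.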
PS2. The C++ code that was compiled had no loops or special optimization stuff in it:
CTestDlg::CTestDlg(CWnd* pParent /*=NULL*/)
    : CDialog(CTestDlg::IDD, pParent)
{
    m_hIcon = AfxGetApp()->LoadIcon(IDR_MAINFRAME);
    // This makes no sense. I used it to set a debugger breakpoint
    ::GdiFlush();
    srand(::GetTickCount());
}
PS3. Additional info: First off, thank you, everyone, for your input.
Here are some additional observations:
1. My first guess was that incremental linking could have had something to do with it. But the Release build settings in Visual Studio for the project have incremental linking off.
2. This seems to affect x64 builds only. The same code built as x86 (or Win32) does not have those nops, even though the instructions used are very similar:
3. I tried to build it with a newer linker, and even though the x64 code produced by VS 2013 looks somewhat different, it still adds those nops after some calls:
4. Also, dynamic vs. static linking to MFC made no difference to the presence of those nops. This one is built with dynamic linking to the MFC DLLs with VS 2013:
5. Also note that those nops can appear after both near and far calls, and they have nothing to do with alignment. Here's part of the code that I got from IDA when stepping a little further:
   As you can see, the nop is inserted after a far call and happens to "align" the next lea instruction on the B address! That makes no sense if those nops were added for alignment only.
6. I was originally inclined to believe that since near relative calls (i.e. those that start with E8) are somewhat faster than far calls (the ones that start with FF 15 in this case), the linker may try to use near calls first, and since those are one byte shorter than far calls, if it succeeds, it may pad the remaining space with nops at the end. But then example (5) above kind of defeats this hypothesis.
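The size argument in point 6 can be checked byte by byte. Below is a minimal sketch (helper names are hypothetical, not taken from any linker) that encodes the two x64 call forms: the "far" call of the question, which is actually an indirect call through a RIP-relative pointer slot (FF 15 disp32, 6 bytes), and a direct relative call (E8 rel32, 5 bytes) padded with one 0x90 nop so it could replace the indirect form in place. The addresses are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append v as 4 little-endian bytes (x86 displacements are little-endian).
static void put_le32(Bytes& out, std::int64_t v) {
    assert(v >= INT32_MIN && v <= INT32_MAX);  // must fit in 32 bits
    const auto u = static_cast<std::uint32_t>(v);
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(u >> (8 * i)));
}

// Read 4 little-endian bytes at code[off] as a signed 32-bit value.
static std::int32_t get_le32(const Bytes& code, std::size_t off) {
    std::uint32_t u = 0;
    for (int i = 0; i < 4; ++i) u |= static_cast<std::uint32_t>(code[off + i]) << (8 * i);
    return static_cast<std::int32_t>(u);
}

// Indirect call through a pointer in memory: FF 15 disp32 (6 bytes).
// disp32 is relative to the end of the instruction, i.e. call_addr + 6.
Bytes encode_indirect_call(std::uint64_t call_addr, std::uint64_t slot_addr) {
    Bytes out{0xFF, 0x15};
    put_le32(out, static_cast<std::int64_t>(slot_addr - (call_addr + 6)));
    return out;
}

// Address of the pointer slot that an FF 15 call at call_addr reads its target from.
std::uint64_t indirect_call_slot(const Bytes& code, std::uint64_t call_addr) {
    assert(code.size() >= 6 && code[0] == 0xFF && code[1] == 0x15);
    return call_addr + 6 + static_cast<std::int64_t>(get_le32(code, 2));
}

// Direct call: E8 rel32 (5 bytes), rel32 relative to call_addr + 5, followed
// by one 0x90 nop so it fills the same 6 bytes as the indirect form.
Bytes encode_direct_call_nop_padded(std::uint64_t call_addr, std::uint64_t target) {
    Bytes out{0xE8};
    put_le32(out, static_cast<std::int64_t>(target - (call_addr + 5)));
    out.push_back(0x90);
    return out;
}
```

With these made-up addresses, a call at 0x140001000 through a slot at 0x140002000 encodes as FF 15 FA 0F 00 00, and a padded direct call from the same address to 0x140001100 as E8 FB 00 00 00 90; both are 6 bytes, which is why swapping one for the other would leave exactly one byte to fill.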
So I still don't have a clear answer to this.
… call cs:LoadIconW instruction in the disassembly above is an example of this. The location the disassembler has called LoadIconW contains a pointer to the actual LoadIconW function. – Harbourage
… call instruction with a prefix, but I'm not sure that's guaranteed to be future-proof. Some future ISA extension might use rep call to mean something special. I tested, and call works on Skylake when preceded by rep, or 0x40 (REX.W=0), or 0x48 (REX.W=1). I'd guess that a REX prefix is more future-proof. A linker would need to check that there wasn't already a REX prefix, though (e.g. from hand-written code with padding), and that's impossible because you can't unambiguously step backwards in x86. Multiple REP prefixes would be OK. – Blackington
… call or jmp instruction, right? The opcode has to change from indirect to rel32. (Hmm, prefixes on jcc instructions have special meaning as branch-prediction hints on P4. But jcc can't be indirect anyway, so this could only appear for conditional tail calls that were already using a direct jump.) – Blackington
… REX has to be the last prefix if it appears, so checking the byte before the call opcode can give false positives (the previous instruction ended with 0x4?), but not false negatives. – Blackington
… call on pretty much everything and highly unlikely to ever change the meaning of call? That's what GNU ld is using. – Rosarosabel
… syscall and sysenter, but no OS I'm aware of uses them). The difference is direct vs. indirect. – Rosarosabel
… nops are still there even with dynamic linking, the linker relaxation idea's gotta be wrong then. – Rosarosabel
… call/nop: « add the necessary call/nop to those functions within the ".init" section » (cseweb.ucsd.edu/~gbournou/CSE131/GlobalAndStaticVars.pdf) But that doesn't explain why, which makes the answers here not very satisfying to me. – Sachsse