Does C# System.String Instances Really End Up on the Heap?

Let's consider some very simple C# code:

class Program
{
    static void Main(string[] args)
    {
        int i = 5;
        string s = "ABC";
        bool b = false;
    }
}

Jeffrey Richter's "CLR via C#" (Chapter 14) states that "The String type is derived immediately from Object, making it a reference type, and therefore, String objects (its array of characters) always live in the heap, never on a thread's stack".
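The first half of that claim (String derives directly from Object and is a reference type) can be checked with reflection. A minimal sketch; the helper class name StringTypeFacts is only an illustrative choice:

```csharp
using System;

static class StringTypeFacts
{
    // True if System.String's immediate base type is System.Object.
    public static bool DerivesDirectlyFromObject() => typeof(string).BaseType == typeof(object);

    // True if System.String is a reference type (i.e. not a value type).
    public static bool IsReferenceType() => !typeof(string).IsValueType;
}

class Program
{
    static void Main()
    {
        Console.WriteLine($"BaseType: {typeof(string).BaseType}");        // BaseType: System.Object
        Console.WriteLine($"IsValueType: {typeof(string).IsValueType}"); // IsValueType: False
    }
}
```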

Also referring to strings, in an example quite similar to the one above, the book says: "The newobj IL instruction constructs a new instance of an object. However, no newobj instruction appears in the IL code example. Instead, you see the special ldstr (load string) IL instruction, which constructs a String object by using a literal string obtained from metadata. This shows you that the common language runtime (CLR) does, in fact, have a special way of constructing literal String objects."

Looking at the IL code, this is clearly the case (only relevant part shown):

[...]
    .locals init (
        [0] int32,
        [1] string,
        [2] bool
    )
    // (no C# code)
    IL_0000: nop
    // int num = 5;
    IL_0001: ldc.i4.5
    IL_0002: stloc.0
    // string text = "ABC";
    IL_0003: ldstr "ABC"
    IL_0008: stloc.1
    // bool flag = false;
[...]
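To look at the ldstr/stloc pair in isolation, a method with the same shape as the IL above can be emitted at run time with System.Reflection.Emit. This is a sketch for illustration only: the method name LoadAbc and the helper class LdstrDemo are hypothetical, and the emitted method returns its local so the result can be inspected.

```csharp
using System;
using System.Reflection.Emit;

static class LdstrDemo
{
    // Builds a method equivalent to: string text = "ABC"; return text;
    public static Func<string> BuildLoadAbc()
    {
        var method = new DynamicMethod("LoadAbc", typeof(string), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        il.DeclareLocal(typeof(string)); // .locals init ([0] string)
        il.Emit(OpCodes.Ldstr, "ABC");   // push a reference to the literal "ABC"
        il.Emit(OpCodes.Stloc_0);        // pop that reference into local 0
        il.Emit(OpCodes.Ldloc_0);        // push local 0 back onto the evaluation stack
        il.Emit(OpCodes.Ret);            // return the reference
        return (Func<string>)method.CreateDelegate(typeof(Func<string>));
    }
}

class Program
{
    static void Main()
    {
        Func<string> loadAbc = LdstrDemo.BuildLoadAbc();
        Console.WriteLine(loadAbc()); // ABC
    }
}
```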

The ldstr IL instruction ensures that "an object reference to a string is pushed onto the stack" (the evaluation stack, from which stloc.1 then pops it into the local). This makes sense: the string instance stays on the heap, and the reference to this object (its address) is stored in the variable on the stack.
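One visible consequence of the CLR's special handling of literals is string interning: in practice, two ldstr instructions for the same characters yield the same heap object, while a string built at run time is a separate object with equal contents. Copying a string variable copies only the reference. A small sketch (the class and method names are illustrative):

```csharp
using System;

static class InterningDemo
{
    // Two identical literals: both loads yield the same interned object.
    public static bool LiteralsShareObject()
    {
        string a = "ABC";
        string b = "ABC";
        return ReferenceEquals(a, b);
    }

    // A string built at run time has equal contents but is a distinct heap object.
    public static bool RuntimeStringSharesObject()
    {
        string literal = "ABC";
        string built = new string(new[] { 'A', 'B', 'C' });
        return ReferenceEquals(literal, built);
    }

    // Assigning one string variable to another copies the reference, not the characters.
    public static bool CopySharesObject()
    {
        string s = "ABC";
        string t = s;
        return ReferenceEquals(s, t);
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(InterningDemo.LiteralsShareObject());       // True
        Console.WriteLine(InterningDemo.RuntimeStringSharesObject()); // False
        Console.WriteLine(InterningDemo.CopySharesObject());          // True
    }
}
```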

Now let's set a breakpoint on the line after the string variable is declared, start debugging in Visual Studio, and switch to the Disassembly view. The relevant code follows (the full disassembled code is here):

017B0483  nop  
            int i = 5;
017B0484  mov         dword ptr [ebp-40h],5  
            string s = "ABC";
017B048B  mov         eax,dword ptr ds:[429231Ch]  
017B0491  mov         dword ptr [ebp-44h],eax  
            bool b = false;
017B0494  xor         edx,edx  
017B0496  mov         dword ptr [ebp-48h],edx  
        }

Looking specifically at the two assembly instructions for the C# string line: the first loads the 4-byte value stored at virtual address 429231Ch into the eax register, and the second stores that value on the stack at [ebp-44h], where the s variable lives.
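Those two mov instructions copy a single pointer-sized value into the local, which is consistent with s holding nothing but an address. A sketch that reinterprets a string reference as an IntPtr with System.Runtime.CompilerServices.Unsafe.As, assuming a modern .NET runtime where that class ships with the framework. The helper name AddressOf is hypothetical, and the printed address is only a snapshot: it differs between runs and the GC may move the object afterwards.

```csharp
using System;
using System.Runtime.CompilerServices;

static class ReferenceAsAddress
{
    // Reads the raw value stored in a string variable, i.e. the object's address.
    // Only a snapshot: a later garbage collection may relocate the object.
    public static IntPtr AddressOf(string s) => Unsafe.As<string, IntPtr>(ref s);
}

class Program
{
    static void Main()
    {
        string s = "ABC";
        Console.WriteLine($"Pointer size: {IntPtr.Size} bytes"); // 4 in a 32-bit process, 8 in a 64-bit one
        Console.WriteLine($"s refers to 0x{ReferenceAsAddress.AddressOf(s).ToInt64():X}");
    }
}
```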

Let's use WinDbg (x86, since the C# project uses Visual Studio's default 32-bit target platform) to look at that address, attaching non-invasively to the process being debugged by Visual Studio. The content at 429231C should be a reference to the memory where the string actually lives. Let's check:

[Screenshot: WinDbg output]

The second command does show 41, 42 and 43 in hex, which represent A, B and C in ASCII; however, the order doesn't look right, so this might just be a coincidence. (1) It doesn't look as if the assembly code for the string line is doing the right thing.

If we use VMMap to look at that address: [Screenshot: VMMap]

The original address 429231C appears to be within the managed heap. But then (2) why would the content of a heap address be loaded as the reference held in a stack variable, as the assembly code seemed to indicate?

So my two questions are (1) and (2). Everything makes sense to me up to the IL code, but things go downhill fast once I look at the disassembly generated from that IL. I suspect I'm getting something wrong in my reasoning (most likely) rather than hitting some sort of bug in the Visual Studio debugger (less likely).

Later Update: As @madreflection and @Jester pointed out, endianness tripped me up: x86 is little-endian, so the hex representation checks out after all. Only question (2) now remains.
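The endianness point can be reproduced without a debugger. .NET strings store their characters as UTF-16 code units, so "ABC" occupies the bytes 41 00 42 00 43 00. Read back as one 32-bit value on little-endian hardware, the first four bytes come out as 00420041, the kind of apparently reversed order a dword-oriented dump (such as WinDbg's dd) displays. A sketch; the helper names are illustrative:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class StringLayout
{
    // UTF-16LE bytes of the string, in memory order.
    public static byte[] Utf16Bytes(string s) => Encoding.Unicode.GetBytes(s);

    // Reads the first 4 bytes of the string's character data as one 32-bit integer,
    // the way a dword memory dump would display them.
    public static int FirstDword(string s)
    {
        GCHandle handle = GCHandle.Alloc(s, GCHandleType.Pinned); // stop the GC from moving it
        try
        {
            IntPtr chars = handle.AddrOfPinnedObject(); // for a string: address of the first char
            return Marshal.ReadInt32(chars);
        }
        finally
        {
            handle.Free();
        }
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(BitConverter.ToString(StringLayout.Utf16Bytes("ABC"))); // 41-00-42-00-43-00
        Console.WriteLine($"{StringLayout.FirstDword("ABC"):X8}");               // 00420041 on little-endian hardware
    }
}
```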

Later Update 2: The comments have been quite insightful, and I think @madreflection puts it best: there's an additional level of indirection, and the reasons for it (given in the comments) are starting to make sense to me now. A quick diagram is below. I've also checked with VMMap that both addresses do indeed belong to the managed heap.

[Diagram: the additional level of indirection]

Later Update 3: Corrected the previous diagram.
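The additional level of indirection described in Update 2 can be modeled in a few lines. This is a deliberately simplified sketch, not the CLR's actual data structure: it assumes a hypothetical LiteralTable whose slots stay at fixed positions (playing the role of the address 429231C baked into the JIT-compiled code) and hold references to string objects. A plausible motivation, consistent with both addresses being in the managed heap, is that the GC may move objects, so compiled code keeps only the slot and reads the current reference from it each time; whoever owns the slot can then relocate the string and update the slot without patching any code.

```csharp
using System;

// Hypothetical model of literal indirection; NOT the CLR's real implementation.
sealed class LiteralTable
{
    private readonly object[] _slots; // each slot holds a reference to a string object

    public LiteralTable(int size) => _slots = new object[size];

    // Like "mov eax, [slot]": callers know only the slot, and read the current reference from it.
    public string Load(int slot) => (string)_slots[slot];

    // Fills a slot the first time the literal is needed.
    public void Store(int slot, string value) => _slots[slot] = value;

    // Simulates relocation: an equal string at a new location replaces the old one,
    // and only the slot is updated.
    public void Relocate(int slot) => _slots[slot] = new string(((string)_slots[slot]).ToCharArray());
}

class Program
{
    const int AbcSlot = 0; // stands in for the fixed slot address in the machine code

    static void Main()
    {
        var table = new LiteralTable(1);
        table.Store(AbcSlot, "ABC");

        string before = table.Load(AbcSlot); // first read through the slot
        table.Relocate(AbcSlot);
        string after = table.Load(AbcSlot);  // same code path, now sees the new location

        Console.WriteLine(before == after);                // True: same contents
        Console.WriteLine(ReferenceEquals(before, after)); // False: the slot now points elsewhere
    }
}
```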

Periderm answered 5/6, 2019 at 21:32

© 2022 - 2024 — McMap. All rights reserved.