Arrays, heap and stack and value types
S

8

155
int[] myIntegers;
myIntegers = new int[100];

In the above code, is new int[100] generating the array on the heap? From what I've read in CLR via C#, the answer is yes. But what I can't understand is what happens to the actual ints inside the array. As they are value types, I'd guess they'd have to be boxed, as I can, for example, pass myIntegers to other parts of the program and it'd clutter up the stack if they were left on it all the time. Or am I wrong? I'd guess they'd just be boxed and would live on the heap for as long as the array existed.

Shorts answered 11/7, 2009 at 14:30 Comment(0)
R
339

Your array is allocated on the heap, and the ints are not boxed.

The source of your confusion is likely because people have said that reference types are allocated on the heap, and value types are allocated on the stack. This is not an entirely accurate representation.

All local variables and parameters are allocated on the stack. This includes both value types and reference types. The difference between the two is only what is stored in the variable. Unsurprisingly, for a value type, the value of the type is stored directly in the variable, and for a reference type, the value of the type is stored on the heap, and a reference to this value is what is stored in the variable.

The same holds for fields. When memory is allocated for an instance of an aggregate type (a class or a struct), it must include storage for each of its instance fields. For reference-type fields, this storage holds just a reference to the value, which would itself be allocated on the heap later. For value-type fields, this storage holds the actual value.

So, given the following types:

class RefType{
    public int    I;
    public string S;
    public long   L;
}

struct ValType{
    public int    I;
    public string S;
    public long   L;
}

The values of each of these types would require 16 bytes of memory (assuming a 32-bit word size). The field I in each case takes 4 bytes to store its value, the field S takes 4 bytes to store its reference, and the field L takes 8 bytes to store its value. So the memory for the value of both RefType and ValType looks like this:

 0 ┌───────────────────┐
   │        I          │
 4 ├───────────────────┤
   │        S          │
 8 ├───────────────────┤
   │        L          │
   │                   │
16 └───────────────────┘
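
The field-size arithmetic above can be checked with a small sketch. The names `FieldSizes` and `PayloadBytes` are hypothetical, and the sum deliberately ignores the object header and any alignment padding the runtime may add, so it should be read as the size of the three fields only, not as the true in-memory size of an instance.

```csharp
using System;

static class FieldSizes
{
    // Sum of the three field sizes only: int + one reference + long.
    // Assumption of this sketch: no object header, no padding.
    public static int PayloadBytes(int referenceSize)
    {
        return sizeof(int) + referenceSize + sizeof(long);
    }

    static void Main()
    {
        // With 4-byte references (32-bit word size) this is 4 + 4 + 8.
        Console.WriteLine(PayloadBytes(4));           // prints 16

        // With the reference size of the current process (4 or 8 bytes).
        Console.WriteLine(PayloadBytes(IntPtr.Size));
    }
}
```

On a 64-bit process the reference takes 8 bytes, so the same three fields need at least 20 bytes, and the runtime will typically round that up for alignment.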

Now if you had three local variables in a function, of types RefType, ValType, and int[], like this:

RefType refType;
ValType valType;
int[]   intArray;

then your stack might look like this:

 0 ┌───────────────────┐
   │     refType       │
 4 ├───────────────────┤
   │     valType       │
   │                   │
   │                   │
   │                   │
20 ├───────────────────┤
   │     intArray      │
24 └───────────────────┘

If you assigned values to these local variables, like so:

refType = new RefType();
refType.I = 100;
refType.S = "refType.S";
refType.L = 0x0123456789ABCDEF;

valType = new ValType();
valType.I = 200;
valType.S = "valType.S";
valType.L = 0x0011223344556677;

intArray = new int[4];
intArray[0] = 300;
intArray[1] = 301;
intArray[2] = 302;
intArray[3] = 303;

Then your stack might look something like this:

 0 ┌───────────────────┐
   │    0x4A963B68     │ -- heap address of `refType`
 4 ├───────────────────┤
   │       200         │ -- value of `valType.I`
   │    0x4A984C10     │ -- heap address of `valType.S`
   │    0x44556677     │ -- low 32-bits of `valType.L`
   │    0x00112233     │ -- high 32-bits of `valType.L`
20 ├───────────────────┤
   │    0x4AA4C288     │ -- heap address of `intArray`
24 └───────────────────┘

Memory at address 0x4A963B68 (value of refType) would be something like:

 0 ┌───────────────────┐
   │       100         │ -- value of `refType.I`
 4 ├───────────────────┤
   │    0x4A984D88     │ -- heap address of `refType.S`
 8 ├───────────────────┤
   │    0x89ABCDEF     │ -- low 32-bits of `refType.L`
   │    0x01234567     │ -- high 32-bits of `refType.L`
16 └───────────────────┘

Memory at address 0x4AA4C288 (value of intArray) would be something like:

 0 ┌───────────────────┐
   │        4          │ -- length of array
 4 ├───────────────────┤
   │       300         │ -- `intArray[0]`
 8 ├───────────────────┤
   │       301         │ -- `intArray[1]`
12 ├───────────────────┤
   │       302         │ -- `intArray[2]`
16 ├───────────────────┤
   │       303         │ -- `intArray[3]`
20 └───────────────────┘

Now, if you passed intArray to another function, the value pushed onto the stack would be 0x4AA4C288, the address of the array, not a copy of the array.
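
The difference between copying a reference and copying a value can be observed directly. This is a minimal sketch; the names `Pair`, `PassingDemo`, and `Bump` are made up for illustration and are not part of the types above.

```csharp
using System;

// Hypothetical value type used only for this illustration.
struct Pair
{
    public int A;
    public int B;
}

static class PassingDemo
{
    // Receives a copy of the reference, so the write lands in the caller's array.
    public static void Bump(int[] values) { values[0] += 1; }

    // Receives a copy of the whole struct, so the caller's struct is untouched.
    public static void Bump(Pair pair) { pair.A += 1; }

    public static int ArrayAfterCall()
    {
        int[] intArray = { 300, 301, 302, 303 };
        Bump(intArray);
        return intArray[0];
    }

    public static int StructAfterCall()
    {
        Pair valType = new Pair { A = 200 };
        Bump(valType);
        return valType.A;
    }

    static void Main()
    {
        Console.WriteLine(ArrayAfterCall());   // prints 301
        Console.WriteLine(StructAfterCall());  // prints 200
    }
}
```

Only the reference is copied in the first call, so no matter how large the array is, passing it costs the same as passing a single pointer-sized value.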

Reflexive answered 11/7, 2009 at 17:3 Comment(8)
I note that the statement that all local variables are stored on the stack is inaccurate. Local variables that are outer variables of an anonymous function are stored on the heap. Local variables of iterator blocks are stored on the heap. Local variables of async blocks are stored on the heap. Local variables that are enregistered are stored on neither the stack nor the heap. Local variables that are elided are stored on neither the stack nor the heap.Gladiator
LOL, always the nit-picker, Mr. Lippert. :) I feel compelled to point out that with the exception of your latter two cases, the so-called "locals" cease to be locals at compile time. The implementation raises them to the status of class members, which is the only reason they get stored on the heap. So it's merely an implementation detail (snicker). Of course, register storage is an even lower-level implementation detail, and elision doesn't count.Reflexive
Of course, my entire post is implementation details, but, as I'm sure you realize, it was all in attempt to separate the concepts of variables and values. A variable (call it a local, a field, a parameter, whatever) can be stored on the stack, the heap, or some other implementation-defined place, but that's not really what's important. What's important, is whether that variable directly stores the value it represents, or simply a reference to that value, stored elsewhere. It's important because it affects copy semantics: whether copying that variable copies its value or its address.Reflexive
@EricLippert: There. Now you're tagged, so you'll see my response.Reflexive
Apparently you have a different idea of what it means to be a "local variable" than I do. You seem to believe that a "local variable" is characterized by its implementation details. This belief is not justified by anything I'm aware of in the C# specification. A local variable is in fact a variable declared inside a block whose name is in scope only throughout the declaration space associated with the block. I assure you that local variables that are, as an implementation detail, hoisted to fields of a closure class, are still local variables according to the rules of C#.Gladiator
That said, of course your answer is generally excellent; the point that values are conceptually different from variables is one that needs to be made as often and as loudly as possible, since it is fundamental. And yet a great many people believe the strangest myths about them! So good on you for fighting the good fight.Gladiator
Also, can you explain what happens with int[] temp = new int[10]; intArray.CopyTo(temp, 0); intArray = temp; ... Is a new reference created, or is just the reference changed?Stogy
Your statement: "people have said that reference types are allocated on the heap and value types are allocated on the stack. This is not an entirely accurate representation." is right on the money. For years I bent the ear of everyone at Microsoft whenever the occasion arose as I felt that in addition to being inaccurate, this statement was not the essence of the difference between value types and reference types, caused confusion among practitioners, and created grist for a false fact that certification test takers would be responsible for regurgitating.Toddtoddie
I
25

Yes, the array will be located on the heap.

The ints inside the array will not be boxed. Just because a value type exists on the heap does not mean it is boxed. Boxing occurs only when a value type, such as int, is converted to a reference type, for example when it is assigned to a variable of type object.

For example

Does not box:

int i = 42;
myIntegers[0] = 42;

Boxes:

object i = 42;
object[] arr = new object[10];  // no boxing here 
arr[0] = 42;
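
One way to see the difference at run time is sketched below. The class and method names are hypothetical. It relies on two observable facts: the element type of an int[] is int itself, and each boxing conversion produces a separate object on the heap.

```csharp
using System;

static class BoxingDemo
{
    // The array stores ints directly; its element type is int, not object.
    public static bool ElementTypeIsInt()
    {
        int[] myIntegers = new int[100];
        myIntegers[0] = 42;
        return myIntegers.GetType().GetElementType() == typeof(int);
    }

    // Boxing the same int twice yields two distinct objects with equal contents.
    public static bool BoxesAreDistinctButEqual()
    {
        int i = 42;
        object first = i;   // boxes
        object second = i;  // boxes again
        return !object.ReferenceEquals(first, second) && first.Equals(second);
    }

    static void Main()
    {
        Console.WriteLine(ElementTypeIsInt());          // prints True
        Console.WriteLine(BoxesAreDistinctButEqual());  // prints True
    }
}
```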

You may also want to check out Eric's post on this subject:

Inalienable answered 11/7, 2009 at 14:35 Comment(8)
But I don't get it. Shouldn't value types be allocated on the stack? Or both value and reference types can be allocated both on heap or stack and it's just that they usually are just stored in one place or other?Shorts
@Jorge, a value type with no reference type wrapper / container will live on the stack. However once it's used within a reference type container it will live in the heap. An array is a reference type and hence the memory for the int must be in the heap.Inalienable
@Jorge: reference types live only in the heap, never on the stack. Contrariwise, it is impossible (in verifiable code) to store a pointer to a stack location into an object of a reference type.Fibster
I think that you meant to assign i to arr[0]. The constant assignment will still cause boxing of "42", but you created i, so you may as well use it ;-)Nefertiti
@AntonTykhyy: There's no rule I'm aware of saying a CLR can't do escape analysis. If it detects that an object will never be referenced past the lifetime of the function that created it, it's entirely legitimate -- and even preferable -- to construct the object on the stack, whether it's a value type or not. "Value type" and "reference type" basically describe what's at the memory taken up by the variable, not a hard and fast rule on where the object lives.Barley
@cHao: possibly, but the benefit is small. Gen0 allocation is as fast as incrementing a pointer, collections are very fast too. On the other hand, escape analysis is notoriously complex, and with the proliferation of closures (iterators, lambdas, async/await) most objects will have to be allocated on the heap anyway. On balance, it is hardly worth the effort of the CLR team.Fibster
@AntonTykhyy: Just pointing out that "only in the heap, never on the stack" may not always be true. Whether it's true, or will be in the future, is getting even deeper into implementation-specific details than caring about heap vs stack in the first place. Other CLR implementations may see a benefit.Barley
@Inalienable Your answer here is so crucial because the whole idea was clear in my mind except that int[] is value type why it is on the heap and you gave the answer thanks.Busywork
P
25

To understand what's happening, here are some facts:

  • Objects are always allocated on the heap.
  • The heap only contains objects.
  • Value types are either allocated on the stack, or part of an object on the heap.
  • An array is an object.
  • An array can only contain value types.
  • An object reference is a value type.

So, if you have an array of integers, the array is allocated on the heap and the integers that it contains are part of the array object on the heap. The integers reside inside the array object, not as separate objects, so they are not boxed.

If you have an array of strings, it's really an array of string references. As references are value types they will be part of the array object on the heap. If you put a string object in the array, you actually put the reference to the string object in the array, and the string is a separate object on the heap.
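
That an array of strings holds references rather than the strings themselves can be checked with a short sketch (the names here are hypothetical): two slots can refer to one and the same string object, and slots that were never assigned hold null.

```csharp
using System;

static class StringArrayDemo
{
    // Storing the same string in two slots copies the reference, not the string.
    public static bool SlotsShareOneObject()
    {
        string text = new string('x', 3);
        string[] strings = new string[2];
        strings[0] = text;
        strings[1] = text;
        return object.ReferenceEquals(strings[0], strings[1]);
    }

    // A freshly allocated string[] contains null references, not empty strings.
    public static bool UnassignedSlotIsNull()
    {
        string[] strings = new string[4];
        return strings[0] == null;
    }

    static void Main()
    {
        Console.WriteLine(SlotsShareOneObject());   // prints True
        Console.WriteLine(UnassignedSlotIsNull());  // prints True
    }
}
```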

Pocketknife answered 11/7, 2009 at 15:49 Comment(4)
Yes, references behave exactly like value types but I noticed they are usually not called that way, or included in the value types. See for instance (but there are much more like this) msdn.microsoft.com/en-us/library/s1ax56ch.aspxFrancie
@Henk: Yes, you are right that references are not listed among the value types, but when it comes to how memory is allocated for them they behave in every respect like value types, and it's very useful to realise that to understand how the memory allocation all fits together. :)Pocketknife
I doubt the 5th point, "An array can only contain value types." What about string array ? string[] strings = new string[4];Ladino
"If you have an array of strings, it's really an array of string references" but for int[] it just keeps the reference of int[], am I right?Busywork
M
12

I think at the core of your question lies a misunderstanding about reference and value types. This is something probably every .NET and Java developer struggled with.

An array is just a list of values. If it's an array of a reference type (say a string[]) then the array is a list of references to various string objects on the heap, as a reference is the value of a reference type. Internally, these references are implemented as pointers to an address in memory. If you wish to visualize this, such an array would look like this in memory (on the heap):

[ 00000000, 00000000, 00000000, F8AB56AA ]

This is a string[] holding 4 references to string objects on the heap (the numbers here are hexadecimal). Currently, only the last reference actually points to anything, because memory is initialized to all zeros when allocated. This array would basically be the result of this code in C#:

string[] strings = new string[4];
strings[3] = "something"; // the string was allocated at 0xF8AB56AA by the CLR

The above array would be in a 32 bit program. In a 64 bit program, the references would be twice as big (F8AB56AA would be 00000000F8AB56AA).

If you have an array of value types (say an int[]) then the array is a list of integers, as the value of a value type is the value itself (hence the name). The visualization of such an array would be this:

[ 00000000, 45FF32BB, 00000000, 00000000 ]

This is an array of 4 integers, where only the second int is assigned a value (to 1174352571, which is the decimal representation of that hexadecimal number) and the rest of the integers would be 0 (like I said, memory is initialized to zero and 00000000 in hexadecimal is 0 in decimal). The code that produced this array would be:

 int[] integers = new int[4];
 integers[1] = 1174352571; // integers[1] = 0x45FF32BB would be valid too

This int[] array would also be stored on the heap.

As another example, the memory of a short[4] array would look like this:

[ 0000, 0000, 0000, 0000 ]

As the value of a short is a 2 byte number.
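
The claims about zero-initialization and element sizes can be verified with a small sketch. The class and method names are hypothetical; `Buffer.ByteLength` reports the number of bytes occupied by the elements of a primitive array, excluding any array header.

```csharp
using System;

static class ZeroInitDemo
{
    // Every element of a freshly allocated int[] is 0.
    public static bool FreshArrayIsAllZero()
    {
        int[] integers = new int[4];
        foreach (int value in integers)
        {
            if (value != 0) return false;
        }
        return true;
    }

    // The hexadecimal and decimal literals denote the same int value.
    public static bool HexAndDecimalAgree()
    {
        int[] integers = new int[4];
        integers[1] = 0x45FF32BB;
        return integers[1] == 1174352571;
    }

    // Element storage: 4 shorts take 8 bytes, 4 ints take 16 bytes.
    public static int ShortArrayBytes() { return Buffer.ByteLength(new short[4]); }
    public static int IntArrayBytes() { return Buffer.ByteLength(new int[4]); }

    static void Main()
    {
        Console.WriteLine(FreshArrayIsAllZero());  // prints True
        Console.WriteLine(HexAndDecimalAgree());   // prints True
        Console.WriteLine(ShortArrayBytes());      // prints 8
        Console.WriteLine(IntArrayBytes());        // prints 16
    }
}
```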

Where a value type is stored is just an implementation detail, as Eric Lippert explains very well here; it is not inherent to the differences between value and reference types (which are differences in behavior).

When you pass something to a method (be that a reference type or a value type) then a copy of the value of the type is actually passed to the method. In the case of a reference type, the value is a reference (think of this as a pointer to a piece of memory, although that also is an implementation detail) and in the case of a value type, the value is the thing itself.

// Calling this method creates a copy of the *reference* to the string
// and a copy of the int itself, so copies of the *values*
void SomeMethod(string s, int i){}
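
A consequence of passing copies is that reassigning a parameter inside the method never changes the caller's variable, for either kind of type. The sketch below gives SomeMethod a body purely for illustration; the class name `ByValueDemo` and the method `Run` are hypothetical.

```csharp
using System;

static class ByValueDemo
{
    // Both parameters are local copies; reassigning them is invisible to the caller.
    static void SomeMethod(string s, int i)
    {
        s = "changed";  // repoints only the local copy of the reference
        i = 99;         // overwrites only the local copy of the int
    }

    public static string Run()
    {
        string text = "original";
        int number = 5;
        SomeMethod(text, number);
        return text + ":" + number;
    }

    static void Main()
    {
        Console.WriteLine(Run());  // prints original:5
    }
}
```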

Boxing only occurs if you convert a value type to a reference type. This code boxes:

object o = 5;
Mala answered 11/7, 2009 at 16:57 Comment(1)
I believe "an implementation detail" should be a font-size: 50px. ;)Esquivel
E
4

These are illustrations depicting the answer above by @P Daddy.

[Two illustrations; images not available]

And I illustrated the corresponding contents in my style.

[Illustration; image not available]

Ethben answered 6/12, 2017 at 1:0 Comment(3)
@P Daddy I made illustrations. Please check whether any part is wrong. I also have some additional questions. 1. When I create an int array of length 4, is the length information (4) also always stored in memory?Ethben
2. On second illustration, copied array address is stored where? Is it same stack area in which intArray address is stored? Is it other stack but same kind of stack? Is it different kind of stack? 3. What does low 32-bits/high 32-bits mean? 4. What's return value when I allocate value type (in this example, structure) on the stack by using new keyword? Is it also the address? When I was checking by this statement Console.WriteLine(valType), it would show the fully qualified name like object like ConsoleApp.ValType.Ethben
5. valType.I=200; Does this statement mean I get the address of valType, by this address I access to the I and right there I store 200 but "on the stack".Ethben
F
3

Enough has been said by everybody, but if someone is looking for a clear (but non-official) sample and documentation about heap, stack, local variables, and static variables, refer to Jon Skeet's complete article, Memory in .NET - what goes where.

Excerpt:

  1. Each local variable (ie one declared in a method) is stored on the stack. That includes reference type variables - the variable itself is on the stack, but remember that the value of a reference type variable is only a reference (or null), not the object itself. Method parameters count as local variables too, but if they are declared with the ref modifier, they don't get their own slot, but share a slot with the variable used in the calling code. See my article on parameter passing for more details.

  2. Instance variables for a reference type are always on the heap. That's where the object itself "lives".

  3. Instance variables for a value type are stored in the same context as the variable that declares the value type. The memory slot for the instance effectively contains the slots for each field within the instance. That means (given the previous two points) that a struct variable declared within a method will always be on the stack, whereas a struct variable which is an instance field of a class will be on the heap.

  4. Every static variable is stored on the heap, regardless of whether it's declared within a reference type or a value type. There is only one slot in total no matter how many instances are created. (There don't need to be any instances created for that one slot to exist though.) The details of exactly which heap the variables live on are complicated, but explained in detail in an MSDN article on the subject.
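
Points 1 and 4 of the excerpt can be illustrated with a short sketch. The names `RefSlotDemo`, `Increment`, and `callCount` are hypothetical. A `ref` parameter is an alias for the caller's variable, so a write through it is visible to the caller, and a static field exists exactly once no matter how it is reached.

```csharp
using System;

static class RefSlotDemo
{
    // One storage slot in total for the whole program run.
    static int callCount;

    // 'n' shares the caller's slot, so the increment is visible to the caller.
    static void Increment(ref int n)
    {
        n++;
        callCount++;
    }

    public static int LocalAfterRefCall()
    {
        int local = 1;
        Increment(ref local);
        return local;
    }

    public static int CallCount { get { return callCount; } }

    static void Main()
    {
        Console.WriteLine(LocalAfterRefCall());  // prints 2
        Console.WriteLine(CallCount);            // prints 1
    }
}
```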

Fichtean answered 9/4, 2013 at 7:43 Comment(2)
Your "what goes where" link is dead.Kearns
I can't edit it atm, the correct link to Skeet's article is this: jonskeet.uk/csharp/memory.htmlHomologue
P
1

An array of integers is allocated on the heap, nothing more, nothing less. myIntegers refers to the start of the section where the ints are allocated. That reference is located on the stack.

If you have an array of reference-type objects, such as Object, then the variable myObjects, located on the stack, refers to a block of references, each of which refers to one of the objects themselves.

To summarize, if you pass myIntegers to some functions, you only pass the reference to the place where the real bunch of integers is allocated.

Polyphonic answered 11/7, 2009 at 14:38 Comment(0)
F
1

There is no boxing in your example code.

Value types can live on the heap as they do in your array of ints. The array is allocated on the heap and it stores ints, which happen to be value types. The contents of the array are initialized to default(int), which happens to be zero.

Consider a class that contains a value type:


    class HasAnInt
    {
        int i;
    }

    HasAnInt h = new HasAnInt();

Variable h refers to an instance of HasAnInt that lives on the heap. It just happens to contain a value type. That's perfectly okay, 'i' just happens to live on the heap as it's contained in a class. There is no boxing in this example either.
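
A small sketch of the same idea follows. It assumes the field is made public so it can be read from outside the class, which differs from the private field shown above, and the name `InlineFieldDemo` is hypothetical. The int is stored inside the heap object, starts at zero, and reading it into a local copies the value rather than creating a link back to the object.

```csharp
using System;

class HasAnInt
{
    public int i;  // public here only so the sketch can read and write it
}

static class InlineFieldDemo
{
    // A new instance's int field holds default(int), which is 0.
    public static int DefaultFieldValue()
    {
        HasAnInt h = new HasAnInt();
        return h.i;
    }

    // Copying the field into a local copies the value; changing the copy
    // does not touch the field stored inside the heap object.
    public static int FieldAfterCopyIsChanged()
    {
        HasAnInt h = new HasAnInt();
        h.i = 7;
        int copy = h.i;
        copy++;
        return h.i;
    }

    static void Main()
    {
        Console.WriteLine(DefaultFieldValue());        // prints 0
        Console.WriteLine(FieldAfterCopyIsChanged());  // prints 7
    }
}
```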

Fresco answered 11/7, 2009 at 15:7 Comment(0)
