Boxing and unboxing with generics
Asked Answered
A

6

75

The .NET 1.0 way of creating collection of integers (for example) was:

ArrayList list = new ArrayList();
list.Add(i);          /* boxing   */
int j = (int)list[0]; /* unboxing */

The penalty of using this is the lack of type safety and performance due to boxing and unboxing.

The .NET 2.0 way is to use generics:

List<int> list = new List<int>();
list.Add(i);
int j = list[0];

The price of boxing (to my understanding) is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.

How does the use of generics overcome this? Does the stack-allocated integer stays on the stack and being pointed to from the heap (I guess this is not the case because of what will happen when it will get out of scope)? It seems like there is still a need of copying it somewhere else out of the stack.

What is really going on?

Ampliate answered 9/12, 2010 at 20:59 Comment(0)
J
81

When it comes to collections, generics make it possible to avoid boxing/unboxing by utilizing actual T[] arrays internally. List<T> for example uses a T[] array to store its contents.

The array, of course, is a reference type and is therefore (in the current version of the CLR, yada yada) stored on the heap. But since it's a T[] and not an object[], the array's elements can be stored "directly": that is, they're still on the heap, but they're on the heap in the array instead of being boxed and having the array contain references to the boxes.

So for a List<int>, for example, what you'd have in the array would "look" like this:

[ 1 2 3 ]

Compare this to an ArrayList, which uses an object[] and would therefore "look" something like this:

[ *a *b *c ]

...where *a, etc. are references to objects (boxed integers):

*a -> 1
*b -> 2
*c -> 3

Excuse those crude illustrations; hopefully you know what I mean.

Jablon answered 9/12, 2010 at 21:4 Comment(1)
Just to add why List<int> is stored as an array of values(eg: [1 2]). When the JIT compiler sees a List<int> it re-writes the code to int[], when it sees List<CustomClass>, it rewrites to CustomClass[]. The former(value types) is an array of values where the later is an array of pointers. Where as the ArrayList being a normal class, it stores everything as array of references. This is where storing value types becomes difficult.Aspirate
H
80

Your confusion is a result of misunderstanding what the relationship is between the stack, the heap, and variables. Here's the correct way to think about it.

  • A variable is a storage location that has a type.
  • The lifetime of a variable can either be short or long. By "short" we mean "until the current function returns or throws" and by "long" we mean "possibly longer than that".
  • If the type of a variable is a reference type then the contents of the variable is a reference to a long-lived storage location. If the type of a variable is a value type then the contents of the variable is a value.

As an implementation detail, a storage location which is guaranteed to be short-lived can be allocated on the stack. A storage location which might be long-lived is allocated on the heap. Notice that this says nothing about "value types are always allocated on the stack." Value types are not always allocated on the stack:

int[] x = new int[10];
x[1] = 123;

x[1] is a storage location. It is long-lived; it might live longer than this method. Therefore it must be on the heap. The fact that it contains an int is irrelevant.

You correctly say why a boxed int is expensive:

The price of boxing is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.

Where you go wrong is to say "the stack allocated integer". It doesn't matter where the integer was allocated. What matters was that its storage contained the integer, instead of containing a reference to a heap location. The price is the need to create the object and do the copy; that's the only cost that is relevant.

So why isn't a generic variable costly? If you have a variable of type T, and T is constructed to be int, then you have a variable of type int, period. A variable of type int is a storage location, and it contains an int. Whether that storage location is on the stack or the heap is completely irrelevant. What is relevant is that the storage location contains an int, instead of containing a reference to something on the heap. Since the storage location contains an int, you do not have to take on the costs of boxing and unboxing: allocating new storage on the heap and copying the int to the new storage.

Is that now clear?

Hughie answered 9/12, 2010 at 23:15 Comment(1)
Thanks for emphasizing that a value-type variable holds the value, whereas a reference-type variable holds a reference to information held elsewhere. The distinction between value types and reference types isn't whether they are stored on the stack or heap, but whether they're stored "in" the variable. A reference-type automatic variable will typically be stored on the stack, but the information represented by it will be stored on the heap.Wholism
C
4

Generics allows the list's internal array to be typed int[] instead of effectively object[], which would require boxing.

Here's what happens without generics:

  1. You call Add(1).
  2. The integer 1 is boxed into an object, which requires a new object to be constructed on the heap.
  3. This object is passed to ArrayList.Add().
  4. The boxed object is stuffed into an object[].

There are three levels of indirection here: ArrayList -> object[] -> object -> int.

With generics:

  1. You call Add(1).
  2. The int 1 is passed to List<int>.Add().
  3. The int is stuffed into an int[].

So there are only two levels of indirection: List<int> -> int[] -> int.

A few other differences:

  • The non-generic method will require a sum of 8 or 12 bytes (one pointer, one int) to store the value, 4/8 in one allocation and 4 in the other. And this will probably be more due to alignment and padding. The generic method will require only 4 bytes of space in the array.
  • The non-generic method requires allocating a boxed int; the generic method does not. This is faster and reduces GC churn.
  • The non-generic method requires casts to extract values. This is not typesafe and it's a bit slower.
Cherilyncherilynn answered 9/12, 2010 at 21:6 Comment(2)
Your 'levels of indirection' is only logically correct. The object actually stores the int, there's no indirection. Can't really fix this without talking about pointers, tough cookie.Rotunda
@Hans. It does, yes, but the runtime has to extract the value from the object, and this counts as a dereference, since the pointer-to-object needs to be dereferenced. (Same as extracting the field A from a reference to class Foo in this case: class Foo { public int A; }.)Cherilyncherilynn
L
3

An ArrayList only handles the type object so to use this class requires casting to and from object. In the case of value types, this casting involves boxing and unboxing.

When you use a generic list the compiler outputs specialized code for that value type so that the actual values are stored in the list rather than a reference to objects that contain the values. Therefore no boxing is required.

The price of boxing (to my understanding) is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.

I think you are assuming that value types are always instantiated on the stack. This is not the case - they can be created either on the heap, on the stack or in registers. For more information about this please see Eric Lippert's article: The Truth About Value Types.

Lingual answered 9/12, 2010 at 21:2 Comment(4)
So the integer is still being copied from the stack to the heap but there in no creation of new objects?Ampliate
AFAIK value types are being allocated on the heap as part of other objects. but a local variables or method parameters they are always being allocated on the stack, am I right?Ampliate
Your are thinking about where the objects are stored. Generics perform better not because of WHERE they store objects but because WHAT is stored inside the collection. See my answer below.Isacco
Why does it matter if they are on the stack of the heap? It really shouldn't concern you either way.Giddings
R
1

In .NET 1, when the Add method is called:

  1. Space is allocated on the heap; a new reference is made
  2. The contents of the i variable is copied into the reference
  3. A copy of the reference is put at the end of the list

In .NET 2:

  1. A copy of the variable i is passed to the Add method
  2. A copy of that copy is put at the end of the list

Yes, the i variable is still copied (after all, it's a value type, and value types are always copied - even if they're just method parameters). But there's no redundant copy made on the heap.

Realize answered 9/12, 2010 at 21:3 Comment(0)
I
1

Why are you thinking in terms of WHERE the values\objects are stored? In C# value types can be stored on stack as well as heap depending upon what the CLR chooses.

Where generics make a difference is WHAT is stored in the collection. In case of ArrayList the collection contains references to boxed objects where as the List<int> contains int values themselves.

Isacco answered 9/12, 2010 at 21:12 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.