Why do we need boxing and unboxing in C#?

I know what boxing and unboxing are, but I can't comprehend the real use of them. Why and where should I use them?

short s = 25;

object objshort = s;  //Boxing

short anothershort = (short)objshort;  //Unboxing
Bootee answered 21/1, 2010 at 18:36 Comment(0)
S
542

Why

To have a unified type system and allow value types to have a completely different representation of their underlying data from the way that reference types represent their underlying data (e.g., an int is just a bucket of thirty-two bits which is completely different than a reference type).

Think of it like this. You have a variable o of type object. And now you have an int and you want to put it into o. o is a reference to something somewhere, and the int is emphatically not a reference to something somewhere (after all, it's just a number). So, what you do is this: you make a new object that can store the int and then you assign a reference to that object to o. We call this process "boxing."

So, if you don't care about having a unified type system (i.e., reference types and value types have very different representations and you don't want a common way to "represent" the two) then you don't need boxing. If you don't care about having an int directly hold its underlying value (i.e., you are happy for int to be a reference type too, storing just a reference to its underlying value) then you don't need boxing either.

where should I use it.

For example, the old collection type ArrayList only eats objects. That is, it only stores references to somethings that live somewhere. Without boxing you cannot put an int into such a collection. But with boxing, you can.
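To make the ArrayList point concrete, here is a minimal sketch. The class and method names (CollectionBoxingDemo, SumBoxed, SumGeneric) are made up for illustration; only ArrayList and List<int> come from the framework.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class CollectionBoxingDemo
{
    // Sums an ArrayList of ints; every element has to be unboxed on the way out.
    public static int SumBoxed(ArrayList items)
    {
        int total = 0;
        foreach (object item in items)
        {
            total += (int)item; // unboxing; throws InvalidCastException if item is not a boxed int
        }
        return total;
    }

    // Sums a List<int>; the ints are stored directly, so nothing is boxed.
    public static int SumGeneric(List<int> items)
    {
        int total = 0;
        foreach (int item in items)
        {
            total += item;
        }
        return total;
    }

    static void Main()
    {
        var boxed = new ArrayList { 1, 2, 3 };   // each int is boxed when it is added
        var generic = new List<int> { 1, 2, 3 }; // no boxing
        Console.WriteLine(SumBoxed(boxed));      // 6
        Console.WriteLine(SumGeneric(generic));  // 6
    }
}
```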

Now, in the days of generics you don't really need this and can generally go merrily along without thinking about the issue. But there are a few caveats to be aware of:

This is correct:

double e = 2.718281828459045;
int ee = (int)e;

This is not:

double e = 2.718281828459045;
object o = e; // box
int ee = (int)o; // throws InvalidCastException at runtime

Instead you must do this:

double e = 2.718281828459045;
object o = e; // box
int ee = (int)(double)o;

First we have to explicitly unbox the double ((double)o), because a boxed double cannot be unboxed directly to an int, and then cast that to an int.
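If you don't know what is inside the box, a type test avoids the exception. The following is only a sketch; UnboxDemo, TryUnboxInt and ToInt are hypothetical helper names, while Convert.ToInt32 is the real framework method (note that it rounds instead of truncating).

```csharp
using System;

static class UnboxDemo
{
    // Unboxes to int only when the box really holds an int.
    public static bool TryUnboxInt(object boxed, out int value)
    {
        if (boxed is int i)
        {
            value = i;
            return true;
        }
        value = 0;
        return false;
    }

    // Converts a boxed numeric value to int through IConvertible.
    // Unlike a cast, Convert.ToInt32 rounds to the nearest integer.
    public static int ToInt(object boxed)
    {
        return Convert.ToInt32(boxed);
    }

    static void Main()
    {
        double e = 2.718281828459045;
        object o = e; // box
        Console.WriteLine(TryUnboxInt(o, out int n)); // False: the box holds a double
        Console.WriteLine((int)(double)o);            // 2: unbox, then truncate
        Console.WriteLine(ToInt(o));                  // 3: rounded, not truncated
    }
}
```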

What is the result of the following:

double e = 2.718281828459045;
double d = e;
object o1 = d;
object o2 = e;
Console.WriteLine(d == e);
Console.WriteLine(o1 == o2);

Think about it for a second before going on to the next sentence.

If you said True and False, great! Wait, what? That's because == on reference types uses reference equality, which checks whether the references are equal, not whether the underlying values are equal, and o1 and o2 refer to two separate boxes. This is a dangerously easy mistake to make. Perhaps even more subtle:

double e = 2.718281828459045;
object o1 = e;
object o2 = e;
Console.WriteLine(o1 == o2);

will also print False!

Better to say:

Console.WriteLine(o1.Equals(o2));

which will then, thankfully, print True.
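A small sketch of the same idea, wrapped in helpers so the two kinds of comparison are named explicitly. BoxedEqualityDemo, SameReference and SameValue are illustrative names; object.ReferenceEquals and the static object.Equals are the real framework methods, and the static Equals has the extra benefit of tolerating null on either side.

```csharp
using System;

static class BoxedEqualityDemo
{
    // True only when both variables point at the very same box.
    public static bool SameReference(object a, object b)
    {
        return object.ReferenceEquals(a, b);
    }

    // Null-safe value comparison; ends up calling a.Equals(b) when neither side is null.
    public static bool SameValue(object a, object b)
    {
        return object.Equals(a, b);
    }

    static void Main()
    {
        double e = 2.718281828459045;
        object o1 = e; // first box
        object o2 = e; // second, separate box
        Console.WriteLine(SameReference(o1, o2)); // False
        Console.WriteLine(SameValue(o1, o2));     // True
        Console.WriteLine(SameValue(null, o2));   // False, and no NullReferenceException

        // Equals on boxes also compares the boxed types, not just the numbers.
        object boxedInt = 1;
        object boxedLong = 1L;
        Console.WriteLine(SameValue(boxedInt, boxedLong)); // False
    }
}
```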

One last subtlety:

[struct|class] Point {
    public int x, y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

Point p = new Point(1, 1);
object o = p;
p.x = 2;
Console.WriteLine(((Point)o).x);

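What is the output? It depends! If Point is a struct then the output is 1, but if Point is a class then the output is 2! A boxing conversion makes a copy of the value being boxed, which explains the difference in behavior: for a struct, o holds a copy that p.x = 2 never touches, whereas for a class no boxing happens at all and o and p refer to the same object.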
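Since [struct|class] is not real syntax, here is a runnable sketch of the same experiment with both variants side by side. The type names PointStruct and PointClass are invented for this example.

```csharp
using System;

struct PointStruct
{
    public int X;
    public PointStruct(int x) { X = x; }
}

class PointClass
{
    public int X;
    public PointClass(int x) { X = x; }
}

static class CopyOnBoxDemo
{
    public static int ReadAfterMutatingStruct()
    {
        var p = new PointStruct(1);
        object o = p;              // boxing: o refers to a copy of p
        p.X = 2;                   // changes p only
        return ((PointStruct)o).X; // still 1
    }

    public static int ReadAfterMutatingClass()
    {
        var p = new PointClass(1);
        object o = p;              // no boxing: o and p refer to the same object
        p.X = 2;                   // visible through o as well
        return ((PointClass)o).X;  // 2
    }

    static void Main()
    {
        Console.WriteLine(ReadAfterMutatingStruct()); // 1
        Console.WriteLine(ReadAfterMutatingClass());  // 2
    }
}
```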

Slashing answered 21/1, 2010 at 18:43 Comment(4)
@Jason Do you mean to say that if we have primitive lists, there is No reason to use any boxing/unboxing? - Sells
I'm not sure what you mean by "primitive list." - Slashing
Could you please talk to the performance impact of boxing and unboxing? - Daff
@KevinMeredith there is a basic explanation about performance for boxing and unboxing operations in msdn.microsoft.com/en-us/library/ms173196.aspx - Briannabrianne
D
75

In the .NET framework, there are two species of types--value types and reference types. This is relatively common in OO languages.

One of the important features of object oriented languages is the ability to handle instances in a type-agnostic manner. This is referred to as polymorphism. Since we want to take advantage of polymorphism, but we have two different species of types, there has to be some way to bring them together so we can handle one or the other the same way.

Now, back in the olden days (1.0 of Microsoft .NET), there wasn't this newfangled generics hullabaloo. You couldn't write a method that had a single argument that could service a value type and a reference type. That's a violation of polymorphism. So boxing was adopted as a means to coerce a value type into an object.

If this weren't possible, the framework would be littered with methods and classes whose only purpose was to accept the other species of type. Not only that, but since value types don't truly share a common type ancestor, you'd have to have a different method overload for each value type (bool, byte, Int16, Int32, etc.).

Boxing prevented this from happening. And that's why the British celebrate Boxing Day.
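As a rough sketch of what that buys you: one method with an object parameter can service both species of type, with no per-type overloads. DescribeDemo and Describe are hypothetical names used only for this illustration.

```csharp
using System;

static class DescribeDemo
{
    // A single method that accepts value types (boxed on the way in)
    // and reference types alike.
    public static string Describe(object value)
    {
        if (value == null)
        {
            return "null";
        }
        return value.GetType().Name + ": " + value;
    }

    static void Main()
    {
        Console.WriteLine(Describe(42));   // Int32: 42   (int is boxed)
        Console.WriteLine(Describe(true)); // Boolean: True (bool is boxed)
        Console.WriteLine(Describe("hi")); // String: hi  (already a reference, no boxing)
        Console.WriteLine(Describe(null)); // null
    }
}
```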

Depreciable answered 21/1, 2010 at 18:53 Comment(1)
Before generics, auto-boxing was necessary to do many things; given the existence of generics, however, were it not for the need to maintain compatibility with old code, I think .net would be better off without implied boxing conversions. Casting a value type like List<string>.Enumerator to IEnumerator<string> yields an object which mostly behaves like a class type, but with a broken Equals method. A better way to cast List<string>.Enumerator to IEnumerator<string> would be to call a custom conversion operator, but the existence of an implied conversion prevents that. - Attestation
A
70

The best way to understand this is to look at lower-level programming languages C# builds on.

In the lowest-level languages like C, all variables go one place: The Stack. Each time you declare a variable it goes on the Stack. They can only be primitive values, like a bool, a byte, a 32-bit int, a 32-bit uint, etc. The Stack is both simple and fast. As variables are added they just go one on top of another, so the first you declare sits at say, 0x00, the next at 0x01, the next at 0x02 in RAM, etc. In addition, variables are often pre-addressed at compile-time, so their address is known before you even run the program.

In the next level up, like C++, a second memory structure called the Heap is introduced. You still mostly live in the Stack, but special ints called Pointers can be added to the Stack, that store the memory address for the first byte of an Object, and that Object lives in the Heap. The Heap is kind of a mess and somewhat expensive to maintain, because unlike Stack variables they don't pile linearly up and then down as a program executes. They can come and go in no particular sequence, and they can grow and shrink.

Dealing with pointers is hard. They're the cause of memory leaks, buffer overruns, and frustration. C# to the rescue.

At a higher level, C#, you don't need to think about pointers - the .Net framework (written in C++) thinks about these for you and presents them to you as References to Objects, and for performance, lets you store simpler values like bools, bytes and ints as Value Types. Underneath the hood, Objects and stuff that instantiates a Class go on the expensive, Memory-Managed Heap, while Value Types go in that same Stack you had in low-level C - super-fast.

For the sake of keeping the interaction between these 2 fundamentally different concepts of memory (and strategies for storage) simple from a coder's perspective, Value Types can be Boxed at any time. Boxing causes the value to be copied from the Stack, put in an Object, and placed on the Heap - more expensive, but, fluid interaction with the Reference world. As other answers point out, this will occur when you for example say:

bool b = false; // Cheap, on Stack
object o = b; // Legal, easy to code, but complex - Boxing!
bool b2 = (bool)o; // Unboxing!

A strong illustration of the advantage of Boxing is a check for null:

if (b == null) // Compiles only with warning CS0472 and is always false - bools can't be null
if (o == null) // Compiles cleanly - a reference can be null (false here, since o holds the boxed bool)

Our object o is technically an address in the Stack that points to a copy of our bool b, which has been copied to the Heap. We can check o for null because the bool's been Boxed and put there.

In general you should avoid Boxing unless you need it, for example to pass an int/bool/whatever as an object to an argument. There are some basic structures in .Net that still demand passing Value Types as object (and so require Boxing), but for the most part you should never need to Box.

A non-exhaustive list of historical C# structures that require Boxing, that you should avoid:

  • The Event system turns out to have a Race Condition in naive use of it, and it doesn't support async. Add in the Boxing problem and it should probably be avoided. (You could replace it for example with an async event system that uses Generics.)

  • The old Threading and Timer models forced a Box on their parameters but have been replaced by async/await which are far cleaner and more efficient.

  • The .Net 1.1 Collections relied entirely on Boxing, because they came before Generics. These are still kicking around in System.Collections. In any new code you should be using the Collections from System.Collections.Generic, which in addition to avoiding Boxing also provide you with stronger type-safety.

You should avoid declaring or passing your Value Types as objects, unless you have to deal with the above historical problems that force Boxing, and you want to avoid the performance hit of Boxing it later when you know it's going to be Boxed anyway.

Per Mikael's suggestion below:

Do This

using System.Collections.Generic;

var employeeCount = 5;
var list = new List<int>(10);

Not This

using System.Collections;

Int32 employeeCount = 5;
var list = new ArrayList(10);

Update

This answer originally suggested Int32, Bool etc. cause boxing, when in fact they are simple aliases for Value Types. That is, .Net has types like Boolean, Int32, String, and C# aliases them to bool, int, string, without any functional difference.

Avant answered 12/2, 2016 at 21:38 Comment(10)
You taught me what a hundred programmers and IT professionals couldn't explain in years, but change it to say what you should do instead of what to avoid, because it got a bit hard to follow. Ground rules most often don't go "1. you shall not do this"; instead, "do this". - Geerts
I've been working in C# for quite some time but I began with C++. This helped me bridge my understanding of pointers in C++ to boxing/unboxing in C#. - Zorazorah
There is no "Int" in C#; there is int and Int32. I believe you are wrong in stating one is a value type and the other is a reference type wrapping the value type. Unless I am mistaken, that is true in Java, but not C#. In C# the ones that show up blue in the IDE are aliases for their struct definition. So: int = Int32, bool = Boolean, string = String. The reason to use bool over Boolean is because it is suggested as so in the MSDN design guidelines and conventions. Otherwise I love this answer. But I will down vote until you prove me wrong or fix that in your answer. - Glissando
If you declare a variable as int and another as Int32, or bool and Boolean - right click and view definition, you will end up in the same definition for a struct. - Glissando
@HeribertoLugo is correct, the line "You should avoid declaring your Value Types as Bool instead of bool" is mistaken. As OP points out you should avoid declaring your bool (or Boolean, or any other value-type) as Object. bool/Boolean, int/Int32, are just aliases between C# and .NET: learn.microsoft.com/en-us/dotnet/csharp/language-reference/… - Slumlord
"In the next level up, like C++, a second memory structure called the Heap is introduced" ...I don't think this is correct. C also allocates on the heap using malloc instead of new? I don't think the heap is being introduced in C++. - Vasectomy
I'm trying to quickly summarize a lot here; yes it's true you can build the bare basics of more advanced structures in C, including accessing the heap using malloc - and managing it can get messy very quickly. But you'll never dump things there implicitly like you will in C++ and C#, and even somewhat simple C# interaction with the heap is a ton of complexity in C. The goal of this answer is to allow people who started with C# to understand what boxing is and why you'd have it, without providing an exhaustive history of Computer Science or study of C or C++ or other languages. - Avant
@ChrisMoschini without using new/malloc/variant in C++ either directly or indirectly (which is analogous to what can happen in C), in what way in C++ do you uniquely allocate heap memory? In C, if I call some library function f, it could internally (implicitly) allocate memory. If I call a function to give me a pointer to struct S I do not know where that struct is located in memory (and this is often done for example where the pointed to data is part of a small singleton set). I am a C++ rather than a C programmer but to me C uses the heap too. Also C still works the same way, so it is not history. - Dichromatic
@ccpgh In C++, there's syntax that interacts with the heap for you, without explicitly saying, go interact with the heap. Whereas in C you've got to do so explicitly, or you won't use it at all. Hiding the calls in libraries doesn't change the fact that C on its own won't do so, while C++ will. - Avant
@ChrisMoschini as I pointed out in my post that is the same as C. If I call a library function in C it can interact with the heap as well. Both languages interact with the heap without the programmer explicitly doing so. - Dichromatic
S
27

Boxing isn't really something that you use - it is something the runtime uses so that you can handle reference and value types in the same way when necessary. For example, if you used an ArrayList to hold a list of integers, the integers got boxed to fit in the object-type slots in the ArrayList.

Using generic collections now, this pretty much goes away. If you create a List<int>, there is no boxing done - the List<int> can hold the integers directly.

Sacker answered 21/1, 2010 at 18:43 Comment(2)
You still need boxing for things like composite string formatting. You might not see it as often when using generics, but it's definitely still there. - Frankish
True - it shows up all the time in ADO.NET too - sql parameter values are all 'object's no matter what the real data type is. - Sacker
S
15

Boxing and Unboxing are specifically used to treat value-type objects as reference-type; moving their actual value to the managed heap and accessing their value by reference.

Without boxing and unboxing you could never pass value-types by reference; and that means you could not pass value-types as instances of Object.

Slumlord answered 21/1, 2010 at 18:42 Comment(1)
Pass by reference of numeric types exists in languages without boxing, and other languages implement treating value types as instances of Object without boxing and moving the value to the heap (e.g. implementations of dynamic languages where pointers are aligned to 4 byte boundaries use the lower two bits of references to indicate that the value is an integer or symbol rather than a full object; such value types are immutable and the same size as a pointer). - Washko
N
8

The last place I had to unbox something was when writing some code that retrieved some data from a database (I wasn't using LINQ to SQL, just plain old ADO.NET):

int myIntValue = (int)reader["MyIntValue"];

Basically, if you're working with older APIs before generics, you'll encounter boxing. Other than that, it isn't that common.
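A plain cast like the one above throws if the column is NULL (the reader hands back DBNull.Value) or holds a different numeric type. Below is a hedged sketch of a more defensive read. ReaderUnboxDemo, ReadInt and BuildSampleReader are invented for this example, and an in-memory DataTable stands in for a real database so the snippet can run on its own.

```csharp
using System;
using System.Data;

static class ReaderUnboxDemo
{
    // Reads an int column; returns the fallback when the value is DBNull
    // or anything other than a boxed int.
    public static int ReadInt(IDataRecord record, string column, int fallback)
    {
        object raw = record[column]; // the indexer always returns object
        if (raw is int value)
        {
            return value;            // unboxed only when the box holds an int
        }
        return fallback;
    }

    // Builds a two-row in-memory reader: one real value, one database NULL.
    public static DataTableReader BuildSampleReader()
    {
        var table = new DataTable();
        table.Columns.Add("MyIntValue", typeof(int));
        table.Rows.Add(42);
        table.Rows.Add(DBNull.Value);
        return table.CreateDataReader();
    }

    static void Main()
    {
        using (DataTableReader reader = BuildSampleReader())
        {
            while (reader.Read())
            {
                Console.WriteLine(ReadInt(reader, "MyIntValue", -1)); // 42, then -1
            }
        }
    }
}
```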

Northward answered 21/1, 2010 at 18:43 Comment(0)
S
3

Boxing is required, when we have a function that needs object as a parameter, but we have different value types that need to be passed, in that case we need to first convert value types to object data types before passing it to the function.

I don't think that is true, try this instead:

using System;

class Program
{
    static void Main(string[] args)
    {
        int x = 4;
        test(x); // x is passed where an object is expected
    }

    static void test(object o)
    {
        Console.WriteLine(o.ToString());
    }
}

That runs just fine, I didn't use boxing/unboxing. (Unless the compiler does that behind the scenes?)
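To answer the parenthetical: the compiler does do it behind the scenes. Passing an int where an object is expected is an implicit boxing conversion. One way to observe it without reading IL is sketched below, on the assumption that two separate conversions produce two separate boxes; ImplicitBoxingDemo and AsObject are hypothetical names.

```csharp
using System;

static class ImplicitBoxingDemo
{
    // Returns its argument unchanged; an int passed here arrives already boxed.
    public static object AsObject(object o)
    {
        return o;
    }

    static void Main()
    {
        int x = 4;
        object first = AsObject(x);  // implicit boxing conversion at the call site
        object second = AsObject(x); // a second, separate box

        Console.WriteLine(first.GetType());                       // System.Int32
        Console.WriteLine(object.ReferenceEquals(first, second)); // False: two distinct boxes
        Console.WriteLine(first.Equals(second));                  // True: same boxed value
    }
}
```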

Sprage answered 17/10, 2011 at 5:1 Comment(1)
That is because everything inherits from System.Object, and you are giving the method an object with extra information, so basically you are calling the test method with what it is expecting and anything it might expect since it doesn't expect anything in particular. A lot in .NET is done behind the scenes, which is why it is a very simple language to use. - Geerts
A
2

In .net, every instance of Object, or any type derived therefrom, includes a data structure which contains information about its type. "Real" value types in .net do not contain any such information. To allow data in value types to be manipulated by routines that expect to receive types derived from object, the system automatically defines for each value type a corresponding class type with the same members and fields. Boxing creates a new instance of this class type, copying the fields from a value type instance. Unboxing copies the fields from an instance of the class type to an instance of the value type. All of the class types which are created from value types are derived from the ironically named class ValueType (which, despite its name, is actually a reference type).
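The remark about ValueType can be checked with reflection. This is just a sketch; ValueTypeDemo and its method names are made up, while Type.BaseType and Type.IsValueType are the real members.

```csharp
using System;

static class ValueTypeDemo
{
    // int derives directly from System.ValueType.
    public static bool IntDerivesFromValueType()
    {
        return typeof(int).BaseType == typeof(ValueType);
    }

    // System.ValueType itself is a class, not a value type.
    public static bool ValueTypeIsItselfAValueType()
    {
        return typeof(ValueType).IsValueType;
    }

    // A boxed value can be referenced through ValueType like any other base class.
    public static bool IsBoxedValue(object o)
    {
        return o is ValueType;
    }

    static void Main()
    {
        Console.WriteLine(IntDerivesFromValueType());     // True
        Console.WriteLine(ValueTypeIsItselfAValueType()); // False
        Console.WriteLine(IsBoxedValue(5));               // True
        Console.WriteLine(IsBoxedValue("text"));          // False
    }
}
```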

Attestation answered 13/9, 2011 at 6:43 Comment(0)
R
1

Boxing is the conversion of a value to a reference type with the data at some offset in an object on the heap.

As for what boxing actually does, here are some examples.

Mono (C)

void* mono_object_unbox (MonoObject *obj)
{
    MONO_EXTERNAL_ONLY_GC_UNSAFE (void*, mono_object_unbox_internal (obj));
}

#define MONO_EXTERNAL_ONLY_GC_UNSAFE(t, expr) \
    t result;       \
    MONO_ENTER_GC_UNSAFE;   \
    result = expr;      \
    MONO_EXIT_GC_UNSAFE;    \
    return result;

static inline gpointer
mono_object_unbox_internal (MonoObject *obj)
{
    /* add assert for valuetypes? */
    g_assert (m_class_is_valuetype (mono_object_class (obj)));
    return mono_object_get_data (obj);
}

static inline gpointer
mono_object_get_data (MonoObject *o)
{
    return (guint8*)o + MONO_ABI_SIZEOF (MonoObject);
}

#define MONO_ABI_SIZEOF(type) (MONO_STRUCT_SIZE (type))
#define MONO_STRUCT_SIZE(struct) MONO_SIZEOF_ ## struct
#define MONO_SIZEOF_MonoObject (2 * MONO_SIZEOF_gpointer)

typedef struct {
    MonoVTable *vtable;
    MonoThreadsSync *synchronisation;
} MonoObject;

Unboxing an object in Mono amounts to returning a pointer at an offset of 2 gpointers into the object (i.e., 16 bytes on a 64-bit platform). A gpointer is a void*. This makes sense when looking at the definition of MonoObject, as it's clearly just a header for the data.

C++

To box a value in C++ you could do something like:

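#include <iostream>
using Object = void*;

// Copy the value onto the heap and hand back an untyped pointer to it.
template<class T> Object box(T j){
  return new T(j);
}

// Copy the value back out of the heap object, then free it.
template<class T> T unbox(Object j){
  T* typed = static_cast<T*>(j);
  T temp = *typed;
  delete typed;  // deleting through a void* is undefined behavior, so delete the typed pointer
  return temp;
}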

int main() {
  int j=2;
  Object o = box(j);
  int k = unbox<int>(o);
  std::cout << k;
}
Ridings answered 1/6, 2020 at 16:24 Comment(0)
P
0

When a method only takes a reference type as a parameter (say a generic method constrained to reference types via the class constraint), you will not be able to pass a value type to it directly and will have to box it first.

This is also true for any methods that take object as a parameter - this will have to be a reference type.
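A short sketch of the constraint case. ConstraintDemo and TypeNameOf are hypothetical names; the where T : class constraint is standard C#.

```csharp
using System;

static class ConstraintDemo
{
    // T must be a reference type because of the class constraint.
    public static string TypeNameOf<T>(T item) where T : class
    {
        return typeof(T).Name;
    }

    static void Main()
    {
        int n = 5;

        // TypeNameOf(n); // does not compile: int is not a reference type

        // Choosing T = object compiles; n is boxed to fit the object parameter.
        Console.WriteLine(TypeNameOf<object>(n)); // Object

        // A string is already a reference type, so nothing is boxed.
        Console.WriteLine(TypeNameOf("text"));    // String
    }
}
```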

Picoline answered 21/1, 2010 at 18:44 Comment(0)
B
0

In general, you typically will want to avoid boxing your value types.

However, there are rare occurrences where this is useful. If you need to target the 1.1 framework, for example, you will not have access to the generic collections. Any use of the collections in .NET 1.1 would require treating your value type as a System.Object, which causes boxing/unboxing.

There are still cases for this to be useful in .NET 2.0+. Any time you want to take advantage of the fact that all types, including value types, can be treated as an object directly, you may need to use boxing/unboxing. This can be handy at times, since it allows you to save any type in a collection (by using object instead of T in a generic collection), but in general, it is better to avoid this, as you're losing type safety. The one case where boxing frequently occurs, though, is when you're using Reflection - many of the calls in reflection will require boxing/unboxing when working with value types, since the type is not known in advance.
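For the reflection case, here is a minimal sketch. MethodInfo.Invoke takes its arguments as object[] and returns object, so a value-type argument is boxed going in and the value-type result comes back boxed. ReflectionBoxingDemo and AbsViaReflection are names invented for the example; Math.Abs(int) is just a convenient method to call.

```csharp
using System;
using System.Reflection;

static class ReflectionBoxingDemo
{
    // Calls Math.Abs(int) through reflection instead of directly.
    public static int AbsViaReflection(int value)
    {
        MethodInfo abs = typeof(Math).GetMethod("Abs", new[] { typeof(int) });

        // value is boxed into the object[]; the int result comes back boxed.
        object result = abs.Invoke(null, new object[] { value });

        return (int)result; // unbox the result
    }

    static void Main()
    {
        Console.WriteLine(AbsViaReflection(-5)); // 5
    }
}
```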

Beria answered 31/10, 2013 at 20:14 Comment(0)
N
0

Boxing happens when a value type is passed to a variable or parameter with a type of object. Since it happens automatically, the question is not when you should use boxing, but rather when you should use the type object.

The type object should only be used when it is absolutely necessary, since it circumvents the type safety which is otherwise a major benefit of a statically typed language like C#. But it may be necessary in cases where it is not possible to know the type of a value at compile time.

For example when reading a database field value through the ADO.NET framework. The returned value could be either an integer or a string or something else, so the type has to be object, and the client code has to perform the appropriate casting. To avoid this problem, ORM frameworks like Linq-to-SQL or EF Core use statically typed entities instead, so the use of object is avoided.

Before the introduction of generics, collections like ArrayList typed their items as object. This meant you could store anything in a list, and you could add a string to a list of numbers, without the type system complaining. Generics solve this problem and make boxing unnecessary when using collections of value types.

So typing something as object is rarely needed, and you want to avoid it. Generics are typically a better solution in cases where code needs to be able to handle both value types and reference types.
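As an illustration of that trade-off, here is a sketch with the same operation written twice. GenericVsObjectDemo, Larger and LargerBoxed are hypothetical names; IComparable and IComparable<T> are the real interfaces.

```csharp
using System;

static class GenericVsObjectDemo
{
    // Generic version: works for value and reference types, checked at compile time,
    // and an int argument is not boxed.
    public static T Larger<T>(T a, T b) where T : IComparable<T>
    {
        return a.CompareTo(b) >= 0 ? a : b;
    }

    // object version: value types are boxed on the way in, and a mismatched pair
    // such as (3, "seven") compiles fine but throws ArgumentException at runtime.
    public static object LargerBoxed(object a, object b)
    {
        return ((IComparable)a).CompareTo(b) >= 0 ? a : b;
    }

    static void Main()
    {
        Console.WriteLine(Larger(3, 7));      // 7
        Console.WriteLine(LargerBoxed(3, 7)); // 7, but both ints were boxed

        // Larger(3, "seven");                // rejected by the compiler
    }
}
```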

Natelson answered 18/11, 2021 at 10:43 Comment(0)
