Boxing / Unboxing Nullable Types - Why this implementation?

Asked 7/9, 2009 at 4:51 Answered 19/11, 2011 at 19:35

Extract from CLR via C# on Boxing / Unboxing value types ...

On Boxing: If the nullable instance is not null, the CLR takes the value out of the nullable instance and boxes it. In other words, a Nullable<Int32> with a value of 5 is boxed into a boxed-Int32 with a value of 5.

On Unboxing: Unboxing is simply the act of obtaining a reference to the unboxed portion of a boxed object. The problem is that a boxed value type cannot be simply unboxed into a nullable version of that value type because the boxed value doesn't have the boolean hasValue field in it. So, when unboxing a value type into a nullable version, the CLR must allocate a Nullable<T> object, initialize the hasValue field to true, and set the value field to the same value that is in the boxed value type. This impacts your application performance (memory allocation during unboxing).

Why did the CLR team go to so much trouble for Nullable types? Why was it not simply boxed into a Nullable<Int32> in the first place?
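For reference, the behavior the book describes can be observed with a short sketch. The class and method names (NullableBoxingDemo, BoxedTypeName, RoundTrip) are made up for illustration; only the boxing and unboxing conversions themselves come from the language.

```csharp
using System;

public static class NullableBoxingDemo
{
    // Boxes a nullable int and reports the runtime type of the box,
    // or "null" when boxing produced a null reference instead of a box.
    public static string BoxedTypeName(int? value)
    {
        object boxed = value;
        return boxed == null ? "null" : boxed.GetType().FullName;
    }

    // Boxes a plain int, then unboxes it into an int?.
    public static int? RoundTrip(int value)
    {
        object boxed = value;
        return (int?)boxed;
    }

    public static void Main()
    {
        Console.WriteLine(BoxedTypeName(5));    // System.Int32
        Console.WriteLine(BoxedTypeName(null)); // null

        int? back = RoundTrip(5);
        Console.WriteLine(back.HasValue);       // True
        Console.WriteLine(back.Value);          // 5
    }
}
```

The box never carries a Nullable<Int32>: it is either a boxed Int32 or no box at all.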

Damarisdamarra answered 7/9, 2009 at 4:51 Comment(3)

"Memory allocation during unboxing" What??? The book is definitely wrong on that regard. – Endymion 3/11, 2014 at 22:45

@BenVoigt Unboxing into a nullable type is about 20× slower than unboxing into a normal type, but only with a regular cast; the as operator is no slower than regular unboxing. – Cawthon 4/11, 2014 at 0:12

Behavior is currently described on learn.microsoft.com/en-us/dotnet/csharp/programming-guide/… – Drachm 11/4, 2019 at 1:20

I remember this behavior being a last-minute change. In early betas of .NET 2.0, Nullable<T> was a "normal" value type. Boxing a null-valued int? turned it into a boxed int? with a Boolean flag. I think they chose the current approach for consistency. Say:

int? test = null;
object obj = test;
if (test != null)
   Console.WriteLine("test is not null");
if (obj != null)
   Console.WriteLine("obj is not null");

In the former approach (box null -> boxed Nullable<T>), you wouldn't get "test is not null" but you would get "obj is not null", which is weird.
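To see the difference side by side, here is a minimal sketch. BetaNullable<T> is a hypothetical stand-in I wrote for illustration, not the actual beta type: it is just an ordinary struct that gets no special treatment from the CLR when boxed.

```csharp
using System;

// Hypothetical stand-in for the early-beta design: an ordinary struct
// with a flag and a value, boxed like any other value type.
public struct BetaNullable<T> where T : struct
{
    public bool HasValue;
    public T Value;
}

public static class BoxingComparison
{
    // Boxing an ordinary struct always allocates a box, even when it is "empty".
    public static bool EmptyBetaBoxIsNull()
    {
        BetaNullable<int> test = default(BetaNullable<int>); // HasValue == false
        object obj = test;
        return obj == null;
    }

    // The real Nullable<T> boxes an empty value to a null reference.
    public static bool EmptyRealBoxIsNull()
    {
        int? test = null;
        object obj = test;
        return obj == null;
    }

    public static void Main()
    {
        Console.WriteLine(EmptyBetaBoxIsNull()); // False
        Console.WriteLine(EmptyRealBoxIsNull()); // True
    }
}
```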

Additionally, if they had boxed a nullable value to a boxed-Nullable<T>:

int? val = 42;
object obj = val;

if (obj != null) {
   // Our object is not null, so intuitively it's an `int` value:
   int x = (int)obj; // ...but this would have failed. 
}

Besides that, I believe the current behavior makes perfect sense for scenarios like nullable database values (think SQL-CLR...)

Clarification:

The whole point of providing nullable types is to make it easy to deal with variables that have no meaningful value. They didn't want to provide two distinct, unrelated types. An int? should behave more or less like a simple int. That's why C# provides lifted operators.
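As a quick illustration of lifted operators, here is a small sketch (the LiftedOperatorsDemo class and its method names are mine, for illustration only). Arithmetic on an int? propagates null, while the ordering comparisons return false when either operand is null.

```csharp
using System;

public static class LiftedOperatorsDemo
{
    // The compiler lifts int's + operator to int?: a null operand gives a null result.
    public static int? Add(int? a, int? b) => a + b;

    // The lifted < operator returns false when either operand is null.
    public static bool LessThan(int? a, int? b) => a < b;

    public static void Main()
    {
        Console.WriteLine(Add(2, 3));             // 5
        Console.WriteLine(Add(2, null).HasValue); // False
        Console.WriteLine(LessThan(2, 3));        // True
        Console.WriteLine(LessThan(2, null));     // False
        Console.WriteLine(LessThan(null, 2));     // False
    }
}
```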

So, when unboxing a value type into a nullable version, the CLR must allocate a Nullable<T> object, initialize the hasValue field to true, and set the value field to the same value that is in the boxed value type. This impacts your application performance (memory allocation during unboxing).

This is not true. The CLR has to allocate memory on the stack to hold the variable whether or not it is nullable. Allocating space for an extra Boolean field is not a performance issue.
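A small sketch supporting this, using only reflection calls from System (the NullableIsAStruct class and its method names are made up): int? is itself a value type, so the target of the unboxing is an ordinary struct variable rather than a new heap object.

```csharp
using System;

public static class NullableIsAStruct
{
    // Nullable<int> is a struct, so a local of type int? is not a heap object.
    public static bool IsValueType() => typeof(int?).IsValueType;

    // The type it wraps is plain int.
    public static Type Underlying() => Nullable.GetUnderlyingType(typeof(int?));

    // Unboxing copies the boxed int into a Nullable<int> struct value.
    public static int? Unbox(object boxed) => (int?)boxed;

    public static void Main()
    {
        Console.WriteLine(IsValueType());      // True
        Console.WriteLine(Underlying());       // System.Int32
        Console.WriteLine(Unbox(42).HasValue); // True
    }
}
```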

Awfully answered 7/9, 2009 at 4:59 Comment(13)

From what I understand, in the current implementation (Step 1) if test is null, the CLR does not box anything and returns null. (Step 2) If the nullable instance is not null, it boxes it into a boxed-Int32. Doesn't Step 1 solve the "obj is not null" problem? Why did they have to do Step 2? Sorry, but I seem to be missing something. – Damarisdamarra 7/9, 2009 at 5:13

Preets: You mean they would box null to a null reference and box int? x = 4; to boxed-Nullable<int>? – Awfully 7/9, 2009 at 5:17

Umm.. yeah.. is that not possible? – Damarisdamarra 7/9, 2009 at 5:19

Preets: If they had done that, you couldn't unbox it directly to an int. – Awfully 7/9, 2009 at 5:25

What's the point though? Boxing a non-null nullable as a Nullable<...> is simply wasting a boolean value, thus (slightly) increasing GC pressure and reducing processor cache for no good reason. The whole idea behind Nullable<...> is that it represents a value type that happens to be able to be null - but that entire extra step is unnecessary for boxed values which can inherently be null anyhow. – Gdynia 7/9, 2009 at 5:27

While I can sort of understand the processor cache argument (I don't think it matters most of the time), I am not sure about the GC pressure argument. Whether you box an int or box a Nullable<int>, the GC still handles it as a single block. Creating it is just an allocation (near-free with the GC), and deleting means mark/sweep/compact is still going to mark, sweep, and compact a block regardless. I can't see any difference in GC load...just a difference in the size of the allocated block. I think the crux of the matter is what Mehrdad stated: "unbox directly to an int". – Generative 7/9, 2009 at 5:49

@EamonNerbonne: Keeping the identities of possibly-nested nullable types makes it possible for a collection of arbitrary type T to distinguish between a return that means "Key X has no associated value", and "Key X has been associated with a null value". – Scholasticate 23/12, 2012 at 17:28

@supercat: that's still possible. First of all, the collection itself internally can presumably distinguish between an unstored value (not present in whatever data structure is used) and a stored null value. So then it's a question of API: most .NET collection APIs (and certainly the newer ones) make it possible to distinguish present null values from the absence of a value. To store a value or a fallback such as null in a structure with an API that doesn't distinguish null from absence, simply wrap the value in your own Maybe<T> structure (a common pattern in functional programming anyhow). – Gdynia 2/1, 2013 at 14:24

@supercat: in short: you can do whatever a boxed Nullable<> could do easily yourself, but you can't emulate the current behaviour of auto-unwrapping yourself without VM support. Thus the current behavior is better (and probably faster). Not to mention the fact that usually you can let the typesystem deduce the type of a null-placeholder so that the need to do so at runtime is fairly rare. Optimizing for such a corner case (you can't statically determine the type, and you do actually care about the type of the "null" even though no value is present) isn't likely to be useful. – Gdynia 2/1, 2013 at 14:30

@EamonNerbonne: I don't really see much advantage to the unusual behavior of Nullable<T>. While there's nothing preventing user implementation of a Maybe<T> type, there's substantial value in having different libraries that need the same things in a type use the same type. If Moe writes an interface with a method that returns a Moe.Maybe<T>, and Larry writes one which uses a Larry.Maybe<T>, someone trying to implement both interfaces will have to use different methods to deal with the different types. It would be cleaner if there were one standard Maybe<T>... – Scholasticate 2/1, 2013 at 15:59

...which simply contained public T Value; public bool IsValid; along with the natural constructors. The struct would not imply any particular semantics for those two fields--the semantics would be specified by any methods which return or accept an instance of the structure. Some people may dislike the notion of exposed fields, but they can greatly reduce the performance cost of using a composite type. For example, one saying Result.IsValid = MyDict.TryGetValue(ref Result.Value); would eliminate the need to have TryGetValue put the value into a temp before it's copied to Result. – Scholasticate 2/1, 2013 at 16:4

I agree - I'd love to see a standard maybe, or better yet, real support for discriminated unions. Nevertheless, Nullable<> isn't that type - it just adds null to value types. It was never going to be able to fit both needs perfectly. – Gdynia 3/1, 2013 at 10:0

What's more interesting are the lifted operators. They are handled by the compiler and share some similarities with NaN as error-indicators. – Cawthon 4/11, 2014 at 0:27

I think it makes sense to box a null value to a null reference. Having a boxed value saying "I know I would be an Int32 if I had a value, but I don't" seems unintuitive to me. Better to go from the value type version of "not a value" (a value with HasValue as false) to the reference type version of "not a value" (a null reference).

I believe this change was made on the feedback of the community, btw.

This also allows an interesting use of as even for value types:

object mightBeADouble = GetMyValue();

double? unboxed = mightBeADouble as double?;
if (unboxed != null)
{
    ...
}

This is more consistent with the way "uncertain conversions" are handled with reference types, than the previous:

object mightBeADouble = GetMyValue();

if (mightBeADouble is double)
{
    double unboxed = (double) mightBeADouble;
    ...
}

(It may also perform better, as there's only a single execution-time type check.)
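A runnable sketch of the as pattern, with a hypothetical helper TryUnboxDouble standing in for the elided GetMyValue() call above. Note that a boxed int is not a boxed double, so it comes back as null rather than being converted.

```csharp
using System;

public static class AsNullableDemo
{
    // Returns the double stored in the box, or null when the object
    // is a null reference or holds some other type.
    public static double? TryUnboxDouble(object candidate) => candidate as double?;

    public static void Main()
    {
        Console.WriteLine(TryUnboxDouble(1.5).HasValue);   // True
        Console.WriteLine(TryUnboxDouble("1.5").HasValue); // False
        Console.WriteLine(TryUnboxDouble(null).HasValue);  // False
        Console.WriteLine(TryUnboxDouble(1).HasValue);     // False: boxed int, not a boxed double
    }
}
```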

Phosphorescence answered 7/9, 2009 at 5:23 Comment(1)

The normal use I'm aware of for Nullable<T> is in situations where a function may or may not have a T to return. Conceptually, that pattern is just as applicable when T is a reference type or nullable type, as when it is a non-nullable value type. If one had a collection of Nullable<int>, a logical return type for a TryGetXXX method would be a Nullable<Nullable<int>>. If the return's HasValue is false, that means it couldn't get a Nullable<int>. If the outer one is true, the return value is whatever was stored in the collection (which could be null). – Scholasticate 23/12, 2012 at 17:32

A thing that you gain via this behavior is that the boxed version implements all interfaces supported by the underlying type. (The goal is to make Nullable<int> appear the same as int for all practical purposes.) Boxing to a boxed-Nullable<int> instead of a boxed-int would prevent this behavior.

From the MSDN page:

double? d = 44.4;
object iBoxed = d;
// Access IConvertible interface implemented by double.
IConvertible ic = (IConvertible)iBoxed;
int i = ic.ToInt32(null);
string str = ic.ToString();

Also, getting the int out of a boxed Nullable<int> is straightforward. Usually you can't unbox to a type other than the original source type.

float f = 1.5f;
object boxed_float = f;
int int_value = (int) boxed_float; // will blow up. Cannot unbox a float to an int, you *must* unbox to a float first.

float? nullableFloat = 1.4f;
boxed_float = nullableFloat;
float fValue = (float) boxed_float;  // can unbox a float? to a float
Console.WriteLine(fValue);

Here you do not have to know whether the original value was a float or a float?. (You also gain some performance: the boxed object does not need space for the hasValue Boolean.)
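The unboxing rules above, including what happens with a null reference, can be summed up in a small sketch. ThrownBy is a helper I made up that reports which exception (if any) an unboxing attempt throws.

```csharp
using System;

public static class UnboxingRules
{
    // Runs the unboxing attempt and returns the name of the exception it throws, or "none".
    public static string ThrownBy(Func<object> unbox)
    {
        try { unbox(); return "none"; }
        catch (Exception e) { return e.GetType().Name; }
    }

    public static void Main()
    {
        object boxedFloat = 1.5f;
        object nothing = null;

        Console.WriteLine(ThrownBy(() => (int)boxedFloat));    // InvalidCastException
        Console.WriteLine(ThrownBy(() => (int?)boxedFloat));   // InvalidCastException
        Console.WriteLine(ThrownBy(() => (float)boxedFloat));  // none
        Console.WriteLine(ThrownBy(() => (float?)boxedFloat)); // none
        Console.WriteLine(ThrownBy(() => (float)nothing));     // NullReferenceException
        Console.WriteLine(ThrownBy(() => (float?)nothing));    // none
    }
}
```

So a boxed float unboxes to either float or float?, and only the nullable target accepts a null reference.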

Saville answered 7/9, 2009 at 5:45 Comment(1)

I can understand this as a rationale, but I would think it would be better to have either used some CLR magic to create wrapper methods for a Nullable<T> to implement T's interfaces, or else restricted the odd boxing behavior to interface casts. As it is, Nullable<T> ends up in a really weird limbo. – Scholasticate 12/9, 2012 at 23:31

I guess that is basically what it does. The description given includes your suggestion (i.e. boxing into a Nullable<T>).

The extra is that it sets the hasValue field after boxing.

Myall answered 7/9, 2009 at 4:58 Comment(0)

I would posit that the reason for the behavior stems from the behavior of Object.Equals, most notably the fact that if the first object is null and the second object is not, Object.Equals returns false rather than call the Equals method on the second object.

If Object.Equals had called the Equals method on the second object in the case where the first object was null but the second was not, then an object which was a null-valued Nullable<T> could have returned True when compared to null. Personally, I think the proper remedy would have been to make the HasValue property of a Nullable<T> have nothing to do with the concept of a null reference. With regard to the overhead involved in storing a boolean flag on the heap, one could have provided that for every type Nullable<T> there would be a static boxed empty version, and then provided that unboxing the static boxed empty copy would yield an empty Nullable<T>, and unboxing any other instance would yield a populated one.
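To make the Object.Equals point concrete, here is a sketch using a hypothetical class, ClaimsToBeNull, whose instance Equals says it is equal to null. The static Object.Equals never consults it when either argument is null, whereas an empty int? does report equality with null through its own instance Equals.

```csharp
using System;

// Hypothetical type whose instance Equals claims equality with null.
public sealed class ClaimsToBeNull
{
    public override bool Equals(object obj) => obj == null;
    public override int GetHashCode() => 0;
}

public static class StaticEqualsDemo
{
    // The instance method is called directly, so its answer is used.
    public static bool InstanceEquals() => new ClaimsToBeNull().Equals(null);

    // The static helper returns false when exactly one argument is null,
    // without calling Equals on the other one.
    public static bool StaticNullFirst() => object.Equals(null, new ClaimsToBeNull());
    public static bool StaticNullSecond() => object.Equals(new ClaimsToBeNull(), null);

    // An empty Nullable<T> compares equal to null through its instance Equals.
    public static bool EmptyNullableEqualsNull()
    {
        int? empty = null;
        return empty.Equals(null);
    }

    public static void Main()
    {
        Console.WriteLine(InstanceEquals());          // True
        Console.WriteLine(StaticNullFirst());         // False
        Console.WriteLine(StaticNullSecond());        // False
        Console.WriteLine(EmptyNullableEqualsNull()); // True
    }
}
```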

Scholasticate answered 19/11, 2011 at 19:35 Comment(0)
