Why is Nullable<T> a struct?

4

26

I was wondering why Nullable<T> is a value type if it is designed to mimic the behavior of reference types. I understand arguments like GC pressure, but I'm not convinced: if we want int to act like a reference, we are probably OK with all the consequences of having a real reference type. I can see no reason why Nullable<T> is not just a boxed version of the struct T.

As value type:

  1. it still needs to be boxed and unboxed, and more, boxing must be a bit different than with "normal" structs (to treat null-valued nullable like real null)
  2. it needs to be treated differently when checking for null (done simply in Equals, no real problem)
  3. it is mutable, breaking the rule that structs should be immutable (ok, it is logically immutable)
  4. it needs to have special restriction to disallow recursion like Nullable<Nullable<T>>

Wouldn't making Nullable<T> a reference type solve those issues?

rephrased and updated:

I've modified my reason list a bit, but my general question is still open:

How would a reference-type Nullable<T> be worse than the current value-type implementation? Is it only GC pressure and the "small, immutable" rule? It still feels strange to me...

Hennery answered 24/11, 2010 at 22:24 Comment(6)
The real question, is: why not a generic Maybe<T>, for all types T.Fouts
@pst F# has an Option<T> type, but the semantics wouldn't work in C# without a major shift from "All refs can be nulls" to "Only refs you specify can be nulls". Otherwise, what's the point of having a Maybe type?Vesicle
This is a really tough question, you should go and read the corresponding CIL Standard sections. Nullable<T> has a special position in the Type System and a lot of special handling in the CLR applies to it. I'm afraid the answer isn't as simple as you might wish.Amberjack
Nullable<T> is a completely insane type, and trying to reason logically about it is folly. Shut your eyes, cover your ears, and scream "la la la la la I can't hear you la la la la" until it goes away, that's my advice.Dugan
The only thing that I'm not convinced about is that it is "designed to mimic the behavior of reference types." Since when? Where did you get this idea? The whole question is flawed, that's why you're not getting the answer you want.Goshen
@Aaronaught, I agree, and tackled that in my answer.Arbitrage
A
19

The reason is that it was not designed to act like a reference type. It was designed to act like a value type, except in just one particular. Let's look at some ways value types and reference types differ.

The main difference between a value type and a reference type is that a value type is self-contained (the variable contains the actual value), while a reference type refers to another value.

Some other differences are entailed by this. The fact that we can alias reference types directly (which has both good and bad effects) comes from this. So too do differences in what equality means:

A value type has a concept of equality based on the value contained, which can optionally be redefined (there are logical restrictions on how this redefinition can happen*). A reference type has a concept of identity that is meaningless with value types (as they cannot be directly aliased, two such values cannot be identical). Identity cannot be redefined, and it also gives the default for the reference type's concept of equality. By default, == deals with value-based equality when it comes to value types†, but with identity when it comes to reference types. Also, even when a reference type is given a value-based concept of equality and uses it for ==, it never loses the ability to be compared to another reference for identity.
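To make the equality/identity distinction concrete, here is a minimal sketch. The types Meters and Label are hypothetical and exist only for this illustration; the point is that a struct compares by its fields by default, while a class can be given value-based equality and still be compared for identity.

```csharp
using System;

// Hypothetical types, introduced only for this illustration.
struct Meters
{
    public readonly int Amount;
    public Meters(int amount) { Amount = amount; }
}

sealed class Label
{
    public readonly string Text;
    public Label(string text) { Text = text; }

    // Value-based equality layered on top of a reference type.
    public override bool Equals(object obj)
    {
        var other = obj as Label;
        return other != null && other.Text == Text;
    }

    public override int GetHashCode() { return Text.GetHashCode(); }
}

static class EqualityDemo
{
    // Two separately created structs with the same field values are equal.
    public static bool StructsCompareByValue()
    {
        return new Meters(3).Equals(new Meters(3));
    }

    // The class opts in to value-based equality through its Equals override.
    public static bool ClassCanCompareByValue()
    {
        return new Label("a").Equals(new Label("a"));
    }

    // Identity is still available and still distinguishes the two objects.
    public static bool DistinctObjectsKeepDistinctIdentity()
    {
        var first = new Label("a");
        var second = new Label("a");
        return !object.ReferenceEquals(first, second);
    }

    // An alias refers to the same object, so identity matches.
    public static bool AliasesShareIdentity()
    {
        var first = new Label("a");
        var alias = first;
        return object.ReferenceEquals(first, alias);
    }

    static void Main()
    {
        Console.WriteLine(StructsCompareByValue());
        Console.WriteLine(ClassCanCompareByValue());
        Console.WriteLine(DistinctObjectsKeepDistinctIdentity());
        Console.WriteLine(AliasesShareIdentity());
    }
}
```

All four methods return true: the struct has nothing but its value to compare, whereas the class carries both a value-based Equals and an identity that Equals cannot take away.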

Another difference entailed by this is that reference types can be null - a value that refers to another value allows for a value that doesn't refer to any value, which is what a null reference is.

Also, some of the advantages of keeping value-types small relate to this, since being based on value, they are copied by value when passed to functions.

Some other differences are implied but not entailed by this. That it's often a good idea to make value types immutable is implied but not entailed by the core difference because while there are advantages to be found without considering implementation matters, there are also advantages in doing so with reference types (indeed some relating to safety with aliases apply more immediately to reference types) and reasons why one may break this guideline - so it's not a hard and fast rule (with nested value types the risks involved are so heavily reduced that I would have few qualms in making a nested value type mutable, even though my style leans heavily to making even reference types immutable when at all practical).

Some further differences between value types and reference types are arguably implementation details. That a value type in a local variable has the value stored on the stack has been argued as an implementation detail; probably a pretty obvious one if your implementation has a stack, and certainly an important one in some cases, but not core to the definition. It's also often overstated (for a start, a reference type in a local variable also has the reference itself in the stack, for another there are plenty of times when a value type value is stored in the heap).

Some further advantages in value types being small relate to this.


Now, Nullable<T> is a type that behaves like a value type in all the ways described above, except that it can take a null value. Maybe the matter of local values being stored on the stack isn't all that important (being more an implementation detail than anything else), but the rest is inherent to how it is defined.

Nullable<T> is defined as

struct Nullable<T>
{
    private bool hasValue;
    internal T value;
    /* methods and properties I won't go into here */
}

Most of the implementation from this point is obvious. Some special handling is needed to allow null to be assigned to it - treated as if default(Nullable<T>) had been assigned - and some special handling when boxed, and then the rest follows (including that it can be compared for equality with null).
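As a rough illustration of that special handling, the sketch below (class and method names are my own, not from any library) checks that assigning null leaves the variable in the same state as default(int?), and that comparing with null is just a question of HasValue.

```csharp
using System;

static class NullAssignmentDemo
{
    // Assigning null produces the same empty value as default(int?).
    public static bool NullMatchesDefault()
    {
        int? fromNull = null;
        int? fromDefault = default(int?);
        return !fromNull.HasValue
            && !fromDefault.HasValue
            && fromNull.Equals(fromDefault);
    }

    // Comparing with null agrees with HasValue in both states.
    public static bool NullComparisonTracksHasValue()
    {
        int? empty = null;
        int? full = 42;
        return (empty == null) == !empty.HasValue
            && (full == null) == !full.HasValue;
    }

    // The wrapped field still exists when there is "no value";
    // GetValueOrDefault exposes it without throwing.
    public static int EmptyFallsBackToDefault()
    {
        int? empty = null;
        return empty.GetValueOrDefault();
    }

    static void Main()
    {
        Console.WriteLine(NullMatchesDefault());
        Console.WriteLine(NullComparisonTracksHasValue());
        Console.WriteLine(EmptyFallsBackToDefault());
    }
}
```

The first two methods return true and the third returns 0, which fits the description of a struct holding a flag plus a T.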

If Nullable<T> was a reference type, then we'd have to have special handling to allow for all the rest to occur, along with special handling for features in how .NET helps the developer (such as we'd need special handling to make it descend from ValueType). I'm not even sure if it would be possible.

*There are some restrictions on how we are allowed to redefine equality. Combining those rules with those used in the defaults, we can generally allow two values to be considered equal that would be considered unequal by default, but it rarely makes sense to consider two values unequal that the default would consider equal. An exception is the case where a struct contains only value types, but where said value types redefine equality. This is the result of an optimisation, and is generally considered a bug rather than by design.

†An exception is floating-point types. Because of the definition of value types in the CLI standard, double.NaN.Equals(double.NaN) and float.NaN.Equals(float.NaN) return true. But because of the definition of NaN in ISO/IEC 60559 (IEEE 754), float.NaN == float.NaN and double.NaN == double.NaN both return false.
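The NaN footnote is easy to check directly. A small sketch (the class name is mine) that puts the two comparisons side by side:

```csharp
using System;

static class NaNDemo
{
    // Equals follows the CLI rule that a value equals itself.
    public static bool EqualsMethodSaysEqual()
    {
        double x = double.NaN;
        double y = double.NaN;
        return x.Equals(y);
    }

    // The == operator follows the IEEE 754 rule that NaN equals nothing.
    public static bool OperatorSaysEqual()
    {
        double x = double.NaN;
        double y = double.NaN;
        return x == y;
    }

    static void Main()
    {
        Console.WriteLine(EqualsMethodSaysEqual()); // True
        Console.WriteLine(OperatorSaysEqual());     // False
    }
}
```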

Arbitrage answered 28/11, 2010 at 0:33 Comment(11)
I would actually go one step further and say that Nullable<T> cannot even truly be null. Every variable or argument that is a Nullable<T> does in fact have a value, even if it is "null". Were it not for the == and Equals overloads, you wouldn't even be able to compare it directly to null. Any resemblance that Nullable<T> may have to a reference type is merely syntactic sugar.Goshen
@Aaronaught, I'd argue differently and say that the meaning of null is different, but overlapping. The null of int? is comparable to the null of a nullable integer field in a database - a theoretical and mathematical concept. The null of string x = null means both that same concept, and also relates to the identity-based nature of references - in this case having no identified value. As such, they are both reasonable models of nullity, and there are reasonable comparisons, but they do model nullity differently.Arbitrage
That's all well and good, Jon - my point was that a variable or argument of type Nullable<T> can never actually hold the literal value null, and that the only thing that makes it even sort of like a reference type is its overloaded == operator.Goshen
Aaronaught, what does "holding a value" mean? It means having a property that compares as equal with that value. Whether the same bit-pattern is used anywhere is an implementation detail. If it walks like a null and quacks like a null, then it's a duck. I mean, a null.Arbitrage
@Jon, I have to disagree with you on some parts. While you're correct about the equality concept for primitives, you're wrong that this applies to value types in general. ValueType defines its own Equals() and GetHashCode() methods. This Equals implementation does not compare the binary data, but rather compares for equality via reflection: msdn.microsoft.com/en-us/library/2dts52z7(v=VS.85).aspx - this does make an important difference and is part of the reason why one should always override these two methods on any struct. (to be continued)Catricecatrina
The other thing is that you're mixing up compiler magic and CLR behavior. Whether a language actually provides a "nice" interface to using Nullable<> has nothing to do with the CLR - it is technically not possible in the CLR to assign null to an unboxed Nullable<> instance (which is what @Goshen correctly noted); C# adds this abstraction to implement the concept of null values identically for value and reference types. Also, differentiating between heap and stack is irrelevant; it is common to have a class with Nullable<> fields, which puts them on the heap while keeping the value type semantics.Catricecatrina
@Lucero, I didn't claim that value type compares binary data (it does actually in some cases, which is the optimisation I noted as sometimes causing a bug - but that's an optimisation intended to produce the same semantics as the reflection method, but failing in the buggy case). It's that case that means you might have to override the equals even when the reflection provides the correct semantics - overriding otherwise is a matter of optimisation.Arbitrage
It's possible to put Nullable<T> into a state where it compares with null as equal. That's semantically equivalent to assigning null to it, whether a language lets us use the same syntax for assigning null to a reference type or not. To my mind that's assigning a semantic null to it. Again, if it quacks like a null...Arbitrage
As for heap vs stack being irrelevant, that's hardly differing in the opinion I stated when saying it has "been argued as an implementation detail", "often overstated" and pointed out that "for a start, a reference type in a local variable also has the reference itself in the stack, for another there are plenty of times when a value type value is stored in the heap". It can have effects that make the difference relevant to some low-level optimisations, but I had hoped I was clearly stating that I didn't think it particularly relevant or important, and was mentioning it for completeness.Arbitrage
I agree with Aaronaught above. Use of null with a Nullable is misleading syntactic sugar. I use HasValue instead as much as possible.Siliceous
Concrete Gannet: the type is called Nullable<T>, you can assign nulls to it, compare it to nulls, and it even throws an exception (InvalidOperationException) if you access its Value before it has been assigned a value (not null) by the application code, even though there's an actual default value in the internal "value" field. I'm not sure if there are any performance reasons to only use HasValue instead of a null compare, but the compiler has us covered (#676578). If it walks like a duck and quacks like a duck...Colic
C
9

Edited to address the updated question...

You can box and unbox objects if you want to use a struct as a reference.

However, the Nullable<> type basically lets you enhance any value type with an additional state flag which tells whether the value shall be used as null or whether the struct is "valid".

So to address your questions:

  1. This is an advantage when used in collections, or because of the different semantics (copying instead of referencing)

  2. No it doesn't. The CLR does respect this when boxing and unboxing, so that you actually never box a Nullable<> instance. Boxing a Nullable<> which "has" no value will return a null reference, and unboxing does the opposite.

  3. Nope.

  4. Again, this isn't the case. In fact generic constraints for a struct do not allow nullable structs to be used. This makes sense due to the special boxing/unboxing behavior. Therefore, if you have a where T: struct to constrain a generic type, nullable types will be disallowed. Since this constraint is defined on the Nullable<T> type as well, you cannot nest them, without any special treatment to prevent this.
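The boxing behaviour from point 2 and the nesting restriction from point 4 can be observed with a short sketch like this one (the class and method names are illustrative only):

```csharp
using System;

static class BoxingDemo
{
    // Boxing an empty int? yields a plain null reference, not a boxed struct.
    public static bool EmptyBoxesToNull()
    {
        int? empty = null;
        object boxed = empty;
        return boxed == null;
    }

    // Boxing an int? that has a value yields a boxed int, not a boxed Nullable<int>.
    public static Type TypeOfBoxedValue()
    {
        int? five = 5;
        object boxed = five;
        return boxed.GetType();
    }

    // Unboxing a null reference to int? yields an empty int? instead of throwing.
    public static bool NullUnboxesToEmpty()
    {
        object nothing = null;
        int? unboxed = (int?)nothing;
        return !unboxed.HasValue;
    }

    // The following line is rejected by the compiler, because the
    // "where T : struct" constraint excludes nullable value types:
    // Nullable<Nullable<int>> nested;

    static void Main()
    {
        Console.WriteLine(EmptyBoxesToNull());
        Console.WriteLine(TypeOfBoxedValue());
        Console.WriteLine(NullUnboxesToEmpty());
    }
}
```

Here TypeOfBoxedValue returns typeof(int), so there is never a boxed Nullable<int> on the heap.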

Why not use references? I already mentioned the important semantic differences. But apart from this, reference types use much more memory space: each reference, especially in 64-bit environments, uses up not only heap memory for the instance, but also memory for the reference, the instance type information, locking bits, etc. So, apart from the semantic and performance differences (indirection via reference), you end up using a multiple of the memory used for the entity itself for most common entities. And the GC gets more objects to handle, which will make the total performance compared to structs even worse.

Catricecatrina answered 24/11, 2010 at 22:26 Comment(6)
Ad 2. int? a = null; bool b = (a == null); Since a is not real null reference, just Nullable holding null value, isn't it a special treatment? Ad 3. int? a = 1; a++; // a is now 2 - isn't it mutating the state?Hennery
It depends, I'd have to check the IL generated by the different C# versions. If == null always reverts to a reference comparison, then no, this wouldn't be special treatment in the compiler since the boxing required to compare with null will actually yield a null reference. Note that when using an unconstrained generic and comparing with null, it will in fact do a boxing operation (and issue a warning, at least R# does).Catricecatrina
I have to disagree with what you say about memory use. The plain difference is the pointer-size (32 or 64 bit), and if null or aliased, a reference type will take less total memory than a value-type if the class/struct size is greater than that. Since you will mostly not be copying the value itself, there tends to be much aliasing (though without the risks as they are aliased from different stack frames) so references tend to have lower total memory pressure. There are trade-offs that mean that value-type still wins up to around 16 bytes, but the trade-offs can go either way.Arbitrage
@Jon, sorry you're wrong. Each heap instance has not only the fields in memory, but also uses additional memory for a syncblock, a type handle, etc.: msdn.microsoft.com/en-us/magazine/cc163791.aspx Depending on the type you'll have padding on the addresses which may also add additional wasted heap memory on each instance. Even in the null and (IMHO not so common) aliasing case you always waste the pointer memory, which is equal or more than a bool?, byte?, sbyte?, short? or ushort?, and on 64-bit also int?, uint?, single?, most nullable enum types and others!Catricecatrina
But there are still cases where the size of the value will outweigh that, and when it does, that increases with each stack frame. This reduces the effect of padding and the other info in memory, and brings the practical comparison back to closer to whether the size compares to pointer size. After around 16 bytes, the balance is in a reference type's favour. It's just not always the case (and won't be in many cases that would be most common) nor particularly great. It would have to be a pretty massive advantage to reference types to make sense when value types is the obvious approach.Arbitrage
@Lucero: In 32-bit .net, the overhead for a heap object which is never used by a monitor lock or other such construct is eight bytes, plus whatever would be needed to bring the total size up to twelve. If one reference exists to a boxed 4-byte struct, the total cost will be 16 bytes. If two references exist to the same boxed struct, the total cost will be 20 bytes. If three, 24. Four, 28. One storage location of a nullable four-byte struct type costs 8 bytes; two, 16; three, 24; four, 32. So Nullable<T> wouldn't save much over an ImmutableHolder<T> class.Sportsman
M
6

It is not mutable; check again.

The boxing is different too; an empty Nullable<T> "boxes" to null.

But; it is small (barely bigger than T), immutable, and encapsulates only structs - ideal as a struct. Perhaps more importantly, so long as T is truly a "value", then so is T? a logical "value".
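A small sketch of why a++ on an int? does not contradict immutability (the class and method names are mine): the operator stores a new value in the variable, and copies taken earlier are unaffected.

```csharp
using System;

static class CopyDemo
{
    // a++ is shorthand for a = a + 1: the variable a gets a new value,
    // while the copy held in b keeps the old one.
    public static int? CopyAfterIncrement()
    {
        int? a = 1;
        int? b = a;   // copies the value, as with any struct
        a++;
        return b;
    }

    public static int? OriginalAfterIncrement()
    {
        int? a = 1;
        a++;
        return a;
    }

    // There is no way to change the wrapped value in place; Value has no
    // setter, so the following line would not compile:
    // a.Value = 2;

    static void Main()
    {
        Console.WriteLine(CopyAfterIncrement());     // 1
        Console.WriteLine(OriginalAfterIncrement()); // 2
    }
}
```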

Monkhood answered 24/11, 2010 at 22:26 Comment(11)
int? a = 1; a++; // a is now 2 - isn't it mutating the state?Hennery
@A. No you've swapped the state. That is like asking if ( for int) i++ means int is mutable.Monkhood
int a = 1; a++; //a is now 2 does that make int mutable? No. That is not what mutable means.Kamila
OK, looks like I don't fully understand mutability. So for example - if var x = DateTime.Now; x.AddMinutes(1); will change the x value, DateTime will be mutable, right? So where's the difference between my example?Hennery
@A. no, that doesn't change the x value at all; AddMinutes returns a value that you are ignoring. If you did x = x.AddMinutes(1), that is re-assigning the variable x. Mutability is when we can change the internal state of a value or object (which is different to changing a variable). So: cust.Name = "Fred" changes the internal state of the object referenced by cust; with value-types it is harder to explain because the copy-semantics get confusing; but with struct variable x, if I could do x.Foo = 123 to change it, then the value of x is mutable.Monkhood
However, you can always reassign a local variable; this is orthogonal to mutability; likewise you can reassign a field (as long as it isn't readonly) but that has nothing to do with mutability.Monkhood
I know that DateTime is mutable, it was a hypothetical example only. OK, so with my earlier example - struct int and its operator++ - it is immutable because we know about its copy semantics, but theoretically it could be implemented as mutable without changing anything in how it looks from the outside, right?Hennery
@A. - no, DateTime is immutable.Monkhood
Right, right, it was late, this is what I mean from the beginning.Hennery
a++ doesn't stand for something like a.Increment() but a = a + 1 so you're really assigning a new value replacing the old one and not modifying part of the struct. In particular if a is a property it first calls the getter, then calculates a+1 and then calls the setter with a+1 as parameter. And if a has a reference type a will point to a new instance and not the modified old one afterwards.Aeschines
Generally, immutability can refer to either the variable (e.g. readonly List<int> x; cannot have x replaced but can have x's members change) or the value itself (e.g. string x = "abc"; cannot have its parts changed, though x can be replaced), or both. When we talk of an immutable type, we mean the second. With a simple type like int that doesn't have component parts, the distinction disappears and comparing int with an immutable struct is meaningless, allowing us to consider int as mutable or immutable as it suits us. It suits us when comparing to other valuetypes, but not always.Arbitrage
C
-1

I coded MyNullable as a class. I can't really understand why it cannot be a class, besides avoiding heap memory pressure.

namespace ClassLibrary1
{
using NFluent;
using NUnit.Framework;

[TestFixture]
class MyNullableShould
{
    [Test]
    public void operator_equals_btw_nullable_and_value_works()
    {
        var myNullable = new MyNullable<int>(1);

        Check.That(myNullable == 1).IsEqualTo(true);
        Check.That(myNullable == 2).IsEqualTo(false);
    }

    [Test]
    public void Can_be_compared_with_operator_equal_equals()
    {
        var myNullable = new MyNullable<int>(1);
        var myNullable2 = new MyNullable<int>(1);

        Check.That(myNullable == myNullable2).IsTrue();
        Check.That(myNullable == myNullable2).IsTrue();

        var myNullable3 = new MyNullable<int>(2);
        Check.That(myNullable == myNullable3).IsFalse();
    }
}

}

namespace ClassLibrary1
{
using System;

public class MyNullable<T> where T : struct
{
    internal T value;

    public MyNullable(T value)
    {
        this.value = value;
        this.HasValue = true;
    }

    public bool HasValue { get; }

    public T Value
    {
        get
        {
            if (!this.HasValue) throw new Exception("Cannot grab value when has no value");
            return this.value;
        }
    }

    public static explicit operator T(MyNullable<T> value)
    {
        return value.Value;
    }

    public static implicit operator MyNullable<T>(T value)
    {
        return new MyNullable<T>(value);
    }

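    public static bool operator ==(MyNullable<T> n1, MyNullable<T> n2)
    {
        // Because this is a class, either operand may be a null reference;
        // treat that the same as an instance without a value.
        bool has1 = !ReferenceEquals(n1, null) && n1.HasValue;
        bool has2 = !ReferenceEquals(n2, null) && n2.HasValue;
        if (!has1) return !has2;
        if (!has2) return false;
        return Equals(n1.value, n2.value);
    }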

    public static bool operator !=(MyNullable<T> n1, MyNullable<T> n2)
    {
        return !(n1 == n2);
    }

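    public override bool Equals(object other)
    {
        // Unwrap another MyNullable<T> so two wrappers compare by their contents.
        var wrapped = other as MyNullable<T>;
        if (!ReferenceEquals(wrapped, null)) other = wrapped.HasValue ? (object)wrapped.value : null;
        if (!this.HasValue) return other == null;
        if (other == null) return false;
        return this.value.Equals(other);
    }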

    public override int GetHashCode()
    {
        return this.HasValue ? this.value.GetHashCode() : 0;
    }

    public T GetValueOrDefault()
    {
        return this.value;
    }

    public T GetValueOrDefault(T defaultValue)
    {
        return this.HasValue ? this.value : defaultValue;
    }

    public override string ToString()
    {
        return this.HasValue ? this.value.ToString() : string.Empty;
    }
}

}

Choral answered 24/11, 2010 at 22:24 Comment(0)
