Why does the CLR allow mutating boxed immutable value types?

I have a situation where I have a simple, immutable value type:

public struct ImmutableStruct
{
    private readonly string _name;

    public ImmutableStruct( string name )
    {
        _name = name;
    }

    public string Name
    {
        get { return _name; }
    }
}

When I box an instance of this value type, I would normally expect that whatever I boxed would come out unchanged when I unbox it. To my big surprise, this is not the case. Using Reflection, someone may easily modify my box's memory by reinitializing the data contained therein:

using System;
using System.Linq; // needed for Single()

class Program
{
    static void Main( string[] args )
    {
        object a = new ImmutableStruct( Guid.NewGuid().ToString() );

        PrintBox( a );
        MutateTheBox( a );
        PrintBox( a );
    }

    private static void PrintBox( object a )
    {
        Console.WriteLine( String.Format( "Whats in the box: {0} :: {1}", ((ImmutableStruct)a).Name, a.GetType() ) );
    }

    private static void MutateTheBox( object a )
    {
        var ctor = typeof( ImmutableStruct ).GetConstructors().Single();
        ctor.Invoke( a, new object[] { Guid.NewGuid().ToString() } );
    }
}

Sample output:

Whats in the box: 013b50a4-451e-4ae8-b0ba-73bdcb0dd612 :: ConsoleApplication1.ImmutableStruct
Whats in the box: 176380e4-d8d8-4b8e-a85e-c29d7f09acd0 :: ConsoleApplication1.ImmutableStruct

(There's actually a small hint in the MSDN that indicates this is the intended behavior)

Why does the CLR allow mutating boxed (immutable) value types in this subtle way? I know that readonly is no guarantee, and I know that using "traditional" reflection a value instance can be easily mutated. This behavior becomes an issue when the reference to the box is copied around and mutations show up in unexpected places.

One thing I have thought about is that this makes it possible to use Reflection on value types at all, since the System.Reflection API works with object only. But Reflection breaks down with Nullable<> value types (they get boxed to null if they do not have a Value). What's the story here?
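
The Nullable<> remark can be checked with a short sketch. The names NullableBoxingDemo and BoxedTypeName are made up for illustration and are not part of the question's code; the sketch only assumes the standard boxing rules for Nullable<T>.

```csharp
using System;

static class NullableBoxingDemo
{
    // Returns the runtime type name of the box, or "null" when boxing produced no box at all.
    public static string BoxedTypeName(int? value)
    {
        object box = value; // boxing a Nullable<int>
        return box == null ? "null" : box.GetType().FullName;
    }

    static void Main()
    {
        Console.WriteLine(BoxedTypeName(null)); // prints "null": an empty Nullable<int> boxes to a null reference
        Console.WriteLine(BoxedTypeName(42));   // prints "System.Int32": the box holds the underlying int
    }
}
```

In neither case does a boxed Nullable<int> exist, so a reflection call that takes object never gets to see the nullable wrapper itself.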

Pentstemon answered 22/8, 2011 at 16:51 Comment(3)
Never seen ConstructorInfo invoked like that. - Obtect
You are invoking ConstructorInfo.Invoke(object, object[]), re-running the constructor. No readonly-ness is violated by that. The cure is the one my doctor usually recommends: if it hurts then don't do it. - Obtrude
@Hans: Sure, I know all that (and that readonly is not a CLR-enforced guarantee, though this is implementation-specific and a CLI-conformant implementation is not prohibited from enforcing readonly). And of course I would never use this in production code. Re-running the constructor is a very strange corner case with value types, and certainly something that is not possible with reference types. - Pentstemon

Boxes aren't immutable as far as the CLR is concerned. Indeed, in C++/CLI I believe there's a way of mutating them directly.

However, in C# the unboxing operation always takes a copy - it's the C# language which prevents you from mutating the box, not the CLR. The IL unbox instruction merely provides a typed pointer into the box. From section 4.32 of partition III of ECMA-335 (the unbox instruction):

The unbox instruction converts obj (of type O), the boxed representation of a value type, to valueTypePtr (a controlled-mutability managed pointer (§1.8.1.2.2), type &), its unboxed form. valuetype is a metadata token (a typeref, typedef or typespec). The type of valuetype contained within obj must be verifier-assignable-to valuetype.

Unlike box, which is required to make a copy of a value type for use in the object, unbox is not required to copy the value type from the object. Typically it simply computes the address of the value type that is already present inside of the boxed object.
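
As a hedged illustration of what "computes the address" means in practice: on runtimes where System.Runtime.CompilerServices.Unsafe.Unbox<T> is available (recent .NET versions, or the System.Runtime.CompilerServices.Unsafe package; this is an assumption and the API postdates this answer), the pointer produced by unbox can be reached from C# as a ref. The names UnboxInPlaceDemo and Overwrite are hypothetical.

```csharp
using System;
using System.Runtime.CompilerServices;

static class UnboxInPlaceDemo
{
    // Overwrites the int stored inside an existing box without allocating a new one.
    public static void Overwrite(object boxedInt, int newValue)
    {
        ref int slot = ref Unsafe.Unbox<int>(boxedInt); // managed pointer into the box
        slot = newValue;
    }

    static void Main()
    {
        object a = 1;
        object alias = a;         // second reference to the same box
        Overwrite(a, 2);
        Console.WriteLine(alias); // prints 2: the alias observes the write
    }
}
```

No copy is made here, so every reference to the box sees the new value.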

The C# compiler always generates IL which results in unbox being followed by a copying operation, or unbox.any which is equivalent to unbox followed by ldobj. The generated IL isn't part of the C# spec of course, but this is (section 4.3 of the C# 4 spec):

An unboxing operation to a non-nullable-value-type consists of first checking that the object instance is a boxed value of the given non-nullable-value-type, and then copying the value out of the instance.

Unboxing to a nullable-type produces the null value of the nullable-type if the source operand is null, or the wrapped result of unboxing the object instance to the underlying type of the nullable-type otherwise.
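
A minimal sketch of the copying behavior the spec describes. Counter is a hypothetical mutable struct introduced only for this demonstration; it does not appear in the question.

```csharp
using System;

// Hypothetical mutable struct used only to make the copy visible.
struct Counter
{
    public int Value;
    public void Increment() { Value++; }
}

static class UnboxCopyDemo
{
    // Unboxes, mutates the unboxed copy, and returns what the box still holds.
    public static int MutateCopy(object box)
    {
        Counter copy = (Counter)box; // the cast copies the value out of the box
        copy.Increment();            // changes only the local copy
        return ((Counter)box).Value; // the box itself is untouched
    }

    static void Main()
    {
        object box = new Counter { Value = 1 };
        Console.WriteLine(MutateCopy(box)); // prints 1
    }
}
```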

In this case, you're using reflection and therefore bypassing the protection offered by C#. (It's a particularly odd use of reflection too, I must say... calling a constructor "on" a target instance is very strange - I don't think I've ever seen that before.)

Hussy answered 22/8, 2011 at 16:56 Comment(2)
Neither have I; this is what keeps me wondering. That C# guarantees to always take a copy is nice, but this is only achieved by using unbox.any (which then creates a copy regardless of whether the type is nullable or not!). By contrast, ECMA says about unbox: "typically, unbox simply computes the address of the value type that is already present inside of the boxed object." - Pentstemon
AFAIK, unbox does not need to copy since it outputs an address, which is usually then consumed by ldobj (4.13 Partition III): "The ldobj instruction copies a value to the evaluation stack". Regardless, I think only csc v1 used unbox; csc v2 and later seem to emit unbox.any for everything. Thanks for your thorough answer! - Pentstemon

Just to add.

In IL, you can mutate a boxed value if you use some 'unsafe' (read unverifiable) code.

The C# equivalent is something like:

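// Sketch only: C# will not convert an object reference to a pointer directly,
// so this version pins the box with GCHandle (System.Runtime.InteropServices).
unsafe void Foo(object o)
{
  GCHandle h = GCHandle.Alloc(o, GCHandleType.Pinned);
  try { *(int*)h.AddrOfPinnedObject().ToPointer() = 2; } // write into the box's payload
  finally { h.Free(); }
}

object a = 1;
Foo(a);
// now a is 2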
Obtect answered 22/8, 2011 at 17:9 Comment(0)

Value-type instances should be considered immutable only in the following cases:

  1. There does not exist any means of creating an instance of a structure which is in any way distinguishable from a default instance. For example, a structure with no fields could be reasonably considered immutable, since there would be nothing to mutate.
  2. The storage location holding the instance is privately held by something that will never mutate it.

Although the first scenario would be a property of a type rather than an instance, the notion of "mutability" is rather irrelevant for stateless types. That's not to imply such types are useless(*), but rather that the notion of mutability is irrelevant for them. Otherwise, struct types which hold any state are mutable, even if they pretend to be otherwise. Note that, ironically, if one didn't try to make a struct "immutable" but simply exposed its fields (and possibly used a factory method rather than a constructor to set its value), mutating a struct instance via its "constructor" wouldn't work.

(*) A struct type with no fields may implement an interface and satisfy a new() constraint. It's not possible to use static methods of a passed-in generic type, but one can define a trivial structure which implements an interface and pass the type of the structure to code which can create a new dummy instance and use its methods. One could, for example, define a type FormattableInteger<T> where T:IFormatableIntegerFormatter,new() whose ToString() method would perform T newT = new T(); return newT.Format(value); Using such an approach, if one had an array of 20,000 FormattableInteger<HexIntegerFormatter>, the default method for formatting the integers would be stored once as part of the type, rather than being stored 20,000 times, once for each instance.
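
The footnote's idea might look roughly like the following sketch. The type names come from the footnote; the member shapes (a single Format(int) method, a private _value field, hexadecimal via ToString("X")) and the FormatterDemo class are assumptions made for illustration.

```csharp
using System;

// Interface name as spelled in the footnote; its single member is an assumption.
interface IFormatableIntegerFormatter
{
    string Format(int value);
}

// A stateless struct: no fields, so every instance is indistinguishable from default.
struct HexIntegerFormatter : IFormatableIntegerFormatter
{
    public string Format(int value) { return value.ToString("X"); }
}

struct FormattableInteger<T> where T : IFormatableIntegerFormatter, new()
{
    private readonly int _value;

    public FormattableInteger(int value) { _value = value; }

    public override string ToString()
    {
        T formatter = new T(); // dummy instance; carries no per-value data
        return formatter.Format(_value);
    }
}

static class FormatterDemo
{
    static void Main()
    {
        var values = new FormattableInteger<HexIntegerFormatter>[3];
        for (int i = 0; i < values.Length; i++)
            values[i] = new FormattableInteger<HexIntegerFormatter>(250 + i * 5);

        foreach (var v in values)
            Console.WriteLine(v); // prints FA, FF, 104 on separate lines
    }
}
```

Each array element stores only its int; the choice of formatter lives in the type argument rather than in the instances.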

Oldenburg answered 26/2, 2012 at 20:45 Comment(0)