Why covariance and contravariance do not support value types
IEnumerable<T> is covariant, but it supports only reference types, not value types. The simple code below compiles successfully:

IEnumerable<string> strList = new List<string>();
IEnumerable<object> objList = strList;

But changing string to int produces a compile error:

IEnumerable<int> intList = new List<int>();
IEnumerable<object> objList = intList;

The reason is explained in MSDN:

Variance applies only to reference types; if you specify a value type for a variant type parameter, that type parameter is invariant for the resulting constructed type.

I have searched and found some questions mentioning that the reason is boxing between value types and reference types, but it is still not clear to me why boxing is the reason.

Could someone please give a simple, detailed explanation of why covariance and contravariance do not support value types, and how boxing affects this?

Scavenge answered 17/9, 2012 at 7:27 Comment(2)
see also Eric's answer to my similar question: #4096799Berglund
possible duplicate of cant-convert-value-type-array-to-params-objectIceskate

Basically, variance applies when the CLR can ensure that it doesn't need to make any representational change to the values. References all look the same - so you can use an IEnumerable<string> as an IEnumerable<object> without any change in representation; the native code itself doesn't need to know what you're doing with the values at all, so long as the infrastructure has guaranteed that it will definitely be valid.

For value types, that doesn't work - to treat an IEnumerable<int> as an IEnumerable<object>, the code using the sequence would have to know whether to perform a boxing conversion or not.
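A small sketch of the difference (using the standard LINQ Cast<T> operator to perform the boxing explicitly; the variable names are illustrative):

using System.Collections.Generic;
using System.Linq;

IEnumerable<string> strings = new List<string> { "a", "b" };
IEnumerable<object> objects1 = strings;             // OK: pure reference conversion,
                                                    // no change of representation

IEnumerable<int> ints = new List<int> { 1, 2 };
// IEnumerable<object> objects2 = ints;             // compile error: would need boxing
IEnumerable<object> objects2 = ints.Cast<object>(); // OK: boxes each int as it is enumerated

The Cast<object> call is not free the way the reference conversion is: it produces a new sequence that boxes each element on demand.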

You might want to read Eric Lippert's blog post on representation and identity for more on this topic in general.

EDIT: Having reread Eric's blog post myself, it's at least as much about identity as representation, although the two are linked. In particular:

This is why covariant and contravariant conversions of interface and delegate types require that all varying type arguments be of reference types. To ensure that a variant reference conversion is always identity-preserving, all of the conversions involving type arguments must also be identity-preserving. The easiest way to ensure that all the non-trivial conversions on type arguments are identity-preserving is to restrict them to be reference conversions.
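A small sketch of what "identity-preserving" means here: a reference conversion leaves you with the very same object, while boxing a value creates a new object each time it happens:

string s = "hello";
object o = s;                           // reference conversion: same object
System.Console.WriteLine(ReferenceEquals(s, o));   // True

int i = 42;
object b1 = i;                          // each boxing allocates a new heap object
object b2 = i;
System.Console.WriteLine(ReferenceEquals(b1, b2)); // False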

Chronometer answered 17/9, 2012 at 7:37 Comment(9)
Thank you for your answer; the link to Eric Lippert's post is very valuable. I am still a little confused by your statement that variance applies when the CLR can ensure it doesn't need to make any representational change to the values. Is this the first criterion for variance to apply?Scavenge
@CuongLe: Well it's an implementation detail in some senses, but it's the underlying reason for the restriction, I believe.Chronometer
I don't think this has anything to do with representation: int is not a subtype of object, so there is no covariant conversion to be made. In contrast, IEnumerable<Integer> would be a covariant return type.Shagreen
@AndréCaron: Eric's blog post is important here - it's not just representation, but also identity preservation. But representation preservation means the generated code doesn't need to care about this at all.Chronometer
Precisely, identity cannot be preserved because int is not a subtype of object. The fact that a representational change is required is just a consequence of this.Shagreen
It helps to think in terms of blittable layout, which is, I believe what the use of "representation" is referring to in this discussion. Since value types exhibit an arbitrarily-sized memory image (as opposed to IntPtr.Size for every reference type), their use as Type Arguments in generic code instantiation--which necessarily incorporates the size of the underlying entity--will thus result in instance-specific runtime code that can't sensibly be shared.Dayan
How is int not a subtype of object? Int32 inherits from System.ValueType, which inherits from System.Object.Excipient
@DavidKlempfner I think @AndréCaron's comment is poorly phrased. Any value type such as Int32 has two representational forms, "boxed" and "unboxed". The compiler has to insert code to convert from one form to the other, even though this is normally invisible at a source code level. In effect, only the "boxed" form is considered by the underlying system to be a subtype of object, but the compiler automatically deals with this whenever a value type is assigned to a compatible interface or to something of type object.Republicanism
FYI, the link for Eric Lippert's blog post doesn't work. I found (what I think) is the post at ericlippert.com/2009/03/03/representation-and-identityQuotient

It is perhaps easier to understand if you think about the underlying representation (even though this really is an implementation detail). Here is a collection of strings:

IEnumerable<string> strings = new[] { "A", "B", "C" };

You can think of the strings as having the following representation:

[0] : string reference -> "A"
[1] : string reference -> "B"
[2] : string reference -> "C"

It is a collection of three elements, each being a reference to a string. You can cast this to a collection of objects:

IEnumerable<object> objects = (IEnumerable<object>) strings;

Basically it is the same representation except now the references are object references:

[0] : object reference -> "A"
[1] : object reference -> "B"
[2] : object reference -> "C"

The representation is the same. The references are just treated differently; you can no longer access the string.Length property but you can still call object.GetHashCode(). Compare this to a collection of ints:

IEnumerable<int> ints = new[] { 1, 2, 3 };
[0] : int = 1
[1] : int = 2
[2] : int = 3

To convert this to an IEnumerable<object> the data has to be converted by boxing the ints:

[0] : object reference -> 1
[1] : object reference -> 2
[2] : object reference -> 3

This conversion requires more than a cast.
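To spell that out in code, the value-type version needs per-element work rather than a single representation-preserving cast; an explicit Select makes the boxing visible (a sketch, with illustrative variable names):

using System.Collections.Generic;
using System.Linq;

IEnumerable<int> ints = new[] { 1, 2, 3 };

// Not a cast: each element is boxed into a new heap object as it is enumerated.
IEnumerable<object> objects = ints.Select(i => (object)i);

foreach (object o in objects)
    System.Console.WriteLine(o.GetType()); // System.Int32 (the boxed form)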

Romeo answered 17/9, 2012 at 7:51 Comment(1)
Boxing is not just an "implementation detail". Boxed value types are stored the same way as class objects, and behave, as far as the outside world can tell, like class objects. The only difference is that within the definition of a boxed value type, this refers to a struct whose fields overlay those of the heap object that stores it, rather than referring to the object which holds them. There is no clean way for a boxed value type instance to get a reference to the enclosing heap object.Cymoid

I think everything starts from the definition of the LSP (Liskov Substitution Principle), which states:

if q(x) is a property provable about objects x of type T then q(y) should be true for objects y of type S where S is a subtype of T.

But a value type such as int cannot be substituted for object in C#. The proof is very simple:

int myInt = new int();
object obj1 = myInt;
object obj2 = myInt;
return ReferenceEquals(obj1, obj2);

This returns false even though we assign the same "reference" to both objects.

Male answered 17/9, 2012 at 7:40 Comment(7)
I think you're using the right principle but there's no proof to be made: int is not a subtype of object so the principle does not apply. Your "proof" relies on an intermediate representation Integer, which is a subtype of object and for which the language has an implicit conversion (object obj1=myInt; is actually expanded to object obj1=new Integer(myInt);).Shagreen
The language takes care of correct casting between types, but int's behaviour does not correspond to what we would expect from a subtype of object.Male
My whole point is precisely that int is not a subtype of object. Moreover, LSP does not apply because myInt, obj1 and obj2 refer to three different objects: one int and two (hidden) Integers.Shagreen
@André: C# is not Java. C#'s int keyword is an alias for the BCL's System.Int32, which is in fact a subtype of object (an alias of System.Object). In fact, int's base class is System.ValueType, whose base class is System.Object. Try evaluating the following expression and see: typeof(int).BaseType.BaseType. The reason ReferenceEquals returns false here is that the int is boxed into two separate boxes, and each box's identity is different from any other box's. Thus two boxing operations always yield two objects that are never identical, regardless of the boxed value.Elroy
@AllonGuralnek: Each value type (e.g. System.Int32 or List<String>.Enumerator) actually represents two kinds of things: a storage-location type, and a heap-object type (sometimes called a "boxed value type"). Storage locations whose types derive from System.ValueType will hold the former; heap objects whose types do likewise will hold the latter. In most languages, a widening cast exists from the former the latter, and a narrowing cast from the latter to the former. Note that while boxed value types have the same type descriptor as value-type storage locations, ...Cymoid
...they behave semantically more like mutable reference types (even supposedly "immutable" value types are mutable when boxed). For example, copying one variable of type List<String>.Enumerator to another will copy its state; casting it to IEnumerator<String> will convert it to its boxed equivalent. Copying that to another variable of type IEnumerator<String>, however, will store a reference to the original boxed object rather than copying its state.Cymoid
@Male By that logic, every class in C# violates LSP because GetType() always returns a value different from that of a base class. The "property" from the LSP definition is really an implied expected behavior in a given context, which is a higher level of abstraction than a mere return value. In addition, ReferenceEquals() is not a property or behavior of integers; therefore your proof is invalid.Ricketts

It does come down to an implementation detail: Value types are implemented differently to reference types.

If you force value types to be treated as reference types (i.e. box them, e.g. by referring to them via an interface) you can get variance.
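A sketch of that idea: once each int has been boxed behind a reference type such as the IComparable interface (which int implements), ordinary reference-type variance applies again:

using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> ints = new[] { 1, 2, 3 };

// Box each int behind the reference type IComparable...
IEnumerable<IComparable> comparables = ints.Cast<IComparable>();

// ...after which the usual covariance of IEnumerable<out T> works:
IEnumerable<object> objects = comparables;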

The easiest way to see the difference is to consider an array: an array of value types is laid out contiguously in memory (the values stored directly), whereas an array of reference types stores only the references (pointers) contiguously; the objects being pointed to are separately allocated.

The other (related) issue(*) is that (almost) all reference types have the same representation for variance purposes, and much code does not need to know the difference between types, so co- and contra-variance are possible (and easily implemented -- often just by omission of extra type checking).

(*) It may be seen to be the same issue...

Acetic answered 17/9, 2012 at 10:2 Comment(0)
