Indexers in List vs Array
How are the indexers defined in List and arrays?

Take List<MyStruct> lists = new List<MyStruct>(); where MyStruct is a struct. Now consider MyStruct[] arr = new MyStruct[10];

arr[0] gives a reference to the first Structure item. But lists[0] gives me a copy of it. Is there any reason why it is done like that? Also, since Int32 is a struct, given List<Int32> list1 = new List<Int32>(); how is it possible for me to access list1[0] or assign list1[0] = 5, whereas it is not possible to do lists[0]._x = 5?

Radack answered 15/7, 2011 at 10:19 Comment(3)
not possible to do lists[0]._x=5: I think you should post the code. At the very least your struct. – Bolometer
@Bolometer - it kind-of is, actually - but not in the object sense; more in the ref MyStruct sense. – Edlun
Seriously, can you see how much confusion a mutable struct has caused here? I restate: mutable structs are evil. – Edlun
C
13

Although they look the same, the array indexer and list indexer are doing completely separate things.

The List<T> indexer is declared as a property with a parameter:

public T this[int index] { get; set; }

This gets compiled to get_Item and set_Item methods, which are called like any other method whenever the indexer is accessed.

The array indexer has direct support within the CLR; there is a specific IL instruction ldelema (load element address) for getting a managed pointer to the n'th element of an array. This pointer can then be used by any of the other IL instructions that take a pointer to directly alter the thing at that address.

For example, the stfld (store field value) instruction can take a managed pointer specifying the 'this' instance to store the field in, or you can use the pointer to call methods directly on the thing in the array.

In C# parlance, the array indexer returns a variable, but the list indexer returns a value.
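A minimal sketch of that difference, reusing the MyStruct, arr and lists names from the question (the struct body with a public _x field is an assumption, since the question never shows it):

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the question's struct: a single public mutable field.
struct MyStruct
{
    public int _x;
}

public static class Program
{
    public static void Main()
    {
        var arr = new MyStruct[1];
        var lists = new List<MyStruct> { new MyStruct() };

        // The array element is a variable, so this writes straight into the array.
        arr[0]._x = 5;

        // lists[0]._x = 5;  // does not compile (CS1612): the indexer returns a value

        // The list indexer hands back a copy; changing the copy leaves the list untouched.
        MyStruct copy = lists[0];
        copy._x = 5;

        Console.WriteLine(arr[0]._x);    // prints 5
        Console.WriteLine(lists[0]._x);  // prints 0
    }
}
```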

Confessional answered 15/7, 2011 at 10:26 Comment(1)
Where can I find the implementation of the array indexer? I am looking at the referencesource.microsoft.com/#mscorlib/system/… but can't find it. – Capture
E
7

The final point:

lists[0]._x=5

is actually just a restatement of your earlier point:

arr[0] gives a reference to the first Structure item. But lists[0] gives me a copy of it.

If you edited a copy of it, the change would be lost into the ether, i.e.

var throwawayCopy = lists[0];
throwawayCopy._x = 5;
// (and never refer to throwawayCopy again, ever)

Since that is almost certainly not what you intended, the compiler doesn't let you. However, mutable structs are evil. A better option here would be to not use mutable structs at all. They bite.
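If you are stuck with a mutable struct in a list, the usual way out is to read the element, change the copy, and store it back through the setter. A small sketch, again assuming a MyStruct with a public _x field as in the question:

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the question's struct.
struct MyStruct
{
    public int _x;
}

public static class Program
{
    public static void Main()
    {
        var lists = new List<MyStruct> { new MyStruct() };

        MyStruct item = lists[0];  // get_Item: returns a copy
        item._x = 5;               // modify the local copy
        lists[0] = item;           // set_Item: overwrite the stored element

        Console.WriteLine(lists[0]._x);  // prints 5
    }
}
```

This is the same shape as list1[0] = 5 for a List<Int32>: the whole element is replaced via set_Item rather than edited in place.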


Taking this down a level, to a simple but concrete example:

using System;
struct Evil
{
    public int Yeuch;
}
public class Program
{
    public static void Main()
    {
        var pain = new Evil[] { new Evil { Yeuch = 2 } };
        pain[0].Yeuch = 27;
        Console.WriteLine(pain[0].Yeuch);
    }
}

This compiles (looking at the last 2 lines here) as:

L_0026: ldloc.0 <== pain
L_0027: ldc.i4.0 <== 0
L_0028: ldelema Evil <== get the MANAGED POINTER to the 0th element
                           (not the 0th element as a value)
L_002d: ldc.i4.s 0x1b <== 27
L_002f: stfld int32 Evil::Yeuch <== store field

L_0034: ldloc.0 <== pain
L_0035: ldc.i4.0 <== 0
L_0036: ldelema Evil <== get the MANAGED POINTER to the 0th element
                           (not the 0th element as a value)
L_003b: ldfld int32 Evil::Yeuch <== read field
L_0040: call void [mscorlib]System.Console::WriteLine(int32) <== writeline
L_0045: ret 

It never actually talks to the struct as a value: no copies, etc.

Edlun answered 15/7, 2011 at 10:24 Comment(10)
+1 Finally see your point: list[0]=5 calls List<int>::set_Item(0, 5), whereas list[0]._x=5 calls List<int>::get_Item(0) and works with the copy of the element. (I had been ignoring the mention of ._x because it wasn't declared; I was focusing on arr[] versus lst[] only.) – Bolometer
It doesn't work with a copy at all. It works with a copy of the reference, sure, but you're getting the same object back. – Bernardina
@Bolometer exactly; and with the array, it is the ._x=5 that operates on the struct reference, thus updating the item inside the array. But assigning it to a variable would break this as it dereferences/copies. – Edlun
@Bernardina with a list it is a copy of the object; with an array it is the reference (pointing inside the array), unless you dereference it by assigning to a variable. – Edlun
It's a copy of the reference value, not a copy of the object itself. If the 'MyStruct' definition was a class, it's perfectly fine and legal to call someList[0].x = 5;. In that case you have two different pointers pointing to the same object. At no point does it copy the object itself. – Bernardina
@Marc: I wish I could personally upvote this more. Especially the fact that arr[0]._x=5 will update a value-type element is quite enlightening. Never knew that before. – Bolometer
@Bernardina except in that example, value isn't a reference - I'm talking about the difference between int and ref int (unfortunately C# lacks a syntax for local variables which are pointers to value-types), which is pretty much what ldelema does. I'll update my answer to illustrate (/cc @sehe) – Edlun
@Marc.. Thanks for that wonderful explanation with IL examples. – Radack
@Marc.. any reason why they are not calling 'ldelema' when accessing with 'list[0]'? – Radack
@Ashley because ldelema operates on arrays only; obtaining via an indexer (in the list) requires de-referencing to a value rather than a managed pointer. The CLI supports returning managed pointers internally, but very few languages allow you to return managed pointers. – Edlun
D
2

List<T> has a normal indexer which behaves like a property. The access goes through accessor functions, and those are by-value.

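// Simplified: arr stands for the list's internal backing array.
T this[int index]
{
    get { return arr[index]; }   // returns a copy of the element
    set { arr[index] = value; }  // overwrites the whole element
}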

Arrays are special types, and their indexer is field-like. Both the runtime and the C# compiler have special knowledge of arrays, and that enables this behavior. You can't have the array-like behavior on custom types.

Fortunately this is rarely a problem in practice, since you only use mutable structs in rare special cases (high performance or native interop), and in those you usually prefer arrays anyway due to their low overhead.


You get the same behavior with properties vs. fields. You get a kind of reference when using a field, but a copy when you use a property. Thus you can write to members of value-type fields, but not members of value-type properties.
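A short sketch of the field-versus-property case (Point and Holder are made-up names for illustration, not types from the question):

```csharp
using System;

// Hypothetical mutable value type.
struct Point
{
    public int X;
}

// Hypothetical holder exposing the same value type as a field and as a property.
class Holder
{
    public Point Field;                  // a variable: members can be written in place
    public Point Property { get; set; }  // accessed via get/set methods: the getter returns a copy
}

public static class Program
{
    public static void Main()
    {
        var h = new Holder();

        h.Field.X = 5;        // fine: writes into the field itself
        // h.Property.X = 5;  // does not compile (CS1612): the getter returns a value

        // With a property you have to replace the whole value instead.
        Point p = h.Property;
        p.X = 5;
        h.Property = p;

        Console.WriteLine(h.Field.X);     // prints 5
        Console.WriteLine(h.Property.X);  // prints 5
    }
}
```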

Dowling answered 15/7, 2011 at 10:21 Comment(0)
V
2

I ran into this as well, when I was inspecting lambda expression types. When the lambda is compiled into an expression tree you can inspect the expression type for each node. It turns out that there is a special node type ArrayIndex for the Array indexer:

Expression<Func<string>> expression = () => new string[] { "Test" }[0];
Assert.Equal(ExpressionType.ArrayIndex, expression.Body.NodeType);

Whereas the List indexer is of type Call:

Expression<Func<string>> expression = () => new List<string>() { "Test" }[0];
Assert.Equal(ExpressionType.Call, expression.Body.NodeType);

This is just to illustrate that we can reason about the underlying architecture with lambda expressions.

Vikki answered 28/1, 2012 at 14:6 Comment(0)
B
0

Your problem isn't with List<>, it's with structs themselves.

Take this for example:

public struct MyStruct
{
    public int x;
    public int y;
}

void Main()
{
    var someStruct = new MyStruct { x = 5, y = 5 };

    someStruct.x = 3;
}

Here, you're not modifying the value of x of the original struct; you're creating a new object with y = y and x = 3. The reason you can't directly modify this through a list is that the list indexer is a function (as opposed to the array indexer), and it doesn't know how to 'set' the new struct in the list.

Modify the keyword struct to class and you'll see it works just fine (with classes you're not creating a brand new object every time you mutate it).

Bernardina answered 15/7, 2011 at 11:33 Comment(0)
S
0

One unfortunate limitation of .NET languages is that they don't have any concept of a property doing anything other than returning a value, which can then be used however the caller sees fit. It would be very helpful (and if I had a means of petitioning for language features, I'd seek this) if there were a standard compiler-supported means of exposing properties as delegate callers, such that a statement like:

  MyListOfPoints[4].X = 5;

could be translated by the compiler into something like:

  MyListOfPoints.ActOnItem(4, (ref Point it) => it.X = 5);

Such code could be relatively efficient, and not create any GC pressure, if ActOnItem took an extra ref parameter of generic type, and passed it to a delegate which also took a parameter of that type. Doing that would allow the called function to be static, eliminating the need to create closures or delegates for each execution of the enclosing function. If there were a way for ActOnItem to accept a variable number of generic 'ref' parameters, it would even be possible to handle constructs like:

  SwapItems(ref MyListOfPoints[4].X, ref MyListOfPoints[4].Y);

with arbitrary combinations of 'ref' parameters, but even just having the ability to handle the cases where the property was "involved in" the left of an assignment, or a function was called with a single property-ish 'ref' parameter, would be helpful.
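Without any compiler support, the single-parameter form of this pattern can already be written by hand. The following is only a rough sketch of the idea: RefList<T>, ActByRef<T, TParam> and the Point struct are hypothetical names invented here, and the collection is a bare-bones stand-in rather than a replacement for List<T>.

```csharp
using System;

// Hypothetical delegate: receives the stored item and one extra argument, both by reference.
delegate void ActByRef<T, TParam>(ref T item, ref TParam param);

// Hypothetical mutable value type.
struct Point
{
    public int X;
    public int Y;
}

// Hypothetical minimal list that exposes its elements to a callback by reference.
class RefList<T>
{
    private T[] _items = new T[4];
    private int _count;

    public void Add(T item)
    {
        if (_count == _items.Length) Array.Resize(ref _items, _count * 2);
        _items[_count++] = item;
    }

    // Ordinary by-value read access.
    public T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)_count) throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    // Passes the element itself (not a copy) to the callback, plus one extra ref argument.
    public void ActOnItem<TParam>(int index, ActByRef<T, TParam> action, ref TParam param)
    {
        if ((uint)index >= (uint)_count) throw new ArgumentOutOfRangeException(nameof(index));
        action(ref _items[index], ref param);
        // The collection regains control here and could react to the change.
    }
}

public static class Program
{
    public static void Main()
    {
        var points = new RefList<Point>();
        points.Add(new Point { X = 1, Y = 2 });

        int newX = 5;
        // The lambda captures nothing; the value travels through the extra ref parameter.
        points.ActOnItem(0, (ref Point it, ref int x) => it.X = x, ref newX);

        Console.WriteLine(points[0].X);  // prints 5
    }
}
```

Because the lambda does not capture any locals, it needs no closure object, which is the point of threading the value through the extra ref parameter.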

Note that being able to do things this way would offer an extra benefit beyond the ability to access fields of structs. It would also mean that the object exposing the property would receive notification that the consumer was done with it (since the consumer's delegate would return). Imagine, for example, that one has a control that shows a list of items, each with a string and a color, and one wants to be able to do something like:

  MyControl.Items(5).Color = Color.Red;

An easy statement to read, and the most natural-reading way to change the color of the fifth list item, but trying to make such a statement work would require that the object returned by Items(5) have a link to MyControl, and send it some sort of notification when it changed. Rather complicated. By contrast, if the style of call-through indicated above were supported, such a thing would be much simpler. The ActOnItem(index, proc, param) method would know that once proc had returned, it would have to redraw the item specified by index. Of some importance, if Items(5) were a call-through proc and didn't support any direct read method, one could avoid scenarios like:

  var evil = MyControl.Items(5);
  MyControl.Items.DeleteAt(0);
  // Should the following statement affect the item that used to be fifth,
  // or the one that's fifth now, or should it throw an exception?  How
  // should such semantics be ensured?
  evil.Color = Color.Purple;

The value of MyControl.Items(5) would remain bound to MyControl only for the duration of the call-through involving it. After that, it would simply be a detached value.

Strickman answered 29/1, 2012 at 19:6 Comment(0)
