Why do C# Arrays use a reference type for Enumeration, but List<T> uses a mutable struct?
Asked Answered
M

2

24

From what I've read, a design decision was made for certain Collections's Enumerator Types to be mutable structs instead of reference types for performance reasons. List.Enumerator is the most well known.

I was investigating some old code that used arrays, and was surprised to discover that C# Arrays return the type SZGenericArrayEnumerator as their generic enumerator type, which is a reference type.

I am wondering if anyone knows why Array's generic iterator was implemented as a reference type when so many other performance critical collections used mutable structs instead.

Matland answered 7/2, 2012 at 21:7 Comment(0)
A
38

From what I've read, a design decision was made for certain Collections's Enumerator Types to be mutable structs instead of reference types for performance reasons.

Good question.

First off, you are correct. Though in general, mutable value types are a bad code smell, in this case they are justified:

  • The mutation is almost entirely concealed from the user.
  • It is highly unlikely that anyone is going to use the enumerator in a confusing manner.
  • The use of a mutable value type actually does solve a realistic performance problem in an extremely common scenario.

I am wondering if anyone knows why Array's generic iterator was implemented as a reference type when so many other performance critical collections used mutable structs instead.

Because if you're the sort of person who is concerned about the performance of enumerating an array then why are you using an enumerator in the first place? It's an array for heaven's sake; just write a for loop that iterates over its indicies like a normal person and never allocate the enumerator. (Or a foreach loop; the C# compiler will rewrite the foreach loop into the equivalent for loop if it knows that the loop collection is an array.)

The only reason why you'd obtain an enumerator from an array in the first place is if you are passing it to a method that takes an IEnumerator<T>, in which case if the enumerator is a struct then you're going to be boxing it anyway. Why take on the expense of making the value type and then boxing it? Just make it a reference type to begin with.

Aeromechanics answered 7/2, 2012 at 21:15 Comment(10)
Given the prevalence (and expressive power) of LINQ, enumerators are likely being retrieved for arrays quite often. For instance, the reflection classes in .NET return arrays for many properties/methods - and using LINQ w/ reflection is quite handy. To my knowledge, most LINQ operators do not perform special case checks to determine if the IEnumerable<> they are dealing with is actually an array.Melbourne
@LBushkin: Correct. If you are unwilling to take on the cost of heap allocating and then garbage collecting an enumerator then you are also presumably unwilling to take on the cost of heap allocating and garbage collecting the query, and all of the enumerator objects it creates, and all the delegates you have to create. Not to mention all the overhead of calling that query (checking arguments for being correct, and so on). LINQ to objects is reasonably performant but it was certainly not designed to minimize heap allocations! LINQ allocates heap memory all over the place.Aeromechanics
Another factor, I suspect, is that the enumerator for the original List was designed in the days before generics. The struct-ness of List.Enumerator only affects three types of code: code which explicitly uses type List.Enumerator, code which accepts a generic parameter constrained to IEnumerable, or code which uses var to declare a variable which will hold a list's enumerator. If a programmer is going to specify List.Enumerator, the programmer should know what he's doing. As for the other two types of code, they couldn't exist when List.Enumerator was designed.Earthbound
Although List<T>.Enumerator was clearly developed after the invention of generics, the Principle of Least Astonishment would favor its working in the same way as List.Enumerator, even if the development of generics would mean that some corner-case behaviors might sometimes seem stranger than they otherwise might.Earthbound
@supercat: You are correct, and more generally, the whole business of foreach being pattern based rather than calling IEnumerable methods only was primarily to avoid the boxing penalty in pre-generics strongly typed collections. In the counterfactual world where we started in v1 with generics, there is far less impetus to invent value-typed iterators.Aeromechanics
@EricLippert: Even with generics, using a value-typed iterator would avoid the creation of a heap object every time the 'foreach' loop is entered. In some ways, though, it might have been nicer if the compiler could look for the existence of a foreach_GetEnumerator method; a public GetEnumerator method could return an enumerator class, while foreach_GetEnumerator returned a struct. That would seem to give the best of both worlds.Earthbound
@EricLippert: "Or a foreach loop; the C# compiler will rewrite the foreach loop into the equivalent for loop if it knows that the loop collection is an array", I never knew that!? Is that documented anywhere (not that I don't believe it), I would be interested to read about itWendolyn
For anyone else reading this, this answer covers the optimisations the compiler does https://mcmap.net/q/65913/-performance-difference-for-control-structures-39-for-39-and-39-foreach-39-in-c (Decided not to be lazy and try and search for it myself ;-)Wendolyn
@supercat, what is this non-generic List.Enumerator you mention? Do you mean ArrayList.ArrayListEnumerator? If so, that's a class, not a struct.Shrive
@phoog: Mea culpa. If ArrayListEnumerator was a class, that makes the decision to have List<T>.Enumerator be a struct a little more interesting. I thought there was some class that used an enumerator struct prior to generics, but maybe not. In any case, it's only in very rare corner cases that the struct-ness of List<T>.Enumerator causes any problem; in a very common case, it is advantageous.Earthbound
C
3

Arrays get some special treatment in the C# compiler. When you use foreach on them, the compiler translates it into a for loop. So there is no performance benefit in using struct enumerators.

List<T> on the other hand is a plain class without any special treatment, so using a struct results in better performance.

Charin answered 7/2, 2012 at 21:14 Comment(4)
isn't all foreach compiled into for loops?Junno
@OskarKjellin no, normal foreach loops become something like while(enumerator.MoveNext()){var element=enumerator.Current;...} whereas foreach loops over arrays index into the array(for(int i=0;i<arr.Length;i++).Charin
@OskarKjellin: All foreach and for loops are ultimately compiled into while loops. CodeInChaos's point is that foreach loops on arrays are actually first compiled into the traditional for (int i = 0; i < array.Length; ++i) style loop, not the while(enumtor.MoveNext()) style loop that most foreach loops are compiled into.Aeromechanics
@EricLippert Now that you say it's quite obvious given the IEnumerator interface and it's methods. Wasn't really thinking :(Junno

© 2022 - 2024 — McMap. All rights reserved.