What is the most efficient loop in c#
Asked Answered
T

3

40

There are a number of different way to accomplish the same simple loop though the items of an object in c#.

This has made me wonder if there is any reason be it performance or ease of use, as to use on over the other. Or is it just down to personal preference.

Take a simple object

var myList = List<MyObject>; 

Lets assume the object is filled and we want to iterate over the items.

Method 1.

foreach(var item in myList) 
{
   //Do stuff
}

Method 2

myList.Foreach(ml => 
{
   //Do stuff
});

Method 3

while (myList.MoveNext()) 
{
  //Do stuff
}

Method 4

for (int i = 0; i < myList.Count; i++)
{
  //Do stuff   
}

What I was wondering is do each of these compiled down to the same thing? is there a clear performance advantage for using one over the others?

or is this just down to personal preference when coding?

Have I missed any?

Tonguing answered 6/3, 2013 at 12:22 Comment(16)
You could always do your own benchmark, if you want to know how they differ in performance. System.Diagnostics.StopWatch is your friendAry
see this : forums.asp.net/t/1041090.aspxAucoin
You missed the do .. while(). Besides that, why not just compile them and look at the results in ILDASM?Treviso
Did you benchmark performance? What did your results dictate? Largely, any of your own logic is going to be the bottleneck, how you iterate largely negligible, and sometimes dictated by the situation. Bit of a non-issue, purely academical question.Kacykaczer
I was wondering if anyone already knew, before I go off and reinvent the wheel..Tonguing
@Tonguing Some things you should redo.Kacykaczer
I guess Parallel.ForEach() would be faster.Mattingly
Use what is most readable (for you) and what you need. So if you want to change the implementation later change the List<MyObject> to IEnumerable<MyObject> and use foreach or if you don't need indices anyway. I must admit that i have never used List.ForEach in the last 10 years for more than answers on SO.Posit
Interesting question, after reading Sam1 link, it shows that for and while produce the same MSIL so it's basically the same. For each it's slower because it uses the IEnumerable interface and also some casting is needed since the types are know.Babblement
@Treviso Yes Do { //do stuff } while(true) is missing, but this is functionaliy different to the other loops as this will always do at least one iteration of your //Do stuff regardless of the condition.Cryptonymous
[foreach vs Linq] (geekswithblogs.net/BlackRabbitCoder/archive/2010/04/23/…)Absorptance
The question is closely related https://mcmap.net/q/408280/-when-to-use-which-forYesterday
Note this is not a duplicate of https://mcmap.net/q/408280/-when-to-use-which-for/50776 that @Yesterday pointed out; the scope of that one includes Parallel extensions, which expands the scope of the question (and modifies the answers) greatly.Losse
@casperOne, its a good point, the choice of parallel extensions or PLINQ could have a significant effect on performance. However, the two questions could be merged into one better resource.Yesterday
@Yesterday Not really, because the answers from here would be lacking; they'd not provide any details about PLINQ. It's a bad candidate for merging.Losse
You missed: for (int i = myList.Count-1; i >= 0; i--) which can be faster in some circumstances. (Bounds check only needs to happen once, and simpler branch instructions).Afb
L
70

The answer the majority of the time is it does not matter. The number of items in the loop (even what one might consider a "large" number of items, say in the thousands) isn't going to have an impact on the code.

Of course, if you identify this as a bottleneck in your situation, by all means, address it, but you have to identify the bottleneck first.

That said, there are a number of things to take into consideration with each approach, which I'll outline here.

Let's define a few things first:

  • All of the tests were run on .NET 4.0 on a 32-bit processor.
  • TimeSpan.TicksPerSecond on my machine = 10,000,000
  • All tests were performed in separate unit test sessions, not in the same one (so as not to possibly interfere with garbage collections, etc.)

Here's some helpers that are needed for each test:

The MyObject class:

public class MyObject
{
    public int IntValue { get; set; }
    public double DoubleValue { get; set; }
}

A method to create a List<T> of any length of MyClass instances:

public static List<MyObject> CreateList(int items)
{
    // Validate parmaeters.
    if (items < 0) 
        throw new ArgumentOutOfRangeException("items", items, 
            "The items parameter must be a non-negative value.");

    // Return the items in a list.
    return Enumerable.Range(0, items).
        Select(i => new MyObject { IntValue = i, DoubleValue = i }).
        ToList();
}

An action to perform for each item in the list (needed because Method 2 uses a delegate, and a call needs to be made to something to measure impact):

public static void MyObjectAction(MyObject obj, TextWriter writer)
{
    // Validate parameters.
    Debug.Assert(obj != null);
    Debug.Assert(writer != null);

    // Write.
    writer.WriteLine("MyObject.IntValue: {0}, MyObject.DoubleValue: {1}", 
        obj.IntValue, obj.DoubleValue);
}

A method to create a TextWriter which writes to a null Stream (basically a data sink):

public static TextWriter CreateNullTextWriter()
{
    // Create a stream writer off a null stream.
    return new StreamWriter(Stream.Null);
}

And let's fix the number of items at one million (1,000,000, which should be sufficiently high to enforce that generally, these all have about the same performance impact):

// The number of items to test.
public const int ItemsToTest = 1000000;

Let's get into the methods:

Method 1: foreach

The following code:

foreach(var item in myList) 
{
   //Do stuff
}

Compiles down into the following:

using (var enumerable = myList.GetEnumerable())
while (enumerable.MoveNext())
{
    var item = enumerable.Current;

    // Do stuff.
}

There's quite a bit going on there. You have the method calls (and it may or may not be against the IEnumerator<T> or IEnumerator interfaces, as the compiler respects duck-typing in this case) and your // Do stuff is hoisted into that while structure.

Here's the test to measure the performance:

[TestMethod]
public void TestForEachKeyword()
{
    // Create the list.
    List<MyObject> list = CreateList(ItemsToTest);

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle through the items.
        foreach (var item in list)
        {
            // Write the values.
            MyObjectAction(item, writer);
        }

        // Write out the number of ticks.
        Debug.WriteLine("Foreach loop ticks: {0}", s.ElapsedTicks);
    }
}

The output:

Foreach loop ticks: 3210872841

Method 2: .ForEach method on List<T>

The code for the .ForEach method on List<T> looks something like this:

public void ForEach(Action<T> action)
{
    // Error handling omitted

    // Cycle through the items, perform action.
    for (int index = 0; index < Count; ++index)
    {
        // Perform action.
        action(this[index]);
    }
}

Note that this is functionally equivalent to Method 4, with one exception, the code that is hoisted into the for loop is passed as a delegate. This requires a dereference to get to the code that needs to be executed. While the performance of delegates has improved from .NET 3.0 on, that overhead is there.

However, it's negligible. The test to measure the performance:

[TestMethod]
public void TestForEachMethod()
{
    // Create the list.
    List<MyObject> list = CreateList(ItemsToTest);

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle through the items.
        list.ForEach(i => MyObjectAction(i, writer));

        // Write out the number of ticks.
        Debug.WriteLine("ForEach method ticks: {0}", s.ElapsedTicks);
    }
}

The output:

ForEach method ticks: 3135132204

That's actually ~7.5 seconds faster than using the foreach loop. Not completely surprising, given that it uses direct array access instead of using IEnumerable<T>.

Remember though, this translates to 0.0000075740637 seconds per item being saved. That's not worth it for small lists of items.

Method 3: while (myList.MoveNext())

As shown in Method 1, this is exactly what the compiler does (with the addition of the using statement, which is good practice). You're not gaining anything here by unwinding the code yourself that the compiler would otherwise generate.

For kicks, let's do it anyways:

[TestMethod]
public void TestEnumerator()
{
    // Create the list.
    List<MyObject> list = CreateList(ItemsToTest);

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    // Get the enumerator.
    using (IEnumerator<MyObject> enumerator = list.GetEnumerator())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle through the items.
        while (enumerator.MoveNext())
        {
            // Write.
            MyObjectAction(enumerator.Current, writer);
        }

        // Write out the number of ticks.
        Debug.WriteLine("Enumerator loop ticks: {0}", s.ElapsedTicks);
    }
}

The output:

Enumerator loop ticks: 3241289895

Method 4: for

In this particular case, you're going to gain some speed, as the list indexer is going directly to the underlying array to perform the lookup (that's an implementation detail, BTW, there's nothing to say that it can't be a tree structure backing the List<T> up).

[TestMethod]
public void TestListIndexer()
{
    // Create the list.
    List<MyObject> list = CreateList(ItemsToTest);

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle by index.
        for (int i = 0; i < list.Count; ++i)
        {
            // Get the item.
            MyObject item = list[i];

            // Perform the action.
            MyObjectAction(item, writer);
        }

        // Write out the number of ticks.
        Debug.WriteLine("List indexer loop ticks: {0}", s.ElapsedTicks);
    }
}

The output:

List indexer loop ticks: 3039649305

However the place where this can make a difference is arrays. Arrays can be unwound by the compiler to process multiple items at a time.

Instead of doing ten iterations of one item in a ten item loop, the compiler can unwind this into five iterations of two items in a ten item loop.

However, I'm not positive here that this is actually happening (I have to look at the IL and the output of the compiled IL).

Here's the test:

[TestMethod]
public void TestArray()
{
    // Create the list.
    MyObject[] array = CreateList(ItemsToTest).ToArray();

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle by index.
        for (int i = 0; i < array.Length; ++i)
        {
            // Get the item.
            MyObject item = array[i];

            // Perform the action.
            MyObjectAction(item, writer);
        }

        // Write out the number of ticks.
        Debug.WriteLine("Enumerator loop ticks: {0}", s.ElapsedTicks);
    }
}

The output:

Array loop ticks: 3102911316

It should be noted that out-of-the box, Resharper offers a suggestion with a refactoring to change the above for statements to foreach statements. That's not to say this is right, but the basis is to reduce the amount of technical debt in code.


TL;DR

You really shouldn't be concerned with the performance of these things, unless testing in your situation shows that you have a real bottleneck (and you'll have to have massive numbers of items to have an impact).

Generally, you should go for what's most maintainable, in which case, Method 1 (foreach) is the way to go.

Losse answered 6/3, 2013 at 12:44 Comment(22)
+1 Excellent answer! But I would think the tl;dr part itself should not be, it is in fact very important conclusion.Chaudfroid
@KenKin Thanks, but I'm not sure what you mean by "should not be"?Losse
ah.. I think it should not be tl;dr, it's important, and must read.Chaudfroid
@KenKin I would say the TL;DR emphasizes that's the part that must be read. Also, it's a duplication of what's up top.Losse
Thanks for the explanation of tl;dr. I might misunderstand of what tl;dr mean.Chaudfroid
@KenKin Just to be explicitly clear TL;DR is used to state "if you don't want to read all that other text, this is what you should read."Losse
Great work, what's interesting is that from the links supplied earlier from Chris Gessler it looked like using Linq was slower than a foreach() but they where testing .Any() and .TakeWhile(), But from your work it shows that the linq result is the fastest, other than moving to a for(). Thanks again for your great work.Tonguing
@Tonguing No problem, but to be honest, I had to process a million objects. I didn't do it on a smaller scale because the measurements in ticks wouldn't be consistent. The point being, you aren't really gaining much in the way of performance by not using LINQ, you're getting maintainability, which more often than not, is the better trade off. You should only be doing this kind of optimization if you find that you have a problem to begin with. The deep dive is to emphasize the point that performance isn't the issue here.Losse
@Losse Yep I 100% agree, its all about readability and maintainability and I guess a bit of personal preference, unless you are processing huge numbers.Tonguing
“there's nothing to say that it can't be a tree structure” Actually there is, the documentation for the indexer says: “Retrieving the value of this property is an O(1) operation”. And I don't know about any tree that supports O(1) lookups. Though you're right that this is an implementation detail.Die
@Die Agreed, but ideally, you wouldn't ever be dealing with a List<T> but an IList<T>.Losse
If you're that concerned with performance that it matters, then maybe .NET is not the right platform for your application. Code readability trumps optimization in most applications.Milliken
@DeniseSkidmore "Performance that matters" is a pretty unsubstantiated statement, as well as highly subjective.Losse
@Losse hence it being a comment not a statement. But generally what happens inside your loop costs a lot more than the loop itself. The difference in speed between the different loops should be a small percentage of the total loop time. If you are so concerned about performance that loop types matter, then you might be more concerned about using managed memory and allowing the garbage collector to whisk away CPU cycles on it's own whim. Managed memory languages exist to make the programmer's job easier, they don't necessarily create better code, and they rarely create faster code.Milliken
@DeniseSkidmore I think you didn't read the answer. The answer was it doesn't matter. Everything else was to show why it doesn't matter, a fact you seem to have overlooked.Losse
@Losse I'm saying the same thing as the answer, don't be concerned about performance, go for maintainable code. I'm just adding if that's not an acceptable answer for you, then maybe you should be switching languages, not just trying to hyper-optimize a managed memory language.Milliken
@DeniseSkidmore I'm not sure where my acceptance comes into the picture. You realize I wrote the answer, right?Losse
@Losse you the general you, not the specific you... The reader... who is pondering the question... I'm not in any way trying to criticize the answer, I was trying to add to it.Milliken
Is there a way to iterate an array in C# without using the indexing lookup operator? What I mean is that typically in C I would do something like set a pointer to the first element and increment it each time rather than perform the lookup from the base address. Is there an equivalent way to do that in C#?Vendible
@Vendible You can always use unsafe code to do the same thing. However, I'd argue that in .NET, that will give you the same result (that the JIT compiler will compile this down to something equivalent). It should also be noted that you're doing the same thing in C; the lookup is a pointer dereference, and you're just incrementing the pointer by the size of elements in the array.Losse
@Losse - Yeah I guess I am just mistrusting of whats going on within the internals of the [] operator for a given class in .NET. Feels like there is likely to be more overhead than just a simple addition of a pointer value.Vendible
(+) its a good answer. But, the question is not "which one should be used" but is "which one is the most efficient". So I wish TL:DR part mentions which one is the fastest.Carmelo
E
4

In regards to the final bit of the question, "Did I miss any?" Yes, and I feel I would be remiss to not mention this even though the question is quite old. While those four ways of doing it will execute in relatively the same amount of time, there is a way not shown above that runs faster than all of them. Quite significantly, in fact, as the number of items in the iterated list increases. It would be the exact same way as the last method but instead of getting .Count in the condition check of the loop, you assign this value to a variable before setting up the loop and use that instead. Which leaves you with something like this:

var countVar = list.Count;
for(int i = 0; i < countVar; i++)
{
 //loop logic
}

By doing it this way, you're only looking up a variable value at each iteration, rather than resolving the Count or Length properties, which is considerably less efficient.

Employer answered 25/7, 2019 at 15:40 Comment(1)
I quite agree with this analysis. Caching the length of the list reduces the completion time and more efficient.Czarra
N
2

I would suggest an even better and not well-known approach for faster loop iteration over a list. I would recommend you to first read about Span<T>. Note that you can use it if you are using .NET Core.

List<MyObject> list = new();
foreach (MyObject item in CollectionsMarshal.AsSpan(list))
{
    // Do something
}

Be aware of the caveats:

The CollectionsMarshal.AsSpan method is unsafe and should be used only if you know what you're doing. CollectionsMarshal.AsSpan returns a Span<T> on the private array of List<T>. Iterating over a Span<T> is fast as the JIT uses the same tricks as for optimizing arrays. Using this method, it won't check the list is not modified during the enumeration.

This is a more detailed explanation of what it does behind the scenes and more, super interesting!

Ngocnguyen answered 15/9, 2022 at 13:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.