For vs. Linq - Performance vs. Future
Asked Answered
C

9

46

Very brief question. I have a randomly sorted large string array (100K+ entries) where I want to find the first occurance of a desired string. I have two solutions.

From having read what I can my guess is that the 'for loop' is going to currently give slightly better performance (but this margin could always change), but I also find the linq version much more readable. On balance which method is generally considered current best coding practice and why?

string matchString = "dsf897sdf78";
int matchIndex = -1;
for(int i=0; i<array.length; i++)
{
    if(array[i]==matchString)
    {
        matchIndex = i;
        break;
    }
}

or

int matchIndex = array.Select((r, i) => new { value = r, index = i })
                         .Where(t => t.value == matchString)
                         .Select(s => s.index).First();
Charolettecharon answered 15/2, 2013 at 11:39 Comment(5)
Related: for vs. foreach vs. LINQAdama
I wouldn't even use the LINQ in this case, since you really have to fight to find the index - I'd use Array.IndexOf :)Towne
I use LINQ on large datatables (100k+ records, ~40 columns) without any performance issue.Certitude
@Mondrian I do not use Linq2Sql. I use LINQ to search, group & filter a DataTable. And DataTable isn't always a SQL operation's result.Certitude
retracted comment then.Mondrian
P
64

The best practice depends on what you need:

  1. Development speed and maintainability: LINQ
  2. Performance (according to profiling tools): manual code

LINQ really does slow things down with all the indirection. Don't worry about it as 99% of your code does not impact end user performance.

I started with C++ and really learnt how to optimize a piece of code. LINQ is not suited to get the most out of your CPU. So if you measure a LINQ query to be a problem just ditch it. But only then.

For your code sample I'd estimate a 3x slowdown. The allocations (and subsequent GC!) and indirections through the lambdas really hurt.

Propound answered 15/2, 2013 at 11:42 Comment(4)
Agreed. Linq comes at a small performance cost, but in many cases it's negligable. In fact, from what i recall most of the code behind StackOverflow uses LinqHighflown
+1 and want to add, that only 20% of code runs 80% of time, so only bottlenecks should be optimized if there is a performance problemsElsewhere
indirections through the lambdas really hurt I don't agree. Once the expression evaluated, JIT finds a way to avoid virtual function call overhead.Couchant
@Couchant the JVM HotSpot compiler often can do that. The .NET JITs never devirtualize calls, often not even if the call target type is statically known. Delegate calls are not devirtualized under any circumstances.Propound
S
37

Slightly better performance? A loop will give SIGNIFICANTLY better performance!

Consider the code below. On my system for a RELEASE (not debug) build, it gives:

Found via loop at index 999999 in 00:00:00.2782047
Found via linq at index 999999 in 00:00:02.5864703
Loop was 9.29700432810805 times faster than linq.

The code is deliberately set up so that the item to be found is right at the end. If it was right at the start, things would be quite different.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace Demo
{
    public static class Program
    {
        private static void Main(string[] args)
        {
            string[] a = new string[1000000];

            for (int i = 0; i < a.Length; ++i)
            {
                a[i] = "Won't be found";
            }

            string matchString = "Will be found";

            a[a.Length - 1] = "Will be found";

            const int COUNT = 100;

            var sw = Stopwatch.StartNew();
            int matchIndex = -1;

            for (int outer = 0; outer < COUNT; ++outer)
            {
                for (int i = 0; i < a.Length; i++)
                {
                    if (a[i] == matchString)
                    {
                        matchIndex = i;
                        break;
                    }
                }
            }

            sw.Stop();
            Console.WriteLine("Found via loop at index " + matchIndex + " in " + sw.Elapsed);
            double loopTime = sw.Elapsed.TotalSeconds;

            sw.Restart();

            for (int outer = 0; outer < COUNT; ++outer)
            {
                matchIndex = a.Select((r, i) => new { value = r, index = i })
                             .Where(t => t.value == matchString)
                             .Select(s => s.index).First();
            }

            sw.Stop();
            Console.WriteLine("Found via linq at index " + matchIndex + " in " + sw.Elapsed);
            double linqTime = sw.Elapsed.TotalSeconds;

            Console.WriteLine("Loop was {0} times faster than linq.", linqTime/loopTime);
        }
    }
}
Strenta answered 15/2, 2013 at 12:17 Comment(9)
Problem is the new operator which slows the linq query down. If the array can be converted to a list than linq can be combined with FindIndex and this time the for loop is only around 1.5 times faster. 'matchIndex = a.ToList().FindIndex(x => x.Equals(matchString));'Scholiast
changing your query to something closer to the regular loop, reduces the difference dramatically: string tst = a.First(s => matchIndex++ !=-2 && s == matchString);Doorknob
@Doorknob Well that's hardly surprising... Although in a release build on my PC, the loop is still more than 3 times faster.Strenta
On mine it varies from 1.9 to 3.2 -- no idea as to why the variation. Anyway, my point is that in your answer the two aren't really comparable -- as @Scholiast points out, you are making the LINQ query do more. Unfortunately, his solution to that is to make it do something entirely different. I really like the objective approach of your answer -- give some actual numbers. But they should be valid numbers.Doorknob
Dude! Your linq query is wrong! The correct one is the following and this one is less than 10% slower. matchIndex = a.Where(t => t == matchString).Select((r, i) => i).First();Deathwatch
@Deathwatch My answer is responding to the OP. If you have a look, my Linq query is the same as the one in the OP; therefore it cannot be "wrong". My answer specifically targets the actual question, not some other question involving different code...Strenta
@Deathwatch Nevertheless I tried your suggested Linq and it GIVES THE WRONG ANSWER, 0 instead of 999999. I suggest you actually try it. It's obvious why it's wrong, by the way. You're filtering first and THEN getting the index from the filtered results, so of course the answer is wrong.Strenta
i used your sample and made some changes, changing string to List<string> and using a.IndexOf(a.Find(o => o == matchString)); made a difference. output became "Found via linq at index 999999 in 00:00:00.0221552"Fontana
@Fontana (Sorry for the later response) Yes, so your change means that it is no longer using Linq, but rather it is using methods from List<T>, namely List<T>.IndexOf() and List<T>.Find().Strenta
X
7

LINQ, according to declarative paradigm, expresses the logic of a computation without describing its control flow. The query is goal oriented, selfdescribing and thus easy to analyse and understand. Is also concise. Moreover, using LINQ, one depends highly upon abstraction of data structure. That involves high rate of maintanability and reusability.

Iteration aproach addresses imperative paradigm. It gives fine-grained control, thus ease obtain higher performance. The code is also simpler to debug. Sometimes well contructed iteration is more readable than query.

Xenophobe answered 15/2, 2013 at 13:22 Comment(0)
E
3

There is always dilemma between performance and maintainability. And usually (if there is no specific requirements about performance) maintainability should win. Only if you have performance problems, then you should profile application, find problem source, and improve its performance (by reducing maintainability at same time, yes that's the world we live in).

About your sample. Linq is not very good solution here, because it do not add match maintainability into your code. Actually for me projecting, filtering, and projecting again looks even worse, than simple loop. What you need here is simple Array.IndexOf, which is more maintainable, than loop, and have almost same performance:

Array.IndexOf(array, matchString)
Elsewhere answered 15/2, 2013 at 12:5 Comment(0)
P
2

Well, you gave the answer to your question yourself.

Go with a For loop if you want the best performance, or go with Linq if you want readability.

Also perhaps keep in mind the possibility of using Parallel.Foreach() which would benefit from in-line lambda expressions (so, more closer to Linq), and that is much more readable then doing paralelization "manually".

Predominate answered 15/2, 2013 at 11:43 Comment(2)
I have always wondered why LINQ and lambda expressions are automatically considered more readable. Sometimes a simple foreach or for is more readable than LINQ IMOTryma
@LeeDale of course. And i would like to add my answer was regarding the Fluent-style layout of Linq, like in the question, not the declarative style.Predominate
T
2

I don't think either is considered best practice some people prefer looking at LINQ and some don't.

If performance is a issue the I would profile both bits of code for your scenario and if the difference is negligible then go with the one you feel more conformable with, after all it will most likely be you who maintains the code.

Also have you thought about using PLINQ or making the loop run in parallel?

Tryma answered 15/2, 2013 at 11:45 Comment(0)
B
2

Just an interesting observation. LINQ Lambda queries for sure add a penalty over LINQ Where queries or a For Loop. In the following code, it fills a list with 1000001 multi-parameter objects and then searches for a specific item that in this test will always be the last one, using a LINQ Lamba, a LINQ Where Query and a For Loop. Each test iterates 100 times and then averages the times to get the results.

LINQ Lambda Query Average Time: 0.3382 seconds

LINQ Where Query Average Time: 0.238 seconds

For Loop Average Time: 0.2266 seconds

I've run this test over and over, and even increase the iteration and the spread is pretty much identical statistically speaking. Sure we are talking 1/10 of a second for essentially that a million item search. So in the real world, unless something is that intensive, not sure you would even notice. But if you do the LINQ Lambda vs LINQ Where query does have a difference in performance. The LINQ Where is near the same as the For Loop.

private void RunTest()
{
    try
    {
        List<TestObject> mylist = new List<TestObject>();

        for (int i = 0; i <= 1000000; i++)
        {
            TestObject testO = new TestObject(string.Format("Item{0}", i), 1, Guid.NewGuid().ToString());
            mylist.Add(testO);
        }


        mylist.Add(new TestObject("test", "29863", Guid.NewGuid().ToString()));

        string searchtext = "test";

        int iterations = 100;

        // Linq Lambda Test
        List<int> list1 = new List<int>();
        for (int i = 1; i <= iterations; i++)
        {
            DateTime starttime = DateTime.Now;
            TestObject t = mylist.FirstOrDefault(q => q.Name == searchtext);
            int diff = (DateTime.Now - starttime).Milliseconds;
            list1.Add(diff);
        }

        // Linq Where Test
        List<int> list2 = new List<int>();
        for (int i = 1; i <= iterations; i++)
        {
            DateTime starttime = DateTime.Now;
            TestObject t = (from testO in mylist
                            where testO.Name == searchtext
                            select testO).FirstOrDefault();
            int diff = (DateTime.Now - starttime).Milliseconds;
            list2.Add(diff);
        }

        // For Loop Test
        List<int> list3 = new List<int>();
        for (int i = 1; i <= iterations; i++)
        {
            DateTime starttime = DateTime.Now;
            foreach (TestObject testO in mylist)
            {
                if (testO.Name == searchtext)
                {
                    TestObject t = testO;
                    break;
                }
            }
            int diff = (DateTime.Now - starttime).Milliseconds;
            list3.Add(diff);
        }

        float diff1 = list1.Average();
        Debug.WriteLine(string.Format("LINQ Lambda Query Average Time: {0} seconds", diff1 / (double)100));

        float diff2 = list2.Average();
        Debug.WriteLine(string.Format("LINQ Where Query Average Time: {0} seconds", diff2 / (double)100));

        float diff3 = list3.Average();
        Debug.WriteLine(string.Format("For Loop Average Time: {0} seconds", diff3 / (double)100));
    }
    catch (Exception ex)
    {
        Debug.WriteLine(ex.ToString());
    }
}

private class TestObject
{
    public TestObject(string _name, string _value, string _guid)
    {
        Name = _name;
        Value = _value;
        GUID = _guid;
    }
    public string Name;
    public string Value;
    public string GUID;
}
Brightman answered 6/2, 2020 at 15:51 Comment(2)
In what machine did you run your tests? does it matter in speed the machine that runs it? for example, if we use linq in Xamarin.Android and so we care to see the speed of ruuning applications in mobile?Pistol
The speed of the machine should be irrelevant as it comparing the speed of the different operations to one another on that same machine.Brightman
S
0

The Best Option Is To Use IndexOf method of Array Class. Since it is specialized for arrays it will b significantly faster than both Linq and For Loop. Improving on Matt Watsons Answer.

using System;
using System.Diagnostics;
using System.Linq;


namespace PerformanceConsoleApp
{
    public class LinqVsFor
    {

        private static void Main(string[] args)
        {
            string[] a = new string[1000000];

            for (int i = 0; i < a.Length; ++i)
            {
                a[i] = "Won't be found";
            }

            string matchString = "Will be found";

            a[a.Length - 1] = "Will be found";

            const int COUNT = 100;

            var sw = Stopwatch.StartNew();

            Loop(a, matchString, COUNT, sw);

            First(a, matchString, COUNT, sw);


            Where(a, matchString, COUNT, sw);

            IndexOf(a, sw, matchString, COUNT);

            Console.ReadLine();
        }

        private static void Loop(string[] a, string matchString, int COUNT, Stopwatch sw)
        {
            int matchIndex = -1;
            for (int outer = 0; outer < COUNT; ++outer)
            {
                for (int i = 0; i < a.Length; i++)
                {
                    if (a[i] == matchString)
                    {
                        matchIndex = i;
                        break;
                    }
                }
            }

            sw.Stop();
            Console.WriteLine("Found via loop at index " + matchIndex + " in " + sw.Elapsed);

        }

        private static void IndexOf(string[] a, Stopwatch sw, string matchString, int COUNT)
        {
            int matchIndex = -1;
            sw.Restart();
            for (int outer = 0; outer < COUNT; ++outer)
            {
                matchIndex = Array.IndexOf(a, matchString);
            }
            sw.Stop();
            Console.WriteLine("Found via IndexOf at index " + matchIndex + " in " + sw.Elapsed);

        }

        private static void First(string[] a, string matchString, int COUNT, Stopwatch sw)
        {
            sw.Restart();
            string str = "";
            for (int outer = 0; outer < COUNT; ++outer)
            {
                str = a.First(t => t == matchString);

            }
            sw.Stop();
            Console.WriteLine("Found via linq First at index " + Array.IndexOf(a, str) + " in " + sw.Elapsed);

        }

        private static void Where(string[] a, string matchString, int COUNT, Stopwatch sw)
        {
            sw.Restart();
            string str = "";
            for (int outer = 0; outer < COUNT; ++outer)
            {
                str = a.Where(t => t == matchString).First();

            }
            sw.Stop();
            Console.WriteLine("Found via linq Where at index " + Array.IndexOf(a, str) + " in " + sw.Elapsed);

        }

    }

}

Output:

Found via loop at index 999999 in 00:00:01.1528531
Found via linq First at index 999999 in 00:00:02.0876573
Found via linq Where at index 999999 in 00:00:01.3313111
Found via IndexOf at index 999999 in 00:00:00.7244812
Slowworm answered 28/11, 2018 at 12:34 Comment(0)
I
0

A bit of a non-answer, and really just an extension to https://stackoverflow.com/a/14894589, but I have, on and off, been working on an API-compatible replacement for Linq-to-Objects for a while now. It still doesn't provide the performance of a hand-coded loop, but it is faster for many (most?) linq scenarios. It does create more garbage, and has some slightly heavier up front costs.

The code is available https://github.com/manofstick/Cistern.Linq

A nuget package is available https://www.nuget.org/packages/Cistern.Linq/ (I can't claim this to be battle hardened, use at your own risk)

Taking the code from Matthew Watson's answer (https://stackoverflow.com/a/14894589) with two slight tweaks, and we get the time down to "only" ~3.5 time worse than the hand-coded loop. On my machine it take about 1/3 of the time of original System.Linq version.

The two changes to replace:

using System.Linq;

...

matchIndex = a.Select((r, i) => new { value = r, index = i })
             .Where(t => t.value == matchString)
             .Select(s => s.index).First();

With the following:

// a complete replacement for System.Linq
using Cistern.Linq;

...

// use a value tuple rather than anonymous type
matchIndex = a.Select((r, i) => (value: r, index: i))
             .Where(t => t.value == matchString)
             .Select(s => s.index).First();

So the library itself is a work in progress. It fails a couple of edge cases from the corefx's System.Linq test suite. It also still needs a few functions to be converted over (they currently have the corefx System.Linq implementation, which is compatible from an API perspective, if not a performance perspective). But anymore who wants to help, comment, etc would be appreciated....

Ipecac answered 19/8, 2019 at 4:35 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.