Which method performs better: .Any() vs .Count() > 0?

11

727

In the System.Linq namespace, we can now extend our IEnumerables with the Any() and Count() extension methods.

I was told recently that if I want to check that a collection contains one or more items, I should use the .Any() extension method instead of .Count() > 0, because the .Count() extension method has to iterate through all the items.

Secondly, some collections have a property (not an extension method) that is Count or Length. Would it be better to use those, instead of .Any() or .Count()?

Yea / nay?

Surgeon answered 20/11, 2008 at 12:11 Comment(2)
Better to use Any() on Enumerables and Count on Collections. If someone feels that writing '(somecollection.Count > 0)' will confuse readers or hurt readability, better to wrap it in an extension method named Any(). Then everyone is satisfied, performance-wise as well as readability-wise, your code stays consistent, and individual developers on your project need not worry about choosing Count vs Any.Younglove
You've seen Count() > 0 vs Any(), but have you seen Distinct().Count() > 1 vs Distinct().Skip(1).Any()? The latter is definitely waaaay faster for a large number of items, where Count actually has to iterate over the whole set to get a count. Skip(1).Any() avoids the full enumeration. 100k iterations of the check on a 1000-element array of 1-character strings take about 4000ms with Count() > 1, but only 20ms with Skip(1).Any().Nakitanalani
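The Skip(1).Any() trick from the comment above can be sketched as a small helper. This is only an illustration: the class name DistinctCheck and the method name HasMoreThanOneDistinct are made up here and are not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctCheck
{
    // Hypothetical helper: true when the sequence holds at least two different values.
    // Skip(1).Any() can stop as soon as a second distinct value shows up, whereas
    // Distinct().Count() > 1 has to walk the whole sequence before comparing.
    public static bool HasMoreThanOneDistinct<T>(IEnumerable<T> source)
    {
        return source.Distinct().Skip(1).Any();
    }

    public static void Main()
    {
        Console.WriteLine(HasMoreThanOneDistinct(new[] { "a", "a", "b" })); // True
        Console.WriteLine(HasMoreThanOneDistinct(new[] { "a", "a" }));      // False
    }
}
```

The same pattern works for any "more than N" check: Skip(N).Any() instead of Count() > N.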
877

If you are starting with something that has a .Length or .Count (such as ICollection<T>, IList<T>, List<T>, etc) - then this will be the fastest option, since it doesn't need to go through the GetEnumerator()/MoveNext()/Dispose() sequence required by Any() to check for a non-empty IEnumerable<T> sequence.

For just IEnumerable<T>, then Any() will generally be quicker, as it only has to look at one iteration. However, note that the LINQ-to-Objects implementation of Count() does check for ICollection<T> (using .Count as an optimisation) - so if your underlying data-source is directly a list/collection, there won't be a huge difference. Don't ask me why it doesn't use the non-generic ICollection...

Of course, if you have used LINQ to filter it etc (Where etc), you will have an iterator-block based sequence, and so this ICollection<T> optimisation is useless.

In general with IEnumerable<T> : stick with Any() ;-p
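To make the difference on a filtered sequence concrete, here is a minimal sketch that counts how many items a lazy source hands out. The names EnumerationCostDemo, Numbers, Pulled and PulledBy are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCostDemo
{
    // Number of items the lazy source has produced so far.
    public static int Pulled;

    // Lazy source: an item is only produced when a consumer asks for it.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Runs the emptiness check on a Where-filtered sequence and
    // reports how many source items had to be produced for it.
    public static int PulledBy(Func<IEnumerable<int>, bool> check, int count)
    {
        Pulled = 0;
        check(Numbers(count).Where(n => n % 2 == 0));
        return Pulled;
    }

    public static void Main()
    {
        Console.WriteLine(PulledBy(seq => seq.Any(), 1000));       // 2
        Console.WriteLine(PulledBy(seq => seq.Count() > 0, 1000)); // 1000
    }
}
```

Any() stops at the first even number (after pulling 1 and 2), while Count() > 0 drains the whole source because the filtered sequence is not an ICollection<T>.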

Hixson answered 20/11, 2008 at 12:37 Comment(21)
Marc: ICollection<T> does not actually derive from ICollection. I was surprised too, but Reflector doesn't lie.Dominick
Doesn't Any() implementation check for ICollection interface and check after for Count property?Sextillion
No, it doesn't (having checked reflector)Hixson
I think there is another reason for using Any() most of the time. It signals the precise intent of the developer. If you are not interested in knowing the number of items, but only if there are some, then somecollection.Any() is simpler and clearer than somecollection.Count > 0Prehensile
@MarcGravell Do you have any comment on Entity Framework example I've just posted?Sarson
@kape123 as already noted by another user: would need to see the SQLHixson
@MarcGravell Added... try not to look too much at generated SQL as you eyes may pop-out ;)Sarson
@huttelihut - How many developers do you know who are genuinely confused by the statement (somecollection.Count > 0) ? Was all of our code prior to the introduction of LINQ's .Any() method difficult to understand?Loxodromic
An even better option may be to create your own IsEmpty() extension method that tries to cast to collection or array (to use Count or Length), and falls back on Any(). Optimised in all cases, and makes intent clear.Grappa
@EldritchConundrum This sounds like the best option here I feelBedazzle
You: Don't ask me why it doesn't use the non-generic ICollection It would be very rare for a type that implemented generic IEnumerable<> to implement only the non-generic ICollection. Checking for both ICollection<> and ICollection would be too much.Detoxify
I think @Marc's advice has made it into Resharper, it tells you to replace Count() with Any() (I refused to heed to its advice, I prefer Count() == 0 than !Any() )Magma
@JeppeStigNielsen: If an instance of List<Cat> is passed to a method expecting an IEnumerable<Animal>, would any interface that implements Count be more "findable" than the non-generic ICollection? Note that a List<Cat> is not going to implement IList<Animal>, and the code which receives the IEnumerable<Animal> will likely know nothing about class Cat or any generics involving that type.Predecease
@Predecease Very good observation. You are absolutely right. This is because of the covariance of IEnumerable<out T>. The Linq extension Count() will check for ICollection<TSource> only, in your example that is ICollection<Animal>. That interface is not implemented, since ICollection<> cannot be covariant (it contains methods such as bool Remove(T item)). Linq Count() does not discover that ICollection<Cat> is in fact implemented.Detoxify
@Predecease I tested this for myself by making a class MyGenericColl<T> : ICollection<T> that writes out a message if the Count property or the GetEnumerator() method is called. Then (new MyGenericColl<string>()).Count(); takes the shortcut; (new MyGenericColl<string>()).Count<object>(); does not.Detoxify
@Predecease Because of your comment I will update my question elsewhere on SO.Detoxify
@Loxodromic No, that's not what huttelihut is saying at all. The point is that when there is an available option that explicitly conveys what the developer is doing, that's the one to use (assuming there aren't any practical drawbacks to doing so).Hoarding
@Hoarding - I still feel that someCollection.Count > 0 is just as clear as someCollection.Any() and has the added benefit of greater performance and of not requiring LINQ. Granted, this is a very simple case and other constructs using LINQ operators will convey the developers intent much clearer than the equivalent non-LINQ option.Loxodromic
@CraigTP, don't confuse property Count with method Count(). Two different things and the question was about the method, not the property...Stash
@walther, the properties Count and Length were indeed mentioned in the question, and also in this answer as the fastest option where available. I also agree that they are probably just as clear as Any() to developers. Swear on me mum.Nigercongo
@MarcGravell, After decompiling the source code I noticed that the optimization in Count is not true for Any, do you have any idea why?Coolidge
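The IsEmpty() idea suggested in the comments above might look roughly like this. It is a sketch, not an established API: IsEmpty, EnumerableEmptinessExtensions, Animal and Cat are all names chosen for the example. Checking the non-generic ICollection as well also covers the List<Cat> passed as IEnumerable<Animal> case raised in the comments.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableEmptinessExtensions
{
    // Hypothetical helper (not part of LINQ).
    public static bool IsEmpty<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // Generic collections (List<T>, T[], HashSet<T>, ...) expose Count directly.
        if (source is ICollection<T> generic) return generic.Count == 0;

        // The non-generic interface still matches a List<Cat> seen as IEnumerable<Animal>.
        if (source is ICollection nonGeneric) return nonGeneric.Count == 0;

        // Fallback: Any() looks at no more than the first element.
        return !source.Any();
    }
}

public class Animal { }
public class Cat : Animal { }

public static class IsEmptyDemo
{
    public static void Main()
    {
        IEnumerable<Animal> animals = new List<Cat> { new Cat() };
        Console.WriteLine(animals is ICollection<Animal>); // False: ICollection<T> is not covariant
        Console.WriteLine(animals is ICollection);         // True
        Console.WriteLine(animals.IsEmpty());              // False
    }
}
```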
79

The exact details differ a bit in .NET Framework vs .NET Core, but it also somewhat depends on what you're doing: if you're using an ICollection or ICollection<T> type (such as with List<T>) there is a .Count property that's cheap to access, whereas other types might require enumeration.

TL;DR:

Use .Count > 0 if the property exists, and otherwise .Any().

Using .Count() > 0 is never the best option, and in some cases could be dramatically slower.

This applies to both .NET Framework and .NET Core.


Now we can dive into the details.

Lists and Collections

Let's start with a very common case: using List<T> (which is also ICollection<T>).

The .Count property is implemented as:

    private int _size;

    public int Count {
        get {
            Contract.Ensures(Contract.Result<int>() >= 0);
            return _size; 
        }
    }

In other words, _size is maintained by Add(), Remove() etc., and since the getter just reads a field it is an extremely cheap operation -- we don't need to iterate over values.

ICollection and ICollection<T> both have .Count and most types that implement them are likely to do so in a similar way.

Other IEnumerables

Any other IEnumerable types that aren't also ICollection require starting enumeration to determine if they're empty or not. The key factor affecting performance is if we end up enumerating a single item (ideal) or the entire collection (relatively expensive).

If the collection is actually causing I/O such as by reading from a database or disk, this could be a big performance hit.


.NET Framework .Any()

In .NET Framework (4.8), the Any() implementation is:

public static bool Any<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    using (IEnumerator<TSource> e = source.GetEnumerator()) {
        if (e.MoveNext()) return true;
    }
    return false;
}

This means no matter what, it's going to get a new enumerator object and try iterating once. This is more expensive than calling the List<T>.Count property, but at least it's not iterating the entire list.

.NET Framework .Count()

In .NET Framework (4.8), the Count() implementation is (basically):

public static int Count<TSource>(this IEnumerable<TSource> source)
{
    ICollection<TSource> collection = source as ICollection<TSource>;
    if (collection != null)
    { 
        return collection.Count;
    }
    int num = 0;
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            num = checked(num + 1);
        }
        return num;
    }
}

If available, ICollection<T>.Count is used; otherwise the whole sequence is enumerated.
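One way to observe this shortcut without a decompiler is a collection that records how it is inspected. LoggingCollection<T>, CountReads and EnumeratorRequests below are hypothetical names for this sketch. It relies only on Count() checking for ICollection<T> first, as in the code shown above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper that records how LINQ inspects it.
public sealed class LoggingCollection<T> : ICollection<T>
{
    private readonly List<T> _items = new List<T>();

    public int CountReads { get; private set; }
    public int EnumeratorRequests { get; private set; }

    public int Count { get { CountReads++; return _items.Count; } }
    public bool IsReadOnly { get { return false; } }
    public void Add(T item) { _items.Add(item); }
    public void Clear() { _items.Clear(); }
    public bool Contains(T item) { return _items.Contains(item); }
    public void CopyTo(T[] array, int arrayIndex) { _items.CopyTo(array, arrayIndex); }
    public bool Remove(T item) { return _items.Remove(item); }
    public IEnumerator<T> GetEnumerator() { EnumeratorRequests++; return _items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class CountShortcutDemo
{
    public static void Main()
    {
        var items = new LoggingCollection<string> { "a", "b", "c" };

        // The extension method Count(), not the property.
        bool nonEmpty = items.Count() > 0;

        Console.WriteLine("non-empty: " + nonEmpty);
        Console.WriteLine("Count property reads: " + items.CountReads);
        Console.WriteLine("enumerators requested: " + items.EnumeratorRequests);
    }
}
```

Running the same experiment with items.Any() shows which path your runtime takes for Any(), which is where .NET Framework and .NET Core differ.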


.NET Core .Any()

The LINQ Any() implementation in .NET Core is much smarter. You can see the complete source here, but the bits relevant to this discussion are:

    public static bool Any<TSource>(this IEnumerable<TSource> source)
    {
        //..snip..
        
        if (source is ICollection<TSource> collectionoft)
        {
            return collectionoft.Count != 0;
        }
        
        //..snip..

        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            return e.MoveNext();
        }
    }

Because a List<T> is an ICollection<T>, this will call the Count property (and though it calls another method, there are no extra allocations).

.NET Core .Count()

The .NET Core implementation (source) is basically the same as .NET Framework (see above), and so it will use ICollection.Count if available, and otherwise enumerates the collection.


Summary

.NET Framework

  • With ICollection:

    • .Count > 0 is best
    • .Count() > 0 is fine, but ultimately just calls ICollection.Count
    • .Any() is going to be slower, as it enumerates a single item
  • With non-ICollection (no .Count property)

    • .Any() is best, as it only enumerates a single item
    • .Count() > 0 is bad as it causes complete enumeration

.NET Core

  • .Count > 0 is best, if available (ICollection)
  • .Any() is fine, and will either do ICollection.Count > 0 or enumerate a single item
  • .Count() > 0 is bad when there is no .Count property (non-ICollection), as it causes complete enumeration
Strobile answered 24/8, 2020 at 22:41 Comment(3)
Thank you for adding the differences between .NET Framework and .NET Core. Would you mind expanding if this changed in .NET 5 and 6?Regolith
+1 I was just checking to make sure someone pointed out that .Count > 0 is different than .Count() > 0!Adrell
You can see benchmarks herePhila
70

Note: I wrote this answer when Entity Framework 4 was current. The point of this answer was not to get into trivial .Any() vs .Count() performance testing. The point was to signal that EF is far from perfect. Newer versions are better... but if you have a slow piece of code that uses EF, test with direct TSQL and compare performance rather than relying on assumptions (such as .Any() ALWAYS being faster than .Count() > 0).


While I agree with the most up-voted answer and comments - especially on the point that Any signals developer intent better than Count() > 0 - I've had a situation in which Count was faster by an order of magnitude on SQL Server (Entity Framework 4).

Here is the query with Any that threw a timeout exception (on ~200,000 records):

con = db.Contacts.
    Where(a => a.CompanyId == companyId && a.ContactStatusId <= (int) Const.ContactStatusEnum.Reactivated
        && !a.NewsletterLogs.Any(b => b.NewsletterLogTypeId == (int) Const.NewsletterLogTypeEnum.Unsubscr)
    ).OrderBy(a => a.ContactId).
    Skip(position - 1).
    Take(1).FirstOrDefault();

The Count version executed in a matter of milliseconds:

con = db.Contacts.
    Where(a => a.CompanyId == companyId && a.ContactStatusId <= (int) Const.ContactStatusEnum.Reactivated
        && a.NewsletterLogs.Count(b => b.NewsletterLogTypeId == (int) Const.NewsletterLogTypeEnum.Unsubscr) == 0
    ).OrderBy(a => a.ContactId).
    Skip(position - 1).
    Take(1).FirstOrDefault();

I need to find a way to see what exact SQL both LINQs produce - but it's obvious there is a huge performance difference between Count and Any in some cases, and unfortunately it seems you can't just stick with Any in all cases.

EDIT: Here is the generated SQL. Beauties, as you can see ;)

ANY:

exec sp_executesql N'SELECT TOP (1) 
[Project2].[ContactId] AS [ContactId], 
[Project2].[CompanyId] AS [CompanyId], 
[Project2].[ContactName] AS [ContactName], 
[Project2].[FullName] AS [FullName], 
[Project2].[ContactStatusId] AS [ContactStatusId], 
[Project2].[Created] AS [Created]
FROM ( SELECT [Project2].[ContactId] AS [ContactId], [Project2].[CompanyId] AS [CompanyId], [Project2].[ContactName] AS [ContactName], [Project2].[FullName] AS [FullName], [Project2].[ContactStatusId] AS [ContactStatusId], [Project2].[Created] AS [Created], row_number() OVER (ORDER BY [Project2].[ContactId] ASC) AS [row_number]
    FROM ( SELECT 
        [Extent1].[ContactId] AS [ContactId], 
        [Extent1].[CompanyId] AS [CompanyId], 
        [Extent1].[ContactName] AS [ContactName], 
        [Extent1].[FullName] AS [FullName], 
        [Extent1].[ContactStatusId] AS [ContactStatusId], 
        [Extent1].[Created] AS [Created]
        FROM [dbo].[Contact] AS [Extent1]
        WHERE ([Extent1].[CompanyId] = @p__linq__0) AND ([Extent1].[ContactStatusId] <= 3) AND ( NOT EXISTS (SELECT 
            1 AS [C1]
            FROM [dbo].[NewsletterLog] AS [Extent2]
            WHERE ([Extent1].[ContactId] = [Extent2].[ContactId]) AND (6 = [Extent2].[NewsletterLogTypeId])
        ))
    )  AS [Project2]
)  AS [Project2]
WHERE [Project2].[row_number] > 99
ORDER BY [Project2].[ContactId] ASC',N'@p__linq__0 int',@p__linq__0=4

COUNT:

exec sp_executesql N'SELECT TOP (1) 
[Project2].[ContactId] AS [ContactId], 
[Project2].[CompanyId] AS [CompanyId], 
[Project2].[ContactName] AS [ContactName], 
[Project2].[FullName] AS [FullName], 
[Project2].[ContactStatusId] AS [ContactStatusId], 
[Project2].[Created] AS [Created]
FROM ( SELECT [Project2].[ContactId] AS [ContactId], [Project2].[CompanyId] AS [CompanyId], [Project2].[ContactName] AS [ContactName], [Project2].[FullName] AS [FullName], [Project2].[ContactStatusId] AS [ContactStatusId], [Project2].[Created] AS [Created], row_number() OVER (ORDER BY [Project2].[ContactId] ASC) AS [row_number]
    FROM ( SELECT 
        [Project1].[ContactId] AS [ContactId], 
        [Project1].[CompanyId] AS [CompanyId], 
        [Project1].[ContactName] AS [ContactName], 
        [Project1].[FullName] AS [FullName], 
        [Project1].[ContactStatusId] AS [ContactStatusId], 
        [Project1].[Created] AS [Created]
        FROM ( SELECT 
            [Extent1].[ContactId] AS [ContactId], 
            [Extent1].[CompanyId] AS [CompanyId], 
            [Extent1].[ContactName] AS [ContactName], 
            [Extent1].[FullName] AS [FullName], 
            [Extent1].[ContactStatusId] AS [ContactStatusId], 
            [Extent1].[Created] AS [Created], 
            (SELECT 
                COUNT(1) AS [A1]
                FROM [dbo].[NewsletterLog] AS [Extent2]
                WHERE ([Extent1].[ContactId] = [Extent2].[ContactId]) AND (6 = [Extent2].[NewsletterLogTypeId])) AS [C1]
            FROM [dbo].[Contact] AS [Extent1]
        )  AS [Project1]
        WHERE ([Project1].[CompanyId] = @p__linq__0) AND ([Project1].[ContactStatusId] <= 3) AND (0 = [Project1].[C1])
    )  AS [Project2]
)  AS [Project2]
WHERE [Project2].[row_number] > 99
ORDER BY [Project2].[ContactId] ASC',N'@p__linq__0 int',@p__linq__0=4

It seems that a pure Where with EXISTS works much worse than calculating Count and then doing a Where with Count == 0.

Let me know if you see an error in my findings. What can be taken from all this, regardless of the Any vs Count discussion, is that any more complex LINQ query is way better off when rewritten as a stored procedure ;).

Sarson answered 14/6, 2012 at 23:22 Comment(3)
Would love to see some Sql Query plans that are generated by each linq-query for each scenario.Surgeon
based on the SQL, all I can say is: both queries look horrible. I knew there was a reason I normally write my own TSQL...Hixson
!Any would have to look through all rows just as Count would. That your example gives such a horrific result is a bit strange, in worst case !Any should only be a bit slower than Count. In your case I would look for ways to simplify the selection, perhaps splitting it up in stages or reordering the conditions if that is possible. But your point that the Any is better than Count rule does not hold for !Any is better than Count is a very good one.Mitziemitzl
36

Since this is a rather popular topic and the answers differ, I had to take a fresh look at the problem.

Testing env: EF 6.1.3, SQL Server, 300k records

Table model:

using System.ComponentModel.DataAnnotations; // provides [Key]

class TestTable
{
    [Key]
    public int Id { get; set; }

    public string Name { get; set; }

    public string Surname { get; set; }
}

Test code:

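using System;
using System.Linq;

// TestContext is the author's DbContext exposing TestTables; its definition is not shown.
class Program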
{
    static void Main()
    {
        using (var context = new TestContext())
        {
            context.Database.Log = Console.WriteLine;

            context.TestTables.Where(x => x.Surname.Contains("Surname")).Any(x => x.Id > 1000);
            context.TestTables.Where(x => x.Surname.Contains("Surname") && x.Name.Contains("Name")).Any(x => x.Id > 1000);
            context.TestTables.Where(x => x.Surname.Contains("Surname")).Count(x => x.Id > 1000);
            context.TestTables.Where(x => x.Surname.Contains("Surname") && x.Name.Contains("Name")).Count(x => x.Id > 1000);

            Console.ReadLine();
        }
    }
}

Results:

Any() ~ 3ms

Count() ~ 230ms for first query, ~ 400ms for second

Remarks:

For my case, EF didn't generate SQL like @Ben mentioned in his post.

Bellda answered 28/5, 2015 at 8:14 Comment(2)
For a proper comparison, you should do Count() > 0. :DEinsteinium
Andrew, Count() > 0 is not going to run differently than Count() in this particular test.Electrodynamic
13

EDIT: this was fixed in EF version 6.1.1, so this answer is no longer current.

For SQL Server and EF4-6, Count() performs about two times faster than Any().

When you run Table.Any(), it will generate something like this (alert: don't hurt your brain trying to understand it):

SELECT 
CASE WHEN ( EXISTS (SELECT 
    1 AS [C1]
    FROM [Table] AS [Extent1]
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT 
    1 AS [C1]
    FROM [Table] AS [Extent2]
)) THEN cast(0 as bit) END AS [C1]
FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]

which requires two scans of the rows matching your condition.

I don't like to write Count() > 0 because it hides my intention. I prefer to use a custom extension method for this:

using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryExtensions
{
    public static bool Exists<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate)
    {
        return source.Count(predicate) > 0;
    }
}
Quintanilla answered 3/1, 2014 at 16:32 Comment(3)
I noticed this too. The Any() SQL doesn't make any sense at all. I'm not sure why the they don't do: CASE WHEN(EXISTS(sql)) THEN 1 ELSE 0 END. I can't think of a reason why they need to do a NOT EXISTS in order to return 0.Contortionist
This is false. You found a bad query plan by random chance. This happens. Any is, almost always, faster.Microelectronics
I checked the SQL generated in 6.1.3; they fixed it: SELECT CASE WHEN ( EXISTS (SELECT 1 AS [C1] FROM [dbo].[TestTables] AS [Extent1] WHERE [Extent1].[Id] > 1000 )) THEN cast(1 as bit) ELSE cast(0 as bit) END AS [C1] FROM ( SELECT 1 AS X ) AS [SingleRowTable1]Quintanilla
7

It depends, how big is the data set and what are your performance requirements?

If it's nothing gigantic, use the most readable form, which for me is Any(), because it's shorter and reads more naturally than a comparison.

Idell answered 22/12, 2014 at 23:43 Comment(0)
4

If you are using the Entity Framework and have a huge table with many records Any() will be much faster. I remember one time I wanted to check to see if a table was empty and it had millions of rows. It took 20-30 seconds for Count() > 0 to complete. It was instant with Any().

Any() can be a performance enhancement because it may not have to iterate the collection to get the number of things. It just has to hit one of them. Or, for, say, LINQ-to-Entities, the generated SQL will be IF EXISTS(...) rather than SELECT COUNT ... or even SELECT * ....

Disfeature answered 11/3, 2019 at 5:5 Comment(0)
4

Using Count() to test for emptiness works, but using Any() makes the intent clearer, and the code more readable. However, there are some cases where special attention should be paid:

  • If the collection is an Entity Framework or other ORM query, calling Count() will execute a potentially massive SQL query and could put a large overhead on the application database. Calling Any() will also connect to the database, but will generate much more efficient SQL.

  • If the collection is part of a LINQ query that contains Select() statements that create objects, a large amount of memory could be unnecessarily allocated. Calling Any() will be much more efficient because it will execute fewer iterations of the enumerable.

Example to use Any():

private static bool IsEmpty(IEnumerable<string> strings)
{
  return !strings.Any();
}
Probable answered 18/11, 2021 at 13:31 Comment(0)
3

You can make a simple test to figure this out:

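using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        // Substitute any collection or query here.
        var query = new List<int> { 1, 2, 3 };

        // Time the Count property.
        var timeCount = Stopwatch.StartNew();
        if (query.Count > 0)
        {
        }
        timeCount.Stop();
        var testCount = timeCount.Elapsed;

        // Time the Any() extension method.
        var timeAny = Stopwatch.StartNew();
        if (query.Any())
        {
        }
        timeAny.Stop();
        var testAny = timeAny.Elapsed;

        Console.WriteLine("Count: {0}, Any(): {1}", testCount, testAny);
    }
}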

Check the values of testCount and testAny.
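A single timed call is dominated by noise, so a slightly more robust variant repeats each check many times after a warm-up call. The sketch below is one way to do that with Stopwatch alone. EmptinessTiming and Measure are made-up names, the list size and iteration count are arbitrary, and the numbers it prints will vary by machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class EmptinessTiming
{
    // Times `iterations` calls of `check` and returns the total elapsed time.
    public static TimeSpan Measure(Func<bool> check, int iterations)
    {
        check(); // warm-up call so JIT compilation is not part of the timing

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            check();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        List<int> list = Enumerable.Range(0, 1000).ToList();
        const int iterations = 1000000;

        Console.WriteLine("Count property: " + Measure(() => list.Count > 0, iterations));
        Console.WriteLine("Count() method: " + Measure(() => list.Count() > 0, iterations));
        Console.WriteLine("Any():          " + Measure(() => list.Any(), iterations));
    }
}
```

Measuring the Count property, the Count() method and Any() side by side keeps the property/method distinction visible in the results.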

Passable answered 20/1, 2017 at 17:15 Comment(4)
Here is test with your code for Count property vs Any() Count property wins vs Any() with +2x - linkStyrax
For a better result, you could do these comparisons 1000 times (or more). It helps to average out the results and avoid any random spikes.Medicinal
When you are testing like the above mentioned method, you need to consider many more factors, such as load on your database/network, plan caching in database side, etc. So to do an accurate test you should design an isolated and accurate environment tooGaillard
for better comparing should be Count replaced by method Count() vs .Any() not a property. You need time of iterations.Sorci
2

About the Count() method: if the IEnumerable is an ICollection, we don't need to iterate across all the items, because we can read the ICollection's Count property. If the IEnumerable is not an ICollection, we must iterate across all the items using a while loop with MoveNext. Take a look at the .NET Framework code:

public static int Count<TSource>(this IEnumerable<TSource> source)
{
    if (source == null) 
        throw Error.ArgumentNull("source");

    ICollection<TSource> collectionoft = source as ICollection<TSource>;
    if (collectionoft != null) 
        return collectionoft.Count;

    ICollection collection = source as ICollection;
    if (collection != null) 
        return collection.Count;

    int count = 0;
    using (IEnumerator<TSource> e = source.GetEnumerator())
    {
        checked
        {
            while (e.MoveNext()) count++;
        }
    }
    return count;
}

Reference: Reference Source Enumerable

Capapie answered 28/3, 2018 at 17:35 Comment(0)
-3

I have created a sample application using an IList with 100 elements to 1 million items to see which is best, Count or Any.

Code

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {

        //Creating List of customers
        IList<Customer> customers = new List<Customer>();
        for (int i = 0; i < 100; i++)
        {
            Customer customer = new Customer
            {
                CustomerId = i,
                CustomerName = string.Format("Customer{0}", i)
            };
            customers.Add(customer);
        }

        //Measuring time with count
        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();
        if (customers.Count > 0)
        {
            Console.WriteLine("Customer list is not empty with count");
        }
        stopWatch.Stop();
        Console.WriteLine("Time consumed with count: {0}", stopWatch.Elapsed);

        //Measuring time with any
        stopWatch.Restart();
        if (customers.Any())
        {
            Console.WriteLine("Customer list is not empty with any");
        }
        stopWatch.Stop();
        Console.WriteLine("Time consumed with any: {0}", stopWatch.Elapsed);
        Console.ReadLine();

    }
}

public class Customer
{
    public int CustomerId { get; set; }
    public string CustomerName { get; set; }
}

Result: [screenshot of the timing output not preserved]

Any is better than count.

Hennessey answered 15/6, 2021 at 5:2 Comment(2)
You're comparing .Count with .Any() and with these tiny amounts all you're measuring is the time it takes to write to the console which varies widely with each run. Without the Console.WriteLine calls, Count is faster, which really doesn't need yet more evidence.Abernon
@Hennessey - check out github.com/dotnet/BenchmarkDotNet for how to do some nice benchmarking of C# code. This will help you, heaps!Surgeon
