LINQ to SQL using GROUP BY and COUNT(DISTINCT)
Asked Answered
P

6

67

I have to perform the following SQL query:

select answer_nbr, count(distinct user_nbr)
from tpoll_answer
where poll_nbr = 16
group by answer_nbr

The LINQ to SQL query

from a in tpoll_answer 
where a.poll_nbr = 16 select a.answer_nbr, a.user_nbr distinct 

maps to the following SQL query:

select distinct answer_nbr, distinct user_nbr
from tpoll_answer
where poll_nbr = 16

So far, so good. However the problem raises when trying to GROUP the results, as I'm not being able to find a LINQ to SQL query that maps to the first query I wrote here (thank you LINQPad for making this process a lot easier). The following is the only one that I've found that gives me the desired result:

from answer in tpoll_answer where answer.poll_nbr = 16 _
group by a_id = answer.answer_nbr into votes = count(answer.user_nbr)

Which in turns produces the follwing ugly and non-optimized at all SQL query:

SELECT [t1].[answer_nbr] AS [a_id], (
    SELECT COUNT(*)
    FROM (
        SELECT CONVERT(Bit,[t2].[user_nbr]) AS [value], [t2].[answer_nbr], [t2].[poll_nbr]
        FROM [TPOLL_ANSWER] AS [t2]
        ) AS [t3]
    WHERE ([t3].[value] = 1) AND ([t1].[answer_nbr] = [t3].[answer_nbr]) AND ([t3].[poll_nbr] = @p0)
    ) AS [votes]
FROM (
    SELECT [t0].[answer_nbr]
    FROM [TPOLL_ANSWER] AS [t0]
    WHERE [t0].[poll_nbr] = @p0
    GROUP BY [t0].[answer_nbr]
    ) AS [t1]
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [16]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 3.5.30729.1

Any help will be more than appreciated.

Philippopolis answered 15/1, 2009 at 19:51 Comment(0)
E
106

There isn't direct support for COUNT(DISTINCT {x})), but you can simulate it from an IGrouping<,> (i.e. what group by returns); I'm afraid I only "do" C#, so you'll have to translate to VB...

 select new
 {
     Foo= grp.Key,
     Bar= grp.Select(x => x.SomeField).Distinct().Count()
 };

Here's a Northwind example:

    using(var ctx = new DataClasses1DataContext())
    {
        ctx.Log = Console.Out; // log TSQL to console
        var qry = from cust in ctx.Customers
                  where cust.CustomerID != ""
                  group cust by cust.Country
                  into grp
                  select new
                  {
                      Country = grp.Key,
                      Count = grp.Select(x => x.City).Distinct().Count()
                  };

        foreach(var row in qry.OrderBy(x=>x.Country))
        {
            Console.WriteLine("{0}: {1}", row.Country, row.Count);
        }
    }

The TSQL isn't quite what we'd like, but it does the job:

SELECT [t1].[Country], (
    SELECT COUNT(*)
    FROM (
        SELECT DISTINCT [t2].[City]
        FROM [dbo].[Customers] AS [t2]
        WHERE ((([t1].[Country] IS NULL) AND ([t2].[Country] IS NULL)) OR (([t1]
.[Country] IS NOT NULL) AND ([t2].[Country] IS NOT NULL) AND ([t1].[Country] = [
t2].[Country]))) AND ([t2].[CustomerID] <> @p0)
        ) AS [t3]
    ) AS [Count]
FROM (
    SELECT [t0].[Country]
    FROM [dbo].[Customers] AS [t0]
    WHERE [t0].[CustomerID] <> @p0
    GROUP BY [t0].[Country]
    ) AS [t1]
-- @p0: Input NVarChar (Size = 0; Prec = 0; Scale = 0) []
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 3.5.30729.1

The results, however, are correct- verifyable by running it manually:

        const string sql = @"
SELECT c.Country, COUNT(DISTINCT c.City) AS [Count]
FROM Customers c
WHERE c.CustomerID != ''
GROUP BY c.Country
ORDER BY c.Country";
        var qry2 = ctx.ExecuteQuery<QueryResult>(sql);
        foreach(var row in qry2)
        {
            Console.WriteLine("{0}: {1}", row.Country, row.Count);
        }

With definition:

class QueryResult
{
    public string Country { get; set; }
    public int Count { get; set; }
}
Extraordinary answered 16/1, 2009 at 9:42 Comment(6)
Thank you Marc. Yes, I also considered this approach, in fact a very similar query that returns the correct results. Sadly it is a shame the SQL generated is not optimal, but it works and that´s enough for the time being.Spud
@Leandro - it might be more TSQL, but I'd compare the actual query plans before claiming anything about optimal - it might well be identical to the optimiser.Extraordinary
IF the sql is like crap, why don't you write it in plain sql instead in the first place?Glandulous
This is great, thanks! It's exactly what I needed for my POCO collections :-) I'm not using a SQL backend, so plain SQL wasn't an option for me.Villegas
FYI, distinct is totally supported now.Radical
I compared both queries in execution plan and it shows linq generated query cost as 55% and the optimum query as 45%. So, sadly the query generated by linq is not optimal :(Groom
D
11

The Northwind example cited by Marc Gravell can be rewritten with the City column selected directly by the group statement:

from cust in ctx.Customers
where cust.CustomerID != ""
group cust.City /*here*/ by cust.Country
into grp
select new
{
        Country = grp.Key,
        Count = grp.Distinct().Count()
};
Deadeye answered 4/3, 2011 at 21:38 Comment(1)
I was excited about this syntax, but the SQL generated exactly the same as above cited by Marc GravellShipe
K
1

Linq to sql has no support for Count(Distinct ...). You therefore have to map a .NET method in code onto a Sql server function (thus Count(distinct.. )) and use that.

btw, it doesn't help if you post pseudo code copied from a toolkit in a format that's neither VB.NET nor C#.

Kagu answered 16/1, 2009 at 8:31 Comment(2)
Thank you Frans, I have considering creating a view or stored procedure. Thank you very much Marc.Spud
@Leandro - you might also consider a table-valued function (UDF) in place of a stored procedure; the system can get the metadata more accurately, and it is composable at the server. Only works with LINQ-to-SQL, though - not Entity Framework (AFAIK).Extraordinary
N
1

This is how you do a distinct count query. Note that you have to filter out the nulls.

var useranswercount = (from a in tpoll_answer
where user_nbr != null && answer_nbr != null
select user_nbr).Distinct().Count();

If you combine this with into your current grouping code, I think you'll have your solution.

Nates answered 16/1, 2009 at 8:58 Comment(1)
Thank you GeekyMonkey. Columns are NOT NULL, so it is one less thing to worry about. If I'm not mistaken, I think your query the total count and not a grouped by count.Spud
I
1

simple and clean example of how group by works in LINQ

http://www.a2zmenu.com/LINQ/LINQ-to-SQL-Group-By-Operator.aspx

Integumentary answered 25/12, 2010 at 11:9 Comment(3)
Sorry found nothing about count(distinct)Santanasantayana
packages.GroupBy(p => p.OrderId).Count()Holmium
That count the entire OrderId in the package. The OP asked how to COUNT(DISTINCT) on the specific group - ie. count the number of cities for each country.Shipe
S
-5

I wouldn't bother doing it in Linq2SQL. Create a stored Procedure for the query you want and understand and then create the object to the stored procedure in the framework or just connect direct to it.

Stutzman answered 23/6, 2016 at 15:1 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.