It seems like there is a common misconception about what LINQ GroupBy
does and what SQL GROUP BY
is able to do. Since I fell into the exact same trap and had to wrap my head around this recently, I decided to write a more thorough explanation of this issue.
Short answer:
The LINQ GroupBy
is much different from the SQL GROUP BY
statement: LINQ just divides the underlying collection into chunks depending on a key, while SQL additionally applies an aggregation function to condense each of these chunks down into a single value.
This is why EF has to perform your LINQ-kind GroupBy
in memory.
Before EF Core 3.0, this was done implicitly, so EF downloaded all result rows and then applied the LINQ GroupBy
. However, this implicit behavior might let the programmer expect that the entire LINQ query is executed in SQL, with potentially enormous performance impact when the result set is rather large. For this reason, implicit client side evaluation of GroupBy
was disabled completely in EF Core 3.0.
Now it is required to explicitly call functions like .AsEnumerable()
or .ToList()
, which download the result set and continue with in-memory LINQ operations.
Long answer:
The following table solvedExercises
will be the running example for this answer:
+-----------+------------+
| StudentId | ExerciseId |
+-----------+------------+
| 1 | 1 |
| 1 | 2 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
+-----------+------------+
A record X | Y
in this table denotes that student X
has solved exercise Y
.
In the question, a common use case of LINQ's GroupBy
method is described: Take a collection and group it into chunks, where the rows in each chunk share a common key.
In our example, we might want to get a Dictionary<int, List<int>>
, which contains a list of solved exercises for each student. With LINQ, this is very straightforward:
var result = solvedExercises
.GroupBy(e => e.StudentId)
.ToDictionary(e => e.Key, e => e.Select(e2 => e2.ExerciseId).ToList());
Output (for full code see dotnetfiddle):
Student #1: 1 2
Student #2: 2
Student #3: 1 2 3
This is easy to represent with C# datatypes, since we can nest List
and Dictionary
as deep as we like to.
Now we try to imagine this as an SQL query result. SQL query results are usually represented as a table, where we can freely choose the returned columns. To represent our above query as SQL query result, we would need to
- generate multiple result tables,
- put the grouped rows into an array or
- somehow insert a "result set separator".
As far as I know, none of these approaches is implemented in practice. At most, there are some hacky work-arounds like MySQL's GROUP_CONCAT
, which allows to combine the result rows into a string (relevant SO answer).
Thus we see, that SQL cannot yield results that match LINQ's notion of GroupBy
.
Instead, SQL only allows so-called aggregation: If we, for example, wanted to count how many exercises have been passed by a student, we would write
SELECT StudentId,COUNT(ExerciseId)
FROM solvedExercises
GROUP BY StudentId
...which will yield
+-----------+-------------------+
| StudentId | COUNT(ExerciseId) |
+-----------+-------------------+
| 1 | 2 |
| 2 | 1 |
| 3 | 3 |
+-----------+-------------------+
Aggregation functions reduce a set of rows into a single value, usually a scalar. Examples are row count, sum, maximum value, minimum value, and average.
This is implemented by EF Core: Executing
var result = solvedExercises
.GroupBy(e => e.StudentId)
.Select(e => new { e.Key, Count = e.Count() })
.ToDictionary(e => e.Key, e => e.Count);
generates the above SQL. Note the Select
, which tells EF which aggregation function it should use for the generated SQL query.
In summary, the LINQ GroupBy
function is much more general than the SQL GROUP BY
statement, which due to SQL's restrictions only allows to return a single, two dimensional result table. Thus, queries like the one in the question and the first example in this answer have to be evaluated in memory, after downloading the SQL result set.
Instead of implicitly doing this, in EF Core 3.0 the developers chose to throw an exception in this case; this prevents accidental downloading of an entire, potentially large table with millions of rows, which might get unnoticed during development due to a small test database.
.AsEnumerable()
or.ToList()
beforeGroupBy
to bypass the ef core query translator's bugs. You can useWhere
clause first to fetch as minimum data as possible – Werewolf