Getting COUNT and SKIP TAKE in one operation with Linq to Entities
Asked Answered
M

4

8

I have a data call in a Linq to Entities powered data access layer that is designed to make paged calls.

In doing so, I need to select a subset of the data, say 50 rows, but also get the count of all matches to know how many total matches exist to be paged through.

Currently, I'm doing the following:

var queryResult = DatabaseContext.Table
    .Where(x => !x.IsDeleted)
    .Where(p => (
            p.PropertyOne.ToLower().Contains(query) ||
            p.PropertyTwo.ToLower().Contains(query) 
            ));

int count = queryResult.Count();

var returnData = queryResult
    .OrderBy(i => i.ID)
    .Skip(start).Take((length))
    .Select(y => new ObjectDTO
    {
        PropertyOne = y.PropertyOne,
        PropertyTwo = y.PropertyTwo
    }
    .AsEnumerable();

This results in two costly database operations. The COUNT operation for some reason actually takes longer than the SELECT operation.

Is there a way to get count and a subset in the same operation?

The logical flow to me says we do the following:

  • Look at Table
  • Find Items in Table that match a condition
  • Get a Count of all matches
  • Return a numbered subset of the matches

This seems possible in one operation, but I cannot figure out how.

Attempt One, Slower

Tried D Stanley's suggestion of casting the full result set to a List and doing count and memory in paging, but it is roughly 2x slower (6.9s avg vs 3.9s avg)

It's worth mentioning the data set is roughly 25,000 records, with over a dozen related tables that are searched in a JOIN.

Mosenthal answered 23/3, 2015 at 18:54 Comment(7)
What if you enumerate the queryResult to a List<T> and use .Count on the list, is it faster?Quadratics
FYI the count takes longer because it has to go through all the rows in your table where as the second one can stop as soon as it get start + length matches.Rodriguez
@diemaus That would pull a lot more data than is needed and likely would be slower depending on the amount of data.Rodriguez
@Rodriguez as you said, it depends on how much data you are retrieving. Why not try it out? :)Quadratics
You can try Entity Framework Extended github.com/loresoft/EntityFramework.Extended This will allow you to to do both operations in a single call to the database. It's still not likely to be quick given the nature of the query.Stereochrome
You didn't use the value of count in the code. I mean int count = queryResult.Count();Electromotive
@HuiZhao Happens later, wasn't relevant hereMosenthal
W
0

It may be possible, but it probably won't be much faster because of the criteria you're using. Since you're searching for text within a column value, you cannot use an index and thus must do a table scan. You could do a single query to get all records and do the Count and Skip/Take in linq-to-objects:

var queryResult = DatabaseContext.Table
    .Where(x => !x.IsDeleted)
    .OrderBy(i => i.ID)
    .Where(p => (
            p.PropertyOne.ToLower().Contains(query) ||
            p.PropertyTwo.ToLower().Contains(query) 
            ))
    .ToList();

int count = queryResult.Count();  // now this will be a linq-to-objects query

var returnData = queryResult
    .Skip(start).Take((length))
    .AsEnumerable();

but you'd have to try it to see if it would be any faster.

Wenda answered 23/3, 2015 at 19:4 Comment(8)
But, of course this will be problem if there are not few rows in the table.Hot
Possible, but it may still be faster that doing two table scans on the database.Wenda
Once you've called .ToList(), you've got a list in memory. Just access the Count property instead of calling the .Count() method -- though LINQ should see the underlying property and use that anyway.Catton
The only downside is I'm casting to a data transfer object with a SELECT statement I didn't show. That might change things. Let me update the question.Mosenthal
Then just add the Select to the base query - then you're only transferring the columns you need over the wire and it doesn't affect the Count.Wenda
I'll try it both ways and let you knowMosenthal
Unfortunately @DStanley, casting to the list and filtering in memory is actually slower on most tables.Mosenthal
@Catton fun fact: IEnumerable<T>.Count() internally looks at .Count on Lists (or ICollections) so the performance-impact is very slight. However; it does not do this for IEnumerable<T>.Any() which coincidentally means that on lists, .Count() is actually faster than .Any(). source: referencesource.microsoft.com/#System.Core/System/Linq/…Quadratics
A
0

How about something like this:

db.Products
   .Take(10)
   .Select(p => new 
                {
                    Total = db.Products.Count, 
                    Product = p
                })

If that's no good, it may help to know that in SQL, you get the total results and one page of them using OVER().

Allanite answered 11/6, 2022 at 9:45 Comment(0)
B
0

I have tried very hard to find an answer to this myself with no success.

I decided this is a limitation of LINQ to Entities, and made a stored procedure instead of spending more time in LINQ.

In the stored procedure, I would do something like this

--Get expensive query into temp table
SELECT *
INTO #t
FROM table
WHERE <expensive query here>

--Set the output param of your stored procedure
SET @totalRows = (SELECT COUNT(*) FROM #t);

--Select your final result set
SELECT *
FROM #t
OFFSET (0) ROWS
FETCH NEXT (50) ROWS ONLY;

--Cleanup
DROP TABLE IF EXISTS #t;
Brockbrocken answered 8/9, 2023 at 19:48 Comment(0)
D
0

Explanation:

Base Query:

The queryResult variable holds the filtered records from the database. This base query is used both for counting the total records and fetching the paginated data.

Parallel Execution:

countTask retrieves the total number of matching records with CountAsync(). returnDataTask applies pagination with Skip() and Take() and fetches the paginated data.

Performance Optimization:

Task.WhenAll(countTask, returnDataTask) executes both tasks concurrently, reducing the overall time spent compared to running them sequentially.

Projection:

The Select() method ensures that only the necessary fields (PropertyOne, PropertyTwo) are fetched, improving performance by reducing the data being transferred from the database.

    // Step 1: Filter the base query
    var queryResult = DatabaseContext.Table
        .Where(x => !x.IsDeleted)
        .Where(p => p.PropertyOne.ToLower().Contains(query) || p.PropertyTwo.ToLower().Contains(query));
    
    // Step 2: Run the COUNT and data fetch operations in parallel
    var countTask = queryResult.CountAsync(); // Get the total count of matching records
    
    var returnDataTask = queryResult
        .OrderBy(i => i.ID)
        .Skip(start).Take(length) // Pagination
        .Select(y => new ObjectDTO
        {
            PropertyOne = y.PropertyOne,
            PropertyTwo = y.PropertyTwo
        })
        .ToListAsync(); // Get paginated data
    
    // Step 3: Await both operations to complete
    await Task.WhenAll(countTask, returnDataTask);
    
    // Step 4: Retrieve the results
    int totalCount = countTask.Result;
    var returnData = returnDataTask.Result;
    
    // Step 5: Return the paginated data and total count
    return new PagedList<ObjectDTO>(returnData, totalCount);
Dynast answered 25/9 at 9:15 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.