Select COUNT(*) of subquery without running it twice
Asked Answered
T

5

19

I've got a procedure to return a result set which is limited by page number and some other stuff. As an OUTPUT parameter I need to return a total amount of selected rows according to the parameters except the page number. So I have something like that:

WITH SelectedItems AS
(SELECT Id, Row1, Row2, ROW_NUMBER() OVER (ORDER BY Row1) AS Position
FROM Items
WHERE Row2 = @Row2)
SELECT Id, Row1, Row2
FROM SelectedItems
WHERE Position BETWEEN @From AND @To

And then I need to set the OUTPUT parameter to the number of rows in the innerquery. I can just copy the query and count it, but this query could returns thousands of rows (and will be more in the future), so I am looking for method to do that with a good performance. I was thinking about table variables, is it a good idea? Or any other suggestions?

To be more specific, it's the Microsoft SQL Server 2008.

Thank you, Jan

Tweeddale answered 24/2, 2009 at 11:34 Comment(0)
C
19

You can count the total rows as a separate column in your main query using COUNT(*). Like this:

WITH SelectedItems AS
(SELECT Id, Row1, Row2, ROW_NUMBER() OVER (ORDER BY Row1) AS Position, 
COUNT(*) OVER () AS TotalRows
FROM Items
WHERE Row2 = @Row2)
SELECT Id, Row1, Row2
FROM SelectedItems
WHERE Position BETWEEN @From AND @To

This will return the count in your result set rather than in a output parameter, but that should fit your requirements. Otherwise, combine with a temp table:

DECLARE @tmp TABLE (Id int, RowNum int, TotalRows int);

WITH SelectedItems AS
(SELECT Id, Row1, Row2, ROW_NUMBER() OVER (ORDER BY Row1) AS Position, 
COUNT(*) OVER () AS TotalRows
FROM Items
WHERE Row2 = @Row2)
INSERT @tmp
SELECT Id, Row1, Row2
FROM SelectedItems
WHERE Position BETWEEN @From AND @To

SELECT TOP 1 @TotalRows = TotalRows FROM @tmp
SELECT * FROM @tmp

You will find using a temp table for just your paged result will not use much memory (depending on your page size of course) and you're only keeping it live for a short period of time. Selecting the full result set from the temp table and selecting the TotalRows will only take a tiny bit longer.

This will be much faster than running a totally separate query, which in my test (repeating the WITH) doubled the execution time.

Chaffer answered 24/2, 2009 at 15:46 Comment(2)
+1 for Count(*) over() but wouldn't Count(1) over() would be better?Lipson
It makes no difference in performance terms, SQL internally changes count() to the same as count(1). This is the case in MySql also, where it translates count() to count(0). Although count(1) is quicker to type!Chaffer
B
3

I think you should do it in a separate query. While those two queries might look pretty much the same, but the way query optimizer deals with them would differ pretty significantly.

Theoretically, SQL Server might not even go through all the rows in the subquery to be able to count it.

Boating answered 24/2, 2009 at 11:38 Comment(0)
C
2

I don't have access to my code base right now, but I believe that you can use COUNT() OVER (or a similar command) to return the total number of rows as part of the subquery. You can then return that as part of the final result set. It gets duplicated in every row, but that's a minor performance hit in my opinion for an application that is using paging and should have limited final results anyway.

In a couple hours I'll post the exact code.

EDIT: Here's the line that I've used to generate the count. In the end our developers wanted a separate method to get the count by itself, so now I'm maintaining the search criteria in two places within the same stored procedure.

COUNT(*) OVER (PARTITION BY '') AS TotalCount

Add that to your CTE and then you can select the TotalCount and it will be a column in each of your rows.

Casmey answered 24/2, 2009 at 12:3 Comment(0)
S
2

You MUST run the entire query, without limiting the range, at least once in order to get the full row count. Since you're going to do this anyway, you should select @@RowCount in order to output the total rows found, rather than overloading your data reader with a redundant count(*) column in every row.

1. When running NEW query for first time:

select YOUR_COLUMNS 
from YOUR_TABLE 
where YOUR_SEARCH_CONDITION 
order by YOUR_COLUMN_ORDERING_LIST;
select @@rowcount;

2. Only READ the first X rows

The above query avoids flooding your SqlDataReader with a redundant COUNT(*) column, that would otherwise be sent for every call to SqlDataReader.Read(). Since you are running the query for the first time... instead of selecting a range, just READ only the first X rows. This gives you exactly what you want... the full result count, the first X records, and efficient streaming of the result set without the redundant count column.

3. For subsequent runs of the SAME query to get a subset of the results

select YOUR_COLUMNS 
from (select YOUR_COLUMNS, ROW_NUMBER() 
over(order by BY YOUR_COLUMN_ORDERING_LIST) as RowNum) Results 
where Results.RowNum between @From and @To;

In any case, @@rowcount is the most direct way to access the count on the first run of the query without limiting the result set (ur gonna want first X results anyway), without running a separate count() query, without using a temp table, and without including a redundant count() column.

Sucrase answered 17/5, 2010 at 19:25 Comment(0)
H
1

Couldn't you just set the output variable to @@RowCount? This will get the rows affected by the last executed statement:

SELECT stuff FROM mytable

SET @output = @@ROWCOUNT

This should give you what you need, and doesn't involve running the query again.

Hectare answered 25/2, 2009 at 3:13 Comment(1)
Afraid not, this would give the number of records in the selected page, and not the total records in the set.Chaffer

© 2022 - 2024 — McMap. All rights reserved.