"select count(id) from table" takes up to 30 minutes to calculate in SQL Azure

I have a database in SQL Azure which is now taking between 15 and 30 minutes to run a simple:

select count(id) from mytable

The database is about 3.3GB and the count returns approx 2,000,000, but when I run the same query locally it takes less than 5 seconds!

I have also run:

ALTER INDEX ALL ON mytable REBUILD

on all the tables in the database.
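
(As an aside, fragmentation before and after a rebuild can be checked with something along these lines - a sketch using the sys.dm_db_index_physical_stats DMV, assuming it is available on this tier, with mytable being the table above:)

-- approximate fragmentation and page count for each index on mytable ('LIMITED' is the cheapest scan mode)
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('mytable'), NULL, NULL, 'LIMITED');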

I would appreciate it if anybody could point me to some things to try to diagnose/fix this.

(Please skip to UPDATE 3 below, as I now think that is the issue, but I still do not understand it.)

UPDATE 1: It appears to spend 99% of the time in a clustered index scan, as the execution plan screenshot below shows.

[execution plan screenshot: clustered index scan]

UPDATE 2: This is what the statistics messages report when I run:

SET STATISTICS IO ON
SET STATISTICS TIME ON
select count(id) from TABLE

Statistics:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 317037 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

(1 row(s) affected)
Table 'TABLE'. Scan count 1, logical reads 279492, physical reads 8220, read-ahead reads 256018, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 297 ms,  elapsed time = 438004 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

UPDATE 3: OK - I have another theory now. The Azure portal suggests that every time I test this simple select query it maxes out my DTU percentage at nearly 100%. I am using a Standard Azure SQL instance at performance level S1 (20 DTUs). Is it possible that this simple query is being throttled by my DTU limit?
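
(If the DTU theory is right, a query along these lines against sys.dm_db_resource_stats - a DMV Azure SQL Database exposes with roughly 15-second granularity over the last hour - should show whether it is CPU or data IO hitting the cap while the count runs; this is just a sketch, not something I have verified on this tier:)

-- recent resource usage for this database, newest first
SELECT TOP 20 end_time, avg_cpu_percent, avg_data_io_percent, avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;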

Zebe answered 14/9, 2014 at 7:47 Comment(6)
Have you checked to make sure there is no deadlock?Deepfry
Not formally but I am pretty sure there is not - I have turned off any updates so my query is the only thing which should be hitting the DB.Zebe
Q: So did you ever resolve your Azure performance problem?Bandeau
@FoggyDay Not yet, I think I have narrowed it down to something to do with my understanding of DTUs, but I need to find time to investigate further. I can't understand how a simple count(id) can max out my quota, so I must be missing something.Zebe
I have re-stated this question more clearly here (#26073579), without all the earlier noise, now that it has been narrowed down to DTUs.Zebe
The main answer is that DTU is a terrible metric because some queries are IO bound and others CPU bound, and DTU is some black box "blend" of both. Upgrade to a higher performance tier, but even then you may be throttled... Seems to be a design choice by MS - it's about small transactions, not aggregation or analytics.Putout

I realize this is old, but I had the same issue. I had a table with 2.5 million rows that I imported from an on-premises database into Azure SQL and ran at the S3 level. Select Count(0) from Table resulted in a 5-7 minute execution time vs milliseconds on-premises.

In Azure, index and table scans seem to be penalized tremendously in performance, so adding a 'useless' WHERE clause that forces an index seek on the clustered index helped.

In my case, the following performed almost identically to the on-premises query:

Select count(0) from Table where id > 0

Characharabanc answered 10/2, 2016 at 20:2 Comment(2)
WOW. This worked for me too. I have a rather complex query which contains a subquery. When I added the "useless" where clause to the subquery it went from 22 seconds to 1 second on Azure and from 600ms to 160ms locally. However, this is a query I have control over. My app also uses EF, which causes me enough uncertainty to look into moving back to a VM.Broca
I have a count query that already uses a non-clustered index and it takes 20 sec to count 1m rows; I tried your suggestion of adding a dummy PK condition, but no luck...Intrigue

Suggestion: try select count(*) instead - it might actually improve the response time:
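
select count(*) from mytable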

Also, have you done an "explain plan"?

============ UPDATE ============

Thank you for getting the statistics.

You're doing a full table scan of 2M rows - not good :(

POSSIBLE WORKAROUND: query the row_count column of the sys.dm_db_partition_stats system view instead:

http://blogs.msdn.com/b/arunrakwal/archive/2012/04/09/sql-azure-list-of-tables-with-record-count.aspx

select t.name ,s.row_count from sys.tables t
join sys.dm_db_partition_stats s
ON t.object_id = s.object_id
  and t.type_desc = 'USER_TABLE'
  and t.name not like '%dss%'
  and s.index_id = 1
Bandeau answered 14/9, 2014 at 7:59 Comment(11)
The select count(*) has not helped - actually, after a long period I got "A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)" I will check your other points soon.Zebe
Have amended my question above after also viewing the plan - it seems a clustered index scan is taking up all the time. Should this really take so long, though, for a simple select count(id)?Zebe
Thank you for getting the statistics. You're doing a full table scan of 2M rows - that's bad. POSSIBLE WORKAROUND: query system table "row_count" instead: blogs.msdn.com/b/arunrakwal/archive/2012/04/09/…Bandeau
Thanks for the response. The problem is, this is only a sample query - but what if I want to do something like "select count(id) from TABLE where column = 1" for example? It is still slow. Should such a simple query really be so slow? Is 2M rows really a lot for SQL Server such that it would take over 10 minutes? Do I need to manually add new indexes? I thought part of the SQL Azure benefit was minimal management.Zebe
Should have also reiterated - this query runs in 4 seconds on my local machine! So does the index scan really explain this?Zebe
1) Adding indexes will NOT help "count()". 2) Yes, the problem is clearly Azure. But a) what exactly IS the problem with Azure? b) what can you do about it? 3) Your suggestion about maxing out DTUs is a good one. Further reading: Performance in the new Azure.Bandeau
@Zebe If column is indexed and highly selective the performance won't be slow. The Database Engine Tuning Advisor can suggest what indexes you need to improve performance of your queriesMartino
@Panagiotis Kanavos: there's no "where" clause - it's just counting all the rows in a table. Q: Do you really think an index would help? MAIN QUESTION: Do you have any ideas why Azure is so many orders of magnitude slower than ordinary MSSQL? Q: Do you think DTUs might indeed be a factor? Q: Any other suggestions for troubleshooting Azure?Bandeau
You miss the point. Comparing queries with and without criteria is pointless: the first will use no indexes, the second will use indexes. The difference can be several orders of magnitude. Create a realistic query and worry only if it is slow. As for being slower - DTU or IOPS limits mean there is IO and CPU throttling, while your desktop has full and dedicated access to a 6Gb/s SATA drive. No problem with Azure; you bought a slow-IO instance for this kind of query.Martino
No, you're missing the point. I want to do a gosh darn "select count(*)", and I'd like for it to take closer to 5 seconds than 20 minutes. It sounds like "Standard Azure SQL instance with performance level S1 (20 DTUs)" is the culprit. Caveat emptor?Bandeau
I've had similar issues on a very small 500 megabyte table - full scans on this table take milliseconds when running from a SQL Server VM locally, but in Azure on a 10 DTU instance, it takes minutes to do a simple SELECT max(column). The myth of "full table scans are bad" persists. If my workload is analysis & aggregation, SQL Azure significantly underperforms. It really seems it's designed for OLTP workloads that can be easily sharded.Putout

Quick refinement of @FoggyDay's post: if your tables are partitioned, you'll want to sum the row counts.

SELECT t.name, SUM(s.row_count) row_count
FROM sys.tables t
JOIN sys.dm_db_partition_stats s
ON t.object_id = s.object_id
  AND t.type_desc = 'USER_TABLE'
  AND t.name not like '%dss%'
  AND s.index_id = 1
GROUP BY t.name
Catharine answered 26/2, 2019 at 14:12 Comment(0)

This is an old question, but I found it while I was looking into a similar problem. If you just want to know the number of records in the table (which is what this looks like), then you can query the SQL metadata as follows:

SELECT row_count
FROM sys.dm_db_partition_stats
where object_id = object_id('[dbo].[mytable]') and index_id <= 1

I think this should give you the information you want, but at a much more reasonable time and cost (the index_id <= 1 filter covers both a heap and a clustered index, so rows are not double-counted across non-clustered indexes).

Margrettmarguerie answered 4/12, 2023 at 10:8 Comment(0)
