Simple select count(id) uses 100% of Azure SQL DTUs

This started off as this question, but it now seems more appropriate to ask it separately, since I have realised it is a DTU-related question.

Basically, running:

select count(id) from mytable

is taking between 8 and 30 minutes to run, whereas the same query on a local copy of SQL Server takes about 4 seconds.

EDIT: Adding a WHERE clause does not seem to help.

Below is a screenshot of the MONITOR tab in the Azure portal from when I ran this query. Note that I did this after not touching the database for about a week, and Azure was reporting I had used only 1% of my DTUs.

[Screenshot: Azure portal MONITOR chart showing resource usage over time]

A couple of extra things:

  • In this particular test, the query took 8 minutes 27 seconds to run.
  • While it was running, the above chart actually showed the DTU line at 100% for a period.
  • The database is configured Standard Service Tier with S1 performance level.
  • The database is about 3.3GB and this is the largest table (the count is returning approx 2,000,000).

I appreciate it might just be my limited understanding, but if somebody could clarify whether this is really the expected behaviour (i.e. a simple count taking so long to run and maxing out my DTUs), it would be much appreciated.

Contrastive answered 27/9, 2014 at 9:5 Comment(7)
The answers in the original question already cover what happens. The "simple" count is actually contrived and expensive: it has to scan the entire table. Don't do that. If you had a WHERE clause that used indexed fields, the optimizer would use an Index Seek operation, resulting in far better performance. A seek reads the index B-tree to find matching rows before calculating the total (i.e. far less IO, better speed); see the sketch after these comments. If you use a field with high selectivity you'll have to read far fewer index pages.Bodwell
Just FTR, the 'simple' count isn't contrived - it's what I need to do with/without various where clauses.Contrastive
Did you ever get to a satisfying answer why this is happening? I see the same thing with S2 (50 DTU) over a 127GB database with a single table of 550 million rows, and count(1) takes close to an hour.Stricture
I also see a similar thing when bulk-importing this data. I used freebcp utility, and the rate was abysmally low, something like 5000 rows per second. I wrote it off to the utility's inefficiency but I now checked, and that bulk insert also maxed out DTUs.Stricture
This DTU limit does not make sense to me: one of the comments in your other question on this topic suggested that P1 is similar to a quad core 12GB RAM server but P1 only has 125 DTU - a factor of 2.5 from what I have, and that is not enough to get the count to a few seconds range, where it should be.Stricture
Somebody said something about the pool size but I have no idea what it is and how to check or change that. Also, it does not seem to be a problem with a local DB. Any ideas?Stricture
P.S. A moderator made me repost as a comment, so I apologize for the many comments - it did not fit into one.Stricture
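
A sketch of what the commenters describe (the created_at column and index name here are hypothetical, not from the original schema): give the optimizer an index on the filtered column, and a selective WHERE turns the full scan into a seek.

-- hypothetical: index the column that the where clause filters on
create nonclustered index ix_mytable_created_at on mytable (created_at);

-- with a selective predicate on the indexed column, the optimizer can
-- seek into the index b-tree and read far fewer pages than a full scan
-- (assuming id is the clustered primary key, it is carried in the
-- nonclustered index, so the count is covered)
select count(id) from mytable where created_at >= '20140901';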

From the query stats in your previous question we can see:

300ms CPU time
8000 physical reads

8:30 is about 500 seconds. We are certainly not CPU bound: 300 ms of CPU over 500 seconds is almost no utilization. That works out to 16 physical reads per second (8000 / 500), which is far below what any physical disk can deliver. Also, the table is not fully cached, as evidenced by the presence of physical IO.
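
For anyone wanting to reproduce numbers like these, SQL Server reports per-statement IO and CPU figures for the session (this works both on a local instance and on Azure SQL):

-- the messages output now shows logical/physical reads plus
-- cpu and elapsed time for each statement
set statistics io on;
set statistics time on;

select count(id) from mytable;

set statistics io off;
set statistics time off;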

I'd say you are throttled. S1 corresponds to

934 transactions per minute

for some definition of transaction. That's about 15 transactions per second. Maybe you are hitting a limit of one physical IO per transaction?! 15 and 16 are suspiciously similar numbers.

Test this theory by upgrading the instance to a higher scale factor. You might find that SQL Azure Database cannot deliver the performance you want at an acceptable price.

You also should find that repeatedly scanning half of the table results in a fast query because the allotted buffer pool seems to fit most of the table (just not all of it).
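
One way to check the throttling theory (assuming your database exposes the sys.dm_db_resource_stats DMV, which Azure SQL Database populates in roughly 15-second intervals): run the count, then look at whether data IO pinned at 100% while CPU stayed low.

-- recent resource usage as a percentage of the service tier's limits;
-- avg_data_io_percent near 100 with low avg_cpu_percent indicates
-- io throttling rather than a cpu bottleneck
select top 20 end_time, avg_cpu_percent, avg_data_io_percent, avg_log_write_percent
from sys.dm_db_resource_stats
order by end_time desc;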

Curvature answered 8/10, 2014 at 8:4 Comment(4)
Thanks for your analysis here. I have not had time to verify this (if/when I do I will mark it as correct) but I think you are probably right here - I was just surprised that a simple count...where could take up so many DTUs. I am not super experienced with this side of SQL Server though so it might be obvious to many that this is such an expensive operation.Contrastive
Define "expensive". It requires scanning the table (or some index of it). I think it makes sense that this produces IO in proportion to the table size. It produces negligible CPU load.Curvature
I guess then I mean expensive in terms of DTUs which is essentially what you are paying for with the different Azure SQL levels.Contrastive
The count() has nothing to do with this query being expensive. Exceeding the allotted buffer pool memory is the root cause. Any query that does this has exactly the same problem. You could run a count() on a table half this big and it would run in 150 ms of CPU time and probably about that much wall-clock time.Curvature

I had the same issue. Updating the statistics on the table with a full scan solved it:

update statistics mytable with fullscan
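
To judge whether stale statistics were plausibly the cause, you can check when each statistics object on the table was last updated, before and after running the command above (a sketch using the standard DMVs; mytable as in the question):

-- when was each statistics object last updated, and how many rows
-- have changed since then?
select s.name as stats_name, sp.last_updated, sp.rows, sp.modification_counter
from sys.stats as s
cross apply sys.dm_db_stats_properties(s.object_id, s.stats_id) as sp
where s.object_id = object_id('mytable');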
Proceleusmatic answered 12/7, 2017 at 6:47 Comment(1)
I'm curious - what does running the above query change? How did it solve it?Abubekr

select count should perform a clustered index scan if one is available and up to date. Azure SQL should update statistics automatically, but it does not rebuild indexes automatically if they are completely out of date.

If there's a lot of INSERT/UPDATE/DELETE traffic on that table, I suggest manually rebuilding the indexes every once in a while; a sketch follows the links below.

http://blogs.msdn.com/b/dilkushp/archive/2013/07/28/fragmentation-in-sql-azure.aspx

and this SO post for more info:

SQL Azure and Indexes
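
A sketch of that maintenance (the 30% threshold is the commonly cited rule of thumb, not something from this thread):

-- check fragmentation of every index on the table
select i.name as index_name, ps.avg_fragmentation_in_percent, ps.page_count
from sys.dm_db_index_physical_stats(db_id(), object_id('mytable'), null, null, 'LIMITED') as ps
join sys.indexes as i on i.object_id = ps.object_id and i.index_id = ps.index_id;

-- rebuild when fragmentation is high (commonly above 30%)
alter index all on mytable rebuild;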

Jhansi answered 7/10, 2014 at 7:57 Comment(5)
Thanks, I have been rebuilding the indexes, I think I am using the SQL in the MSDN article you linked to. This table appears to have a CLUSTERED INDEX which is 0.2% fragmented. So as far as I can tell this isn't the issue.Contrastive
@Contrastive You are using a contrived example that's guaranteed to result in expensive reads (in IO terms). It's not representative of real queries. If you used even a single indexed field in the WHERE clause, performance would be far better.Bodwell
@Contrastive I think this is where we're going to need to take a look at statistics and execution plans. Here's a basic guide on how to turn them on: solidq.com/tuning-sql-azure-databases-part-1Jhansi
@Jhansi Check the original question, this was already covered. You'll also find the statistics there. I think the OP should try to measure a realistic query instead of a query that's guaranteed to scan the tableBodwell
Appreciate the replies, a few things. 1) Is this example really that contrived? I need to run this or similar queries regularly and adding a where clause doesn't help. 2) Running the same query on my laptop takes < 4 seconds, on Azure it takes 8 minutes (and in some tests up to 30 minutes!). Why would it be so different? 3) When I put an additional where clause in it is still slow - currently running at over 5 minutes.Contrastive
