How to calculate Azure SQL Data Warehouse DWU?
I am analyzing Azure SQL DW and came across the term DWU (Data Warehouse Unit). The Azure documentation gives only a crude definition of DWU. I want to understand how a DWU is calculated and how I should scale my system accordingly.

I have also referred to this link, but it does not cover my question:

Extrasystole answered 25/1, 2016 at 11:38 Comment(0)

In addition to the links you found it is helpful to know that Azure SQL DW stores data in 60 different parts called "distributions". If your DW is DWU100 then all 60 distributions are attached to one compute node. If you scale to DWU200 then 30 distributions are detached and reattached to a second compute node. If you scale all the way to DWU2000 then you have 20 compute nodes each with 3 distributions attached. So you see how DWU is a measure of the compute/query power of your DW. As you scale you have more compute operating on less data per compute node.
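The Gen1 mapping described above can be sketched as a quick back-of-the-envelope calculation. This is illustrative only, based on the numbers in this answer (100 → 1 node, 200 → 2 nodes, 2000 → 20 nodes), and only makes sense for DWU settings where the resulting node count divides 60 evenly:

```python
DISTRIBUTIONS = 60  # every Gen1 DW stores its data in exactly 60 distributions


def gen1_compute_nodes(dwu: int) -> int:
    # DWU100 -> 1 node, DWU200 -> 2 nodes, ..., DWU2000 -> 20 nodes
    return dwu // 100


def distributions_per_node(dwu: int) -> int:
    # as you scale up, each compute node works on fewer distributions
    return DISTRIBUTIONS // gen1_compute_nodes(dwu)


print(distributions_per_node(100))   # 60 -> all distributions on one node
print(distributions_per_node(2000))  # 3  -> 20 nodes, 3 distributions each
```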

Update: For Gen2 there are still 60 distributions but the DWU math is a bit different. DWU500c is one full size node (playing both compute and control node roles) where all 60 distributions are mounted. Scales smaller than DWU500c are single nodes that are not full size (meaning fewer cores and less RAM than full size nodes on larger DWUs). DWU1000c is 2 compute nodes each with 30 distributions mounted and there is a separate control node. DWU1500c is 3 compute nodes and a separate control node. And the largest is DWU30000c which is 60 compute nodes each with one distribution mounted.
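The Gen2 math above can be sketched the same way. Again this is just an illustration of the tiers named in this answer (500c → 1 node, 1000c → 2 nodes, 30000c → 60 nodes); the sub-500c tiers all run on a single, smaller node:

```python
DISTRIBUTIONS = 60  # Gen2 still uses 60 distributions


def gen2_compute_nodes(dwu_c: int) -> int:
    # DWU500c and smaller: a single node playing both compute and control roles;
    # above that, each additional 500c adds another full-size compute node
    return max(1, dwu_c // 500)


print(gen2_compute_nodes(500))    # 1
print(gen2_compute_nodes(1000))   # 2
print(gen2_compute_nodes(30000))  # 60 -> one distribution mounted per node
```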

Ott answered 26/1, 2016 at 2:26 Comment(9)
Can 100 DWU be roughly related to six cores in the underlying virtualized infrastructure?Preceptive
@Preceptive I have heard that before and the following DMV makes me believe you are right that the control node has 6 cores and each compute node has 6 cores. I'll let you know if I learn anything that contradicts that: select * from sys.dm_pdw_nodes_os_schedulers where status = 'VISIBLE ONLINE'Ott
The Gen2 core count per node is different (approximately 80 hyperthreaded cores on full size nodes). That DMV will show you the exact count on your DW.Ott
interesting Greg. were you able to find out how much memory per node as well?Hereinafter
@Hereinafter it’s around 300GB per node: learn.microsoft.com/en-us/azure/synapse-analytics/…Ott
i ran the DMV on a 500c and got 72 rows, on the 1000c I got 222 rows. Is the idea the number of rows corresponds to # schedulers, and thus # of vCPU?Hereinafter
That’s my understanding. And yes DW500c is one full size node playing both control and compute node roles. DW1000c is 2 compute nodes and one separate control node.Ott
For the 1000c then, 222/3 comes out to around 74 vCPU for each? Is that how you got to the around 80 cores per node (assuming as you go up, you get up to a max of around 80?)Hereinafter
@Hereinafter I’ve seen different node counts over time. I don’t think they specify this implementation detail since the underlying hardware can change over time.Ott

I just found this link, which shows the throughput-to-DWU relation:

Extrasystole answered 25/1, 2016 at 13:42 Comment(0)

You can also check out the dwucalculator. This site walks you through capturing your existing workload and recommends the number of DWUs necessary to run that workload in Azure SQL DW.

http://dwucalculator.azurewebsites.net/

Cholesterol answered 25/8, 2016 at 15:22 Comment(1)
This link is no longer valid.Approachable

You can choose a DWU level based on how long your workload currently takes, since compute scales roughly in proportion to DWU.

For example: if 100 DWU takes 15 minutes to process 3 tables, and you need the same work done in about 3 minutes (a 5x speedup), you might choose 500 DWU.
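This rule of thumb can be written as a one-liner. Note the assumption baked in here: performance scales linearly with DWU, which real workloads often do not achieve exactly.

```python
def suggested_dwu(current_dwu: int, current_minutes: float,
                  target_minutes: float) -> int:
    # scale DWU in proportion to the desired speedup (linear-scaling assumption)
    return round(current_dwu * current_minutes / target_minutes)


print(suggested_dwu(100, 15, 3))  # 500
```

You would then round the result up to the nearest DWU tier that the service actually offers.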

Aloud answered 26/12, 2018 at 11:18 Comment(0)