In addition to the links you found it is helpful to know that Azure SQL DW stores data in 60 different parts called "distributions". If your DW is DWU100 then all 60 distributions are attached to one compute node. If you scale to DWU200 then 30 distributions are detached and reattached to a second compute node. If you scale all the way to DWU2000 then you have 20 compute nodes each with 3 distributions attached. So you see how DWU is a measure of the compute/query power of your DW. As you scale you have more compute operating on less data per compute node.
Update: For Gen2 there are still 60 distributions but the DWU math is a bit different. DWU500c is one full size node (playing both compute and control node roles) where all 60 distributions are mounted. Scales smaller than DWU500c are single nodes that are not full size (meaning fewer cores and less RAM than full size nodes on larger DWUs). DWU1000c is 2 compute nodes each with 30 distributions mounted and there is a separate control node. DWU1500c is 3 compute nodes and a separate control node. And the largest is DWU30000c which is 60 compute nodes each with one distribution mounted.