Optimal SSIS data flow settings to load to stage table in Azure SQL DW

I have a 350MB table that's fairly wide, with two varchar(2000) columns. Via an SSIS data flow with an OLE DB "fast load" destination, it takes 60 minutes to load into Azure SQL DW. I changed the destination on that data flow to the Azure Blob Destination (from the SSIS Azure Feature Pack) and the same data flow completed in 1.5 minutes (and a Polybase load from that new flat file takes about 2 minutes).

For another source I have an existing 1GB flat file. An SSIS data flow into an OLE DB destination in Azure SQL DW takes 90 minutes. Copying the file to blob storage and loading it with Polybase takes 5 minutes.
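For context, the Polybase path is roughly the following shape (the credential, storage account, schemas, and object names below are placeholders for illustration, not my actual objects; the column list is abbreviated):

    -- One-time setup: credential, external data source, and file format over the blob container.
    CREATE MASTER KEY;

    CREATE DATABASE SCOPED CREDENTIAL BlobStorageCred
    WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

    CREATE EXTERNAL DATA SOURCE StageBlob
    WITH (TYPE = HADOOP,
          LOCATION = 'wasbs://stage@<storageaccount>.blob.core.windows.net',
          CREDENTIAL = BlobStorageCred);

    CREATE EXTERNAL FILE FORMAT PipeDelimitedText
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE));

    -- External table over the file(s) SSIS wrote to blob storage.
    CREATE EXTERNAL TABLE ext.MyWideTable
    (   Id     INT           NOT NULL,
        Notes1 VARCHAR(2000),
        Notes2 VARCHAR(2000)
    )
    WITH (LOCATION = '/mywidetable/',
          DATA_SOURCE = StageBlob,
          FILE_FORMAT = PipeDelimitedText);

    -- CTAS into a round-robin heap stage table runs in parallel on every compute node.
    CREATE TABLE stg.MyWideTable
    WITH (HEAP, DISTRIBUTION = ROUND_ROBIN)
    AS SELECT * FROM ext.MyWideTable;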

SSIS is SSIS 2014, running on an Azure VM in the same region as the Azure SQL DW. I know that bulk load is much slower than Polybase, since bulk load funnels through the control node while Polybase is parallelized across all compute nodes, but those bulk load numbers are still extremely slow.

What are the optimal settings for the SSIS data flow and destination in order to load to an Azure SQL DW stage table as fast as possible via bulk load? In particular, I'm interested in the optimal values for the following settings, in addition to any other settings I'm not considering:

  • Stage table geometry = HEAP (the fastest, I believe; see the DDL sketch after this list)
  • Data flow settings:
    • DefaultBufferMaxRows = ?
    • DefaultBufferSize = ?
  • OLEDB destination settings
    • Data access mode = Table or view - fast load
    • Keep Identity = unchecked
    • Keep Nulls = ?
    • Table Lock = ?
    • Check constraints = ?
    • Rows per batch = ?
    • Maximum insert commit size = ?
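The stage table the data flow loads into is a plain heap, along these lines (table and column names are illustrative, column list abbreviated):

    -- Round-robin heap stage table: no columnstore or index maintenance during the bulk load.
    CREATE TABLE stg.MyWideTable
    (   Id     INT           NOT NULL,
        Notes1 VARCHAR(2000),
        Notes2 VARCHAR(2000)
    )
    WITH (HEAP, DISTRIBUTION = ROUND_ROBIN);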
Stemware answered 17/3, 2016 at 6:0 Comment(0)

Polybase is certainly the fastest way to load to SQL DW. HEAP, as you suggested, is also the fastest destination type. Take a look at this article from the SQL CAT team on best practices for loading to Clustered Columnstore using SSIS. The recommendation from the engineering team here is to try adjusting DefaultBufferMaxRows (default is 10,000), DefaultBufferSize (default is 10 MB), Rows per batch, and Maximum insert commit size.

Many years ago I did extensive performance testing of SSIS against our on-premises version of Azure SQL Data Warehouse: PDW, also known as Parallel Data Warehouse, or APS, the Analytics Platform System. In that testing I often found that the local CPU was the bottleneck, specifically a single core. This can clearly be seen in Perfmon if you monitor CPU utilization by core.

There were a couple of things I was able to do to improve throughput. If you are CPU bound on a single core, running multiple concurrent SSIS packages will let you use more cores and will run faster. To do this you will need to break your source file into multiple files, and the destination should be multiple tables. If you partition your destination table and each load contains a different partition, you can use partition switching after loading your data to consolidate it into a single table.
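As a rough sketch of that consolidation step, assuming the per-loader staging table and the final table are partitioned, distributed, and typed identically, the staging table holds only partition 2's rows, and the target partition is empty (table names are illustrative):

    -- Move the loaded partition from the per-loader staging table into the consolidated table.
    ALTER TABLE stg.Sales_Load2
    SWITCH PARTITION 2 TO dbo.Sales PARTITION 2;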

You can also try creating multiple data flows in your package, which will achieve the same performance as running multiple SSIS loaders in parallel, but I believe you will still need to break your source file into multiple files and use multiple destination tables to maximize throughput.

Another approach I tried was having parallel loaders inside one data flow. While this was faster than one loader, it was slower than the prior two approaches I mentioned above.

I also found that having SSIS do the char to binary char conversion sped up loads. Also, using a SQL source was faster than using a text file as a source.

Another thing you can try is the SSIS Balanced Data Distributor. BDD is another way to utilize multiple cores on your source system without having to run multiple concurrent SSIS packages.

When you run your SSIS packages, do monitor the CPU using perfmon to see if you are running on a single core or spread over multiple cores. If you are pegging a single core, then that is most likely your bottleneck.

Also, regarding the VARCHAR(2000) columns: if you don't truly expect your incoming data to be that size, reduce the size of your VARCHAR columns. While we will improve this behavior in the future, our data movement service currently pads your VARCHAR data out to a fixed length, which means more data is being moved than necessary if the widest value is much less than 2000 characters.
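A quick way to check whether the columns can be narrowed is to look at the actual maximum data length (table and column names are illustrative):

    -- If the widest values are far below 2000 characters, shrink the column definitions
    -- so the data movement service pads, and ships, far fewer bytes per row.
    SELECT MAX(LEN(Notes1)) AS MaxNotes1Length,
           MAX(LEN(Notes2)) AS MaxNotes2Length
    FROM   stg.MyWideTable;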

I hope this helps.

Stellastellar answered 26/3, 2016 at 5:57 Comment(1)
Thanks Sonya. On the data flow which was taking 60 minutes, switching the stage table from columnstore to HEAP made it 2-3x faster, and maxing out DefaultBufferSize (which, due to the width of the row, resulted in 10,000-row buffers even when DefaultBufferMaxRows was 100,000) made it about another 2-3x faster. So now it's running in under 8 minutes. BDD didn't make a significant difference in this particular test (DWU400 with a mediumrc user). The other data flow destination settings I tested didn't make a significant difference either. I think we have found the top two culprits. - Stemware