I have a 350MB table that's fairly wide with two varchar(2000) columns. Via an SSIS data flow it takes 60 minutes to load via OLEDB "fast load" destination to Azure SQL DW. I changed the destination on that data flow to be the Azure Blob Destination (from the SSIS Azure feature pack) and that same data flow completed in 1.5 minutes (and Polybase from that new flat file takes about 2 minutes).
For another source I have an existing 1GB flat file. SSIS data flow into an OLEDB destination in Azure SQL DW takes 90 minutes. Copy the file to blob storage and Polybase load takes 5 minutes.
SSIS is SSIS 2014 and it's running on an Azure VM in the same region as Azure SQL DW. I know that bulk load is much slower than Polybase since bulk load funnels through the control node but Polybase is parallelized on all compute nodes. But those bulk load numbers are extremely slow.
What are the optimal settings for the SSIS data flow and destination in order to load to an Azure SQL DW stage table as fast as possible via bulk load? Particularly I'm interested in the optimal value for the following settings in addition to any other settings I'm not considering:
- Stage table geometry = HEAP (is the fastest I believe)
- Data flow settings:
- DefaultBufferMaxRows = ?
- DefaultBufferSize = ?
- OLEDB destination settings
- Data access mode = Table or view - fast load
- Keep Identity = unchecked
- Keep Nulls = ?
- Table Lock = ?
- Check constraints = ?
- Rows per batch = ?
- Maximum insert commit size = ?