SSIS vs. DTS performance
Seems crazy to be doing this at this late date, but...

I am rebuilding some ETL infrastructure with a Rocket Software UniVerse source and a SQL Server destination. The old destination platform was SQL 2000 on Windows Server 2003; the new platform is SQL 2012 on Windows Server 2012. In both cases, an ODBC driver is used to connect to the source. Everything seems to work fine on the new platform, but package execution is dramatically slower. For example, one table with roughly 1.3 million rows and 28 columns takes about an hour using SQL 2000/DTS and over 3.5 hours using SQL 2012/SSIS. Both SQL Servers are virtualized on XenServer; the 2012 server has more RAM and more vCPUs, and neither machine has an advantage in disk infrastructure. No metrics (memory, disk I/O, etc.) are red-lining, or really even coming close, on the 2012 server during package execution.

I have read several forum posts describing the same scenario, but none really seemed to have a solution that works for me. Since all of these posts were quite dated (most of these conversions from DTS to SSIS happened in the SQL 2005 days), I was curious if there was any fresher info out there.

The packages are very simple table copies with no transforms. I am using a "SELECT column, column,.. FROM sourcetable" query for my source connection and "Table or View - Fast Load" for my destination. The slowdown APPEARS to be on the source side of the equation, though I can't be certain.

Any help appreciated.

Fuller asked 31/5, 2013 at 19:08 Comment(3)
You have my condolences on still working with DTS ;) That said, it might be interesting to remove your current destination and pipe those results to a script task or something that can act as a data sink without having an impact on performance. The goal is to see the theoretical maximum rate at which SSIS is able to pull data from your source. Run that N times from the server to establish a baseline. If it's comparable to the time with the destination, well, that'd be interesting... My next place to look would be whether changing the buffer row count to a lower value improves performance. – Immixture
Here's an article that talks about an approach for a tardy SQL Server source: sqlblog.com/blogs/rob_farley/archive/2011/02/17/… – Immixture
billinkc, I don't know how I can reward you for this, but you are my hero for the day, and now I am off to the weekend with a feeling of triumph in place of defeat! Rather counter-intuitive to shrink the buffer, but that was the magic bullet. I will do some fine tuning next week, but reducing the buffer row count from the default of 10,000 rows to 10 reduced the execution time of one package from an average of an hour and change to 23 minutes, equivalent to the old DTS times. Can't thank you enough! – Fuller

One option to investigate is lowering the buffer size in your data flow. By default, it's set at 10,000 rows. If you have a slow data source, it can take quite a while to fill up the "bucket" of data just to start sending a batch of information down to the destination. While it might seem counterintuitive, lowering that number can increase performance, as 5,000, 1,000, or 100 rows of data fill the bucket much sooner. That data then gets shuffled through the data flow and lands in the destination while buckets 2, 3, etc. are being filled.
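
The property in question is DefaultBufferMaxRows on the Data Flow Task. As a hedged sketch, you can change it in the designer, or override it at execution time with dtexec's /SET option; the package and task names below are hypothetical, so adjust them to your own:

    REM Hypothetical package and task names; substitute your own.
    dtexec /F "ExtractUniVerse.dtsx" /SET "\Package\Data Flow Task.Properties[DefaultBufferMaxRows];10"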

If you have a SQL Server source, you can optimize your query with the OPTION (FAST N) hint, telling the optimizer you'd like a fast first N rows, which you'd align with your SSIS package's buffer row count.
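
As a minimal sketch of that hint (table and column names are placeholders), with N matched to the data flow's buffer row count:

    SELECT col1, col2, col3
    FROM dbo.SourceTable
    OPTION (FAST 10000); -- favor a plan that returns the first 10,000 rows quickly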

See Rob Farley's article for more details about that.

Immixture answered 1/6, 2013 at 0:44 Comment(0)
