Source control in SSIS and Concurrent work on dtsx file
Asked Answered
J

4

7

I am working on building a new SSIS project from scratch. I want to work with couple of my teammates. I was hoping to get a suggestion on how we can have some have some source control, so that few of us can work concurrently on the same SSIS project (same dtsx file, building new packages.) Version: SQL Server Integration Service v11 Microsoft Visual Studio 2010

Jocund answered 11/5, 2016 at 19:17 Comment(4)
Merging changes in SSIS is nightmare fuel.Berhley
I hear ya Siyual. Looking for a better way than manually tracking dtsx file for mergesJocund
I've used TFS for SSIS version control, which works well if you only have one dev actively working on it at the same time. Otherwise, merging conflicting changes becomes a large pain.Berhley
@kushalbhola: I'd add my (bitter) experience to the other commenters. Merge and diff simply do not work for SSIS packages. Just don't even try to do it. Split packages into smaller chunks, as suggested by another commenter, and never have more than one dev working on a package.Simms
H
10

It is my experience that there are two opportunities for any source control system and SSIS projects to get out of whack: adding new items to the project and concurrent changes to an existing package.

Adding new items

An SSIS project has the .dtproj extension. Inside there, it's "just" XML defining what all belongs to the project. At least for 2005/2008 and 2012+ on the package deployment model. The 2012+ project deployment model carries a good bit more information about the state of the packages in the project.

When you add new packages (or project level connection managers or .biml files) the internal structure of the .dtproj file is going to change. Diff tools generally don't handle merging XML well. Or at all really. So, to prevent the need for merging the project definition, you need to find a strategy that works for you team.

I've seen two approaches work well. The first is to upfront define all the packages you think you'll need. DimFoo, DimDate, DimFoo, DimBar, FactBlee. Check that project and the associated empty packages in and everyone works on what is out there. When the initial cut of packages is complete, then you'll ensure everyone is sync'ed up and then add more empty packages to the project. The idea here is that there is one person, usually the lead, who is responsible for changing the "master" project definition and everyone consumes from their change.

The other approach requires communication between team members. If you discover a package needs to be added, communicate with your mates "I need to add a new package - has anyone modified the project?" The answer should be No. Once you've notified that a change to the project definition is coming, make it and immediately commit it. The idea here is that people commit and sync/check in whatever terminology with great frequency. If you as a developer don't keep your local repository up to date, you're going to be in for a bad time.

Concurrent edits

Don't. Really, that's about it. The general problem with concurrent changes to an SSIS package is that in addition to the XML diff issue above, SSIS also includes layout data alongside tasks so I can invert the layout and make things flow from bottom to top or right to left and there's no material change to SSIS package but as Siyual notes "Merging changes in SSIS is nightmare fuel"

If you find your packages are so large and that developers need to make concurrent edits, I would propose that you are doing too much in there. Decompose your packages into smaller, more tightly focused units of work and then control their execution through a parent package. That would allow a better level of granularity to your development and debugging process in addition to avoiding the concurrent edit issue.

Happenstance answered 11/5, 2016 at 19:41 Comment(3)
I totally agree with all this, but want to underscore the point about concurrent edits. If you come to a point where two people need to edit the same package, it's doing too much. You'll get more efficient and maintainable code if you treat SSIS packages like classes - they should have a single concern like extracting, cleansing, conforming or loading, but not do any of these activities together.Doscher
Excellent summary. SSIS and source-control don't really work well together at all, because of the layout information in the file. You only have to click on a dataflow to see what's inside to get an automatic checkout, which you have to then remember to undo (or did I make any changes? Did I? Diff.. doesn't work). The potential for mistakes is enormous, and so you have to substitute clear, hard human procedures (of the kind billinkc suggests) for source-control's inadequacies with SSIS.Simms
Agree that packages lend themselves to one editor at a time, but you can use smart diff to diff your SSIS packages from the free BIDS Helper (BI Developer Extensions) for Visual Studio - bideveloperextensions.github.io I'm using it and its a godsend when trying to compare packages - it pre-sorts the elements and ignores the fluff.Vertical
C
1

A dtsx file is basically just an xml file. Compare it to a bunch of people trying to write the same book. The solution I suggest is to use Team Foundation Server as a source control. That way everyone can check in and out and merge packages. If you really dont have that option try to split your ETL process in logical parts and at the end create a master package that calls each sub packages in the right order.

An example: Let's say you need to import stock data from one source, branches and other company information from an internal server and sale amounts from different external sources. After u have all information gathered, you want to connect those and run some analyses.

You first design the target database entities that you need and the relations. One of your member creates a package that does all the import to staging tables. Another guy maybe handles external sources and parallelizes / optimizes the loading. You would build a package that in merges your staging and production tables, maybe historicizing and so on. At the end you have a master package that calls each of the mentioned packages and maybe some additional logging or such.

Childs answered 11/5, 2016 at 19:47 Comment(0)
F
0

In our multi-developer operation, we follow this rough plan:

  • Each dev has their own branch, separate from master branch
  • Once a week, devs push all their changes to remote
  • One of us pulls all changes, and merges all branches into master, manually resolving .dtproj conflicts as we go
  • Merge master in all dev branches - now all branches agree
  • Test in VS
  • Push all branches to remote, other devs can now pull and keep working

It's not a perfect solution, but it helps quarantine the amount of merge pain we have to experience.

Fruin answered 17/9, 2019 at 3:54 Comment(0)
R
0

We have large ssis solutions with 20+ packages in one solution, with TFS Git. One project required adding a bunch of new packages to the existing solution. We thought we were smart and knew to assign only one person to work on each new package, 2 people working on the same package would be suicide. Wasn't good enough. When 2 people tried add a different named, new, package at the same time, each showed dtproj as a file that had changed/needed to be checked in and suddenly I found myself looking at the xml for dtproj and trying to figure out which lines to keep (Microsoft should never ask end users to manually edit their internal files, which only they wrote and understand). Billinkc's solutions here are very good and the problem is very real. You may think that Microsoft is the great Wise One, and that your team can always add new packages to an existing solution without conflicts, but you'd be wrong. It also doesn't work to put dtproj in .gitignore. If you do that, you won't see other peoples new packages (actually the .dtsx file will come down in git, but you won't see that package in Solution Explorer because dtproj is what feeds Solution Explorer). This is a current problem (2021) and we are using Visual Studio 2017 Enterprise with SSDT.

To explain this problem to people, git obviously can handle a group of independent, individual files in a directory (like say .bat files) and can add, change, and delete those files easily. The problem comes in when you have a file that is naming, describing, and counting all the files in a directory (what dtproj does). When you have a file like dtproj you are creating a conflict on dtproj itself, when 2 people try to a add a new package at the same time. Your dtproj file has a line that shows the package you added, and my dtproj file shows the package I added, and tfs/git sees that as a Conflict.

Some are suggesting ways to deal with this if you have to add a lot of new packages, my idea is a little different. For the people who have to add new packages, don't work in the primary solution where this problem is, work somewhere else. Probably best to work in the "Projects" directory you get when you install Visual Studio, outside of TFS/Git. Obviously follow all the standards, Variable naming, and Package Configuration conventions for the target Solution. Then when the new packages are ready, give the .dtsx files to your Solution Gatekeeper for them to check in. Only the Gatekeeper can check in new packages using Add From Existing, avoiding conflicts. Once the package is checked in, developers can work on them in the main Solution.

Rumpus answered 17/3, 2021 at 21:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.