Distributed Version Control Systems and the Enterprise - a Good mix? [closed]
I can see why distributed source control systems (DVCS - like Mercurial) make sense for open source projects.

But do they make sense for an enterprise, compared with a centralized source control system such as TFS?

What features of a DVCS make it better or worse suited than a centralized system for an enterprise with many developers?

Defelice answered 15/4, 2011 at 23:18 Comment(1)
Off topic, but I believe the acronym DVCS is much more common/accepted than DSCS. – Byblow
I have just introduced a DVCS (Git in this case) in a large banking company, where Perforce, SVN or ClearCase was the centralized VCS of choice.
I already knew of the challenges (see my previous answer "Can we finally move to DVCS in Corporate Software? Is SVN still a 'must have' for development?")

I have been challenged on three fronts:

  • centralization: while the decentralized model has its merits (and allows for private commits or working without the network while having access to the full history), there still needs to be a clear set of centralized repos, acting as the main reference for all developers.

  • authentication: a DVCS allows you to "sign off" (commit) your code as... pretty much anyone (author "foo", email "[email protected]").
    You can do a git config user.name foo, or git config user.name whateverNameIFeelToHave, and have all your commits carry bogus names.
    That doesn't mix well with the unique centralized "Active Directory" user base that big enterprises rely on.

  • authorization: by default, you can clone, push to or pull from any repository, and modify any branch or any directory.
    For sensitive projects, that can be a blocking issue (the banking world is usually very protective of certain pricing or quant algorithms, which require strict read/write access for a very limited number of people).
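The authentication concern above is easy to demonstrate in a throwaway repository; this sketch (the name and email are invented) shows that Git records whatever identity the client claims:

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo"

# Claim any identity you like; nothing checks it against a directory.
git config user.name  "whateverNameIFeelToHave"
git config user.email "foo@bar.example"

echo hello > file.txt
git add file.txt
git commit -q -m "a commit under a bogus identity"
git log -1 --format='%an <%ae>'   # -> whateverNameIFeelToHave <foo@bar.example>
```

This is exactly why identity has to be enforced on the server side, as described in the answer below.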

The answer (for a Git setup) was:

  • centralization: a dedicated server has been set up for every repository that must be accessible to all users.
    Backups have been taken care of (incremental every day, full every week).
    A DRP (Disaster Recovery Plan) has been implemented, with a second server on another site and real-time data replication through SRDF.
    This setup in itself is independent of the type of tool you need (DVCS, Nexus repo, main Hudson scheduler, ...): any tool that can be critical for a release into production needs to be installed on servers with backup and DR.


  • authentication: only two protocols allow users to access the main repos:
    • ssh based, with public/private key:
      • useful for users external to the organization (like off-shore development),
      • and useful for generic accounts that Active Directory managers don't want to create (because it would be an "anonymous" account): a real person has to be responsible for each generic account, and that person is the one owning the private key
    • https-based, with an Apache authenticating users through an LDAP setup: that way, an actual login must be provided for any Git operation on those repos.
      Git offers it with its smart http protocol, allowing not just pull (read) through http, but also push (write) through http.

The authentication part is also reinforced at the Git level by a post-receive hook which makes sure that at least one of the commits you are pushing to a repo has a committer name equal to the user name detected through the ssh or https protocol.
In other words, you need to set up your git config user.name correctly, or any push you want to make to a central repo will be rejected.
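The core of such a check can be sketched as follows. The GIT_AUTH_USER variable is an assumption standing in for whatever login the ssh/https layer exports; also note that in stock Git a push can only be rejected from a pre-receive or update hook, since post-receive runs after the refs have already moved.

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git -c user.name=alice -c user.email=alice@corp.example \
    commit -q --allow-empty -m "work"
new=$(git rev-parse HEAD)

# Core of the hook: does any pushed commit carry the authenticated name?
GIT_AUTH_USER=alice
if git log --format='%cn' "$new" | grep -qx "$GIT_AUTH_USER"; then
  echo "push accepted"
else
  echo "push rejected: fix your git config user.name" >&2
  exit 1
fi
```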


  • authorization: both previous settings (ssh or https) are wired to call the same Perl script, named gitolite, passing as parameters:
    • the actual username detected by those two protocols
    • the git command (clone, push or pull) that user wants to do

The gitolite Perl script parses a simple text file where the authorizations (read/write access for a whole repository, for branches within a given repository, or even for directories within a repository) have been set.
If the access level required by the git command doesn't match the ACL defined in that file, the command is rejected.
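For illustration, that rules file uses a simple declarative syntax; this fragment (the group, repo and user names are all invented) restricts a sensitive repository to a small group, in the spirit of gitolite's configuration format:

```
@quants         =   alice bob

repo pricing-algos
    RW+ master  =   @quants
    R   master  =   auditor
    -   *       =   @all
```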


The above describes what I needed to implement for a Git setting, but more importantly, it lists the main issues that need to be addressed for a DVCS setting to make sense in a big corporation with a unique user base.

Then, and only then, can a DVCS (Git, Mercurial, ...) add value because of:

  • data exchange between multiple sites: while those users are all authenticated through the same Active Directory, they can be located across the world (the companies I have worked for usually split development between teams in two or three countries). A DVCS is naturally made for efficiently exchanging data between those distributed teams.

  • replication across environments: a setting taking care of authentication/authorization allows for cloning those repositories on other dedicated servers (for integration testing, UAT testing, pre-production, and pre-deployment purposes)

  • process automation: the ease with which you can clone a repo can also be used locally on one user's workstation, for unit-testing purposes with the "guarded commits" techniques and other clever uses: see "What is the cleverest use of source repository that you have ever seen?".
    In short, you can push to a second local repo in charge of:

    • various tasks (unit test or static analysis of the code)
    • pushing back to the main repo if those tasks are successful
    • while you are still working in the first repo without having to wait for the result of those tasks.
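That relay setup can be sketched with two bare repositories and a post-receive hook (all paths are temporary and the hook body is illustrative; a real relay would run the test suite instead of echoing):

```shell
set -e
root=$(mktemp -d)
git init -q --bare "$root/main.git"    # stand-in for the central repo
git init -q --bare "$root/guard.git"   # local relay in charge of the checks

# post-receive hook on the relay: run the checks, then forward on success
cat > "$root/guard.git/hooks/post-receive" <<EOF
#!/bin/sh
echo "checks passed (a real setup would run unit tests / static analysis)"
git push --quiet "$root/main.git" 'refs/heads/*:refs/heads/*'
EOF
chmod +x "$root/guard.git/hooks/post-receive"

# developer workflow: commit locally, push to the relay, keep working
git init -q "$root/dev"
cd "$root/dev"
git -c user.name=dev -c user.email=dev@corp.example \
    commit -q --allow-empty -m "feature work"
git push -q "$root/guard.git" HEAD:refs/heads/master
git ls-remote "$root/main.git"         # the commit was forwarded to main
```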


  • killer features: any DVCS comes with those, the main one being merging (ever tried to do a complex merge workflow with SVN? Or sloooowly merge 6000 files with ClearCase?).
    That alone (merging) means you can really take advantage of branching, while being able at all times to merge your code back into another "main" line of development, because you would do so:
    • first locally within your own repo, without disturbing anybody
    • then on the remote server, pushing the result of that merge on the central repo.
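A minimal sketch of that two-step flow (repository paths and branch names invented):

```shell
set -e
root=$(mktemp -d)
git init -q --bare "$root/central.git"
git clone -q "$root/central.git" "$root/me" 2>/dev/null
cd "$root/me"
git config user.name dev
git config user.email dev@corp.example

git commit -q --allow-empty -m "base"
main=$(git branch --show-current)
git switch -qc feature
git commit -q --allow-empty -m "feature work"

git switch -q "$main"
git merge -q --no-ff feature -m "merge feature"   # 1. merge locally, privately
git push -q origin "$main"                        # 2. then publish the result
git rev-list --merges --count HEAD                # -> 1
```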
Doradorado answered 16/4, 2011 at 10:1 Comment(1)
See also programmers.stackexchange.com/questions/85845/… – Doradorado
Absolutely a distributed source model can make sense in an enterprise, but it does depend on the structure of your teams.

Distributed source control gives you the flexibility to create your own workflows.

Imagine, if you will, a larger team, within which are smaller teams working on separate feature branches.

  • These teams can all have their own central repositories, with their own build automation/checkin control mechanisms.
  • They can work anywhere, and backup their local work whenever they so desire.
  • They can then choose what checkins they'd like to share between groups.
  • They can have a single individual integrator, working on their own machine, performing merging, without impacting others.

These are things you could achieve with a traditional centralised server, but as @Brook points out, the centralised model has to be scaled up, whereas the distributed model is already sharded, so there is no (or at least less) need to vertically scale any servers.

Stuccowork answered 15/4, 2011 at 23:34 Comment(5)
You may want to read up on TFS. Team Projects can work off of feature and/or release branches. TFS 2010 goes further by making it much easier to do merges, as well as tracking which branches have which bug fixes. You've always been able to merge locally. – Deepset
As I said, you can do these things with a centralised server. But you can't work in a disconnected state. Also, TFS is expensive. DVCS is free. – Stuccowork
You may want to add "free" to your answer, then. However, I can certainly work disconnected using TFS. What makes you think I can't? – Deepset
You cannot really work disconnected using TFS (or show here how to create a branch, or do a check-in, or revert a project to the state it had 3 changesets ago while disconnected). – Colb
You cannot check in code when disconnected in TFS. You cannot revert to a previous version of your code when disconnected in TFS. You cannot do a binary search for the revision that introduced a bug when disconnected in TFS. You cannot shelve code when disconnected in TFS. You cannot compare your code with the latest version when disconnected in TFS. You cannot branch and merge when disconnected in TFS. The only thing you can do with your code when disconnected in TFS is edit it. – Brinn
To add to the other comments, I would observe that there's no reason you can't have a Corporate Central Repository. Technically it's just another repository, but it's the one you ship production from. I've been using one form or another of VCS for over 30 years and I can say that switching to Mercurial was like a city boy breathing clean country air for the first time.

Zoosporangium answered 15/4, 2011 at 23:35 Comment(0)
DVCSs have a better story (generally) than centralized systems for offline or slow networks. They tend to be faster, which is really noticeable for developers (using TDD) who do lots of check-ins.

Centralized systems are somewhat easier to grasp initially and might be a better choice for less experienced developers. DVCSs allow you to create lots of mini-branches and isolate new features while still doing a red-green-refactor, check-in-on-green style of coding. Again, this is very powerful but only attractive to fairly savvy development teams.

Having a single central repository with support for exclusive locks makes sense if you deal with files that are not mergeable, like digital assets and non-text documents (PDFs, Word, etc.), as it prevents you from getting yourself into a mess and merging manually.

I don't think the number of developers or the codebase size plays into it that much; both kinds of system have been shown to support large source trees and large numbers of committers. However, for large code bases and projects, a DVCS gives a lot of flexibility in quickly creating decentralized remote branches. You can do this with centralized systems, but you need to be more deliberate about it, which is both good and bad.

In short, there are some technical aspects to consider, but you should also think about the maturity of your team and their current process around source control.

Maritsa answered 15/4, 2011 at 23:35 Comment(2)
Note that TFS has proxy server support. See msdn.microsoft.com/en-us/library/ms245478.aspx. Also, what prevents the creation of "mini-branches" in TFS? It has gated check-ins, shelving, etc. – Deepset
@John Saunders: A shelveset is effectively a mini-branch that is restricted to a single revision. Git/Mercurial allows ad-hoc mini-branches of any length. And gated check-ins have nothing whatsoever to do with creating mini-branches. – Brinn
At least with TFS 2013 you do have the ability to work disconnected, using local workspaces. Distributed vs. centralized is defined by the business and depends on the needs and requirements of the projects under development.

For enterprise projects, the ability to connect workflow and documents to code changes can be critical in tying business requirements and higher-order elements to the specific code changes that address a given change, bug or feature addition.

This connection between workflow and code repository is what separates TFS from repository-only solutions. In places where a higher degree of project auditing is required, a product like TFS will satisfy more of the auditing requirements.

An overview of the application lifecycle management process can be found here.

http://msdn.microsoft.com/en-us/library/vstudio/fda2bad5(v=vs.110).aspx

Martineau answered 10/7, 2014 at 3:45 Comment(0)
The biggest issue we face with Git in an enterprise setting is the lack of path-based read-access control. It is inherent in Git's architecture (and, I would assume, that of most DVCSs) that if you get read access to a repository, you get the whole thing. But sometimes a project requires a sparse checkout (i.e. you want to version-control sensitive data close to the source, or you want to give a third party a selective view of part of the project).
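Git's sparse checkout illustrates the point: it narrows the working tree, but every clone still carries the full history, so it is not a read-access control. A throwaway-repo sketch (directory names invented; requires a reasonably recent Git with the sparse-checkout command):

```shell
set -e
root=$(mktemp -d)
git init -q "$root/proj"
cd "$root/proj"
git config user.name dev
git config user.email dev@corp.example
mkdir public secret
echo "ok"     > public/a.txt
echo "hidden" > secret/b.txt
git add .
git commit -q -m "public and sensitive directories"

git clone -q "$root/proj" "$root/partial"
cd "$root/partial"
git sparse-checkout set public       # working tree now shows only public/
test ! -e secret/b.txt
git cat-file -p HEAD:secret/b.txt    # -> hidden: the bytes are still there
```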

Out of the box, Git provides no permissions - you've got hooks to write your own.

Most of the popular repo managers (GitHub Enterprise, GitLab, Bitbucket) provide branch-based write restrictions. Gitolite allows you to be finer-grained, providing path-based (and more) write restrictions.

The only repo manager I've heard of supporting path-based read access is Perforce Helix, which reimplements the Git protocol on top of a Perforce backend, but I have no hands-on experience with it. It is promising, but I would be concerned about how compatible it is with "plain" Git.

Intenerate answered 12/10, 2017 at 0:31 Comment(0)
To me the biggest thing they offer is Speed. They're orders of magnitude faster for the most common operations than centralized source control.

Working disconnected is also a huge plus.

Byblow answered 15/4, 2011 at 23:29 Comment(11)
TFS allows you to work disconnected. – Deepset
@John Saunders: My experience with TFS is that it works OK if you let it know you're disconnected when you start VS, but if you lose connection once it's up it is extremely flaky. Also, unless it's new in 2010, you cannot view history, branch, merge, annotate, or check in when disconnected. So no, you really can't, at least not in the same way that you can with a DVCS. – Byblow
@John Saunders: Did you seriously just plug TFS in the comments section of every answer here? Whose payroll are you on? – Byblow
@Brook: which version? I have no trouble with 2010 when I lose connection. And, you may have noticed I didn't just "plug TFS". I commented in each case in a substantive, testable manner based on my experience with TFS. If I'm wrong about something, say so. If I'm right, then it shouldn't matter that I'm working for the Arizona Department of Education. – Deepset
@John Saunders: Specifically I'm talking about working against an '08 server, and this is not a problem that is specific to me or my company; just ask around. Additionally, as I said, you cannot do anything with source control when you are disconnected other than "check out", so it's not comparable to a DVCS. I don't see what your purpose is in posting comments about TFS in a question about DVCS; it's off-topic at best, and trolling at worst. – Byblow
@Brook: do all DVCSs have the exact same feature set? Anyway, what exactly can you do disconnected with a DVCS that you can't do disconnected using TFS? And the question is about DVCS in the Enterprise, and explicitly asks for comparison with a centralized system like TFS. – Deepset
@John Saunders: The OP mentioned Mercurial specifically, and Mercurial and Git have VERY similar features, so that's what I'm addressing. What can you do with a DVCS disconnected that you can't do with TFS? Branch, merge, view history, annotate/blame, check in (in other words, just about everything other than swap code with other devs, and you can even do that away from the server if you can just get connected to another dev). – Byblow
Or you can share your code with another dev using USB storage... in other words, with a DVCS you can do everything while disconnected (with TFS 2010, you can do almost nothing while disconnected). TFS has many other great features (such as work items, process templates, reporting, etc.) but in the versioning arena it is just not a match. – Colb
@Brook: Perforce is a CVCS, and it is very, very fast for everything (branching, integrating, syncing, you name it). If speed is the need, a DVCS may be good, but there are central systems (that include central benefits like strong auth, etc.) that are super fast. – Kimura
@Jonesome: I think you're missing the point. Perforce might be very fast when your network link is fast, but if the link is very slow, so will be your branching. With a DVCS this is not an issue, because for this and many other operations all you need is your local (on-disk) repo. – Byblow
@Brook: Yes, if you work across slow links, a DVCS will give quicker performance from the developer's POV. – Kimura
Our team used TFS for about 3 years before switching to Mercurial. Hg's branch/merge support is so much better than TFS's. This is because a DVCS relies on painless merging.

Flashgun answered 16/4, 2011 at 0:13 Comment(3)
Better than which version of TFS? Have you tried the branching and merging features added to TFS 2010? See msdn.microsoft.com/en-us/magazine/gg598921.aspx and msdn.microsoft.com/en-us/library/ms181423.aspx – Deepset
It was TFS 2008. I have no experience with 2010 to compare. We are quite happy with Hg and wouldn't consider switching back unless it was mandated by upper management. Also, because of its disconnected nature, it's very easy for me to push a clone onto a USB drive and take work home. – Flashgun
TFS 2010 Service Pack 1 still treats merges between branches not in a direct parent/child relationship as baseless merges. In other words, every difference between the two sides of the merge is reported as a conflict, and there is no indication whether code was added on one side or removed on the other. Distributed source control tools do not have this limitation. – Brinn
Better synchronization across remote / disconnected locations.

Airbrush answered 15/4, 2011 at 23:27 Comment(6)
Better than what? Are you saying that this is a problem when using TFS? – Deepset
I mean that you can keep multiple copies of the repository in disparate locations and let the VCS synchronize them seamlessly. I'm not saying it's a problem with TFS; I don't have such experience with TFS, but I can compare with systems like Subversion. – Airbrush
Thanks. But why is this a good feature when compared with a central repository? – Deepset
One real-world versioning scenario we are currently facing: our customer wants us to store source code in their system. The development team resides in our offices, but still needs to work on-site from time to time. With a DVCS there can be two "master" copies of the repository and they can be synchronized. Even a non-existent direct network connection should not be an issue in the case of a DVCS. – Airbrush
Thanks, but I'm still not getting why I wouldn't just give the customer a copy of the source but keep the repository centralized. Recall that the question was about "Enterprise" use of DVCS. Your situation doesn't seem like "Enterprise" use; rather, it sounds like a consulting situation. – Deepset
It is an "Enterprise" use; it actually is a double-Enterprise use. – Airbrush