Sell me distributed revision control

I know there are thousands of similar topics floating around. I read at least 5 threads here on SO. But why am I still not convinced about DVCS?

I have only the following questions (note that I am selfishly worried only about Java projects):

  • What is the advantage or value of committing locally? What, really? All modern IDEs allow you to keep track of your changes, and if required you can restore a particular change. They even have a feature to label your changes/versions at the IDE level!
  • What if I crash my hard drive? Where did my local repository go? (So how is it cool compared to checking in to a central repo?)
  • Working offline or on an airplane. What is the big deal? In order to build a release with my changes, I must eventually connect to the central repository. Until then, it does not matter how I track my changes locally.
  • OK, Linus Torvalds gives his life to Git and hates everything else. Is that enough to blindly sing its praises? Linus lives in a different world compared to the offshore developers on my mid-sized project.

Pitch me!

Anesthetize answered 1/4, 2010 at 21:33 Comment(12)
Seems like you're looking for an argument rather than honestly seeking to be convinced. Git isn't for everyone, nor is it for every project. As you said, there are thousands of topics like this, and if in reading all that you're not convinced, then don't use it.Manslayer
@Manslayer - Liked your response, and yes, that is why I haven't voted for switching to DVCS in my org. No, I am not looking for an argument. All I am looking for is a clear and concise answer to my first 3 questions.Anesthetize
Joel Spolsky makes a pretty good case: www.hginit.comFluidize
@ring Not enough for a full answer, but about "how is it cool compared to a central repo": it's cool because if your building burns down with your server and backups in it, then hopefully someone took their laptop home. If they did, then you didn't lose any of your source code. Note, I'm not even a Git user and I think that's cool.Cateran
Well, no! Where did all the money go that we spent on disaster recovery? What about all those daily backups/tapes/disks and tape rotation?Anesthetize
@Cateran I have no position on the original question, but formal, intentional off-site backups are done by all serious organizations. Hope-somebody-took-their-laptop-home backups are really no backup at all. If a company is together enough to use any type of RCS in the first place, they'd better be beyond faith-based backup systems. Not that the laptop isn't a belt-and-suspenders solution for when the earthquake buries the off-site backup as well as your office, but how far do you want to take it? A weak argument IMO.Appellation
Yeah, I see your point. But I'm also saying developers can go on developing from their laptops if the office does catch fire. Maybe there are more important things to worry about than programmers having a vacation when it burns, but still.Cateran
Don't forget about locking for binary files. If you use a DVCS, you give up this feature. (This makes a DVCS a non-starter for our company.)Thunell
Coming back to backups: nobody said that one shouldn't do intentional backups with a DVCS. It's just so much easier to do, and to recover from later on. That's a really big benefit. Binary files: I'm familiar with the problem, but I've also always wondered why anyone stores e.g. documents in MS binary formats. You kind of lose all the traceability. If I ever have a startup, all binary formats shall be prohibited, except maybe for pictures...Barony
@aapeli Not everything in version control is source code. You don't lose traceability: you still know who edited the file, when it was edited, and the comments on the change. In our company, we use Subversion to track CAD design files, Excel spreadsheets, Visio diagrams, etc. While some of these things have XML/ASCII equivalents, they are still not mergeable, so without file locking you can lose hours of work.Thunell
You're right about different file formats, and there are uses for them. However, using binary formats where they're not necessary is just silly (not trying to pick a fight here, just my opinion). E.g. since MS decided to use an XML-based format for Office docs, would it not be nice to store those as XML instead of a non-mergeable zip format? Or, alternatively, provide some context-sensitive merge tools where that's feasible.Barony
@aapeli In general, a VCS works better for text files. Actually, MS Word documents are pretty easy to deal with, because Word has its own merge tools. (Try doing a diff of a .doc file with TortoiseSVN.) The thing is, storing as XML still doesn't always solve the problem. If the file is an unintelligible mess of auto-generated XML, your diff/merge tools don't do much good.Thunell

Reliability

If your hard disk silently starts corrupting data, you damn well want to know about it. Git takes SHA-1 hashes of everything you commit. With SVN you have one central repo, and if its bits get silently modified by a faulty HDD controller, you won't know about it until it's too late.

And since you have one central repo, you just blew your only lifeline.

With git, everyone has an identical repo, complete with change history, and its contents can be fully trusted because the SHA-1 hashes cover its complete image. So if you back up the 20-byte SHA-1 of your HEAD, you can be certain that when you clone from some untrusted mirror, you have the exact same repo you lost!
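
A minimal sketch of that integrity check in practice (the mirror URL is hypothetical):

    git rev-parse HEAD                        # print the 40-hex-digit SHA-1 of your current tip
    git fsck --full                           # walk every object and verify its hash
    git clone git://mirror.example/repo.git   # re-clone from any untrusted mirror
    # if rev-parse HEAD in the clone matches your saved SHA-1, the history is bit-for-bit identical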

Branching (and namespace pollution)

When you use a centralised repo, all the branches are there for the world to see. You can't make private branches. You have to pick a branch name that doesn't collide with some other global name.

"test123 -- damn, there's already a test123. Lets try test124."

And everyone has to see all these branches with stupid names. You have to succumb to a company policy that might go along the lines of "don't make branches unless you really need to", which takes away a lot of the freedom you get with git.

Same with committing. When you commit, you'd better be really sure your code works; otherwise you break the build. No intermediate commits, because they all go to the central repo.

With git you have none of this nonsense. Branch and commit locally all you want. When you're ready to expose your changes to the rest of the world, you ask them to pull from you, or you push to some "main" git repo.
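
In git terms, that private workflow is just a couple of commands (the branch name here is invented):

    git checkout -b test123                      # a private local branch; nobody else sees it
    git commit -a -m "wip: half-finished idea"   # intermediate commit, local only
    git push origin test123                      # publish only when you're ready to share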

Performance

Since your repo is local, all the VCS operations are fast and don't require round trips and transfers to a central server! git log doesn't have to go over the network to find the change history. SVN does. Same with all other commands, since everything important is stored locally!

Watch Linus' talk for these and other benefits over SVN.

Celanese answered 3/4, 2010 at 9:40 Comment(2)
Note that in modern versions of SVN the results of the log command are cached, so the command doesn't necessarily need to go over the network to find change history.Uncourtly
Downvoted. The reliability part IMO is completely wrong. If you use a centralized system (not only source control, but any centralized system) you should make backups and check the central node for corruptions. Corruptions with svn are very rare, BTW! Branching part is also wrong. You can have private shelves with Subversion.Sketchy

I have been where you are now, sceptical of the uses of distributed version control. I had read all the articles and knew the theoretical arguments, but I was not convinced.

Until, one day, I typed git init and suddenly found myself inside a git repository.

I suggest you do the same -- simply try it. Begin with a small hobby project, just to get the hang of it. Then decide if it's worth using for something larger.
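
Trying it really is a three-command affair; a minimal sketch (the directory name is made up):

    mkdir hobby-project && cd hobby-project
    git init                                  # the directory is now a git repository
    git add . && git commit -m "initial import"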

Thrilling answered 3/4, 2010 at 10:7 Comment(0)

I'm a Mercurial developer and have worked as a Mercurial consultant, so I find your questions very interesting and hope I can answer them:

  • What is the advantage or value of committing locally? [...]

You are correct that IDEs can track local changes beyond simple undo/redo these days. However, there is still a gap in functionality between these file snapshots and a full version control system.

The local commits give you the option of preparing your "story" locally before you submit it for review. I often work on some changes involving 2-5 commits. After I make commit 4, I might go back and amend commit 2 slightly (maybe I saw an error in commit 2 after I made commit 4). That way I'll be working not just on the latest code, but on the last couple of commits. That's trivially possible when everything is local, but it becomes trickier if you need to sync with a central server.
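
As a rough illustration, here is how that "go back and amend commit 2" step looks with git's interactive rebase (Mercurial's histedit does the same job):

    git rebase -i HEAD~4        # list the last 4 commits; mark the faulty one as "edit"
    # ...fix the error in the working tree...
    git add -u
    git commit --amend          # fold the fix into that earlier commit
    git rebase --continue       # replay the remaining commits on top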

  • What if I crash my hard drive? [...] so how is it cool compared to checking in to a central repo?

Not cool at all! :-)

However, even with a central repo, you still have to worry about the uncommitted data in the working copy. I would therefore claim that you ought to have a backup solution in place anyway.

It is my experience that people often have larger chunks of uncommitted data lying around in their working copies with a centralized system. Clients have told me how they were trying to convince developers to commit at least once a week.

The changes are often left uncommitted because:

  1. They are not really finished. There might be debug print statements in the code, there might be incomplete functions, etc.

  2. Committing would go into trunk and that is dangerous with a centralized system since it impacts everybody else.

  3. Committing would require you to first merge with the central repository. That merge might be intimidating if you know that there have been other conflicting changes made to the code. The merge might simply be annoying because you might not be done with your changes yet and you prefer to work from a known-good state.

  4. Committing can be slow when you have to talk to an overloaded central server. If you're in an offshore location, commits are even slower.

You are absolutely correct if you think that the above isn't really a question of centralized versus distributed version control. With a CVCS, people can work in separate branches and thus trivially avoid 2 and 3 above. With a separate throw-away branch, I can also commit as much as I want, since I can create another branch where I commit more polished changes (solving 1). Commits can still be slow, though, so 4 can still apply.

People who use a DVCS will often push their "local" commits to a remote server anyway as a poor man's backup solution. They don't push to the main server where the rest of the team is working, but to another (possibly private) server. That way they can work in isolation and still keep off-site backups.
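
A sketch of that poor man's backup (the server URL is hypothetical; shown with git, though hg push works the same way):

    git remote add backup ssh://my-private-box/home/me/project.git
    git push backup my-feature     # off-site copy of the branch; teammates never see it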

  • Working offline or on an airplane. [...]

Yeah, I never liked that argument either. I have good Internet connectivity 99% of the time and don't fly enough for this to be an issue :-)

However, the real argument is not that you are offline, but that you can pretend to be offline. More precisely, that you can work in isolation without having to send your changes to a central repository immediately.

DVCS tools are designed around the idea that people might be working offline. This has a number of important consequences:

  • Merging branches becomes a natural thing. When people can work in parallel, forks will naturally occur in the commit graph. These tools must therefore be really good at merging branches. A tool such as SVN is not very good at merging!

    Git, Mercurial, and other DVCS tools merge better because they have had more testing in this area, not directly because they are distributed.

  • More flexibility. With a DVCS, you have the freedom to push/pull changes between arbitrary repositories. I'll often push/pull between my home and work computers, without using any real central server. When things are ready for publication, I push them to a place like Bitbucket.

    Multi-site sync is no longer an "enterprise feature", it's a built-in feature. So if you have an off-shore location, they can set up a local hub repository and use it among themselves. You can then sync the local hubs hourly, daily, or whenever it suits you. This requires nothing more than a cron job that runs hg pull or git fetch at regular intervals (see the sketch after this list).

  • Better scalability since more logic is on the client-side. This means less maintenance on the central server, and more powerful client-side tools.

    With a DVCS, I expect to be able to do a keyword search through revisions of the code (not just the commit messages). With a centralized tool, you normally need to setup an extra indexing tool.
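
The multi-site point above really is just a cron job; a sketch, assuming the local hub was created with git clone --mirror (the path is made up):

    # crontab entry on the off-site hub: resync from the main hub every hour
    0 * * * * cd /srv/hub/project.git && git fetch --all --quiet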

Springfield answered 12/4, 2014 at 10:49 Comment(1)
+1 for multi-site being built in. This can really be a game changer. I work for a company with offices that are 500 km away from each other, and we currently have an SVN server in each of them. Some 3rd-party scripts take care of keeping them synced. While they mostly work, we sometimes have issues, and realigning the 2 servers is never trivial. I REALLY wish this multi-site support were official and reliable.Scandium

DVCS is very interesting for me as it:

  • adds an all-new dimension to the source control process: publication.
    You do not just have a merge workflow, you also have a publication workflow (to which repository will you push, and from which will you pull), and that can have many implications in terms of:

    • development lifecycle (with repositories made only for a certain type of commits, like the ones made to be released into production, for deployment purposes)
    • solo tasks (you can push to and update a backup repo, even in the form of just one file)
    • inter-project dependencies (when the team on project A is waiting for the team on project B to finally commit to the central repo, it may resort to asking B to "pass along" an intermediate development as a zip file attached to an email. Now all that A has to do is add B's repo as a remote, fetch it, and have a peek; see the sketch below)
  • brings a new way of producing/consuming revisions with:

    • a passive way of producing new revisions (only the ones actively pulling from your repo will see them in their branches)
    • an active way of consuming revisions from others (by adding their repo as a remote and fetching/merging what you need from them).

That means you do not depend on others delivering their work to a central repo; you can have a more direct relationship with the different actors and their repos.
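
For the inter-project case above, the direct peek is roughly this (the URL and branch names are invented):

    git remote add teamB git://teamb-host/project.git   # B's repo, no central server involved
    git fetch teamB
    git log master..teamB/master    # peek at what B has that we don't
    git merge teamB/master          # or pull their intermediate work straight in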

Methuselah answered 1/4, 2010 at 21:50 Comment(0)

Your central argument about the IDE doing the tracking for you is false. Most IDEs don't, in fact, have any such functionality beyond unlimited undo levels. Think of branches, merges, reverts, commit messages (the log) and such, and I bet that even the IDE you referred to falls short. In particular, I doubt it can track your commits (quite possibly on several different branches that you work on) and properly push them to the repository once you get online.

If your IDE actually does all that, I would in fact call it a distributed version control system in itself.

Finally, if the central repository dies for whatever reason (your service provider went bankrupt, there was a fire, a hacker corrupted it, ...), you have a full backup on every machine that pulled the repository recently.

EDIT: You can use a DVCS just like a centralized repository, and I would even recommend doing so, at least for small-to-medium-sized projects. Having one central "authoritative" repository that is always online simplifies a lot of things. And when that machine crashes, you can temporarily switch to one of the other machines until the server gets fixed.
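
A hedged sketch of that centralized style of DVCS usage (the server URL is hypothetical):

    git clone ssh://central.example/project.git   # the one "authoritative" repo
    # ...edit and commit locally as often as you like...
    git pull --rebase      # sync with the central repo frequently
    git push origin master # publish your commits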

Manganate answered 1/4, 2010 at 21:49 Comment(5)
for the last comment about the central repository's vulnerability: that's why you back it up; the good thing is there's only one thing to back up.Bellyband
There are various reasons why backups would fail, too: human error, a fire, bankruptcy... unless you actually take care to frequently back up the repository to multiple physical locations and keep multiple old versions of it. I doubt that many would stick to such a procedure, but simply using a DVCS gets you that for free.Manganate
I've had SVN repositories become unreadable simply because a newer version of BerkeleyDB (the default backend of SVN back then) could no longer read the old repository. Backups will not help you with that, but with a DVCS there will be usable copies on machines that hadn't upgraded yet, even if such a software error occurred.Manganate
A DVCS only backs up the data if those changes have actually been shared with someone else. What you describe with the BDB format is an upgrade problem that can be avoided by following the upgrade instructions, not a software error, and it does not mean data has been lost. Worst case, you can downgrade, dump, upgrade, load, and be done.Bellyband
Well, that would be the same as a centralized VCS, wouldn't you say? Making a backup of the SVN server isn't going to help if I am tracking my changes in my IDE instead of committing them to the server. This is why I recommend using a DVCS in a centralized manner, frequently syncing with the central server.Manganate

If you don't see the value of local history or local builds, then I'm not sure than any amount of question-answering is going to change your mind.

The history features of IDEs are limited and clumsy. They are nothing like the full functionality of a version control system.

One good example of how this stuff gets used is on various Apache projects. I can sync up a git repo with the Apache svn repo. Then I can work for a week in a private branch all my very own. I can downmerge changes from the repo. I can report on my changes, retail or wholesale. And when I'm done, I can package them up as one commit.
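
For what it's worth, a rough sketch of that workflow with git-svn (the repository URL is illustrative, not a real Apache one):

    git svn clone https://svn.example.org/repos/project   # mirror the svn repo as a git repo
    cd project
    git checkout -b private-week       # a private branch all my very own
    # ...a week of local commits...
    git svn rebase                     # downmerge the latest svn changes
    git checkout master
    git merge --squash private-week    # package the whole week up as one change
    git commit -m "Feature X"
    git svn dcommit                    # send it back to svn as a single commit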

Octachord answered 1/4, 2010 at 21:51 Comment(5)
@Octachord IntelliJ IDEA provides excellent local history and diff, have a look at http://www.jetbrains.com/idea/features/local_history.html if you do not have any first-hand experience.Anesthetize
@ring bearer I'm perfectly aware of what it does and doesn't do, I've used it. It doesn't do a command line to show me the diffs across an entire tree, just to name one. It doesn't sync remote branches and allow merges.Octachord
Now we are mixing things up here. What I said is: why, why on earth do I need local history via distributed version control? I manage my local history in the IDE, then merge/check in/diff using my version control (CVS/SVN...) client (or IDE plugin).Anesthetize
You've actually just pointed out the FLAW in the DVCS model. You can pull down your changes, work in isolation for a week, and then submit some massive update to the "main" repo that no one has reviewed or commented on. blog.red-bean.com/sussman/?p=20 (There is also the issue of your local hard drive crashing and taking a week's worth of work with it.)Thunell
This is stupid. If it doesn't solve a problem for you: don't use it. Don't try to tell me that it doesn't solve a problem for me.Octachord

Interesting question.

I'm not a seasoned DVCS user but my limited exposure has felt very positive.

I love being able to commit in two steps (commit locally, then push). It suits me.

Some advantages that spring to mind:

  1. Better merge support. Branch-merge feels more like a first-class citizen in DVCS, whereas in my experience of centralised solutions, I've found it to be painful and tricksy. Merge tracking is now available in svn, but it's still slow and cumbersome.

  2. Large teams. DVCS is not only for single-user commits. You can push & pull commits between teams before contributing back to the master repository (or not). This is invaluable for certain flavours of collaboration.

  3. When working on experimental functionality, it makes sense to commit frequently, but only for the short term. I don't always want to branch the main codebase, so it's nice to be able to play & re-record (see the sketch after this list). Similarly, I can see it being useful when working with Continuous Integration. If I am working for days on refactoring efforts, I may break builds for an unacceptable timeframe, but I still want to keep track of my changes.
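
A sketch of that play-and-re-record pattern, shown with git for brevity (hg has equivalents; the branch names are invented):

    git checkout -b spike      # throw-away branch for the experiment
    git commit -am "wip"       # commit as often as you like; nobody else sees it
    # keep it: collapse the noise into one clean commit on the main line
    git checkout master && git merge --squash spike && git commit -m "Refactoring done"
    # ...or abandon it entirely:
    git branch -D spike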

Note that my DVCS experience is more with Mercurial than with Git. Coming from a CVS/SVN background, I've found the learning curve much easier with Mercurial (Hg). The recently added Google Code support for Mercurial is also a boon. ... I'll even go as far as to say that my initial response to Git was negative, but more from a usability perspective than anything to do with DVCS.

Kymric answered 1/4, 2010 at 22:4 Comment(0)

It might be interesting to note that Subversion will probably be getting things like offline commits in the future. Of course we can't really compare those features to what's available today, but it might be a very good way to "use a DVCS in a centralized manner" as described in other answers here.

Another recent post states that Subversion is not trying to become a DVCS.

These things will probably mean that the repository stays centralized, meaning you can't do disconnected branching or diffing of old versions, but you can queue up commits.

Bellyband answered 3/4, 2010 at 9:30 Comment(0)

I'm not going to sell anything here.

• What is the advantage or value of committing locally? What, really? All modern IDEs allow you to keep track of your changes, and if required you can restore a particular change. They even have a feature to label your changes/versions at the IDE level!

The only real advantage is that you don't need connectivity to the main central repository. Someone can say that Git's benefit is the fact that a developer can commit locally, preparing a great combo of patches, and then have them pulled into the blessed central repo, but IMO this is pretty uninteresting. A developer can use a private shelf or a branch in a Subversion repository to work on his task and then merge it into the mainline (e.g. /trunk) or another branch; see the sketch below.
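
To make the Subversion alternative concrete, a sketch of the task-branch workflow (the paths are illustrative):

    svn copy ^/trunk ^/branches/my-task -m "Branch for my task"
    svn switch ^/branches/my-task     # point the working copy at the branch
    # ...commit to the branch as often as needed...
    svn switch ^/trunk
    svn merge ^/branches/my-task      # bring the finished work back to trunk
    svn commit -m "Merge my-task back into trunk"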

For me the main downside here is the fact that I have to download and store the whole Git repository on my machine. With a large project with a looong history, it becomes a pain and takes too much space.

Another downside is that Git technically can't track rename or copy operations; it just tries to guess whether a file was renamed or copied based on the file's content. This results in funny cases like "svn to git migration keeping history of copied file" (a question asking why the history of a file was lost after an SVN-to-Git migration).

• What if I crash my hard drive? Where did my local repository go? (So how is it cool compared to checking in to a central repo?)

With Git, if you crash your local storage device (HDD, SSD, whatever) and it had changes that were not pulled or pushed to a blessed Git repo, then you are out of luck. You've just lost your time and your code. In addition, a crash of a hard drive holding your local Git repo may halt the development process for some time: Linus Torvalds' SSD breaks, halting Linux kernel development.

With centralized source control such as SVN, you could lose only the work since your last commit, because everything else was already in the central repository, on a branch, a private shelf, or even trunk. Obviously, you should ensure that disaster recovery and backups are implemented for your central repo.

• OK, Linus Torvalds gives his life to Git and hates everything else. Is that enough to blindly sing its praises? Linus lives in a different world compared to the offshore developers on my mid-sized project.

For a project such as the Linux kernel, which used BitKeeper in the past, Git is the best source control system! But I'd say that Git does not suit everyone.

Choose wisely!

Sketchy answered 2/1, 2016 at 17:6 Comment(0)

Most likely, no one will sell you anything here. If you need git's features, just git init. If it does not fit you, just don't.

If you don't know git's features yet, type git vs (note the trailing space) into Google search and look at the autocomplete results.

I preferred Notepad over an IDE until I needed NetBeans' features. It seems this is the same case here.

As you know, there were many successful projects without VCS at all.

PS. Selling git violates its license! ;)

Flowered answered 13/6, 2010 at 17:37 Comment(4)
"Selling git violates its license!", That's not true! it's GPLv2 you CAN sell it AFAIK.Disenthrall
"there were many successful projects without VCS at all" -- and how many projects failed because they weren't using VCS (correctly)? See example of now dead CIA.vc service, which had 'production' server failed... and codebase was complex and live-edited in production (though it has Subversion repository on Google Code).Eicher
@JakubNarębski Good point :) My answer is a bit cynical; it's obvious that version control is a must. The strange fact is that many developers are still not using it, even though it's free and as easy as git init :)Flowered
@Flowered or as easy as hg init :-)Springfield
