Fully backup a git repo?
A

14

193

Is there a simple way to back up an entire git repo, including all branches and tags?

Aliciaalick answered 7/4, 2011 at 8:39 Comment(4)
I guess you are referring to a local git repo here.Pintail
possible duplicate of Backup a Local Git RepositorySambar
The correct answer is to do a: git clone --mirror [email protected]/your-repo.git This will copy your entire repository, notes, branches, tracking, etc.Achromatous
Some web searches I ran that didn't include this question in its results: "git clone absolutely everything branches tags notes"; "git clone everything in repository"; "git clone a repo with all tags notes".Rabb
T
78

What about just making a clone of it?

git clone --mirror other/repo.git

Every repository is a backup of its remote.
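
To illustrate what the mirror clone picks up, here is a minimal sketch. It builds a throwaway source repository in a temporary directory (the paths, branch name, tag name, and demo identity are all made up for the example), mirrors it, and lists the refs that arrived in the backup.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical source repository with an extra branch and a tag.
git init -q "$work/src"
cd "$work/src"
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "first"
git branch feature
git tag v1.0

# --mirror implies --bare and copies all refs, not just the default branch.
git clone -q --mirror "$work/src" "$work/backup.git"

# List every ref that ended up in the backup.
git -C "$work/backup.git" for-each-ref --format='%(refname)'
```

The listing should include the extra branch and the tag as ordinary refs of the backup, not as remote-tracking refs.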

Tungstate answered 7/4, 2011 at 8:44 Comment(4)
@Daniel: If you clone a repository, you fetch every branch, but only the default one is checked out. Try git branch -a. Maybe it's more obvious this way: after cloning a repository you don't fetch every branch, you fetch every commit. Branches only refer to existing commits.Tungstate
I think he knows the clone command well if he can ask such a question, and it is clearly not enough for him (because it is a clone, not a dump). Dumps are different from simple copies, for example: 1) they do not need to be optimal (or even usable) for normal work, 2) but they are required to have good resistance to, and repairability after, data corruption.Huerta
@Huerta Sure, but git clone covers all that. (1) is optional, not a requirement: if the result is still optimized, it's still a backup. (2) is already covered by git itself. The point I'd like to make is: if git clone already covers the relevant points, why would you need a different tool? Although I also prefer git bundle, I don't think my answer is wrong or invalid. You can see the two approaches as hot vs. cold backup.Tungstate
What about file permissions? Does git clone necessarily copy those over? It depends on the options, I believe.Seraphic
T
264
git bundle

I like that method, as it results in only one file, easier to copy around.
See ProGit: little bundle of joy.
See also "How can I email someone a git repository?", where the command

git bundle create /tmp/foo-all --all

is detailed:

git bundle will only package references that are shown by git show-ref: this includes heads, tags, and remote heads.
It is very important that the basis used be held by the destination.
It is okay to err on the side of caution, causing the bundle file to contain objects already in the destination, as these are ignored when unpacking at the destination.


For using that bundle, you can clone it, specifying a non-existent folder (outside of any git repo):

git clone /tmp/foo-all newFolder
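
The round trip described above can be sketched as follows. The repository, tag, and file names are invented for the demo; the only commands relied on are git bundle create, git bundle verify, git bundle list-heads, and git clone.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical repository to back up.
git init -q "$work/src"
cd "$work/src"
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "first"
git tag v1.0

# Package every ref into a single file.
git bundle create "$work/repo.bundle" --all

# Check that the bundle is valid, then list the refs it carries.
git bundle verify "$work/repo.bundle"
git bundle list-heads "$work/repo.bundle"

# Restore by cloning from the bundle into a folder that does not exist yet.
git clone -q "$work/repo.bundle" "$work/restored"
```

Running git bundle verify right after creating the file is a cheap way to catch a truncated or damaged dump before it is copied elsewhere.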
Turncoat answered 7/4, 2011 at 8:42 Comment(10)
add --all for complete backupAllege
This, git bundle, is the correct answer in my opinion, not the accepted one. I think he knows the clone command well if he can ask such a question, and it is clearly not enough for him (because it is a clone, not a dump). Dumps are different from simple copies, for example: 1) they do not need to be optimal (or even usable) for normal work, 2) but they are required to have good resistance to, and repairability after, data corruption, 3) it is often useful if they are easily diffable for incremental backups, which is not a goal for copies.Huerta
Note that neither git bundle nor git clone gets everything, for example the hook scripts.Chiton
@Chiton Yes, it is by design. Hooks can be dangerous or include sensitive information.Turncoat
Can I use git bundle against a remote repo?Voluntaryism
@RyanShillington I have always seen that command used after a clone, not for a remote repository: https://mcmap.net/q/12676/-how-to-git-bundle-a-complete-repo. This is different from an archive which does get a compressed version of the files, without history, and can operate on remote repo: https://mcmap.net/q/13873/-git-how-to-archive-from-remote-repository-directlyTurncoat
In my testing git bundle create backup --all does not include refs/notes.Deoxyribose
@Deoxyribose Would git bundle create backup.bndl --all refs/notes/* work for you, then? Check its content with git bundle verify backup.bndlTurncoat
@Turncoat Thanks for that. I think I'll stick to git clone --mirror for this though since it seems to take all refs into account by default.Deoxyribose
@Deoxyribose True, that is what --mirror is for indeed.Turncoat
P
46

Expanding on the great answers by KingCrunch and VonC

I combined them both:

git clone --mirror [email protected]/reponame reponame.git
cd reponame.git
git bundle create reponame.bundle --all

After that you have a file called reponame.bundle that can be easily copied around. You can then create a new normal git repository from that using git clone reponame.bundle reponame.

Note that git bundle only copies commits that lead to some reference (branch or tag) in the repository. So dangling commits are not stored to the bundle.

Polyester answered 4/1, 2019 at 13:58 Comment(0)
M
28

Expanding on some other answers, this is what I do:

Set up the repo: git clone --mirror user@server:/url-to-repo.git

Then when you want to refresh the backup: git remote update from the clone location.

This backs up all branches and tags, including new ones that get added later, although it's worth noting that branches that get deleted do not get deleted from the clone (which for a backup may be a good thing).

This is atomic so doesn't have the problems that a simple copy would.

See http://www.garron.me/en/bits/backup-git-bare-repo.html
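
A small sketch of that refresh cycle, using a throwaway local repository in place of the server (all paths and branch names here are made up). It also shows the --prune option of git remote update, for the case where you do want deleted branches to disappear from the backup as well.

```shell
set -eu
work=$(mktemp -d)

git init -q "$work/src"
cd "$work/src"
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "first"

# One-time setup of the backup.
git clone -q --mirror "$work/src" "$work/backup.git"

# Later, the source gains two branches; a refresh picks them up.
git branch keep-me
git branch drop-me
git -C "$work/backup.git" remote update

# A branch deleted at the source is removed from the backup only when
# pruning is requested.
git branch -q -d drop-me
git -C "$work/backup.git" remote update --prune
```

Whether to prune is a policy choice: a backup that never prunes keeps deleted branches recoverable, at the cost of growing over time.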

Mosemoseley answered 29/5, 2014 at 2:47 Comment(0)
J
19

This thread was very helpful for getting some insight into how backups of git repos can be done. I think it still lacks some hints, information, or a conclusion to help one find the "correct way" (tm) for oneself. I am therefore sharing my thoughts here to help others and to put them up for discussion. Thanks.

So, starting with the original question:

  • Goal is to get as close as possible to a "full" backup of a git repository.

Then enriching it with the typical wishes and specifying some presettings:

  • Backup via a "hot-copy" is preferred to avoid service downtime.
  • Shortcomings of git will be worked around by additional commands.
  • A script should do the backup to combine the multiple steps for a single backup and to avoid human mistakes (typos, etc.).
  • Additionally a script should do the restore to adapt the dump to the target machine, e.g. even the configuration of the original machine may have changed since the backup.
  • Environment is a git server on a Linux machine with a file system that supports hardlinks.

1. What is a "full" git repo backup?

The point of view differs on what a "100%" backup is. Here are two typical ones.

#1 Developer's point of view

  • Content
  • References

git is a developer tool and supports this point of view via git clone --mirror and git bundle create <file> --all.

#2 Admin's point of view

  • Content files
    • Special case "packfile": git combines and compacts objects into packfiles during garbage collection (see git gc)
  • git configuration
  • Optional: OS configuration (file system permissions, etc.)

git is a developer tool and leaves this to the admin. Backup of the git configuration and OS configuration should be seen as separate from the backup of the content.

2. Techniques

  • "Cold-Copy"
    • Stop the service to have exclusive access to its files. Downtime!
  • "Hot-Copy"
    • Service provides a fixed state for backup purposes. On-going changes do not affect that state.

3. Other topics to think about

Most of them are generic for backups.

  • Is there enough space to hold the full backups? How many generations will be stored?
  • Is an incremental approach wanted? How many generations will be stored and when to create a full backup again?
  • How to verify that a backup is not corrupted after creation or over time?
  • Does the file system support hardlinks?
  • Put backup into a single archive file or use directory structure?

4. What git provides to backup content

  • git gc --auto

    • docs: man git-gc
    • Cleans up and compacts a repository.
  • git bundle create <file> --all

    • docs: man git-bundle, man git-rev-list
    • Atomic = "Hot-Copy"
    • Bundles are dump files and can be directly used with git (verify, clone, etc.).
    • Supports incremental extraction.
    • Verifiable via git bundle verify.
  • git clone --mirror

    • docs: man git-clone, man git-fsck, What's the difference between git clone --mirror and git clone --bare
    • Atomic = "Hot-Copy"
    • Mirrors are real git repositories.
    • Primary intention of this command is to build a full active mirror, that periodically fetches updates from the original repository.
    • Supports hardlinks for mirrors on same file system to avoid wasting space.
    • Verifiable via git fsck.
    • Mirrors can be used as a basis for a full file backup script.
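
As a rough sketch of the incremental use of bundles mentioned in the list above: a full bundle is followed by a bundle that contains only the commits made since a marker tag. The tag name last-backup, the file names, and the demo identity are assumptions for this example, not something git prescribes.

```shell
set -eu
work=$(mktemp -d)

# Helper for the demo only: commit with a throwaway identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }

git init -q "$work/src"
cd "$work/src"
commit "first"
branch=$(git symbolic-ref --short HEAD)

# Full dump, then remember how far it reached with a local marker tag.
git bundle create "$work/full.bundle" --all
git tag -f last-backup "$branch"

# More work happens, then an incremental dump of only the new commits.
commit "second"
git bundle create "$work/incr.bundle" last-backup.."$branch"
git tag -f last-backup "$branch"

# Restore: clone the full dump, then fetch the increment on top of it.
git clone -q "$work/full.bundle" "$work/restored"
git -C "$work/restored" bundle verify "$work/incr.bundle"
git -C "$work/restored" fetch -q "$work/incr.bundle" "$branch"
git -C "$work/restored" fsck
```

The incremental bundle is only usable by a repository that already holds the commit the marker tag pointed to, which is what git bundle verify checks before the fetch.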

5. Cold-Copy

A cold-copy backup can always do a full file backup: deny all accesses to the git repos, do backup and allow accesses again.

  • Possible Issues
    • May not be easy - or even possible - to deny all accesses, e.g. shared access via file system.
    • Even if the repo is on a client-only machine with a single user, then the user still may commit something during an automated backup run :(
    • Downtime may not be acceptable on server and doing a backup of multiple huge repos can take a long time.
  • Ideas for Mitigation:
    • Prevent direct repo access via file system in general, even if clients are on the same machine.
    • For SSH/HTTP access use git authorization managers (e.g. gitolite) to dynamically manage access or modify authentication files in a scripted way.
    • Backup repos one-by-one to reduce downtime for each repo. Deny one repo, do backup and allow access again, then continue with the next repo.
    • Have a planned maintenance schedule to avoid upsetting developers.
    • Only back up when the repository has changed. This may be very hard to implement, e.g. list of objects plus having packfiles in mind, checksums of config and hooks, etc.

6. Hot-Copy

File backups cannot be done with active repos due to risk of corrupted data by on-going commits. A hot-copy provides a fixed state of an active repository for backup purposes. On-going commits do not affect that copy. As listed above git's clone and bundle functionalities support this, but for a "100% admin" backup several things have to be done via additional commands.

"100% admin" hot-copy backup

  • Option 1: use git bundle create <file> --all to create full/incremental dump files of content and copy/backup configuration files separately.
  • Option 2: use git clone --mirror, handle and copy configuration separately, then do full file backup of mirror.
    • Notes:
    • A mirror is a new repository, that is populated with the current git template on creation.
    • Clean up configuration files and directories, then copy configuration files from original source repository.
    • Backup script may also apply OS configuration like file permissions on the mirror.
    • Use a filesystem that supports hardlinks and create the mirror on the same filesystem as the source repository to gain speed and reduce space consumption during backup.
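
Option 2 could look roughly like the sketch below. Everything about the layout is an assumption for illustration: the paths under a temporary directory stand in for the server and backup locations, the post-receive hook is a dummy, and only the config file and the hooks directory are treated as "configuration". A real script would also handle OS-level settings such as ownership and permissions.

```shell
set -eu
work=$(mktemp -d)
src="$work/server/project.git"    # hypothetical path of the live bare repo
dst="$work/backup/project.git"    # hypothetical path of the backup mirror

# Demo setup only: a bare "server" repo with one commit and a custom hook.
git init -q "$work/seed"
git -C "$work/seed" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
mkdir -p "$work/server" "$work/backup"
git clone -q --bare "$work/seed" "$src"
mkdir -p "$src/hooks"
printf '#!/bin/sh\nexit 0\n' > "$src/hooks/post-receive"
chmod +x "$src/hooks/post-receive"

# 1. Hot copy of the content (local clones hardlink objects when they can).
git clone -q --mirror "$src" "$dst"

# 2. Replace the template-generated files with the originals.
rm -rf "$dst/hooks"
cp -Rp "$src/hooks" "$dst/hooks"
cp -p "$src/config" "$dst/config"

# 3. Check the mirror, then archive it as an ordinary directory.
git -C "$dst" fsck
tar -czf "$work/project-backup.tar.gz" -C "$work/backup" project.git
```

Because the file backup is taken from the mirror and not from the live repository, ongoing pushes to the source cannot corrupt the archive.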

7. Restore

  • Check and adapt git configuration to target machine and latest "way of doing" philosophy.
  • Check and adapt OS configuration to target machine and latest "way of doing" philosophy.
Jointworm answered 4/6, 2020 at 16:13 Comment(1)
I like this taxonomy. // Some OS/filesystems have "instantaneous" snapshots[*], reducing cold-copy lockout time. // IMHO all modern OS/filesystems should have instant snapshots, but reading en.wikipedia.org/wiki/Comparison_of_file_systems#Features and other Q&As indicates otherwise. // "Instant snapshots" e.g. have version numbers associated with snapshots and disk blocks, COW rather than modify blocks across versions, etc. // Still need to lock out others at the time of snapshot.Kanpur
A
8

The correct answer IMO is git clone --mirror. This will fully back up your repo.

git clone --mirror will clone the entire repository (notes, heads, refs, etc.) and is typically used to copy an entire repository to a new git server. This will pull down all branches and everything else, the entire repository.

git clone --mirror [email protected]/your-repo.git
  • A normal clone fetches all branches, but only as remote-tracking refs; it creates a local branch only for the default one (e.g. master).

  • Copying the repo folder will only "copy" the branches that have been pulled in...so by default that is Master branch only or other branches you have checked-out previously.

  • The Git bundle command is also not what you want: "The bundle command will package up everything that would normally be pushed over the wire with a git push command into a binary file that you can email to someone or put on a flash drive, then unbundle into another repository." (From What's the difference between git clone --mirror and git clone --bare)

Achromatous answered 14/5, 2018 at 19:18 Comment(1)
Does git clone --mirror create a consistent point-in-time backup? What if a user pushes a commit during the backup? Is it rejected, queued, or incorporated into the backup?Post
G
6

Use git bundle, or clone.

Copying the git directory is not a good solution because it is not atomic. If you have a large repository that takes a long time to copy and someone pushes to your repository in the meantime, it will affect your backup. Cloning or making a bundle will not have this problem.

Gemperle answered 11/4, 2013 at 3:54 Comment(0)
H
4

Everything is contained in the .git directory. Just back that up along with your project as you would any file.

Humor answered 7/4, 2011 at 8:41 Comment(8)
Does this mean, just backing up ALL contents of the directory containing the Git project is sufficient?Shrink
Agreed with Sunil--this does not appear to be an atomic operation.Ricardo
And how do you ensure no changes are made to files in that directory while creating the backup?Emperor
As Raedwald hinted, this method can result in an inconsistent backup and hence lead to data loss. Hence this answer should be removed, or at the very least, warn about the possibility of data loss.Jacobi
I think he knows the copy or cp commands very well and they don't suit his needs. I also think he is thinking of a bare repository (although it can be copied as well, I think that is not a full-featured backup).Huerta
Useful with e.g. the Ubuntu backup tool, which asks for folders to copy. I run it daily, so if a corruption occurred I would lose at most one day of work.Photoelectron
If you are using a local repo as the only collaborator, is this method good enough, without any possible issues?Cilurzo
This is a bit annoying when you want to delete the backup, since every loose object (.git/objects/*) is write-protected.Deoxyribose
M
3

You can backup the git repo with git-copy at minimum storage size.

git copy /path/to/project /backup/project.repo.backup

Then you can restore your project with git clone

git clone /backup/project.repo.backup project
Masry answered 3/6, 2015 at 3:44 Comment(2)
github.com/cybertk/git-copy/blob/master/bin/git-copy#L8-L36: that seems a lot of work for a simple git clone --bare + git push --force.Turncoat
@Turncoat Yes, but it can have some additional feature during the repackaging, or it can mine the internal structure of the git repo, which it can use for some optimization (restructuring of the destination, or speed increase, etc).Huerta
V
1

There is a very simple-to-use Python tool that automatically backs up an organisation's repositories in .zip format, saving public and private repositories and all their branches. It works with the GitHub API: https://github.com/BuyWithCrypto/OneGitBackup

Vivien answered 7/8, 2022 at 9:6 Comment(1)
This gives me "404 Page not found"Natalienatalina
R
0
# Create a bare backup repository; a bare repo can safely receive pushes.
git clone --bare /path/to/repo /path/to/backupdir/repo.git
cd /path/to/repo
git remote add backup /path/to/backupdir/repo.git
git push --set-upstream backup master

This creates a backup and sets things up so that you can do a git push to update it, which is probably what you want. Just make sure that /path/to/backupdir and /path/to/repo are at least on different hard drives; otherwise it doesn't make much sense.

Renell answered 22/4, 2015 at 10:14 Comment(1)
I think he knows the clone command well if he can ask such a question, and it is clearly not enough for him (because it is a clone, not a dump). Dumps are different from simple copies, for example: 1) they do not need to be optimal (or even usable) for normal work, 2) but they are required to have good resistance to, and repairability after, data corruption, 3) it is often useful if they are easily diffable for incremental backups, which is not a goal for copies.Huerta
B
0

Here are two options:

  1. You can directly take a tar of the git repo directory, as it has the whole bare contents of the repo on the server. There is a slight possibility that somebody is working on the repo while the backup is being taken.

  2. The following command will give you a bare clone of the repo (just like it is on the server); you can then take a tar of the cloned location without any issue.

    git clone --bare {your backup local repo} {new location where you want to clone}
    
Brittaneybrittani answered 25/7, 2015 at 17:2 Comment(2)
I think he knows the clone or tar command well if he can ask such a question, and it is clearly not enough for him (because it is a clone, not a dump). Dumps are different from simple copies, for example: 1) they do not need to be optimal (or even usable) for normal work, 2) but they are required to have good resistance to, and repairability after, data corruption, 3) it is often useful if they are easily diffable for incremental backups, which is not a goal for copies.Huerta
peterh, he definitely wasn't asking for the tar or clone command. If you look closely, I wasn't explaining those commands either. What I was trying to explain is Git backup via different methods, which may involve various Linux commands; that doesn't mean I am teaching those Linux commands. I am trying to put a few ideas here.Brittaneybrittani
L
0

If it is on GitHub, navigate to Bitbucket and use the "import repository" method to import your GitHub repo as a private repo.

If it is on Bitbucket, do it the other way around.

It's a full backup but stays in the cloud, which is my ideal method.

Lyly answered 2/7, 2019 at 0:5 Comment(0)
G
-8

As far as I know, you can just make a copy of the directory your repo is in, that's it!

cp -r project project-backup
Grapple answered 7/4, 2011 at 8:42 Comment(5)
Can anybody please confirm this? I feel this is the right approach for making a proper backup.Shrink
I think you could end up with an inconsistent snapshot if changes are committed/pushed to the repository during the copy operation. Using git commands like git clone --bare will give you a consistent snapshot.Overeager
Agreed with Sunil--this does not appear to be atomic.Ricardo
@Ricardo It is not always a problem if it is not atomic: you only need to be able to guarantee that nobody else can access the repo while you are working on it. But I think the OP wants a specific tool optimized for git repos; a simple file copy is probably well known to him.Huerta
Regularly cp-ing git repos is an abuse of your storage device.Urias

© 2022 - 2024 — McMap. All rights reserved.