Is there a simple way to back up an entire git repo, including all branches and tags?
What about just making a clone of it?
git clone --mirror other/repo.git
Every repository is a backup of its remote.
git branch -a shows all of them. Maybe it's more obvious this way: after cloning a repository you don't fetch every branch, you fetch every commit. Branches only reference an existing commit. – Tungstate
git clone covers all that. (1) is optional, not a requirement; if the result is still optimized, it's still a backup. (2) is already covered by git itself. The point I'd like to make is: if git clone already covers the relevant points, why would you need a different tool? Although I also prefer git bundle, I don't think my answer is wrong or invalid. You can see both approaches as hot vs. cold backup. – Tungstate
I like the git bundle method, as it results in only one file, which is easier to copy around.
See ProGit: little bundle of joy.
See also "How can I email someone a git repository?", where the command
git bundle create /tmp/foo-all --all
is detailed:
git bundle
will only package references that are shown by git show-ref: this includes heads, tags, and remote heads.
It is very important that the basis used be held by the destination.
It is okay to err on the side of caution, causing the bundle file to contain objects already in the destination, as these are ignored when unpacking at the destination.
For using that bundle, you can clone it, specifying a non-existent folder (outside of any git repo):
git clone /tmp/foo-all newFolder
git bundle is the correct answer in my opinion, not the accepted one. I think he knows the clone command well if he can ask such a question, and it is clearly not enough for him (because it is a clone, not a dump). Dumps are different things from simple copies, for example: 1) they do not need to be optimal (or even usable) for normal work, 2) but they are required to have good resistance and repairability against data corruption, 3) it is often useful if they are easily diff-able for incremental backups, while that is a non-goal for copies. – Huerta
Does git bundle or git clone get everything, for example the hook scripts? – Chiton
Can you use git bundle against a remote repo? – Voluntaryism
git bundle create backup --all does not include refs/notes. – Deoxyribose
Does git bundle create backup.bndl --all refs/notes/* work for you, then? Check its content with git bundle verify backup.bndl. – Turncoat
git clone --mirror seems easier for this, though, since it takes all refs into account by default. – Deoxyribose
That is indeed what --mirror is for. – Turncoat
Expanding on the great answers by KingCrunch and VonC, I combined them both:
git clone --mirror [email protected]/reponame reponame.git
cd reponame.git
git bundle create reponame.bundle --all
After that you have a file called reponame.bundle that can be easily copied around. You can then create a new normal git repository from that using git clone reponame.bundle reponame.
Note that git bundle only copies commits that are reachable from some reference (branch or tag) in the repository, so dangling commits are not stored in the bundle.
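The bundle approach also lends itself to incremental backups. A minimal sketch, using a throwaway repository so it is self-contained; all paths, file names, and the `tips` bookkeeping file are illustrative choices, not from the answer above:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "first"
# Full backup: every ref and every commit reachable from one.
git bundle create ../full.bundle --all
# Record the backed-up tips so the next run can exclude them.
git for-each-ref --format='%(objectname)' > ../tips
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "second"
# Incremental backup: only commits not reachable from the recorded tips.
git bundle create ../incr.bundle --all --not $(cat ../tips)
# Sanity-check both files before trusting them.
git bundle verify -q ../full.bundle
git bundle verify -q ../incr.bundle
```

When restoring, the full bundle is cloned first and the incremental ones are fetched on top; the prerequisites listed by git bundle verify must already be present in the target.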
Expanding on some other answers, this is what I do:
Set up the repo: git clone --mirror user@server:/url-to-repo.git
Then, when you want to refresh the backup, run git remote update from the clone location.
This backs up all branches and tags, including new ones that get added later. Branches that get deleted upstream are not deleted from the clone, which for a backup may be a good thing.
This is atomic, so it doesn't have the problems that a simple copy would.
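Those two steps can be wrapped in a small idempotent helper; a sketch, where the function name and arguments are my own, not from the answer:

```shell
# backup_repo <source-url-or-path> <backup-dir>
backup_repo() {
    src=$1; dst=$2
    if [ ! -d "$dst" ]; then
        # First run: create the mirror.
        git clone --quiet --mirror "$src" "$dst"
    else
        # Later runs: fetch new branches, tags and commits. No --prune,
        # so branches deleted upstream are kept in the backup.
        git -C "$dst" remote update >/dev/null
    fi
}
```

Run it from cron against each repository: the first invocation clones, every later one refreshes.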
This thread was very helpful for getting some insight into how backups of git repos could be done. I think it still lacks some hints, information, or a conclusion for finding the "correct way" (tm) for oneself. Therefore I am sharing my thoughts here to help others, and putting them up for discussion to improve them. Thanks.
So starting with picking-up the original question:
- Goal is to get as close as possible to a "full" backup of a git repository.
Then enriching it with the typical wishes and specifying some presettings:
- Backup via a "hot-copy" is preferred to avoid service downtime.
- Shortcomings of git will be worked around by additional commands.
- A script should do the backup to combine the multiple steps for a single backup and to avoid human mistakes (typos, etc.).
- Additionally, a script should do the restore and adapt the dump to the target machine, since even the configuration of the original machine may have changed since the backup.
- Environment is a git server on a Linux machine with a file system that supports hardlinks.
1. What is a "full" git repo backup?
The point of view differs on what a "100%" backup is. Here are two typical ones.
#1 Developer's point of view
- Content
- References
git is a developer tool and supports this point of view via git clone --mirror and git bundle --all.
#2 Admin's point of view
- Content files
- Special case "packfile": git combines and compacts objects into packfiles during garbage collection (see git gc)
- git configuration
- see https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain
- docs: man git-config, man gitignore
- .git/config
- .git/description (for hooks and tools, e.g. post-receive-email hook, gitolite, GitWeb, etc.)
- .git/hooks/
- .git/info/ (repository exclude file, etc.)
- Optional: OS configuration (file system permissions, etc.)
git is a developer tool and leaves this to the admin. Backups of the git configuration and the OS configuration should be seen as separate from the backup of the content.
2. Techniques
- "Cold-Copy"
- Stop the service to have exclusive access to its files. Downtime!
- "Hot-Copy"
- Service provides a fixed state for backup purposes. On-going changes do not affect that state.
3. Other topics to think about
Most of them are generic for backups.
- Is there enough space to hold the full backups? How many generations will be stored?
- Is an incremental approach wanted? How many generations will be stored and when to create a full backup again?
- How to verify that a backup is not corrupted after creation or over time?
- Does the file system support hardlinks?
- Put backup into a single archive file or use directory structure?
4. What git provides to back up content
git gc --auto
- docs: man git-gc
- Cleans up and compacts a repository.
git bundle --all
- docs: man git-bundle, man git-rev-list
- Atomic = "Hot-Copy"
- Bundles are dump files and can be directly used with git (verify, clone, etc.).
- Supports incremental extraction.
- Verifiable via git bundle verify.
git clone --mirror
- docs: man git-clone, man git-fsck, What's the difference between git clone --mirror and git clone --bare
- Atomic = "Hot-Copy"
- Mirrors are real git repositories.
- The primary intention of this command is to build a full active mirror that periodically fetches updates from the original repository.
- Supports hardlinks for mirrors on same file system to avoid wasting space.
- Verifiable via git fsck.
- Mirrors can be used as a basis for a full file backup script.
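A quick integrity check of such a mirror might look like the following; the throwaway repository stands in for a real server path, so all names here are illustrative:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
git -C "$work/repo" -c user.email=a@b -c user.name=a \
    commit -q --allow-empty -m "initial"
git clone -q --mirror "$work/repo" "$work/mirror.git"
# fsck exits non-zero if the object store is damaged;
# --strict enables additional checks.
git -C "$work/mirror.git" fsck --strict
```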
5. Cold-Copy
A cold-copy backup can always do a full file backup: deny all access to the git repos, do the backup, then allow access again.
- Possible Issues
- It may not be easy, or even possible, to deny all access, e.g. shared access via the file system.
- Even if the repo is on a client-only machine with a single user, the user may still commit something during an automated backup run :(
- Downtime may not be acceptable on a server, and backing up multiple huge repos can take a long time.
- Ideas for Mitigation:
- Prevent direct repo access via file system in general, even if clients are on the same machine.
- For SSH/HTTP access use git authorization managers (e.g. gitolite) to dynamically manage access or modify authentication files in a scripted way.
- Backup repos one-by-one to reduce the downtime per repo: deny access to one repo, back it up, allow access again, then continue with the next repo.
- Have a planned maintenance schedule to avoid upsetting developers.
- Only backup when repository has changed. Maybe very hard to implement, e.g. list of objects plus having packfiles in mind, checksums of config and hooks, etc.
6. Hot-Copy
File backups cannot be done on active repos due to the risk of data corruption from ongoing commits. A hot-copy provides a fixed state of an active repository for backup purposes; ongoing commits do not affect that state. As listed above, git's clone and bundle functionalities support this, but for a "100% admin" backup several things have to be done via additional commands.
"100% admin" hot-copy backup
- Option 1: use git bundle --all to create full/incremental dump files of the content, and copy/backup the configuration files separately.
- Option 2: use git clone --mirror, handle and copy the configuration separately, then do a full file backup of the mirror.
- Notes:
- A mirror is a new repository that is populated from the current git template on creation.
- Clean up configuration files and directories, then copy configuration files from original source repository.
- Backup script may also apply OS configuration like file permissions on the mirror.
- Use a filesystem that supports hardlinks and create the mirror on the same filesystem as the source repository to gain speed and reduce space consumption during backup.
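Option 2 above could be sketched as a small function: mirror the content atomically, then overlay the admin-level files. The function name is mine, and it assumes the source is a bare repository; a bare test repo makes the sketch self-contained:

```shell
# admin_backup <bare-source-repo> <backup-dir>
admin_backup() {
    src=$1; dst=$2
    # Content: atomic snapshot of all refs and objects.
    git clone -q --mirror "$src" "$dst"
    # Configuration: the fresh mirror was populated from the git
    # template, so replace its admin files with the originals.
    cp -p "$src/config" "$src/description" "$dst/"
    rm -rf "$dst/hooks" "$dst/info"
    cp -pR "$src/hooks" "$src/info" "$dst/"
}
```

Note that this overwrites the mirror's own config (including its remote settings) with the original repository's config, which is the point of a "100% admin" backup.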
7. Restore
- Check and adapt the git configuration to the target machine and the latest "way of doing" philosophy.
- Check and adapt the OS configuration to the target machine and the latest "way of doing" philosophy.
The correct answer, in my opinion, is git clone --mirror. This will fully back up your repo.
git clone --mirror clones the entire repository, including notes, heads, refs, etc., and is typically used to copy an entire repository to a new git server. It pulls down all branches and everything: the entire repository.
git clone --mirror [email protected]/your-repo.git
Normally, cloning a repo checks out only the default branch (e.g. master); the other branches are fetched as remote-tracking refs but not checked out.
Copying the repo folder will only "copy" the local branches, so by default that is the master branch, plus any other branches you have checked out previously.
The Git bundle command is also not what you want: "The bundle command will package up everything that would normally be pushed over the wire with a git push command into a binary file that you can email to someone or put on a flash drive, then unbundle into another repository." (From What's the difference between git clone --mirror and git clone --bare)
Use git bundle or git clone. Copying the git directory is not a good solution, because it is not atomic: if you have a large repository that takes a long time to copy and someone pushes during the copy, it will affect your backup. Cloning or making a bundle does not have this problem.
Everything is contained in the .git
directory. Just back that up along with your project as you would any file.
I think he knows the copy or cp commands very well, and they don't suit his needs. I also think he has a bare repository in mind (and although that can be copied as well, I don't think it is a full-featured backup). – Huerta
Note that part of the repository (e.g. .git/objects/*) is write-protected. – Deoxyribose
You can back up the git repo with git-copy at minimal storage size.
git copy /path/to/project /backup/project.repo.backup
Then you can restore your project with git clone
git clone /backup/project.repo.backup project
A similar result can be achieved with git clone --bare plus git push --force. – Turncoat
There is a very simple-to-use Python tool that automatically backs up organisations' repositories in .zip format, saving public and private repositories and all their branches. It works with the GitHub API: https://github.com/BuyWithCrypto/OneGitBackup
cd /path/to/backupdir/
git clone --bare /path/to/repo
cd /path/to/repo
git remote add backup /path/to/backupdir/repo.git
git push --set-upstream backup master
This creates a backup and sets things up so that a plain git push updates your backup, which is probably what you want. The clone is made bare so that pushing to it is allowed (git refuses to push to the currently checked-out branch of a non-bare repository). Just make sure that /path/to/backupdir and /path/to/repo are at least on different hard drives; otherwise it doesn't make much sense to do this.
Here are two options:
You can directly take a tar of the git repo directory, as it holds the whole bare contents of the repo on the server. There is a slight possibility that somebody may be working on the repo while the backup is being taken.
Alternatively, the following command will give you a bare clone of the repo (just as it is on the server); you can then take a tar of the location you cloned to without any issue.
git clone --bare {your backup local repo} {new location where you want to clone}
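The two steps could look like the following; the throwaway repository and its paths stand in for the real server locations, so every name here is illustrative:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/project"
git -C "$work/project" -c user.email=a@b -c user.name=a \
    commit -q --allow-empty -m "initial"
# Step 1: bare clone into a fresh location, giving a consistent snapshot.
git clone -q --bare "$work/project" "$work/project-backup.git"
# Step 2: archive the clone; nothing writes to it, so tar is safe here.
tar -czf "$work/project-backup.tar.gz" -C "$work" project-backup.git
```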
If it is on GitHub, navigate to Bitbucket and use the "import repository" method to import your GitHub repo as a private repo.
If it is on Bitbucket, do it the other way around.
It's a full backup, but it stays in the cloud, which is my ideal method.
As far as I know, you can just make a copy of the directory your repo is in; that's it!
cp -r project project-backup
git clone --bare
will give you a consistent snapshot. –
Overeager
cp-ing git repos is an abuse of your storage device. – Urias