How to complete a git clone for a big project on an unstable connection?
K

18

261

I am trying to git clone the LibreOffice codebase, but at the moment I have an internet connection of about 300 kbps and it's anything but stable. The connection can come back at any moment, but by then the git clone process has already stopped working, and there is no way to get it running again. Is there some way to have a more failure-resistant git clone download?

One option I considered myself is to download someone else's .git directory, but that is overly dependent on others and doesn't seem like the best possible solution to me.

Kauppi answered 17/10, 2010 at 19:22 Comment(6)
Do you need to clone all revisions, or just the latest? Maybe --depth 1 is a solution?Cleotildeclepe
The bundle approach is already in place for repos like kernel/git/torvalds/linux.git. And a resumable git clone is being discussed (March 2016). See https://mcmap.net/q/111139/-if-a-git-fetch-is-cancelled-half-way-will-it-resume.Bump
I wonder. Won't doing git init, setting a remote and then doing fetch until it succeeds do the trick? I don't think fetch discards successfully downloaded objects if the connection fails.Flippant
@АндрейБеньковский has anyone tried this?Thibaud
Also see Does git-clone have resume capability? over on Super User and Is there any way to continue Git clone from the point where it failed? here.Lammond
Microsoft contributes GVFS now, so that, and maybe the recently added buffer size option, might help to actually solve this issue over time.Kauppi
D
80

I don't think this is ready yet. There's an old GSoC page which planned to implement your desired feature. My best bet is, like you suggested, to download it as a directory. I'm assuming you are able to resume downloads over other protocols.

Restartable Clone

When cloning a large repository (such as KDE, Open Office, Linux kernel) there is currently no way to restart an interrupted clone. It may take considerable time for a user on the end of a small pipe to download the data, and if the clone is interrupted in the middle the user currently needs to start over from the beginning and try again. For some users this may make it impossible to clone a large repository.

Goal: Allow git-clone to automatically resume a previously failed download over the native git:// protocol. Language: C. Mentor: Shawn Pearce. Suggested by: Shawn Pearce on gmane.


Update

Along with the shallow cloning (git clone --depth=1) suggestion in one of the other answers, it may be helpful if someone can make a bare repository for you, if you can communicate with the provider. You can easily convert the bare repository to a full repository. Also read the comments in that answer, as a shallow clone may not always help.
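
For illustration, a minimal sketch of converting such a bare copy into a full repository; the paths and <official_repo_url> are placeholders:

git clone /path/to/copy/of/repo.git myproject    # clone the bare copy into a normal working tree
cd myproject
git remote set-url origin <official_repo_url>    # re-point origin at the real upstream
git fetch origin                                 # catch up on anything newer than the copy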

Dissension answered 17/10, 2010 at 19:28 Comment(12)
Thanks for the information, so my problem is known and a solution is worked on... What would you recommend as a work-around?Kauppi
I would say if you can clone it some place else, just copy it from there. Or if you can download it as a directory (the .git and other stuff that's there) then do that. Almost all download managers will let you resume regular downloads (the directory method).Dissension
I know that one. The worst thing however is that it's one anonymous download over the git-protocol first, then there's a script to do 19 more git clonesKauppi
Oh! Get someone to clone it for you on a flash drive or something then. :PDissension
The problem is that all connections are crap here... I think I'll have to put it all on a server and then download it by scp... I only have shared-hosting SSH access, so I don't know about git on those machines... :(Kauppi
Maybe off-topic, but this might work as a possible implementation for a more failsafe git clone: * have an option to make this possible (like --flaky-connection) * while using this option, implement clone as just a clone of the first revision, then update in blocks with git pull.Kauppi
Would work if the first revision is small. Could happen that the initial revision is big enough to be painful. But, hey, it's all open-source. ;)Dissension
I am also stuck while cloning the VLC code; though it's not that big, the connection keeps getting interrupted over HTTP, with no way to resume from the repo blocks already downloaded :(Rata
Well, just yesterday I lost 600 rupees ($10) because of this problem. Internet bandwidth is quite a precious thing in my part of the world.Odie
Lots of people asking for updates and nobody sharing their contribution to the solution.Thibaud
Mar '18 - still looking for it... on this earth!!Lagas
11 years later, Google's attack on the underlying socioeconomic issue of unreliable bandwidth with Google Fiber and Google Fi had mixed results. Its fiber micro-trenches in the city of Louisville were cut too shallowly into the asphalt, and the cables were found popping out from the road surface soon after the work. Meanwhile, --depth 1 and --unshallow appear to have withstood the years of usage.Offprint
S
176

Two solutions (or rather workarounds) that come to mind are:

  • Use a shallow clone, i.e. git clone --depth=1, then deepen this clone using git fetch --depth=N with increasing N (see the first sketch below). You can use git fetch --unshallow (since 1.8.0.3) to download all remaining revisions.

  • Ask somebody to bundle the repository up to some tagged release (see the git-bundle(1) manpage). The bundle itself is an ordinary file, which you can download any way you like: via HTTP/FTP with resume support, via BitTorrent, via rsync, etc. Then you can create a clone from the bundle, fix the configuration, and do further fetches from the official LibreOffice repository (see the second sketch below).
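
For the first option, a minimal bash sketch of deepening in steps; <repo_url> is a placeholder, and the retry loop is just one way to keep going over a flaky connection:

git clone --depth=1 <repo_url> repo      # small initial download
cd repo
for n in 100 200 400 800 1600; do        # deepen the history in steps
    until git fetch --depth=$n; do       # retry the same step until it succeeds
        echo "fetch failed, retrying..." >&2
        sleep 5
    done
done
git fetch --unshallow                    # finally fetch whatever history is still missing

For the second option, a rough sketch of the bundle round trip, assuming a helper with a good connection; <official_repo_url> is a placeholder:

# On the helper's machine:
git bundle create libreoffice.bundle --all        # pack the whole repository into one file

# Transfer libreoffice.bundle with any resumable tool (rsync, wget -c, BitTorrent, ...).

# On your machine:
git clone libreoffice.bundle libreoffice
cd libreoffice
git remote set-url origin <official_repo_url>     # point origin at the official repository
git fetch origin                                  # fetch anything newer than the bundle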

Steerage answered 18/10, 2010 at 9:7 Comment(9)
The shallow clone trick doesn't work well in practice. Cloning a well-packed repo (git://libvirt.org/libvirt.git) changes a 68M transfer into a 61M + 35M transfer. A feature to prioritise the worktree, rather than all branches at depth 1, might fare better; session resumption would be better still.Mohamed
@Tobu: The shallow clone trick might work in a repository with a long history. There is ongoing work to make shallow clone get only a single branch by default. That might have helped. Or not.Frankhouse
This works really well now, with git 1.7.10. The initial depth=1 clone of the Git repository is only 4.72 MB, while the whole repository is 55 MB. Further fetches can be as small as you want (depth=100 gave me a ~20 MB fetch). The total compressed download was 31 MB, over one clone and 3 fetches.Lovellalovelock
@Lovellalovelock It downloads objects for one revision, and if the source code itself is large (not the history), then it will be an issue again...Vasculum
Deepen with increasing N: en.wikipedia.org/wiki/Iterative_deepening_depth-first_searchCu
for m in $(seq 1 50);do git fetch --depth=$[m*100];done worked for me, thanks! :)Forereach
If using windows command line, the above loop can be FOR /L %%m IN (Lowerlimit, Increment, Upperlimit) Do git fetch --depth=%%mPifer
I encountered a problem after using this: after --unshallow, my remote tracking branches still only included the main branch. See: stackoverflow.com/a/46282491Uncommercial
A powershell equivalent oneliner: 1..50 | ForEach-Object { git fetch --depth=$($_*100) }Dendro
A
18

This method uses a 3rd-party server.

First, do a git clone --bare on the server, then transfer the result with rsync -v -P -a -e ssh user@host:repo.git . (you can use msys under Windows).
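
Spelled out, with user@host and <repo_url> as placeholders:

# On the 3rd-party server (good connection):
git clone --bare <repo_url> repo.git

# On your machine; rsync -P keeps partial files, so an interrupted transfer can be resumed:
rsync -v -P -a -e ssh user@host:repo.git .

# Then clone locally from the downloaded bare repository:
git clone repo.git repo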

Aliciaalick answered 25/9, 2012 at 10:54 Comment(2)
I tried the --bare option; it created the expected .git internal files inside repo.git, and I had to do git clone file:///path/to/repo.git/ to get the actual repositoryGleiwitz
Linus doesn't own GitHub…by "3rd-party server", did you actually mean “Git server which does not jail its users so heavily as to prohibit their use of rsync(1) by the way GitHub I'm looking at you”? Or, do you mean to first git clone on a 3rd-party server and then rsync it to the local machine?Reunion
M
17

I would like to put in my five cents here. This is what actually helped me solve this issue:

  • Turn off compression
  • Increase http.postBuffer
  • Do a partial clone
  • Navigate to the cloned directory and fetch the rest of the clone
  • Pull the rest
git config --global core.compression 0
git config --global http.postBuffer 524288000
git clone  <your_git_http_url_here> --depth 1
git fetch --unshallow 
git pull --all

This helped me clone a ~3 GB repo over an 8 Mbps ADSL connection; of course, I had to perform the fetch and pulls a few times, but still...

Mcclenon answered 19/4, 2019 at 7:47 Comment(0)
M
16

"Never underestimate the bandwidth of a carrier pigeon and a bundle of SD cards" would be the modern form of this answer. Tar it up, plain old cp -a it, whatever, and mail the damn thing. Find someone willing to take two minutes of their time to drop a thumb drive into an SASE. Find a contact, there, they might even do it for you.

Maraca answered 13/11, 2013 at 0:44 Comment(0)
S
13

Increasing the buffer size will help with this problem. Just follow these steps:

  1. Open a terminal or Git Bash and cd to the location where you want to clone the repo.

  2. Set compression to 0

    git config --global core.compression 0
    
  3. Set postBuffer size

    git config --global http.postBuffer 1048576000
    
  4. Set maxRequestBuffer size

    git config --global http.maxRequestBuffer 100M
    
  5. Now start clone

    git clone <repo url>
    
  6. Wait till clone completes.

Silvereye answered 19/5, 2020 at 12:19 Comment(1)
This should definitely be the accepted answer. It solves the problem.Twospot
S
11

You can "download someone else's .git directory", but with that someone else being the official repository itself. The LibreOffice repositories are available via http, for instance their build.git is at http://anongit.freedesktop.org/git/libreoffice/build.git/ (see http://cgit.freedesktop.org/libreoffice/ for the complete list, the http URL is at the bottom of each repository's page).

What you see at these http URLs is nothing more than a .git directory (actually a "bare" repository, which has only what you would find in the .git directory). It is the same directory the server for the git:// protocol (git daemon) would read. If you make a copy of these directories with a web downloader (for instance wget -m -np), you can clone from your copy and it will work as well as if you had cloned directly from the http repository.

So, what you can do is: for each repository, get a copy of it with your favorite web downloader (which will deal with all the issues of resuming broken downloads), and clone from that copy. When you want to update, use your favorite web downloader again to update your copy, and pull from that copy. Now your clones and updates are as resistant to bad connections as your favorite web downloader is.
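
As a rough bash sketch of the idea, using the build.git URL from above (which may have moved since); as the comments below note, wget may wander into unrelated links, so you may need to constrain it further:

# Mirror the bare repository; -c and -m resume and update, -np stays below the given path:
wget -c -m -np http://anongit.freedesktop.org/git/libreoffice/build.git/

# Clone from the local mirror (wget stores it under the host name):
git clone anongit.freedesktop.org/git/libreoffice/build.git build

# Later: re-run the same wget command to refresh the mirror, then pull from it:
cd build && git pull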

Senatorial answered 12/6, 2011 at 21:14 Comment(3)
They have converted to just one repository now; trying your tip, wget decides to download the whole site at once, however... (trying again now, will probably update here later...)Kauppi
Your command seems to get all links on the site, which is not what is meant to happen. I resorted to writing a script that seems to work, here: gist.github.com/1307703 Anyway, thanks a lot for the initial idea!Kauppi
Interesting idea, I'm trying to get the ruby/ruby repo from github and I'm getting blocked by the robots.txt... any suggestions?Malapert
I
10

Let's break git clone down into its component parts, and use git checkout to prevent re-downloading files.

When git clone runs, the first few things it does are equivalent to

git init
git remote add origin <repo_url>
git fetch origin <branch>

If you run the above steps manually, and assuming that they completed correctly, you can now run the following as many times as necessary:

git checkout --force <branch>

Note that it will check out all files each time it's run, but you will not have to re-download them, which may save you a ton of time.
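
Putting these steps together with a retry loop, as a bash sketch; <repo_url> and <branch> are placeholders, and note the caveat in the comments below that an interrupted fetch may still start over from scratch:

git init repo && cd repo
git remote add origin <repo_url>
until git fetch origin <branch>; do                 # keep retrying until one fetch completes
    echo "fetch interrupted, retrying..." >&2
    sleep 5
done
git checkout --force -B <branch> origin/<branch>    # create the local branch and check out the files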

Ingamar answered 8/9, 2017 at 2:9 Comment(5)
It doesn't work the way you describe; it will not let you do a git reset after a broken fetchDeaminate
As I said, once you assume that a fetch has completed successfully, you can run git reset. If your fetch is broken, then reset won't work. You need to either A) repeatedly try to fetch again until it works, or B) abandon this and try something else.Ingamar
I did something else and it miraculously worked: I did a git pull instead of a git fetch =)Deaminate
@Deaminate I believe git pull just calls git fetch internally and then merges, so the command should not have made a differenceGrubb
Fetch still restarts from the beginning if it fails; it just creates a new tmp file in .git/objects/pack. I saw you said the fetch should complete correctly, but it doesn't differ from the clone command in the end, at least for downloading huge projects like Unreal Engine. The only good thing is that I briefly felt hope xDLocation
S
8
git clone --depth <Number> <repository> --branch <branch name> --single-branch

This command helped me (thanks to Nicola Paolucci).

for example

git clone --depth 1 https://github.com/gokhanmoral/siyahkernel3 --branch ics  --single-branch
Seraph answered 21/10, 2014 at 14:12 Comment(1)
I only need the develop branch for now, so it helped!Nuke
R
5

If you have access to a 3rd-party server, you could clone there and then copy.

Republican answered 17/10, 2010 at 19:26 Comment(0)
S
4

Use a git proxy, such as ngitcached or git-proxy.

Subchaser answered 31/7, 2014 at 21:31 Comment(2)
Even better: github.com/git-cloner/gitcacheArda
This still seems to require us to have access to a server with an excellent connection to complete the initial cloning, right? Somewhere such apps can be installed to do the initial download? I don't have one, and Unreal Engine is so huge...Location
P
3

This problem bit me too. In my case there is a work-around. It may or may not apply in your case.

I sometimes use a mobile phone to initiate git operations on a remote system. If my Wi-Fi breaks, of course the session ends and git drops the whole clone operation without recovering. But since the internet connection from my remote system to the git master is solid, there's no need for the clone to stop. All I need is the common sense to detach the clone from the terminal session. This can be done by using screen/tmux or nohup/daemon. So it's a liveware malfunction in my case.
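
For example, a minimal sketch, assuming the remote system has a solid connection to the git server; the host and repository URL are placeholders:

ssh user@remote-host
tmux new -s clone                          # or: screen -S clone
git clone https://example.com/big/repo.git
# If your own connection drops, log back in later and reattach:
tmux attach -t clone                       # or: screen -r clone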

Prouty answered 14/7, 2012 at 16:47 Comment(0)
O
3

Same problem here - I have a really flaky internet connection with often not more than 10-15 kb/sec :-P

For me the wget way worked very well.

Go to the repository site where the green button "clone or download" is, click it and copy the link of the ZIP download option.

Then insert the link to the wget command:

wget -c -m -np https://github.com/your/repository/archive/master.zip

Works like a charm...

Older answered 2/10, 2018 at 7:53 Comment(1)
Maybe this worked before, but right now when I try your solution and the connection breaks (or I press Ctrl-C), then after a rerun the download is not continued but starts from the beginning, at least on the LLVM repository.Dracula
I
2

Use CTRL+Z to suspend the cloning. Don't close the terminal; put the system/laptop into hibernation and then continue later with the fg command. I was facing this same problem today while trying to clone a repo from GitHub. This came as a time saver for me.

Inspired answered 26/10, 2013 at 19:17 Comment(0)
C
0

If we assume servers have good bandwidth (and you have a server), another answer is to:

  1. create your own server using a server-side Git wrapper
  2. clone it on your server
  3. zip it using a server-side zip archiver
  4. download it from there with server-side resume support

This requires only very basic web-development experience ;) and you also need git.exe on your server.
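
A rough sketch of those steps, with placeholder names; any resume-capable web server and download manager will do for the last step:

# On your server:
git clone --mirror <repo_url> repo.git            # clone it on the server
zip -r repo.zip repo.git                          # archive it as a single file

# On your machine, a resume-capable download (wget -c can continue a broken transfer):
wget -c https://your-server.example/repo.zip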

Caddis answered 12/3, 2018 at 4:57 Comment(0)
C
0

I've had a lot of trouble getting repositories over a high latency satellite connection. I had no success with the various config options that are commonly suggested which seem to be repeated without any real explanation of why they ought to work. My bandwidth is a couple of Mb/s and is sufficient to download large files (hundreds of MB) in a reasonable amount of time, but it seems some other instability in the connection causes fetch to fail.

One solution that I haven't seen mentioned here is to try SSH instead of HTTPS (in combination with other suggestions like shallow cloning). This has been a lot more successful for me in cases where HTTPS would reliably fail. I imagine most people reading this are trying to clone from GitHub, i.e. try setting up a key and using git clone --depth=1 --no-tags git@github.com:organisation/repo.git

A backup solution is to clone the repository somewhere else; ideally your own server, but since many folks don't have access to that I've found Google Colab is very serviceable. This also works if SSH is blocked on your network:

!git clone --depth=1 --no-tags https://github.com/some/repo.git
!tar -czf repo.tar.gz repo

and then download the tarball via the file explorer in the browser. You could also copy to Google Drive, scp/rsync or even cloud storage if you have the means. Running git fetch --unshallow on the extracted tarball also seems to generally work.

Cornice answered 8/12, 2023 at 21:7 Comment(0)
T
-1

You can try to use mercurial with the hg-git extension.

If that doesn't work, you can use git fetch <commit-id> to fetch only parts of a remote git repository (you can fetch into an empty git repository; there is no need to create it with clone). But you might have to correct the branch configuration (i.e. create local and remote-tracking branches) when you use this approach.
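
A rough bash sketch of the second approach; <repo_url> and <commit-id> are placeholders, and note that many servers only allow fetching named refs, not arbitrary commit ids:

git init repo && cd repo
git remote add origin <repo_url>
git fetch origin <commit-id>          # works only if the server permits fetching by commit id
git checkout -b master FETCH_HEAD     # create a local branch at the fetched commit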

Therapist answered 18/10, 2010 at 8:33 Comment(0)
M
-1

The best workaround that worked for me:

I faced the same issue with a bad internet connection. So I came up with the following solution:

I created a small PHP file on my server to download the package as a zip file:

<?php
$url = "https://codeload.github.com/CocoaPods/Specs/zip/master";
file_put_contents("coco.zip", fopen($url, 'r'));
?>  

<a href="coco.zip">coco.zip</a>

Then download the zip file using any download manager that supports resume.

Marinetti answered 26/7, 2019 at 7:51 Comment(1)
You don't need a server or PHP for this. curl -o coco.zip https://codeload.github.com/CocoaPods/Specs/zip/masterIngurgitate
