How to complete a git clone for a big project on an unstable connection?
K

18

261

I am trying to git clone the LibreOffice codebase, but at the moment I have an internet connection of about 300 kbps and it's anything but stable. The connection can come back at any moment, but by then the git clone process has already stopped working, and there is no way to get it running again. Is there some way to have a more failure-resistant git clone download?

One option I considered myself is to download someone else's .git directory, but that is overly dependent on others and doesn't seem like the best possible solution to me.

Kauppi answered 17/10, 2010 at 19:22 Comment(6)
Do you need to clone all revisions, or just the latest? Maybe --depth 1 is a solution?Cleotildeclepe
The bundle approach is already in place for repos like kernel/git/torvalds/linux.git. And a resumable git clone is being discussed (March 2016). See https://mcmap.net/q/111139/-if-a-git-fetch-is-cancelled-half-way-will-it-resume.Bump
I wonder. Won't doing git init, setting a remote and then doing fetch until it succeeds do the trick? I don't think fetch discards successfully downloaded objects if the connection fails.Flippant
@АндрейБеньковский has anyone tried this?Thibaud
Also see Does git-clone have resume capability? over on Super User and Is there any way to continue Git clone from the point where it failed? here.Lammond
Microsoft contributes GVFS now, so that, and maybe the recently added buffer size option, might help to actually solve this issue over time.Kauppi
D
80

I don't think this is ready yet. There's an old GSoC page which planned to implement your desired feature. My best bet is, like you suggested, to download it as a directory. I'm assuming you are able to resume downloads over other protocols.

Restartable Clone

When cloning a large repository (such as KDE, Open Office, Linux kernel) there is currently no way to restart an interrupted clone. It may take considerable time for a user on the end of a small pipe to download the data, and if the clone is interrupted in the middle the user currently needs to start over from the beginning and try again. For some users this may make it impossible to clone a large repository.

Goal: Allow git-clone to automatically resume a previously failed download over the native git:// protocol. Language: C. Mentor: Shawn Pearce. Suggested by: Shawn Pearce on gmane.


Update

Along with the shallow cloning (git clone --depth=1) suggestion in one of the other answers, it may be helpful if someone can make a bare repository for you, if you can communicate with the provider. You can easily convert the bare repository to a full repository. Also read the comments in that answer, as a shallow clone may not always help.
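
For illustration, a minimal sketch of converting such a bare copy into a full repository; the paths and <official_repo_url> are placeholders:

git clone /path/to/copy/of/repo.git myproject    # clone the bare copy into a normal working tree
cd myproject
git remote set-url origin <official_repo_url>    # re-point origin at the real upstream
git fetch origin                                 # catch up on anything newer than the copy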

Dissension answered 17/10, 2010 at 19:28 Comment(12)
Thanks for the information, so my problem is known and a solution is worked on... What would you recommend as a work-around?Kauppi
I would say if you can clone it some place else, just copy it from there. Or if you can download it as a directory (the .git and other stuff that's there) then do that. Almost all download managers will let you resume regular downloads (the directory method).Dissension
I know that one. The worst thing however is that it's one anonymous download over the git-protocol first, then there's a script to do 19 more git clonesKauppi
Oh! Get someone to clone it for you on a flash drive or something then. :PDissension
The problem is that all connections are crap here... I think I'll have to put it all on a server and then download it by scp... I only have shared-hosting SSH access, so I don't know about git on those machines... :(Kauppi
Maybe off-topic, but this might work as a possible implementation for a more failsafe git clone: * have an option to make this possible (like --flaky-connection) * while using this option, implement clone as just a clone of the first revision, then update in blocks with git pull.Kauppi
Would work if the first revision is small. Could happen that the initial revision is big enough to be painful. But, hey, it's all open-source. ;)Dissension
I am also stuck while cloning the VLC code; though it's not that big, the connection keeps getting interrupted over HTTP, with no way to resume from the repo blocks already downloaded :(Rata
Well, just yesterday I lost 600 rupees ($10) because of this problem. Internet bandwidth is quite a precious thing in my part of the world.Odie
Lots of people asking for updates and nobody sharing their contribution to the solution.Thibaud
Mar '18 - still looking for it... on this earth!!Lagas
11 years later, Google's attack on the underlying socioeconomic issue of unreliable bandwidth with Google Fiber and Google Fi had mixed results. Its fiber micro-trenches in the city of Louisville were cut too shallowly into the asphalt, and the cables were found popping out from the road surface soon after the work. Meanwhile, --depth 1 and --unshallow appear to have withstood the years of usage.Offprint
S
176

Two solutions (or rather workarounds) that come to mind are:

  • Use a shallow clone, i.e. git clone --depth=1, then deepen this clone using git fetch --depth=N with increasing N (see the first sketch below). You can use git fetch --unshallow (since 1.8.0.3) to download all remaining revisions.

  • Ask somebody to bundle the repository up to some tagged release (see the git-bundle(1) manpage). The bundle itself is an ordinary file, which you can download any way you like: via HTTP/FTP with resume support, via BitTorrent, via rsync, etc. Then you can create a clone from the bundle, fix the configuration, and do further fetches from the official LibreOffice repository (see the second sketch below).
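
For the first option, a minimal bash sketch of deepening in steps; <repo_url> is a placeholder, and the retry loop is just one way to keep going over a flaky connection:

git clone --depth=1 <repo_url> repo      # small initial download
cd repo
for n in 100 200 400 800 1600; do        # deepen the history in steps
    until git fetch --depth=$n; do       # retry the same step until it succeeds
        echo "fetch failed, retrying..." >&2
        sleep 5
    done
done
git fetch --unshallow                    # finally fetch whatever history is still missing

For the second option, a rough sketch of the bundle round trip, assuming a helper with a good connection; <official_repo_url> is a placeholder:

# On the helper's machine:
git bundle create libreoffice.bundle --all        # pack the whole repository into one file

# Transfer libreoffice.bundle with any resumable tool (rsync, wget -c, BitTorrent, ...).

# On your machine:
git clone libreoffice.bundle libreoffice
cd libreoffice
git remote set-url origin <official_repo_url>     # point origin at the official repository
git fetch origin                                  # fetch anything newer than the bundle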

Steerage answered 18/10, 2010 at 9:7 Comment(9)
The shallow clone trick doesn't work well in practice. Cloning a well-packed repo (git://libvirt.org/libvirt.git) changes a 68M transfer into a 61M + 35M transfer. A feature to prioritise the worktree, rather than all branches at depth 1, might fare better; session resumption would be better still.Mohamed
@Tobu: The shallow clone trick might work in a repository with a long history. There is ongoing work to make shallow clone get only a single branch by default. That might have helped. Or not.Frankhouse
This works really well now, with git 1.7.10. The initial depth=1 clone of the Git repository is only 4.72 MB, while the whole repository is 55 MB. Further fetches can be as small as you want (depth=100 gave me a ~20 MB fetch). The total compressed download was 31 MB, over one clone and 3 fetches.Lovellalovelock
@Lovellalovelock It downloads objects for one revision, and if the source code itself is large (not the history), then it will be an issue again...Vasculum
Deepen with increasing N: en.wikipedia.org/wiki/Iterative_deepening_depth-first_searchCu
for m in $(seq 1 50);do git fetch --depth=$[m*100];done worked for me, thanks! :)Forereach
If using windows command line, the above loop can be FOR /L %%m IN (Lowerlimit, Increment, Upperlimit) Do git fetch --depth=%%mPifer
I encountered a problem after using this: after --unshallow, my remote tracking branches still only included the main branch. See: stackoverflow.com/a/46282491Uncommercial
A powershell equivalent oneliner: 1..50 | ForEach-Object { git fetch --depth=$($_*100) }Dendro
A
18

This method uses a 3rd-party server.

First, do a git clone --bare on the server, then transfer the result with rsync -v -P -a -e ssh user@host:repo.git . (you can use msys under Windows).
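
Spelled out, with user@host and <repo_url> as placeholders:

# On the 3rd-party server (good connection):
git clone --bare <repo_url> repo.git

# On your machine; rsync -P keeps partial files, so an interrupted transfer can be resumed:
rsync -v -P -a -e ssh user@host:repo.git .

# Then clone locally from the downloaded bare repository:
git clone repo.git repo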

Aliciaalick answered 25/9, 2012 at 10:54 Comment(2)
I tried the --bare option; it created the expected .git internal files inside repo.git, and I had to do git clone file:///path/to/repo.git/ to get the actual repositoryGleiwitz
Linus doesn't own GitHub…by "3rd-party server", did you actually mean “Git server which does not jail its users so heavily as to prohibit their use of rsync(1) by the way GitHub I'm looking at you”? Or, do you mean to first git clone on a 3rd-party server and then rsync it to the local machine?Reunion
M
17

I would like to put in my five cents here. This is what actually helped me solve this issue:

  • Turn off compression
  • Increase http.postBuffer
  • Do a partial clone
  • Navigate to the cloned directory and fetch the rest of the clone
  • Pull the rest
git config --global core.compression 0
git config --global http.postBuffer 524288000
git clone  <your_git_http_url_here> --depth 1
git fetch --unshallow 
git pull --all

This helped me clone a ~3 GB repo over an 8 Mbps ADSL connection; of course, I had to perform the fetch and pulls a few times, but still...

Mcclenon answered 19/4, 2019 at 7:47 Comment(0)
M
16

"Never underestimate the bandwidth of a carrier pigeon and a bundle of SD cards" would be the modern form of this answer. Tar it up, plain old cp -a it, whatever, and mail the damn thing. Find someone willing to take two minutes of their time to drop a thumb drive into an SASE. Find a contact, there, they might even do it for you.

Maraca answered 13/11, 2013 at 0:44 Comment(0)
S
13

Increasing the buffer size will help with this problem. Just follow these steps:

  1. Open a terminal or Git Bash and cd to the location where you want to clone the repo.

  2. Set compression to 0

    git config --global core.compression 0
    
  3. Set postBuffer size

    git config --global http.postBuffer 1048576000
    
  4. Set maxRequestBuffer size

    git config --global http.maxRequestBuffer 100M
    
  5. Now start clone

    git clone <repo url>
    
  6. Wait till clone completes.

Silvereye answered 19/5, 2020 at 12:19 Comment(1)
This should definitely be the accepted answer. It solves the problem.Twospot
S
11

You can "download someone else's .git directory", but with that someone else being the official repository itself. The LibreOffice repositories are available via http, for instance their build.git is at http://anongit.freedesktop.org/git/libreoffice/build.git/ (see http://cgit.freedesktop.org/libreoffice/ for the complete list, the http URL is at the bottom of each repository's page).

What you see at these http URLs is nothing more than a .git directory (actually a "bare" repository, which has only what you would find in the .git directory). It is the same directory the server for the git:// protocol (git daemon) would read. If you make a copy of these directories with a web downloader (for instance wget -m -np), you can clone from your copy and it will work as well as if you had cloned directly from the http repository.

So, what you can do is: for each repository, get a copy of it with your favorite web downloader (which will deal with all the issues of resuming broken downloads), and clone from that copy. When you want to update, use your favorite web downloader again to update your copy, and pull from that copy. Now your clones and updates are as resistant to bad connections as your favorite web downloader is.
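
As a rough bash sketch of the idea, using the build.git URL from above (which may have moved since); as the comments below note, wget may wander into unrelated links, so you may need to constrain it further:

# Mirror the bare repository; -c and -m resume and update, -np stays below the given path:
wget -c -m -np http://anongit.freedesktop.org/git/libreoffice/build.git/

# Clone from the local mirror (wget stores it under the host name):
git clone anongit.freedesktop.org/git/libreoffice/build.git build

# Later: re-run the same wget command to refresh the mirror, then pull from it:
cd build && git pull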

Senatorial answered 12/6, 2011 at 21:14 Comment(3)
They have converted to just one repository now; trying your tip, wget decides to download the whole site at once, however... (trying again now, will probably update here later...)Kauppi
Your command seems to get all links on the site, which is not what is meant to happen. I resorted to writing a script that seems to work, here: gist.github.com/1307703 Anyway, thanks a lot for the initial idea!Kauppi
Interesting idea, I'm trying to get the ruby/ruby repo from github and I'm getting blocked by the robots.txt... any suggestions?Malapert
I
10

Let's break git clone down into its component parts, and use git checkout to prevent re-downloading files.

When git clone runs, the first few things it does are equivalent to

git init
git remote add origin <repo_url>
git fetch origin <branch>

If you run the above steps manually, and assuming that they completed correctly, you can now run the following as many times as necessary:

git checkout --force <branch>

Note that it will check out all files each time it's run, but you will not have to re-download them, which may save you a ton of time.
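
Putting these steps together with a retry loop, as a bash sketch; <repo_url> and <branch> are placeholders, and note the caveat in the comments below that an interrupted fetch may still start over from scratch:

git init repo && cd repo
git remote add origin <repo_url>
until git fetch origin <branch>; do                 # keep retrying until one fetch completes
    echo "fetch interrupted, retrying..." >&2
    sleep 5
done
git checkout --force -B <branch> origin/<branch>    # create the local branch and check out the files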

Ingamar answered 8/9, 2017 at 2:9 Comment(5)
It doesn't work the way you describe; it will not let you do a git reset after a broken fetchDeaminate
As I said, once you assume that a fetch has completed successfully, you can run git reset. If your fetch is broken, then reset won't work. You need to either A) repeatedly try to fetch again until it works, or B) abandon this and try something else.Ingamar
I did something else and it miraculously worked: I did a git pull instead of a git fetch =)Deaminate
@Deaminate I believe git pull just calls git fetch internally and then merges, so the command should not have made a differenceGrubb
Fetch still restarts from the beginning if it fails; it just creates a new tmp file in .git/objects/pack. I saw you said the fetch should complete correctly, but it doesn't differ from the clone command in the end, at least for downloading huge projects like Unreal Engine. The only good thing is that I briefly felt hope xDLocation
S
8
git clone --depth <Number> <repository> --branch <branch name> --single-branch

This command helped me (thanks to Nicola Paolucci).

for example

git clone --depth 1 https://github.com/gokhanmoral/siyahkernel3 --branch ics  --single-branch
Seraph answered 21/10, 2014 at 14:12 Comment(1)
I only need the develop branch for now, so it helped!Nuke
R
5

If you have access to a 3rd-party server, you could clone there and then copy.

Republican answered 17/10, 2010 at 19:26 Comment(0)
S
4

Use a git proxy, such as ngitcached or git-proxy.

Subchaser answered 31/7, 2014 at 21:31 Comment(2)
Even better: github.com/git-cloner/gitcacheArda
This still seems to require us to have access to a server with an excellent connection to complete the initial cloning, right? Somewhere such apps can be installed to do the initial download? I don't have one, and Unreal Engine is so huge...Location
P
3

This problem bit me too. In my case there is a work-around. It may or may not apply in your case.

I sometimes use a mobile phone to initiate git operations on a remote system. If my Wi-Fi breaks, of course the session ends and git drops the whole clone operation without recovering. But since the internet connection from my remote system to the git master is solid, there's no need for the clone to stop. All I need is the common sense to detach the clone from the terminal session. This can be done by using screen/tmux or nohup/daemon. So it's a liveware malfunction in my case.
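
For example, a minimal sketch, assuming the remote system has a solid connection to the git server; the host and repository URL are placeholders:

ssh user@remote-host
tmux new -s clone                          # or: screen -S clone
git clone https://example.com/big/repo.git
# If your own connection drops, log back in later and reattach:
tmux attach -t clone                       # or: screen -r clone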

Prouty answered 14/7, 2012 at 16:47 Comment(0)
O
3

Same problem here - I have a really flaky internet connection with often not more than 10-15 kb/sec :-P

For me the wget way worked very well.

Go to the repository site where the green button "clone or download" is, click it and copy the link of the ZIP download option.

Then insert the link to the wget command:

wget -c -m -np https://github.com/your/repository/archive/master.zip

Works like a charm...

Older answered 2/10, 2018 at 7:53 Comment(1)
Maybe this worked before, but right now when I try your solution and the connection breaks (or I press Ctrl-C), then after a rerun the download is not continued but starts from the beginning, at least on the LLVM repository.Dracula
I
2

Use CTRL+Z to suspend the cloning. Don't close the terminal; put the system/laptop into hibernation and then continue later with the fg command. I was facing this same problem today while trying to clone a repo from GitHub. This came as a time saver for me.

Inspired answered 26/10, 2013 at 19:17 Comment(0)
C
0

If we assume servers have good bandwidth (and you have a server), another answer is to:

  1. create your own server using a server-side Git wrapper
  2. clone it on your server
  3. zip it using a server-side zip archiver
  4. download it from there with server-side resume support

This requires only very basic web-development experience ;) and you also need git.exe on your server.
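
A rough sketch of those steps, with placeholder names; any resume-capable web server and download manager will do for the last step:

# On your server:
git clone --mirror <repo_url> repo.git            # clone it on the server
zip -r repo.zip repo.git                          # archive it as a single file

# On your machine, a resume-capable download (wget -c can continue a broken transfer):
wget -c https://your-server.example/repo.zip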

Caddis answered 12/3, 2018 at 4:57 Comment(0)
C
0

I've had a lot of trouble getting repositories over a high latency satellite connection. I had no success with the various config options that are commonly suggested which seem to be repeated without any real explanation of why they ought to work. My bandwidth is a couple of Mb/s and is sufficient to download large files (hundreds of MB) in a reasonable amount of time, but it seems some other instability in the connection causes fetch to fail.

One solution that I haven't seen mentioned here is to try SSH instead of HTTPS (in combination with other suggestions like shallow cloning). This has been a lot more successful for me in cases where HTTPS would reliably fail. I imagine most people reading this are trying to clone from GitHub, i.e. try setting up a key and using git clone --depth=1 --no-tags git@github.com:organisation/repo.git

A backup solution is to clone the repository somewhere else; ideally your own server, but since many folks don't have access to that I've found Google Colab is very serviceable. This also works if SSH is blocked on your network:

!git clone --depth=1 --no-tags https://github.com/some/repo.git
!tar -czf repo.tar.gz repo

and then download the tarball via the file explorer in the browser. You could also copy to Google Drive, scp/rsync or even cloud storage if you have the means. Running git fetch --unshallow on the extracted tarball also seems to generally work.

Cornice answered 8/12, 2023 at 21:7 Comment(0)
T
-1

You can try to use mercurial with the hg-git extension.

If that doesn't work, you can use git fetch <commit-id> to fetch only parts of a remote git repository (you can fetch into an empty git repository; there is no need to create it with clone). But you might have to correct the branch configuration (i.e. create local and remote-tracking branches) when you use this approach.
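
A rough bash sketch of the second approach; <repo_url> and <commit-id> are placeholders, and note that many servers only allow fetching named refs, not arbitrary commit ids:

git init repo && cd repo
git remote add origin <repo_url>
git fetch origin <commit-id>          # works only if the server permits fetching by commit id
git checkout -b master FETCH_HEAD     # create a local branch at the fetched commit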

Therapist answered 18/10, 2010 at 8:33 Comment(0)
M
-1

The best workaround that worked for me:

I faced the same issue with a bad internet connection. So I came up with the following solution:

I created a small PHP file on my server to download the package as a zip file:

<?php
$url = "https://codeload.github.com/CocoaPods/Specs/zip/master";
file_put_contents("coco.zip", fopen($url, 'r'));
?>  

<a href="coco.zip">coco.zip</a>

Then download the zip file using any download manager that supports resume.

Marinetti answered 26/7, 2019 at 7:51 Comment(1)
You don't need a server or PHP for this. curl -o coco.zip https://codeload.github.com/CocoaPods/Specs/zip/masterIngurgitate
