git push is very slow for a huge repo

I am having the same issue as in "git push is very slow for a branch", but the answer there doesn't fit my situation.

I am working against a corporate GitHub with a very large repo. My process is as follows:

1) Pull from master

2) Create new branch

3) Commit

4) Push the branch to create a pull request.

When pushing the branch in step 4, Git wants to write over 1,000,000 objects (about 3 GB), even though my commit changed only one line.

If I create a branch with the same name as in step 2 from the GitHub UI and then push into that branch, the push takes less than a second. Needless to say, the changes between master and my branch are very minor (no big files added or deleted).
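
The UI workaround can be reproduced from the command line: first create the remote branch from a commit the server already has, then push the local branch into it. This is a minimal sketch, assuming a throwaway local bare repository (server.git) stands in for the corporate GitHub remote; the names topic and file.txt are hypothetical. It only demonstrates the sequence of commands and cannot show whether a push against the real server gets faster.

```shell
set -e
# Throwaway sandbox; server.git is a hypothetical stand-in for the GitHub remote.
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
echo base > file.txt
git add file.txt
git commit -q -m base
git remote add origin ../server.git
git push -q origin master   # also updates refs/remotes/origin/master

# Steps 2 and 3 from the question: new branch, one-line commit.
git checkout -q -b topic
echo change >> file.txt
git commit -q -am 'one-line change'

# Create the remote branch from a commit the server already has
# (the command-line counterpart of creating the branch in the UI).
git push -q origin refs/remotes/origin/master:refs/heads/topic
# Then push the local branch into it.
git push -q -u origin topic
```

On a real clone, only the last two push commands apply, with topic replaced by the actual branch name.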

What can I do to make Git push only the relevant data and not the entire repo?

Git on Windows ver 2.17.0

Valentinevalentino answered 3/6, 2018 at 12:1 Comment(5)
If you run git show --name-status <your branch>, how many files are there? – Ugrian
At what point do I run this? – Valentinevalentino
After step 3 (commit). – Ugrian
Well... it can also depend on your tty output: twitter.com/33asr/status/1097165302125789184 – Sump
For a large repo, you now (Q1 2019) have, with Git for Windows 2.21, the config pack.useSparse, which can help the performance of the push. See my answer below. – Sump
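
The check suggested in the comments can be extended with a count of the commits the branch adds on top of origin/master. A minimal sketch, assuming a throwaway repository with a local bare repo playing the role of origin; the branch name topic and the paths src/app.txt and docs/readme.txt are hypothetical. For a one-line change, the expectation is a single modified file and a single commit ahead.

```shell
set -e
# Throwaway sandbox; server.git is a hypothetical stand-in for origin.
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
mkdir src docs
echo 'line 1' > src/app.txt
echo 'readme' > docs/readme.txt
git add .
git commit -q -m base
git remote add origin ../server.git
git push -q origin master

git checkout -q -b topic
echo 'line 2' >> src/app.txt
git commit -q -am 'change one line'

# Files touched by the branch tip (blank lines from the empty format dropped).
changed=$(git show --name-status --format= topic | grep -v '^$')
echo "$changed"
# Commits on topic that origin/master does not have yet.
ahead=$(git rev-list --count origin/master..topic)
echo "commits to push: $ahead"
```

If the real branch reports thousands of commits ahead, the local origin/master is probably not where the branch was created from, which would explain a very large push.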

You could try the same push with the pack.useSparse config enabled (Git 2.21 or later):

git -c pack.useSparse=true push

This option comes from those patches and was implemented in commit d5d2e93, which includes the comment:

These improvements will have even larger benefits in the super-large Windows repository.

That should be interesting in your case.
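
To make the setting stick instead of passing -c on every push, it can be stored with git config. A minimal sketch, assuming Git 2.21 or later, a throwaway repository, and a local bare repo standing in for the server; the branch name topic is hypothetical. The setting is repository-local here; --global would apply it to every repository.

```shell
set -e
# Throwaway sandbox; server.git is a hypothetical stand-in for the remote.
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
echo base > file.txt
git add file.txt
git commit -q -m base
git remote add origin ../server.git
git push -q origin master
git checkout -q -b topic
echo change >> file.txt
git commit -q -am 'one-line change'

# Persist the sparse walk for this repository (use --global for all repos).
git config pack.useSparse true
git config --get pack.useSparse
# Subsequent pushes pick it up without any extra flag.
git push -q origin topic
```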

See "Exploring new frontiers for Git push performance" by Derrick Stolee.

A git push would typically display something like:

$ git push origin topic
Enumerating objects: 3670, done.
Counting objects: 100% (2369/2369), done.
Delta compression using up to 8 threads
Compressing objects: 100% (546/546), done.
Writing objects: 100% (1378/1378), 468.06 KiB | 7.67 MiB/s, done.
Total 1378 (delta 1109), reused 1096 (delta 832)
remote: Resolving deltas: 100% (1109/1109), completed with 312 local objects.
To https://server.info/fake.git
* [new branch] topic -> topic

"Enumerating" means:

Git constructs a pack-file that contains the commit you are trying to push, as well as all commits, trees, and blobs (collectively, objects) that the server will need to understand that commit.
It finds a set of commits, trees, and blobs such that every reachable object is either in the set or known to be on the server.
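
The enumeration can be reproduced by handing git pack-objects a revision list of the form "include this, exclude that", which is roughly what push does internally. A minimal sketch, assuming a throwaway repository in which master stands in for "known to be on the server"; the names topic, src/app.txt and docs/readme.txt are hypothetical. The pack should hold only the new commit, the root and src trees, and the changed blob; nothing under docs/ is needed because it is reachable from master.

```shell
set -e
# Throwaway sandbox; master plays the role of what the server already has.
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
mkdir src docs
echo 'line 1' > src/app.txt
echo 'readme' > docs/readme.txt
git add .
git commit -q -m base
git checkout -q -b topic
echo 'line 2' >> src/app.txt
git commit -q -am 'change one line'

# Everything reachable from topic, minus everything reachable from master.
# pack-objects writes push-<hash>.pack/.idx and prints <hash>.
hash=$(printf '%s\n' topic '^master' | git pack-objects --revs --quiet "$tmp/push")
# show-index prints one line per object stored in the pack.
count=$(git show-index < "$tmp/push-$hash.idx" | wc -l | tr -d ' ')
echo "objects in pack: $count"
```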

The goal is to find the right "frontier":

[Figure: sparse push commit walk]

The uninteresting commits that are direct parents of interesting commits form the frontier.
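
git rev-list --boundary prints exactly this frontier: interesting commits as plain hashes, followed by the uninteresting parents prefixed with "-". A minimal sketch on a throwaway repository, where the hypothetical branch topic adds two commits on top of master; the single boundary commit should be the tip of master.

```shell
set -e
# Throwaway sandbox with a two-commit topic branch.
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
echo one > file.txt
git add file.txt
git commit -q -m 'base (uninteresting)'
git checkout -q -b topic
echo two >> file.txt
git commit -q -am 'topic 1 (interesting)'
echo three >> file.txt
git commit -q -am 'topic 2 (interesting)'

# Interesting commits, then boundary (frontier) commits prefixed with '-'.
git rev-list --boundary master..topic
frontier=$(git rev-list --boundary master..topic | sed -n 's/^-//p')
echo "frontier: $frontier"
```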

Old:

To determine which trees and blobs are interesting, the old algorithm first determined all uninteresting trees and blobs.

Starting at every uninteresting commit in the frontier, recursively walk from its root tree and mark all reachable trees and blobs as uninteresting. This walk skips trees that were already marked as uninteresting to avoid revisiting potentially large portions of the graph.

[Figure: sparse push, old algorithm]

New:

The old algorithm is recursive: it takes a tree and runs the algorithm on all subtrees.

The new algorithm uses the paths to reduce the scope of the tree walk. It is also recursive, but it takes a set of trees.
As we start the algorithm, the set of trees contains the root trees for the uninteresting and the interesting commits.

[Figure: sparse push, new algorithm]

The new tree walk recursively explores paths containing both interesting and uninteresting trees.
Inside the trees at B, we have subtrees named F and G.
Both sets have interesting and uninteresting paths, so we recurse into each set. This continues into B/F and B/G. The B/F set will not recurse into B/F/M or B/F/N, and the B/G set will recurse into B/G/X but not into B/G/Y.
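
Both walks can be compared directly, since git pack-objects accepts --no-sparse (old walk) and --sparse (new walk) alongside --revs. A minimal sketch, assuming Git 2.21 or later and a throwaway repository whose layout loosely mirrors the B/F/{M,N} and B/G/{X,Y} example; the branch topic changes one file under B/G/X. With no renames involved, both walks should select the same six objects (commit, root, B, B/G, B/G/X, and the blob); the difference on a huge repository is how many trees each walk has to visit to get there, which a tiny sandbox cannot show.

```shell
set -e
# Throwaway sandbox mirroring the B/F/{M,N}, B/G/{X,Y} layout.
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
for d in B/F/M B/F/N B/G/X B/G/Y; do
  mkdir -p "$d"
  echo "$d" > "$d/file.txt"
done
git add .
git commit -q -m base
git checkout -q -b topic
echo change >> B/G/X/file.txt
git commit -q -am 'change under B/G/X'

# $1: --no-sparse (old walk) or --sparse (new walk); $2: pack base name.
count_objects() {
  h=$(printf '%s\n' topic '^master' | git pack-objects --revs --quiet "$1" "$tmp/$2")
  git show-index < "$tmp/$2-$h.idx" | wc -l | tr -d ' '
}
old=$(count_objects --no-sparse old)
new=$(count_objects --sparse new)
echo "old walk: $old objects, new walk: $new objects"
```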

Sump answered 16/5, 2019 at 16:28 Comment(0)

It sounds like a line ending problem.

If you check out a repo on a Windows machine, the Unix (LF) line endings will be converted to Windows (CRLF) ones.
When you commit, Git will think all the files have been updated because all the line endings will have changed.

You can configure Git to manage this for you with this command:

git config --global core.autocrlf true
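
Whether conversion is actually happening can be checked with git ls-files --eol, which reports the line endings stored in the index (i/) and present in the working tree (w/). A minimal sketch, assuming a throwaway repository and a repository-local setting instead of --global; the file name notes.txt is hypothetical. In this sandbox the re-checked-out file ends up with CRLF endings, yet git status stays clean, which matches the last comment below.

```shell
set -e
# Throwaway sandbox with one LF-terminated text file.
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name demo
git config user.email demo@example.com
printf 'line 1\nline 2\n' > notes.txt
git add notes.txt
git commit -q -m 'LF file'

# Repository-local here; the answer sets it with --global.
git config core.autocrlf true
# Conversion applies on checkout, so re-create the working copy.
rm notes.txt
git checkout -- notes.txt

# i/ = endings in the index, w/ = endings in the working tree.
git ls-files --eol notes.txt
# Number of carriage returns now present in the working copy.
crs=$(tr -cd '\r' < notes.txt | wc -c | tr -d ' ')
# Empty output means Git does not consider the file modified.
status=$(git status --porcelain)
```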

Hydrangea answered 3/6, 2018 at 13:12 Comment(3)
"Because all the line endings will have changed": well, only if you open the files with an editor that would overwrite them, right? That doesn't sound like my case. – Conquian
I think if you check out a repo on a Windows system, Git will convert all the line endings, regardless of whether you open the files. Try git checkout master, then git status. Presumably you haven't made any changes on master, so if all the files are listed as modified in git status, the line endings have probably changed. – Hydrangea
@Hydrangea Actually, it does this behind the scenes, so the files are not marked as changed. – Valentinevalentino
