What is the difference between `git lfs fetch`, `git lfs fetch --all`, and `git lfs pull`?

Despite having used git for years, I find git lfs (git Large File Storage) pretty confusing to use, even at a very basic level. Can someone explain the difference between these 3 commands?

  1. git lfs fetch
  2. git lfs fetch --all
  3. git lfs pull

Related:

  1. Pull ALL files from git LFS
Nev answered 14/6, 2022 at 1:5 Comment(0)

After a bunch of study and figuring out where the help pages are, here is what I have concluded.

Be warned, however, that Git LFS is slow, inefficient, online (it needs a network connection at checkout time, whereas regular Git works offline once you have fetched), and evil. Read my full rant in my eRCaGuy_dotfiles repo.

How to use git lfs as a basic user

This covers the question: "What is the difference between git lfs fetch, git lfs fetch --all, git lfs pull, and git lfs checkout?"

Summary

# Fetch git lfs files for just the currently-checked-out branch or commit (Ex: 20
# GB of data). This downloads the files into your `.git/lfs` dir but does NOT
# update them in your working file system for the branch or commit you have 
# currently checked-out.
git lfs fetch

# Fetch git lfs files for ALL remote branches (Ex: 1000 GB of data), downloading
# all files into your `.git/lfs` directory.
git lfs fetch --all

# Fetch git lfs files for just these 3 branches (Ex: 60 GB of data)
# See `man git-lfs-fetch` for details. The example they give is:
# `git lfs fetch origin main mybranch e445b45c1c9c6282614f201b62778e4c0688b5c8`
git lfs fetch origin main mybranch1 mybranch2

# Check out, or "activate" the git lfs files for your currently-checked-out
# branch or commit, by updating all file placeholders or pointers in your
# active filesystem for the current branch with the actual files these git lfs
# placeholders point to.
git lfs checkout

# Fetch and check out in one step. This one command is the equivalent of these 2
# commands:
#       git lfs fetch
#       git lfs checkout
git lfs pull
#
# Note that `git lfs pull` is similar to how `git pull` is the equivalent
# of these 2 commands:
#       git fetch
#       git merge

So, a general, recommended workflow to check out your git files and your git lfs files might look like this:

git checkout main   # check out your `main` branch
git pull            # pull latest git files from the remote, for this branch
git lfs pull        # pull latest git lfs files from the remote, for this branch

# OR (exact same thing)
git checkout main   # check out your `main` branch
# (The next 2 commands replace `git pull`)
git fetch           # fetch the latest files from the remote for branch `main`
                        # into your locally-stored hidden remote-tracking branch
                        # named `origin/main`, for example
git merge           # merge the latest content (which you just fetched
                        # into your local hidden branch `origin/main`)
                        # into non-hidden branch `main`
# (The next 2 commands replace `git lfs pull`)
git lfs fetch       # fetch latest git lfs files from the remote, for this 
                        # branch
git lfs checkout    # check out all git lfs files for this branch, replacing 
                        # git lfs file placeholders with the actual files
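
If you run that workflow often, it can be wrapped in a small shell function. This is only a minimal sketch: the name `sync_branch` is my own invention (it is not part of git or git lfs), and it assumes the branch tracks a remote so that a bare `git pull` works.

```shell
# Hypothetical helper (not part of git or git lfs): update one branch and
# its git lfs files, stopping at the first command that fails.
# Usage: sync_branch <branch>
sync_branch() {
    if [ -z "${1:-}" ]; then
        echo "usage: sync_branch <branch>" >&2
        return 2
    fi
    git checkout "$1" &&  # check out the branch
    git pull &&           # fetch + merge regular git content
    git lfs pull          # fetch + check out git lfs content
}
```

Example: `sync_branch main`. Chaining with `&&` means `git lfs pull` is skipped if the checkout or the pull fails.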

Details

1. git lfs fetch

See man git-lfs-fetch, and git lfs fetch --help.

From git lfs fetch --help (emphasis added):

Download Git LFS objects at the given refs from the specified remote. See "Default remote" and "Default refs" for what happens if you don't specify.

This does not update the working copy.

So, this is just like doing git fetch (where it fetches remote contents to your locally-stored, remote-tracking hidden branches), except it is for git lfs-controlled files.

I believe it fetches the git lfs file content into your .git/lfs directory, but it does NOT update your working tree (the files of your currently checked-out branch) with those files.
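
One way to see this for yourself is to count how many LFS-tracked files in the working tree are still placeholders. The sketch below relies on `git lfs ls-files`, which marks each file with `*` when the full content is present and `-` when the file is only a pointer. The function name `lfs_count_pointers` is hypothetical.

```shell
# Hypothetical helper: count LFS-tracked files in the working tree that are
# still pointer (placeholder) files, based on the `*` / `-` marker in the
# second column of `git lfs ls-files` output.
lfs_count_pointers() {
    git lfs ls-files | awk '$2 == "-" { n++ } END { print n + 0 }'
}
```

Running `lfs_count_pointers` before and after `git lfs fetch` should print the same number, since fetch leaves the working tree alone. The number should only drop once you run `git lfs checkout`.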

From farther down in the help menu (emphasis added):

Default remote

Without arguments, fetch downloads from the default remote. The default remote is the same as for git fetch, i.e. based on the remote branch you're tracking first, or origin otherwise.

Default refs

If no refs are given as arguments, the currently checked out ref is used. In addition, if enabled, recently changed refs and commits are also included. See "Recent changes" for details.

Note that the "currently checked out ref" refers to your currently checked-out branch or commit.
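
To make those defaults concrete, here is a rough sketch that spells out the remote and the ref explicitly instead of letting `git lfs fetch` choose them. It is an approximation under stated assumptions: it ignores the "recent changes" refs mentioned above, it looks up the remote via the `branch.<name>.remote` config key, and the function name `lfs_fetch_current` is made up for illustration.

```shell
# Hypothetical helper: spell out the remote and ref that a bare
# `git lfs fetch` would roughly pick by default (ignoring "recent" refs).
lfs_fetch_current() {
    # Current branch name, or the commit hash if HEAD is detached.
    ref=$(git symbolic-ref --quiet --short HEAD) || ref=$(git rev-parse HEAD)
    # Remote configured for this branch; fall back to `origin`.
    remote=$(git config --get "branch.$ref.remote") || remote=origin
    git lfs fetch "$remote" "$ref"
}
```

On a branch `main` that tracks `origin/main`, this should end up running `git lfs fetch origin main`.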

2. git lfs fetch --all

Whereas git lfs fetch, by default, fetches only the content for your currently-checked-out branch or commit, git lfs fetch --all fetches ALL content for ALL remote branches. On a gigantic corporate mono-repo, that means that git lfs fetch might fetch 20 GB of data, whereas git lfs fetch --all might fetch 1000 GB of data. In such a case, do NOT include --all unless:

  1. You absolutely have to, OR
  2. The amount of data being fetched is still reasonably small, OR
  3. You have a script running overnight to do this, and your hard drive is big enough to hold it all, so that you don't have to wait during working hours for Git LFS's insanely slow, network-dependent, time-inefficient git checkouts. For more info on this, see:
    1. my full rant
    2. My detailed answer to How does git LFS track and store binary data more efficiently than git? [spoiler: it doesn't], especially the section of this answer titled, "When does normal git download files from the internet? A detailed look at git fetch vs git pull", where I compare and contrast the online portion of regular git vs git lfs.

From git lfs fetch --help (emphasis added):

* --all:

Download all objects that are referenced by any commit reachable from the refs provided as arguments. If no refs are provided, then all refs are fetched. This is primarily for backup and migration purposes. Cannot be combined with --recent or --include/--exclude. Ignores any globally configured include and exclude paths to ensure that all objects are downloaded.
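
If you do go the overnight-script route, a guard on free disk space is worth having, since a full disk in the middle of a huge download is painful. The following is a minimal sketch, not a tested production job: the function name `lfs_fetch_all_if_space` and the default threshold of 100 GB are assumptions, so pick a value that suits your repo.

```shell
# Hypothetical nightly job: run `git lfs fetch --all` only when the disk
# holding the current directory has at least <min_free_gb> GB free.
# Usage: lfs_fetch_all_if_space [min_free_gb]   (default: 100, an assumption)
lfs_fetch_all_if_space() {
    min_free_gb="${1:-100}"
    # `df -Pk .` prints a header, then one line whose 4th field is free KB.
    free_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')
    free_gb=$((free_kb / 1048576))
    if [ "$free_gb" -lt "$min_free_gb" ]; then
        echo "only ${free_gb} GB free, want ${min_free_gb} GB; skipping" >&2
        return 1
    fi
    git lfs fetch --all
}
```

Run it from inside the repo, for example from a cron job that first changes into the repo directory.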

3. git lfs pull

Just like git pull is the combination of git fetch and git merge, git lfs pull is the combination of git lfs fetch and git lfs checkout.

From git lfs pull --help (emphasis added):

git lfs pull [options] [<remote>]

Download Git LFS objects for the currently checked out ref, and update the working copy with the downloaded content if required.

This is equivalent to running the following 2 commands:

git lfs fetch [options] [<remote>]
git lfs checkout
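
Written as a function, that equivalence might look like the sketch below. The name `lfs_pull_two_step` is hypothetical. Following the help text, any options or remote name are passed to the fetch step only.

```shell
# Hypothetical helper: the two-step equivalent of `git lfs pull`.
# Any arguments (options and/or a remote name) go to the fetch step only,
# mirroring the help text; the checkout step takes none here.
lfs_pull_two_step() {
    git lfs fetch "$@" &&   # download objects into .git/lfs
    git lfs checkout        # then write them into the working tree
}
```

Example: `lfs_pull_two_step origin`. Doing the two steps separately is handy when you want to fetch now (say, while you have a fast connection) and check out later.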

So, that raises the question: "What does git lfs checkout do?"

4. git lfs checkout

This command copies the git lfs files from your .git/lfs directory to your working tree for the reference (branch or commit) you currently have checked out.

From git lfs checkout --help:

Try to ensure that the working copy contains file content for Git LFS objects for the current ref, if the object data is available. Does not download any content; see git lfs fetch for that.

Checkout scans the current ref for all LFS objects that would be required, then where a file is either missing in the working copy, or contains placeholder pointer content with the same SHA, the real file content is written, provided we have it in the local store. Modified files are never overwritten.

One or more <glob-pattern>s may be provided as arguments to restrict the set of files that are updated. Glob patterns are matched as per the format described in gitignore(5).

And it provides some examples. Ex:

Examples

  • Checkout all files that are missing or placeholders:

    $ git lfs checkout
    
  • Checkout a specific couple of files:

    $ git lfs checkout path/to/file1.png path/to/file2.png
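
Since the arguments are gitignore-style glob patterns, you can also check out files by extension. Here is a small sketch; the function name `lfs_checkout_ext` is hypothetical. The patterns are quoted so that the shell passes them to git lfs untouched instead of expanding them itself.

```shell
# Hypothetical helper: check out only LFS files with the given extensions.
# Usage: lfs_checkout_ext png jpg   ->   git lfs checkout '*.png' '*.jpg'
lfs_checkout_ext() {
    if [ "$#" -eq 0 ]; then
        echo "usage: lfs_checkout_ext <ext>..." >&2
        return 2
    fi
    for ext in "$@"; do
        set -- "$@" "*.$ext"  # append the pattern for this extension
        shift                 # and drop the bare extension from the front
    done
    git lfs checkout "$@"
}
```

Under gitignore rules, a pattern without a slash such as `*.png` should match files at any depth in the repo.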
    

Related

  1. My Q&A: How does git LFS track and store binary data more efficiently than git? [spoiler: it doesn't]
  2. My explanation in my question here, in this section: Update: don't use git lfs. I now recommend against using git lfs in our free GitHub repos
  3. My question, and lousy-workaround not-an-answer answer: How to resume git lfs post-checkout hook after failed git checkout - spoiler: my git lfs checkout kept failing at 97% complete while downloading 27 GB in 3 hours due to not enough disk space.
  4. My answer: Unix & Linux: All about finding, filtering, and sorting with find, based on file size - see the example near the end, titled "(Figure out which file extensions to add to git lfs next)".
  5. Other really useful git lfs info:
    1. Great article!: my developer planet: Git LFS: Why and how to use
    2. https://git-lfs.github.com/
    3. My repo and notes: https://github.com/ElectricRCAircraftGuy/eRCaGuy_dotfiles#how-to-clone-this-repo-and-all-git-submodules
    4. Very useful video!: What is Git LFS?: https://www.youtube.com/watch?v=9gaTargV5BY. I discovered this video from here: https://mcmap.net/q/12489/-how-git-lfs-work-do-i-need-to-do-quot-git-add-quot
  6. https://www.git-tower.com/learn/git/faq/difference-between-git-fetch-git-pull
  7. My answer to Can I "undo" `git lfs checkout`?
Nev answered 14/6, 2022 at 1:5 Comment(6)
Thanks for the research and write-up. A couple of clarification questions: 1) Does git lfs fetch pull down LFS-tracked files just for the current commit, or for all the commits in history? (I hope it is the former.) 2) Once you run git lfs checkout, how do you "uncheckout" the files, i.e. go back to using placeholder files rather than the actual files in the working tree? - Agitato
@GarretWilson, as shown in the code comments in my summary section, git lfs fetch fetches only "files for just the currently-checked-out branch or commit", whereas git lfs fetch --all fetches "git lfs files for ALL remote branches". As for how to replace files with placeholder links again, I don't know. - Nev
@GarretWilson, you can't uncheckout yet. See my answer here. - Nev
@GabrielStaples your note at the top about not using git lfs is very misleading. You are actually advising against using git lfs on free GitHub accounts. There is nothing wrong with git lfs itself. - Drysalter
@AndyMadge, you are right. It is misleading, but not in the way you think. I mean to say to not use it for work or home, paid or free. It can make git checkout take hours instead of seconds. I've updated my explanation to make this clear. Perhaps this could have also been mitigated (but not eliminated) by having a bigger ssd, as I now suggest at the bottom of my question here, and configuring a script which does git lfs fetch --all nightly. - Nev
@AndyMadge, in both cases: corporate and free, with over 3 years of daily experience using it, I have found git lfs to be a massive time-waster. - Nev
