How to check that all git-lfs tracked and committed files are pointers?

Because git-lfs requires some manual setup (installing git-lfs and running git lfs install once), developers sometimes commit files of git-lfs tracked types as regular blobs instead of pointers. I would like to check for that on pull requests in our continuous integration system.

How can I check that all committed files of git-lfs tracked types are pointers? Git LFS runs such a check in some situations, e.g. when rebasing, but it is not available as a CLI command.

I would like to have something like this:

$ git clone https://...
$ [git lfs check-for-pointers]
Encountered 35 file(s) that should have been pointers, but weren't:
    file1.png
    ...
Arvo answered 2/11, 2017 at 9:54 Comment(2)
I was just looking for a command like this! Did you ever find a solution?Towel
Not yet... I will start a bounty...Arvo
11

Git LFS doesn't have a way to do this natively, although you could open an issue on the GitHub repository if you'd like to see support for it. In the meantime, you can implement this using a technique like the following:

git ls-files | git check-attr --stdin filter | \
awk -F': ' '$3 ~ /lfs/ { print $1}' | \
xargs -L1 sh -c 'git cat-file blob "HEAD:$0" | \
    git lfs pointer --check --stdin || { echo "$0"; false; }'

If this command produces any output, then at least one file that should be a pointer is not, and the output names each such file. The command exits zero if everything is okay, and nonzero if there's a broken file.

This does have the limitation that it doesn't handle file names with a colon-space or newline in them; if that matters, you'll have to use the -z option and run things through perl or ruby instead of awk.
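To make the limitation concrete, here is a small sketch that feeds synthetic lines, in the shape git check-attr prints them, through the same awk filter. The function name lfs_paths_from_check_attr and the file names are made up for illustration.

```shell
# Hypothetical helper: read `git check-attr filter` style output on stdin
# and print the paths whose filter attribute is "lfs".
lfs_paths_from_check_attr() {
    awk -F': ' '$3 ~ /lfs/ { print $1 }'
}

# An ordinary name comes through unchanged.
printf '%s\n' 'image.png: filter: lfs' | lfs_paths_from_check_attr

# A name containing ": " shifts the fields, so $3 is "filter" rather than
# "lfs" and the file is silently skipped.
printf '%s\n' 'odd: name.png: filter: lfs' | lfs_paths_from_check_attr
```

Only image.png is printed. The second file never reaches the pointer check, which is why the NUL-delimited variant is the safer choice for arbitrary file names.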

Greening answered 9/8, 2019 at 10:47 Comment(4)
This works nicely, but the length of the pipeline makes it a bit hard to read (especially on SO itself). In my own code I broke it up onto multiple lines with a backslash after each pipe, and that may help with readability here as well.Stickybeak
Thanks. Can you please elaborate on "it doesn't handle file names with a colon-space or newline in them; if that matters, you'll have to use the -z option and run things through perl or ruby instead of awk"?Melisma
It uses awk -F': ', so it's intrinsically newline-based and any part of the file name after the first : won't be included. If you need to handle arbitrary file names, you need to use the NUL-based delimiting with git ls-files -z and adjust the rest of the pipeline accordingly. This comment is too small to go much further into depth, sorry.Greening
@Melisma , for git ls-files -z, see this answer: github.com/git-lfs/git-lfs/issues/4091#issuecomment-758023921Crist
1

Here is an alternative solution that might be a bit faster:

rm .git/index && ! git reset --hard HEAD 2>&1 | grep 'that should have been pointers'

Please be warned that this solution will delete data that is not yet committed.
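If you want the file names rather than just the summary line, the report can be filtered with awk instead of grep. This is only a sketch: it assumes the message has the format quoted in the question (a count on the "Encountered" line followed by that many indented file names), and list_non_pointer_files is a made-up name.

```shell
# Hypothetical helper: read the reset output on stdin and print only the
# file names listed under the "Encountered N file(s) ..." line.
list_non_pointer_files() {
    awk '
        /Encountered [0-9]+ file.* that should have been pointers/ { n = $2; next }
        n > 0 { sub(/^[ \t]+/, ""); print; n-- }
    '
}

# Synthetic report in the format shown in the question.
printf '%s\n' \
    "Encountered 2 file(s) that should have been pointers, but weren't:" \
    "    file1.png" \
    "    dir/file 2.png" \
    "HEAD is now at 0000000 example" | list_non_pointer_files
```

In a real run the input would come from git reset --hard HEAD 2>&1 piped into list_non_pointer_files, with the same data-loss warning as above. The message text may differ between Git LFS versions or locales, so check it against your own output first.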

Yarmouth answered 10/9, 2019 at 7:22 Comment(2)
Nice one! With MSYS2, on a repo with 23k files this command took 48 seconds (mainly spent to rebuild the index) vs 18 min and 53 secs for the ls-files -z/perl/xargs command, to report the same ~900 files. And it did this without the puzzling errors “fatal : unable to stream 777785ca07bc371afa7593484fc78245802850e3 to stdout” that the longer command printed. Now just a caveat: the pipe to grep will only give the summary line that indicates how many files it found. So omit that last part of the command. And the exclamation mark isn’t needed either.Footstall
A complete solution based on this answer would be mv .git/index this_is_my_git_index_I_am_afraid_to_lose && LANG=en git reset --hard HEAD 2>&1 | awk '/Encountered [0-9]+ files that should have been pointers/{n=$2;next} n-->0 && $0=substr($0, index($0,$1))'. It prints just the list of the files, without the leading spaces.Footstall
0

For those who don't read the comments below the answers ;)

This builds on bkk2204's answer (using git ls-files -z to handle spaces and other crazy characters in file names), with thanks to calve from GitHub:

git ls-files -z | git check-attr --stdin -z filter | \
perl -n0 -e 'chomp; push @x, $_; if (@x == 3) { print "$x[0]$/" if $x[2] eq "lfs"; @x=(); }' | \
xargs -0 -L1 --no-run-if-empty sh -c 'git cat-file blob "HEAD:$0" | git lfs pointer --check --stdin || { echo "$0"; false; }'

Note:

  • The -L option of xargs is supported by the findutils package
Crist answered 5/5, 2021 at 15:36 Comment(0)
0

Such a command has been added: git lfs fsck --pointers

Note that by default only HEAD is checked. In CI, to make sure Git LFS is used correctly, you should probably check all newly added commits, not just the last one (the documentation below shows how to pass a range of commits).

I could not find the documentation online, so here is the output of git lfs fsck --help:

git lfs fsck [options] [revisions]

Checks all Git LFS files in the current HEAD for consistency.

Corrupted files are moved to ".git/lfs/bad".

The revisions may be specified as either a single committish, in which
case only that commit is inspected; specified as a range of the form
A..B (and only this form), in which case that range is inspected; or
omitted entirely, in which case HEAD (and, for --objects, the index) is
examined.

The default is to perform all checks.

In your Git configuration or in a .lfsconfig file, you may set
lfs.fetchexclude to a comma-separated list of paths. If
lfs.fetchexclude is defined, then any Git LFS files whose paths match
one in that list will not be checked for consistency. Paths are matched
using wildcard matching as per gitignore(5).

Options:

--objects:
  Check that each object in HEAD matches its expected hash
  and that each object exists on disk.
--pointers:
  Check that each pointer is canonical and that each file
  which should be stored as a Git LFS file is so stored.
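
For CI, a minimal sketch of a range check could look like the following. It relies only on the A..B form and the --pointers option described in the help text above. The helper names and the base revision origin/main are assumptions, so substitute whatever your CI system provides as the pull request's target.

```shell
# Hypothetical helper: build the revision argument for `git lfs fsck`.
# Prints "BASE..HEAD_REV" when a base is given, otherwise just the head revision.
lfs_fsck_revisions() {
    base=${1:-}
    head=${2:-HEAD}
    if [ -n "$base" ]; then
        printf '%s..%s\n' "$base" "$head"
    else
        printf '%s\n' "$head"
    fi
}

# Hypothetical helper: run the pointer check over that revision argument and
# return whatever exit status `git lfs fsck` reports.
check_lfs_pointers() {
    git lfs fsck --pointers "$(lfs_fsck_revisions "$@")"
}
```

A pull request job would then call check_lfs_pointers origin/main HEAD after fetching the target branch, and fail the build according to the exit status.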
Adventitia answered 3/7 at 10:27 Comment(0)
