Why do excluded files keep reappearing in my git sparse checkout?

I use the GCC git mirror and because I only use the C and C++ front ends I use git's sparse checkout feature to exclude the hundreds of files I don't need:

$ git config core.sparseCheckout
true
$ cat .git/info/sparse-checkout 
/*
!gnattools/
!libada/
!libgfortran/
!libgo/
!libjava/
!libobjc/
!libquadmath/
!gcc/ada/
!gcc/fortran/
!gcc/go/
!gcc/java/
!gcc/objc/
!gcc/objcp/
!gcc/testsuite/ada/
!gcc/testsuite/gfortran.dg/
!gcc/testsuite/gfortran.fortran-torture/
!gcc/testsuite/gnat.dg/
!gcc/testsuite/go.dg/
!gcc/testsuite/go.go-torture/
!gcc/testsuite/go.test/
!gcc/testsuite/objc/
!gcc/testsuite/objc.dg/
!gcc/testsuite/obj-c++.dg/
!gcc/testsuite/objc-obj-c++-shared/

This works for a while, but then now and then I notice that some of those excluded files have returned, sometimes lots of them:

$ ls gnattools/
ChangeLog  configure  configure.ac  Makefile.in
$ ls  gcc/fortran/ | wc -l 
86

I'm not sure exactly when the files reappear. I do a lot of switching between branches (both remote-tracking and local), and it's a very busy repo, so there are new changes to pull frequently.

As a relative newbie to git I don't know how to "reset" my work tree to get rid of those files again.

As an experiment, I tried disabling sparse checkout and pulling, thinking I could enable sparseCheckout again afterwards to update the tree somehow, but that didn't work very well:

$ git config core.sparseCheckout false
$ git config core.sparseCheckout 
false
$ git pull
remote: Counting objects: 276, done.
remote: Compressing objects: 100% (115/115), done.
remote: Total 117 (delta 98), reused 0 (delta 0)
Receiving objects: 100% (117/117), 64.05 KiB, done.
Resolving deltas: 100% (98/98), completed with 64 local objects.
From git://gcc.gnu.org/git/gcc
   7618909..0984ea0  gcc-4_5-branch -> origin/gcc-4_5-branch
   b96fd63..bb95412  gcc-4_6-branch -> origin/gcc-4_6-branch
   d2cdd74..2e8ef12  gcc-4_7-branch -> origin/gcc-4_7-branch
   c62ec2b..fd9cb2c  master     -> origin/master
   2e2713b..29daec8  melt-branch -> origin/melt-branch
   c62ec2b..fd9cb2c  trunk      -> origin/trunk
Updating c62ec2b..fd9cb2c
error: Your local changes to the following files would be overwritten by merge:
        gcc/fortran/ChangeLog
        gcc/fortran/iresolve.c
        libgfortran/ChangeLog
        libgfortran/io/intrinsics.c
Please, commit your changes or stash them before you can merge.
Aborting

So apparently I've got local modifications to files I never asked for and AFAIK have never touched!

But git status doesn't show those changes:

$ git st
# On branch master
# Your branch is behind 'origin/master' by 9 commits, and can be fast-forwarded.
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       libstdc++-v3/53270.txt
#       libstdc++-v3/TODO

I've tried git read-tree -m -u HEAD but it doesn't do anything.

So my questions are:

  • Why do the files reappear?
  • How do I make them disappear again?
  • How do I prevent them coming back?
  • Is this possibly related to the fact that my .git/info/exclude file contains references to files in the directories that are supposed to be excluded (i.e. named with !) in the sparse-checkout file? I followed the instructions to ignore the same files that SVN does:

    $ git svn show-ignore >> .git/info/exclude

So my exclude file includes paths such as

# /gcc/fortran/
/gcc/fortran/TAGS
/gcc/fortran/TAGS.sub
/gcc/fortran/gfortran.info*

Which would be below one of the directories named in the sparse-checkout file:

!gcc/fortran/
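
One way to see how far the two files overlap is a small text filter. This is only a sketch for diagnosing the hypothesis above, not a fix: the function name overlapping_excludes is made up for illustration, and it assumes every negated sparse-checkout pattern is a plain directory path with no wildcards.

```shell
# List exclude-file entries that sit under a directory negated in the
# sparse-checkout file.
# Usage: overlapping_excludes SPARSE_FILE EXCLUDE_FILE
# (hypothetical helper; assumes negated patterns are plain directory paths)
overlapping_excludes() {
    # Keep only negated patterns such as "!gcc/fortran/", dropping the "!"
    # and any leading "/".
    sed -n 's|^!/*||p' "$1" | while IFS= read -r dir; do
        [ -n "$dir" ] || continue
        # Print exclude entries that literally start with "/<dir>".
        # Comment lines start with "#", so they never match.
        awk -v p="/$dir" 'index($0, p) == 1' "$2"
    done
}
```

From the top of the work tree it would be called as overlapping_excludes .git/info/sparse-checkout .git/info/exclude. An empty result would rule out the overlap as a factor.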

I've tried to reproduce the problem with a test repo that I clone a few copies of and edit each of them, create/switch/delete branches and merge changes between them, but it never goes wrong in my toy testcases. The GCC repo is a bit big (over 2GB) and the time between "failures" (on the order of a week or two) too long to expect people to try to reproduce the problem exactly. I haven't experimented with having the same paths in sparse-checkout and exclude, as it only occurred to me today there might be a conflict there.

I asked about this on #git on freenode a few weeks ago and IIRC was basically told "it's probably a bug, no one uses sparse checkout", but I'm hoping for a better answer ;-)

Update:

The most recent time I saw the problem actually happen (i.e. the files weren't there, then appeared after a single command) was doing a pull from the upstream origin:

   bac6f1f..6c760a6  master     -> origin/master

and among the changes shown were these renames:

 create mode 100644 libgo/go/crypto/x509/root.go
 rename libgo/go/crypto/{tls => x509}/root_darwin.go (90%)
 rename libgo/go/crypto/{tls => x509}/root_stub.go (51%)
 rename libgo/go/crypto/{tls => x509}/root_unix.go (76%)
 create mode 100644 libgo/go/crypto/x509/root_windows.go

Before the pull the libgo directory was absent, as desired. After the pull that dir was present and these files (and no others) were under it:

$ ls libgo/go/crypto/x509/root_<TAB>
root_darwin.go  root_stub.go    root_unix.go    

I don't know whether the renamed files lost their skip-worktree bit. How do I check that?

I'm pretty sure the problem doesn't always happen when there are renames, because e.g. the libgfortran/ChangeLog file shown in an example above is not a new file or recently renamed.

Jabon answered 22/6, 2012 at 0:6 Comment(10)
Could it be that these files are generated, for example during some configuration step or by a certain build target? This happens a lot, especially with ChangeLog. Have you tried deleting them and continuing to work, but this time checking after everything you do whether the files have appeared again? My guess is that git doesn't work with them, and that is why it also doesn't show them in git status. - Milky
No, they are tracked files and not generated. GCC's ChangeLog files are not generated, they're manually edited and committed. See my new edit, showing an example of the problem that I observed happening after running a command. - Jabon
Have you tried looking at the contents? Sometimes Makefiles touch ChangeLog just as a flag for their build. If your gcc/fortran/ChangeLog is empty, this could be it. Also, it is possible that someone added those files to the repository by mistake. - Milky
Never mind, I just saw your update. - Milky
I always build GCC in a separate dir from the source, so the makefiles never touch the source tree (that's by design, so you can build from sources on read-only media). - Jabon
Also, if git didn't work with those files it wouldn't say "Your local changes to the following files would be overwritten by merge" -- they are tracked files. - Jabon
Hm, you can try to fix git ;) Or consider a shell script to remove unwanted files after each checkout. - Dibble
Which version of git are you currently using? - Such
1.7.7.6, from the Fedora 16 rpm. - Jabon
I have this issue on git 1.8.3.msysgit.0 (Windows). It's not intermittent: there are certain circumstances (which unfortunately only seem to occur in a repo with lots of files) where it happens every time. Further, the files that are "left in place" when they should not be have an indeterminate git state: they are neither "untracked" nor (when I delete or edit them) do they show as being tracked and modified. - Grounder

The skip-worktree bit can be set with git update-index --skip-worktree (and cleared with --no-skip-worktree). When you notice that the files are present, you can check with git ls-files -v | grep '^S' (an S tag marks a file with skip-worktree set).
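
To catch the moment the files come back, that check can be narrowed to entries that are flagged skip-worktree but still exist on disk. The filter below is a rough sketch. The name skipped_but_present is made up, and it assumes the plain (non -z) output of git ls-files -v, so paths that git would quote because of unusual characters are not handled.

```shell
# Read `git ls-files -v` output on stdin and print every path that carries
# the skip-worktree tag ("S") yet still exists in the working tree.
# (hypothetical helper; run it from the top of the work tree)
skipped_but_present() {
    while IFS= read -r line; do
        case $line in
            'S '*)
                path=${line#S }
                if [ -e "$path" ]; then
                    printf '%s\n' "$path"
                fi
                ;;
        esac
    done
}

# Typical use:
#   git ls-files -v | skipped_but_present
```

Any output means a file was materialized even though its skip-worktree bit is still set. A file that reappears with the bit cleared would instead show up as a missing S line for a path under an excluded directory.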

But as the #git folks say, if you see odd behavior it is most likely a bug in git. After all, this is quite an esoteric feature. You should probably report your findings to the git mailing list.

Edit: Also, if you are using git 1.7.7.6, I strongly recommend upgrading. 1.7.10 tree is way ahead, and I think there is a strong chance it will fix your problems.

Chandra answered 27/6, 2012 at 8:17 Comment(3)
Excellent, thanks for those commands. I'll automate checking for unwanted files now so I can tell exactly when they appear. Without more information about how to reproduce it I doubt a bug report will get much attention, so I'll keep investigating before I report it. - Jabon
I haven't seen any problems recently using git 1.7.10.5 but as it was only intermittent I'm not yet willing to say upgrading has definitely fixed it. - Jabon
Still no problems, so I think there was a bug that is fixed in the recent versions. It's too late to give you the bounty but you can have my first ever SO tick - thanks! - Jabon

In my case, I was performing some unit tests on a repo using a sparse checkout. One of my test cases created commits that contained files that were not included in my sparse checkout sub-tree list.

When I attempted to git reset --hard 123456, I received the following error:

error: Entry 'a.c' not uptodate. Cannot update sparse checkout.
fatal: Could not reset index file to revision '123456'.

The solution was to remove the files from my working tree by re-applying the sparse-checkout rules:

git read-tree -mu HEAD
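
The re-apply step can be tried safely in a throwaway repository first. The sketch below is only an illustration: the directory names keep/ and excluded/ and the demo identity are made up, and it assumes a clean tree with none of the intermittent problems described in the question.

```shell
# Throwaway repository so the demo touches nothing real (paths are hypothetical).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir -p keep excluded
echo kept > keep/a.txt
echo unwanted > excluded/b.txt
git add .
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m 'initial'

# Enable sparse checkout and exclude one directory, as in the question.
git config core.sparseCheckout true
mkdir -p .git/info
printf '%s\n' '/*' '!excluded/' > .git/info/sparse-checkout

# Re-apply the rules: unmodified files outside the patterns are removed
# from the working tree and flagged skip-worktree in the index.
git read-tree -mu HEAD

# Entries tagged "S" are the ones carrying the skip-worktree bit.
skipped=$(git ls-files -v | grep '^S')
echo "$skipped"
```

After the read-tree step, excluded/b.txt should be gone from the working tree while keep/a.txt remains. On Git 2.27 or later, git sparse-checkout reapply is the dedicated command for the same step.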
Estelleesten answered 19/11, 2015 at 21:45 Comment(1)
I ran into another case where the sparse-checkout file would do nothing at all on git version 2.6.2. After deleting the repo, cloning it, putting the exact same sparse-checkout file back, everything was ok. I am guessing I may have run into a bug in git. - Estelleesten

Check if the issue persists in the latest Git 2.13 (Q2 2017, 5 years later).
Any skip-worktree file should not be modified or even looked at during a sparse checkout anymore, because:

The preload-index code has been taught not to bother with the index entries that are paths that are not checked out by "sparse checkout".

See commit e596acc (10 Feb 2017) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit c7e234f, 27 Feb 2017)

preload-index: avoid lstat for skip-worktree items

Teach preload-index to avoid lstat() calls for index-entries with skip-worktree bit set.
This is a performance optimization.

During a sparse-checkout, the skip-worktree bit is set on items that were not populated and therefore are not present in the worktree.
The per-thread preload-index loop performs a series of tests on each index-entry as it attempts to compare the worktree version with the index and mark them up-to-date.
This patch short-cuts that work.

On a Windows 10 system with a very large repo (450MB index) and various levels of sparseness, performance was improved in the {preloadindex=true, fscache=false} case by 80% and in the {preloadindex=true, fscache=true} case by 20% for various commands.


With Git 2.27 (Q2 2020), "sparse-checkout" manages skip-worktree differently.

See commit 5644ca2, commit 681c637, commit ebb568b, commit 22ab0b3, commit 6271d77, commit 1ac83f4, commit cd002c1, commit 4ee5d50, commit f56f31a, commit 7af7a25, commit 30e89c1, commit 3cc7c50, commit b0a5a12, commit 72064ee, commit fa0bde4, commit d61633a, commit d7dc1e1, commit 031ba55 (27 Mar 2020) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 48eee46, 29 Apr 2020)

unpack-trees: failure to set SKIP_WORKTREE bits always just a warning

Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren

Setting and clearing of the SKIP_WORKTREE bit is not only done when users run 'sparse-checkout'; other commands such as 'checkout' also run through unpack_trees() which has logic for handling this special bit. As such, we need to consider how they handle special cases.

A couple comparison points should help explain the rationale for changing how unpack_trees() handles these bits:

  • Ignoring sparse checkouts for a moment, if you are switching branches and have dirty changes, it is only considered an error that will prevent the branch switching from being successful if the dirty file happens to be one of the paths with different contents.

  • SKIP_WORKTREE has always been considered advisory; for example, if rebase or merge need or even want to materialize a path as part of their work, they have always been allowed to do so regardless of the SKIP_WORKTREE setting.
    This has been used for unmerged paths, but it was often used for paths it wasn't needed just because it made the code simpler.
    It was a best-effort consideration, and when it materialized paths contrary to the SKIP_WORKTREE setting, it was never required to even print a warning message.

In the past, if you were trying to run e.g. 'git checkout' and:

  1. you had a path that was materialized and had some dirty changes
  2. the path was listed in $GITDIR/info/sparse-checkout
  3. this path did not differ between the current and target branches

then despite the comparison points above, the inability to set SKIP_WORKTREE was treated as a hard error that would abort the checkout operation.

This is completely inconsistent with how SKIP_WORKTREE is handled elsewhere, and rather annoying for users as leaving the paths materialized in the working copy (with a simple warning) should present no problem at all.

Downgrade any errors from inability to toggle the SKIP_WORKTREE bit to a warning and allow the operations to continue.

So the message is no longer:

error: The following untracked working tree files would be overwritten by checkout:

But:

warning: The following paths were already present and thus not updated despite sparse patterns:

With Git 2.28 (Q3 2020), the behaviour of "sparse-checkout" in the state left by "git clone --no-checkout" was changed accidentally in 2.27; this has been corrected.

See commit b5bfc08 (05 Jun 2020) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit a554228, 18 Jun 2020)

sparse-checkout: avoid staging deletions of all files

Signed-off-by: Elijah Newren

sparse-checkout's purpose is to update the working tree to have it reflect a subset of the tracked files.

As such, it shouldn't be switching branches, making commits, downloading or uploading data, or staging or unstaging changes.

Other than updating the worktree, the only thing sparse-checkout should touch is the SKIP_WORKTREE bit of the index.

In particular, this sets up a nice invariant: running sparse-checkout will never change the status of any file in git status (reflecting the fact that we only set the SKIP_WORKTREE bit if the file is safe to delete, i.e. if the file is unmodified).

Traditionally, we did a _really_ bad job with this goal.

The predecessor to sparse-checkout involved manual editing of .git/info/sparse-checkout and running git read-tree -mu HEAD.

That command would stage and unstage changes and overwrite dirty changes in the working tree.

The initial implementation of the sparse-checkout command was no better; it simply invoked git read-tree -mu HEAD as a subprocess and had the same caveats, though this issue came up repeatedly in review comments and workarounds for the problems were put in place before the feature was merged [1, 2, 3, 4, 5, 6; especially see 4 & 6].

However, these workarounds, in addition to disabling the feature in a number of important cases, also missed one special case.

I'll get back to it later.

In the 2.27.0 cycle, the disabling of the feature was lifted by finally replacing the internal equivalent of git read-tree -mu HEAD with something that did what we wanted: the new update_sparsity() function in unpack-trees.c that only ever updates SKIP_WORKTREE bits in the index and updates the working tree to match.

This new function handles all the cases that were problematic for the old implementation, except that it breaks the same special case that avoided the workarounds of the old implementation, but broke it in a different way.

So...that brings us to the special case: a git clone performed with --no-checkout.

As per the meaning of the flag, --no-checkout does not check out any branch, with the implication that you aren't on one and need to switch to one after the clone.

Implementationally, HEAD is still set (so in some sense you are partially on a branch), but:

  • the index is "unborn" (non-existent)
  • there are no files in the working tree (other than .git/)
  • the next time git switch (or git checkout) is run it will run unpack_trees with initial_checkout flag set to true.

It is not until you run, e.g. git switch <somebranch> that the index will be written and files in the working tree populated.

With this special --no-checkout case, the traditional read-tree -mu HEAD behavior would have done the equivalent of acting like checkout -- switch to the default branch (HEAD), write out an index that matches HEAD, and update the working tree to match.

This special case slipped through the avoid-making-changes checks in the original sparse-checkout command and thus continued there.

After update_sparsity() was introduced and used (see commit f56f31af03 ("sparse-checkout: use new update_sparsity() function", 2020-03-27, Git v2.27.0-rc0 -- merge listed in batch #5)), the behavior for the --no-checkout case changed: Due to git's auto-vivification of an empty in-memory index (see do_read_index() and note that must_exist is false), and due to sparse-checkout's update_working_directory() code to always write out the index after it was done, we got a new bug.

That made it so that sparse-checkout would switch the repository from a clone with an "unborn" index (i.e. still needing an initial_checkout), to one that had a recorded index with no entries.

Thus, instead of all the files appearing deleted in git status being known to git as a special artifact of not yet being on a branch, our recording of an empty index made it suddenly look to git as though it was definitely on a branch with ALL files staged for deletion!
A subsequent checkout or switch then had to contend with the fact that it wasn't on an initial_checkout but had a bunch of staged deletions.

Make sure that sparse-checkout changes nothing in the index other than the SKIP_WORKTREE bit; in particular, when the index is unborn we do not have any branch checked out so there is no sparsification or de-sparsification work to do.

Simply return from update_working_directory() early.


With Git 2.35 (Q1 2022), various operating modes of "git reset" have been made to work better with the sparse index.

See commit f2a454e, commit 4d1cfc1, commit 20ec2d0, commit c01b1cb, commit 291d77e (29 Nov 2021), commit 86609db, commit 71471b2 (27 Oct 2021), and commit 1f86b7c (07 Oct 2021) by Victoria Dye (vdye).
(Merged by Junio C Hamano -- gitster -- in commit f085087, 10 Dec 2021)

sparse-index: update command for expand/collapse test

Helped-by: Derrick Stolee
Signed-off-by: Victoria Dye

In anticipation of git reset --hard being able to use the sparse index without expanding it, replace the command in 'sparse-index is expanded and converted back' with git reset -- folder1/a.
This command will need to expand the index to work properly, even after integrating the rest of reset with sparse index.

Note: Git 2.36 (Q2 2022) fixes a bug in unpack-trees introduced in 2.35 just above:

See commit 99430aa, commit bfc763d, commit c3a9cec (17 Mar 2022) by Victoria Dye (vdye).
(Merged by Junio C Hamano -- gitster -- in commit d629667, 29 Mar 2022)

Revert "unpack-trees: improve performance of next_cache_entry"

Signed-off-by: Victoria Dye

This reverts commit f2a454e (unpack-trees: improve performance of next_cache_entry, 2021-11-29, Git v2.35.0-rc0 -- merge listed in batch #2).

The "hint" value was originally needed to improve performance in 'git reset -- <pathspec>' caused by 'cache_bottom' lagging behind its correct value when using a sparse index.
The 'cache_bottom' tracking has since been corrected, removing the need for an additional "pseudo-cache_bottom" tracking variable.

Homemade answered 26/3, 2017 at 19:5 Comment(0)
