How do I exclude files from git archive?
D

5

19

Given a simple test repository with a single commit with two files, a and b, I can get a list of specific files:

$ git ls-files a
a

Or a list of all files excluding specific files:

$ git ls-files . ':!b'
a

I can create an archive of specific files:

$ git archive HEAD a | tar tf -
a

But I cannot create an archive of all files excluding specific files:

$ git archive HEAD . ':!b' | tar tf -
a
b

Archiving an explicit list of files is not an option in my real repository, as the list exceeds the maximum command-line argument length.

I know I can store the list of files to exclude in .gitattributes via the export-ignore attribute, but the list is dynamically generated. I can automatically change the file, but changes do not get picked up until after another commit.

Is there some other invocation that works without requiring another commit?

Dorie answered 3/8, 2016 at 8:1 Comment(0)
P
13

I think you almost nailed it: attributes can be read from several places, with .gitattributes being only the most common of them. The second one—considered a per-repository configuration—is $GIT_DIR/info/attributes.

To cite the manual:

Note that attributes are by default taken from the .gitattributes files in the tree that is being archived. If you want to tweak the way the output is generated after the fact (e.g. you committed without adding an appropriate export-ignore in its .gitattributes), adjust the checked out .gitattributes file as necessary and use --worktree-attributes option. Alternatively you can keep necessary attributes that should apply while archiving any tree in your $GIT_DIR/info/attributes file.

So, if possible, put your list in that file and then run git archive.
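A minimal sketch of this approach, assuming the dynamically generated exclusion list is available as one path per line. The throwaway repository and the temporary list file are hypothetical and only mirror the two-file example from the question:

```shell
#!/bin/sh
set -eu

# Throwaway repository mirroring the question: one commit with files a and b.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo a > a
echo b > b
git add a b
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Hypothetical generated exclusion list, one path per line, kept outside the repository.
list=$(mktemp)
printf '%s\n' b > "$list"

# Turn every path into a "<path> export-ignore" line in the per-repository
# attributes file. This overwrites any attributes already stored there.
attr_file=$(git rev-parse --git-path info/attributes)
mkdir -p "$(dirname "$attr_file")"
sed 's/$/ export-ignore/' "$list" > "$attr_file"

# No new commit is needed: git archive consults info/attributes directly.
listing=$(git archive HEAD | tar tf -)
echo "$listing"
```

Each line is interpreted as an attribute pattern, so paths containing whitespace or glob characters would presumably need escaping before being written to the file.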

Another approach is to skip git archive and simply tar the work tree, passing tar the --exclude-from command-line option, which accepts a file of patterns. This wouldn't work for a bare repository as is, but if you're OK with checking out the files before archiving them, you can do so with git read-tree and git checkout-index, run with suitable $GIT_INDEX_FILE and $GIT_WORK_TREE environment variables.
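The checkout-then-tar idea could look roughly like the sketch below. The scratch directory layout and the exclude.txt file name are assumptions made for illustration, and $GIT_DIR is set explicitly here so that git does not have to discover the repository while a foreign work tree is in effect:

```shell
#!/bin/sh
set -eu

# Throwaway repository mirroring the question: one commit with files a and b.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo a > a
echo b > b
git add a b
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Scratch area: a temporary index, an export directory and the exclusion list.
scratch=$(mktemp -d)
mkdir "$scratch/export"
printf '%s\n' b > "$scratch/exclude.txt"

# Point git at the real repository but at a separate index and work tree,
# so the regular index and checkout are left alone.
GIT_DIR=$(git rev-parse --absolute-git-dir)
GIT_INDEX_FILE=$scratch/index
GIT_WORK_TREE=$scratch/export
export GIT_DIR GIT_INDEX_FILE GIT_WORK_TREE

git read-tree HEAD          # fill the temporary index from the commit
git checkout-index -a -f    # write all indexed files into the export directory
unset GIT_DIR GIT_INDEX_FILE GIT_WORK_TREE

# Archive the exported tree, skipping every path listed in exclude.txt.
listing=$(tar -C "$scratch/export" -c -f - --exclude-from="$scratch/exclude.txt" . | tar tf -)
echo "$listing"
```

Because the files are checked out into a separate directory, this variant should also be usable with a bare repository, at the cost of a temporary copy of the tree on disk.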

Another possible workaround is reversing the approach: tar (at least GNU tar) supports a lesser-known option of being able to delete stuff from an archive in a pipeline.

Basically, you can do

 $ tar -C a_path -c -f - . \
   | tar -f - --wildcards --delete '*.pdf' >result.tar

so that the first tar in the pipeline archives everything, while the second one passes everything through except for files matching the *.pdf shell glob pattern.

So if the shell globs for the files to delete fit within the command-line limit, just pipe the output of git archive to a tar process that removes the unwanted files.

Phonemic answered 3/8, 2016 at 8:44 Comment(2)
Thanks for the detailed answer. I think for me .git/info/attributes is not necessarily the most logical approach, but it fits best into what I have already, and if I need something more in the future I can change it to tar --delete. - Dorie
This answer is unnecessarily complicated. See zett42's answer below for excluding files via the path argument of git archive and the exclude functionality of git pathspecs. - Jibe
I
7

With Git version 2.20 (Windows) and Gitolite server (unknown version) this works for me to exclude files and folders named "b":

git archive HEAD . ":!b" | tar tf -

This also works:

git archive HEAD . ":(exclude)b" | tar tf -

Note that I have to use double-quotes on the Windows platform, not sure about other platforms.

This feature is part of pathspec (pattern used to limit paths in Git commands). Also see this answer.

Indefeasible answered 18/3, 2020 at 15:8 Comment(4)
Thanks. Practical example based on this: git archive -v -o eb-bundle.zip --format=zip HEAD . ":(exclude)data/local.js" - Potted
Very nice! Is there any documentation about that ":(exclude)" syntax? The git archive docs say nothing about it. - Emperor
@LorenzoDonatisupportUkraine I've added links. - Indefeasible
@Indefeasible Thanks!!! This info is "particularly well hidden" in the documentation shipped with GitForWindows! I would expect that any option/command requiring a pathspec would link to the relevant documentation, but nope! - Emperor
T
4

You can create a tar archive and then delete the files and folders that should not be in it:

git archive HEAD -o archive.tar
tar -f archive.tar --delete listoffiles1
tar -f archive.tar --delete listoffiles2
tar -f archive.tar --delete listoffiles..
tar -f archive.tar --delete listoffilesN

This way you can split the deletions across several commands to stay below the maximum command-line argument length.
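Rather than splitting the list by hand, xargs can do the batching. The sketch below assumes GNU tar (for --delete) and a hypothetical exclude.txt with one path per line. The throwaway repository only mirrors the example from the question:

```shell
#!/bin/sh
set -eu

# Throwaway repository mirroring the question: one commit with files a and b.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo a > a
echo b > b
git add a b
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Hypothetical exclusion list and output location, kept outside the repository.
work=$(mktemp -d)
printf '%s\n' b > "$work/exclude.txt"

# Archive everything first.
git archive -o "$work/archive.tar" HEAD

# xargs runs tar as many times as needed, each time with as many names
# as fit on one command line, so the argument length limit is never exceeded.
xargs tar -f "$work/archive.tar" --delete < "$work/exclude.txt"

listing=$(tar tf "$work/archive.tar")
echo "$listing"
```

Note that xargs splits its input on whitespace and interprets quotes, so paths containing spaces or quote characters would need a different delimiter handling than this sketch provides.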

Trigonal answered 3/8, 2016 at 8:35 Comment(0)
C
3

Instead of putting export-ignore into a (committed) .gitattributes, you can also put it into a (not committed) $GIT_DIR/info/attributes file. Alternatively, leave the .gitattributes uncommitted and use the --worktree-attributes option, although that's probably not so nice, as it leaves your working tree dirty.
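A small sketch of the --worktree-attributes variant, assuming the repository has no committed .gitattributes of its own (the throwaway repository below is hypothetical and mirrors the question's two files):

```shell
#!/bin/sh
set -eu

# Throwaway repository mirroring the question: one commit with files a and b.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo a > a
echo b > b
git add a b
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Uncommitted .gitattributes in the work tree; it is never added or committed.
printf '%s\n' 'b export-ignore' > .gitattributes

# --worktree-attributes makes git archive read the checked-out .gitattributes
# in addition to the one stored in the tree being archived.
listing=$(git archive --worktree-attributes HEAD | tar tf -)
echo "$listing"

# Remove the temporary file again so the work tree is left clean.
rm .gitattributes
```

In a repository that already tracks a .gitattributes file, the cleanup step would instead need to restore the committed version rather than delete the file.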

Chassidychassin answered 3/8, 2016 at 8:44 Comment(0)
B
0

One possible solution lies in the fact that git archive wants a tree-ish to archive.

You are passing it HEAD (the most common choice, probably). To make this do what you mean, this ref is automatically resolved to the object it points to – which is going to be a commit, obviously. And a commit object is resolved to the tree object attached to it. So you get the contents of the current commit. So far, so obvious.

But you can pass any tree object you want! And how does that help? Well you can always create a tree object from the current state of the index using git write-tree – which returns the SHA1 of the tree object it just created on stdout. You don’t have to create a commit or anything like that.

So you can just git rm --cached everything you don’t want in the tarball and then create a tree object to pass to git archive. And since you don’t care about the tree object otherwise, you can combine that into the git archive command:

git archive $( git write-tree )

After that you can git reset --hard and be on your way.

All together:

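# Remove the unwanted paths from the index only (the work tree is untouched)
git rm --cached foo bar baz
# Write the index out as a tree object and archive that tree
git archive $( git write-tree )
# Restore the index (note: --hard also discards uncommitted work tree changes)
git reset --hard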
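A variation on the commands above, assuming you would rather not touch the real index at all: the same steps can be run against a temporary index selected via $GIT_INDEX_FILE, so nothing has to be reset afterwards. The temporary paths and the throwaway repository are hypothetical:

```shell
#!/bin/sh
set -eu

# Throwaway repository mirroring the question: one commit with files a and b.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo a > a
echo b > b
git add a b
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Use a private index file so the repository's real index stays as it is.
tmp=$(mktemp -d)
GIT_INDEX_FILE=$tmp/index
export GIT_INDEX_FILE

git read-tree HEAD       # populate the private index from the current commit
git rm --cached -q b     # drop the unwanted path from the private index only
tree=$(git write-tree)   # store the remaining entries as a tree object
unset GIT_INDEX_FILE

# Archive the tree object; no commit was created and no reset is required.
listing=$(git archive "$tree" | tar tf -)
echo "$listing"
```

The tree object created this way is unreferenced and should eventually be removed by git's normal garbage collection.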
Brucite answered 25/10, 2021 at 13:7 Comment(0)
