Testing what is about to be committed in a pre-commit hook

Asked 7/9, 2012 at 14:36 Answered 10/5, 2018 at 17:26

git githooks pre-commit-hook pre-commit git-index

When a pre-commit hook runs, the repository might not be clean. So if you naively run your tests, they will not be against what you're committing, but whatever happens to be in your working tree.

The obvious thing to do is to run git stash --keep-index --include-untracked at the start of the pre-commit hook and git stash pop at the end. That way you are testing against the (pure) index, which is what we want.

Unfortunately, this generates merge conflict markers if you use git add --patch (especially if you edit hunks), since the contents of stash@{0} might not match up against the work tree after commit.

Another common solution is to clone the repository and run the tests in a new temporary one. There are two issues with that:

we haven't committed yet, so we can't easily get a copy of the repository in the state we're about to commit; and
my tests might be sensitive to the location of the current working directory, for example because of local environment configuration.

How can I restore my work-tree to whatever state it was in before the git stash --keep-index --include-untracked, without introducing merge conflict markers, and without modifying the post-commit HEAD?

Windbound answered 7/9, 2012 at 14:36 Comment(10)

The pre-commit script receives the data being committed as input. Why do you need to look at anything else? Perhaps what you are trying to do would best be done in something other than a pre-commit hook. What sort of tests are you wanting to do that require access to the full repository? – Cockpit 7/9, 2012 at 17:15

@WilliamPursell: What do you mean by "the data being committed?". The pre-commit script runs in my work tree (i.e, the base of the source repository). The problem is that if you make some changes to the repository and only stage a few of them (e.g, you add some files but not others), then you will not be testing the commit before it happens (what I want to do), you'd be testing whatever you have in your working directory. – Windbound 8/9, 2012 at 18:16

The patch that you are committing is available on stdin to the pre-commit hook. What are you testing if not the patch that is being committed? The purpose of the pre-commit hook is to verify the patch. – Cockpit 8/9, 2012 at 18:17

@WilliamPursell: Also, I'm not sure what you mean by "full access to the repository". That's not an issue. The issue is getting the source code into the state it will be in at HEAD after the commit. As an example of something I want to do, I would like to run my build scripts. In general, though, this might require a specific environment to work (i.e., it's path-dependent). So you can't just clone the repository elsewhere and run the build. – Windbound 8/9, 2012 at 18:19

Ah, I was not aware that the patch is on the stdin. But that doesn't quite help me if I want to run an in-place build, for example. – Windbound 8/9, 2012 at 18:20

The pre-commit hook is the wrong place to run your build scripts. Use the pre-commit hook for simple things like rejecting a patch with trailing white space or incorrect indentation. – Cockpit 8/9, 2012 at 18:37

@WilliamPursell, can you elaborate? My build takes a second or so. So there's no reason not to run it. If you don't think this is a legitimate use of pre-commit, you seem to imply that you know where it belongs. Looking at the other hooks, they don't look like what I want. In this case, I want to prevent things from being committed unless they build. This doesn't seem like an unreasonable use of pre-commit. If you believe it is, please explain why, and what a sensible alternative might be. – Windbound 8/9, 2012 at 20:44

I don't think this is the role of the VCS. I tend to prefer multiple small commits, and I believe it is not unreasonable for the build to fail on individual commits. Certainly it should succeed at any major (or any) merge point or at any tag, but verifying the build is best left to the developer prior to pushing to a public repo or making a tag. Purely personal opinion, of course. Certainly my biggest concern with using the pre-commit is build time discouraging commits (which should be frequent), which appears to be a non-issue in your case. – Cockpit 8/9, 2012 at 22:52

I mis-spoke. The patch is not on stdin by default. I had been looking at a perl script that was reading the patch from stdin, and didn't notice that I had redirected the output of git diff --cached onto that file descriptor. – Cockpit 8/9, 2012 at 23:51

I do use many small commits. I'd just like to leave the code in a state where it is easy to bisect. I'd often like to edit hunks in trivial ways with git add -p just before committing. Since my builds are so fast, I'd like to make sure that these tweaks don't break the build. – Windbound 9/9, 2012 at 8:37

git write-tree is useful in pre-commit hooks. It writes the current index into the repository as a tree object and prints that tree's ID (the same tree will be reused if and when the commit is finalised).

Once the tree is written to the repo, you can use git archive | tar -x to write the tree to a temporary directory.

E.g.:

#!/bin/bash

TMPDIR=$(mktemp -d)
TREE=$(git write-tree)
# Quote the expansions so paths containing spaces survive
git archive "$TREE" | tar -x -C "$TMPDIR"

# Run tests in $TMPDIR

RESULT=$?
rm -rf "$TMPDIR"
exit $RESULT
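
As a rough sketch of how the "run tests" step could be wired in, the same write-tree and archive steps can be wrapped in a function that runs an arbitrary command inside the exported tree and passes its exit status back. The name run_on_index is made up for illustration, and the body runs in a subshell so that its variables and the cd do not leak into the hook.

```shell
#!/bin/sh
# Hypothetical helper: export the index to a temporary directory,
# run the given command there, and return that command's exit status.
run_on_index() (
    tree=$(git write-tree) || exit 1      # snapshot the index as a tree object
    tmp=$(mktemp -d) || exit 1
    status=0
    if git archive "$tree" | tar -x -C "$tmp"; then
        # Run the caller's command inside the exported tree
        (cd "$tmp" && "$@") || status=$?
    else
        status=1
    fi
    rm -rf "$tmp"
    exit "$status"
)
```

In a hook this might be called as run_on_index make test || exit 1, where make test stands in for whatever your project uses.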

Querist answered 2/4, 2016 at 14:12 Comment(0)

If cloning the entire repo is too expensive, perhaps you just need a copy of the working directory. Making a copy would be simpler than trying to deal with conflicts. For example:

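#!/bin/sh -e

trap 'rm -rf "$TMPD"' 0
mkdir "${TMPD=$PWD/.tmpdir}"
git ls-tree -r HEAD | while read -r mod type sha name; do
    if test "$type" = blob; then
        mkdir -p "$TMPD/$(dirname "$name")"
        git show "$sha" > "$TMPD/$name"
        # ls-tree prints modes such as 100644; chmod wants only the permission bits
        chmod "${mod#100}" "$TMPD/$name"
    fi
done
cd "$TMPD"
# -p1 strips the a/ and b/ prefixes that git diff adds to each path
git diff --cached HEAD | patch -p1
# Run tests here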

This will dump the state of the tree as it will be after the commit in $TMPD, so you can run your tests there. You should get a temporary directory in a more secure fashion than is done here, but in order for the final diff to work (or to simplify the script and cd earlier), it must be a child of the working directory.

Cockpit answered 8/9, 2012 at 23:28 Comment(0)

If you can afford to make a complete copy of the current checkout, you can use a temporary directory like so:

tmpdir=$(mktemp -d) # Or put it wherever you like
git archive HEAD | tar -xf - -C "$tmpdir"
git diff --staged | patch -p1 -d "$tmpdir"
cd "$tmpdir"
...

This is basically William Pursell's solution but takes advantage of git archive which makes the code simpler, and I expect will be faster.

Alternatively, by cd'ing first:

cd somewhere
git -C path/to/repo archive HEAD | tar -xf -
git -C path/to/repo diff --staged | patch -p1
...

git -C requires Git 1.8.5 or later.
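
A related sketch, not part of the answer above: git checkout-index can copy the index itself into a directory, which avoids rebuilding it from HEAD plus a patch. The function name export_index is made up for illustration, and an absolute destination path is assumed.

```shell
#!/bin/sh
# Hypothetical helper: write every file recorded in the index under
# the directory given as $1. The trailing slash matters, because
# --prefix is prepended to each path as a plain string.
export_index() {
    git checkout-index --all --force --prefix="$1/"
}
```

Usage would look like tmpdir=$(mktemp -d); export_index "$tmpdir"; cd "$tmpdir", followed by the tests. Unstaged edits and untracked files are not copied, since they are not in the index.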

Lapstrake answered 29/5, 2014 at 12:11 Comment(2)

I've voted you up, but it doesn't satisfy "stay in $PWD when testing", which is necessary for some systems, such as go libraries. – Windbound 30/5, 2014 at 13:45

Besides pwaller's concerns, I believe this solution doesn't work if you're doing a git commit -a, because your script assumes that we're only committing staged files. – Landsknecht 7/2, 2016 at 15:9

I have found the following to be useful:

## bash
# added, copied, or modified in the index (and unchanged since staging)
declare -a files
readarray -t files < <(git status --porcelain | perl -ane 'print $F[1],qq(\n) if m/^[ACM] /')
# staged deletions
declare -a delfiles
readarray -t delfiles < <(git status --porcelain | perl -ane 'print $F[1],qq(\n) if m/^D /')
# untracked files (reported as "??")
declare -a huhfiles
readarray -t huhfiles < <(git status --porcelain | perl -ane 'print $F[1],qq(\n) if m/^\?\? /')

It may be inefficient to call git status three times, but this code is less complex than calling once, storing in memory and looping over the results. And I don't think putting the results to a temp file and reading it off the disk three times would be faster. Maybe. I don't know. This was the first pass. Feel free to critique.
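
A possible alternative for the first list, offered only as a sketch: git diff --cached can report staged paths directly, so the status output does not have to be parsed. staged_files is a hypothetical name. Note that this variant also lists files that have further unstaged edits, which the status pattern above skips.

```shell
#!/bin/sh
# Hypothetical helper: print one path per line for every file that is
# staged as Added, Copied, or Modified relative to HEAD.
staged_files() {
    git diff --cached --name-only --diff-filter=ACM
}
```

Swapping the filter for D would give the staged deletions. Paths containing unusual characters are quoted by Git in this output unless -z is used, so treat the line-based form as a convenience only.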

Depersonalization answered 10/5, 2018 at 17:26 Comment(0)


I have finally found the solution I was looking for. Only the state of the index before the commit is tested, and the index and working tree are left exactly as they were before the commit.

If you see any problems or a better way, please do reply, either as a comment or your own answer.

This assumes that nothing else will try to stash or otherwise modify the git repository or working tree whilst it is running. This comes with no warranty, might be wrong and throw your code into the wind. USE WITH CAUTION.

# pre-commit.sh
REPO_PATH=$PWD
git stash save -q --keep-index --include-untracked # (stash@{1})
git stash save -q                                  # (stash@{0})

# Our state at this point:
# * clean worktree
# * stash@{0} contains what is to be committed
# * stash@{1} contains everything, including dirt

# Now reintroduce the changes to be committed so that they can be tested
git stash apply stash@{0} -q

git_unstash() {
    G="git --work-tree \"$REPO_PATH\" --git-dir \"$REPO_PATH/.git\"" 
    eval "$G" reset -q --hard             # Clean worktree again
    eval "$G" stash pop -q stash@{1}      # Put worktree to original dirty state
    eval "$G" reset -q stash@{0} .        # Restore index, ready for commit
    eval "$G" stash drop -q stash@{0}     # Clean up final remaining stash
}
trap git_unstash EXIT

... tests against what is being committed go here ...

Windbound answered 14/9, 2012 at 15:32 Comment(1)

As mentioned in the comments in the following post, this won't work correctly if amending a commit, or if you don't have a dirty working tree. codeinthehole.com/writing/tips-for-using-a-git-pre-commit-hook – Windbound 14/9, 2012 at 16:21
