Skip Git revisions in a post-receive hook that have already been processed

I have a git post-receive hook that extracts all the revisions that were added during a "git push" and does some processing on each one (such as sending notification emails). This works great except when merging; e.g.:

  1. I make some commits on branch1 and then push branch1. The post-receive hook processes the commits correctly.
  2. I merge branch1 into branch2 and then push branch2. The post-receive hook processes all the merged commits a second time.

How can I avoid this? Below is the beginning of my post-receive hook where I extract the commits that should be processed (at the end $COMMITS holds the list of commits to process).

#!/bin/sh

REPO_PATH=`pwd`
COMMITS=''

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# for each ref that was updated during the push
while read OLD_REV NEW_REV REF_NAME; do
  OLD_REV="`git rev-parse $OLD_REV`"
  NEW_REV="`git rev-parse $NEW_REV`"
  if expr "$OLD_REV" : '0*$' >/dev/null; then
    # if the branch was created, add all revisions in the new branch; skip tags
    if ! expr "$REF_NAME" : 'refs/tags/' >/dev/null; then
      REF_REV="`git rev-parse $REF_NAME`"
      REF_NAME="`git name-rev --name-only $REF_REV`"
      COMMITS="$COMMITS `git rev-list $REF_NAME | git name-rev --stdin | grep -G \($REF_NAME.*\) | awk '{ print $1 }' | tr '\n' ' '`"
    fi

  elif expr "$NEW_REV" : '0*$' >/dev/null; then
    # don't think branch deletes ever hit a post-receive hook, so we should never get here
    printf ''
  else
    # add any commits in this push
    COMMITS="$COMMITS `git rev-parse --not --all | grep -v $(git rev-parse $REF_NAME) | git rev-list --reverse --stdin $(git merge-base $OLD_REV $NEW_REV)..$NEW_REV | tr '\n' ' '`"
  fi
done
Salinasalinas answered 2/5, 2012 at 18:6 Comment(0)
G
8

Look at $(prefix)/share/git-core/contrib/hooks/post-receive-email, which does just what (I think) you want. Basically, it uses git for-each-ref to find the names of all branches, and then excludes every commit that is reachable from some branch other than the one being updated:

if [ "$change_type" = create ]
then
    # Show all revisions exclusive to this (new) branch.
    revspec=$newrev
else
    # Branch update; show revisions not part of $oldrev.
    revspec=$oldrev..$newrev
fi

other_branches=$(git for-each-ref --format='%(refname)' refs/heads/ |
     grep -F -v $refname)
git rev-parse --not $other_branches | git rev-list --pretty --stdin $revspec

(I've simplified it here, and hopefully not damaged anything in my cut-and-paste job. The inputs here are: $change_type is create if $oldrev is all-zeros, otherwise it's update; $oldrev is the old rev SHA1 from the line recently-read from stdin; $newrev is the new rev SHA1; and $refname is the full name, e.g., refs/heads/topic.)
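
For context, the excerpt above could be wrapped in a reusable function roughly like this. It is a minimal sketch rather than the contrib script itself: the name list_new_commits is hypothetical, it assumes branch names contain no whitespace, and it does not handle deletions (an all-zero new revision).

```shell
#!/bin/sh
# Sketch: print the commits introduced by one ref update that are not
# reachable from any other branch. list_new_commits is a hypothetical name.
list_new_commits() {
    oldrev=$1 newrev=$2 refname=$3

    if expr "$oldrev" : '0*$' >/dev/null; then
        revspec=$newrev              # branch created
    else
        revspec=$oldrev..$newrev     # branch updated
    fi

    # Every branch except the one being updated (-x: whole-line match, so
    # refs/heads/topic does not also filter out refs/heads/topic2).
    other_branches=$(git for-each-ref --format='%(refname)' refs/heads/ |
        grep -F -x -v "$refname")

    # Negate the other branches and feed them to rev-list as exclusions.
    git rev-parse --not $other_branches |
        git rev-list --reverse --stdin "$revspec"
}

# In a post-receive hook this would be driven by stdin, for example:
#   while read oldrev newrev refname; do
#       list_new_commits "$oldrev" "$newrev" "$refname"
#   done
```

Using grep -x here is a deliberate deviation from the excerpt: without it, any branch whose name merely contains $refname is also dropped from the exclusion list.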

Gothart answered 3/5, 2012 at 5:30 Comment(2)
Thanks for this, it does just what I need. The remaining corner case with this code is when you make a commit, merge it into another branch, and then push both branches at once. In that case, the revision in each branch is reachable from the other branch, so it isn't processed at all. However, this is acceptable for us for now, because that case is relatively uncommon. - Salinasalinas
Interesting! If you did this in the pre-receive hook instead of the post-receive hook, it would probably scan those commits twice, because neither ref will have been updated yet. Perhaps if you do them in the update hook it will scan them exactly once? (Haven't tried this myself, obviously.) - Gothart
M
1

What we do is keep the hashes of previously processed commits in a text file. Every time the hook runs, it looks in that file to check whether a given commit has already been processed. If it has not, the hook processes the commit and then logs its hash to the file.

This is not very scalable: the text file only grows as more commits are added to the repository, and so does the time needed to check for a given commit.
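
A minimal sketch of this bookkeeping might look like the following. The names SEEN_FILE and process_once and the file location are assumptions made for illustration; they are not taken from our actual hook.

```shell
#!/bin/sh
# Sketch of a "seen commits" log. SEEN_FILE and process_once are
# hypothetical names; the default file location is an assumption.
SEEN_FILE=${SEEN_FILE:-processed-commits.txt}

# Usage: process_once <rev> <command> [args...]
# Runs the command with <rev> appended, unless <rev> was already logged.
process_once() {
    rev=$1
    shift
    touch "$SEEN_FILE"
    # -F fixed string, -x whole line, -q quiet: an exact hash lookup.
    if grep -F -x -q -e "$rev" "$SEEN_FILE"; then
        return 0                     # already processed; skip it
    fi
    # Log the hash only if the command succeeded.
    "$@" "$rev" && echo "$rev" >> "$SEEN_FILE"
}
```

Inside the hook's loop, each commit would then go through something like process_once "$rev" some_handler, where some_handler stands in for whatever processing the hook performs.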

Misfortune answered 2/5, 2012 at 19:47 Comment(2)
One way to improve on this is to simply trim the file periodically, consistent with whatever your policy is on how frequently you merge branches. - Gifford
Thanks for your answer. I'm going to go with torek's solution above, but your answer would also avoid the remaining corner case I mentioned there. However, tracking commits in a text file is more of a complication than I want to introduce right now. - Salinasalinas
G
0

We did this by having the post-receive hook stop processing when it encountered a merge commit (a commit with two or more parents). This requires a bit of discipline when pushing merges to ensure that other "real" commits aren't thrown out. The discipline is to always push before merging and then push the merge separately.
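
As a rough sketch of the idea (not our exact hook): a commit is a merge if it has a second parent, and the hook stops at the first one it meets. The function names are hypothetical, and the sketch assumes commits arrive newest first, which is the default order of git rev-list.

```shell
#!/bin/sh
# Sketch: stop at the first merge commit. is_merge and
# process_until_merge are hypothetical names.

# A commit is a merge if its second parent (<rev>^2) exists.
is_merge() {
    git rev-parse --verify --quiet "$1^2" >/dev/null
}

# Reads commit hashes from stdin, newest first, and prints them until a
# merge commit is seen. The merge and everything after it are skipped.
process_until_merge() {
    while read rev; do
        if is_merge "$rev"; then
            break
        fi
        echo "$rev"
    done
}
```

In the hook this could be fed with git rev-list "$oldrev..$newrev" | process_until_merge, with the printed hashes handed on to the real processing step.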

Gifford answered 2/5, 2012 at 20:53 Comment(0)
B
0

I implemented this completely in a post-receive hook. It notifies trac of only new commits since the last fetch without duplicating, regardless of whether the new commits were pushed to a single branch or multiple branches at the same time. This method keeps a file called TRAC_HEAD in your git directory for tracking which commits have already been processed.

It is recommended that you run cat refs/heads/* > TRAC_HEAD in your .git directory before enabling the hook.

#!/bin/sh
#
# Reads and notifies trac of only new commits that have not yet been dealt with.
#
# The "post-receive" script is run after receive-pack has accepted a pack
# and the repository has been updated.  It is passed arguments in through
# stdin in the form
#  <oldrev> <newrev> <refname>
# For example:
#  aa453216d1b3e49e7f6f98441fa56946ddcd6a20 68f7abf4e6f922807889f52bc043ecd31b79f814 refs/heads/master
#

TRAC_PATH="/path/to/trac/env"

# Read the standard input
while read oldrev newrev refname ; do

        echo "Processing branch: $refname"

        # Read the last revisions for each branch from TRAC_HEAD
        exclusions=$(cat TRAC_HEAD | uniq |  sed -e 's/^/^/' -e 's/ / ^/g' | xargs echo)

        echo "Exclusion list: $exclusions"

        git rev-list --reverse $newrev $exclusions | while read rev ; do
                trac-admin $TRAC_PATH changeset added '(default)' $rev
                echo "Processed: $rev"
        done

        # Add to the exclusions file the latest revision from this branch
        echo $newrev >> TRAC_HEAD

done

# Update the TRAC_HEAD file
cat refs/heads/* > TRAC_HEAD
Balsamiferous answered 27/4, 2013 at 0:6 Comment(0)
F
0

As @Matt White noted, the approach taken in $(prefix)/share/git-core/contrib/hooks/post-receive-email can be trivially circumvented by pushing multiple refs containing the same new commits in the same push.

@BrunoOliveira and @Kenaniah both have workarounds that involve persisting extra information.

I believe there is also a viable approach that involves explicitly passing the "other refs previously considered in this push" as --exclude patterns ahead of --all (after --not) as you iterate through the refs in the push:

#!/bin/bash
#

while read oldrev newrev refname
do
        if expr "$newrev" : '0*$' >/dev/null
        then
                echo "---Deleted: $refname---"
        else
                EXCLUDE_REFS+=($refname)

                # this is not safe against special characters in ref names; fixes welcome!
                new_commits_command=$(echo git rev-list "$refname" --not "${EXCLUDE_REFS[@]/#/--exclude }" --all --)
                echo "---New or updated: $refname--- (with" $($new_commits_command | wc -l) "new and unique commits)"
                $new_commits_command
        fi

done

The main problem with this approach, as far as I can tell, is that the git rev-list command ends up with roughly twice as many arguments as there are refs in the push, which might eventually cause an error. I've tested 1,000 to 2,000 refs without a problem on my Ubuntu environment; I don't know where the limit is likely to be for any given server environment and ref pattern.

Generally speaking, if you're operating a server where you do this kind of checking, I assume you'd actually want to have a limit on the number of refs that can be pushed in one go anyway, in a pre-receive hook.
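
Such a limit could be sketched as follows. MAX_REFS and check_ref_count are hypothetical names, and the default of 100 is an arbitrary example value rather than a recommendation.

```shell
#!/bin/sh
# Sketch of a pre-receive check. MAX_REFS and check_ref_count are
# hypothetical; 100 is an arbitrary example limit.
MAX_REFS=${MAX_REFS:-100}

# Reads "<oldrev> <newrev> <refname>" lines on stdin and fails when
# there are more than MAX_REFS of them.
check_ref_count() {
    count=0
    while read oldrev newrev refname; do
        count=$((count + 1))
    done
    if [ "$count" -gt "$MAX_REFS" ]; then
        echo "Push rejected: $count refs exceeds the limit of $MAX_REFS." >&2
        return 1
    fi
    return 0
}
```

A pre-receive hook would call check_ref_count || exit 1, since a non-zero exit status from pre-receive rejects the whole push.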

Fuze answered 5/4, 2022 at 9:15 Comment(1)
Ten years later, but this is a nice improvement :) - Salinasalinas
