Making sure files are converted from CRLF to LF in an update hook: is there a performance hit?

There has been a lot of discussion about the core.autocrlf and core.safecrlf features in the current release and the next release. My question relates to an environment where developers clone from a bare repository.

The autocrlf setting is enabled at clone time. But since developers have full control over their clones, they can remove this setting and carry on.

  1. We can declare which files are text and which are binary in the .gitattributes file, but is there any other way for Git to determine automatically whether a file is text or binary?

  2. Is there a mechanism such as an update hook (a commit hook will not do, since developers can still remove it) that ensures files with CRLF pushed from a Windows environment to the UNIX machine hosting the bare repo are converted to UNIX EOL format (LF)?

  3. Will an update hook that scans each file for CRLF affect the performance of a push operation?

Thanks

Sexennial answered 11/8, 2010 at 6:21 Comment(0)
  • 1/ Git itself has a heuristic to determine whether a file is binary or text (similar to istext).

  • 2/ The gergap weblog recently (May 2010) had the same idea.
    See his update hook here (reproduced at the end of this answer), but the trick is:
    Rather than trying to convert, the hook simply rejects the push if it detects a (supposedly) non-binary file with the wrong EOL style.

Git converts LF->CRLF when checking out on Windows.
If the file already contains CRLF, Git is clever enough to detect that and does not expand it to CRCRLF, which would be wrong. It keeps the CRLF, which means the file was implicitly changed locally during the checkout, because when committing it again, the wrong CRLF will be corrected to LF. That’s why Git must mark these files as modified.

It’s good to understand the problem, but we need a solution that prevents wrong line endings from being pushed to the central repo.
The solution is to install an update hook on the central server.

  • 3/ There will be a small cost, since every changed blob in the push is read once, but unless you push every 30 seconds, this shouldn't be an issue.
    Plus there is no actual conversion taking place: if the file is not correct, the push gets rejected.
    That puts the conversion issue right back where it belongs: on the developer side.
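
On point 1, here is a rough sketch of the kind of check Git falls back on when no attribute is set. As far as I know, Git treats content as binary when a NUL byte shows up near the start of the file (roughly the first 8000 bytes). The function name looks_binary and the exact cut-off are assumptions for illustration only; Git's real logic lives in its C sources and applies further rules when deciding on line-ending conversion.

```shell
#!/bin/sh
# Hypothetical helper, not part of Git: succeeds (exit status 0) if the file
# named by $1 has a NUL byte within its first 8000 bytes.
looks_binary() {
    # byte count of the leading chunk as-is
    total=$(head -c 8000 "$1" | wc -c)
    # byte count of the same chunk with NUL bytes removed
    without_nul=$(head -c 8000 "$1" | tr -d '\000' | wc -c)
    # any difference means at least one NUL byte was present
    [ "$total" -ne "$without_nul" ]
}
```

Note that a text file with CRLF line endings contains no NUL byte, so a check like this still classifies it as text.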

#!/bin/sh
#
# Author: Gerhard Gappmeier, ascolab GmbH
# This script is based on the update.sample in git/contrib/hooks.
# You are free to use this script for whatever you want.
#
# To enable this hook, rename this file to "update".
#

# --- Command line
refname="$1"
oldrev="$2"
newrev="$3"
#echo "COMMANDLINE: $*"

# --- Safety check
if [ -z "$GIT_DIR" ]; then
    echo "Don't run this script from the command line." >&2
    echo " (if you want, you could supply GIT_DIR then run" >&2
    echo "  $0 <ref> <oldrev> <newrev>)" >&2
    exit 1
fi

if [ -z "$refname" -o -z "$oldrev" -o -z "$newrev" ]; then
    echo "Usage: $0 <ref> <oldrev> <newrev>" >&2
    exit 1
fi

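BINARY_EXT="pdb dll exe png gif jpg"

# returns 1 if the given filename has a known binary extension, 0 otherwise
# (the reverse of the usual shell convention; the caller tests $? -eq 1)
IsBinary()
{
    result=0
    for ext in $BINARY_EXT; do
        # ${1##*.} strips everything up to the last dot, leaving the extension
        if [ "$ext" = "${1##*.}" ]; then
            result=1
            break
        fi
    done

    return $result
}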

# make temp paths and remove them again when the hook exits
tmp=$(mktemp /tmp/git.update.XXXXXX)
tree=$(mktemp /tmp/git.diff-tree.XXXXXX)
trap 'rm -f "$tmp" "$tree"' EXIT
ret=0

git diff-tree -r "$oldrev" "$newrev" > $tree
#echo
#echo diff-tree:
#cat $tree

# read $tree using the file descriptors
exec 3<&0
exec 0<$tree
while read old_mode new_mode old_sha1 new_sha1 status name
do
    # debug output
    #echo "old_mode=$old_mode new_mode=$new_mode old_sha1=$old_sha1 new_sha1=$new_sha1 status=$status name=$name"
    # skip lines showing parent commit
    test -z "$new_sha1" && continue
    # skip deletions
    [ "$new_sha1" = "0000000000000000000000000000000000000000" ] && continue

    # don't do a CRLF check for binary files
    # (pass the path from diff-tree, not $tmp, which never has an extension)
    IsBinary "$name"
    if [ $? -eq 1 ]; then
        continue # skip binary files
    fi

    # check for CRLF: a carriage return directly before the end of a line
    git cat-file blob "$new_sha1" > "$tmp"
    if grep -q "$(printf '\r')\$" "$tmp"; then
        echo "###################################################################################################"
        echo "# '$name' contains CRLF! Dear Windows developer, please activate the GIT core.autocrlf feature,"
        echo "# or change the line endings to LF before trying to push."
        echo "# Use 'git config core.autocrlf true' to activate CRLF conversion."
        echo "# OR use 'git reset HEAD~1' to undo your last commit and fix the line endings."
        echo "###################################################################################################"
        ret=1
    fi
done
exec 0<&3
# --- Finished
exit $ret
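
If you want to try the CRLF test from the hook on its own before installing anything on the server, a minimal sketch could look like this. The function name has_crlf is made up for illustration; it only relies on printf and grep -q, so it should not need GNU grep's -P option.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the file named by $1
# has at least one line that ends in CR LF.
has_crlf() {
    # a literal carriage return; command substitution only strips newlines
    cr=$(printf '\r')
    # match a CR sitting directly before the end of a line
    grep -q "${cr}\$" "$1"
}
```

To check a blob rather than a working-tree file, write it out first with git cat-file blob, as the hook does, and pass that temporary file to the function.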
Och answered 11/8, 2010 at 6:57 Comment(9)
Thanks Von, this update hook does help a lot. So does this script skip binary files, or does it try to read binary files too? I still have a doubt about how Git distinguishes between a binary and a text file. Do we have to specify which files have LF in the .gitattributes file (like your suggestion in this article #2517690), or does Git have other mechanisms to distinguish files? - Sexennial
@Senthil: it is best to specify what is binary and what is not, otherwise Git uses a simple heuristic. - Och
Does Git have the ability to find CRLFs (or a git attribute) at commit time? Can this trigger be used as a pre-commit hook instead of an update hook? - Sexennial
@Senthil: that would be interesting to try, but with some modifications, since in pre-commit, git diff-tree might not have the same information it has in an update hook (where everything is already committed). - Och
Does this still work nowadays? I tried it with git version 1.9.1 and realized that binary files are no longer ignored, because $1 inside IsBinary will be something like /tmp/git.update.K76hFg, so it doesn't contain any file extension. - Iodize
@digorydoo I don't know, I haven't tested it in years. Plus, 1.9.1 sounds ancient. Even on Windows, you have Git 2.4.6 (github.com/git-for-windows/git/releases). - Och
Sure it's an older version, but probably not as old as 2010, when this post was written. And it's the "current" version I get when I run apt-get install git on my Debian machine. - Iodize
Actually I think there's a bug in that script. $tmp is the result of mktemp, which will never contain the file's suffix. IsBinary should be called with $name as its argument rather than $tmp... - Iodize
@digorydoo ok, don't hesitate to edit this answer in order to fix the script. - Och
