How to cope with spaces in file names when iterating results from git diff --name-only

A script I am working on needs to go through each file from a git diff. However, I don't know how to deal with spaces in file names: any file whose name contains a space is split into two "files". I know the names need to be wrapped in double quotes, but I don't know how to achieve that before they reach $@.

When there are spaces in filenames, how should I iterate over the files from

git diff --name-only  $1

?

Here is a simple test that reproduces the error:

copyfiles()
{
    echo "Copying added files"
    for file in $@; do

        new_file=$(echo ${file##*/})

        directory=$(echo ${file%/*})
        echo "Full Path is is  $file"
        echo "File is  $new_file"
        echo "Directory is  $directory"
        cp $file $COPY_TO
    done    
}

COPY_TO="testDir"
DIFF_FILES=$( git diff --name-only  $1) 
copyfiles $DIFF_FILES 

The script is currently run like:

test.sh <git commit id>
Villosity answered 23/1, 2015 at 12:3 Comment(0)

Use -z to make git diff terminate each path with a NUL byte. For example:

export COPY_TO
git diff -z --name-only | xargs -0 sh -c 'for file; do
    new_file=${file##*/}     # file name without the directory part
    directory=${file%/*}     # directory part (equals $file if there is no slash)
    echo "Full path is $file"
    echo "File is $new_file"
    echo "Directory is $directory"
    cp "$file" "$COPY_TO"
done' sh

Note that the more reasonable solution is to refuse pull requests from people who create files with whitespace in the name.
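
The trailing sh in the xargs command above is easy to overlook. With sh -c, the first argument after the script text becomes $0, so without a placeholder the first file name would be swallowed. A minimal sketch of that behaviour, using printf '%s\0' as a stand-in for git diff -z --name-only (the file names are made up for illustration):

```shell
#!/usr/bin/env bash
# Simulated NUL-terminated file names; a stand-in for `git diff -z --name-only`.
with_placeholder=$(printf '%s\0' 'a b.txt' 'c.txt' |
    xargs -0 sh -c 'for file; do printf "[%s]" "$file"; done' sh)

# Without the trailing "sh", the first file name is consumed as $0 and skipped.
without_placeholder=$(printf '%s\0' 'a b.txt' 'c.txt' |
    xargs -0 sh -c 'for file; do printf "[%s]" "$file"; done')

printf 'with placeholder:    %s\n' "$with_placeholder"
printf 'without placeholder: %s\n' "$without_placeholder"
```

Any word works as the placeholder; sh is conventional because $0 is what the shell reports as its own name in error messages.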

Implacental answered 23/1, 2015 at 12:23 Comment(9)
Love the reasonable solution; I can't understand why source files end up with whitespace! - Villosity
Can I ask why just adding "-z" to my current script didn't work? - Villosity
I can't work out how to use this in a way that lets me run a function on each file. - Villosity
If you are using bash, you can export the function with export -f and then run xargs -0 -I {} bash -c 'function_name "$1"' bash {}. This invokes the function once for each file rather than passing multiple filenames. I highly advise against this though, because exporting functions is wanky. Put it in a shell script instead. - Implacental
Easier to just do xargs -0 bash -c 'function_name "$@"' bash, though. That will invoke the function with multiple arguments. The important thing is export -f function_name. - Implacental
If I send it as multiple arguments, would I not have the same problem with the spaces? When you say put it in a shell script, do you mean a separate one that I invoke? At the moment I just copied and pasted the function contents into the ' ' and it does work; it just feels a touch untidy :p - Villosity
Using multiple arguments is only a problem if your function uses for file in $@ instead of for file; do or for file in "$@"; do. - Implacental
Oh, how strange. Is it always best to use "do", then? - Villosity
The "do" is required syntax for the for loop. The important distinction is the double quotes around $@. I mention the "do" in the second example to clarify that for file; do is identical to for file in "$@"; do, which behaves very differently from for file in $@; do. - Implacental
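
The difference between $@ and "$@" discussed in these comments can be sketched as follows. The function names count_unquoted, count_quoted and show_file are hypothetical, and printf '%s\0' again stands in for git diff -z --name-only:

```shell
#!/usr/bin/env bash
# Unquoted $@ is word-split again, so "spa ce.txt" becomes two words.
count_unquoted() {
    local n=0
    for file in $@; do n=$((n + 1)); done
    echo "$n"
}

# Quoted "$@" keeps every argument intact.
count_quoted() {
    local n=0
    for file in "$@"; do n=$((n + 1)); done
    echo "$n"
}

unquoted=$(count_unquoted 'spa ce.txt' 'plain.txt')
quoted=$(count_quoted 'spa ce.txt' 'plain.txt')
printf 'unquoted: %s words, quoted: %s words\n' "$unquoted" "$quoted"

# An exported function called through xargs, as suggested in the comments.
# `for file; do` is shorthand for `for file in "$@"; do`.
show_file() { for file; do printf '<%s>' "$file"; done; }
export -f show_file
via_xargs=$(printf '%s\0' 'spa ce.txt' 'plain.txt' |
    xargs -0 bash -c 'show_file "$@"' bash)
printf '%s\n' "$via_xargs"
```

Note that export -f is a bash feature, so the child process started by xargs has to be bash as well.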
git diff -z --name-only |
while IFS= read -r -d '' file
do
    printf '%s\n' "$file"
done
Sylph answered 2/2, 2018 at 12:44 Comment(1)
You have to be careful: read -d is not POSIX; it works in bash but not in plain sh. - Hatchet

The output from --name-only is subject to a certain amount of escaping. Unfortunately it is awkward to work with.

git diff explains the escaping (and an alternative) under the -z option:

-z

When --raw, --numstat, --name-only or --name-status has been given, do not munge pathnames and use NULs as output field terminators.

Without this option, each pathname output will have TAB, LF, double quotes, and backslash characters replaced with \t, \n, \", and \\, respectively, and the pathname will be enclosed in double quotes if any of those replacements occurred.

An example:

$ git init ugh
$ cd ugh
$ touch 'spa ce' $'new\nline' $'t\tab'
$ ls # Unhelpful really
new?line  spa ce  t?ab
$ ls --quote # Minorly helpful but wrong (for shell usage)
"new\nline"  "spa ce"  "t\tab"
$ git add -A
$ git diff --cached --name-only
"new\nline"
spa ce
"t\tab"
$ git diff --cached --name-only -z # Doesn't copy and paste well and is a bit confusing to read this way
new
line^@spa ce^@t ab^@
$ printf %q\\n "$(git diff --cached --name-only -z )"
$'new\nlinespa cet\tab'

Anyway, the point here is that the best way to do this is to use the -z output and read the list of files with read.

while IFS= read -r -d '' file; do
    printf 'file = %q\n' "$file"
done < <(git diff --cached --name-only -z)

You could also pipe the output from git diff into the while loop, but a pipe runs the loop in a subshell. If you need variables that were set inside the loop after the loop is done, use this process substitution method instead.
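
A minimal sketch of that subshell problem, assuming bash with its default settings (the lastpipe option off). The helper list_files is hypothetical and only simulates the NUL-terminated output of git diff --cached --name-only -z:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `git diff --cached --name-only -z`.
list_files() { printf '%s\0' 'spa ce' $'new\nline' $'t\tab'; }

# Pipe: the loop body runs in a subshell, so the increments are lost.
count_pipe=0
list_files | while IFS= read -r -d '' file; do
    count_pipe=$((count_pipe + 1))
done

# Process substitution: the loop runs in the current shell.
count_procsub=0
while IFS= read -r -d '' file; do
    count_procsub=$((count_procsub + 1))
done < <(list_files)

printf 'pipe: %d, process substitution: %d\n' "$count_pipe" "$count_procsub"
```

Under those assumptions the pipe version leaves the counter at 0, while the process substitution version counts all three names.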

Homocentric answered 23/1, 2015 at 12:26 Comment(1)
This answer was super helpful. Thank you. It allowed me to pass the output from git diff --name-only -z as the input to git diff/git difftool. I demo that here: https://mcmap.net/q/450141/-how-to-cope-with-spaces-in-file-names-when-iterating-results-from-git-diff-name-only - Birch

Thanks @Etan Reisner for your answer. Here's an example showing how to use the output of git diff --name-only -z "$merge_base" $BACKUP_BRANCH to build a list of escaped filenames that is then passed to git diff or git difftool. It requires an extra --, so see the code below.

I was able to fix my git changes program with it, so now it can handle filenames in a git repo which have spaces or special chars (such as ') in the filenames. Now, the program looks like this:

Usage:

Usage: git changes <common_base> <backup_branch> [any other args to pass to git difftool]

git-changes.sh:

Notice especially the filling of the files_changed_escaped variable, which was directly learned from @Etan Reisner's answer.

COMMON_BASE_BRANCH="$1"
BACKUP_BRANCH="$2"
# Obtain all but the first args; see:
# https://mcmap.net/q/63808/-process-all-arguments-except-the-first-one-in-a-bash-script/9057392#9057392
ARGS_3_AND_LATER="${@:3}"

merge_base="$(git merge-base $BACKUP_BRANCH $COMMON_BASE_BRANCH)"
files_changed="$(git diff --name-only "$merge_base" $BACKUP_BRANCH)"

echo "Checking for changes against backup branch \"$BACKUP_BRANCH\""
echo "only in these files which were previously-modified by that backup branch:"
echo "--- files originally changed by the backup branch: ---"
echo "$files_changed"
echo "------------------------------------------------------"
echo "Checking only these files for differences between your backup branch and your current branch."

# Now, escape the filenames so that they can be used even if they have spaces or special characters,
# such as single quotes (') in their filenames!
# See: https://mcmap.net/q/450141/-how-to-cope-with-spaces-in-file-names-when-iterating-results-from-git-diff-name-only/28109890#28109890
files_changed_escaped=""
while IFS= read -r -d '' file; do
    escaped_filename="$(printf "%q" "$file")"
    files_changed_escaped="${files_changed_escaped}    ${escaped_filename}"
done < <(git diff --name-only -z "$merge_base" $BACKUP_BRANCH)

# DEBUG PRINTS. COMMENT OUT WHEN DONE DEBUGGING.
echo "$files_changed_escaped"
echo "----------"
# print withOUT quotes to see if that changes things; ans: indeed, it does: this removes extra 
# spaces and I think will replace each true newline char (\n) with a single space as well 
echo $files_changed_escaped 
echo "=========="

# NB: the `--` is REQUIRED before listing all of the files to search in, or else escaped files
# that have a dash (-) in their filename confuse the `git diff` parser and the parser thinks they
# are options! It will output this error:
#       fatal: option '-\' must come before non-option arguments
# Putting the list of all escaped filenames to check AFTER the `--` forces the parser to know
# they cannot be options, because the `--` with nothing after it signifies the end of all optional
# args.
git difftool $ARGS_3_AND_LATER $BACKUP_BRANCH -- $files_changed_escaped
echo "Done."

You can download the git changes program as part of my dotfiles project here: https://github.com/ElectricRCAircraftGuy/eRCaGuy_dotfiles.

It also contains such things as git diffn, which is git diff with line numbers.
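
One alternative to building a single escaped string is a bash array. Unquoted expansion of a string only word-splits it and does not re-interpret printf %q escapes, whereas "${array[@]}" expands to exactly one argument per element. A minimal sketch, in which list_changed and show_args are hypothetical helpers and list_changed merely simulates git diff --name-only -z "$merge_base" "$BACKUP_BRANCH":

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for:
#   git diff --name-only -z "$merge_base" "$BACKUP_BRANCH"
list_changed() { printf '%s\0' 'spa ce.txt' "it's.txt" '-dash.txt'; }

# Collect the NUL-terminated names into an array, one element per file.
files_changed=()
while IFS= read -r -d '' file; do
    files_changed+=("$file")
done < <(list_changed)

# Each element arrives as exactly one argument, whatever characters it holds.
show_args() { printf '%d args:' "$#"; printf ' <%s>' "$@"; printf '\n'; }
summary=$(show_args "${files_changed[@]}")
printf '%s\n' "$summary"

# With real data the final call might then look like this (assumption):
#   git difftool "$BACKUP_BRANCH" -- "${files_changed[@]}"
```

The -- is still needed in the real call so that a name such as -dash.txt is not parsed as an option.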

Birch answered 11/7, 2020 at 19:43 Comment(0)

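I think your code needs IFS=$'\n', so that the command output is split on newlines only:

# This setting is the important part. Names that contain newlines or
# glob characters will still break this loop.
IFS=$'\n'
for file_change in $(git diff --name-only "$1")
do
    echo "Put $file_change ..."

    # File name
    fileName=$(basename "$file_change")
    echo "$fileName"

    # Directory
    dir=$(dirname "$file_change")
    echo "$dir"

    # Copy the file; the quotes keep a name with spaces as one argument
    cp "$file_change" "$REMOTE_DIR$file_change"
done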
Thumbsdown answered 16/6, 2021 at 22:40 Comment(0)
