Print percentage of dissimilarity

Sometimes when you drastically change a file, it triggers a rewrite:

yes | head -256 > pa.txt
git add .
git commit -m qu
truncate -s128 pa.txt
yes n | head -64 >> pa.txt
git commit -am ro

Result:

[master 79b5658] ro
 1 file changed, 128 insertions(+), 256 deletions(-)
 rewrite pa.txt (75%)

However, this does not happen with smaller changes:

yes | head -128 > pa.txt
git add .
git commit -m qu
truncate -s64 pa.txt
yes n | head -32 >> pa.txt
git commit -am ro

Result:

[master 88ef937] ro
 1 file changed, 32 insertions(+), 96 deletions(-)

Can I run a command that will show the percent change no matter how small the change is? I looked into git diff-tree, but it too seems to show a percentage only when the change is drastic.

Edwinedwina answered 6/1, 2016 at 5:4 Comment(4)
git diff --numstat <commit1> <commit2> will show you the number of lines added and removed for each file modified between commit1 and commit2. However, the 75% you see above is Git's dissimilarity index, which measures the percentage of lines changed in the original file. This is a slightly different metric from what git diff --numstat will show you. - Hyson
Maybe git diff -B1 (lowering the default threshold of 50). - Trotta
I have gotten a pretty low dissimilarity index with git -c "core.pager=less -SFR" diff -B1%/1% - Trotta
I used your last example, but with truncate -s462 pa.txt. Then git diff -B1%/1% @~ @ | grep diss gives me dissimilarity index 10%. I use git 2.6.4 (I will check whether that still works with git 2.7, released yesterday). - Trotta
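git diff -U10000 | awk '
# skip the "index" line and the "---" line that follows it
/^i/ {getline; next}
# deleted lines
/^-/ {pa += length}
# unchanged (context) lines
/^ / {qu += length}
END {printf "%.0f%%\n", pa/(pa+qu)*100}
'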
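  1. Force full context with -U10000, so every unchanged line of the file appears in the diff

  2. Filter out the --- header line, so it is not counted as a deletion

  3. Keep only deletions and context lines, which together make up the original file

  4. Count bytes for each

  5. Divide the deleted bytes by the total of deleted and context bytes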
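The steps above can also be wrapped in a small function that reads a unified diff on stdin. This is only a sketch, not a Git feature: the name percent_removed is made up for illustration, and it assumes single-byte text, where the leading marker character of each diff line stands in for the stripped newline. It tracks hunk headers instead of the index line, and prints 0% for an empty diff rather than dividing by zero.

```shell
#!/bin/sh
# percent_removed (hypothetical helper, not a Git command):
# read a unified diff on stdin and print how much of the old
# file sits on deleted lines, as a rounded percentage.
percent_removed() {
  awk '
    # a new file header ends the previous hunk
    /^diff / { in_hunk = 0; next }
    # hunk header: everything after it is diff content
    /^@@/    { in_hunk = 1; next }
    # ignore index, ---, and +++ header lines
    !in_hunk { next }
    # deleted lines; the "-" marker stands in for the newline
    /^-/ { removed += length($0) }
    # context lines; the " " marker stands in for the newline
    /^ / { kept += length($0) }
    END {
      if (removed + kept == 0)
        print "0%"
      else
        printf "%.0f%%\n", removed / (removed + kept) * 100
    }
  '
}
```

It would be used as git diff -U10000 @~ @ | percent_removed. With several files in one diff, the result is a single percentage over all of them.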

Edwinedwina answered 13/1, 2016 at 3:57 Comment(1)
That looks more precise than my answer. +1 - Trotta

With the latest git:

> git --version
git version 2.7.0.windows.1

I use:

git init dissimilarity
cd dissimilarity
yes aaa | head -128 > pa.txt
# the file is untracked in a fresh repository, so add it before the first commit
git add pa.txt
git commit -m qu
# <remove a few lines>
yes n | head -32 >> pa.txt
git commit -am ro

Then a git diff -B1%/1% gives me:

> git diff -B1%/1% @~|grep diss
dissimilarity index 14%

I then made an even smaller change by manually editing pa.txt, removing a few lines and adding a new one:

> git diff @~
diff --git a/pa.txt b/pa.txt
index 7f9bf77..bf32d0b 100644
--- a/pa.txt
+++ b/pa.txt
@@ -107,13 +107,7 @@ aaa
 aaa
 aaa
 aaa
-n
-n
-n
-n
-n
-n
-n
+sss
 n
 n
 n

And even then, I still see a dissimilarity index:

> git diff -B1%/1% @~|grep diss
dissimilarity index 2%

2%!
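The comments below report that this stops working once the files get small, and point at a constant in diffcore.h. As a rough pre-check, and assuming that constant is a 400-byte floor applied to the larger of the old and new file sizes, something like the following tells you whether -B can report anything at all. The function name eligible_for_break is made up for illustration, and the value 400 is an assumption taken from that reading of the Git source, not a documented guarantee.

```shell
#!/bin/sh
# Assumption: 400 mirrors MINIMUM_BREAK_SIZE in Git's diffcore.h;
# check the source of your Git version before relying on it.
MIN_BREAK_SIZE=400

# eligible_for_break OLD_SIZE NEW_SIZE
# (hypothetical helper, not a Git command)
# Succeeds when the larger of the two sizes, in bytes, reaches the floor.
eligible_for_break() {
  larger=$1
  if [ "$2" -gt "$larger" ]; then
    larger=$2
  fi
  [ "$larger" -ge "$MIN_BREAK_SIZE" ]
}
```

The two sizes can be read with git cat-file -s, for example eligible_for_break "$(git cat-file -s @~:pa.txt)" "$(git cat-file -s @:pa.txt)".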

Trotta answered 6/1, 2016 at 13:41 Comment(4)
This appears to break if the file is smaller than 400 bytes. For example, truncate -s400 pa.txt followed by truncate -s0 pa.txt works, but 399 or less will fail. Possibly related: github.com/git/git/blob/7548842/diffcore.h#L23 - Edwinedwina
@SarahManning I agree. Still: 2%! ;) - Trotta
This answer is helpful, but it is not a comprehensive solution, as it does not work in all cases. It appears that the problem in my original question was not the percentage, but that the number of bytes involved was below the threshold. - Edwinedwina
@SarahManning I agree, but come on... 2%! Just kidding. I don't think there is a "comprehensive" solution out of the box. Mine tries to illustrate a git-native solution. - Trotta
