The term "hunk" is indeed not specific to Git, and comes from the GNU diffutils format. Even more succinctly:
Each hunk shows one area where the files differ.
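In the unified format that Git emits, each hunk opens with a header such as "@@ -230,6 +230,9 @@". It gives the old file's start line and line count, then the new file's, and may be followed by a nearby line Git shows for context. The Python sketch below parses such a header. The function name parse_hunk_header is hypothetical, and the sketch assumes the diffutils convention that an omitted count means a one-line range.

```python
import re

# Unified-format hunk header: "@@ -old_start[,old_count] +new_start[,new_count] @@ [context]"
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")


def parse_hunk_header(line: str) -> dict:
    """Split a hunk header into its old/new line ranges and trailing context text."""
    m = HUNK_HEADER.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count, section = m.groups()
    return {
        "old_start": int(old_start),
        # An omitted count means the range covers a single line.
        "old_count": int(old_count) if old_count is not None else 1,
        "new_start": int(new_start),
        "new_count": int(new_count) if new_count is not None else 1,
        "section": section,
    }


header = "@@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {"
print(parse_hunk_header(header))
```

For that header the hunk spans six old lines and nine new ones; the difference is the three added lines.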
But the challenge for Git is to determine the right boundaries for a hunk.
The rest of the answer helps illustrate what a hunk looks like in Git:
After trying various heuristics (like the compaction one, which is gone as of Git 2.12), the Git maintainers settled on the indent heuristic, introduced in Sept. 2016 with Git 2.11 (commit 433860f).
Some groups of added/deleted lines in diffs can be slid up or down, because lines at the edges of the group are not unique. Picking good shifts for such groups is not a matter of correctness but definitely has a big effect on aesthetics.
For example, consider the following two diffs.
The first is what standard Git emits:
--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
}
if (!$smtp_server) {
+ $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
$smtp_server = $_;
The following diff is equivalent, but is obviously preferable from an aesthetic point of view:
--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
$initial_reply_to =~ s/(^\s+|\s+$)//g;
}
+if (!$smtp_server) {
+ $smtp_server = $repo->config('sendemail.smtpserver');
+}
if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
This patch teaches Git to pick better positions for such "diff sliders" using heuristics that take the positions of nearby blank lines and the indentation of nearby lines into account.
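The sliding can be made concrete with a small Python sketch. It rebuilds the relevant lines of the new side of the example (assuming the Perl source is tab-indented), enumerates every placement the three added lines can slide to while still describing the same change, and picks one with a deliberately simplified score: the total indentation of the lines on either side of each hunk boundary. The helper names and this toy score are illustrative assumptions; Git's real indent heuristic in xdiff weighs blank lines and indentation in a more elaborate way.

```python
# Hypothetical illustration of "diff sliders"; not Git's actual xdiff code.

# New side of the example hunk, trimmed to the lines that matter.
# Assumption: the Perl source is indented with tabs.
NEW_LINES = [
    "\t$initial_reply_to =~ s/(^\\s+|\\s+$)//g;",
    "}",
    "if (!$smtp_server) {",
    "\t$smtp_server = $repo->config('sendemail.smtpserver');",
    "}",
    "if (!$smtp_server) {",
    "\tforeach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {",
]
# Old side: the same lines without the three added ones.
OLD_LINES = NEW_LINES[:2] + NEW_LINES[5:]


def indent(line: str) -> int:
    """Width of the leading whitespace, expanding tabs to 8 columns."""
    expanded = line.expandtabs(8)
    return len(expanded) - len(expanded.lstrip())


def slider_positions(new, start, end):
    """Every (start, end) placement of the added group new[start:end]
    reachable by sliding it up or down one line at a time."""
    positions = [(start, end)]
    # Slide up while the line just above equals the group's last line.
    s, e = start, end
    while s > 0 and new[s - 1] == new[e - 1]:
        s, e = s - 1, e - 1
        positions.insert(0, (s, e))
    # Slide down while the line just below equals the group's first line.
    s, e = start, end
    while e < len(new) and new[e] == new[s]:
        s, e = s + 1, e + 1
        positions.append((s, e))
    return positions


def toy_score(new, start, end):
    """Hypothetical score, lower is better: total indentation of the lines
    on both sides of the two hunk boundaries (missing lines count as 0)."""
    def at(i):
        return indent(new[i]) if 0 <= i < len(new) else 0
    return at(start - 1) + at(start) + at(end - 1) + at(end)


# Start from the placement in the first diff: added lines NEW_LINES[3:6].
candidates = slider_positions(NEW_LINES, 3, 6)
for s, e in candidates:
    # Removing the group at any placement gives back the old file.
    assert NEW_LINES[:s] + NEW_LINES[e:] == OLD_LINES
best = min(candidates, key=lambda p: toy_score(NEW_LINES, *p))
print(candidates)  # [(1, 4), (2, 5), (3, 6)]
print(best)        # (2, 5)
```

The first diff above corresponds to placement (3, 6) and the preferred one to (2, 5); the toy score favors (2, 5) because both of its boundaries fall between unindented, top-level lines.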
With Git 2.14 (Q3 2017), that indent heuristic will be the default!
See commit 1fa8a66 (08 May 2017) by Jeff King (peff).
See commit 33de716 (08 May 2017) by Stefan Beller (stefanbeller).
See commit 37590ce, commit cf5e772 (08 May 2017) by Marc Branchaud.
(Merged by Junio C Hamano -- gitster -- in commit 53083f8, 05 Jun 2017)
diff: enable indent heuristic by default
The feature was included in v2.11 (released 2016-11-29) and we got no negative feedback. Quite the opposite, all feedback we got was positive.
Turn it on by default. Users who dislike the feature can turn it off by setting diff.indentHeuristic.
With Git 2.24 (Q4 2019), the documentation of the "indent heuristic" that decides where to split diff hunks has been corrected.
See commit 64e5e1f (15 Aug 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit e115170, 09 Sep 2019)
diff: 'diff.indentHeuristic' is no longer experimental
The indent heuristic started out as experimental, but it's now our default diff heuristic since 33de716 (diff: enable indent heuristic by default, 2017-05-08, Git v2.14.0-rc0).
Alas, that commit didn't update the documentation, and the description of the 'diff.indentHeuristic' configuration variable still implies that it's experimental and not the default.
Update the description of 'diff.indentHeuristic' to make it clear that it's the default diff heuristic.
The description of the related '--indent-heuristic' option has already been updated in this answer.
The documentation will now read:
diff.indentHeuristic:
Set this option to false to disable the default heuristics that shift diff hunk boundaries to make patches easier to read.
With Git 2.25 (Q1 2020), --indent-heuristic no longer even appears in git blame's usage string, since the heuristic has been the default for quite some time now.
See commit 44ae131 (28 Oct 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit 532d983, 01 Dec 2019)
builtin/blame.c: remove '--indent-heuristic' from usage string
Signed-off-by: SZEDER Gábor
The indent heuristic is our default diff heuristic since 33de716387 ("diff: enable indent heuristic by default", 2017-05-08, Git v2.14.0-rc0 -- merge listed in batch #7), but the usage string of 'git blame' still mentions it as "experimental heuristic".
We could simply update the short help associated with the option, but according to the comment above the option's declaration it was "only included here to get included in the "-h" output".
That made sense while the feature was still experimental and we wanted to give it more exposure, but nowadays it's unnecessary.
So let's rather remove the '--indent-heuristic' option from 'git blame''s usage string.
Note that 'git blame' will still accept this option, as it is parsed in parse_revision_opt().
Astute readers may notice that this patch removes a comment mentioning "the following two options", but it only removes one option.
The reason is that the comment is outdated: that other option was '--compaction-heuristic', and it has already been removed in 3cde4e02ee (diff: retire "compaction" heuristics, 2016-12-23), but that commit forgot to update this comment.