It seems that the clean filter is applied before diff. Is that correct?
Yes. In at least some cases it must be. Consider, for instance, what happens if the smudge filter consists of, say, "double every character" and the clean filter consists of "remove the doubling"—or, if that seems too peculiar, if the smudge filter consists of "translate into some alternate character set" and the clean filter translates back.
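As a minimal sketch of that "double every character" pair, the two directions can be written as sed one-liners. The filter name `double` and the sed commands are illustrative assumptions, not anything Git ships with:

```shell
# Hypothetical filter pair; the function names and sed commands are illustrative only.
smudge() { sed 's/./&&/g'; }          # double every character on each line
clean()  { sed 's/\(.\)\1/\1/g'; }    # collapse each doubled pair back to one

committed='hello'
worktree=$(printf '%s\n' "$committed" | smudge)
roundtrip=$(printf '%s\n' "$worktree" | clean)
printf 'committed: %s\nwork-tree: %s\ncleaned:   %s\n' "$committed" "$worktree" "$roundtrip"

# In a real repository these would be wired up roughly like so:
#   git config filter.double.smudge "sed 's/./&&/g'"
#   git config filter.double.clean  "sed 's/\(.\)\1/\1/g'"
#   echo '*.txt filter=double' >> .gitattributes
```

Comparing the committed text directly against the work-tree text would show every line as changed, which is why a diff has to bring both sides into the same form first.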
A git diff to compare the work-tree against an actual commit must either run the smudge filter on the commit's content, or run the clean filter on the work-tree's content. Or it might even run both, with the output going to temporary files. (I'm pretty sure I tested this once, long ago, and found that the approach Git used was to run the clean filter, rather than the smudge filter. But see Cyker's comment, which suggests it runs both filters and then diffs smudged results.)
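One way to check this on your own Git version is to install filters that log each invocation and then look at the log after a diff. This is only a sketch: the filter name `probe`, the file names, and the log location are all made up for the experiment, and the result may differ between Git versions.

```shell
set -e
repo=$(mktemp -d)
log="${repo}.filter.log"              # kept outside the repository
cd "$repo"
git init -q .

# Hypothetical pass-through filters that record each time they are run.
git config filter.probe.clean  "echo clean  >> '$log'; cat"
git config filter.probe.smudge "echo smudge >> '$log'; cat"
echo '*.txt filter=probe' > .gitattributes

echo one > file.txt
git add .gitattributes file.txt
git -c user.name=probe -c user.email=probe@example.invalid commit -q -m initial

echo two >> file.txt                  # modify the work-tree copy
: > "$log"                            # discard entries logged by add/commit
git diff HEAD > /dev/null
sort "$log" | uniq -c                 # which filter(s) the diff ran, and how often
```

Whatever ends up in the log is what your Git actually did for that diff, which settles the question for your version without relying on anyone's memory.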
Is it possible to disable this? Would it be a good idea?
See above—at best you might have a "run only the smudge filter" option (but there is none).
Note that what's in the index is already clean, by definition. Cleaning happens on the transition from work-tree to index; smudging happens on the transition from index to work-tree.
Existing commits are strictly read-only and extracting a commit into the index makes no changes. Hence, while the index contents are clean by definition, if the clean filter itself has changed, they may not match what you would get by re-running the filter.
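The mismatch is easy to reproduce by staging a file under one clean filter and then changing the filter. In the sketch below the filter name `fmt` and the `tr` command are stand-ins for a real formatter, and the file is staged with git add rather than extracted from a commit, but the effect on the index is the same. The `--renormalize` option of git add (available in reasonably recent Git versions) re-runs the clean filter on tracked files:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo '*.txt filter=fmt' > .gitattributes
git config filter.fmt.smudge cat
git config filter.fmt.clean cat               # "old" clean filter: pass-through

echo 'hello' > file.txt
git add file.txt                              # index copy made by the old filter

git config filter.fmt.clean "tr a-z A-Z"      # "new" clean filter (illustrative)
before=$(git cat-file -p :file.txt)           # index still holds the old result
git add --renormalize file.txt                # re-run the clean filter
after=$(git cat-file -p :file.txt)

echo "before: $before"
echo "after:  $after"
```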
I would like a solution where autoformat is applied to the staging area, that is only the hunks that were staged. Isn't clean filter an appropriate solution?
This does not work the way you are thinking.
Running git add does not apply diff hunks to the index copy: running git add copies the entire work-tree file into the index. The whole thing gets cleaned.
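To see that the whole file is cleaned at git add time, stage a file under a visible clean filter and compare the index copy with the work-tree copy. Here `tr a-z A-Z` stands in for an autoformatter, and the filter name `fmt` is an assumption made for this sketch:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config filter.fmt.clean "tr a-z A-Z"      # stand-in for an autoformatter
git config filter.fmt.smudge cat
echo '*.txt filter=fmt' > .gitattributes

printf 'first line\nsecond line\n' > file.txt
git add file.txt

staged=$(git cat-file -p :file.txt)           # raw index copy, no smudging
printf 'work-tree:\n%s\nindex:\n%s\n' "$(cat file.txt)" "$staged"
```

Every line of the index copy has gone through the filter, not just the lines that differ from an earlier version.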
Running git add -p also does not actually apply diff hunks to the index copy, because it literally can't. Instead, git add -p extracts the index copy to a temporary file, applies a diff hunk to the temporary file, and then copies the entire temporary file (with applied hunk) into the index, running that through the clean filter. Once again the whole thing gets cleaned; it's just that "the whole thing" is a temporary file built by patching the smudged index copy.
In other words, the index copy of each file is an entity unto itself, independent of the HEAD commit copy and the work-tree copy. Git starts out, at git checkout time, by just copying the commit copy of the file directly into the index (no changes, no filters), then copies the index copy of the file into the work-tree (smudge filter). At git add time, Git runs the clean filter on the work-tree file (or the patched result) and stuffs that into the index.[1]
[1] Technically, the index holds not the files themselves, but rather their content hashes. Adding a file consists of writing the file into the repository! The hash ID of the resulting blob object goes into the index. The index entry keeps the blob from being garbage-collected, if the index is the only place the blob is used (if the blob matches some committed blob then it's safe from the Grim Collector).
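The footnote can be checked directly: after git add, the index entry lists a blob hash, and that blob already exists in the repository even though nothing has been committed. A small sketch, with a throwaway repository and an arbitrary file name:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo 'some content' > file.txt
git add file.txt

# ls-files -s prints: <mode> <object hash> <stage> <path>
index_hash=$(git ls-files -s file.txt | awk '{print $2}')
blob_hash=$(git hash-object file.txt)         # hash Git computes for this content

echo "index entry: $index_hash"
echo "blob hash:   $blob_hash"
git cat-file -t "$index_hash"                 # the object is already stored
```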
git diff runs clean not smudge filter? It seems both filters are run, but the diff result is made using smudge filter. Git version 2.24.1 (latest). – Uraeus