git smudge/clean filter between branches

Asked 7/4, 2014 at 10:20 Answered 3/5, 2021 at 8:2

There are many related questions involving smudge/clean filters - I have spent some hours reading them, and trying various options, but still failing. I hope I can ask in a way that I get an answer that works for me.

Specifically, I have read the page most of these answers link back to:

Customizing Git - Git Attributes

tl;dr

It's a detailed question, but the summary is:

Can I store DEBUG = false in a file on one branch, and DEBUG = true in another branch, using smudge/clean filters to manage that file? And how?

Background

I have various remote repos hosted at bitbucket. I am using SourceTree on Win8, to clone the remote repos to my laptop. I create different branches for development, features, releases etc (following A successful Git branching model for better or worse).

I have an Android java class called Dbug.java that contains a boolean which turns on/off various debug logging, mocking etc features in my code.

public static final boolean DEBUG = false;

I would like this value to be false on my "production" (master) branch, and to be true on my feature branches.

Is this possible using filters, or have I already misunderstood the use case?
I am unsure if filters work like this between 2 branches of the same locally hosted repo, or if the filters only work between 2 repos.

Creating the filters

Working locally, I checked out the production branch. I created a test file called debug_flag.txt with the following contents:

// false on production branch
// true on other branches
DEBUG = false;

I created a file in the root of my local repo called .gitattributes and added the filter reference to it:

debug_flag.txt filter=debug_on_off

I updated the .git/config file with the filter definition:

[filter "debug_on_off"]
    clean = sed -e 's/DEBUG = true/DEBUG = false/'
    smudge = sed -e 's/DEBUG = false/DEBUG = true/'

In my understanding, this should ensure that my file always has a false value in production, but will have a true value when I branch from production.
Is this a correct understanding?

Testing the filters

I created a new branch test using:

git checkout -b test

I checked the contents of my file:

$ cat debug_flag.txt

// false on production branch
// true on other branches
DEBUG = false;

I expected to see the value true in the file
Shouldn't the "smudge" filter have run when I checked out the file?

I added a new line to the file, and committed. I then switched back to the production branch, and this is where things get weird.

If I look at the file in SourceTree, there are no changes on this branch since it was created. That is what I would expect, since the only change was made on a different branch.

If I look at the file in the terminal, or Notepad++, I see my value has changed:

$ cat debug_flag.txt

// false on production branch
// true on other branches
DEBUG = true;

I have not yet merged the change across from the test branch, I have not made a commit on the production branch, yet the file has changed.

It looks like the smudge filter was run on the file within this branch, but not across branches.

I'm missing a vital piece of the puzzle, and hopefully it is something simple that can be spotted by someone with experience doing this.

My bet is this is a simple misunderstanding of the concept.

Pls prompt for any missing info...

Update based on VonC's reply

Setting up the basic filters worked quite well. Defined the filter in the config file as:

[filter "debug_on_off"]
    clean = sed -e 's/DEBUG = true/DEBUG = false/'
    smudge = sed -e 's/DEBUG = false/DEBUG = true/'

Creating a new branch fixes false -> true, merging back changes true -> false.

Confining the change to just the production (master) branch required custom scripts that were aware of the branch they are being run from. So the config file became:

[filter "debug_on_off"]
    clean = ./scripts/master_clean.sh
    smudge = ./scripts/master_smudge.sh

master_clean.sh:

#!/bin/sh
branch=$(git rev-parse --symbolic --abbrev-ref HEAD)
if [ "master" = "$branch" ]; then
    sed -e 's/DEBUG = true/DEBUG = false/' $1
else
    cat $1
fi

master_smudge.sh:

#!/bin/sh
branch=$(git rev-parse --symbolic --abbrev-ref HEAD)
if [ "master" = "$branch" ]; then
    sed -e 's/DEBUG = false/DEBUG = true/' $1
else
    cat $1
fi

At this point, I am running into inconsistencies between what SourceTree is seeing, and what is being shown in Notepad++ for the contents of the debug file. SourceTree is showing the changes, but Notepad++ is not.

I am accepting VonC's answer, since it answers the basic question I posed.

However, I will likely be implementing the solution I wrote, since it solves the underlying problem that I am trying to solve, in an easier way (for me): retaining a different config file on separate branches.

Piddle answered 7/4, 2014 at 10:20 Comment(4)

You might want to add quotes around $1 in order to support files with spaces. – Dispossess 7/1, 2016 at 16:25

Also aren't you missing %f in the config file? (not sure whether with quotes, in case you need to then they need to be escaped because git itself interprets them as well when parsing the config file AFAIK) – Dispossess 7/1, 2016 at 17:1

@Dispossess I gave up on this more than a year ago. I now do it manually (and it is a pain of course). But if you have the time and think you have a properly working solution, please feel free to post. From what I found, git doesn't support this feature, and I believe that is "by design". – Piddle 7/1, 2016 at 17:14

I've also given up at that point, the Linus quote swung me to do such things in a build script. – Dispossess 8/1, 2016 at 15:7

I expected to see the value true in the file

You just created a new branch; you did not check out its content (since its content is the same as the branch you were on).

To force the smudge to run, do at the top of the repo:

git checkout HEAD --

I have not yet merged the change across from the test branch, I have not made a commit on the production branch, yet the file has changed.

That is the idea of a content filter driver: it modifies the content, without affecting git status (which still reports the modified file as "unchanged").
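
To make the direction of each filter concrete, here is a small sketch that runs the question's two sed commands by hand, outside of Git. The function names clean_filter and smudge_filter are made up for illustration. Git itself simply pipes the file content through whatever command is configured: smudge when it writes a file to the working tree, clean when it stages the file.

```shell
#!/bin/sh
# Hypothetical wrappers around the two sed commands from the filter definition.
# Git would feed the file content on stdin and read the result from stdout.
clean_filter()  { sed -e 's/DEBUG = true/DEBUG = false/'; }
smudge_filter() { sed -e 's/DEBUG = false/DEBUG = true/'; }

# What lands in the working tree on checkout (smudge):
printf 'DEBUG = false;\n' | smudge_filter   # prints: DEBUG = true;

# What gets staged on "git add" (clean):
printf 'DEBUG = true;\n' | clean_filter     # prints: DEBUG = false;
```

Neither command looks at the branch, so the same substitution is applied wherever the filter is declared.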

To have a smudge act differently per branch, I would recommend calling a script which starts by looking up the name of the current branch.
See an example in my older answer "Best practice - Git + Build automation - Keeping configs separate".

#!/bin/sh
branch=$(git rev-parse --symbolic --abbrev-ref HEAD)
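
A minimal sketch of such a script's logic follows. It is not the script from the linked answer: the function name clean_for_branch is hypothetical, and the branch name is passed in as an argument so the logic can be tried without a repository. A real filter would obtain it with the git rev-parse line above. Git feeds the file content to a filter on standard input and takes the result from standard output, so the function filters stdin instead of opening a file.

```shell
#!/bin/sh
# Sketch of a branch-aware clean step.
# $1 is the branch name (an assumption made for testability; a real filter
# script would call: git rev-parse --symbolic --abbrev-ref HEAD).
clean_for_branch() {
    if [ "$1" = "master" ]; then
        # On master, force the flag off in what gets staged.
        sed -e 's/DEBUG = true/DEBUG = false/'
    else
        # On any other branch, pass the content through untouched.
        cat
    fi
}

printf 'DEBUG = true;\n' | clean_for_branch master          # prints: DEBUG = false;
printf 'DEBUG = true;\n' | clean_for_branch feature/login   # prints: DEBUG = true;
```

A matching smudge step would have the same shape, with the sed substitution chosen for the value wanted in the working tree.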

Gambrell answered 7/4, 2014 at 10:41 Comment(9)

thx, I thought I'd seen all your answers on the topic so far, but I missed this one. So the filters are not branch-related? I think this is my big misunderstanding. Is it that the working copy is always smudged, and the copy inside the repo (which is not the working copy) is always clean? But we don't build off the "clean" version, hence the link to the other answer...? – Piddle 7/4, 2014 at 10:53

@RichardLeMesurier by default, a smudge script isn't aware of the branch. It could be branch related if the .gitattributes declares it in one branch, but there is no .gitattributes in the other branch. But if it is declared in both branches, then you need to add a detection step in your smudge script. – Gambrell 7/4, 2014 at 10:55

Unfortunately I can't get around having 2 different .gitattributes files - chicken & egg situation; and I can't get a SED filter to run on the .gitattributes either. I will move on to try to make the smudge branch related. – Piddle 7/4, 2014 at 15:13

@RichardLeMesurier the goal is not to have two different .gitattributes files. It is to have one smudge script intelligent enough to do the right thing based on its execution environment (like the current branch). – Gambrell 7/4, 2014 at 15:15

I'm going to work on that next. Was just addressing your comment: "It could be branch related if the .gitattributes declares it in one branch, but there is no .gitattributes in the other branch." Maybe that is possible, but not by me, now. – Piddle 7/4, 2014 at 15:19

Was unable to create working smudge & clean filter scripts. Seemed to work in part, however inconsistencies between what SourceTree was showing vs what I saw in Notepad++ threw me off. Updated my experience in the OP if you want to take a look. +1 and accepted your answer anyway, since I believe it addresses what I asked. Thx for help - very interesting learning about the filters. – Piddle 7/4, 2014 at 20:35

@RichardLeMesurier I will take a look at your approach. Thank you for the feedback. – Gambrell 7/4, 2014 at 20:39

Ah, still there - I'm pretty sure I'm missing the basic concepts tho. But thx. If you find anything let me know. I've learnt loads of git today fwiw; going to stand me in good stead when I start my next CI build machine. – Piddle 7/4, 2014 at 20:43

@RichardLeMesurier I will. And don't you worry: I am always "still there" ;) meta.stackexchange.com/q/122976/6309 – Gambrell 7/4, 2014 at 20:45

VonC's advice addresses the exact question I posed, but I was unable to work out the final details (as per my Update to the question). This answer gives the details of how I have done things.

Update

The method below worked for the first merge, but after that it stopped working. I'm leaving it here, since it represents the current state of my investigation.

It seems that the merge drivers are no longer being called.

Also tried various modifications from related questions using exit 0, touch %A, or a custom script merge driver (https://mcmap.net/q/13250/-have-git-select-local-version-on-merge-conflict-on-a-specific-file) instead of true as presented below.

I found a workaround to this that uses a custom merge strategy to solve the underlying problem, which is:

I want to have build files in my build branch always set to have all debug values turned off.
This prevents any accidental releases of the product with mock settings, localhost settings, logging turned on etc.

I have based the following on info from this question: .gitattributes & individual merge strategy for a file

1) Define a custom merge driver in the .git/config file as follows:

[merge "ours"]
    name = "Keep ours merge"
    driver = true

I am not sure if this step is required - but it seems it may be a workaround for a bug on some (older?) systems.

(for details: https://mcmap.net/q/21301/-gitattributes-amp-individual-merge-strategy-for-a-file)

2) Set up a .gitattributes file in the build/production/pristine branch so that the special debug flag uses the above merge strategy.

So using the files in my question, go to the "production" branch, and add the following line to the .gitattributes file:

debug_flag.txt merge=ours

Any time a merge is made back to the "production" branch, git will look for the merge strategy defined as "ours", and will prevent debug_flag.txt from being overwritten.

3) On the other branches, set up your .gitattributes file without that custom merge strategy.

4) The last (but important) step of the config process is to set up the debug_flag.txt file correctly in all branches, and to commit changes to each branch.

You should now have 2 branches, each containing different versions of the .gitattributes & debug_flag.txt files. This ensures that each time you merge, there are conflicts.

Without conflicts, the custom "ours" merge strategy is not called, and the files could get overwritten.

(for details: https://mcmap.net/q/21301/-gitattributes-amp-individual-merge-strategy-for-a-file)

5) Finally merge your new branch back into "production". You will have merge conflicts due to steps 3 & 4. Resolve the conflicts so that the 2 branches keep their differences. Commit the changes.

All future merges between these 2 branches will ignore the debug_flag.txt file seamlessly.
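
For reference, steps 1 and 2 can also be expressed as commands. This is only a sketch: it works in a throwaway repository created with mktemp so nothing real is touched, and it uses git config in place of editing .git/config by hand. In an actual project the git config lines would be run inside the existing repository, and .gitattributes would be committed on the production branch. The caveats in the Update above still apply.

```shell
#!/bin/sh
# Scratch repository, for illustration only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Step 1: register the "ours" merge driver (stored in .git/config).
git config merge.ours.name "Keep ours merge"
git config merge.ours.driver true

# Step 2: route debug_flag.txt through that driver.
echo 'debug_flag.txt merge=ours' >> .gitattributes

# Show what was configured.
git config --get-regexp '^merge\.ours\.'
cat .gitattributes
```

Because the driver lives in .git/config, it is local to one clone; every clone that merges into the production branch needs the same two git config lines.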

This accomplishes the goal of having 2 different config files on different branches, so that you can easily separate debug from production code etc. It seems to be a common use case, with many related questions on this forum, but it still took me a couple of days to get it right.

Piddle answered 7/4, 2014 at 13:37 Comment(3)

Interesting use of a merge driver, more precise than my answer. +1 – Gambrell 7/4, 2014 at 13:38

@Gambrell - thx but credit to the other guys I linked to. But I did want to document it in an easy to understand manner. Am busy working on a solution based on your answer. Looking good so far. – Piddle 7/4, 2014 at 14:34

This technique glitched on me earlier today - will have to test more to see if it's something I did, or an error in my answer. – Piddle 9/4, 2014 at 16:7

Take a look at expandr. It's a script that lets you set up different configs on different branches using smudge/clean. Basically just what you were originally asking for.

Right now the biggest gotcha is just that after switching branches sometimes I need to do a git checkout HEAD -- "$(git rev-parse --show-toplevel)" to get the working directory to be correctly smudged. Other times, however, it seems to work fine; I haven't yet figured out why. I think it might have to do with "merge renormalize" being turned on, causing some problems? I'm not sure. (I have it on.)

The other gotcha is that you must protect each branch's .gitattributes files themselves with merge=ours by putting a line in that says .gitattributes merge=ours and of course turning the driver on for that (as you already mentioned). The gotcha here is that after you create each separate branch, now you must go into each .gitattributes file and modify each one (I recommend now adding a comment like #touched for merge=ours on master, #touched for merge=ours on test branch, etc., so you'll remember why it was there). You must do this because merge=ours will only protect a file from changing into the version of the incoming branch in a merge if that file has been changed after branch creation on both the incoming branch and its parent branch. (Remember git deals with changes, not with files.)

Amphibolous answered 15/9, 2014 at 23:24 Comment(1)

Looks good - have earmarked this for attention when I have some time to try it out. Your extra checkout command might be the final solution to what I was having trouble with. – Piddle 16/9, 2014 at 15:21

The most straightforward solution would be to update your makefile to include a check on which branch is currently checked out. If the branch is in a named list of production branches, define the build argument -DDEBUG=false; otherwise define -DDEBUG=true.

See How to programmatically determine the current checked out Git branch

branch_name=$(git symbolic-ref -q HEAD)
branch_name=${branch_name##refs/heads/}
branch_name=${branch_name:-HEAD}

PROD_BRANCHES := master \
                QA
debug_flag=
ifneq ($(filter $(branch_name),$(PROD_BRANCHES)),)
    debug_flag="-DDEBUG=false"
endif
ifeq ($(debug_flag),)
    debug_flag="-DDEBUG=true"
endif
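
The same check can be sketched as a shell helper for a build script. The function name debug_flag_for_branch is hypothetical, and the sketch assumes what the question asks for: branches in the production list build with DEBUG off, everything else with DEBUG on.

```shell
#!/bin/sh
# Hypothetical helper for a build script: prints the compiler flag for a branch.
# The list of production branches is an assumption; adjust it to your project.
PROD_BRANCHES="master QA"

debug_flag_for_branch() {
    for prod in $PROD_BRANCHES; do
        if [ "$1" = "$prod" ]; then
            printf '%s\n' "-DDEBUG=false"   # production build: debug off
            return 0
        fi
    done
    printf '%s\n' "-DDEBUG=true"            # any other branch: debug on
}

# In a real build script the branch name would come from Git, for example:
#   branch_name=$(git symbolic-ref -q --short HEAD || echo HEAD)
debug_flag_for_branch master          # prints: -DDEBUG=false
debug_flag_for_branch feature/login   # prints: -DDEBUG=true
```

Keeping the decision in the build means no tracked file has to differ between branches.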

Rehabilitation answered 1/6, 2016 at 13:53 Comment(0)

I have a similar use case and was able to resolve it with a filter and a post-checkout hook. The hook is nice because it does the clean+smudge for you and immediately updates the file when switching branches via GitHub Desktop. With this approach, each sandbox/tier has their own configuration per branch.

Detailed write-up at: https://c1.eagen.net/git-smudge-clean.html

Sample repository at: https://github.com/vinnyjames/git-filter-demo.git
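
The linked write-up has the actual details. The following is only a guess at the general shape of such a hook, not a copy of it. It assumes the filtered file is debug_flag.txt from the question and that the hook is installed as .git/hooks/post-checkout. A hook can help because a plain branch switch only rewrites files whose committed content differs between the two branches, so a file that is identical on both is not passed through smudge again.

```shell
#!/bin/sh
# Sketch of a post-checkout hook (hypothetical, not the linked implementation).
# Git passes three arguments: previous HEAD, new HEAD, and a flag that is
# 1 for a branch checkout and 0 for a file checkout.
FILTERED_FILE="debug_flag.txt"   # assumed name, taken from the question

post_checkout() {
    # Only act on branch switches. This also keeps the file checkout
    # below from triggering the work a second time.
    [ "$3" = "1" ] || return 0

    # Removing the working copy makes the next checkout write it again,
    # which passes it through the smudge filter. Uncommitted edits to
    # this file are lost.
    rm -f "$FILTERED_FILE"
    git checkout HEAD -- "$FILTERED_FILE"
}

post_checkout "$@"
```

The hook file must be executable, and like the filter definition it is local to each clone.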

Recuperate answered 3/5, 2021 at 8:2 Comment(0)
