How to substitute text from files in git history?
T

7

52

I've always used a GUI-based Git client (SmartGit) and thus don't have much experience with the Git command line.

However, I now need to replace a string in all .txt files throughout the history (that is, not remove the whole file, just substitute a string). I found the following command:

git filter-branch --tree-filter 'git ls-files -z "*.php" |xargs -0 perl -p -i -e "s#(PASSWORD1|PASSWORD2|PASSWORD3)#xXxXxXxXxXx#g"' -- --all

I tried this and unfortunately noticed that, while the password did get changed, all binary files (images, etc.) got corrupted.

Is there a better way to do this that won't corrupt my binary files?

Thanks.

EDIT:

I got mixed up with something. The actual code that caused binary files to get corrupted was:

$ git filter-branch --tree-filter "find . -type f -exec sed -i -e 's/originalpassword/newpassword/g' {} \;"

Strangely enough, the command at the top actually removed all files containing my password.

Televise answered 5/11, 2010 at 22:33 Comment(3)
Doesn't solve your problem, but this is similar to a question I asked a while back: stackoverflow.com/questions/2225454/…Ostia
Indeed, there are many answers on how to remove files. I need to substitute a string though.Televise
@Ostia Cuadra, please see my edit, I actually used a different script, got mixed up. Maybe it helps you in getting the right command.Televise
B
43

You can avoid touching undesired files by passing -name "pattern" to find.

This works for me:

git filter-branch --tree-filter "find . -name '*.php' -exec sed -i -e \
    's/originalpassword/newpassword/g' {} \;"
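To see why the -name filter protects binaries, the body of the tree-filter can be run by hand in a scratch directory. This is only a sketch: the file names config.php and logo.png are made up for the demo, and it assumes a sed that accepts -i without a suffix (GNU or busybox sed; BSD/macOS sed needs -i '').

```shell
#!/bin/sh
# Scratch directory standing in for one checked-out revision.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# A text file and a fake "binary" file that both contain the string.
printf '<?php $pw = "originalpassword";\n' > config.php
printf '\000\001originalpassword\377' > logo.png
before=$(cksum < logo.png)

# Same find/sed pair as in the tree-filter; -type f skips directories.
find . -name '*.php' -type f -exec sed -i -e 's/originalpassword/newpassword/g' {} \;

after=$(cksum < logo.png)
php_contents=$(cat config.php)
echo "php file now: $php_contents"
```

The checksum of logo.png is the same before and after, because find never hands that file to sed.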
Bautista answered 6/11, 2010 at 17:4 Comment(10)
I tried this, but looking at the git history, all the files remain the same... Do I have to 'rebase' or something (I'm so new) and if so how do I do that?Uxorious
@Uxorious Most likely the regular expression you're using is not matching anything. This command will rewrite the repository history (like a rebase), provided that the expression matches something.Bautista
You were right. Turned out I was searching for .php files when I meant to be searching for .h :P That's what I get for blind-copy-paste haha. Cheers.Uxorious
Your script doesn't work for me (in Cygwin on Windows). However this works: git filter-branch --tree-filter "find . -name '*.php' -type f -exec sed -i -e 's/originalpassword/newpassword/g' {} \;"Lamaism
This saved my @$$! TY @Bautista, short, sweet one-liner for the win.Vivi
In my case it was necessary to add -- --all at the end of the command; otherwise, for example, the tags on my original branch weren't properly copied to the resulting branch.Cystic
how would one place a \n in the newpassword section?Gnome
This takes forever... :\Ration
Beware of line endings on Windows. Even if sed does not match the string, it can still change the line endings of the lines it outputs. That makes the entire file show up as changed without any match.Wirra
git show $oldcommitid still works, also after removing .git/refs/original and using git reflog expire and git gc with various options (like from the BFG info page). The contents of the new commit IDs look good (all the way until initial commit) so the replacement worked, but it doesn't get rid of the old data. One's only hope might be that the old commit IDs are not discoverable after recreating the remote system and pushing to there.Pantechnicon
S
123

I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for rewriting files from Git history.

You should carefully follow these steps here: https://rtyley.github.io/bfg-repo-cleaner/#usage - but the core bit is just this: download the BFG's jar (requires Java 7 or above) and run this command (where my-repo.git is the folder name of the bare clone of your repo):

$ java -jar bfg.jar  --replace-text replacements.txt -fi '*.php'  my-repo.git

The replacements.txt file should contain all the substitutions you want to do, in a format like this (one entry per line - note the comments shouldn't be included):

PASSWORD1 # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass         # replace with 'examplePass' instead
PASSWORD3==>                    # replace with the empty string
regex:password=\w+==>password=  # Replace, using a regex
regex:\r(\n)==>$1               # Replace Windows newlines with Unix newlines

Your entire repository history will be scanned, and .php files (under 1MB in size) will have the substitutions performed: any matching string (that isn't in your latest commit) will be replaced.
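As a rough sketch of the whole workflow (the clone, reflog and gc steps follow the usage page linked above): the rules file is written without the explanatory comments, and the helper bfg_replace_text, its parameter names and the scratch directory are hypothetical names made up for illustration. The function is only defined here, not run.

```shell
#!/bin/sh
# Write the rules file without the explanatory comments.
work_dir=$(mktemp -d)
cat > "$work_dir/replacements.txt" <<'EOF'
PASSWORD1
PASSWORD2==>examplePass
regex:password=\w+==>password=
EOF

# Hypothetical helper: $1 = URL you would pass to git clone,
# $2 = rules file, $3 = path to the downloaded BFG jar.
bfg_replace_text() {
    repo_url=$1
    rules=$2
    bfg_jar=$3
    # my-repo.git is the bare (mirror) clone that BFG rewrites.
    git clone --mirror "$repo_url" my-repo.git &&
        java -jar "$bfg_jar" --replace-text "$rules" -fi '*.php' my-repo.git &&
        (
            cd my-repo.git &&
                git reflog expire --expire=now --all &&
                git gc --prune=now --aggressive
        )
    # Pushing the rewritten history is left as a deliberate manual step.
}
```

You would call it as bfg_replace_text <repo-url> "$work_dir/replacements.txt" bfg.jar, inspect the result, and only then run git push from inside my-repo.git.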

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Salic answered 31/3, 2013 at 14:42 Comment(16)
unbelieveable! BFG is incredible!Phosphorescent
This just helped me tremendously. Thank you for such an awesome project. I donated too. Thank you again.Nothingness
Thanks @Nothingness - really glad it helped, and thanks for supporting the project!Salic
+1 as I just used BFG for fixing an SQL script that the dev team let get out of control. Though you need to highlight the default of under 1MB on your usage page Roberto, it took a long while of head scratching before it was apparent that that was why my text replace wasn't happening.Amortizement
The newlines replacement is a life saver when migrating a mixed Windows/Linux team from Subversion to Git!Agace
I tried using BFG on my repo, but I get "BFG aborting: No refs to update - no dirty commits found??" I use this: ---> java -jar /C/Users/user/bfg-1.12.12.jar --filter-content-including '*.{php,inc,sql,txt,htm,html,js,css,xml}' --replace-text /w/Dev/\!GIT/CVS_ident_replace.txt /w/Dev/\!GIT/repo CVS_ident_replace.txt: regex:(//\s*\$Id:).*\$==>$1\$ # Replace CVS IdentSpacial
@RobertoTyley After doing the steps outlined in rtyley.github.io/bfg-repo-cleaner/#usage, how should we update our local history? I did git pull origin master. Was this a mistake? Should I have done git clone? Because after pushing again I noticed that the replacements made with bfg are gone.Canaster
It'd be great if examples like the above were listed on the BFG website! I had to google this SO question again to find them.Niche
@RobertoTyley Is there any way to do this only for files under a certain path of the repo? I have several files with the same extension, but I want to do the cleanup only on the ones under a specific directory.Convertible
Why is this not part of the official BFG docs? I'd send a PR but after a cursory look at the repo it seems there are many open PRs which sought to improve the project's documentation/README, which makes me feel there's no point.Niche
Wow, I should actually have said "Why is this STILL not part of the official BFG docs?"... Just realised I'd already commented in the same vein almost 4 (!) years ago, lol(sob).Niche
BFG didn't work. It found the text, but the text is still present in the revisions. I also tried to delete the file, and nothing...Ration
Just stumbled upon this, great project! Does exactly what I want it to do in the shortest amount of time, thanks mateKaylakayle
What is the -fi option? I couldn't find any documentation on it.Palomino
-fi means --filter-content-including, related help request: github.com/rtyley/bfg-repo-cleaner/issues/117Rabjohn
What is my-repo.git? Is it the repo's url that is used in git clone command?Spiegelman
B
22

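With Git 2.24 (Q4 2019), git filter-branch is effectively deprecated: its own documentation now warns against using it and recommends git filter-repo instead (which is also meant to replace BFG).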

newren/git-filter-repo does NOT do what you want.
It has an example that is ALMOST what you want in its example section:

cd repo
git filter-repo --path-glob '*.txt' --replace-text expressions.txt

with expressions.txt:

literal:originalpassword==>newpassword

However, WARNING: As Hasturkun adds in the comments

Using --path-glob (or --path) causes git filter-repo to only keep files matching those specifications.
The functionality to only replace text in specific files is available in bfg-ish as -fi, or the lint-history script.
Otherwise, it looks like this is only currently possible with a custom commit callback.
See newren/git-filter-repo issue 74

Which makes sense, considering the --replace-text option is itself a blob callback.


Q1 2024, newren/git-filter-repo issue 74 proposes (from Daniil):

Solution

git filter-branch --tree-filter "find . -path './src/*' -regextype egrep -regex '.*\.(hpp|cpp)' -exec perl -0777 -pe 's{\n\n\n+}{\n\n}g' -i {} \;" <branch/HEAD/hash..HEAD>

This replaced every run of more than one blank line with a single blank line.

Battology answered 5/10, 2019 at 20:40 Comment(6)
This wasn't working, so I went through the documentation. You have a small typo. Inside expressions.txt it should be literal:originalpassword==>newpasswordMongrelize
@KausUntwale Thank you. I have edited the answer accordingly. Don't hesitate to edit it if you see anything else.Battology
I tried this on a repo; the result was a repo with a single commit, containing only the file mentioned in --path-glob. I expected that the many commits in my repo would still be there and that files not matched by the glob would be untouched.Valparaiso
@Valparaiso It should have worked the way you expected. Not sure what went wrong there.Battology
Using --path-glob (or --path) causes git filter-repo to only keep files matching those specifications. The functionality to only replace text in specific files is available in bfg-ish as -fi, or the lint-history script. Otherwise, it looks like this is only currently possible with a custom commit callback. See also github.com/newren/git-filter-repo/issues/74Innocent
@Innocent Thank you. I have included your comment in the answer for more visibility. And added the link to the lint-history script.Battology
T
6

I created a file at /usr/local/git/findsed.sh , with the following contents:

find . -name 'githubDirToSubmodule.sh' -exec sed -i '' -e 's/What I want to remove//g' {} \;

I ran the command:

git filter-branch --tree-filter "sh /usr/local/git/findsed.sh"

Explanation of commands

When you run git filter-branch, this goes through each revision that you ever committed, one by one. --tree-filter runs the findsed.sh script on each committed revision, saves it, then progresses to the next revision.

The find command finds a specific file or set of files and executes (-exec) the sed editor on each of them. In the sed expression, the regex sits between s/ and the next /, and every match is replaced with the string between that / and /g (empty in my example). {} stands for the file's path as found by find; it is passed to sed so that sed knows which file to work on. \; just ends the -exec command.

Separating the shell script and the command into separate pieces makes the quoting ('' or "") much less complicated.
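Here is a sketch of the whole thing in a throwaway repository, so nothing real is at risk. The commit messages, the demo identity and the temp paths are made up; it assumes Git 2.24 or later (for FILTER_BRANCH_SQUELCH_WARNING) and a sed that accepts -i without a suffix (GNU or busybox).

```shell
#!/bin/sh
# Throwaway repository with two commits that contain the string.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'pw=What I want to remove\n' > githubDirToSubmodule.sh
git add githubDirToSubmodule.sh
git commit -q -m 'first'
printf 'more\n' >> githubDirToSubmodule.sh
git commit -q -a -m 'second'

# The filter lives in its own file, so no nested quoting is needed.
filter_script=$(mktemp)
cat > "$filter_script" <<'EOF'
find . -name 'githubDirToSubmodule.sh' -type f -exec sed -i -e 's/What I want to remove//g' {} \;
EOF

# Rewrite every commit reachable from HEAD.
FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch --tree-filter "sh '$filter_script'" >/dev/null 2>&1

# Count leftovers in the rewritten history, and list the backup refs.
rewritten_hits=$(git log -p HEAD | grep -c 'What I want to remove' || true)
commit_count=$(git rev-list --count HEAD)
backup_refs=$(git for-each-ref --format='%(refname)' refs/original)
echo "hits in rewritten history: $rewritten_hits, commits: $commit_count"
echo "backup refs still holding the old commits: $backup_refs"
```

Both commits survive with new IDs, and the string is gone from the rewritten branch. The old commits stay reachable through refs/original until you delete those refs and garbage-collect.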

Peculiarities

I ran this successfully on a Mac, whose sed is the BSD variant rather than GNU sed, and the two sometimes behave differently. Make sure to use sed -i '' there; otherwise sed takes "-e" as the suffix for backup files and creates copies with "-e" appended to their names. -i '' means: edit the files in place and don't create backup files.
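If the same script has to run on both kinds of sed, one way to handle it is to pick the in-place flags at runtime. A minimal sketch: sed_inplace is a made-up helper name, and it relies on the assumption that sed --version succeeds on GNU (and busybox) sed but fails on BSD/macOS sed.

```shell
#!/bin/sh
# Choose the in-place syntax once, based on which sed is installed.
if sed --version >/dev/null 2>&1; then
    # GNU (or busybox) sed: -i takes no separate argument.
    sed_inplace() { sed -i -e "$1" "$2"; }
else
    # BSD/macOS sed: -i needs an explicit (here empty) backup suffix.
    sed_inplace() { sed -i '' -e "$1" "$2"; }
fi

# Try it on a scratch copy.
scratch=$(mktemp -d)
printf 'keep What I want to remove this\n' > "$scratch/githubDirToSubmodule.sh"
sed_inplace 's/What I want to remove//g' "$scratch/githubDirToSubmodule.sh"

result=$(cat "$scratch/githubDirToSubmodule.sh")
file_count=$(ls "$scratch" | grep -c '')
echo "result: $result (files in scratch dir: $file_count)"
```

Inside findsed.sh, the find line would then call the same sed variant, so no stray "-e" backup files end up in the rewritten commits.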

Specifying -name 'filename.sh' helped me avoid another issue that I could not solve. There was another file with .sh and that file ended without a newline character. sed for some reason, would add a newline character to the end, despite the 's/blah/blah/g' not matching anything in that file. So instead of figuring out that issue, I just told the find to ignore all other files.

Additional commands that work

Additionally, I found these commands to work in the findsed.sh file (only one command at a time, not multiple, so comment the others out with #):

find . -name '.publishNewZenPackFromGithub.sh.swp' -exec rm -f {} \;
find . -name '*' -exec grep -H PassToRemove {} \;

Enjoy!

Thrive answered 7/11, 2011 at 19:43 Comment(0)
C
6

More info on git-filter-repo

https://mcmap.net/q/12576/-how-to-substitute-text-from-files-in-git-history gives the basics, here is some more info.

Install

As of git 2.5 at least, it is not shipped with mainline git, so: https://superuser.com/questions/1563034/how-do-you-install-git-filter-repo/1589985#1589985

python3 -m pip install --user git-filter-repo

Usage tips

Here is the more common approach I tend to use:

git filter-repo --replace-text <(echo 'my_password==>xxxxxxxx') HEAD

where:

  • Bash process substitution allows us to not create a file for simple replaces. If your shell does not support this feature, you just have to write it to a file instead:

    echo 'my_password==>xxxxxxxx' > tmp
    git filter-repo --replace-text tmp HEAD
    
  • HEAD makes it affect only the current branch
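For more than one rule, or for shells without process substitution, a here-document into a temporary file keeps things readable. This is just a sketch: the passwords and the token pattern are invented examples, and the literal: and regex: prefixes are the ones git filter-repo documents for --replace-text. The filter-repo call itself is left as a comment so that nothing gets rewritten by accident.

```shell
#!/bin/sh
# One rule per line; a quoted 'EOF' keeps the shell from expanding anything.
rules_file=$(mktemp)
cat > "$rules_file" <<'EOF'
my_password==>xxxxxxxx
literal:my_password2==>xxxxxxxx
regex:token=[0-9a-f]+==>token=REDACTED
EOF

rule_count=$(grep -c '==>' "$rules_file")
echo "wrote $rule_count rules to $rules_file"

# Then, inside a fresh clone of the repository:
#   git filter-repo --replace-text "$rules_file" --refs HEAD
# and remove "$rules_file" afterwards.
```

Quoting matters here: in an unquoted command line, the > in ==> would be treated as a redirection by the shell.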

Modify only a range of commits

How to modify only a range of commits with git filter-repo instead of the entire branch history?

git filter-repo --replace-text <(echo 'my_password==>xxxxxxxx') --refs HEAD~2..HEAD

Replace using the Python API

For more complex replacements, you can use the Python API, see: How to use git filter-repo as a library with the Python module interface?

Coleen answered 1/10, 2020 at 18:54 Comment(5)
@Battology I forgot to upvote yours!!! I was meaning to do it!!!Coleen
How to echo more than one replacement expression in your one-line solution?Aldrin
@s.k <(echo 'my_password==>xxxxxxxx'; echo 'my_password2==>xxxxxxxx') or <(printf 'my_password==>xxxxxxxx\nmy_password2==>xxxxxxxx\n') should both work.Coleen
@CiroSantilliOurBigBook.com This does not work in Windows from python under git-msys-bash because python does not understand posix notation generated by <(...) operator.Wirra
@Wirra added a mention. I don't think process substitution is POSIX btw: unix.stackexchange.com/questions/309547/…Coleen
P
2

Could be a shell expansion issue. If filter-branch loses the quotes around "*.php" by the time it evaluates the command, the pattern may expand to nothing, so that git ls-files -z lists all files.

You could check the filter-branch source or try different quoting tricks, but what I'd do is just make a one-line shell script that does your tree-filter and pass that script instead.

Pickmeup answered 5/11, 2010 at 22:56 Comment(5)
What would this one liner look like?Televise
The exact thing you're passing to --tree-filter '...' right now.Pickmeup
Good advice; passing an actual executable script to filter-branch is often much easier than trying to deal with all the quoting.Campanulate
I am on windows though, does it support bat scripts?Televise
please see my edit, I actually used a different script, got mixed up.Televise
C
0

Since this comes up in Google for git replace text in history, and since using non-git tools is sometimes more trouble than it's worth, here's a command that will replace multi-line text all the way from ${COMMIT} onwards to HEAD.

Warning: This is NOT for beginners. It uses git filter-branch, so all of its caveats/pitfalls/etc. apply. Make sure you've committed/backed up everything you need to save, so you don't lose data.

With that said, create the alias in Bash as follows:

git config --global alias.filter-branch-replace-text '!main() { set -eu && if [ -n "${BASH_VERSION+x}" ]; then set -o pipefail; fi && local pattern patternq replacement replacementq commit && pattern="$1" && shift && replacement="$1" && shift && commit="$1" && shift && local sed_binary_flags="" && if [ msys = "${OSTYPE-}" ]; then sed_binary_flags="-b"; fi && patternq="$(printf "%s" "${pattern}" | sed ${sed_binary_flags} "s/'\''/'\''\\\\'\'''\''/g")." && patternq="'\''${patternq%.}'\''" && replacementq="$(printf "%s" "${replacement}" | sed ${sed_binary_flags} "s/'\''/'\''\\\\'\'''\''/g")." && replacementq="'\''${replacementq%.}'\''" && git filter-branch --tree-filter "for path in $(printf "%s\n" "$@" | sed ${sed_binary_flags} -e "s/'\''/'\''\\\\'\'''\''/g" -e "s/\(.*\)/'\''\1'\''/" | tr "\n" " ")"'\''; do if [ -f "${path}" ]; then perl -0777 -i -s -p -e "s/\\Q\$q\\E/\$s/sgm" -- -q='\''"${patternq}"'\'' -s='\''"${replacementq}"'\'' -- "${path}"; fi || break; done'\'' "${commit}~1..HEAD" --; } && main'

and you can then invoke it from Bash as follows:

git filter-branch-replace-text \
    $')\r\n{' \
    $') /* EOL */\r\n{' \
    "${COMMIT}" \
    src/*.txt

Note that this performs literal text replacement, not regular expression replacement.

If you need regexes, you'll need to remove the \Q and \E in the Perl command (which perform escaping) and properly escape the strings as needed for the s/$q/$s/sgm command yourself.

And if you want to pretty-print the script, you can format it like this:

(f="$(git --no-pager config --get alias.filter-branch-replace-text)" && eval "${f%&&*}" && declare -f "${f%%()*}")
Contumacious answered 17/12, 2022 at 22:4 Comment(0)
