How to grep (search through) committed code in the Git history
B

22

1930

I have deleted a file or some code in a file sometime in the past. Can I search through the content (not just the commit messages)?

A very poor solution is to grep the log:

git log -p | grep <pattern>

However, this doesn't return the commit hash straight away. I played around with git grep to no avail.

Bolick answered 28/5, 2010 at 11:36 Comment(6)
These blog posts by Junio C Hamano (Git maintainer) might be interesting for you: Linus's ultimate content tracking tool (about pickaxe search, i.e. git log -S, and blame); Fun with "git log --grep" (searching commit messages, gitster.livejournal.com/30195.html); Fun with "git grep" (gitster.livejournal.com/27674.html).Diplomatic
possible duplicate of How to grep git commits for a certain wordLocarno
answer from possible duplicate actually works: https://mcmap.net/q/11999/-how-to-grep-git-commits-for-a-certain-wordTamekia
issue with this is that it doesn't give any context to the change.. i.e. who / whenGaytan
I believe as of 2021, VonC's answer is the only entirely correct one, and well deserves a green checkmark.Esbenshade
@kkm-stillwaryofSEpromises Surprise surprise, the dude who became the king of git on StackOverflow by writing tons of very useful answers wrote a very useful answerDispense
N
745

You should use the pickaxe (-S) option of git log.

To search for Foo:

git log -SFoo -- path_containing_change
git log -SFoo --since=2009.1.1 --until=2010.1.1 -- path_containing_change

See Git history - find lost line by keyword for more.

-S (named pickaxe) comes originally from a git diff option (Git v0.99, May 2005). Then -S (pickaxe) was ported to git log in May 2006 with Git 1.4.0-rc1.


As Jakub Narębski commented:

  • this looks for differences that introduce or remove an instance of <string>. It usually means "revisions where you added or removed line with 'Foo'".

  • the --pickaxe-regex option allows you to use extended POSIX regex instead of searching for a string. Example (from git log): git log -S"frotz\(nitfol" --pickaxe-regex


As Rob commented, this search is case-sensitive - he opened a follow-up question on how to search case-insensitive.


Hi Angel notes in the comments:

Executing a git log -G<regexp> --branches --all (the -G is same as -S but for regexes) does same thing as the accepted one (git grep <regexp> $(git rev-list --all)), but it soooo much faster!

The accepted answer was still searching for text after ≈10 minutes of me running it, whereas this one gives results after ≈4 seconds 🤷‍♂️. The output here is more useful as well
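
To see what the pickaxe actually reports, here is a minimal sketch in a throwaway repository. The file name a.c, its contents, and the commit messages are made up for the demo; only git log -S itself comes from the answer above.

```shell
#!/bin/sh
# Scratch-repository demo of `git log -S` (all names below are illustrative).
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

# Commit 1 introduces "Foo".
printf 'int Foo = 1;\n' > a.c
git add a.c
git commit -q -m "add Foo"

# Commit 2 removes "Foo".
printf 'int Bar = 1;\n' > a.c
git commit -q -am "replace Foo with Bar"

# Commit 3 touches the file without adding or removing "Foo".
printf '// unrelated\n' >> a.c
git commit -q -am "unrelated comment"

# List the subjects of commits in which the number of "Foo" occurrences changed.
pickaxe_subjects=$(git log -SFoo --format=%s)
printf '%s\n' "$pickaxe_subjects"
```

Only the commits that add or remove Foo are listed; the third commit modifies the same file but is skipped because the count of Foo does not change.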

Noemi answered 28/5, 2010 at 11:57 Comment(15)
Thanks, I wasn't aware of this option. Looks like this is the best solution if you're interested in the commit messages and Jeet's solution is most appropriate if you need the traditional UNIX grep behavior of pure line matching.Bolick
Combine it with the -p flag to also output the diff.Immaculate
Is there any way to exclude all directories matching a specific pattern using git log -S?Bulley
@Bulley why yes, there sure is (it uses the --format option though, not the -S option): https://mcmap.net/q/12315/-making-39-git-log-39-ignore-changes-for-certain-pathsNoemi
does this only search the history of the current branch, rather than commits throughout the whole repo history?Beery
@Beery you would need the --branches --all options to search the whole repo.Noemi
@Jakub Narębski As soon as I try --pickaxe-regex, I get fatal: bad revision '<my_regex>'. Maybe give a concrete example.Escent
@U.Windl I am not Jakub, but I have edited my answer to include an example.Noemi
Comparing this answer to the links within it confuses me: the pickaxe (-S) link jumps to an option that isn't called pickaxe, but pickaxe-all. Also, unlike the text of this link would suggest, -S and pickaxe-all are different (though related); the word pickaxe isn't even in the documentation for -S. And finally, it does look like pickaxe-all would usually be helpful for this task, yet neither of your examples use it.Cutshall
@DanielKaplan Thank you for your feedback. I have edited the answer accordingly and removed any reference/link to --pickaxe-all. -S (pickaxe) is enough.Noemi
@Noemi Cool and thanks. When I click on the new link it goes to a white page. I'm still confused about calling -S "pickaxe" though. The documentation doesn't call it that, as far as I can tell.Cutshall
@DanielKaplan Which link is problematic? As for "pickaxe", it is the original name of the git diff -S option. While the documentation does not mention "pickaxe", that is what -S implements. I have always known that option (-S) referenced as the "pickaxe option".Noemi
@Noemi Nevermind.I'm not sure why but it rendered correctly the second time I triedCutshall
This should be the accepted answer. Let me be first to say that. Executing a git log -G<regexp> --branches --all (the -G is same as -S but for regexes) does same thing as the accepted one, but it soooo much faster! The accepted answer was still searching for text after ≈10 minutes of me running it, whereas this one gives results after ≈4 seconds 🤷‍♂️ The output here is more useful as wellSapienza
@Sapienza Thank you very much for your feedback on that 12 years old answer. I have included your comment in the answer for more visibility.Noemi
L
2453

To search for commit content (i.e., actual lines of source, as opposed to commit messages and the like), you need to do:

git grep <regexp> $(git rev-list --all)

git rev-list --all | xargs git grep <expression> will work if you run into an "Argument list too long" error.

If you want to limit the search to some subtree (for instance, "lib/util"), you will need to pass that to the rev-list subcommand and grep as well:

git grep <regexp> $(git rev-list --all -- lib/util) -- lib/util

This will grep through all your commit text for regexp.

The path is passed to both commands because rev-list only selects the revisions in which lib/util changed, while grep still needs the path so that it searches only inside lib/util.

Just imagine the following scenario: without the path, grep might find the same <regexp> in other files that are contained in a revision returned by rev-list (even if that file was not changed in that revision).
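
The scenario can be reproduced with a small sketch in a throwaway repository. The directory names docs and lib/util and the word needle are placeholders chosen for the demo, not anything from a real project.

```shell
#!/bin/sh
# Demo: why the path must be given to both rev-list and grep (names are illustrative).
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

mkdir -p docs lib/util
echo "needle in docs" > docs/b.txt
git add docs
git commit -q -m "docs"

echo "needle in util" > lib/util/a.txt
git add lib
git commit -q -m "util"

# Path given to rev-list only: the selected revision is still searched as a whole tree.
revs_only=$(git grep needle $(git rev-list --all -- lib/util) | wc -l | tr -d ' ')

# Path given to both: only lib/util is searched inside the selected revision.
both=$(git grep needle $(git rev-list --all -- lib/util) -- lib/util | wc -l | tr -d ' ')

echo "rev-list path only: $revs_only matching lines"
echo "path on both sides: $both matching lines"
```

In this setup the first form also reports docs/b.txt, because that file is part of the tree of the commit that changed lib/util.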

Here are some other useful ways of searching your source:

Search working tree for text matching regular expression regexp:

git grep <regexp>

Search working tree for lines of text matching regular expression regexp1 or regexp2:

git grep -e <regexp1> [--or] -e <regexp2>

Search working tree for lines of text matching regular expression regexp1 and regexp2, reporting file paths only:

git grep -l -e <regexp1> --and -e <regexp2>

Search working tree for files that have lines of text matching regular expression regexp1 and lines of text matching regular expression regexp2:

git grep -l --all-match -e <regexp1> -e <regexp2>

Search working tree for changed lines of text matching pattern:

git diff --unified=0 | grep <pattern>

Search all revisions for text matching regular expression regexp:

git grep <regexp> $(git rev-list --all)

Search all revisions between rev1 and rev2 for text matching regular expression regexp:

git grep <regexp> $(git rev-list <rev1>..<rev2>)
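
The difference between the --and and --all-match forms listed above is easy to mix up, so here is a small sketch on a scratch working tree. The file names and the words alpha and beta are assumptions made up for the demo.

```shell
#!/bin/sh
# Demo: `--and` requires both patterns on one line, `--all-match` requires both
# patterns somewhere in the same file (file names are illustrative).
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'alpha beta\n' > one.txt      # both patterns on the same line
printf 'alpha\nbeta\n' > two.txt     # both patterns, on different lines
printf 'alpha\n' > three.txt         # only one of the patterns
git add one.txt two.txt three.txt    # git grep searches tracked files

and_files=$(git grep -l -e alpha --and -e beta | sort)
all_match_files=$(git grep -l --all-match -e alpha -e beta | sort)

echo "--and:"
printf '%s\n' "$and_files"
echo "--all-match:"
printf '%s\n' "$all_match_files"
```

Here --and lists only one.txt, while --all-match lists one.txt and two.txt; three.txt is excluded by both because it never contains beta.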
Leadbelly answered 28/5, 2010 at 13:47 Comment(25)
Excellent. +1. The GitBook add some details (book.git-scm.com/4_finding_with_git_grep.html), and Junio C Hamano illustrates some of your points: gitster.livejournal.com/27674.htmlNoemi
Unfortunately, I cannot get this going with msysgit-1.7.4. It tells me sh.exe": /bin/git: Bad file number. VonC's answer also works with msysgit.Nutrient
If you get an "unable to read tree" error when you invoke git grep history with rev-list, you might need to clean things up. Try git gc or check out: stackoverflow.com/questions/1507463/…Ninnette
Can you also limit the git grep by choosing the directories to recurse down?Josiahjosias
I've created a nice alias for this in my ~/.gitconfig: find = "!f() { git grep -C 2 $1 $(git rev-list --all); }; f". I used a shell function so I can pass the regexp to the alias.Hass
Yeah, this seems to fail on Windows as well, alas.Louls
@Leadbelly would it make sense to replace all your $(...) with xargs calls so your examples will work more consistently (thanks @dlowe)? I got the same error as others on OSX: bash: /usr/local/bin/git: Argument list too longFriedcake
Can I add git grep <regex> $(git rev-list --all) -- '*.txt' to search in a certain file type?Bosco
Another way with dealing with the Argument list too long issue is to: git rev-list --all | (while read rev; do git grep -e <regexp> $rev; done)Leadbelly
@Leadbelly Are there circumstances where this is a preferable option to the git log -G or git log -S answers? It seems to be much "heavier" in terms of processing as it's basically getting a list of commits, and then searching each individual commit?Ingest
@Ingest I think there are differences in what is reported (log output vs. grep output, or patch contents vs. full file contents) that makes a difference in contexts in which one may be preferable to the other.Leadbelly
This does not work: git grep <regexp> $(git rev-list --all -- lib/util) It still returns results from the entire tree.Melosa
@vmakley please see the updated reply here, that should work fine: stackoverflow.com/suggested-edits/3011778Ld
What if I want to find something in the changed lines of the files in the working tree? And not in all the files in the project.Permanency
The answer to my previous question can be found here.Permanency
For me, it does not work, when I'm not in the git root directory! I'm using Git 2.11.Stratosphere
I want to point out that I had scale issues with this solution. I ended up multiprocessing over a small set of small repositories, multiprocessing over revs in batches of 50, and running these jobs on the biggest servers I could find; git grep takes so much longer and eats up much more CPU than git log -S. I think git grep reconstructs every file for every revision and greps them, whereas git log only searches diffs. I still trust the results of git grep over all revs, but unless I really need to know whether the search pattern did or did not exist, I will prefer git log for trivial searches.Paba
This does not work with git 2.21.0 on macos. No matter what expression is used, output is always empty.Phenobarbital
This is almost useless, when the change I want to find was 20 commits ago, then ALL of the 20 commits containing the change will show up over and over again. If then, 10 commits ago there was another change with the searched string, then it is almost impossible to find. Is there a way to search in the "diff" of the commits only (changed lines only) instead of the entire code body?Lal
I'm developing a tool based on this answer: github.com/GaetanoPiazzolla/git-searchPannonia
Note - make sure your regexp is in single quotes i.e. git grep 'y.rn' $(6c1418bb9a4c3e355s6ag1dfa6s5dgf1)Prognosis
@Lal I have the same question as yours "Is there a way to search in the "diff" of the commits only (changed lines only) instead of the entire code body?" Could anyone answer that question?Liaison
I found it useful to limit the subtree by $(git rev-list --after="2021-12-01" --all) or $(git rev-list --author="Shimon" --all)Prophet
You propose to use a command with xargs, if an "argument too long" error occurs. What is the exact reason for the "argument too long" error? I could imagine something like "In shell xy, the evaluation of $() is not quoted according to zz, therefore all output of $() is written down in raw form".Siphonophore
This is the mother of all answers! :clap:Inconformity
A
335

My favorite way to do it is with git log's -G option (added in version 1.7.4).

-G<regex>
       Look for differences whose added or removed line matches the given <regex>.

There is a subtle difference between the way the -G and -S options determine if a commit matches:

  • The -S option essentially counts the number of times your search matches in a file before and after a commit. The commit is shown in the log if the before and after counts are different. This will not, for example, show commits where a line matching your search was moved.
  • With the -G option, the commit is shown in the log if your search matches any line that was added, removed, or changed.

Take this commit as an example:

diff --git a/test b/test
index dddc242..60a8ba6 100644
--- a/test
+++ b/test
@@ -1 +1 @@
-hello hello
+hello goodbye hello

Because the number of times "hello" appears in the file is the same before and after this commit, it will not match using -Shello. However, since there was a change to a line matching hello, the commit will be shown using -Ghello.
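
The example commit can be replayed in a throwaway repository to compare the two options side by side. This is only a sketch: the commit messages first and second are made up, while the file name test and its contents follow the diff shown above.

```shell
#!/bin/sh
# Demo: -S compares occurrence counts, -G matches added/removed lines.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

printf 'hello hello\n' > test
git add test
git commit -q -m "first"

# Same number of "hello" occurrences before and after this change.
printf 'hello goodbye hello\n' > test
git commit -q -am "second"

s_subjects=$(git log -Shello --format=%s)
g_subjects=$(git log -Ghello --format=%s)

echo "-Shello:"
printf '%s\n' "$s_subjects"
echo "-Ghello:"
printf '%s\n' "$g_subjects"
```

With -Shello only the commit that first adds the file is listed, whereas -Ghello lists both commits.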

Appointed answered 14/9, 2012 at 18:34 Comment(3)
Is there a way to show the matching change context in the git log output?Ratliff
@Thilo-AlexanderGinkel - I usually just add the -p option to show a diff for each commit. Then when the log is opened in my pager, I search for whatever it is I'm looking for. If your pager is less and you git log -Ghello -p, you can type /hello, press Enter, and use n and N to find the next/previous occurrences of "hello".Appointed
I found an interesting issue with -G and Regex: If command line uses UTF-8 and the file you are looking at uses some ISO-Latin (8 bit) encoding, .* fails. For example, I have a change Vierter Entwurf -> Fünfter Entwurf, and while 'V.*ter Entwurf' produces a match, 'F.*ter Entwurf' does not.Escent
C
87

git log can be a more effective way of searching for text across all branches, especially if there are many matches, and you want to see more recent (relevant) changes first.

git log -p --all -S 'search string'
git log -p --all -G 'match regular expression'

These log commands list commits that add or remove the given search string/regex, (generally) more recent first. The -p option causes the relevant diff to be shown where the pattern was added or removed, so you can see it in context.

Having found a relevant commit that adds the text you were looking for (for example, 8beeff00d), find the branches that contain the commit:

git branch -a --contains 8beeff00d
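
The two steps (find the commit, then find its branches) can be chained, roughly like this. It is a sketch on a scratch repository: the branch name feature and the file f.txt are invented for the demo, and the --format option of git branch assumes Git 2.13 or later.

```shell
#!/bin/sh
# Demo: locate the commit that added a string, then list branches containing it.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

echo "base" > f.txt
git add f.txt
git commit -q -m "base"

# The string is added only on a side branch.
git checkout -q -b feature
echo "search string" >> f.txt
git commit -q -am "add search string"

# Step 1: find the commit across all branches.
commit=$(git log --all -S'search string' --format=%H)

# Step 2: list every local or remote-tracking branch that contains it.
branches=$(git branch -a --contains "$commit" --format='%(refname:short)')
printf '%s\n' "$branches"
```

The initial branch is not listed, because the commit exists only on feature.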
Cyanamide answered 23/6, 2017 at 0:38 Comment(3)
Hi, these lines don't seem to work at all. My command is > git log -p --all -S 'public string DOB { get; set; } = string.Empty;' and every time I try to run it I get > fatal: ambiguous argument 'string': unknown revision or path not in the working tree. > Use '--' to separate paths from revisions, like this: > 'git <command> [<revision>...] -- [<file>...]'Stimulant
@Stimulant For some reason the ' quotes aren't grouping your search string together as a single argument. Instead, 'public is the argument to -S, and it's treating the rest as separate arguments. I'm not sure what environment you're running in, but that context would be necessary to help troubleshoot. I'd suggest opening a separate StackOverflow question if needed to help you troubleshoot, with all the context of how your git command is being sent to the shell. It seems to me that it's getting sent through some other command? Comments here aren't the right place to figure this out.Cyanamide
JFYI if you need Perl regexps instead of the default grep -E regexps you can use --perl-regexp optionCyruscyst
L
64

If you want to browse code changes (see what actually has been changed with the given word in the whole history) go for patch mode - I found a very useful combination of doing:

git log -p
# Hit '/' for search mode.
# Type in the word you are searching.
# If the first search is not relevant, hit 'n' for next (like in Vim ;) )
Last answered 17/4, 2014 at 8:17 Comment(0)
F
38

Search in any revision, any file (Unix/Linux):

git rev-list --all | xargs git grep <regexp>

Search only in some given files, for example XML files:

git rev-list --all | xargs -I{} git grep <regexp> {} -- "*.xml"

The result lines should look like this: 6988bec26b1503d45eb0b2e8a4364afb87dde7af:bla.xml: text of the line it found...

You can then get more information like author, date, and diff using git show:

git show 6988bec26b1503d45eb0b2e8a4364afb87dde7af
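
Here is a small sketch of that workflow on a scratch repository, including one pitfall: git grep exits non-zero for every revision without a match, so xargs reports failure even when other revisions matched. The file names and the word token are placeholders for the demo.

```shell
#!/bin/sh
# Demo: search only *.xml files in every revision, then inspect the matching commit.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

echo '<a>token</a>' > bla.xml
echo 'token' > notes.txt
git add bla.xml notes.txt
git commit -q -m "add files"

echo '<a>other</a>' > bla.xml
git commit -q -am "drop token from xml"

# `|| true` keeps `set -e` from aborting when some revisions have no match.
matches=$(git rev-list --all | xargs -I{} git grep token {} -- '*.xml' || true)
printf '%s\n' "$matches"

# The part before the first colon is the commit hash; show author, date and subject.
first=$(git rev-list --max-parents=0 HEAD)
sha=$(printf '%s\n' "$matches" | cut -d: -f1)
git show -s --format='%h %an %ad %s' "$sha"
```

notes.txt never shows up because of the '*.xml' pathspec, and the newest commit is skipped because its bla.xml no longer contains the word.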
Frankfrankalmoign answered 2/4, 2015 at 15:3 Comment(0)
H
29

I took Jeet's answer and adapted it to Windows (thanks to this answer):

FOR /F %x IN ('"git rev-list --all"') DO @git grep <regex> %x >> out.txt

Note that for me, for some reason, the actual commit that deleted this regex did not appear in the output of the command, but rather one commit prior to it.

Halfhearted answered 17/11, 2011 at 9:35 Comment(3)
+1 -- and if you want to avoid hitting "q" after each find, add --no-pager to the git command at the endBregma
Also, I would note that appending to a text file has the added advantage of actually displaying the matching text. (append to a text file using >>results.txt for those not versed in Windows piping...Bregma
And I thought bash's syntax is ugly :)Muraida
S
19

For simplicity, I'd suggest using GUI: gitk - The Git repository browser. It's pretty flexible

  1. To search code: [screenshot]
  2. To search files: [screenshot]
  3. Of course, it also supports regular expressions: [screenshot]

And you can navigate through the results using the up/down arrows.

Stupe answered 7/2, 2018 at 7:35 Comment(0)
P
18

Whenever I find myself at your place, I use the following command line:

git log -S "<words/phrases I am trying to find>" --all --oneline  --graph

Explanation:

  1. git log - Need I write more here; it shows the logs in reverse chronological order.
  2. -S "<words/phrases I am trying to find>" - It shows those Git commits in which the number of occurrences of the words/phrases changed in any file (added/modified/deleted). Type the words/phrases without the '<>' symbols.
  3. --all - To enforce and search across all the branches.
  4. --oneline - It compresses the Git log in one line.
  5. --graph - It draws a text-based graph of the commit history next to the log.
Polyneuritis answered 18/8, 2019 at 19:28 Comment(3)
"Whenever I find myself at your place, I feel the need to use git!"Sinew
There is also flag -G for searching using a regular expression and -i to make the search case-insensitive.Ruebenrueda
Quite the command, thank you. Can I also see the branch this happens on?Middleoftheroad
O
7

For anyone else trying to do this in Sourcetree, there is no direct command in the UI for it (as of version 1.6.21.0). However, you can use the commands specified in the accepted answer by opening Terminal window (button available in the main toolbar) and copy/pasting them therein.

Note: Sourcetree's Search view can partially do text searching for you. Press Ctrl + 3 to go to Search view (or click Search tab available at the bottom). From far right, set Search type to File Changes and then type the string you want to search. This method has the following limitations compared to the above command:

  1. Sourcetree only shows the commits that contain the search word in one of the changed files. Finding the exact file that contains the search text is again a manual task.
  2. RegEx is not supported.
Olympic answered 1/10, 2015 at 5:51 Comment(0)
R
4

I was kind of surprised here and maybe I missed the answer I was looking for, but I came here looking for a search on the heads of all the branches. Not for every revision in the repository, so for me, using git rev-list --all is too much information.

In other words, for me the variation most useful would be

git grep -i searchString $(git branch -r)

or

git branch -r | xargs git grep -i searchString

or

git branch -r | xargs -n1 -i{} git grep -i searchString {}

And, of course, you can try the regular expression approach here. What's cool about the approach here is that it worked against the remote branches directly. I did not have to do a check out on any of these branches.

Resplendent answered 24/11, 2021 at 15:32 Comment(0)
A
3

If you know the file in which you might have made the change, do this:

git log --follow -p -S 'search-string' <file-path>

--follow: continues listing the history of a file beyond renames (works only for a single file)

Affidavit answered 14/8, 2020 at 13:29 Comment(0)
P
3

Inspired by the answer https://mcmap.net/q/12006/-how-to-grep-search-through-committed-code-in-the-git-history, I found that git grep seems to search the full code base at each commit, not just the diffs, so the result tends to be repetitive and long. The script below searches only the diffs of each Git commit instead:

for commit in $(git rev-list --all); do 
    # search only lines starting with + or -
    if  git show "$commit" | grep "^[+-].*search-string"; then 
        git show --no-patch --pretty=format:'%C(yellow)%h %Cred%ad %Cblue%an%Cgreen%d %Creset%s' --date=short "$commit"
    fi  
done

Example output, the bottom git commit is the one that first introduced the change I'm searching for:

csshx$ for commit in $(git rev-list --all); do 
>     if  git show "$commit" | grep "^[+|-].*As csshX is a command line tool"; then 
>         git show --no-patch --pretty=format:'%C(yellow)%h %Cred%ad %Cblue%an%Cgreen%d %Creset%s' --date=short $commit
>     fi  
> done

+As csshX is a command line tool, no special installation is needed. It may
987eb89 2009-03-04 Gavin Brock Added code from initial release
Pericardium answered 9/6, 2021 at 22:28 Comment(0)
G
2

Jeet's answer works in PowerShell.

git grep -n <regex> $(git rev-list --all)

The following displays all files, in any commit, that contain a password.

# Store intermediate result
$result = git grep -n "password" $(git rev-list --all)

# Display unique file names
$result | select -unique { $_ -replace "(^.*?:)|(:.*)", "" }
Godwit answered 16/12, 2014 at 2:17 Comment(2)
I like your answer, and can see where it is going, but it's not working on MacOS zsh: parse error near `-unique'` Hypostyle
Okay! I got it working https://mcmap.net/q/11999/-how-to-grep-git-commits-for-a-certain-word GOT I HATE BASHHypostyle
U
2

Okay, twice just today I've seen people wanting a closer equivalent for hg grep, which is like git log -pS but confines its output to just the (annotated) changed lines.

Which I suppose would be handier than /pattern/ in the pager if you're after a quick overview.

So here's a diff-hunk scanner that takes git log --pretty=%h -p output and emits annotated change lines. Put it in diffmarkup.l, build it with e.g. make ~/bin/diffmarkup, and use it like this:

git log --pretty=%h -pS pattern | diffmarkup | grep pattern

The scanner source (diffmarkup.l):

%option main 8bit nodefault
        // vim: tw=0
%top{
        #define _GNU_SOURCE 1
}
%x commitheader
%x diffheader
%x hunk
%%
        char *afile=0, *bfile=0, *commit=0;
        int aline,aremain,bline,bremain;
        int iline=1;

<hunk>\n        ++iline; if ((aremain+bremain)==0) BEGIN diffheader;
<*>\n   ++iline;

<INITIAL,commitheader,diffheader>^diff.*        BEGIN diffheader;
<INITIAL>.*     BEGIN commitheader; if(commit)free(commit); commit=strdup(yytext);
<commitheader>.*

<diffheader>^(deleted|new|index)" ".*   {}
<diffheader>^"---".*            if (afile)free(afile); afile=strdup(strchrnul(yytext,'/'));
<diffheader>^"+++".*            if (bfile)free(bfile); bfile=strdup(strchrnul(yytext,'/'));
<diffheader,hunk>^"@@ ".*       {
        BEGIN hunk; char *next=yytext+3;
        #define checkread(format,number) { int span; if ( !sscanf(next,format"%n",&number,&span) ) goto lostinhunkheader; next+=span; }
        checkread(" -%d",aline); if ( *next == ',' ) checkread(",%d",aremain) else aremain=1;
        checkread(" +%d",bline); if ( *next == ',' ) checkread(",%d",bremain) else bremain=1;
        break;
        lostinhunkheader: fprintf(stderr,"Lost at line %d, can't parse hunk header '%s'.\n",iline,yytext), exit(1);
        }
<diffheader>. yyless(0); BEGIN INITIAL;

<hunk>^"+".*    printf("%s:%s:%d:%c:%s\n",commit,bfile+1,bline++,*yytext,yytext+1); --bremain;
<hunk>^"-".*    printf("%s:%s:%d:%c:%s\n",commit,afile+1,aline++,*yytext,yytext+1); --aremain;
<hunk>^" ".*    ++aline, ++bline; --aremain; --bremain;
<hunk>. fprintf(stderr,"Lost at line %d, Can't parse hunk.\n",iline), exit(1);
Unexampled answered 8/12, 2020 at 18:1 Comment(0)
S
2

This is the only answer here which performs multi-process (like multi-threaded) search in order to dramatically speed up big git grep searches. I'll also cover a variety of tools and scenarios not covered in any other answer here.

This should work in Windows, Mac, and Linux. Tested on Windows 11 in Git Bash, and on Linux Ubuntu 22.04.

For Windows, use the Git Bash Linux-like terminal that comes with Git for Windows. You can also use the MSYS2 terminal.

Here are my installation instructions for those terminals in Windows:

  1. Installing Git for Windows
  2. Stack Overflow: Installing & setting up MSYS2 from scratch, including adding all 7 profiles to Windows Terminal

All about searching (via grep or similar) in your git repositories

Note: to exclude a certain string, such as a file name or path, simply tack this onto the end of any of the commands below:

| grep -vF 'some/path/to/exclude' | grep -vF 'some_file_to_exclude.c:'

# etc.

Quick summary

Add this to the end of any of the search commands to search only in certain files and folders. Or, remove it from any command to search all files and folders:

-- "path/to/my_file.c" "path/to/my_folder"

Example:

# Search only these files and folders in all local branches
time git branch | awk '{print $NF}' \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder"

Main commands from this answer

# ---------------------------------------------
# 1. Search all local branches
# ---------------------------------------------
# Search only these files and folders in all local branches
time git branch | awk '{print $NF}' \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder"

# ---------------------------------------------
# 2. Search all remote branches of all remotes
# ---------------------------------------------
# Search only these files and folders in all remote branches
time git branch -r | awk '{print $NF}' \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder"

# ---------------------------------------------
# 3. Search all local **and** remote branches
# ---------------------------------------------
# Search only these files and folders in all local and remote branches
time git branch -a | awk '{print $NF}' \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder"

# ---------------------------------------------
# Search **all commits** in the entire repository
# ---------------------------------------------
# Search only these files and folders in all commits (reachable from any branch 
# or tag) in the whole repository
time git rev-list --all \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder"

# ---------------------------------------------
# 1. Search in branch/commit HEAD, which is all checked-in and committed changes 
#    in the current branch
# ---------------------------------------------
# Search only these files and folders
git grep -n 'my regex search' -- "path/to/my_file.c" "path/to/my_folder"

# ---------------------------------------------
# 2. Search in a specified list of branches or commits
# ---------------------------------------------
# Search only these files and folders
git grep -n 'my regex search' my_branch commit1 commit2 \
    -- "path/to/my_file.c" "path/to/my_folder"

# ---------------------------------------------
# 3. Search in this range of commits
# ---------------------------------------------

# Search all commits over the range `commit_start` to `commit_end`, 
# NOT including `commit_start`, but including `commit_end`
time git rev-list commit_start..commit_end \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {}

# Search all commits over the range `commit_start` to `commit_end`, 
# including both `commit_start` and `commit_end`
time git rev-list commit_start~..commit_end \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {}

# Search only these files and folders in this range of commits
time git rev-list commit_start~..commit_end \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder"

# ----------------------------------------------------------
# Search only these files and folders in the current file system
# ----------------------------------------------------------

# Fastest: ripgrep

# regular expression search
time rg 'my regex search' -- "path/to/my_file.c" "path/to/my_folder"
# fixed string search
time rg -F 'my fixed string' -- "path/to/my_file.c" "path/to/my_folder"

# Slowest

# regular expression search
time grep -rn 'my regex search' -- "path/to/my_file.c" "path/to/my_folder"
# fixed string search
time grep -rnF 'my fixed string' -- "path/to/my_file.c" "path/to/my_folder"

From below:

To get only a list of the branch names and/or commit hashes where matches were found, pipe any of the above commands to:

| cut -d ':' -f 1 | sort -u

Details

1. Search the tips of all branches in a git repository

This answers the main question here:

Is it possible to perform a 'grep search' in all the branches of a Git project?

The commands below search for 'my regex search' in all branches. The time command at the front only measures how long the search takes, so you can remove it if you like:

# ---------------------------------------------
# 1. Search all local branches
# ---------------------------------------------

# Search all files and folders in all local branches
time git branch | awk '{print $NF}' \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {}

# Search only these files and folders in all local branches
time git branch | awk '{print $NF}' \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder"

# ---------------------------------------------
# 2. Search all remote branches of all remotes
# ---------------------------------------------

# A. Fetch all remote branches of `--all` remote repositories, to ensure that 
# your locally-stored remote-tracking branches are all up-to-date.
git fetch --all  

# B. Now perform the search

# Search all files and folders in all remote branches
time git branch -r | awk '{print $NF}' \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {}

# Search only these files and folders in all remote branches
time git branch -r | awk '{print $NF}' \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder"

# ---------------------------------------------
# 3. Search all local **and** remote branches
# ---------------------------------------------

# A. Fetch all remote branches of `--all` remote repositories, to ensure that
# your locally-stored remote-tracking branches are all up-to-date.
git fetch --all

# B. Now perform the search

# Search all files and folders in all local and remote branches
time git branch -a | awk '{print $NF}' \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {}

# Search only these files and folders in all local and remote branches
time git branch -a | awk '{print $NF}' \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder"

The output of the above commands is in this form:

branch_name:path/to/file:line_number:line of text containing the match in it

Example:

main:some_file.c:123:    myFunction();

Explanation of the commands above

  1. git branch lists all local branches

  2. git branch -r lists all locally-stored, remote-tracking branches

  3. git branch -a lists all local and remote branches

  4. awk '{print $NF}' prints only the last field (column) of each line, which is the branch name. NF is awk's built-in "Number of Fields" variable, so $NF is the value of the last field (column). This removes the leading spaces before the branch names, as well as the * character in front of the current branch name.

  5. xargs -P "$(nproc)" -I {} runs the command following it in parallel, with 1 argument (branch name, in this case) per process, and with the number of processes allowed to run at once equal to the number of processors on your machine (-P "$(nproc)"). This is the key to speeding up the search.

    The -I {} option tells xargs to replace {} in the git --no-pager grep -n 'my regex search' {} command which follows with the argument (branch name, in this case) passed to xargs from git branch.

    The -- part after git grep is how you tell git grep that all of the following arguments after that point are file or folder paths, and not more options.

    To show that using parallel processing via -P "$(nproc)" speeds up git grep, I ran a test search in a medium-sized repository (~100 MB in the .git dir) both with and without parallelization, and here are the results. I was searching for the fixed string (-F) STATIC_ASSERT(:

    # 0.440 sec on Linux Ubuntu 22.04       0.564 sec on Windows 11 in Git Bash
    # With parallelization (I have 20 logical cores on my Dell Precision 
    # 5570 laptop):
    time git branch | awk '{print $NF}' \
        | xargs -P "$(nproc)" -I {} git --no-pager grep -n -F 'STATIC_ASSERT(' {}
    
    # 1.840 sec on Linux Ubuntu 22.04       1.316 sec on Windows 11 in Git Bash
    # No parallelization: 
    time git branch | awk '{print $NF}' \
        | xargs -I {} git --no-pager grep -n -F 'STATIC_ASSERT(' {}
    
  6. git --no-pager grep -n 'my regex search' {} searches for the regex (regular expression) 'my regex search' in the branch name which comes from the output of git branch | awk '{print $NF}' and is inserted (by xargs) into the git grep command in place of the {} characters.

    Ie: xargs swaps out the {} chars for the branch name there.

    The -n option tells git grep to print the line number of each match.

    The --no-pager option is used to prevent the less pager from being used, which otherwise disables echo in your terminal and causes it to no longer display characters you type. See my question and this answer by @wjandrea here: Git Grep sometimes makes terminal stop showing typed commands
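To try the branch-tip search without touching a real project, here is a minimal sketch that builds a throwaway repository and runs the same pipeline on it. The repository, the branch names (main, feature), and the file contents are all made up for the demonstration, and -P 2 stands in for -P "$(nproc)" only so the sketch does not depend on nproc being installed.

```shell
set -eu

# Throwaway repository; every name below is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # force a known initial branch name
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'int main(void)\n{\n    myFunction();\n}\n' > some_file.c
git add some_file.c
git commit -q -m "Add some_file.c"

git checkout -q -b feature
printf 'void helper(void)\n{\n    myFunction();\n}\n' > helper.c
git add helper.c
git commit -q -m "Add helper.c"

# Same pipeline as above. The final sort only makes the order of the
# parallel output deterministic.
results="$(git branch | awk '{print $NF}' \
    | xargs -P 2 -I {} git --no-pager grep -n 'myFunction' {} \
    | LC_ALL=C sort)"
printf '%s\n' "$results"
```

Each printed line has the branch_name:path/to/file:line_number:text form, with one line per match per branch.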

2. Search in all commits of a git repository

git rev-list --all, as shown in the main answer by @VonC, lists all commits reachable from any branch or tag in the whole repository. To search all commits, therefore, not just the head commit at the tip of each branch name, we will use git rev-list --all instead of git branch:

# Search all commits (reachable from any branch or tag) in the whole repository
time git rev-list --all \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {}

# Search only these files and folders in all commits (reachable from any branch 
# or tag) in the whole repository
time git rev-list --all \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder"
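As a sanity check that this really finds text which has since been deleted, here is a small sketch in a throwaway repository. The file name config.txt and the string secret_token are assumptions made up for the demonstration.

```shell
set -eu

# Throwaway repository; every name below is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'keep me\nsecret_token = 42\n' > config.txt
git add config.txt
git commit -q -m "Add config"
first_commit="$(git rev-parse HEAD)"

printf 'keep me\n' > config.txt
git commit -q -a -m "Remove secret_token"

# The current checkout no longer contains the string...
git --no-pager grep -n 'secret_token' || echo "not found in the current checkout"

# ...but searching every commit still finds it. xargs exits non-zero when
# any single git grep finds nothing, hence the `|| true`.
hits="$(git rev-list --all \
    | xargs -I {} git --no-pager grep -n 'secret_token' {} || true)"
printf '%s\n' "$hits"
```

The one hit is prefixed with the full hash of the first commit, which is the last commit that still contained the string.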

Speed tests:

Performing this search in the same ~100 MB repository as above, which has 154 commits (git log --oneline | wc -l), I get the following speed test results with and without parallelization:

# With parallelization.
# 12.835 sec on Linux Ubuntu 22.04      35.808 sec on Windows 11 in Git Bash
time git rev-list --all \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'STATIC_ASSERT' {}

# No parallelization.
# 58.482 sec on Linux Ubuntu 22.04      1min 53.929sec on Windows 11 in Git Bash 
time git rev-list --all \
    | xargs -I {} git --no-pager grep -n 'STATIC_ASSERT' {}

3. Search in just a specified list of branches or commits

# ---------------------------------------------
# 1. Search the tracked files in the working tree of the current branch. This 
#    includes uncommitted edits to tracked files. To search only committed 
#    content, name the revision: `git grep -n 'my regex search' HEAD`
# ---------------------------------------------

# Search all tracked files and folders in the current checkout
git grep -n 'my regex search'

# Search only these files and folders
git grep -n 'my regex search' -- "path/to/my_file.c" "path/to/my_folder"

# ---------------------------------------------
# 2. Search in a specified list of branches or commits
# ---------------------------------------------

# Search all files and folders just in branch `my_branch` and `commit1` and 
# `commit2`
git grep -n 'my regex search' my_branch commit1 commit2

# Search only these files and folders
git grep -n 'my regex search' my_branch commit1 commit2 \
    -- "path/to/my_file.c" "path/to/my_folder"

# ---------------------------------------------
# 3. Search in this range of commits
# ---------------------------------------------

# Search all commits over the range `commit_start` to `commit_end`, 
# NOT including `commit_start`, but including `commit_end`
time git rev-list commit_start..commit_end \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {}

# Search all commits over the range `commit_start` to `commit_end`, 
# including both `commit_start` and `commit_end`
time git rev-list commit_start~..commit_end \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {}

# Search only these files and folders in this range of commits
time git rev-list commit_start~..commit_end \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder"

4. Search just the current file-system / checked-out branch

To search just in the current file system, but not in the git history, here are the best techniques. The "current file system" includes the current branch plus any changes or new files not yet committed, but none of the other branches or past commits.

First, install ripgrep (rg), a very fast grep alternative:

  1. For Windows: see my Q&A: How to install ripgrep on Windows?
  2. For Linux Ubuntu or similar:
    1. Get the latest release URL here: https://github.com/BurntSushi/ripgrep/releases
    2. Use that URL below:
      curl -LO https://github.com/BurntSushi/ripgrep/releases/download/14.1.0/ripgrep_14.1.0-1_amd64.deb
      sudo dpkg -i ripgrep_14.1.0-1_amd64.deb
      

Now, here are some techniques:

# ----------------------------------------------------------
# Search all files and folders in the current file system
# ----------------------------------------------------------

# Fastest: ripgrep
time rg 'my regex search'               # regular expression search
time rg -F 'my fixed string'            # fixed string search

# Slowest
time grep -rn 'my regex search'         # regular expression search
time grep -rnF 'my fixed string'        # fixed string search

# ----------------------------------------------------------
# Search only these files and folders in the current file system
# ----------------------------------------------------------

# Fastest: ripgrep

# regular expression search
time rg 'my regex search' -- "path/to/my_file.c" "path/to/my_folder"
# fixed string search
time rg -F 'my fixed string' -- "path/to/my_file.c" "path/to/my_folder"

# Slowest

# regular expression search
time grep -rn 'my regex search' -- "path/to/my_file.c" "path/to/my_folder"
# fixed string search
time grep -rnF 'my fixed string' -- "path/to/my_file.c" "path/to/my_folder"

Note that git grep is not quite the same thing. With no revision argument it searches only the files that Git tracks in the working tree, so untracked files are skipped. Add a revision such as HEAD after the pattern to search committed content instead:

# git grep: searches the **tracked** files in the working tree
time git grep -n 'my regex search'      # regular expression search
time git grep -nF 'my fixed string'     # fixed string search
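To see which content git grep reads with and without a revision argument, here is a small self-contained sketch. The repository, the file names, and the search string are all made up for the demonstration.

```shell
set -eu

# Throwaway repository; every name below is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'committed_text\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

printf 'uncommitted_text\n' >> notes.txt       # edit a tracked file, no commit
printf 'uncommitted_text\n' > untracked.txt    # brand-new, untracked file

# No revision: tracked files in the working tree are searched.
worktree_hits="$(git --no-pager grep -n 'uncommitted_text' || true)"

# Explicit revision: only the content committed in HEAD is searched.
head_hits="$(git --no-pager grep -n 'uncommitted_text' HEAD || true)"

printf 'working tree: %s\n' "${worktree_hits:-<no match>}"
printf 'HEAD:         %s\n' "${head_hits:-<no match>}"
```

The uncommitted edit to notes.txt is found only by the first form, and untracked.txt is found by neither. rg and grep -r, by contrast, read whatever is on disk.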

Speed tests on Linux Ubuntu 22.04 in my ~100 MB repository:

# ripgrep
# 0.014 sec avg.
time rg -F 'STATIC_ASSERT('

# git grep [not quite the same thing; see above; but shown here for 
# speed comparison]
# Effectively tied with ripgrep in this repository. 
# 0.013 sec avg.
time git grep -nF 'STATIC_ASSERT('

# Slowest
# 0.027 sec avg.
time grep -rnF 'STATIC_ASSERT('

5. Find only where a certain string changed, not just where it is present

See the "pickaxe" answer by @VonC.

6. Find only the branch names or commit hashes where matches exist

To just get a list of branch names and/or commit hashes where matches were found, just pipe any of the above commands to:

| cut -d ':' -f 1 | sort -u

Example:

# Get a list of branch names where matches were found
time git branch | awk '{print $NF}' \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder" \
    | cut -d ':' -f 1 | sort -u

Explanation:

  1. cut -d ':' -f 1 cuts the output at each : character and prints only the first field (column) (-f 1) of each line, which is the branch name or commit hash.
  2. sort -u sorts the output and removes duplicates. The -u option stands for 'u'nique.
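The effect of that filter can be checked on its own, without a repository. The sketch below feeds it a few made-up lines in the branch_name:path:line_number:text form; the branch and file names are hypothetical.

```shell
# Hypothetical `git grep -n` output lines from two branches.
sample='main:some_file.c:123:    myFunction();
main:other_file.c:7:myFunction();
feature/login:some_file.c:130:    myFunction();'

# Keep only the first colon-separated field, then de-duplicate.
branches="$(printf '%s\n' "$sample" | cut -d ':' -f 1 | sort -u)"
printf '%s\n' "$branches"
```

This prints feature/login and main, one per line, even though main matched twice.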

7. Alternative forms, and additional explanations

This:

# works
time git rev-list --all \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {}

Can also be done like this:

# works; same as above
time git rev-list --all \
    | xargs -n 1 -P "$(nproc)" git --no-pager grep 'my regex search'

But there's a big problem with extending that second form. If you add -- "path/to/my_file.c" "path/to/my_folder" to the end of it, it does not work! This is because xargs without -I always appends its input, here one commit hash at a time (due to -n 1) from the git rev-list --all output, to the very end of the command it runs, meaning that the hash gets tacked on after the -- "path/to/my_file.c" "path/to/my_folder" part.

So, this:

# Does NOT work!
time git rev-list --all \
    | xargs -n 1 -P "$(nproc)" git --no-pager grep 'my regex search' \
    -- "path/to/my_file.c" "path/to/my_folder"

turns the git grep command into this:

git --no-pager grep 'my regex search' \
    -- "path/to/my_file.c" "path/to/my_folder" commit_hash

And now the commit_hash part is tacked onto the end in the wrong spot! It should be before the -- part, but instead xargs shoves it on after, and now the git grep command is broken!

So the solution is this:

# works (best)
time git rev-list --all \
    | xargs -P "$(nproc)" -I {} git --no-pager grep -n 'my regex search' {} \
    -- "path/to/my_file.c" "path/to/my_folder"

...or even this, which is much hackier I think, but also works:

# also works
time git rev-list --all \
    | xargs -n 1 -P "$(nproc)" \
    sh -c 'git --no-pager grep "STATIC_ASSERT" "$0" -- my_file.c'

In the last example just above, sh -c runs a small shell script, and xargs appends a single (due to -n 1) commit hash from git rev-list --all after that script. sh -c assigns the first argument that follows the script to $0, which is why "$0" expands to the commit hash. And again, as in every other example above, the -- tells git grep that all of the arguments after that point are file or folder paths, and not more options to git grep.
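A safe way to see where xargs places its argument is to put echo in front of the command, so the command line is printed instead of executed. In this sketch, abc1234 is a made-up stand-in for a commit hash and 'pat' for the search pattern.

```shell
# Without -I, xargs appends the argument to the very end of the command.
appended="$(printf 'abc1234\n' \
    | xargs -n 1 echo git grep 'pat' -- my_file.c)"

# With -I {}, the argument is substituted exactly where {} appears.
substituted="$(printf 'abc1234\n' \
    | xargs -I {} echo git grep 'pat' {} -- my_file.c)"

printf '%s\n' "$appended"
printf '%s\n' "$substituted"
```

The first line ends in "-- my_file.c abc1234", so git grep would treat the hash as a path. The second has the hash before the --, where a revision belongs.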

References

  1. My question and this answer by @wjandrea here: Git Grep sometimes makes terminal stop showing typed commands.
  2. A ton of my own testing and research.
  3. GitHub Copilot AI in VSCode. I had a lot of chats with the GitHub Copilot AI in VSCode while figuring out all of the above. This answer is my own content and wording and I've tested and studied each command here to understand all parts.
  4. Is it possible to perform a 'grep search' in all the branches of a Git project?
  5. Using Git, how could I search for a string across all branches?

See also

  1. My answer: git grep by file extensions
  2. My answer which references this answer: Using Git, how could I search for a string across all branches?
Sada answered 31/1 at 23:48 Comment(0)
E
1
git rev-list --all | xargs -n 5 git grep EXPRESSION

is a tweak to Jeet's solution, so it shows results while it searches and not just at the end (which can take a long time in a large repository).

Eames answered 19/12, 2017 at 18:59 Comment(1)
It gives "real-time" results by running git grep on 5 revisions at a time, for anyone who was curious.Romo
A
0

So are you trying to grep through older versions of the code looking to see where something last exists?

If I were doing this, I would probably use git bisect. Using bisect, you can specify a known good version, a known bad version, and a simple script that does a check to see if the version is good or bad (in this case a grep to see if the code you are looking for is present). Running this will find when the code was removed.
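A minimal sketch of that idea, in a throwaway repository, could look like the following. The file code.txt and the string needle are made up, and the sketch assumes the string is present at the known good commit and was removed only once afterwards. grep -q exits 0 while the string is present ("good") and non-zero once it is gone ("bad"), which is what git bisect run expects.

```shell
set -eu

# Throwaway repository; every name below is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commits 1-2 contain the string "needle"; commits 3-5 do not.
for i in 1 2 3 4 5; do
    if [ "$i" -lt 3 ]; then
        printf 'needle\n' > code.txt
    else
        printf 'nothing here\n' > code.txt
    fi
    printf '%s\n' "$i" > counter.txt    # guarantees each commit has a change
    git add code.txt counter.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then known_good="$(git rev-parse HEAD)"; fi
    if [ "$i" -eq 3 ]; then removed_in="$(git rev-parse HEAD)"; fi
done

# Bad = HEAD (string absent), good = first commit (string present).
git bisect start HEAD "$known_good" > /dev/null
git bisect run grep -q 'needle' code.txt > /dev/null
first_bad="$(git rev-parse refs/bisect/bad)"
git bisect reset > /dev/null 2>&1

printf 'String was removed in: %s\n' "$first_bad"
```

The commit reported is the one that removed the string, found after testing only a couple of the intermediate commits.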

Anchorite answered 28/5, 2010 at 11:52 Comment(4)
Yes, but your "test" can be a script that greps for the code and returns "true" if the code exists and "false" if it does not.Anchorite
Well, what if the code was bad in revision 10, became good in revision 11 and became bad again in revision 15...Udometer
I agree with Paolo. Binary search is only appropriate for "ordered" values. In the case of git bisect, this means all "good" revisions come before all "bad" revisions, starting from the reference point, but that assumption can't be made when looking for transitory code. This solution might work in some cases, but it isn't a good general purpose solution.Compound
I think this is highly inefficient as the whole tree is checked out multiple times for bisect.Escent
O
0

Scenario: You did a big cleanup of your code using your IDE. Problem: The IDE cleaned up more than it should have, and now your code does not compile (missing resources, etc.)

Solution:

git grep --cached "text_to_find"

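This searches the index (the staged versions of tracked files) instead of the working tree, so it lists the files that still contain "text_to_find" in their last staged state, even if the IDE removed it from the files on disk.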

You can now undo this change and compile your code.

Outrage answered 17/1, 2019 at 11:48 Comment(0)
H
0

A. Full, unique, sorted, paths:

# Get all unique filepaths of files matching 'password'
# Source: https://mcmap.net/q/11999/-how-to-grep-git-commits-for-a-certain-word
git rev-list --all | (
    while read -r revision; do
        # Strip the leading "<revision>:" so that only the path remains
        git grep -F --files-with-matches 'password' "$revision" | sed "s/[^:]*://"
    done
) | sort | uniq

B. Unique, sorted, filenames (not paths):

# Get all unique filenames matching 'password'
# Source: https://mcmap.net/q/11999/-how-to-grep-git-commits-for-a-certain-word
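git rev-list --all | (
    while read -r revision; do
        # Strip the leading "<revision>:" so that only the path remains
        git grep -F --files-with-matches 'password' "$revision" | sed "s/[^:]*://"
    done
) | xargs -n 1 basename | sort | uniq    # -n 1: basename handles one path per call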

This second command is useful for BFG, because it only accepts file names and not repository-relative or system-absolute paths.

Check out my full answer here for more explanation.

Hypostyle answered 25/10, 2021 at 21:58 Comment(0)
T
0

Command to search in git history

git log -S"alter" --author="authorname" --since=2021.1.1 --until=2023.1.1 -- .
Theresiatheresina answered 12/11, 2022 at 7:40 Comment(0)
P
-1

Another solution, for Windows and PowerShell, is below:

git rev-list --all | ForEach-Object { git grep <expression> $_ }

You need to replace <expression> with your regular expression.

Preside answered 4/7, 2023 at 23:26 Comment(0)
