ripgrep to find files which contain multiple words, possibly on different lines?

If I have files like:

$ cat file1.txt
foo
bar
$ cat file2.txt
foo
baz
$ cat file3.txt
bar
baz

Is there a ripgrep command (or something similar) that finds files containing both foo and bar? For example, it should display file1.txt but not the other two files. (Note that foo and bar might not be on the same line.)

And a second question, to get even fancier: is there some syntax to match files containing foo but exclude them if they also contain bar, so that only file2.txt is displayed?

Thanks!

Presumptive answered 26/5, 2021 at 23:12 Comment(2)
What does git have to do with any of this? (I haven't used ripgrep so I don't have an answer to that, but I just read this question in the Git tag.) - Acculturate
You are expected to show what you have tried to solve this yourself. Anyway, you can go through this discussion thread as a start: github.com/BurntSushi/ripgrep/discussions/1845 - Lichenin

Because rg accepts a list of files to search, you can run a second search over the results of the first:

rg "foo" $(rg "bar" -l)

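The inner command lists the files that contain "bar", and the outer command searches only those files for "foo", so every match shown comes from a file that has both "bar" and "foo" in it.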
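Two caveats with the command substitution: if no file contains "bar", the inner command prints nothing and the outer rg typically falls back to searching the whole current directory, and file names containing spaces get split into separate arguments. Here is a minimal sketch that avoids both by passing NUL-separated names through xargs, and that prints only the file names. The names files_with_both and SEARCH are made up for illustration, the grep fallback is an assumption for machines without rg, and xargs -r is assumed to be available (GNU and recent BSD xargs have it).

```shell
#!/bin/sh
# Use ripgrep when installed, otherwise fall back to grep (assumption:
# both understand -l, -e and --null in the same way for this purpose).
if command -v rg >/dev/null 2>&1; then SEARCH=rg; else SEARCH=grep; fi

# files_with_both WORD1 WORD2 FILE...
# Hypothetical helper: prints the files that contain both words,
# on any lines and in any order.
files_with_both() {
    fwb_first=$1
    fwb_second=$2
    shift 2
    # Stage 1: list files containing the first word, NUL-terminated.
    # Stage 2: search only those files for the second word.
    # xargs -r skips stage 2 entirely when stage 1 finds nothing.
    "$SEARCH" -l --null -e "$fwb_first" -- "$@" |
        xargs -0 -r "$SEARCH" -l -e "$fwb_second" --
}
```

With the three files from the question, `files_with_both foo bar file*.txt` should list only file1.txt. The exit status comes from xargs, so it is non-zero when nothing matches.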

Paduasoy answered 7/2, 2023 at 11:0 Comment(1)
Simple and elegant, thanks! - Syconium

ripgrep uses the Rust regex crate by default. It is quite good, but it lacks a couple of features, namely look-ahead and backreferences. ripgrep also supports PCRE2, which does have look-ahead and backreferences, but PCRE2 support is an optional feature that must be compiled in. Some prebuilt binaries already include it; otherwise, build ripgrep from source with the feature enabled (this requires a Rust toolchain):

$ cargo build --release --features 'pcre2'

Regarding searching in files for text that may be separated by newlines, default ripgrep provides a multiline option:

$ rg --multiline 'foo.*bar' # can be shortened to -U

However, . matches any character except a newline, so newlines have to be matched explicitly:

$ rg -U 'foo.*[\n\r]*.*bar' *.txt # troublesome...

This becomes unwieldy when more than two lines are involved, so another technique is to pass --multiline-dotall, which makes . match newlines as well:

$ rg -U --multiline-dotall 'foo.*bar' *.txt

or put the inline flag (?s) at the start of the pattern, which has the same effect:

$ rg -U '(?s)foo.*bar' *.txt

Result:

$ echo -e 'foo\nbar' > file1.txt
$ echo -e 'foo\nbaz' > file2.txt
$ echo -e 'bar\nbaz' > file3.txt

$ rg -U '(?s)foo.*bar' file*.txt
file1.txt
1:foo
2:bar

To find all files with 'foo' that do not also have 'bar' afterwards, look-ahead is needed or, more specifically, negative look-ahead. This works only in a ripgrep built with PCRE2 support:

$ rg -U --pcre2 '(?s)foo.*^(?!.*bar)'
file2.txt
1:foo
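
The look-ahead pattern only rules out a 'bar' that comes after 'foo'. An order-independent sketch that needs no PCRE2 is to list the files without 'bar' first and then search just those for 'foo'. It relies on the --files-without-match flag, which both rg and grep provide. The names files_with_only and SEARCH are made up for illustration, the grep fallback is an assumption for machines without rg, and xargs -r is assumed to be available (GNU and recent BSD xargs have it).

```shell
#!/bin/sh
# Use ripgrep when installed, otherwise fall back to grep (assumption:
# both understand --files-without-match, -l, -e and --null alike).
if command -v rg >/dev/null 2>&1; then SEARCH=rg; else SEARCH=grep; fi

# files_with_only WANTED UNWANTED FILE...
# Hypothetical helper: prints the files that contain WANTED and do not
# contain UNWANTED anywhere, regardless of which would come first.
files_with_only() {
    fwo_wanted=$1
    fwo_unwanted=$2
    shift 2
    # Stage 1: list files with no match for the unwanted word, NUL-terminated.
    # Stage 2: of those, list the files that contain the wanted word.
    # xargs -r skips stage 2 entirely when stage 1 finds nothing.
    "$SEARCH" --files-without-match --null -e "$fwo_unwanted" -- "$@" |
        xargs -0 -r "$SEARCH" -l -e "$fwo_wanted" --
}
```

With the three files from the question, `files_with_only foo bar file*.txt` should list only file2.txt.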
Archfiend answered 28/1, 2022 at 19:45 Comment(1)
I think this is order dependent. For example, if bar is first and foo is later, would these commands still work? - Paduasoy
