grep or ripgrep: How to find only files that match multiple patterns (not only on the same line)?

I'm looking for a fast way to find all files in a folder that contain two or more patterns.

grep -l -e foo -e bar ./* or rg -l -e foo -e bar

Both commands list every file that contains 'foo' or 'bar' anywhere, whether on the same line or on different lines. I want only files that have at least one 'foo' match AND at least one 'bar' match on different lines. Files that have only 'foo' matches or only 'bar' matches should be filtered out.

I know I could chain the grep calls but this will be too slow.

Hallucinosis answered 26/11, 2019 at 14:4 Comment(0)
15

This doesn't perfectly answer the question, but it is the Stack Overflow question that comes up every time I google "ripgrep multiple patterns", so I'm leaving my answer here for future googlers (including myself).

I primarily work in PowerShell, so this is how I perform an AND search with ripgrep in PowerShell. It also returns files where both patterns occur on the same line, which is why it's not a perfect answer, but it identifies files that match both patterns and runs relatively quickly:

rg -l 'SecondSearchPattern' (rg -l 'FirstSearchPattern')

Explanation:

  • First the parens run: rg -l 'FirstSearchPattern', which searches all files for the pattern FirstSearchPattern. By using -l it returns a list of file paths only.

  • By placing it in (parentheses), it runs the whole command first, then "splats" the results of the command into the external rg command.

  • The external rg command is now run like this:

    rg -l 'SecondSearchPattern' "file.txt" "directory\file.txt"

    And yes, it does put them into quotes, so it handles paths with spaces. This searches all provided files that match the pattern SecondSearchPattern. Thus returning only files that match both patterns.

You can go one step further and add on | Get-Item (| gi) to return filesystem objects, and | % FullName to get the full path.

rg -l 'SecondSearchPattern' (rg -l 'FirstSearchPattern') | gi | % FullName
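
The narrowing logic behind the nested call can be sketched in plain Python, without rg, to make the semantics explicit: each pattern is searched only in the files that survived the previous one. This is a minimal illustration under assumptions, not a replacement for rg: `files_matching_all` is a hypothetical helper name, whole files are read into memory, and patterns are treated as Python `re` expressions rather than ripgrep's regex syntax.

```python
import re
import tempfile
from pathlib import Path


def files_matching_all(paths, patterns):
    """Return the paths whose text matches every regex in `patterns`.

    Hypothetical helper: each pattern only searches the files that
    survived the previous pattern, like the nested rg -l calls.
    """
    survivors = list(paths)
    for pattern in patterns:
        regex = re.compile(pattern)
        survivors = [
            p for p in survivors
            if regex.search(Path(p).read_text(encoding="utf-8", errors="replace"))
        ]
        if not survivors:  # nothing left to narrow further
            break
    return survivors


if __name__ == "__main__":
    # Small demo on throwaway files, including a file name with a space.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "both.txt").write_text("foo\nbar\n")
        (root / "only foo.txt").write_text("foo\n")
        hits = files_matching_all(sorted(root.iterdir()), ["foo", "bar"])
        print([p.name for p in hits])
```

Like the rg version, this also accepts files where both patterns sit on the same line.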
Stunk answered 29/3, 2022 at 20:52 Comment(1)
Also works in bash: rg -l 'SecondSearchPattern' $(rg -l 'FirstSearchPattern') (comment by Estus)
10

rg with multiline does work, but it prints everything in between the two criteria as the result, and sometimes that's not useful.

For the use case of chaining searches (e.g. in HTML, JSON, etc.), where the first criterion just narrows down the files and the second criterion is what I am actually looking for, this is a possible solution:

rg -0 -l crit1 | xargs -0 -I % rg -H crit2 %

Alternatively, I have just discovered ugrep, which supports combining multiple criteria with boolean operators at both line and file level. This is quite something. It's a bit slower than rg + xargs, but it nicely prints all lines matching any of the criteria from the files that match all of them (instead of showing only matches for the last criterion, as above):

ugrep --files -e crit1 --and -e crit2
Umlaut answered 28/3, 2022 at 13:3 Comment(0)
6

If you want to search for two or more words that occur on different lines, you can use ripgrep's --multiline-dotall option in addition to -U/--multiline. You also need to search for foo before bar and for bar before foo, using the | operator:

rg -lU --multiline-dotall 'foo.*bar|bar.*foo' .

For any number of words you'll need to | all permutations of those words. For that I use a small python script (which I called rga) which searches in the current directory (and downwards), for files that contain all arguments given on the commandline:

#! /opt/util/py310/bin/python

import sys
import subprocess
from itertools import permutations

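# Build one alternative per ordering of the arguments, e.g. 'foo.*bar|bar.*foo'
rgarg = '|'.join(('.*'.join(x) for x in permutations(sys.argv[1:])))
# -l lists file names only; -U enables multiline, --multiline-dotall lets '.' match newlines
cmd = ['rg', '-lU', '--multiline-dotall', rgarg, '.']
# print(' '.join(cmd))
proc = subprocess.run(cmd, capture_output=True)
sys.stdout.write(proc.stdout.decode('utf-8'))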

I have searched successfully with six arguments; beyond that the command line becomes too long. There are probably ways around that, such as saving the pattern to a file and passing it with -f file_name, but I never needed or investigated that.
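
To see why the command line grows so quickly, here is a small sketch that builds the same kind of pattern and reports its size. `build_pattern` is a hypothetical name for the expression used in the script, and the `word0`, `word1`, ... arguments are placeholders. The point is only that n words produce n! alternatives; the exact length at which a given system rejects the command is not something this sketch determines.

```python
from itertools import permutations
from math import factorial


def build_pattern(words):
    """Join every ordering of `words` with '.*', then OR the orderings."""
    return "|".join(".*".join(order) for order in permutations(words))


for n in range(2, 7):
    words = [f"word{i}" for i in range(n)]  # placeholder search terms
    pattern = build_pattern(words)
    # n words give n! alternatives, so the pattern length grows factorially.
    print(n, factorial(n), len(pattern))
```

For two words the result is the pattern shown earlier: `build_pattern(["foo", "bar"])` gives `foo.*bar|bar.*foo`.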

Lop answered 8/11, 2021 at 7:21 Comment(0)
4
$ cat f1
afoot
2bar
$ cat f2
foo bar
$ cat f3
foot
$ cat f4
bar
$ cat f5
barred
123
foo3

$ rg -Ul '(?s)foo.*?\n.*?bar|bar.*?\n.*?foo'
f5
f1

You can use the -U option to match across lines. The s flag makes . match newlines as well. Since you want the matches to be on different lines, you also need to match a newline character between the search terms.
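
The effect of the pattern can be checked against the five sample files with a short Python sketch. Python's `re` module is used here only to illustrate the logic; it is not the regex engine ripgrep uses, although this particular pattern happens to be valid in both.

```python
import re

# Same pattern as in the rg call above. Python's `re` only illustrates
# the logic here; it is not the engine ripgrep uses.
PATTERN = re.compile(r"(?s)foo.*?\n.*?bar|bar.*?\n.*?foo")

# Contents of the sample files f1..f5 shown above.
SAMPLES = {
    "f1": "afoot\n2bar\n",
    "f2": "foo bar\n",
    "f3": "foot\n",
    "f4": "bar\n",
    "f5": "barred\n123\nfoo3\n",
}

matching = [name for name, text in SAMPLES.items() if PATTERN.search(text)]
print(matching)  # prints ['f1', 'f5']
```

f2 is rejected because the required newline between the two terms rules out same-line matches.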

Reitz answered 26/11, 2019 at 14:16 Comment(0)
2

You can add the following function (tested in zsh):

multisearch() {
  case $# in
    0) return 1 ;;
    1) rg $1; return ;;
  esac

  local lastArg=${@[${#}]}
  local files=(`rg --files-with-matches ${1}`)

  (( ${#files} )) || return 0

  # skip first and last arg
  for arg in ${@:2:# - 2}; do
    files=(`rg --files-with-matches ${arg} ${files[@]}`)

    (( ${#files} )) || return 0
  done

  rg ${lastArg} ${files[@]}
}

and use it like this:

$ multisearch foo bar
Wieren answered 18/6, 2023 at 18:13 Comment(0)
0

rg 'text1|text2'

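Note that | is alternation (OR): this finds lines containing text1 or text2, so files that contain both are found, but files containing only one of the two are reported as well.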

Olwena answered 17/7 at 7:7 Comment(0)
