Piping find results into grep for fast directory exclusion

3

10

I am successfully using find to list all files under the current directory, excluding those in the subdirectory "cache." Here's my first bit of code:

find . -wholename './cach*' -prune -o -print

I now wish to pipe this into a grep command. It seems like that should be simple:

find . -wholename './cach*' -prune -o -print | xargs grep -r -R -i "samson"

... but this is returning results that are mostly from the cache directory. I've tried removing the xargs reference, but that does what you'd expect, running the grep on text of the file names, rather than on the files themselves. My goal is to find "samson" in any files that aren't cached content.

I'll probably get around this issue by just using doubled greps in this instance, but I'm very curious about why this one-liner behaves this way. I'd love to hear thoughts on a way to modify it while still using these two commands (as there are speed advantages to doing it this way).

(This is in CentOS 5, btw.)

Funk answered 19/7, 2012 at 16:41 Comment(0)
9

The wholename match may be the reason why it's still including "cache" files. If you're executing the find command in the directory that contains the "cache" folder, it should work. If not, try changing it to -name '*cache*' instead.

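Also, you do not need -r or -R for your grep. Those flags tell it to recurse through directories, but you're already handing it individual files. Worse, find also prints directory names (including . itself), so a recursive grep walks straight back into the directory you pruned.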

You can update your command using the piped version, or a single command:

find . -name '*cache*' -prune -o -type f -print0 | xargs -0 grep -il "samson"

or

find . -name '*cache*' -prune -o -type f -exec grep -iq "samson" {} \; -print

Note that -l in the first command tells grep to list only the names of matching files, not the matching lines. In the second, -q makes grep print nothing and report a match only through its exit status, so find's -print runs only for files that matched. The -type f test keeps directories out of grep's argument list.
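To make the difference between -l and -q concrete, here is a minimal sketch. The temporary directory and the file names match.txt and other.txt are made up for illustration and are not part of the question.

```shell
#!/bin/sh
# Throwaway files (hypothetical names) to compare -l and -q.
demo=$(mktemp -d)
printf 'Samson was here\n' > "$demo/match.txt"
printf 'nothing to see\n'  > "$demo/other.txt"

# -l prints the name of each matching file instead of the matching lines.
listed=$(grep -il "samson" "$demo/match.txt" "$demo/other.txt")

# -q prints nothing at all; the result is carried by the exit status alone.
grep -iq "samson" "$demo/match.txt"; hit=$?
grep -iq "samson" "$demo/other.txt"; miss=$?

echo "listed=$listed hit=$hit miss=$miss"
```

An exit status of 0 counts as "true" for find's -exec, which is why the -print that follows it fires only for matching files.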

Giselegisella answered 19/7, 2012 at 16:52 Comment(2)
Thanks! The removal of recursion is what did the trick for me. (Old habits die hard. Incidentally, that was a mistype on my part, as I usually use "-r -i -I", which makes a lot more sense than the redundant recursion flags.) The "wholename" part was fine, since the unwanted subdirectory is indeed in the root level of the current directory. So it's now: find . -wholename './cach*' -prune -o -print | xargs grep -i -I "samson" – Funk
Awesome, glad it was something simple =] – Giselegisella
3

Use find's -exec option instead of piping the results to another command. With grep "samson" {} \; find runs grep once per file; ending the command with + instead passes many files to each grep invocation, which is faster.

For example:

find . -wholename './cach*' -prune -o -type f -exec grep "samson" "{}" +
Melesa answered 19/7, 2012 at 16:44 Comment(0)
3

You've told grep itself to recurse (twice! -r and -R are synonyms). Since one of the arguments you're passing is . (the top directory), grep is searching in every file (some of them twice, or even more if they're in subdirectories).
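A small reproduction of this, as a sketch only: the directory layout (cache/, src/, hit.txt) is invented for the demo, and -path is used as the portable spelling of -wholename.

```shell
#!/bin/sh
# Build a throwaway tree (hypothetical names) to reproduce the behaviour.
demo=$(mktemp -d)
mkdir -p "$demo/cache" "$demo/src"
echo "samson in cache" > "$demo/cache/hit.txt"
echo "samson in src"   > "$demo/src/hit.txt"
cd "$demo" || exit 1

# The first path find prints is "." itself; only ./cache is pruned.
first=$(find . -path './cach*' -prune -o -print | head -n 1)
echo "first argument handed to grep: $first"

# With -r, grep starts again from "." and descends into cache/ anyway.
recursive=$(find . -path './cach*' -prune -o -print | xargs grep -r -i -l "samson" | sort -u)

# Without -r, and with -type f so that no directories are passed, cache/ stays out.
flat=$(find . -path './cach*' -prune -o -type f -print | xargs grep -i -l "samson" | sort -u)

echo "recursive:"; echo "$recursive"
echo "flat:";      echo "$flat"
```

The recursive variant lists ./cache/hit.txt even though find never printed it, because grep rediscovers it on its own.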

If you're going to use find and grep, do this:

find . -path './cach*' -prune -o -type f -print0 | xargs -0 grep -i "samson"

Using -print0 and -0 makes the pipeline work even with file names that contain spaces, quotes, or newlines, and -type f keeps directories (including . itself) out of grep's argument list.
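Here is a quick illustration of what goes wrong without the NUL delimiters. The directory "my docs" and the file "note one.txt" are made-up names, chosen only because they contain spaces.

```shell
#!/bin/sh
# Throwaway tree with spaces in the names (hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/my docs"
echo "samson" > "$demo/my docs/note one.txt"
cd "$demo" || exit 1

# Newline-delimited: xargs splits "./my docs/note one.txt" on the spaces,
# so grep is handed three paths that do not exist (errors silenced here).
plain=$(find . -type f -print | xargs grep -il "samson" 2>/dev/null)

# NUL-delimited: each path reaches grep as a single argument.
safe=$(find . -type f -print0 | xargs -0 grep -il "samson")

echo "plain: '$plain'"
echo "safe:  '$safe'"
```

The first pipeline finds nothing, while the second reports the file.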

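However, you probably don't need to bother with find here, since GNU grep (2.5.3 and later) is capable of excluding directories. The stock grep on an older system such as CentOS 5 may predate that option, so check grep --version first: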

grep -R --exclude-dir='cach*' -i "samson" .

(This also excludes ./deeply/nested/directory/cache. If you only want to exclude cache directories at the toplevel, use find as you did.)
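The difference in scope can be checked with a sketch like the following. It assumes a GNU grep that supports --exclude-dir, and the tree (cache/, deeply/nested/cache/, src/) is invented for the demo.

```shell
#!/bin/sh
# Throwaway tree (hypothetical names) with a top-level and a nested cache directory.
demo=$(mktemp -d)
mkdir -p "$demo/cache" "$demo/deeply/nested/cache" "$demo/src"
echo "samson" > "$demo/cache/a.txt"
echo "samson" > "$demo/deeply/nested/cache/b.txt"
echo "samson" > "$demo/src/c.txt"
cd "$demo" || exit 1

# grep matches the directory's base name, so cache directories are skipped at any depth.
by_grep=$(grep -R --exclude-dir='cach*' -il "samson" . | sort)

# find -path './cach*' only matches paths that begin with ./cach, i.e. the top level.
by_find=$(find . -path './cach*' -prune -o -type f -print0 | xargs -0 grep -il "samson" | sort)

echo "grep --exclude-dir:"; echo "$by_grep"
echo "find -path:";         echo "$by_find"
```

Only the find version still reports the file under ./deeply/nested/cache.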

Fruiterer answered 19/7, 2012 at 17:1 Comment(3)
If there are too many files in the current folder/path, the single grep will return a "too many arguments" error - so you'll need to be careful with that by itself. – Giselegisella
Thanks for catching this! As mentioned in the "accepted" answer, cleaning that up fixed things right away. You guys are great. – Funk
@Giselegisella No, a “too many arguments” error would come from the shell, if the command line was too long (e.g. if I'd written grep … * and there were a lot of files). Here there is no shell globbing, the command line is exactly 43 characters. – Goerke
