gnu find: apply -regex on basename only

Asked 24/4, 2012 at 5:51 Answered 27/8, 2022 at 9:40

I want to search for files whose basenames match a set of regexes. I tried this:

$ find  '/my/path' -regextype posix-extended -regex 'reg1' -regex 'reg2'

My problem is that -regex is tested against the full path. I'd like to test only the basename of each file.

Libelee asked 24/4, 2012 at 5:51

GNU find has no regex test that applies only to the basename, which is unfortunate. The closest we can get is to prefix the regex with .*/ so that the prefix consumes the leading directory components of the path:

find /my/path -regextype posix-extended -regex ".*/reg1"

This will work for normal Linux pathnames, but could fail for pathnames with unusual characters (newlines, for example).
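To see what the .*/ prefix buys you, you can build a throwaway tree and compare two patterns. This is a minimal sketch that assumes GNU find and a POSIX shell. The names notes.d, notes.txt and readme are hypothetical, made up for the demo. The second pattern shows why the regex itself must not match across slashes: an unrestricted .* after the prefix still crosses directory boundaries.

```shell
# Throwaway tree; every name here is hypothetical.
root=$(mktemp -d)
mkdir -p "$root/notes.d/sub"
touch "$root/notes.d/sub/readme" "$root/notes.txt"

# The leading .*/ consumes the directories, so the rest must match a basename.
simple=$(find "$root" -regextype posix-extended -regex '.*/notes\.[a-z]+' \
    | sed "s|^$root/||" | LC_ALL=C sort)

# An unrestricted .* after the prefix can still run across later slashes.
greedy=$(find "$root" -regextype posix-extended -regex '.*/notes.*' \
    | sed "s|^$root/||" | LC_ALL=C sort)

printf 'simple:\n%s\ngreedy:\n%s\n' "$simple" "$greedy"
rm -rf "$root"
```

The first pattern lists only notes.d and notes.txt. The second pattern also lists notes.d/sub and notes.d/sub/readme.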

As geekosaur points out, your input regular expressions must not match across multiple path components. If you don't have any control over the regex (say, if it's passed in as a variable $REG1), you can try mangling it in bash to turn every . into [^/]:

find /my/path -regextype posix-extended -regex ".*/${REG1//./[^/]}"

This is going to fail for a lot of regular expressions. An escaped dot gets mangled too, so '.*\.txt' becomes '[^/]*\[^/]txt', and a bracket expression such as [.] is broken as well. However, if you know that the regexes are going to be simple, it might work.
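The following sketch prints what the bash substitution does to a few sample regexes, so you can judge whether your inputs survive it. It assumes bash, because ${var//pattern/replacement} is not POSIX sh. The helper name mangle is hypothetical.

```shell
# Bash-only: ${var//pattern/replacement} replaces every match, not just the first.
mangle() {
    printf '%s\n' "${1//./[^/]}"
}

# Sample inputs; the last two show escaped dots and bracket expressions breaking.
for REG1 in 'reg.1' '.*txt' '.*\.txt' 'a[.]b'; do
    printf '%-10s -> %s\n' "$REG1" "$(mangle "$REG1")"
done
```

Both simple inputs come out usable. The two inputs at the end become '[^/]*\[^/]txt' and 'a[[^/]]b', and neither means what the original regex meant.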

For a slower but more robust solution, you can do all the pattern matching inside an -exec block:

find /my/path -exec bash -c 'basename -- "$0" | grep -Eq -- "$1"' {} "$REG1" ';' -print

The logic here is that find enumerates all files and runs a small bash script for each one, with the path in $0 and the regex in $1. The script uses basename and grep -E to test only the basename against the regex. Because -exec counts as true only when the command exits with status 0, -print then outputs just the matching paths. Passing $REG1 as an argument, rather than splicing it into the script text, avoids quoting problems when the regex contains a single quote. Note that grep finds partial matches; if you want the regex to match the full basename, use grep -Exq -- "$1" inside the script.

Depending on the semantics of the input regular expression (e.g. if $REG1 is intended to match any substring of the basename), you can get better performance by first searching for the regex in the whole path with find itself and only then filtering on the basename:

find /my/path -regextype posix-extended -regex ".*(${REG1}).*" \
    -exec bash -c 'basename -- "$0" | grep -Eq -- "$1"' {} "$REG1" ';' -print
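The variant below may be faster than starting a shell plus basename and grep for every file. It uses two techniques the answer itself does not use. First, -exec ... {} + passes paths to bash in batches. Second, bash's own [[ =~ ]] operator tests ${p##*/} (the text after the last slash) against the regex. This is a minimal sketch assuming bash and GNU find. bash's =~ uses the system's POSIX extended regex engine, so its dialect is close to posix-extended but may not match it in every detail. The tree and the sample REG1 value are hypothetical.

```shell
# Throwaway tree; names and the sample regex are hypothetical.
root=$(mktemp -d)
mkdir -p "$root/a.mp3.d"
touch "$root/song.mp3" "$root/a.mp3.d/cover.jpg" "$root/notes.txt"
REG1='\.mp3$'

# {} + hands paths to bash in batches; ${p##*/} strips everything up to the
# last slash, and [[ =~ ]] tests that basename against the ERE in $re.
matches=$(find "$root" -exec bash -c '
    re=$1; shift
    for p; do
        if [[ ${p##*/} =~ $re ]]; then
            printf "%s\n" "$p"
        fi
    done
' bash "$REG1" {} + | sed "s|^$root/||" | LC_ALL=C sort)

printf '%s\n' "$matches"
rm -rf "$root"
```

Only song.mp3 is reported. A whole-path test would also have caught a.mp3.d/cover.jpg, because ".mp3" appears in its directory name.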

Blackington answered 26/11, 2019 at 16:06

If you don't need full paths, you could use something like this. It prints only the basenames and then searches them for the pattern:

find . -printf '%f\n' | grep -E '\.mp3$'
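If you do need the paths, one possible extension prints the basename and the path side by side, filters on the basename, and keeps the path. This is a minimal sketch assuming GNU find's -printf (%f is the basename, %p is the path). It also assumes that no name contains a tab or newline, which would break the field split. The file names are hypothetical.

```shell
# Throwaway tree; names are hypothetical.
root=$(mktemp -d)
mkdir -p "$root/x.mp3.d"
touch "$root/track.mp3" "$root/x.mp3.d/art.png"

# %f is the basename, %p the path as find reports it.
# awk tests the ERE against field 1 only and prints field 2.
# Names containing tabs or newlines would break this.
paths=$(find "$root" -printf '%f\t%p\n' \
    | awk -F '\t' '$1 ~ /\.mp3$/ { print $2 }' | sed "s|^$root/||")

printf '%s\n' "$paths"
rm -rf "$root"
```

Only track.mp3 is printed. x.mp3.d/art.png is skipped because its own basename does not end in .mp3.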

There is also a tool, fd (packaged as fdfind on some distributions, such as Debian and Ubuntu), that matches only against the basename by default.

From the man page:

-p, --full-path
       By default, the search pattern is only matched against the filename (or directory
       name). Using this flag, the pattern is matched against the full path.

So with the command below, the path of each match is printed while the regex is tested only against the basename:

fdfind --regex ".*\.mp3$"

Uttermost answered 27/8, 2022 at 9:40


You would need to anchor the regex, with something like

find /my/path -regextype posix-extended -regex 'mumble$'

where mumble must be written so that it cannot match / characters (for example, you could not use .*; you would need to write [^/]*).
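The following sketch tests the implied ^...$ anchoring that the comments point out. GNU find's -regex must match the whole path, so a pattern needs a leading .*/ before it can be restricted to the last component. The sketch assumes GNU find, and the file names are hypothetical.

```shell
# Throwaway tree; names are hypothetical.
root=$(mktemp -d)
touch "$root/mumble" "$root/we_mumble" "$root/mumble.txt"

# 'mumble$' alone matches nothing here: it would have to cover the whole path.
unanchored=$(find "$root" -regextype posix-extended -regex 'mumble$' | wc -l)

# Exact basename "mumble": the leading .*/ absorbs the directories.
exact=$(find "$root" -regextype posix-extended -regex '.*/mumble' \
    | sed "s|^$root/||")

# Basename ending in "mumble": [^/]* keeps the match inside the last component.
suffix=$(find "$root" -regextype posix-extended -regex '.*/[^/]*mumble' \
    | sed "s|^$root/||" | LC_ALL=C sort)

printf 'unanchored: %s\nexact: %s\nsuffix:\n%s\n' "$unanchored" "$exact" "$suffix"
rm -rf "$root"
```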

Oppenheimer answered 24/4, 2012 at 6:05

Yes, but... my commands are generated by a program and I don't control the regex. – Libelee 24/4, 2012 at 13:00

Actually, this solution matches any filename that ends with “mumble”, e.g., /my/path/we_mumble. – Ipa 29/7, 2019 at 9:09

Also, -regex must match the full path, so it has an implied ^...$ – Blackington 26/11, 2019 at 15:06
