rename multiple files splitting filenames by '_' and retaining first and last fields

Asked 30/8, 2021 at 10:57 Answered 30/8, 2021 at 15:39

Solved regex awk split rename file-rename

Say I have the following files:

a_b.txt               a_b_c.txt             a_b_c_d_e.txt         a_b_c_d_e_f_g_h_i.txt

I want to rename them in such a way that I split their filenames by _ and I retain the first and last field, so I end up with:

a_b.txt               a_c.txt             a_e.txt         a_i.txt

Thought it would be easy, but I'm a bit stuck...

I tried rename with the following regexp:

rename 's/^([^_]*).*([^_]*[.]txt)/$1_$2/' *.txt

But what I really need is to actually split the filename, so I thought of awk, but I'm not very proficient with it. This is what I have so far (I know at some point I should specify FS="_" and grab the first and last field somehow):

find . -name "*.txt" | awk -v mvcmd='mv "%s" "%s"\n' '{old=$0; <<split by _ here somehow and retain first and last fields>>; printf mvcmd,old,$0}'

Any help? I don't have a preferred method, but it would be nice to use this to learn awk. Thanks!

Bismuthinite answered 30/8, 2021 at 10:57 Comment(1)

regex is being used in rename command(in OP's efforts as well as in one of the answer), IMHO there is no reason for removing regex tag, so re-adding it now. In case you are removing it please do mention reason in comments, thank you. – Forster 30/8, 2021 at 11:25

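Your rename attempt was close. The problem is that the greedy .* swallows everything up to .txt, so the final group's [^_]* ends up matching the empty string; you need to pin the final group to the text right after the last underscore.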

rename 's/^([^_]*).*_([^_]*[.]txt)$/$1_$2/' *_*_*.txt

I added a _ before the last opening parenthesis (this is the crucial fix), and a $ anchor at the end, and also extended the wildcard so that you don't process any files which don't contain at least two underscores.
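
If you want to check the substitution before touching any files, a dry run along these lines might help. It is only a sketch: it feeds the same pattern to sed -E (my substitution for rename, so the backreferences are written \1 and \2 instead of $1 and $2), and the new_name helper is a made-up name for illustration.

```shell
# Hypothetical helper: print the name the rename expression would produce.
# Uses sed -E instead of rename, so nothing on disk is changed.
new_name() {
    printf '%s\n' "$1" | sed -E 's/^([^_]*).*_([^_]*[.]txt)$/\1_\2/'
}

# Try it on the sample names from the question.
for name in a_b.txt a_b_c.txt a_b_c_d_e.txt a_b_c_d_e_f_g_h_i.txt; do
    printf '%s -> %s\n' "$name" "$(new_name "$name")"
done
```

Note that a_b.txt maps to itself here, which is why restricting the wildcard to *_*_*.txt is worthwhile.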

The equivalent in Awk might look something like

find . -name "*_*_*.txt" |
awk -F _ '{ system("mv " $0 " " $1 "_" $(NF)) }'

This is somewhat brittle because of the system call; you might need to rethink your approach if your file names could contain whitespace or other shell metacharacters. You could add quoting to partially fix that, but then the command will fail if the file name contains literal quotes. You could fix that, too, but then this will be a little too complex for my taste.
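
Since the question asked about awk specifically, here is a sketch of just the field-splitting part, without the system call. It only prints old and new names separated by a tab, so it is safe to experiment with. preview_renames is a hypothetical name, and the sketch assumes one file name per line with no underscores in any directory part.

```shell
# Hypothetical helper: read file names on stdin, one per line, and print
# "old<TAB>new" for every name that contains at least two underscores.
preview_renames() {
    # -F_ splits each line on underscores; NF is the number of fields,
    # so $1 is the first field and $NF is the last one.
    awk -F_ 'NF > 2 { print $0 "\t" $1 "_" $NF }'
}

printf '%s\n' a_b.txt a_b_c.txt a_b_c_d_e.txt | preview_renames
```

Once the printed pairs look right, they could be fed to a loop that calls mv, but the caveats about unusual file names still apply.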

Here's a less brittle approach which should cope with completely arbitrary file names, even ones with newlines in them:

find . -name "*_*_*.txt" -exec sh -c 'for f; do
    mv "$f" "${f%%_*}_${f##*_}"
  done' _ {} +

find will supply a leading path before each file name, so we don't need mv -- here (there will never be a file name which starts with a dash).

The parameter expansion ${f##pattern} produces the value of the variable f with the longest available match on pattern trimmed off from the beginning; ${f%%pattern} does the same, but trims from the end of the string.
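
A quick way to see these expansions in action is to try them on a sample value. The path below is made up for illustration; the single # and % forms, which trim the shortest match instead of the longest, are included for contrast.

```shell
f=./a_b_c_d_e.txt   # sample value, shaped like what find would supply

echo "${f%%_*}"     # longest suffix matching _* removed:   ./a
echo "${f##*_}"     # longest prefix matching *_ removed:   e.txt
echo "${f%_*}"      # shortest suffix matching _* removed:  ./a_b_c_d
echo "${f#*_}"      # shortest prefix matching *_ removed:  b_c_d_e.txt

# Combining the two "longest" forms gives the new name.
echo "${f%%_*}_${f##*_}"   # ./a_e.txt
```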

Surfactant answered 30/8, 2021 at 11:5 Comment(0)

With your shown samples, please try the following pure Bash code, which relies on Bash's parameter expansion. The glob only matches .txt files that have at least two underscores in their name, so files like a_b.txt are NOT picked up, as per the requirement.

for file in *_*_*.txt
do
   firstPart="${file%%_*}"
   secondPart="${file##*_}"
   newName="${firstPart}_${secondPart}"
   mv -- "$file"  "$newName"
done
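
One thing to watch for with a loop like this: two different files can map to the same new name (a_b_c.txt and a_x_c.txt both become a_c.txt), and mv will normally overwrite the earlier one without asking. Below is a minimal sketch of a guard, assuming you would rather skip such files. The rename_first_last function name and its directory argument are additions for illustration, not part of the code above.

```shell
# Hypothetical wrapper: rename files in directory $1, keeping the first and
# last underscore-separated fields, and skip any rename that would clobber
# an existing file.
rename_first_last() {
    (
        cd "$1" || exit 1
        for file in *_*_*.txt; do
            [ -e "$file" ] || continue          # glob matched nothing
            newName="${file%%_*}_${file##*_}"
            if [ -e "$newName" ]; then
                printf 'skipping %s: %s already exists\n' "$file" "$newName" >&2
                continue
            fi
            mv -- "$file" "$newName"
        done
    )
}
```

Called as rename_first_last . it would process the current directory; the subshell keeps the cd from affecting the caller.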

Forster answered 30/8, 2021 at 11:9 Comment(4)

The subprocess is rather unnecessary; just use case $file in *_*_*.txt) or change the wildcard so you are sure that only files with at least two underscores are matched. – Surfactant 30/8, 2021 at 11:13

@tripleee, so you mean the awk command in it where I am checking the number of _ in the name? – Forster 30/8, 2021 at 11:15

Yeah, exactly. If you use for file in *_*_*.txt instead, you can reduce the function body to a one-liner (though keeping the temporary variables might improve readability slightly). – Surfactant 30/8, 2021 at 11:16

@tripleee, thank you, I have edited code now. I kept variables in code so that it can be easily understood by anyone(by variable names itself). – Forster 30/8, 2021 at 11:18

This answer works for your example, but @tripleee's "find" approach is more robust.

for f in a_*.txt; do mv "$f" "${f%%_*}_${f##*_}"; done

Details: https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html / https://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html

Ledezma answered 30/8, 2021 at 11:5 Comment(0)

Here's an alternate regexp for the given samples:

$ rename -n 's/_.*_/_/' *.txt
rename(a_b_c_d_e_f_g_h_i.txt, a_i.txt)
rename(a_b_c_d_e.txt, a_e.txt)
rename(a_b_c.txt, a_c.txt)

Alcoran answered 30/8, 2021 at 15:39 Comment(0)

A different rename regex

rename 's/(\S_)[a-z_]*(\S\.txt)/$1$2/' *.txt

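The same regex can be used with sed, or you can use awk, within a loop. Note that the awk version has to join the two fields with an explicit "_"; a comma in print would insert a space instead.

for a in a_*; do
    name=$(echo "$a" | awk -F_ '{print $1 "_" $NF}') # Or
    #name=$(echo "$a" | sed -E 's/(\S_)[a-z_]*(\S\.txt)/\1\2/g')
    mv "$a" "$name"
done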

Salespeople answered 30/8, 2021 at 12:35 Comment(0)
