Python: can I move a file based on part of its name to a folder with that name?

I have a directory with a large number of files that I want to move into folders based on part of the file name. My list of files looks like this:

  • ID1_geneabc_species1.fa

  • ID1_genexy_species1.fa

  • ID2_geneabc_species1.fa

  • ID3_geneabc_species2.fa

  • ID3_genexy_species2.fa

  • ID4_genexy_species3.fa

I want to move the files into separate folders based on the last part of the file name (species1, species2, species3). The first two parts of the name vary in length, but every name has exactly 3 parts separated by an underscore '_'.

This is what I have tried from looking online but it does not work:

import os
import glob

dirs = glob.glob('*_*')

files = glob.glob('*.fa')

for file in files:
   name = os.path.splitext(file)[0]
   matchdir = next(x for x in dirs if name == x.rsplit('_')[0])
   os.rename(file, os.path.join(matchdir, file))

The script below already builds a list of the names (species1, species2, species3), which correspond to the third part of my file names, and creates a directory for each name in my current working directory. Is there a better way to do this after the following script, such as looping through the list of species, matching each file, and then moving it into the correct directory? Thanks.

from Bio import SeqIO
import os
import itertools

#to get a list of all the species in genbank file
all_species = []
for seq_record in SeqIO.parse("sequence.gb", "genbank"):
    all_species.append(seq_record.annotations["organism"])

#get unique names and change from set to list
Unique_species = set(all_species)
Species = list(Unique_species)

#send to file
f = open('speciesnames.txt', 'w')
for names in Species:
    f.write(names+'\n')
f.close()

print ('There are ' + str(int(len(Species))) + ' species.')

#make directory for each species
path = os.path.dirname(os.path.abspath(__file__))
for item in itertools.product(Species):
    os.makedirs(os.path.join(path, *item))
Cristionna answered 19/2, 2016 at 17:10 Comment(1)
Do you want the files to keep their names, or should the _species* be removed? – Displeasure

So, you want a function that gets the folder name from the file name. Then you iterate over the files, create any directories that don't exist yet, and move each file into its directory. Something like this should work:

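import glob
import os

def get_dir_name(filename):
    # Position of the last underscore and of the first dot
    pos1 = filename.rfind('_')
    pos2 = filename.find('.')
    # Text between them, e.g. 'species1' from 'ID1_geneabc_species1.fa'
    return filename[pos1+1:pos2]

cwd = os.getcwd()
for f in glob.glob('*.fa'):
    dir_name = os.path.join(cwd, get_dir_name(f))
    print(dir_name)
    # Create the species folder the first time it is needed
    if not os.path.exists(dir_name):
        os.mkdir(dir_name)
    os.rename(f, os.path.join(dir_name, f))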
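
For comparison, here is a rough sketch of the same idea using pathlib and shutil.move instead of string concatenation. The function names species_of and move_by_species are made up for illustration and are not part of the original answer. It assumes every file name has the form ID_gene_species.fa, as described in the question.

```python
import shutil
from pathlib import Path


def species_of(filename):
    """Return the text after the last underscore of the file stem."""
    stem = Path(filename).stem          # 'ID1_geneabc_species1'
    return stem.rsplit('_', 1)[-1]      # 'species1'


def move_by_species(directory, pattern='*.fa'):
    """Move each matching file into a subfolder named after its species.

    Hypothetical helper: returns the list of new file paths.
    """
    directory = Path(directory)
    moved = []
    # sorted() materialises the listing before any folders are created
    for path in sorted(directory.glob(pattern)):
        if not path.is_file():
            continue
        target_dir = directory / species_of(path.name)
        target_dir.mkdir(exist_ok=True)  # no error if it already exists
        new_path = target_dir / path.name
        shutil.move(str(path), str(new_path))
        moved.append(new_path)
    return moved
```

Calling move_by_species('.') would process the current working directory. Using mkdir(exist_ok=True) avoids the separate os.path.exists check, and shutil.move should also cope with a target on a different filesystem, where os.rename can fail.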
Informative answered 19/2, 2016 at 18:39 Comment(4)
You are using os anyway; why don't you use os.path.join()? – Displeasure
That is exactly what I was looking for! Thank you. – Cristionna
Is it possible for you to walk me through the first part of this script? I want to understand how to define different parts of the filename for future use (i.e. concatenating files based on parts of the filename). I am mostly just lost on this line: return filename[pos1+1:pos2]. Thank you. – Cristionna
pos1 is the position of the rightmost _ in the string, and pos2 is the position of the first ., so slicing gives you the part between those two characters. For example, with s = "some_string.txt", pos1 equals 4 and pos2 equals 11. The substring s[5:11] starts at position 5 (to exclude the _) and stops just before position 11 (the end index is not included), which gives "string". – Informative
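
The slicing example from the comment above can be checked directly, and for the follow-up question about using other parts of the file name, splitting on the underscore may be simpler than tracking positions. This is only a sketch: split_name is a made-up helper, and it assumes names always have exactly three underscore-separated parts plus an extension.

```python
import os


def split_name(filename):
    """Split 'ID_gene_species.ext' into (ID, gene, species).

    Hypothetical helper; raises ValueError if the stem does not
    contain exactly three underscore-separated parts.
    """
    stem, _ext = os.path.splitext(filename)
    sample_id, gene, species = stem.split('_')
    return sample_id, gene, species


# The example from the comment
s = "some_string.txt"
pos1 = s.rfind('_')          # 4
pos2 = s.find('.')           # 11
middle = s[pos1 + 1:pos2]    # 'string'

# All three parts of one of the file names from the question
parts = split_name('ID1_geneabc_species1.fa')
# parts == ('ID1', 'geneabc', 'species1')
```

With the parts unpacked like this, files could be grouped by gene or by ID just as easily as by species.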
