Renaming the extracted file from zipfile

I have lots of zipped files on a Linux server and each file includes multiple text files.

What I want is to extract some of those text files, which have the same names across the zipped files, and save them to a folder; I am creating one folder for each zipped file and extracting the text files into it. I need to add the name of the parent zip file to the end of each file name and save all the text files in one directory. For example, if the zip file was March132017.zip and I extracted holding.txt, my file name would be holding_march132017.txt.
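To make the naming rule concrete, here is a minimal sketch of building the output name from the archive member and the zip file name. The helper name `suffixed_name` is hypothetical (it is not part of the question's code), and it keeps the original capitalisation; call `.lower()` on the result if you want everything in lower case as in the example above.

```python
import os


def suffixed_name(member_name, zip_name):
    """Return member_name with the zip file's stem appended before the extension."""
    base, ext = os.path.splitext(os.path.basename(member_name))  # "holding", ".txt"
    zip_stem = os.path.splitext(os.path.basename(zip_name))[0]   # "March132017"
    return f"{base}_{zip_stem}{ext}"


print(suffixed_name("holding.txt", "March132017.zip"))  # holding_March132017.txt
```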

My problem is that I am not able to change the extracted file's name. I would appreciate any advice.

import os 
import sys 
import zipfile
os.chdir("/feeds/lipper/emaxx") 

pwkwd = "/feeds/lipper/emaxx" 

for item in os.listdir(pwkwd): # loop through items in dir
    if item.endswith(".zip"): # check for ".zip" extension
        file_name = os.path.abspath(item) # get full path of files
        fh = open(file_name, "rb")
        zip_ref = zipfile.ZipFile(fh)

        filelist = 'ISSUERS.TXT' , 'SECMAST.TXT' , 'FUND.TXT' , 'HOLDING.TXT'
        for name in filelist :
            try:
                outpath = "/SCRATCH/emaxx" + "/" + os.path.splitext(item)[0]
                zip_ref.extract(name, outpath)

            except KeyError:
                pass  # this archive does not contain the file

        fh.close()
Stultz answered 19/5, 2017 at 22:36 Comment(4)
Use with open(...), and then you don't need to take care of closing the file. I also recommend using os.path.join instead of concatenating strings and paths. – Osanna
As an aside, this code only works if pwkwd is the current working directory. Otherwise file_name = os.path.abspath(item) doesn't build a correct path. You don't need an absolute path... os.path.join(pwkwd, item) would do. – Carlie
@Matt.St Thanks for your advice. – Stultz
@Carlie Thanks. pwkwd is set to be my cwd. – Stultz

Why not just read the file in question and save it yourself instead of extracting? Something like:

import os
import zipfile

source_dir = "/feeds/lipper/emaxx"  # folder with zip files
target_dir = "/SCRATCH/emaxx"  # folder to save the extracted files
os.makedirs(target_dir, exist_ok=True)  # make sure the output folder exists

# Are you sure your file names are capitalized in your zip files?
filelist = ['ISSUERS.TXT', 'SECMAST.TXT', 'FUND.TXT', 'HOLDING.TXT']

for item in os.listdir(source_dir):  # loop through items in dir
    if item.endswith(".zip"):  # check for ".zip" extension
        file_path = os.path.join(source_dir, item)  # get zip file path
        zip_stem = os.path.splitext(item)[0]  # zip name without folder or ".zip", e.g. "March132017"
        with zipfile.ZipFile(file_path) as zf:  # open the zip file
            for target_file in filelist:  # loop through the list of files to extract
                if target_file in zf.namelist():  # check if the file exists in the archive
                    # generate the desired output name, e.g. "HOLDING_March132017.txt":
                    target_name = os.path.splitext(target_file)[0] + "_" + zip_stem + ".txt"
                    target_path = os.path.join(target_dir, target_name)  # output path
                    with open(target_path, "wb") as f:  # zf.read() returns bytes, so write in binary mode
                        f.write(zf.read(target_file))  # save the contents of the file in it
                # next file from the list...
    # next zip file...
Preponderate answered 19/5, 2017 at 23:28 Comment(4)
Thanks for your solution! – Stultz
For text files this solution works well; however, it is worth noting that for a mix of file types we should use with open(target_path, "wb") as f: instead. – Tiana
The answer by Saikiran, which simply modifies the zipinfo before extracting, is simpler, more direct and probably a bit more efficient. – Whereon
Doesn't this approach load the full file into memory? What if you had a 10GB file inside the zip file but only 8GB of RAM? – Escharotic
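On the memory concern in the last comment: a sketch, assuming Python 3, that streams one member to disk with `ZipFile.open` and `shutil.copyfileobj` instead of `zf.read`, so only one chunk of decompressed data is held in memory at a time. The function name `copy_member_streaming` is hypothetical; `target_path` would be built the same way as in the answer above.

```python
import shutil
import zipfile


def copy_member_streaming(zip_path, member, target_path):
    """Copy one archive member to target_path without reading it fully into memory."""
    with zipfile.ZipFile(zip_path) as zf:
        # ZipFile.open returns a file-like object that decompresses on the fly
        with zf.open(member) as src, open(target_path, "wb") as dst:
            # copyfileobj moves the data across in fixed-size chunks
            shutil.copyfileobj(src, dst)
```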
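import os
import zipfile

def do_something_to(name):
    # placeholder for your own renaming rule; here: add "_renamed" before the extension
    base, ext = os.path.splitext(name)
    return base + "_renamed" + ext

with zipfile.ZipFile('somefile.zip') as zipdata:
    # iterate through each file
    for zipinfo in zipdata.infolist():
        # This will do the renaming (only the name used on disk changes)
        zipinfo.filename = do_something_to(zipinfo.filename)
        zipdata.extract(zipinfo)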

Reference: https://bitdrop.st0w.com/2010/07/23/python-extracting-a-file-from-a-zip-file-with-a-different-name/


Example:

from zipfile import ZipFile

src = "path/in/zip/file.txt"
dest = "extracted/path/file.txt"

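with ZipFile("zipfile.zip", "r") as zf:
    info = zf.getinfo(src)  # ZipInfo for the member, looked up by its name in the archive
    info.filename = dest    # change only the name it will be written to
    zf.extract(info)        # writes extracted/path/file.txt relative to the current directory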
Enchase answered 29/5, 2019 at 13:55 Comment(3)
This should be the right answer: straightforward and effective, without post-processing. – Shadowy
What is do_something_to supposed to mean? – Opinion
@MaryN I know I am a bit late, but I added an example of how you can use .getinfo(). – Escharotic
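Applied to the question's setup, the ZipInfo-renaming idea could look like the sketch below. It assumes the member names are known up front (as in the question's filelist) and that every output should land flat in one folder; the function name `extract_with_suffix` is hypothetical. Changing `info.filename` only affects the path `extract` writes to, so the entry is still read from the archive under its original name.

```python
import os
import zipfile


def extract_with_suffix(zip_path, wanted, target_dir):
    """Extract each name in `wanted` from zip_path into target_dir, renamed to
    <name>_<zip stem><ext>, by editing the ZipInfo before extracting."""
    zip_stem = os.path.splitext(os.path.basename(zip_path))[0]
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in wanted:
            try:
                info = zf.getinfo(name)  # look up the entry by its name in the archive
            except KeyError:
                continue  # this archive does not contain the file
            # basename() drops any folder inside the zip, so all outputs share target_dir
            base, ext = os.path.splitext(os.path.basename(name))
            info.filename = f"{base}_{zip_stem}{ext}"  # only changes the output name
            extracted.append(zf.extract(info, target_dir))  # returns the written path
    return extracted
```

With the question's paths, you would call it once per .zip file found in /feeds/lipper/emaxx, passing the four names from filelist and /SCRATCH/emaxx as target_dir.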

You could simply rename each file after it is extracted, right? os.rename should do the trick.

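extracted_path = zip_ref.extract(name, outpath)  # extract() returns the path of the file it wrote
parent_zip = os.path.splitext(item)[0]  # zip file name without ".zip", e.g. "March132017"
new_file_name = os.path.splitext(os.path.basename(name))[0]  # just the file name, e.g. "HOLDING"
ext = os.path.splitext(name)[1]  # keep the original extension, e.g. ".TXT"

new_name_path = os.path.join(outpath, new_file_name + "_" + parent_zip + ext)
os.rename(extracted_path, new_name_path)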
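The same extract-then-rename idea as a self-contained function, as a sketch under the assumption that one member is handled per call; `extract_then_rename` is a hypothetical name. If the member sits in a folder inside the zip, that (now empty) folder is left behind under out_dir.

```python
import os
import zipfile


def extract_then_rename(zip_path, member, out_dir):
    """Extract `member` into out_dir, then rename it to <name>_<zip stem><ext>."""
    zip_stem = os.path.splitext(os.path.basename(zip_path))[0]
    with zipfile.ZipFile(zip_path) as zf:
        extracted_path = zf.extract(member, out_dir)  # path of the file just written
    base, ext = os.path.splitext(os.path.basename(member))
    new_path = os.path.join(out_dir, f"{base}_{zip_stem}{ext}")
    # os.rename replaces an existing target on POSIX but raises on Windows
    os.rename(extracted_path, new_path)
    return new_path
```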

For the file name, if you want it to be incremental, simply start a counter and increase it by one for each file.

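count = 0
for name in filelist:  # e.g. the question's list of files to extract
    count += 1
    # ... extract the file as before ...
    base, ext = os.path.splitext(name)
    new_file_name = base + "_" + str(count) + ext
    # ... then os.rename(...) the extracted file to new_file_name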
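A variant of the counter idea, sketched under the assumption that you only want a number when a name is already taken: the hypothetical helper `unique_path` returns the path unchanged if it is free, otherwise the first free `_1`, `_2`, ... variant. The existence check and the later write are not atomic, so two processes could still race for the same name.

```python
import os


def unique_path(path):
    """Return `path` if nothing exists there yet, otherwise the first free
    variant with _1, _2, ... inserted before the extension."""
    if not os.path.exists(path):
        return path
    base, ext = os.path.splitext(path)
    count = 1
    while os.path.exists(f"{base}_{count}{ext}"):
        count += 1
    return f"{base}_{count}{ext}"
```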

Or, if you don't care about the final name, you could always use something like a UUID.

import uuid
random_name = str(uuid.uuid4())  # uuid4() returns a UUID object; str() turns it into text usable in a file name
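If you go the UUID route, you probably still want to keep the extension. A small sketch, with the hypothetical helper name `uuid_file_name`:

```python
import os
import uuid


def uuid_file_name(member_name):
    """Keep the member's extension but replace its stem with a random UUID."""
    ext = os.path.splitext(member_name)[1]
    # uuid4().hex is the 32-character lowercase hex form, without dashes
    return uuid.uuid4().hex + ext
```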
Heroworship answered 19/5, 2017 at 22:43 Comment(1)
Thanks for your advice! – Stultz
outpath = '/SCRATCH/emaxx'
suffix = os.path.splitext(item)[0]  # zip file name without ".zip"
names = zip_ref.namelist()  # same order as zip_ref.filelist

for name in filelist:
    if name in names:  # check the file exists in the zipfile
        index = names.index(name)
        filename, ext = os.path.splitext(name)  # ext already includes the dot, e.g. ".TXT"
        zip_ref.filelist[index].filename = f'{filename}_{suffix}{ext}'  # rename the extracting file to the suffix file name
        zip_ref.extract(zip_ref.filelist[index], outpath)  # use the renamed file descriptor to extract the file
Probity answered 19/11, 2018 at 9:18 Comment(0)

I doubt it is possible to rename files during extraction. What about renaming them once they are extracted?

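Relying on Linux shell tools, you can do it in one line:

zipName = os.path.splitext(item)[0]  # zip file name without ".zip", e.g. "March132017"
os.system("find '" + outpath + "' -iname '*.txt' -exec sh -c 'mv \"$1\" \"${1%.*}_$2.${1##*.}\"' _ {} '" + zipName + "' \\;")

So, first we find all .txt files (case-insensitively) in the specified folder, then run mv on each one, with the new name built by the shell expansions ${1%.*} (the path without its extension) and ${1##*.} (the extension).

Code not tested; I'm on Windows now ^^'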
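For comparison, the same rename can be sketched in pure Python without shelling out, which also works on Windows. Unlike find, this only looks at files directly inside the folder (no recursion); the function name `add_suffix_to_txt_files` is hypothetical.

```python
from pathlib import Path


def add_suffix_to_txt_files(folder, zip_stem):
    """Rename every .txt/.TXT file directly inside folder to <stem>_<zip_stem><suffix>."""
    renamed = []
    # sorted() builds the full list first, so renamed files are not picked up again
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() == ".txt":
            new_path = path.with_name(f"{path.stem}_{zip_stem}{path.suffix}")
            path.rename(new_path)
            renamed.append(new_path)
    return renamed
```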

Reindeer answered 19/5, 2017 at 22:51 Comment(1)
Great, thanks. Right, using bash is always a way out :-) – Stultz
