gzip a file in Python
Asked Answered
P

9

63

I want to gzip a file in Python. I am trying to use subprocess.check_call(), but it keeps failing with the error 'OSError: [Errno 2] No such file or directory'. Is there a problem with what I am trying here? Is there a better way to gzip a file than using subprocess.check_call()?

from subprocess import check_call

def gZipFile(fullFilePath):
    check_call('gzip ' + fullFilePath)

Thanks!!

Potshot answered 16/11, 2011 at 18:25 Comment(3)
Why not docs.python.org/library/gzip.html ?Innervate
related: to create a gzipped tarball archive.tar.gz from a directory /dir/path, you could use shutil.make_archive('archive', 'gztar', '/dir/path')Chat
Does this answer your question? python subprocess with gzipDermoid
A
19

Try this:

check_call(['gzip', fullFilePath])

Depending on what you're doing with the data of these files, Skirmantas's link to http://docs.python.org/library/gzip.html may also be helpful. Note the examples near the bottom of the page. If you don't need to access the data, or don't already have the data in your Python code, executing gzip may be the cleanest way to do it, since you don't have to handle the data in Python at all.
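The reason the list form works where the question's string form fails can be demonstrated directly: without shell=True, a single string is taken as the name of one executable, while a list keeps the program and its arguments separate. A minimal sketch (using sys.executable in place of gzip so it is self-contained; the string form fails this way on POSIX systems, which is where the question's Errno 2 comes from):

```python
import subprocess
import sys

# One string without shell=True: subprocess looks for an executable
# literally named "<python path> -c pass", which does not exist --
# the same failure mode as check_call('gzip ' + fullFilePath).
try:
    subprocess.check_call(sys.executable + ' -c pass')
except FileNotFoundError as exc:
    print('string form failed, errno:', exc.errno)  # errno 2 (ENOENT)

# A list keeps the program name and its arguments separate.
subprocess.check_call([sys.executable, '-c', 'pass'])
print('list form succeeded')
```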

Adnopoz answered 16/11, 2011 at 18:27 Comment(1)
well, idk if “clean” is the right word for it but it certainly is the fastest way, and the one needing the least code on your side.Lated
T
93

There is a module gzip. Usage:

Example of how to create a compressed GZIP file:

import gzip
content = b"Lots of content here"
f = gzip.open('/home/joe/file.txt.gz', 'wb')
f.write(content)
f.close()

Example of how to GZIP compress an existing file:

import gzip
f_in = open('/home/joe/file.txt', 'rb')
f_out = gzip.open('/home/joe/file.txt.gz', 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()

EDIT:

Jace Browning's answer using with in Python >= 2.7 is obviously more terse and readable, so my second snippet would (and should) look like:

import gzip
with open('/home/joe/file.txt', 'rb') as f_in, gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
    f_out.writelines(f_in)
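As a quick sanity check (a self-contained sketch using a temporary file, not part of the original answer), the compressed file can be read back with gzip.open to confirm the bytes round-trip:

```python
import gzip
import os
import tempfile

# Write a small source file, compress it, then read the archive back.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'file.txt')
with open(src, 'wb') as f:
    f.write(b'Lots of content here')

with open(src, 'rb') as f_in, gzip.open(src + '.gz', 'wb') as f_out:
    f_out.writelines(f_in)

with gzip.open(src + '.gz', 'rb') as f:
    assert f.read() == b'Lots of content here'
```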
Teshatesla answered 16/11, 2011 at 18:28 Comment(7)
does the second version replace the original file with the gzipped one, as the gzip command does? It seems it doesn't.Vivienne
@Benoît: Since the output file has a different name than the one being read, it's pretty obvious that it doesn't do that. Doing so would require the compressed data to be temporarily stored somewhere else until all the data in the original file had been compressed.Embrocation
Using gzip, the output filename is different from the input filename. And it still removes the input file after having created the output one. I was simply asking whether the python gzip module did the same thing.Vivienne
the file opened in read mode is just read from normally. there’s no way for the gzip module to know where the data came from and do something like deleting the file. use Path(in_path).unlink() to remove the file afterwards. or just use check_call(['gzip', in_path]), which compresses faster and deletes the file.Lated
I suggest a correction: content = b"Lots of content here"Quantifier
You need to turn the contents into bytes first in python3. docs.python.org/3.7/library/gzip.html#examples-of-usage . Something like f.write(content.encode("utf-8")) works.Deakin
@GumwonHong Thanks for suggestion, the original answer was written for Python 2.x.Thimerosal
S
50

Read the original file in binary (rb) mode and then use gzip.open to create the gzip file that you can write to like a normal file using writelines:

import gzip

with open("path/to/file", 'rb') as orig_file:
    with gzip.open("path/to/file.gz", 'wb') as zipped_file:
        zipped_file.writelines(orig_file)

Even shorter, you can combine the with statements on one line:

with open('path/to/file', 'rb') as src, gzip.open('path/to/file.gz', 'wb') as dst:
    dst.writelines(src)
Shipmate answered 17/7, 2012 at 14:10 Comment(2)
In this case, are we necessarily writing files back to the same path? Can't we store them somewhere else temporarily so that we can later save them to S3?Belgravia
Iterating on "lines" of... binary data does not seem like a good idea.Myotonia
D
30

From the docs for Python 3:

Gzip an existing file

import gzip
import shutil
with open('file.txt', 'rb') as f_in:
    with gzip.open('file.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

Or if you hate nested with statements

import gzip
import shutil
from contextlib import ExitStack
with ExitStack() as stack:
    f_in = stack.enter_context(open('file.txt', 'rb'))
    f_out = stack.enter_context(gzip.open('file.txt.gz', 'wb'))
    shutil.copyfileobj(f_in, f_out)

Create a new gzip file:

import gzip
content = b"Lots of content here"
with gzip.open("file.txt.gz", "wb") as f:
    f.write(content)

Note that the content is turned into bytes.

If you aren't creating the content as a string/bytes literal like the example above, another method would be

import gzip
# get content as a string from somewhere else in the code
with gzip.open("file.txt.gz", "wb") as f:
    f.write(content.encode("utf-8"))

See this SO question for a discussion of other encoding methods.
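The encode/decode pairing above can be sketched as a complete round trip (the path and content here are arbitrary, chosen only for the sketch):

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'file.txt.gz')
content = "Lots of content here"

# Encode the str to bytes on the way in...
with gzip.open(path, 'wb') as f:
    f.write(content.encode('utf-8'))

# ...and decode the bytes back to str on the way out.
with gzip.open(path, 'rb') as f:
    assert f.read().decode('utf-8') == content
```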

Deakin answered 19/2, 2019 at 15:37 Comment(3)
I didn't know about ExitStack...interesting!Gourmand
Well you don't need ExitStack in this case. You can simply separate the context managers using comas, see the answer by Jace Browning: https://mcmap.net/q/301424/-gzip-a-file-in-python ExitStack has a very niche usage actually.Rainwater
kudos for using shutil.copyfileobj()Rainwater
L
4

Use the gzip module:

import gzip
import os

in_file = "somefile.data"
with open(in_file, "rb") as f:
    in_data = f.read()
out_gz = "foo.gz"
with gzip.open(out_gz, "wb") as gzf:
    gzf.write(in_data)

# If you want to delete the original file after the gzip is done:
os.unlink(in_file)

Your error, OSError: [Errno 2] No such file or directory, means subprocess could not find anything to execute: without shell=True, the whole string 'gzip ' + fullFilePath is treated as the name of a single executable rather than a command plus its argument. Pass the command as a list, as in retracile's answer, and also make sure the file exists and that you are using an absolute path, not a relative one.

Lacto answered 16/11, 2011 at 18:31 Comment(4)
Thanks everyone for the quick responses. Everyone here is suggesting gzip; I had tried that as well. Is it a better way? The reason why I am not using it is that it leaves the original file as is, so I end up with both versions: the regular file and the gzip file. I am accessing the data of the file, though. @retracile, your fix worked, thanks a ton. I am still wondering if I should use subprocess or gzip.Potshot
@Potshot The easiest way to do that would be: When the gzip is done, call os.unlink(original_File_Name) to delete the original file that you made the gzip from. See my edits.Lacto
@Rinks: The reason why I am not using that is that it leaves the original file as is - so why don't you delete file afterwards?Thimerosal
Thanks again. I can certainly delete the file later on. I am going to test both methods -gzip and check_call for a few days and finalize on one.Potshot
G
4

The documentation on this is actually insanely straightforward.

Example of how to read a compressed file:

import gzip
f = gzip.open('file.txt.gz', 'rb')
file_content = f.read()
f.close()

Example of how to create a compressed GZIP file:

import gzip
content = b"Lots of content here"
f = gzip.open('file.txt.gz', 'wb')
f.write(content)
f.close()

Example of how to GZIP compress an existing file:

import gzip
f_in = open('file.txt', 'rb')
f_out = gzip.open('file.txt.gz', 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()

https://docs.python.org/2/library/gzip.html

That's the whole documentation . . .

Gourmand answered 9/11, 2014 at 22:42 Comment(1)
Middle example only runs if I write content = b"Lots of content here" (note the b).Spelaean
G
3
import gzip

def gzip_file(src_path, dst_path):
    with open(src_path, 'rb') as src, gzip.open(dst_path, 'wb') as dst:
        for chunk in iter(lambda: src.read(4096), b""):
            dst.write(chunk)

The advantage of this solution is that it is guaranteed to be memory-efficient: we don't keep the whole input file in memory, instead, we read and convert it using 4K chunks.
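To illustrate, the chunked copy above can be exercised end to end on a throwaway file (the temporary paths below are invented for the sketch); the archive should decompress back to the original bytes even when the input is larger than one 4K chunk:

```python
import gzip
import os
import tempfile

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'data.bin')
dst = src + '.gz'

with open(src, 'wb') as f:
    f.write(os.urandom(10000))  # larger than a single 4K chunk

# Same chunked-copy pattern as the answer above.
with open(src, 'rb') as s, gzip.open(dst, 'wb') as d:
    for chunk in iter(lambda: s.read(4096), b""):
        d.write(chunk)

with gzip.open(dst, 'rb') as d, open(src, 'rb') as s:
    assert d.read() == s.read()
```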

Grimonia answered 29/6, 2018 at 15:56 Comment(0)
L
0

For Windows, subprocess can be used to run the 7za utility: from https://www.7-zip.org/download.html download "7-Zip Extra: standalone console version, 7z DLL, Plugin for Far Manager". The compact function below takes all csv files inside the gzip directory and compresses each one to gzip format; the original files are deleted. The 7z options can be found at https://sevenzip.osdn.jp/chm/cmdline/index.htm

import os
from pathlib import Path
import subprocess


def compact(cspath, tec, extn, prgm):  # compress each extn file in tec dir to gzip format
    xlspath = cspath / tec  # tec location
    for baself in xlspath.glob('*.' + str(extn)):  # iterate over the files in the directory
        source = str(baself)
        target = source + '.gz'
        try:
            subprocess.call([prgm, 'a', '-tgzip', target, source, '-mx=5'])
            os.remove(baself)  # remove the source file once compressed
        except OSError:
            print("Error while deleting file:", baself)
    return


exe = "C:\\7za\\7za.exe"  # 7za.exe (a = alone) is a standalone version of 7-Zip
csvpath = Path('C:/xml/baseline/')  # working directory
compact(csvpath, 'gzip', 'csv', exe)  # compress each csv file in the gzip dir to gzip format
Linnea answered 10/10, 2021 at 13:6 Comment(0)
B
0

Just for the sake of completeness: the examples above compress a file on disk, but when the data is already in memory as bytes there is no need for a file object at all; gzip.compress handles it in one call. The following snippet reads the output of pg_dump and compresses it before writing:

import gzip
import subprocess

cmd = ['pg_dump', '-d', 'mydb']
sql = subprocess.check_output(cmd)

with open('backups/{}.gz'.format('mydb'), 'wb') as zfile:
    zfile.write(gzip.compress(sql))
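gzip.compress and its inverse gzip.decompress operate purely on in-memory bytes, which can be verified without pg_dump at all (the sample data below is invented for the sketch):

```python
import gzip

data = b'SELECT 1;\n' * 100           # stand-in for the pg_dump output
blob = gzip.compress(data)

assert gzip.decompress(blob) == data  # lossless round trip
assert len(blob) < len(data)          # repetitive input shrinks
```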
Brokaw answered 14/2, 2023 at 14:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.