seek() a file within a zip file in Python without loading it into memory

Is there any way to make a file inside a zip file seekable in Python without reading it into memory?

I tried the obvious approach, but I get an error because the file is not seekable:

In [74]: inputZipFile = zipfile.ZipFile("linear_g_LAN2A_F_3keV_1MeV_30_small.zip", 'r')

In [76]: inputCSVFile = inputZipFile.open(inputZipFile.namelist()[0], 'r')   

In [77]: inputCSVFile
Out[77]: <zipfile.ZipExtFile at 0x102f5fad0>

In [78]: inputCSVFile.se
inputCSVFile.seek      inputCSVFile.seekable  

In [78]: inputCSVFile.seek(0)
---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-78-f1f9795b3d55> in <module>()
----> 1 inputCSVFile.seek(0)

UnsupportedOperation: seek
Matthieu answered 10/10, 2012 at 14:34 Comment(0)

There is no way to do this for all zip files. DEFLATE is a stream compression algorithm, so an arbitrary part of an entry cannot be decompressed without first decompressing everything before it. Seeking could be implemented for entries that are stored uncompressed, but then you end up in the awkward position where some entries are seekable and others aren't.
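If the goal is only to avoid holding the whole entry in memory, one possible workaround is to stream the entry into a temporary file on disk and seek in that instead. The following is a minimal sketch under that assumption; the helper name `open_seekable_member` and the demo archive contents are made up for illustration, and it assumes there is enough disk space for the decompressed data.

```python
import io
import shutil
import tempfile
import zipfile


def open_seekable_member(zip_path_or_file, member_name):
    """Copy one zip entry to a temporary file on disk and return it.

    Hypothetical helper: the entry is decompressed in chunks, so only a
    small buffer is held in memory at any time.
    """
    tmp = tempfile.TemporaryFile()
    with zipfile.ZipFile(zip_path_or_file, "r") as archive:
        with archive.open(member_name, "r") as source:
            shutil.copyfileobj(source, tmp)
    tmp.seek(0)
    return tmp


# Build a small demo archive in memory (illustrative data only).
demo_archive = io.BytesIO()
with zipfile.ZipFile(demo_archive, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("data.csv", "a,b,c\n1,2,3\n4,5,6\n")
demo_archive.seek(0)

member = open_seekable_member(demo_archive, "data.csv")
member.seek(6)                 # jump past the 6-byte header line "a,b,c\n"
first_row = member.readline()
member.seek(0)                 # rewinding works because this is a real file
header = member.readline()
member.close()
```

The cost is one full sequential decompression up front, after which every seek is cheap. Newer Python versions (3.7 and later, as far as I know) also let you call `seek()` directly on the object returned by `ZipFile.open()` when the archive itself is seekable, but seeking backwards there still means decompressing again from the start of the entry.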

Panettone answered 10/10, 2012 at 14:48 Comment(3)
I see, thank you. But from what I've found, it's possible with tar files, correct? Matthieu
Only if the tar file is uncompressed. As soon as you throw in gzip (DEFLATE) compression, you get the same problem. Panettone
I can use a gzip-compressed tar file and seek inside it, although it happens on the fly: Python seems to decompress it either in memory or to a temporary file on disk, and the process takes much longer than with an uncompressed file, about 1 minute vs. 4 seconds for the example I'm trying. Thank you for all the help. Matthieu
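For the uncompressed tar case mentioned in the comments, a minimal sketch with the standard `tarfile` module might look like this. The archive is built in memory purely for demonstration, and the member name and payload are made up.

```python
import io
import tarfile

payload = b"a,b,c\n1,2,3\n4,5,6\n"

# Write an uncompressed tar archive into a buffer (illustrative data only).
tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w") as archive:
    info = tarfile.TarInfo(name="data.csv")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))
tar_buffer.seek(0)

# "r:" opens the archive strictly as uncompressed tar.
with tarfile.open(fileobj=tar_buffer, mode="r:") as archive:
    member = archive.extractfile("data.csv")
    can_seek = member.seekable()
    member.seek(6)             # skip the 6-byte header line "a,b,c\n"
    first_row = member.readline()
```

Because the member's bytes sit unmodified inside the archive, a seek here is just an offset calculation on the underlying file, which fits the large timing difference reported for the gzip-compressed version.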
