Size of an open file object

Is there a way to find the size of a file object that is currently open?

Specifically, I am working with the tarfile module to create tarfiles, but I don't want my tarfile to exceed a certain size. As far as I know, tarfile objects are file-like objects, so I imagine a generic solution would work.

Pericarp answered 12/11, 2008 at 11:49 Comment(0)
$ ls -la chardet-1.0.1.tgz
-rwxr-xr-x 1 vinko vinko 179218 2008-10-20 17:49 chardet-1.0.1.tgz
$ python
Python 2.5.1 (r251:54863, Jul 31 2008, 22:53:39)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> f = open('chardet-1.0.1.tgz','rb')
>>> f.seek(0, os.SEEK_END)
>>> f.tell()
179218L

Adding ChrisJY's idea to the example:

>>> os.fstat(f.fileno()).st_size
179218L

Note: as the comments point out, f.seek(0, os.SEEK_END) is required before calling f.tell(). A freshly opened file is positioned at offset 0, so tell() alone returns 0. Seeking 0 bytes from the end moves the file object's position to the end of the file, and tell() then reports that offset, which equals the file size.
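For readers on Python 3, here is a self-contained sketch of the same session. It creates a throwaway 179218-byte temporary file (an assumption, standing in for chardet-1.0.1.tgz) and compares tell() on a fresh handle, seek/tell, and os.fstat. The helper names size_by_seek and size_by_fstat are hypothetical.

```python
import os
import tempfile


def size_by_seek(f):
    """Size via seek/tell; leaves the position at the end of the file."""
    f.seek(0, os.SEEK_END)  # move to 0 bytes from the end
    return f.tell()         # the end offset equals the size in bytes


def size_by_fstat(f):
    """Size via the file descriptor; does not move the position."""
    return os.fstat(f.fileno()).st_size


# Create a throwaway 179218-byte file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 179218)
    path = tmp.name

try:
    with open(path, "rb") as f:
        start_pos = f.tell()           # freshly opened: 0, not the size
        seek_result = size_by_seek(f)
        fstat_result = size_by_fstat(f)
finally:
    os.remove(path)

print(start_pos, seek_result, fstat_result)  # 0 179218 179218
```

Note that size_by_seek leaves the position at the end, so rewind with f.seek(0) before reading.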

Intimidate answered 12/11, 2008 at 11:55 Comment(12)
docs.python.org/library/stat.html#stat.ST_SIZE: os.fstat returns a stat structure, please use st_size. – Summons
Can someone shed some light on the magic of f.seek(0,2)? Why does tell() return 0 without it? – Maladminister
@m_poorUser f.seek(0, 2) moves the file object's position to 0 bytes from the end of the file, so the file object's position is at the end of the file. Then, f.tell() returns the current file object's position, which is the size of the file in this case. See docs.python.org/2/tutorial/… – Sumptuous
f.seek(...) returns the absolute position. No need to follow with f.tell(). Try this: print(f.seek(0, 2)) and you will see. – Retribution
@Retribution - that's new in Python 3. In Python 2, f.seek returns nothing, regardless of which arguments you pass to it. As such, the f.tell() should be kept, as it's needed! – Fiddling
In Python 3.6, while with BufferedIO and RawIO you may use .tell() to estimate the file size, by definition it returns the current stream position as an opaque number, and for TextIO that number does not usually represent a number of bytes in the underlying binary storage. FYI. – Malefic
The example would be more clear if f.seek(0, 2) was written as f.seek(0, os.SEEK_END). – Stound
f.seek(0, os.SEEK_END); file_size = f.tell() is good. f.seek(...) does not return anything. docs.python.org/2/library/stdtypes.html#file.seek – Kussell
You don't need tell() because seek() already returns the position it has been set to. – Tiemroth
Write f.seek(0, 0) after file_size = f.seek(0, 2) if you plan to use the file later. – Ampere
Also, don't forget to seek back to zero, otherwise you will not get any data: f.seek(0, os.SEEK_END); file_size = f.tell(); f.seek(0) – Dabster
Also, if your file is a SpooledTemporaryFile, calling fileno() causes the contents of your file to be written to disk (more info). – Avrilavrit
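To illustrate the Python 3 behaviour discussed in these comments (seek() returning the new absolute position, and the need to rewind before reading), here is a minimal sketch. It uses an in-memory io.BytesIO as a stand-in for an open binary file, assuming any seekable binary stream behaves the same way; in Python 2, file.seek returns None, so keep the f.tell() call there.

```python
import io
import os

# In-memory stand-in for an open binary file.
f = io.BytesIO(b"hello, world")

size = f.seek(0, os.SEEK_END)  # Python 3: seek() returns the new absolute position
at_end = f.read()              # nothing left to read: the position is at the end

f.seek(0)                      # rewind before reading the contents
contents = f.read()

print(size, at_end, contents)  # 12 b'' b'hello, world'
```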

Well, if the file object supports the tell method, you can do:

current_size = f.tell()

That will tell you where it is currently writing. If you write sequentially, this will be the size of the file.

Otherwise, you can use the file system's capabilities, i.e. os.fstat, as suggested by others.
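Applied to the tarfile case from the question, the sequential-write idea can be sketched by calling tell() on the underlying file object passed as fileobj, not on the TarFile itself. This is a minimal sketch under stated assumptions: an io.BytesIO stands in for the real output file, and MAX_BYTES and the member names are hypothetical.

```python
import io
import tarfile

MAX_BYTES = 64 * 1024  # hypothetical size budget for the archive

buf = io.BytesIO()     # stand-in for the real output file opened in "wb" mode
positions = []

with tarfile.open(fileobj=buf, mode="w") as tar:
    for i in range(3):
        data = b"x" * 1000
        info = tarfile.TarInfo(name=f"member{i}.txt")  # hypothetical member names
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
        written = buf.tell()  # bytes of the archive written so far
        positions.append(written)
        if written > MAX_BYTES:
            break             # stop adding members once the budget is exceeded

# Closing the TarFile appends end-of-archive blocks and padding,
# so the final size is larger than the last tell() value.
final_size = len(buf.getvalue())
print(positions, final_size)
```

Two caveats: the check runs after a member is added, so the archive can overshoot the budget by one member; and on close, tarfile writes end-of-archive zero blocks and pads the output to its record size, so leave some headroom. With compressed modes such as "w:gz", the underlying position may lag behind because the compressor buffers data.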

Photokinesis answered 12/11, 2008 at 11:59 Comment(3)
current_size is a bad variable name since it suggests the current size of the file. tell() gives the current position of the file stream, that is, where the next read/write will occur. – Retribution
According to the Python 3.6 docs, .tell() returns "the current stream position as an opaque number. The number does not usually represent a number of bytes in the underlying binary storage." – Malefic
@Malefic only if the file is opened in text mode. – Dacy

If you have the file descriptor, you can use fstat to find out the size, if any. A more generic solution is to seek to the end of the file and read the position there with tell().
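Combining both suggestions, a generic helper could try os.fstat on the descriptor first and fall back to seek/tell for file-like objects that have none (such as io.BytesIO, whose fileno() raises io.UnsupportedOperation, a subclass of OSError). This is a sketch, assuming a binary-mode file object; file_size is a hypothetical name. Per the comment above, calling fileno() on a SpooledTemporaryFile forces its contents to disk, so you may prefer the seek/tell branch there.

```python
import io
import os
import tempfile


def file_size(f):
    """Return the size in bytes of an open binary file object (hypothetical helper)."""
    try:
        fd = f.fileno()
    except (AttributeError, OSError):  # io.UnsupportedOperation subclasses OSError
        fd = None
    if fd is not None:
        f.flush()                      # push buffered writes to the OS before fstat
        return os.fstat(fd).st_size
    pos = f.tell()
    size = f.seek(0, os.SEEK_END)      # Python 3: seek() returns the new position
    f.seek(pos)                        # restore the original position
    return size


# In-memory object: no descriptor, so the seek/tell fallback is used.
mem = io.BytesIO(b"abcdef")
mem.seek(2)
mem_size = file_size(mem)
mem_pos = mem.tell()                   # position restored to 2

# Real file with unflushed writes: flush() makes fstat see them.
with tempfile.TemporaryFile() as t:
    t.write(b"12345")
    disk_size = file_size(t)

print(mem_size, mem_pos, disk_size)    # 6 2 5
```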

Sedgewake answered 12/11, 2008 at 11:55 Comment(0)

I was curious about the performance implications of both, since once you open a file, the name attribute of the handle gives you the filename (so you can call os.stat on it).

Here's a function for the seek/tell method:

import io
def seek_size(f):
    pos = f.tell()
    f.seek(0, io.SEEK_END)
    size = f.tell()
    f.seek(pos) # back to where we were
    return size

With a 65 MiB file on an SSD under Windows 10, this was roughly 6.5x faster than calling os.stat(f.name).

Nashua answered 3/8, 2022 at 0:2 Comment(1)
First, I consider it bad style to use speed as an input in the decision making where it is not called for. And this is an extreme case of “not called for”. Second, it should be compared to os.fstat(f.fileno()), because otherwise the kernel needs to look up the file again by name. – Menides
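To reproduce the comparison, including the os.fstat(f.fileno()) variant suggested in the comment, a timing sketch with timeit might look like this. The 1 MiB temporary file and the stat_size/fstat_size names are assumptions; timings depend heavily on OS, filesystem and caching, so no results are claimed here, only that all three approaches agree on the size.

```python
import io
import os
import tempfile
import timeit


def seek_size(f):
    # the seek/tell function from this answer
    pos = f.tell()
    f.seek(0, io.SEEK_END)
    size = f.tell()
    f.seek(pos)  # back to where we were
    return size


def stat_size(f):
    return os.stat(f.name).st_size       # looks the file up again by path


def fstat_size(f):
    return os.fstat(f.fileno()).st_size  # uses the already-open descriptor


# Hypothetical test file: 1 MiB of zero bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (1 << 20))
    path = tmp.name

try:
    with open(path, "rb") as f:
        funcs = (seek_size, stat_size, fstat_size)
        sizes = {fn.__name__: fn(f) for fn in funcs}
        for fn in funcs:
            t = timeit.timeit(lambda: fn(f), number=1000)
            print(f"{fn.__name__}: {t:.4f} s per 1000 calls")
finally:
    os.remove(path)
```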

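Another solution is to use StringIO "if you are doing in-memory operations" (this is Python 2's StringIO module):

from StringIO import StringIO

with open(file_path, 'rb') as x:
    body = StringIO()
    body.write(x.read())
    body.seek(0, 0)

Now body behaves like a file object, with the usual methods such as body.read().

body.len gives the file size. Note that the len attribute exists only on Python 2's StringIO.StringIO; Python 3's io.StringIO has no such attribute and only accepts str, so binary data there needs io.BytesIO.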
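A Python 3 equivalent of this in-memory approach could use io.BytesIO and measure the buffer instead of reading a len attribute. This is a sketch; the temporary file with 10 bytes of content is an assumption standing in for file_path.

```python
import io
import os
import tempfile

# Self-contained setup: a small hypothetical file standing in for file_path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"some bytes")
    file_path = tmp.name

try:
    with open(file_path, "rb") as x:
        body = io.BytesIO(x.read())    # bytes need BytesIO, not StringIO, in Python 3
    size = len(body.getvalue())        # size of the in-memory buffer (makes a copy)
    with body.getbuffer() as view:     # memoryview over the buffer, no copy
        nbytes = view.nbytes
finally:
    os.remove(file_path)

print(size, nbytes)  # 10 10
```

The with block around getbuffer() releases the memoryview promptly; while a view is alive, the BytesIO cannot be resized.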

Thenna answered 17/8, 2016 at 9:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.