How do I determine an open file's size in Python?

Asked 8/12, 2009 at 14:33 Answered 17/5, 2014 at 2:47

Solved python linux file filesystems ext2

There's a file that I would like to make sure does not grow larger than 2 GB (as it must run on a system that uses ext2). What's a good way to check a file's size, bearing in mind that I will be writing to this file in between checks? In particular, do I need to worry about buffered, unflushed changes that haven't been written to disk yet?

Motorcar answered 8/12, 2009 at 14:33 Comment(9)

Is there a reason you can't just keep track of the file size yourself - that is, see what the size is when you open it and increment a counter when you write? Not particularly elegant, but it should work. – Jaguar 8/12, 2009 at 14:39

I suppose that's a possibility I hadn't thought of... I might try that as well. – Motorcar 8/12, 2009 at 14:40

Is that not inefficient as hell though? – Lombard 8/12, 2009 at 14:51

The maximum file size limit under ext2 is 16GiB -- 64TiB depending on the block size. See en.wikipedia.org/wiki/Ext2. This doesn't answer your question, but just thought this might be helpful. – Spire 8/12, 2009 at 14:53

incrementing an integer is about the fastest thing a CPU can do, so probably no - this won't be inefficient :) – Defile 8/12, 2009 at 14:54

Jason, what would happen if you let the file grow too large? Generally in Python, try not to "look before you leap"... let exceptions occur, and handle them then. Usually faster and cleaner. What would you do if your counter said the file was about to become too large? Can you do the same after catching an exception when it does get too large? Some extra detail might help in your question. – Hydrokinetics 8/12, 2009 at 14:58

@~unutbu - I saw that, but the thing that scared me is this: "There are also many userspace programs that can't handle files larger than 2 GB" – Motorcar 8/12, 2009 at 14:59

@Peter - that's an interesting approach that I hadn't thought of. The thing is that I can see that as being a very platform-dependent thing. Correct me if I'm wrong. – Motorcar 8/12, 2009 at 15:6

It would be a good thing to test. Throw some too-big files at it and see what happens. Beats being scared of it. – Blueness 8/12, 2009 at 23:38

You could start with something like this:

class TrackedFile(file):
    def __init__(self, filename, mode):
        self.size = 0
        super(TrackedFile, self).__init__(filename, mode)
    def write(self, s):
        self.size += len(s)
        super(TrackedFile, self).write(s)

Then you could use it like this:

>>> f = TrackedFile('palindrome.txt', 'w')
>>> f.size
0
>>> f.write('A man a plan a canal ')
>>> f.size
21
>>> f.write('Panama')
>>> f.size
27

Obviously, this implementation doesn't work if you aren't writing the file from scratch, but you could adapt your __init__ method to handle initial data. You might also need to override some other methods: writelines, for instance.

This works regardless of encoding, as Python 2 str objects are just sequences of bytes, so len(s) is the number of bytes written.

>>> f2 = TrackedFile('palindrome-latin1.txt', 'w')
>>> f2.write(u'A man a plan a canál '.encode('latin1'))
>>> f3 = TrackedFile('palindrome-utf8.txt', 'w')
>>> f3.write(u'A man a plan a canál '.encode('utf-8'))
>>> f2.size
21
>>> f3.size
22
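
For anyone on Python 3, where the file builtin no longer exists, here is a minimal sketch of the same idea using a wrapper instead of a subclass. It also covers the two adaptations mentioned above: starting from the size of an existing file and handling writelines. The restriction to the 'wb' and 'ab' modes and the scratch file name are my own assumptions, not part of the original answer.

```python
import os
import tempfile


class TrackedFile:
    """Wrap a binary file and count the bytes written through it (Python 3 sketch)."""

    def __init__(self, filename, mode="wb"):
        # Assumption: only plain write/append binary modes are supported here.
        if mode not in ("wb", "ab"):
            raise ValueError("this sketch only supports 'wb' and 'ab'")
        self._f = open(filename, mode)
        # 'wb' truncates, so this is 0; 'ab' picks up the existing length.
        self.size = os.fstat(self._f.fileno()).st_size

    def write(self, data):
        written = self._f.write(data)  # bytes accepted by the buffered writer
        self.size += written
        return written

    def writelines(self, lines):
        # Route every chunk through write() so the counter stays accurate.
        for line in lines:
            self.write(line)

    def close(self):
        self._f.close()


# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "palindrome.txt")

f = TrackedFile(path, "wb")
f.write(b"A man a plan a canal ")
f.writelines([b"Pan", b"ama"])
size_after_first_run = f.size
f.close()

g = TrackedFile(path, "ab")          # reopening picks up the existing size
size_on_reopen = g.size
g.write("á".encode("utf-8"))         # two bytes in UTF-8
size_after_append = g.size
g.close()
```

Because the counter works on bytes objects, the encoding question from the comments does not come up: you count whatever you actually hand to write.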

Blueness answered 8/12, 2009 at 15:17 Comment(3)

That's not actually true. If you use ASCII, ISO 8859 and UTF-8, the result will be the same, but the on-disk size will not be. – Babylonia 9/12, 2009 at 17:25

No. It works for other encodings too, if you use actual strings. Answer modified to demonstrate. – Blueness 9/12, 2009 at 17:32

The trick is you can't just write unicode objects and rely on the os's encoding. – Blueness 9/12, 2009 at 17:37

Perhaps not what you want, but I'll suggest it anyway.

import os
a = os.path.getsize("C:/TestFolder/Input/1.avi")

Alternatively, for a file that is already open, you can use the os.fstat function. It takes an integer file descriptor, not a file object, so you have to call the fileno method on the file object:

a = open("C:/TestFolder/Input/1.avi")
b = os.fstat(a.fileno()).st_size

Lombard answered 8/12, 2009 at 14:38 Comment(0)

os.fstat(file_obj.fileno()).st_size should do the trick. I think that it will return the bytes written. You can always do a flush beforehand if you are concerned about buffering.
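
A small sketch of the flush-then-fstat idea, written for Python 3. The helper name open_file_size and the temporary file are my own choices for illustration.

```python
import os
import tempfile


def open_file_size(f):
    """Return the size the OS reports for an open file, after flushing Python's buffer."""
    f.flush()                           # hand buffered bytes over to the OS
    return os.fstat(f.fileno()).st_size


# Temporary file used only for this demonstration.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 100)
    size_now = open_file_size(tmp)

print(size_now)
```

Note that flush() only pushes the data out of Python's own buffer. That is enough for fstat to see it, but it says nothing about whether the bytes have physically reached the disk.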

Gravelblind answered 8/12, 2009 at 14:44 Comment(1)

And works in append mode too! Thank you. And yeah, I would flush before calling this. – Oruntha 6/10, 2017 at 22:7

Though this is an old question, I think that Isak has the simplest solution. Here's how to do it in Python:

# Assuming f is an open file
>>> pos = f.tell()  # Save the current position
>>> f.seek(0, 2)  # Seek to the end of the file
>>> length = f.tell()  # The current position is the length
>>> f.seek(pos)  # Return to the saved position
>>> print length
1024
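
The same steps as a reusable function, sketched for Python 3. I am assuming a seekable file opened in binary mode (in text mode, tell() returns opaque values that should not be treated as byte counts). The name size_by_seeking is mine, and io.BytesIO stands in for a real file just to keep the example self-contained.

```python
import io
import os


def size_by_seeking(f):
    """Return the length of a seekable binary file, restoring the position afterwards."""
    pos = f.tell()                       # save the current position
    try:
        return f.seek(0, os.SEEK_END)    # in Python 3, seek() returns the new position
    finally:
        f.seek(pos)                      # go back to where we were


buf = io.BytesIO(b"0123456789")          # stand-in for an open binary file
buf.seek(4)
length = size_by_seeking(buf)
position_after = buf.tell()

print(length, position_after)
```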

Destination answered 17/5, 2014 at 2:47 Comment(4)

I think that in the first line (save current position), you should use f.tell(), not the seek(), which would cause an exception since seek() needs at least 1 argument. – Fallacy 13/6, 2017 at 3:12

@Fallacy Yes, you are right! Not sure how I missed that. Thanks! – Destination 13/6, 2017 at 14:38

This will calculate the file size correctly, but won't restore the position correctly due to known issues with tell in append mode. – Oruntha 6/10, 2017 at 22:6

@Oruntha I thought that would not be an issue as long as you did not write between the tell and seek, but I may be wrong. I didn't have an issue in my tests, but it looks like those issues vary by platform. Thanks for pointing that out. – Destination 9/10, 2017 at 16:16

I'm not familiar with python, but doesn't the stream object (or whatever you get when opening a file) have a property that contains the current position of the stream?

Similar to what you get with the ftell() C function, or Stream.Position in .NET.

Obviously, this only works if you are positioned at the end of the stream, which you are if you are currently writing to it.

The benefit of this approach is that you don't have to close the file or worry about unflushed data.
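
To illustrate the point about unflushed data, here is a rough Python 3 sketch that compares tell() with what the OS reports before any flush. I am assuming binary write mode and purely sequential writes; the scratch file name is made up.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "growing.bin")

with open(path, "wb") as f:
    f.write(b"x" * 10)                       # small write, stays in Python's buffer
    logical = f.tell()                       # position as Python sees it, buffer included
    reported = os.fstat(f.fileno()).st_size  # what the OS knows so far (may lag behind)

final_size = os.path.getsize(path)           # after close, everything has been written

print(logical, reported, final_size)
```

The value of reported depends on buffering and can be smaller than logical, which is exactly why tell() is attractive when you are the only writer and always positioned at the end.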

Defile answered 8/12, 2009 at 14:47 Comment(2)

'filehandle.tell()' indeed shows the number of bytes in the opened file, and works in either write or append mode. Not sure why all these more complex answers got upvoted. – Tamtama 30/7, 2015 at 17:17

@Tamtama No, f.tell() does not seem to work reliably in append mode. Unless you first f.seek(0,2). I have no idea why. – Oruntha 6/10, 2017 at 21:51

Or, if the file is already open:

>>> data = open('/etc/hosts', 'rb').read()
>>> len(data)
444

That's the file's size in bytes. Note that this reads the entire file into memory.

Fairfax answered 8/12, 2009 at 14:42 Comment(0)

The most reliable approach would be to create a wrapper class that checks the file's size when you open it, tracks write and seek operations, computes the current size from those operations, and prevents writes from exceeding the size limit.
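
A rough Python 3 sketch of such a wrapper, under a few assumptions of mine: binary modes only, a single writer, and an exception (the hypothetical FileTooLarge) as the way to refuse a write. Rather than tracking seeks by hand, it asks tell() where the next write will land.

```python
import os
import tempfile


class FileTooLarge(Exception):
    """Raised when a write would push the file past its limit (hypothetical name)."""


class SizeLimitedFile:
    def __init__(self, filename, mode, limit):
        if "b" not in mode:
            raise ValueError("this sketch only supports binary modes")
        self._f = open(filename, mode)
        self._limit = limit
        self._appending = "a" in mode
        # Size when opened: 0 after truncation, the existing length otherwise.
        self.size = os.fstat(self._f.fileno()).st_size

    def seek(self, offset, whence=os.SEEK_SET):
        return self._f.seek(offset, whence)

    def write(self, data):
        # In append mode every write lands at the end, whatever the position is.
        start = self.size if self._appending else self._f.tell()
        end = start + len(data)
        if end > self._limit:
            raise FileTooLarge("write would grow the file to %d bytes" % end)
        written = self._f.write(data)
        self.size = max(self.size, end)  # overwriting in the middle does not grow the file
        return written

    def close(self):
        self._f.close()


# Hypothetical scratch file and a tiny limit, just to show the behaviour.
path = os.path.join(tempfile.mkdtemp(), "capped.bin")
capped = SizeLimitedFile(path, "wb", limit=10)
capped.write(b"12345678")        # size is now 8
capped.seek(4)
capped.write(b"abcd")            # overwrites bytes 4..7, size stays 8
try:
    capped.write(b"xyz")         # would end at byte 11, over the limit
    rejected = False
except FileTooLarge:
    rejected = True
size_at_end = capped.size
capped.close()
```

For the 2 GB case in the question you would pass something like limit=2 * 1024**3 and decide in the except branch whether to rotate to a new file or stop writing.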

Lubric answered 8/12, 2009 at 14:41 Comment(0)
