After writing to a file, why does os.path.getsize still return the previous size?
C

5

13

I am trying to split a large XML file into smaller chunks. I write to the output file and then check its size to see whether it has passed a threshold, but I don't think the getsize() method is working as expected.

What would be a good way to get the size of a file that is still being written to?

I've done something like this:

import os

f1 = open('VSERVICE.xml', 'r')
f2 = open('split.xml', 'w')

for line in f1:
  if line == '</Service>\n':
    break
  else:
    f2.write(line)
    # Ask the OS for the output file's size after every write.
    size = os.path.getsize('split.xml')
    print('size = ' + str(size))

Running this prints 0 as the file size for about 80 iterations and then 4176. Does Python store the output in a buffer before actually writing it to the file?

Crossarm answered 18/6, 2009 at 16:38 Comment(0)
C
10

Yes, Python is buffering your output. You'd be better off tracking the size yourself, something like this:

size = 0
for line in f1:
  if str(line) == '</Service>\n':
    break
  else:
    f2.write(line)
    size += len(line)
    print('size = ' + str(size))

(That might not be 100% accurate, e.g. on Windows each line gains a byte on disk because '\n' is written as '\r\n', but it should be good enough for simple chunking.)
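
If a closer match to the on-disk size is wanted, one option is to count encoded bytes instead of characters and to switch off newline translation. The following is only a sketch under those assumptions: `write_and_count` is a hypothetical helper name, and UTF-8 is assumed as the output encoding.

```python
import os
import tempfile

def write_and_count(lines, path, encoding="utf-8"):
    """Write text lines to path and return the number of bytes written."""
    size = 0
    # newline="" disables newline translation, so "\n" stays a single byte
    # on every platform instead of becoming "\r\n" on Windows.
    with open(path, "w", encoding=encoding, newline="") as out:
        for line in lines:
            out.write(line)
            # Count encoded bytes, not characters: non-ASCII characters
            # can take more than one byte.
            size += len(line.encode(encoding))
    return size

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "split.xml")
    counted = write_and_count(["<Service>\n", "  <Name>caf\u00e9</Name>\n"], demo_path)
    on_disk = os.path.getsize(demo_path)  # measured after the file is closed
```

Here `counted` and `on_disk` should agree, because the file has been closed (and therefore flushed) before getsize is called.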

Cottonwood answered 18/6, 2009 at 16:41 Comment(1)
Thanks! That should work. I don't need it to be 100% accurate. - Crossarm
K
11

File size is different from file position. For example,

os.path.getsize('sample.txt') 

This returns the exact size of the file in bytes, as it currently exists on disk.

In contrast:

f = open('sample.txt')
print(f.readline())
print(f.tell())

Here f.tell() returns the current position of the file object, i.e. where the next read or write will take place. Since it takes buffering into account, it should be accurate as long as you are simply appending to the output file.
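
To make the difference concrete, here is a small sketch that writes to a temporary file and compares the two values. It assumes binary mode, so that tell() is a plain byte offset; the file name `sample.bin` is made up for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as out:  # default (buffered) binary mode
        out.write(b"x" * 100)
        # tell() includes data that is still sitting in Python's buffer.
        position = out.tell()
        # getsize() only sees what has reached the OS; with a small write
        # like this it is typically still 0 at this point.
        size_before_flush = os.path.getsize(path)
        out.flush()
        size_after_flush = os.path.getsize(path)
```

After the flush, `size_after_flush` should match `position`; before it, the size reported by the OS usually lags behind.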

Kero answered 28/4, 2011 at 16:22 Comment(0)
O
5

Have you tried replacing os.path.getsize with the file object's tell() method, like this:

f2.write(line)
size = f2.tell()
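
As a rough illustration of how tell() could drive the chunking itself, the sketch below starts a new output file whenever the current one has reached a threshold. `split_by_size`, the `split_%d.xml` naming scheme and the `max_bytes` parameter are all hypothetical, and the split happens on line boundaries only, so it does not try to keep XML elements intact.

```python
import os
import tempfile

def split_by_size(source_path, out_dir, max_bytes):
    """Copy source_path into numbered chunk files of roughly max_bytes each."""
    chunk_paths = []
    out = None
    with open(source_path, "rb") as src:
        for line in src:
            # Open a new chunk if there is none yet or the current one is full.
            if out is None or out.tell() >= max_bytes:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, "split_%d.xml" % len(chunk_paths))
                chunk_paths.append(path)
                out = open(path, "wb")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "VSERVICE.xml")
    with open(source, "wb") as f:
        f.write(b"<Service/>\n" * 10)  # 10 lines of 11 bytes each
    chunks = split_by_size(source, tmp, max_bytes=30)
    chunk_sizes = [os.path.getsize(p) for p in chunks]
```

Because the threshold is checked before each write, a chunk can overshoot `max_bytes` by up to one line.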
Order answered 6/8, 2009 at 14:26 Comment(0)
S
4

Tracking the size yourself will be fine for your case. A different way would be to flush the file buffers just before you check the size:

f2.write(line)
f2.flush()  # <-- Python's buffer is handed to the OS, so getsize sees the data
size = os.path.getsize('split.xml')

Doing that too often will slow down file I/O, of course.
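
One way to limit that cost is to flush and measure only every so many lines. This is a sketch, not a tuned solution: `write_with_periodic_check` and its `check_every` parameter are invented for the example.

```python
import os
import tempfile

def write_with_periodic_check(lines, path, check_every=100):
    """Write byte lines to path, flushing and measuring every check_every lines."""
    sizes = []
    with open(path, "wb") as out:
        for count, line in enumerate(lines, start=1):
            out.write(line)
            if count % check_every == 0:
                out.flush()  # make the buffered data visible to getsize
                sizes.append(os.path.getsize(path))
    return sizes

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "split.xml")
    # 250 lines of 7 bytes each, measured after line 100 and line 200.
    observed = write_with_periodic_check([b"<row/>\n"] * 250, demo_path, check_every=100)
```

The trade-off is precision: the size is only known at the checkpoints, so a threshold can be overshot by up to `check_every` lines.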

Stevenstevena answered 18/6, 2009 at 19:16 Comment(0)
R
1

To find the offset of the end of a file:

f.seek(0, 2)  # whence=2 means "relative to the end of the file"
print(f.tell())

Real-world example: read updates to a file and print them as they happen:

import time

f = open('log.txt', 'r')
# Find the initial end-of-file offset.
f.seek(0, 2)
eof = f.tell()
while True:
    # Measure the file size again.
    f.seek(0, 2)
    neweof = f.tell()
    # If the file has grown...
    if neweof > eof:
        # ...go back to the last known position...
        f.seek(eof)
        # ...and print everything from there to the current end.
        print(f.read(), end='')
        eof = f.tell()
    time.sleep(0.5)  # avoid spinning the CPU while waiting for new data
Raster answered 25/11, 2011 at 11:58 Comment(0)
