truncating a text file does not change the file

When a novice (like me) asks how to read or process a text file in Python, he often gets answers like:

with open("input.txt", 'r') as f:
    for line in f:
        #do your stuff

Now I would like to truncate everything in the file I'm reading after a special line. After modifying the example above I use:

with open("input.txt", 'r+') as file:
    for line in file:
        print line.rstrip("\n\r") #for debug
        if line.rstrip("\n\r")=="CC":
           print "truncating!"  #for debug
           file.truncate();
           break;

and expect it to throw away everything after the first "CC" seen. Running this code on input.txt:

AA
CC
DD

the following is printed on the console (as expected):

AA
CC
truncating!

but the file "input.txt" stays unchanged!?!?

How can that be? What am I doing wrong?

Edit: After the operation I want the file to contain:

AA
CC
Hippocampus answered 18/1, 2016 at 15:15 Comment(4)
You are truncating from the end of the file, so you need to seek to the correct position in the file first, with something like file.seek(0); file.truncate() - Dingo
@Dingo I want to truncate the file after "CC"; after file.seek(0) the whole content would be gone! - Hippocampus
That's why I said "something like". You'll need to find the point in the file right after CC and seek to there. - Dingo
@Dingo Maybe my expectations are wrong, but I expect the current position to move with every line read, so that truncating after having read "CC" cuts the file there (because the current position is then the "right" position). - Hippocampus

In addition to glibdud's answer: truncate() takes the size to cut the file down to, that is, the position after which the content is deleted. You can get the current position in the file with tell(). As he mentioned, inside the for loop the iterator's next() disables calls like tell(). In the suggested while loop, however, you can truncate at the current tell() position. The complete code would look like this:

Python 3:

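with open("test.txt", 'r+') as file:
    # readline() keeps tell() usable, unlike iterating with a for loop
    line = file.readline()
    while line:
        print(line.strip())
        if line.strip() == "CC":
            print("truncating")
            # pass the position explicitly rather than relying on the default
            file.truncate(file.tell())
            break
        line = file.readline()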
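
To try the snippet above without touching a real file, here is a minimal sketch that wraps the same readline()/tell()/truncate() loop in a function and runs it on a temporary copy of the sample input from the question. The function name truncate_after_marker and the temporary file are my own assumptions for illustration, not part of the original answer.

```python
import os
import tempfile


def truncate_after_marker(path, marker):
    """Cut the file right after the first line equal to `marker`.

    Returns True if the marker line was found, False otherwise.
    (Hypothetical helper, named here only for illustration.)
    """
    with open(path, "r+") as f:
        line = f.readline()
        while line:
            if line.rstrip("\r\n") == marker:
                # Give truncate() the position explicitly instead of
                # relying on its default.
                f.truncate(f.tell())
                return True
            line = f.readline()
    return False


# Build a throwaway file with the sample input from the question.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "input.txt")
with open(demo_path, "w") as f:
    f.write("AA\nCC\nDD\n")

found = truncate_after_marker(demo_path, "CC")

# Read the file back to see what survived the truncation.
with open(demo_path) as f:
    remaining = f.read()
print(repr(remaining))
```

If no line matches the marker, the file is left unchanged and the function returns False.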
Hydrogenolysis answered 18/1, 2016 at 15:55 Comment(6)
truncate defaults to the current position, so the tell shouldn't be necessary. - Jacelynjacenta
I thought so, but in my script it didn't work and deleted nothing. When I gave it an explicit position, it truncated right after that. - Hydrogenolysis
That's odd, it worked fine without the tell in my testing. - Jacelynjacenta
I'm sorry, I'm on Python 3. In Python 2 it worked without tell as expected. That seems to make a difference. - Hydrogenolysis
Yeah, I can reproduce that. Not sure what to make of it... seems like it must be a bug, given that the Python 3 documentation still indicates that it should default to the current position. - Jacelynjacenta
Thank you for pointing this problem out. I can reproduce it as well. At first I thought that maybe Python 3.4 does some internal buffering and just gets confused somehow, but my test with Python 3 showed nothing of the kind: for line in file is twice as fast as the while loop for my test cases. The same holds for Python 2: there is a factor of 2 between the two methods, but the slower Python 2 method is as fast as the faster Python 3 method... - Hippocampus

It looks like you're falling victim to a read-ahead buffer used internally by Python. From the documentation for the file.next() method:

A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing). In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.
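
The passage quoted above comes from the Python 2 documentation. As a rough illustration of a related effect in Python 3 (assuming CPython and a file opened in text mode), the sketch below shows that tell() is refused while a for loop is iterating over the file. The file name probe.txt and the variable names are hypothetical.

```python
import os
import tempfile

# Create a small throwaway file with the sample input.
probe_dir = tempfile.mkdtemp()
probe_path = os.path.join(probe_dir, "probe.txt")
with open(probe_path, "w") as f:
    f.write("AA\nCC\nDD\n")

tell_failed = False
with open(probe_path, "r+") as f:
    for line in f:
        try:
            # In Python 3 text mode, iterating with next() disables tell().
            f.tell()
        except OSError as exc:
            tell_failed = True
            print("tell() refused inside the loop:", exc)
        break
```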

The upshot is that the file's position is not where you would expect it to be when you truncate. One way around this is to use readline to loop over the file, rather than the iterator:

line = file.readline()
while line:
    ...
    line = file.readline()
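
If you would rather keep the for loop, one possible alternative (a sketch under my own assumptions, not something proposed in this thread) is to open the file in binary mode and count the consumed bytes yourself, so the truncation point does not depend on where the file position happens to be. The name truncate_after_marker_binary is hypothetical, and the marker has to be given as bytes.

```python
import os
import tempfile


def truncate_after_marker_binary(path, marker):
    """Keep everything up to and including the first line equal to `marker`.

    `marker` must be a bytes object. Returns True if it was found.
    (Hypothetical helper, named here only for illustration.)
    """
    with open(path, "rb+") as f:
        offset = 0  # bytes consumed so far, tracked by hand
        for raw_line in f:
            offset += len(raw_line)
            if raw_line.rstrip(b"\r\n") == marker:
                # Truncate at the byte offset just past the marker line.
                f.truncate(offset)
                return True
    return False


# Demo on a throwaway file with the sample input from the question.
binary_dir = tempfile.mkdtemp()
binary_path = os.path.join(binary_dir, "input.txt")
with open(binary_path, "wb") as f:
    f.write(b"AA\nCC\nDD\n")

binary_found = truncate_after_marker_binary(binary_path, b"CC")

with open(binary_path, "rb") as f:
    binary_remaining = f.read()
print(binary_remaining)
```

Because the lines are compared as bytes, this only fits files whose marker line can be matched without decoding.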
Jacelynjacenta answered 18/1, 2016 at 15:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.