Python in-place write to file at arbitrary position

I'm trying to edit a text file in place in Python. It is very large, so loading it into memory is not an option. I want to replace strings I find in it with replacements of the same length, byte for byte.

with open("filename.txt", "r+b") as f:
    # Binary mode: compare and write bytes literals.
    if f.read(8) == b"01234567":
        f.seek(-8, 1)  # step back over the 8 bytes just read
        f.write(b"87654321")

However, the write() operation adds onto the end of the file when I tried it:

>>> n.read()
'sdf'
>>> n.read(1)
''
>>> n.seek(0,0)
>>> n.read(1)
's'
>>> n.read(1)
'd'
>>> n.write("sdf")
>>> n.read(1)
''
>>> n.seek(0,0)
>>> n.read()
'sdfsdf'

I want the result of that to be sdsdf.

Privett answered 1/11, 2015 at 0:8 Comment(8)
This should work with r+b mode. It may well not work with any "a" mode. Your code sample at the top uses r+b and a stream bound to f, but your interactive example uses a stream bound to n, so I wonder if maybe n is opened differently. Or, if not, I note that your n.read(1) is not followed by a seek operation (the intermediate seek requirement is annoying, but is standard). - Cohen
Sorry, the n is opened with: n = open("test.text", "r+b"). Intermediate seek requirement? - Privett
Yes: any time you want to switch from reading to writing, or vice versa, you must invoke seek (even just a relative seek of 0 bytes, for instance). There are a few exceptions, including "write allowed without seek if read just returned EOF", but it's easier just to always seek. - Cohen
Is that documented somewhere? It works, but - Privett
The original documentation is the C standard for stdio. Not sure where (if anywhere) the Python docs refer back to this, nor why it wasn't fixed in the Python wrappers. For that matter, there's no fundamental reason it can't be corrected in the C library: my original BSD stdio avoided it! - Cohen
@Cohen Your original BSD stdio: "your" in the sense that you used it or wrote it? - Huth
Wrote (most of it; all the float conversion code was other people's, for instance). - Cohen
@Cohen Impressive credentials. :) Maybe it would make sense to file a Python documentation bug for this, or even an enhancement request to fix it? - Huth

The original ANSI / ISO C standards required a seek operation when switching a read-write mode stream from read mode to write mode, and vice versa. This restriction persists: for example, N1570 (a C11 draft) includes this text:

When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. Opening (or creating) a text file with update mode may instead open (or create) a binary stream in some implementations.

For whatever reason this restriction has been imported into Python [1], even though it would be possible for the Python wrappers to handle it automatically.
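As a minimal sketch of the "always seek between a read and a write" rule applied to the question's snippet: the helper name replace_at_start and the throwaway demo file are assumptions made for illustration, not part of the original question. It uses bytes literals so that it also runs on Python 3, where the explicit seek is still what moves the position back over the bytes just read.

```python
import os
import tempfile


def replace_at_start(path, old, new):
    """Overwrite `old` with `new` at the start of the file, in place.

    Hypothetical helper: returns True if the file started with `old`.
    """
    if len(old) != len(new):
        raise ValueError("replacement must have the same length")
    with open(path, "r+b") as f:
        if f.read(len(old)) != old:
            return False
        # Reposition explicitly before switching from reading to writing.
        f.seek(-len(old), os.SEEK_CUR)
        f.write(new)
    return True


# Demo on a throwaway temporary file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"01234567 tail")
replaced = replace_at_start(demo_path, b"01234567", b"87654321")
with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
```

The rest of the file is left untouched because the replacement has exactly the same length as the bytes it overwrites.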

For what it's worth, the reason for the original ANSI C restriction was the low-budget implementation found on many Unix-based systems: they kept, for each stream, a "current byte count" and a "current pointer". The current byte count was 0 if the macro-ized getc and putc operations had to call into the underlying implementation, which could check whether a stream was opened in update mode and switch it as needed. But once you successfully obtained a character, the counter would hold the number of characters that could continue to be read from the underlying stream; and once you successfully wrote a character, the counter would hold the number of buffer locations still available for adding characters.

This meant that if you did a successful getc that filled an internal buffer, but followed it with a putc, the "written" character from putc would simply overwrite the buffered data. If you had a successful putc but followed it with a poorly implemented getc, you would see unset values out of the buffer.

This problem was trivial to fix (just provide separate input and output counters, one of which is always zero, and have the functions that implement buffer-refill check for mode-switch as well).
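The two-counter idea can be illustrated with a toy model. This is only a sketch in Python, not the actual BSD stdio code: the class ToyStream, its field names, and the use of a bytearray as the "file" are all assumptions made for illustration. The fast paths in getc and putc look only at their own counter; whenever that counter is zero, the slow path runs and takes care of any mode switch.

```python
class ToyStream:
    """Toy update-mode stream with separate read and write counters."""

    def __init__(self, data, bufsize=4):
        self.backing = bytearray(data)  # stands in for the file on disk
        self.bufsize = bufsize
        self.offset = 0     # file offset that buf[0] corresponds to
        self.buf = bytearray()
        self.ptr = 0        # next index to use in buf
        self.r_count = 0    # bytes still readable from buf
        self.w_count = 0    # free slots still writable in buf
        self.mode = None    # None, "r" or "w"

    def _sync(self):
        """Flush pending output or drop unread input; zero both counters."""
        if self.mode == "w":
            end = self.offset + self.ptr
            self.backing[self.offset:end] = self.buf[:self.ptr]
        self.offset += self.ptr  # logical position after reads or writes
        self.buf, self.ptr = bytearray(), 0
        self.r_count = self.w_count = 0
        self.mode = None

    def getc(self):
        if self.r_count == 0:  # slow path: refill, switching mode if needed
            self._sync()
            self.buf = self.backing[self.offset:self.offset + self.bufsize]
            self.r_count = len(self.buf)
            self.mode = "r"
            if self.r_count == 0:
                return None  # end of file
        c = self.buf[self.ptr]
        self.ptr += 1
        self.r_count -= 1
        return c

    def putc(self, c):
        if self.w_count == 0:  # slow path: start a fresh output buffer
            self._sync()
            self.buf = bytearray(self.bufsize)
            self.w_count = self.bufsize
            self.mode = "w"
        self.buf[self.ptr] = c
        self.ptr += 1
        self.w_count -= 1

    def close(self):
        self._sync()
        return bytes(self.backing)


# Same sequence as the question: read two bytes, then write "sdf".
stream = ToyStream(b"sdf")
first, second = stream.getc(), stream.getc()
for byte in b"sdf":
    stream.putc(byte)
final = stream.close()
```

In this model the write lands right after the two bytes that were read, with no intervening seek, which is the sdsdf result the question asked for.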


[1] Citation needed :-)

Cohen answered 1/11, 2015 at 0:41 Comment(0)

Compare the following two snippets:

>>> f = open("file.txt", "r+b")
>>> f.seek(2)
>>> f.write("sdf")
>>> f.seek(0)
>>> f.read()
'sdsdf'


>>> f = open("file.txt", "r+b")
>>> f.read(1)
's'
>>> f.read(1)
'd'
>>> f.write("sdf")
>>> f.seek(0)
>>> f.read()
'sdfsdf'

In the second snippet the write lands at the end of the file: the position used by .write() starts out there, and only .seek() changes it reliably, not .read(). So you have to call .seek() before writing the bytes. The following code works as expected:

>>> f = open("file.txt", "r+b")
>>> f.read(1)
's'
>>> f.read(1)
'd'
>>> f.seek(2)
>>> f.write("sdf")
>>> f.seek(0)
>>> f.read()
'sdsdf'
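For the original goal (a file too large to load into memory), the same seek-before-write pattern can be applied chunk by chunk. The following is a sketch under stated assumptions, not a definitive implementation: the function name replace_in_place, the chunk_size parameter, and the demo file are hypothetical, and the replacement must have the same length as the pattern so that nothing after it shifts. A short tail of each window is kept so that matches straddling a chunk boundary are still found.

```python
import os
import tempfile


def replace_in_place(path, old, new, chunk_size=1 << 20):
    """Replace each non-overlapping `old` with same-length `new`, in place.

    Hypothetical helper; returns the number of replacements made.
    """
    if not old or len(old) != len(new):
        raise ValueError("old and new must be non-empty and equal in length")
    overlap = len(old) - 1  # longest tail that could start a split match
    count = 0
    with open(path, "r+b") as f:
        base = 0       # file offset of window[0]
        window = b""   # unmatched tail of earlier chunks plus current chunk
        while True:
            f.seek(base + len(window))  # always seek before reading
            chunk = f.read(chunk_size)
            if not chunk:
                break
            window += chunk
            start = 0
            while True:
                hit = window.find(old, start)
                if hit < 0:
                    break
                f.seek(base + hit)      # always seek before writing
                f.write(new)
                count += 1
                start = hit + len(old)
            # Keep only bytes that could still begin a match, and never
            # re-scan bytes that were already replaced.
            keep_from = max(start, len(window) - overlap)
            base += keep_from
            window = window[keep_from:]
    return count


# Demo with a tiny chunk size so that matches straddle chunk boundaries.
original = b"sdf--01234567--0123456701234567--sdf"
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(original)
replacements = replace_in_place(demo_path, b"01234567", b"87654321", chunk_size=5)
with open(demo_path, "rb") as f:
    patched = f.read()
os.remove(demo_path)
```

Memory use stays bounded by roughly chunk_size plus the pattern length, regardless of how large the file is.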
Stank answered 1/11, 2015 at 1:3 Comment(0)
