How often does Python flush to a file?
C

5

293
  1. How often does Python flush to a file?
  2. How often does Python flush to stdout?

I'm unsure about (1).

As for (2), I believe Python flushes stdout after every newline. But if you redirect stdout to a file, does it flush as often?

Christianity answered 2/7, 2010 at 16:30 Comment(0)
H
412

For file operations, Python uses the operating system's default buffering unless you configure it to do otherwise. You can request a specific buffer size, no buffering, or line buffering.

For example, the open function takes a buffer size argument.

http://docs.python.org/library/functions.html#open

"The optional buffering argument specifies the file’s desired buffer size:"

  • 0 means unbuffered,
  • 1 means line buffered,
  • any other positive value means use a buffer of (approximately) that size.
  • A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files.
  • If omitted, the system default is used.

code:

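# 1 = line buffered (text mode only). In Python 3, 0 (unbuffered) requires
# binary mode, e.g. open('file.txt', 'wb', buffering=0).
bufsize = 1
f = open('file.txt', 'w', buffering=bufsize)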
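
To see how the buffering argument changes what reaches the file before any flush, here is a minimal sketch for Python 3. The helper name size_after_write is made up for this illustration, and the temporary file is only a stand-in for a real output file.

```python
import os
import tempfile

def size_after_write(mode, buffering, payload):
    """Hypothetical helper: write payload WITHOUT flushing and
    report how many bytes are already visible in the file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    f = open(path, mode, buffering=buffering)
    try:
        f.write(payload)
        return os.path.getsize(path)  # measured before flush()/close()
    finally:
        f.close()
        os.remove(path)

# 0 = unbuffered (binary mode only in Python 3)
unbuffered = size_after_write('wb', 0, b'hello\n')
# 1 = line buffered (text mode only): flushed when a newline is written
line_buffered = size_after_write('w', 1, 'hello\n')
# -1 = default buffer: a small write normally stays in memory
fully_buffered = size_after_write('w', -1, 'hello\n')

print(unbuffered, line_buffered, fully_buffered)
```

With the default buffer, the short write should still be sitting in memory when the size is measured, while the unbuffered and line-buffered writes should already be in the file.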
Henceforth answered 2/7, 2010 at 19:2 Comment(7)
+1 for the "line buffered" part. That's exactly what I was looking for and it works like a charm. - Mesenchyme
Using Python 3.4.3 when I do open('file.txt', 'w', 1) I get proper line buffering. But if I do anything larger (I wanted open('file.txt', 'w', 512)) it buffers the full io.DEFAULT_BUFFER_SIZE of 8192. Is that a Python bug, a Linux bug, or an ID10t bug? - Tusk
Is it possible to change the buffering for the already opened streams? Say, I want stdout to be line-buffered regardless of whether it is a console or redirected to a file? - Parallel
What I am confused about is what the term flushing even means. Why do we need it? What is it for? Why should I care about it? - Unexampled
@CharlieParker when you call write() on a file handle, the output is buffered in memory and accumulated until the buffer is full... at which time the buffer gets "flushed" (content is written from the buffer to the file). You can explicitly flush the buffer by calling the flush() method on a file handle. - Henceforth
Note that unbuffered (0) is only available in binary mode and line buffered (1) is only available in text mode. - Latticework
It doesn't answer the question about flush. Also, "operating system's default buffering" is misleading. This has nothing to do with the OS's internal buffering; it is solely about the buffering done in userland, in this case by the Python standard library. The "system default" in the old doc quoted above (now outdated) probably means the Python implementation's default. The docs have been improved since then to clarify how the buffer size is determined from the OS block size when available, with a fallback to the implementation default. - Adamandeve
A
206

You can also force flush the buffer to a file programmatically with the flush() method.

with open('out.log', 'w+') as f:
    f.write('output is ')
    # some work
    s = 'OK.'
    f.write(s)
    f.write('\n')
    f.flush()
    # some other work
    f.write('done\n')
    f.flush()

I have found this useful when tailing an output file with tail -f.
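
If the data also has to survive an OS crash or power loss, flush() alone is not enough: it only hands Python's buffer to the operating system. A minimal sketch of the flush-then-fsync pattern follows; the helper name write_durably and the temporary file are assumptions made for this example.

```python
import os
import tempfile

def write_durably(path, text):
    """Hypothetical helper: write text, then push it through both buffer layers."""
    with open(path, 'w') as f:
        f.write(text)
        f.flush()              # Python's user-space buffer -> operating system
        os.fsync(f.fileno())   # ask the OS to commit its cache to storage

# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp(suffix='.log')
os.close(fd)
write_durably(demo_path, 'output is OK.\n')
with open(demo_path) as f:
    content = f.read()
os.remove(demo_path)
```

For tail -f, flush() by itself is sufficient, since other processes read through the same OS cache; os.fsync() is slower and only matters for durability.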

Animal answered 10/3, 2011 at 5:13 Comment(7)
From the docs: Note: flush() does not necessarily write the file's data to disk. Use flush() followed by os.fsync() to ensure this behavior. - Augie
@Augie next time link to said docs. Only reference I can find is from github.com/jprzywoski/python-reference/blob/master/source/docs/… and I don't know who that is. - Tusk
@Bruno Bronosky Good point. Docs: Note: flush() does not necessarily write the file's data to disk. Use flush() followed by os.fsync() to ensure this behavior. - Augie
What I am confused about is what the term flushing even means. Why do we need it? What is it for? Why should I care about it? - Unexampled
@CharlieParker when you write, you write to a copy of (part of) the file in RAM, which might not be saved to disk for a while. It improves performance, but can mean data loss if that copy never gets written (disk removed, OS crashes, etc). flush() tells Python to immediately write that buffer back to disk. (Then, os.fsync() tells the OS to also do it. There are many layers of buffers...) - Retirement
@Augie This might have been fixed in Python 3. I don't see any such note in the new documentation. - Clothier
@Clothier There's nothing to fix: flush flushes the user-space buffers, and os.fsync returns only when the OS is told that the file is persisted (it still may not be physically written depending on the filesystem, e.g. NFS, or the storage hardware's own caches, but as long as they say they will persist it, it should be considered fine because the manufacturer is taking the responsibility). As for the documentation, check the os.fsync docs in Python: docs.python.org/3/library/os.html#os.fsync - Delogu
A
16

You can also check the default buffer size by reading the read-only DEFAULT_BUFFER_SIZE attribute of the io module.

import io
print(io.DEFAULT_BUFFER_SIZE)
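
The open() documentation describes the buffer size for binary files as chosen heuristically from the underlying device's block size, with io.DEFAULT_BUFFER_SIZE as the fallback. As a rough sketch, you can print both values side by side; the temporary file here is just a stand-in for whatever file you plan to write, and st_blksize is not available on every platform.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

default_size = io.DEFAULT_BUFFER_SIZE
# st_blksize is missing on some platforms (e.g. Windows), hence getattr.
block_size = getattr(os.stat(path), 'st_blksize', None)
os.remove(path)

print('io.DEFAULT_BUFFER_SIZE:', default_size)
print('st_blksize of the temp file:', block_size)
```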
Audre answered 20/5, 2016 at 15:28 Comment(1)
Thanks! It's good to know that python sets it as OS defines... but this helps find out what the OS pre-defines. - Seventh
S
15

I don't know if this applies to python as well, but I think it depends on the operating system that you are running.

On Linux for example, output to terminal flushes the buffer on a newline, whereas for output to files it only flushes when the buffer is full (by default). This is because it is more efficient to flush the buffer fewer times, and the user is less likely to notice if the output is not flushed on a newline in a file.

You might be able to auto-flush the output if that is what you need.

EDIT: I think you would auto-flush in Python this way (based on the approach here)

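import sys

# 1 means line buffered, so each completed line is flushed.
# (In Python 3, 0 = unbuffered is only allowed in binary mode.)
fsock = open('out.log', 'w', 1)
sys.stdout = fsock
# do whatever
sys.stdout = sys.__stdout__  # restore before closing the file
fsock.close()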
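
In Python 3 there is also a way to do this without reassigning sys.stdout by hand: contextlib.redirect_stdout together with print(..., flush=True). This is only a sketch; the temporary log file and the messages are made up for the example.

```python
import contextlib
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.log')
os.close(fd)

with open(path, 'w') as log, contextlib.redirect_stdout(log):
    print('step 1 finished', flush=True)   # flushed to the OS right away
    size_mid_run = os.path.getsize(path)   # already visible to tail -f
    print('step 2 finished')               # may wait in the buffer until close

with open(path) as f:
    lines = f.read().splitlines()
os.remove(path)
```

Here stdout is restored automatically when the with block exits, and only the print calls that pass flush=True pay the cost of an immediate write.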
Selfheal answered 2/7, 2010 at 16:37 Comment(0)
G
2

Here is another approach, up to the OP to choose which one he prefers.

When including the code below in the __init__.py file before any other code, messages printed with print and any errors will no longer be logged to Ableton's Log.txt but to separate files on your disk:

import sys

path = "/Users/#username#"

errorLog = open(path + "/stderr.txt", "w", 1)
errorLog.write("---Starting Error Log---\n")
sys.stderr = errorLog
stdoutLog = open(path + "/stdout.txt", "w", 1)
stdoutLog.write("---Starting Standard Out Log---\n")
sys.stdout = stdoutLog

(for Mac, change #username# to the name of your user folder. On Windows the path to your user folder will have a different format)

When you open the files in a text editor that refreshes its content when the file on disk is changed (example for Mac: TextEdit does not but TextWrangler does), you will see the logs being updated in real-time.

Credits: this code was copied mostly from the liveAPI control surface scripts by Nathan Ramella

Grandeur answered 29/7, 2016 at 9:9 Comment(0)
