Print LF with Python 3 to Windows stdout

How can I get a bare \n (LF) written to stdout on Windows, without it being turned into \r\n? This code works in Python 2, but not in Python 3:

# set sys.stdout to binary mode on Windows
import sys, os, msvcrt
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

# the length of testfile created with
#     python test_py3k_lf_print.py > testfile
# below should be exactly 4 bytes (23 0A 23 0A)
print("#\n#")
Filomena answered 23/1, 2016 at 7:43 Comment(0)

Python 3 already configures standard I/O in binary mode, but it has its own I/O implementation, and its text layer does the newline translation. Instead of using print, which requires a text-mode file, you could call sys.stdout.buffer.write directly to use the binary-mode BufferedWriter. If you need to use print, then you'll need a new text I/O wrapper that doesn't translate newlines on output. For example:

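import sys

# Text wrapper around the existing stdout descriptor that writes "\n" as-is
stdout = open(sys.__stdout__.fileno(),
              mode=sys.__stdout__.mode,
              buffering=1,
              encoding=sys.__stdout__.encoding,
              errors=sys.__stdout__.errors,
              newline='\n',        # no translation to os.linesep on output
              closefd=False)       # leave the underlying descriptor open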

Since closefd is False, closing this file won't close the original sys.stdout file descriptor. You can use this file explicitly via print("#\n#", file=stdout), or replace the default by assigning sys.stdout = stdout. The original is still available as sys.__stdout__.
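
To see the effect of the newline argument without needing a Windows console, here is a minimal sketch that wraps an in-memory io.BytesIO instead of the real stdout. The helper name bytes_written is made up for illustration, and newline="\r\n" is only standing in for what the default newline=None does on Windows, where os.linesep is "\r\n".

```python
import io

def bytes_written(text, newline):
    """Return the bytes a TextIOWrapper emits for `text` under `newline`."""
    raw = io.BytesIO()  # in-memory stand-in for the stdout byte stream
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # release raw so the wrapper does not close it
    return data

# newline="\r\n" mimics what the default (newline=None) does on Windows
crlf_bytes = bytes_written("#\n#\n", "\r\n")
# newline="\n" disables translation, as in the wrapper above
lf_bytes = bytes_written("#\n#\n", "\n")

print(crlf_bytes)
print(lf_bytes)
```

The first result contains a 0D byte before every 0A; the second is the 4-byte sequence 23 0A 23 0A that the question asks for.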

Background

Python 3's io module was designed to provide a cross-platform and cross-implementation (CPython, PyPy, IronPython, Jython) specification for all file-like objects in terms of the abstract base classes RawIOBase, BufferedIOBase, and TextIOBase. It includes a pure-Python reference implementation in the _pyio module. The common denominator for the raw io.FileIO implementation is the set of low-level POSIX system calls such as read and write, which eliminates the problem of CRT stdio inconsistencies. On Windows, the POSIX layer is just the low I/O layer of the CRT, but at least that's limited to the quirks of a single platform.
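
As a rough illustration of that layering, the sketch below opens a temporary file in text mode and walks down from the text layer to the raw layer. It assumes CPython, where the concrete classes are io.TextIOWrapper, io.BufferedWriter and io.FileIO; other implementations may use different concrete types behind the same base classes.

```python
import io
import os
import tempfile

# Create a scratch file so the example does not depend on the real stdout
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "w", newline="\n") as text_layer:
        buffered_layer = text_layer.buffer   # binary, buffered
        raw_layer = buffered_layer.raw       # binary, unbuffered
        layers = [type(text_layer), type(buffered_layer), type(raw_layer)]
        # Each layer is an instance of the matching abstract base class
        abcs_ok = (isinstance(text_layer, io.TextIOBase)
                   and isinstance(buffered_layer, io.BufferedIOBase)
                   and isinstance(raw_layer, io.RawIOBase))
finally:
    os.remove(path)

print([t.__name__ for t in layers])
```

Newline translation happens only in the text layer, which is why writing bytes to sys.stdout.buffer bypasses it.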

One of the Windows quirks is having non-standard text and binary modes in its POSIX I/O layer. Python addresses this by always using binary mode and calling setmode on the stdio file descriptors [1].

Python could avoid using the Windows CRT for I/O by implementing a WinFileIO registered subclass of RawIOBase. There's a proposed patch for this in issue 12939. Another example is the win_unicode_console module, which implements WindowsConsoleRawReader and WindowsConsoleRawWriter classes.


[1] This has caused problems for programs that embed Python and expect stdio to use the default text mode. In binary mode, printing wide-character strings no longer casts to char as it does in ANSI text mode, and it certainly doesn't print using WriteConsoleW as it would in UTF-16 text mode. For example:

Python 2.7.10 (default, May 23 2015, 09:44:00) 
[MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, os, msvcrt, ctypes 
>>> ctypes.cdll.msvcr90.wprintf(b'w\x00i\x00d\x00e\x00\n\x00') 
wide
5
>>> msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY) 
16384
>>> ctypes.cdll.msvcr90.wprintf(b'w\x00i\x00d\x00e\x00\n\x00')
w i d e
 5

Swordtail answered 25/1, 2016 at 16:24 Comment(12)
"grief for programs that embed Python and expect stdio to use the default text mode": who is expecting that and why? For me it is a major headache on Windows, because it corrupts redirected binary streams. - Filomena
People printing wchar_t strings to the console expect text mode since the CRT at least casts them to char (good enough for ASCII), whereas in binary mode it just writes the raw wide characters, which p r i n t s l i k e t h i s. - Swordtail
I added a ctypes example that demonstrates the problem with calling the CRT's wprintf function in binary mode. - Swordtail
OMG. A whole can of worms. =) - Filomena
I'd like to see something like Drekin's win_unicode_console implemented in C and incorporated into Python 3.6 or at least 3.7. This would be used instead of io.FileIO, but only for console I/O. Standard I/O for a pipe, disk file, or non-console character device such as \\.\NUL would still use io.FileIO. Also, I'd like to see Python 4 divorce itself completely from the CRT I/O by integrating the patch from issue 12939. It's too limiting to force Windows into a POSIX box. They have different strengths and weaknesses. - Swordtail
I wish Python 4 was modular, with the ability to replace modules like console access with your own. For that it needs a user-level API to system functions that is not based strictly on the POSIX layer. But that needs engineering that is hard to do in a distributed fashion unless everybody has very good visualization skills and/or a lot of time. - Filomena
The automatic conversion of newlines can be a difficult beast to track down. I knew exactly what was going on when my prints seemed to double newlines when the output was viewed inside of Atom's process-palette, but I didn't know how to disable universal newline conversion. Of course, my attempts at directly calling sys.stdout.write() also failed: one step closer to the problem but still on the wrong end. Your code to redefine sys.stdout worked perfectly, thank you. - Tita
Windows has its strengths but the inability to write a byte (b'\n') to a pipe can't be described with polite words. You are not safe even if your binary data has no newline in it (b'\r\n' may appear out of thin air). - Stegosaur
@J.F.Sebastian, that's due to PowerShell's object pipeline. When run by PowerShell, the two instances of python.exe don't run at the same time and stdout of the first instance is not the same pipe as stdin of the second instance. PowerShell sits between them, in both space and time, and does a funky text-mode transcode, and even appends a newline. The object pipeline is a fine idea in principle, but its default behavior for piping between native processes is a disaster. Just set up the pipeline directly using Python or even (buggy and archaic, but not completely insane on this point) cmd.exe. - Swordtail
@eryksun I understand: there is an explicit "piped vs. no pipe" example in the link where "pipe" refers to the PowerShell pipe; otherwise both cases use ordinary pipes (implicitly via subprocess.check_output()). - Stegosaur
@J.F.Sebastian, then why generalize the behavior of PowerShell to all of Windows? PS doesn't implement anything like a traditional pipeline. It uses Windows pipes (from the NT NamedPipe filesystem, i.e. \Device\NamedPipe & \FileSystem\Npfs), but it sticks itself in between each channel as a man in the middle and corrupts binary data. AFAIK, the only text-mode processing that's implemented in the Windows API is that, when reading from the console, ReadFile handles Ctrl+Z at the start of a buffer as EOF (i.e. 0 bytes read). Otherwise, text mode is implemented in the CRT. - Swordtail
"why generalize": the shell is how we interact with the system (the place where you type a | b), e.g., the shell command language is specified by POSIX (the keyboard is still the most efficient general-purpose interface for a power user). PowerShell is supposed to be a non-lobotomized version of the command line. The command prompt is not the strong part of Windows. - Stegosaur

Since Python 3.7, the io.TextIOWrapper class has included a .reconfigure() method. See details: https://docs.python.org/3/library/io.html#io.TextIOWrapper.reconfigure

So you can simply call sys.stdout.reconfigure(newline="\n") as a solution.

Note that, unless sys.stdout has been replaced, sys.stdout and sys.__stdout__ are the same object, so sys.__stdout__ will also be affected.
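
A small sketch of what reconfigure does, assuming Python 3.7 or later. It uses an in-memory io.BytesIO in place of the real stdout so the written bytes can be inspected, and the initial newline="\r\n" only imitates the Windows default.

```python
import io

raw = io.BytesIO()  # in-memory stand-in for the stdout byte stream
stream = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")

stream.write("#\n")               # translated: written as 23 0D 0A
stream.reconfigure(newline="\n")  # switch off translation from here on
stream.write("#\n")               # untranslated: written as 23 0A
stream.flush()

written = raw.getvalue()
print(written)
```

Only the text written after the reconfigure call is left untranslated, so in a real script the call to sys.stdout.reconfigure(newline="\n") belongs before the first print.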

Virgilvirgilia answered 7/2 at 16:27 Comment(0)
