Prevent Python print()'s automatic newline conversion to CRLF on Windows [duplicate]

I'd like to pipe text with unix-like EOL (LF) from Python via Windows CMD (console). However, Python seems to automatically convert single newlines into Windows-style end-of-line (EOL) characters (i.e. \r\n, <CR><LF>, 0D 0A, 13 10):

#!python3
#coding=utf-8
import sys
print(sys.version)
print("one\ntwo")
# run as py t.py > t.txt

results in

3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)]
one
two

or in hexadecimal ... 6F 6E 65 0D 0A 74 77 6F 0D 0A

The trailing EOL appears because print() defaults to end='\n', and that newline gets converted as well.
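To check which bytes actually end up in the redirected file, a small helper along these lines reproduces the hex view shown above. The name hex_bytes is only an illustration, not a standard library function.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated, upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# The tail of t.txt on Windows, as reported in the dump above.
print(hex_bytes(b"one\r\ntwo\r\n"))
```

For a real file, read it in binary mode first, e.g. hex_bytes(open("t.txt", "rb").read()), so that no newline translation happens on the way in.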

There is no newline argument or property for print like there is for open(), so how can this be controlled?

Quip answered 7/4, 2018 at 16:8 Comment(4)
See Print LF with Python 3 to Windows stdout. - Deception
@StevenRumbalski Damn, thanks. - Quip
No problem. I advise not deleting the question. Your question will help people find the other one. - Deception
@StevenRumbalski Wouldn't dream of it, this took forever to compile - only "closing". The previous answer is much more concise and provides background information. - Quip

See this answer: https://mcmap.net/q/47429/-print-lf-with-python-3-to-windows-stdout


print() usually writes to sys.stdout. The following are excerpts of the documentation, for non-interactive mode:

  • stdout is used for the output of print()

  • sys.stdout: File object used by the interpreter for standard ... output

  • These streams are regular text files like those returned by the open() function.

  • character encoding on Windows is ANSI

  • standard streams are ... block-buffered like regular text files.

  • Note
    To write or read binary data from/to the standard streams, use the underlying binary buffer object. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').

Let's try this direct approach first:

import sys
print("one\ntwo")
sys.stdout.write('three\nfour')
sys.stdout.buffer.write(b'five\nsix')

results in

five\n
sixone\r\n
two\r\n
three\r\n
four

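The buffer write works as desired, but the output order is "messed up": the text written via print() and sys.stdout.write() is still sitting in the text layer's buffer when the bytes go straight to the underlying binary buffer.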

Flushing before writing to the buffer directly helps:

import sys
print("one\ntwo")
sys.stdout.write('three\nfour')
sys.stdout.flush()
sys.stdout.buffer.write(b'five\nsix')

results in

one\r\n
two\r\n
three\r\n
fourfive\n
six

But this still does not "fix" print() itself. Back to the file objects / streams / text files (see the short note on I/O objects in the Python data model):

https://docs.python.org/3/glossary.html#term-text-file

A file object able to read and write str objects. Often, a text file actually accesses a byte-oriented datastream and handles the text encoding automatically. Examples of text files are files opened in text mode ('r' or 'w'), sys.stdin, sys.stdout, and instances of io.StringIO.

So (how) can the sys.stdout file be reconfigured or reopened to control the newline behaviour? And what exactly is it?

>>> import sys
>>> type(sys.stdout)
<class '_io.TextIOWrapper'>

Docs: class io.TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False, write_through=False):

newline controls how line endings are handled. It can be None, '', '\n', '\r', and '\r\n'.
It works as follows:
When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.
If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated.
If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep.
If newline is '' or '\n', no translation takes place.
If newline is any of the other legal values, any '\n' characters written are translated to the given string.
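The write-side rules can be checked without touching the console by wrapping an in-memory buffer. A minimal sketch: written_bytes is a helper name made up for this illustration, and newline=None is left out because its result depends on the platform.

```python
import io


def written_bytes(text, newline):
    """Return the bytes a TextIOWrapper emits for `text` with the given `newline`."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    stream.write(text)
    stream.flush()  # move the text from the wrapper into `raw`
    return raw.getvalue()


for nl in ("", "\n", "\r", "\r\n"):
    print(repr(nl), written_bytes("one\ntwo\n", nl))
```

With newline='' or '\n' the text passes through unchanged; with '\r' or '\r\n' every '\n' written is replaced by that string.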

Let's see:

>>> sys.stdout.newline = "\n"
>>>

OK, and what about

import sys
sys.stdout.newline = '\n'
print("one\ntwo")

Does not work:

one\r\n
two\r\n

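because the attribute does not exist; the assignment merely created a new instance attribute that TextIOWrapper ignores. In a fresh session: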

>>> sys.stdout.newline
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: '_io.TextIOWrapper' object has no attribute 'newline'

Which I should have checked earlier:

>>> vars(sys.stdout)
{'mode': 'w'}

So really, there's no newline attribute for us to redefine.

Any useful methods?

>>> dir(sys.stdout)
['_CHUNK_SIZE', '__class__', '__del__', '__delattr__', '__dict__', 
'__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', 
'__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', 
'__init__', '__init_subclass__', '__iter__', '__le__', '__lt__',
'__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', 
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 
'_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', 
'_finalizing', 'buffer', 'close', 'closed', 'detach', 'encoding', 
'errors', 'fileno', 'flush', 'isatty', 'line_buffering', 'mode', 
'name', 'newlines', 'read', 'readable', 'readline', 'readlines',
'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 
'writelines']

Not really.

But we can at least replace the default interface to the buffer and specify the required newline character(s):

import sys, io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, newline='\n')
print("one\ntwo")

which finally results in

one\n
two\n

To restore, reassign sys.stdout to a reference you saved beforehand. Alternatively, though apparently not recommended, use sys.__stdout__, which the interpreter keeps internally.
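Swapping and restoring can be packaged as a context manager. The sketch below follows the detach() approach described in the comments on this answer; lf_stdout is a made-up name, and it assumes the current sys.stdout exposes .buffer, .encoding and .errors.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def lf_stdout():
    """Temporarily make print() write LF untranslated, then restore sys.stdout."""
    old_stdout = sys.stdout
    old_stdout.flush()  # do not leave text pending in the old wrapper
    new_stdout = io.TextIOWrapper(
        old_stdout.buffer,
        encoding=old_stdout.encoding,
        errors=old_stdout.errors,
        newline="\n",
    )
    sys.stdout = new_stdout
    try:
        yield new_stdout
    finally:
        new_stdout.flush()
        sys.stdout = old_stdout
        # Without detach(), deallocating new_stdout would close the shared buffer.
        new_stdout.detach()
```

Usage would be a plain with block: inside "with lf_stdout():" every print() goes through the LF-only wrapper, and the original stream is back in place afterwards.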

Warning: restoring requires some care, see eryksun's comment below. Use his solution instead (link below):


It seems it might also be possible to reopen the file, see Wrap an open stream with io.TextIOWrapper for inspiration, and this answer https://mcmap.net/q/47429/-print-lf-with-python-3-to-windows-stdout for the implementation.


If you want to take a closer look, check out the Python (CPython) sources: https://github.com/python/cpython/blob/master/Modules/_io/textio.c


There's also os.linesep, let's see if it's really "\r\n" for Windows:

>>> import os
>>> os.linesep
'\r\n'
>>> ",".join([f'0x{ord(c):X}' for c in os.linesep])
'0xD,0xA'

Could this be redefined?

#!python3
#coding=utf-8
import sys, os
saved = os.linesep
os.linesep = '\n'
print(os.linesep)
print("one\ntwo")
os.linesep = saved

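The assignment itself works (the first two output lines are the new os.linesep value '\n' plus print()'s own end, each converted on the way out), but it has no effect on what sys.stdout writes: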

\r\n
\r\n
one\r\n
two\r\n
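As far as I can tell, CPython's io.TextIOWrapper does not consult os.linesep at write time, which would explain the result. The sketch below compares the bytes a default wrapper emits before and after overwriting os.linesep; emitted and compare_linesep_effect are illustrative names, and an in-memory buffer stands in for stdout.

```python
import io
import os


def emitted(text):
    """Bytes written by a default (newline=None) TextIOWrapper for `text`."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii")  # newline defaults to None
    stream.write(text)
    stream.flush()
    return raw.getvalue()


def compare_linesep_effect():
    """Return the output produced before and after overwriting os.linesep."""
    before = emitted("one\ntwo\n")
    saved = os.linesep
    os.linesep = "\n" if saved != "\n" else "\r\n"  # flip it to the other value
    try:
        after = emitted("one\ntwo\n")
    finally:
        os.linesep = saved  # always put the original value back
    return before, after
```

If both values come back identical, the translation is decided by the wrapper itself and not by the current value of os.linesep.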

Quip answered 7/4, 2018 at 16:8 Comment(2)
Take care with restoring sys.stdout. By default, deallocating the new TextIOWrapper will close the buffer and thus usually the stdout file descriptor. Define it as new_stdout = io.TextIOWrapper(sys.stdout.buffer, newline='\n', encoding=sys.stdout.encoding, errors=sys.stdout.errors); old_stdout, sys.stdout = sys.stdout, new_stdout. Then to reassign use sys.stdout = old_stdout, and before deallocating the new one, call new_stdout.detach(), so it won't close the underlying buffer. Or, if you have an fd, you can open it, either with closefd=False or a dup of the fd. - Bennett
@eryksun Thanks for the warning, I'll add a link to your comment and will probably just use your open solution instead. - Quip
