Python Write Replaces "\n" With "\r\n" in Windows
F

3

16

After looking into my question here, I found that it was caused by a simpler problem.

When I write "\n" to a file, I expect to read in "\n" from the file. This is not always the case in Windows.

In [1]: with open("out", "w") as file:
   ...:     file.write("\n")
   ...:

In [2]: with open("out", "r") as file:
   ...:     s = file.read()
   ...:

In [3]: s  # I expect "\n" and I get it
Out[3]: '\n'

In [4]: with open("out", "rb") as file:
   ...:     b = file.read()
   ...:

In [5]: b  # I expect b"\n"... Uh-oh
Out[5]: b'\r\n'

In [6]: with open("out", "wb") as file:
   ...:     file.write(b"\n")
   ...:

In [7]: with open("out", "r") as file:
   ...:     s = file.read()
   ...:

In [8]: s  # I expect "\n" and I get it
Out[8]: '\n'

In [9]: with open("out", "rb") as file:
   ...:     b = file.read()
   ...:

In [10]: b  # I expect b"\n" and I get it
Out[10]: b'\n'

In a more organized way:

| Method of Writing | Method of Reading | "\n" Turns Into |
|-------------------|-------------------|-----------------|
| "w"               | "r"               | "\n"            |
| "w"               | "rb"              | b"\r\n"         |
| "wb"              | "r"               | "\n"            |
| "wb"              | "rb"              | b"\n"           |

When I try this on my Linux virtual machine, every combination returns \n. How can I get the same behavior on Windows?

Edit: This is especially problematic with the pandas library, which appears to write DataFrames to CSV with "w" and read CSV files with "rb". See the question linked at the top for an example of this.

Forster answered 20/11, 2017 at 3:24 Comment(2)
In text mode, Python replaces all line endings with the system default. Use binary mode and encode the strings yourself to use custom line endings. – Grillparzer
Alternatively, specify the line endings when you open the file. This is probably a much cleaner approach. – Grillparzer
G
14

Since you are using Python 3, you're in luck. When you open the file for writing, just specify newline='\n' to ensure that it writes '\n' instead of the system default, which is \r\n on Windows. From the docs:

When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
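As a minimal sketch of the fix described above: the snippet writes with newline="\n" and then reads the file back in binary mode to check what was actually stored. The scratch path built with tempfile is an assumption for the demonstration, not part of the question.

```python
import os
import tempfile

# Hypothetical scratch location, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "out")

# newline="\n" turns off the translation of "\n" to os.linesep on write.
with open(path, "w", newline="\n") as f:
    f.write("first\nsecond\n")

# Binary mode shows the bytes that are really on disk.
with open(path, "rb") as f:
    raw = f.read()

print(raw)  # b'first\nsecond\n' on Windows and Linux alike
```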

The reason you seem to see the two-character output only "sometimes" is that binary mode does no conversion at all, while text mode translates line endings on both writing and reading. bytes objects are just displayed as ASCII wherever possible for your convenience. Don't think of them as real strings until they have been decoded. The binary output you show is the true content of the file in all your examples.

When you open the file for reading in the default text mode, the newline parameter will work similarly to how it does for writing. By default all \r\n in the file will be converted to just \n after the characters are decoded. This is very nice when your code travels between OSes but your files do not since you can use the exact same code that relies only on \n. If your files travel too, you should stick to the relatively portable newline='\n' for at least the output.
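The read side can be checked the same way. This sketch writes Windows-style endings byte for byte, then reads the file once with the default newline=None and once with newline="". The file name crlf.txt and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "crlf.txt")

# Binary mode stores exactly these bytes, on any OS.
with open(path, "wb") as f:
    f.write(b"a\r\nb\r\n")

# Default text mode (newline=None): "\r\n" is translated to "\n" on read.
with open(path, "r") as f:
    translated = f.read()

# newline="" keeps the line endings untranslated on read.
with open(path, "r", newline="") as f:
    untranslated = f.read()

print(repr(translated))    # 'a\nb\n'
print(repr(untranslated))  # 'a\r\nb\r\n'
```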

Grillparzer answered 20/11, 2017 at 3:36 Comment(0)
C
4

From the documentation:

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

[...]

  • When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.

So, to write '\n' unchanged:

open(..., 'w', newline='')
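To see every case from the quoted paragraph side by side, here is a small sketch that writes "x\n" once per legal newline value and inspects the resulting bytes. The helper name written_bytes and the temporary path are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def written_bytes(newline):
    """Write "x\\n" in text mode with the given newline setting, return the raw bytes."""
    path = os.path.join(tempfile.mkdtemp(), "out")
    with open(path, "w", newline=newline) as f:
        f.write("x\n")
    with open(path, "rb") as f:
        return f.read()


for value in (None, "", "\n", "\r", "\r\n"):
    # Only the newline=None row depends on the platform (it uses os.linesep).
    print(repr(value), "->", written_bytes(value))
```

Only newline=None gives a platform-dependent result. The other four values produce the same bytes on every OS.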
Consul answered 20/11, 2017 at 3:29 Comment(0)
G
0

How files are encoded, including their line endings, is usually system dependent. As the answers above mention, you can hardcode the newline option to '\n' if that works for you. That does not help when you are fetching files or data from the cloud, since such data often has restricted access and has already been converted to a lightweight portable file format, so you do not control how it was written. In that case, open the file in binary mode and call decode() on the output of file.read(). Note that decode() only turns bytes into str and does not translate line endings. For example, in your case:

In [1]: with open("out", "w") as file:
   ...:     file.write("\n")

In [2]: with open("out", "rb") as file:  # binary mode, so read() returns bytes
   ...:     s = file.read().decode()     # bytes -> str, line endings left as stored

#--------------------------- OR ---------------------------

In [3]: with open(..., newline='delimiter of your choice') as file:  # must be None, '', '\n', '\r' or '\r\n'
   ...:     s = file.read()
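If the bytes come from somewhere other than a local file opened in text mode, one possible approach is to decode them and normalize the line endings by hand. The function name normalize_newlines and the UTF-8 default are assumptions for this sketch, not something the answer above prescribes.

```python
def normalize_newlines(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes and convert "\\r\\n" and lone "\\r" to "\\n"."""
    text = raw.decode(encoding)  # decode() alone leaves line endings untouched
    # Replace "\r\n" first so it does not turn into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(b"\r\n".decode()))                      # '\r\n'
print(repr(normalize_newlines(b"a\r\nb\rc\n")))    # 'a\nb\nc\n'
```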



Gaily answered 25/10, 2021 at 8:29 Comment(0)
