Python write valid json with newlines to file

Valid json expects escaped newline characters to be encoded as '\\n', with two backslashes. I have data that contains newline characters that I want to save to a file. Here's a simplified version:

data = {'mystring': 'Line 1\nLine 2'}

I can encode it with json.dumps():

import json
json_data = json.dumps(data)
json_data
# -> '{"mystring": "Line 1\\nLine 2"}'

When I print it, the newline displays as '\n', not '\\n' (which I find odd but I can live with):

print(json_data)
# -> {"mystring": "Line 1\nLine 2"}

However (here's the problem) when I output it to a file, the content of the file no longer contains valid json:

f = open('mydata.json', 'w')
f.write(json_data)
f.close()

If I open the file and read it, it contains this:

{"mystring": "Line 1\nLine 2"}

but I was hoping for this:

{"mystring": "Line 1\\nLine 2"}

Oddly (I think), if I read the file using python's open(), the json data is considered valid:

f = open('mydata.json', 'r')
json_data = f.read()
f.close()
json_data
# -> '{"mystring": "Line 1\\nLine 2"}'

... and it decodes OK:

json.loads(json_data)
# -> {u'mystring': u'Line 1\nLine 2'}

My question is: why is the data in the file not valid json? If another (non-Python) application needs to read it, it would probably be treated as incorrect. If I copy and paste the file contents and call json.loads() on them, it fails:

import json
json.loads('{"mystring": "Line 1\nLine 2"}')
# -> ValueError: Invalid control character at: line 1 column 21 (char 20)

Can anybody explain if this is the expected behaviour or am I doing something wrong?

Interchangeable answered 3/7, 2015 at 9:27 Comment(1)
Just to explain "the newline displays as '\n', not '\\n' (which I find odd but I can live with)": that's because \\ is the escape sequence for printing \ itself. I'm not certain this is your problem, but I suspect that in order to actually write two backslashes, you need to give Python \\\\n. - Nosewheel

You ran into a common pitfall: the \ character is also the escape character in Python string literals. Try printing the last example instead of passing it to json.loads:

>>> print('{"mystring": "Line 1\nLine 2"}')
{"mystring": "Line 1
Line 2"}

No way the above is valid JSON. What if the \ character is correctly encoded?

>>> print('{"mystring": "Line 1\\nLine 2"}')
{"mystring": "Line 1\nLine 2"}

Much better, you can then:

>>> json.loads('{"mystring": "Line 1\\nLine 2"}')
{'mystring': 'Line 1\nLine 2'}

Alternatively, if you want to copy text from some other buffer and paste it into your live interpreter to decode it, consider using a raw string literal (the r prefix):

>>> print(r'{"mystring": "Line 1\nLine 2"}')
{"mystring": "Line 1\nLine 2"}
>>> json.loads(r'{"mystring": "Line 1\nLine 2"}')
{'mystring': 'Line 1\nLine 2'}

Note that in the raw string the \n is no longer turned into a newline: the backslash and the n stay as two separate characters.

Also see: How do I handle newlines in JSON? and note how this is not a problem that exists strictly within Python.
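
To make the difference between the three spellings concrete, here is a minimal sketch (the variable names are only illustrative, not from the question). It compares the escaped literal, the raw literal, and the literal as pasted from the file, and shows which ones the parser accepts:

```python
import json

# Backslash escaped by hand: the string holds a backslash followed by "n".
escaped = '{"mystring": "Line 1\\nLine 2"}'
# Raw literal: Python leaves the backslash alone, so this is the same string.
raw = r'{"mystring": "Line 1\nLine 2"}'
# Pasted as-is: Python turns \n into a real newline before json ever sees it.
pasted = '{"mystring": "Line 1\nLine 2"}'

same_text = escaped == raw                 # identical characters
length_gap = len(raw) - len(pasted)        # two characters versus one

# The raw/escaped form is valid JSON and decodes to a real newline.
decoded = json.loads(raw)

# The pasted form contains a bare control character and is rejected.
try:
    json.loads(pasted)
    pasted_is_valid = True
except ValueError as exc:
    pasted_is_valid = False
    print("rejected:", exc)
```

The file on disk corresponds to the `raw` form, which is why it is valid JSON; only the act of pasting it into a normal Python string literal changes it.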

Citrange answered 3/7, 2015 at 9:36 Comment(2)
The r'' raw string syntax will be really helpful. Thanks so much. - Interchangeable
Out of interest, .encode('string-escape') also works, e.g. json.loads('{"mystring": "Line 1\nLine2"}'.encode('string-escape')) - Interchangeable

The reason for this:

print(json_data)
# -> {"mystring": "Line 1\nLine 2"}

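is that the \\ you see when you echo the string at the interactive prompt is only how a single backslash is written in the string's repr. The string itself contains one backslash, and that is what print() shows.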

The data in the json file is valid, as the parser is able to parse it :)

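The confusion stems from the difference between how a string is displayed and what it contains. Echoing a value at the interactive prompt shows its repr, where a literal backslash is written as \\; print() outputs the characters themselves. So \\n in the repr and \n in the printed output (and in the file) are the same two characters: a backslash followed by n.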
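
One way to check this without relying on how the prompt displays strings is to inspect the bytes that actually land on disk. The following is a sketch only: it writes to a temporary file rather than mydata.json, and the variable names are assumptions made for illustration.

```python
import json
import os
import tempfile

data = {"mystring": "Line 1\nLine 2"}

# Write the JSON to a temporary file so nothing in the working directory is touched.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    json.dump(data, f)

# Read the file back as raw bytes to see exactly what was stored.
with open(path, "rb") as f:
    raw_bytes = f.read()
os.remove(path)

# A real newline would be byte 0x0A; the escape is the two bytes 0x5C 0x6E.
has_real_newline = b"\n" in raw_bytes
has_escape_sequence = b"\\n" in raw_bytes

# Decoding the stored text gives back the original data.
round_tripped = json.loads(raw_bytes.decode("utf-8"))
```

If `has_real_newline` is false and `has_escape_sequence` is true, the file holds the two-character escape, which is what any JSON parser, Python or not, expects.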

Spiroid answered 3/7, 2015 at 9:38 Comment(0)

This is not an answer to the OP's question but to my question which led me here:

How do you load (arguably invalid) JSON with newlines within strings?

Use the strict=False option, available in json.load(), json.loads() or JSONDecoder().

For example:

json.loads('{"mystring": "Line 1\nLine 2"}', strict=False)
# -> {'mystring': 'Line 1\nLine 2'}

Here is the documentation for JSONDecoder:

If strict is false (True is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0-31 range, including '\t' (tab), '\n', '\r' and '\0'.
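
As a rough illustration of the difference the flag makes, the sketch below parses the same text with the default setting and with strict=False, then re-encodes the result so that it can be handed to a stricter consumer. The variable names are hypothetical.

```python
import json

# This literal contains a real newline inside the JSON string value.
text = '{"mystring": "Line 1\nLine 2"}'

# Default (strict=True): the bare control character is rejected.
try:
    json.loads(text)
    strict_accepts = True
except json.JSONDecodeError:
    strict_accepts = False

# strict=False: control characters inside strings are tolerated.
lenient = json.loads(text, strict=False)

# Re-encoding produces properly escaped, valid JSON.
repaired = json.dumps(lenient)
reparsed = json.loads(repaired)  # now parses with the default settings
```

Re-encoding after a lenient parse is a convenient way to clean up such input, since json.dumps() always escapes control characters in its output.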

Pyramidon answered 16/8 at 18:49 Comment(0)
