This is a development of Raman's suggestion above.
I love the JSON format, but there are two things I want to be able to do with it and cannot:
- Paste some arbitrary text into a value using a text editor
- Transparently convert between XML and JSON if the XML contains CDATA sections.
This thread is germane to both these issues.
I propose to overcome both limitations in the following manner, which doesn't break the formal definition of JSON. Am I storing up any problems by doing this?
Define a JSON-compatible string format as follows:
"<![CDATA[ (some text, escaped according to JSON rules) ]]>"
Write an unescape routine in my favorite programming language, which unescapes anything between <![CDATA[ and ]]>. This will be called before offering any JSON file to my text editor.
Write the complementary routine to call after editing the file, which re-escapes anything between <![CDATA[ and ]]> according to JSON rules.
Then, in order to paste any arbitrary data into the file, all I need to do is signal the start and end of the arbitrary data within a JSON string by typing <![CDATA[ before it and ]]> after it.
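To illustrate why the convention stays inside the JSON definition, here is a minimal sketch. The helper names `wrap_cdata` and `unwrap_cdata` are hypothetical and are not part of the routine in this answer; the point is only that a marked-up value is an ordinary JSON string, so any standard parser accepts it.

```python
import json

START_CDATA = "<![CDATA["
END_CDATA = "]]>"

def wrap_cdata(text):
    """Mark arbitrary text as a CDATA-style value (hypothetical helper)."""
    if END_CDATA in text:
        # The terminator cannot appear inside the section itself.
        raise ValueError("text must not contain the CDATA terminator")
    return START_CDATA + text + END_CDATA

def unwrap_cdata(value):
    """Return the text between the markers, or the value unchanged."""
    if value.startswith(START_CDATA) and value.endswith(END_CDATA):
        return value[len(START_CDATA):-len(END_CDATA)]
    return value

pasted = 'line one\n\t"quoted" and a \\backslash'
# json.dumps applies the normal JSON escaping, so the file stays legal JSON.
document = json.dumps({"test": wrap_cdata(pasted)})
# A standard parser reads it back without knowing about the markers.
assert unwrap_cdata(json.loads(document)["test"]) == pasted
```

The one restriction this makes visible is that the pasted text itself must not contain ]]>.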
This is a routine to call before and after text-editing, in Python3:
import os
import sys

escape_list = {
    8: 'b',
    9: 't',
    10: 'n',
    12: 'f',
    13: 'r',
    34: '"',
}  # ASCII character codes to escape, mapped to their escaped equivalents
escape_char = "\\"  # the backslash itself must be dealt with separately
unlikely_string = "ZzFfGgQqWw"  # placeholder; assumed never to occur in the data
shebang = "#!/json/unesc\n"  # first line of a file that is currently unescaped
start_cdata = "<![CDATA["
end_cdata = "]]>"


def escapejson(json_path):
    """Toggle the file between escaped (legal JSON) and unescaped form."""
    if os.path.isfile(json_path):  # if it doesn't exist, we can't update it
        with open(json_path) as json_in:
            data_in = json_in.read()  # read() because we treat it as plain text
        # Set direction of escaping
        if data_in[:len(shebang)] == shebang:  # data is unescaped, so re-escape
            data_in = data_in[len(shebang):]
            unescape = False
            data_out = ""
        else:
            data_out = shebang
            unescape = True
        while data_in != "":  # while there is still some input to deal with
            x = data_in.find(start_cdata)
            x1 = data_in.find(end_cdata)
            if x > -1:  # something needs (un)escaping
                if x1 < 0:
                    print("Unterminated CDATA section!")
                    sys.exit(1)
                elif x1 < x:  # end before next start
                    print("Extra CDATA terminator!")
                    sys.exit(1)
                data_out += data_in[:x]
                data_in = data_in[x:]
                y = data_in.find(end_cdata) + len(end_cdata)
                to_fix = data_in[:y]  # this is what we're going to (un)escape
                if to_fix[len(start_cdata):].find(start_cdata) >= 0:
                    print("Nested CDATA sections not supported!")
                    sys.exit(1)
                data_in = data_in[y:]  # chop data to fix from front of source
                if unescape:
                    # Park escaped backslashes so "\\n" is not misread as "\n"
                    to_fix = to_fix.replace(escape_char + escape_char, unlikely_string)
                    for each_ascii in escape_list:
                        to_fix = to_fix.replace(escape_char + escape_list[each_ascii], chr(each_ascii))
                    to_fix = to_fix.replace(unlikely_string, escape_char)
                else:
                    # Double the backslashes first, then escape everything else
                    to_fix = to_fix.replace(escape_char, escape_char + escape_char)
                    for each_ascii in escape_list:
                        to_fix = to_fix.replace(chr(each_ascii), escape_char + escape_list[each_ascii])
                data_out += to_fix
            else:
                if x1 >= 0:  # a terminator at position 0 counts too
                    print("Termination without start!")
                    sys.exit(1)
                data_out += data_in
                data_in = ""
        # Save all to file of same name in same location
        try:
            with open(json_path, 'w') as outfile:
                outfile.write(data_out)
        except IOError as e:
            print("Writing " + json_path + " failed " + str(e))
    else:
        print("JSON file not found")
Operating on the following legal JSON data
{
"test": "<![CDATA[\n We can put all sorts of wicked things like\n \\slashes and\n \ttabs and \n \"double-quotes\"in here!]]>"
}
...will produce the following:
#!/json/unesc
{
"test": "<![CDATA[
We can put all sorts of wicked things like
\slashes and
tabs and
"double-quotes"in here!]]>"
}
In this form, you can paste in any text between the markers. Calling the routine again will change it back to the original legal JSON.
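For comparison, the same round trip can be sketched more compactly by letting the standard `json` module do the escaping for each section. This is a variant technique, not the routine above: it works on strings rather than files, leaves out the shebang line, and the function names `unescape_sections` and `escape_sections` are my own assumptions.

```python
import json
import re

START_CDATA = "<![CDATA["
END_CDATA = "]]>"
# Non-greedy match of everything between one pair of markers, newlines included.
CDATA_RE = re.compile(
    re.escape(START_CDATA) + r"(.*?)" + re.escape(END_CDATA), re.DOTALL
)

def unescape_sections(text):
    """Turn JSON escapes inside each CDATA section into raw characters."""
    def fix(match):
        # Parsing the body as a JSON string literal undoes every JSON escape.
        body = json.loads('"' + match.group(1) + '"')
        return START_CDATA + body + END_CDATA
    return CDATA_RE.sub(fix, text)

def escape_sections(text):
    """Re-escape the raw text inside each CDATA section per JSON rules."""
    def fix(match):
        # json.dumps adds surrounding quotes; strip them to keep only the body.
        body = json.dumps(match.group(1), ensure_ascii=False)[1:-1]
        return START_CDATA + body + END_CDATA
    return CDATA_RE.sub(fix, text)

escaped = r'{"test": "<![CDATA[\n a \\slash, a \ttab and \"quotes\"]]>"}'
raw = unescape_sections(escaped)        # editable form, not legal JSON
assert escape_sections(raw) == escaped  # back to the original legal JSON
```

One difference to be aware of: `json.loads` also decodes escapes such as `\/` and `\uXXXX`, which `json.dumps` will not necessarily write back in the same spelling, so the result is equivalent JSON but not always byte-identical.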
I think this can also be made to work when converting to/from XML with CDATA regions. (I'm going to try that next!)