Python: how to convert single quotes to double quotes to format as a JSON string [duplicate]
I have a file where each line contains text like this (representing the cast of a film):

[{'cast_id': 23, 'character': "Roger 'Verbal' Kint", 'credit_id': '52fe4260c3a36847f8019af7', 'gender': 2, 'id': 1979, 'name': 'Kevin Spacey', 'order': 5, 'profile_path': '/x7wF050iuCASefLLG75s2uDPFUu.jpg'}, {'cast_id': 27, 'character': 'Edie's Finneran', 'credit_id': '52fe4260c3a36847f8019b07', 'gender': 1, 'id': 2179, 'name': 'Suzy Amis', 'order': 6, 'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]

I need to convert it into a valid JSON string, converting only the necessary single quotes to double quotes (e.g. the single quotes around the word Verbal must not be converted, and any apostrophes in the text should not be converted either).

I am using Python 3.x. I need a regular expression that converts only the right single quotes to double quotes, so that the whole text becomes a valid JSON string. Any ideas?
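To see why a blanket substitution is not enough, here is a minimal sketch that replaces every single quote and then tries to parse the result. The sample row is a shortened, hypothetical version of the line above. Deciding which quote is a delimiter means tracking whether you are inside a string, which is hard to do reliably with a single regular expression.

```python
import json

# Shortened, hypothetical row in the same style as the file above
row = "[{'cast_id': 23, 'character': \"Roger 'Verbal' Kint\"}]"

# Converts *every* single quote, including the ones inside the value
naive = row.replace("'", '"')

try:
    json.loads(naive)
    parsed_ok = True
except json.JSONDecodeError as exc:
    parsed_ok = False
    print("invalid JSON:", exc)

print(naive)
```

The quotes around Verbal end up as string delimiters, so the JSON parser stops right after "Roger ".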

Ancestry answered 5/12, 2017 at 17:55 Comment(10)
What produced the file? The right thing to do is parse it as a list of dictionaries, then encode it with json.dump. A regular expression is right out; this is not a regular language. – Donaldson
import json; json.dumps(your_dict) – Pignut
@AmitTripathi It's not a dict yet; it's a string in a file. – Donaldson
The string as shown above has a syntax error in the first place. – Marlow
You have a serious problem with that input: the value Edie's Finneran is enclosed in single quotes; no parser is going to be able to tell that the apostrophe is not a closing quote. You're going to have to fix whatever is producing that file, in which case you may as well have it output JSON in the first place. – Donaldson
@Donaldson Yeah, right. json.dumps can't be used here. – Pignut
You still haven't answered the question: where does this string come from? Why is it not already JSON-compatible? How much of it is there? – Marlow
When you go to the doctor's, do you want them to prescribe medication that helps your symptoms (but masks the overall problem), or medication that fixes whatever is causing the symptoms in the first place? I.e. do you want the doctor to fix your cough, or do you want them to cure your cold? – Autopsy
@hop Here it is just one line. The whole file is about 11,000 rows. – Ancestry
And you still haven't answered most of the questions… – Marlow
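The approach suggested in the comments (parse each line as a list of dictionaries, then re-encode it) can be sketched as follows. This is a minimal sketch, assuming each line is a Python literal; the helper name convert_lines and the sample rows are hypothetical. It uses ast.literal_eval rather than eval(), and it reports lines it cannot parse instead of guessing at them.

```python
import ast
import json


def convert_lines(lines):
    """Hypothetical helper: turn Python-literal rows into JSON strings.

    Returns (json_rows, bad_line_numbers). Rows that are not valid
    Python literals (e.g. an unescaped apostrophe inside '...') are
    skipped and reported rather than repaired.
    """
    json_rows = []
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            value = ast.literal_eval(line)  # parse as a list of dicts
        except (ValueError, SyntaxError):
            bad.append(lineno)
            continue
        json_rows.append(json.dumps(value))  # re-encode as JSON text
    return json_rows, bad


sample = [
    "[{'cast_id': 23, 'character': \"Roger 'Verbal' Kint\"}]",
    "[{'cast_id': 27, 'character': 'Edie's Finneran'}]",
]
rows, bad = convert_lines(sample)
print(rows)
print(bad)
```

With the real file you would pass the open file object, e.g. `with open('cast.txt') as f: rows, bad = convert_lines(f)` (the file name is hypothetical). The numbers in `bad` then point at rows like the Edie's Finneran one, which have to be fixed at the source.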
First of all, the line you gave as an example is not parsable! … 'Edie's Finneran' … contains a syntax error, no matter what.
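A quick way to confirm this with the standard library: the value as written fails to parse, while the same value with the apostrophe backslash-escaped parses fine. This is a small illustration only; the variable names are hypothetical.

```python
import ast

raw = "'Edie's Finneran'"        # the value exactly as it appears in the question
escaped = "'Edie\\'s Finneran'"  # same value with the apostrophe escaped as \'

try:
    ast.literal_eval(raw)
    raw_parses = True
except SyntaxError:
    raw_parses = False

fixed = ast.literal_eval(escaped)

print(raw_parses)  # False
print(fixed)       # Edie's Finneran
```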

Assuming that you have control over the input, you could simply use eval() to read in the file. (Although, in that case one would wonder why you can't produce valid JSON in the first place…)

>>> f = open('list.txt', 'r')
>>> s = f.read().strip()
>>> l = eval(s)

>>> import pprint
>>> pprint.pprint(l)
[{'cast_id': 23,
  'character': "Roger 'Verbal' Kint",
  ...
  'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]

>>> import json
>>> json.dumps(l)
'[{"cast_id": 23, "character": "Roger \'Verbal\' Kint", "credit_id": "52fe4260c3a36847f8019af7", "gender": 2, "id": 1979, "name": "Kevin Spacey", "order": 5, "profile_path": "/x7wF050iuCASefLLG75s2uDPFUu.jpg"}, {"cast_id": 27, "character": "Edie\'s Finneran", "credit_id": "52fe4260c3a36847f8019b07", "gender": 1, "id": 2179, "name": "Suzy Amis", "order": 6, "profile_path": "/b1pjkncyLuBtMUmqD1MztD2SG80.jpg"}]'

If you don't have control over the input, this is very dangerous, as it opens you up to code injection attacks.

I cannot emphasize enough that the best solution would be to produce valid JSON in the first place.
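If you can change the program that writes the file (the OP hasn't said what it is), a minimal sketch of producing valid JSON at the source looks like this. It assumes the producer already holds each cast as a list of dicts; the data below is hypothetical, and io.StringIO stands in for a real file. Writing one JSON document per line is the "JSON Lines" layout.

```python
import io
import json

# Hypothetical producer-side data: one cast list per movie
movies_cast = [
    [{"cast_id": 23, "character": "Roger 'Verbal' Kint", "name": "Kevin Spacey"}],
    [{"cast_id": 27, "character": "Edie's Finneran", "name": "Suzy Amis"}],
]

# io.StringIO stands in for a real file opened with open(..., "w")
buf = io.StringIO()
for cast in movies_cast:
    buf.write(json.dumps(cast) + "\n")  # one JSON document per line

# Reading it back needs only json.loads, never eval()
buf.seek(0)
restored = [json.loads(line) for line in buf]
print(buf.getvalue())
```

json.dumps takes care of quoting, so apostrophes such as the one in Edie's never need special handling.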

Marlow answered 5/12, 2017 at 21:43 Comment(0)
If you do not have control over the JSON data, do not eval() it!

I created a simple JSON correction mechanism, as that is more secure:

def correctSingleQuoteJSON(s):
    rstr = ""
    escaped = False

    for c in s:

        if c == "'" and not escaped:
            c = '"'  # replace single with double quote

        elif c == "'" and escaped:
            rstr = rstr[:-1]  # remove escape character before single quotes

        elif c == '"':
            c = '\\' + c  # escape existing double quotes

        # a backslash escapes the next character, unless it is itself
        # escaped (so an escaped backslash followed by ' still ends the string)
        escaped = (c == "\\") and not escaped
        rstr += c  # append the correct json

    return rstr

You can use the function in the following way:

import json

singleQuoteJson = "[{'cast_id': 23, 'character': 'Roger \\'Verbal\\' Kint', 'credit_id': '52fe4260c3a36847f8019af7', 'gender': 2, 'id': 1979, 'name': 'Kevin Spacey', 'order': 5, 'profile_path': '/x7wF050iuCASefLLG75s2uDPFUu.jpg'}, {'cast_id': 27, 'character': 'Edie\\'s Finneran', 'credit_id': '52fe4260c3a36847f8019b07', 'gender': 1, 'id': 2179, 'name': 'Suzy Amis', 'order': 6, 'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]"

correctJson = correctSingleQuoteJSON(singleQuoteJson)
print(json.loads(correctJson))
Desiraedesire answered 10/7, 2021 at 5:56 Comment(0)
Here is code to get the desired output:

import ast

def getJson(filepath):
    lines = []
    with open(filepath, 'r') as fr:
        for line in fr:
            line_split = line.split(",")
            set_line_split = []
            for part in line_split:
                i_split = part.split(":")
                i_set_split = []
                for split_i in i_split:
                    # copy everything up to and including the first quote
                    set_split_i = ""
                    pos = 0
                    for ch in split_i:
                        set_split_i += ch
                        pos += 1
                        if ch in ('"', "'"):
                            break
                    # walk the rest backwards: keep the last quote as the closing
                    # delimiter and backslash-escape every other quote before it
                    rev = ""
                    state = False
                    for ch in split_i[pos:][::-1]:
                        if ch in ('"', "'") and not state:
                            rev += ch
                            state = True
                        elif ch in ('"', "'") and state:
                            rev += ch + "\\"
                        else:
                            rev += ch
                    set_split_i += rev[::-1]
                    i_set_split.append(set_split_i)
                set_line_split.append(":".join(i_set_split))
            line_modified = ",".join(set_line_split)
            # the repaired line should now be a valid Python literal
            lines.append(ast.literal_eval(line_modified))
    return lines

lines = getJson('test.txt')
for i in lines:
    print(i)
Awning answered 5/12, 2017 at 18:48 Comment(1)
My lord. You can instead use '"'.join(str.split("'")) or "\"".join(str.split("'")) for readability. – Effortful
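Apart from eval() (mentioned in user3850's answer), you can use ast.literal_eval, which only evaluates Python literals (strings, numbers, lists, dicts and so on) and is therefore much safer than eval() on input you don't control.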

This has been discussed in the thread: Using python's eval() vs. ast.literal_eval()?
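A minimal sketch of the difference, assuming Python 3: ast.literal_eval parses a cast row like the OP's, but refuses an expression that eval() would execute. The strings below are hypothetical examples.

```python
import ast

safe = "[{'cast_id': 23, 'name': 'Kevin Spacey'}]"
# Not a literal: eval() would run this call, literal_eval() refuses it
unsafe = "__import__('os').getcwd()"

parsed = ast.literal_eval(safe)

try:
    ast.literal_eval(unsafe)
    rejected = False
except ValueError:
    rejected = True

print(parsed)    # [{'cast_id': 23, 'name': 'Kevin Spacey'}]
print(rejected)  # True
```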

You can also look at the following discussion threads from the TMDB Box Office Prediction competition on Kaggle, which has data similar to the data mentioned by the OP:

https://www.kaggle.com/c/tmdb-box-office-prediction/discussion/89313#latest-517927
https://www.kaggle.com/c/tmdb-box-office-prediction/discussion/80045#latest-518338

Thirtytwo answered 18/4, 2019 at 5:8 Comment(0)
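import ast
import json

# row['prod_cat'] holds a Python-literal string such as "[{'id': 1, ...}]"
json_dat = json.dumps(ast.literal_eval(row['prod_cat']))  # Python literal -> JSON text
dict_dat = json.loads(json_dat)  # JSON text -> Python objects again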
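What the snippet does, step by step: ast.literal_eval turns the single-quoted string into Python objects, and json.dumps turns those into the double-quoted JSON text the question asks for. The final json.loads just reverses that step, so if you only need dicts, ast.literal_eval alone is enough. In this sketch, row is a hypothetical stand-in (the answer doesn't define it; the naming suggests a pandas row, which is an assumption).

```python
import ast
import json

# Hypothetical stand-in for a row; 'prod_cat' holds a Python-literal string
row = {"prod_cat": "[{'id': 1, 'name': \"Children's\"}]"}

parsed = ast.literal_eval(row["prod_cat"])  # Python objects (list of dicts)
json_dat = json.dumps(parsed)               # valid JSON text, double-quoted
dict_dat = json.loads(json_dat)             # back to the same Python objects

print(json_dat)  # [{"id": 1, "name": "Children's"}]
```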
Contort answered 24/3, 2023 at 3:21 Comment(1)
Please add some explanation to your code rather than posting only code. – Abacist
