Generating a dynamic nested JSON object and array - python

As the title says, I've been trying to generate a nested JSON object. In this case I have for loops pulling the data out of a dictionary, dic. Below is the code:

f = open("test_json.txt", 'w')
flag = False
temp = ""
start = "{\n\t\"filename\"" + " : \"" +initial_filename+"\",\n\t\"data\"" +" : " +" [\n"
end = "\n\t]" +"\n}"
f.write(start)
for i, (key,value) in enumerate(dic.iteritems()):
    f.write("{\n\t\"keyword\":"+"\""+str(key)+"\""+",\n")
    f.write("\"term_freq\":"+str(len(value))+",\n")
    f.write("\"lists\":[\n\t")
    for item in value:
        f.write("{\n")
        f.write("\t\t\"occurance\" :"+str(item)+"\n")
        #Check last object
        if value.index(item)+1 == len(value):
            f.write("}\n")
            f.write("]\n")
        else:
            f.write("},") # close occurrence object
    # Check last item in dic
    if i == len(dic)-1:
        flag = True
    if(flag):
        f.write("}")
    else:
        f.write("},") # close keyword object
        flag = False 

#check for flag
f.write("]") # close data array
f.write("}")

Expected output is:

{
"filename": "abc.pdf",
"data": [{
    "keyword": "irritation",
    "term_freq": 5,
    "lists": [{
        "occurance": 1
    }, {
        "occurance": 1
    }, {
        "occurance": 1
    }, {
        "occurance": 1
    }, {
        "occurance": 2
    }]
}, {
    "keyword": "bomber",
    "lists": [{
        "occurance": 1
    }, {
        "occurance": 1
    }, {
        "occurance": 1
    }, {
        "occurance": 1
    }, {
        "occurance": 2
    }],
    "term_freq": 5
}]
}

But currently I'm getting an output like below:

{
"filename": "abc.pdf",
"data": [{
    "keyword": "irritation",
    "term_freq": 5,
    "lists": [{
        "occurance": 1
    }, {
        "occurance": 1
    }, {
        "occurance": 1
    }, {
        "occurance": 1
    }, {
        "occurance": 2
    },]                // Here lies the problem "," before array(last element)
}, {
    "keyword": "bomber",
    "lists": [{
        "occurance": 1
    }, {
        "occurance": 1
    }, {
        "occurance": 1
    }, {
        "occurance": 1
    }, {
        "occurance": 2
    },],                  // Here lies the problem "," before array(last element)
    "term_freq": 5
}]
}
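If the text really has to be assembled by hand, one way to avoid the last-element bookkeeping entirely is to let str.join put the commas only between items. This is a minimal Python 3 sketch; render_entry and render_document are hypothetical helper names, and it assumes keywords and filenames contain no quotes or backslashes (nothing is escaped), which is exactly why the json module used in the answers is the safer route.

```python
import json


def render_entry(keyword, pages):
    # Hypothetical helper: ", ".join places commas only *between* items,
    # so no last-element check is needed and no trailing comma appears.
    occurrences = ", ".join('{"occurance": %d}' % page for page in pages)
    return '{"keyword": "%s", "term_freq": %d, "lists": [%s]}' % (
        keyword, len(pages), occurrences)


def render_document(filename, dic):
    # Same trick one level up, for the "data" array.
    entries = ", ".join(render_entry(k, v) for k, v in dic.items())
    return '{"filename": "%s", "data": [%s]}' % (filename, entries)


text = render_document("abc.pdf", {"irritation": [1, 3, 5, 7, 8],
                                   "bomber": [1, 1, 2]})
parsed = json.loads(text)  # raises ValueError if the text were malformed
print(text)
```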

Please help; I've been trying to solve it, but failed. Please don't mark it as a duplicate, since I have already checked other answers and they didn't help at all.

Edit 1: The input comes from a dictionary dic that maps a string to a list, for example "irritation" => [1,3,5,7,8], where irritation is the key and the list holds the page numbers on which it occurs. The outer for loop reads each entry, with key as the keyword and value as its list of pages.

Edit 2:

import collections

dic = collections.defaultdict(list) # declaring the dictionary: keyword -> list of pages
dic[key].append(value) # inserting the values (key and value come from the extraction step, not shown)
for key in dic:
    # dic[key] is the list of pages for that keyword
    print key, ":", dic[key], "\n" # prints the data in the dictionary
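For reference, here is a small runnable Python 3 sketch of how such a defaultdict gets filled. The (keyword, page) pairs are made up for illustration, since the real extraction step isn't shown in the question.

```python
import collections

# Hypothetical input: (keyword, page) pairs as an extraction step might yield them
pairs = [("irritation", 1), ("bomber", 1), ("irritation", 3), ("bomber", 2)]

dic = collections.defaultdict(list)  # keyword -> list of page numbers
for keyword, page in pairs:
    dic[keyword].append(page)        # a missing key starts out as an empty list

for key in dic:
    print(key, ":", dic[key])
```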
Bordiuk answered 13/2, 2017 at 12:32
if it's a properly-formed json file you could use the json module rather than importing as text. Can you provide any detail of your input file?Counterpoison
input is a long process, i'll edit the question with an abstraction of the input
Why don't you use json.dump and then update the JSON object as more data arrives?Angle
Well, there is a nice simple library called json (no pip install required): use import json; print json.dumps(dic) and chill
using that library, how does it solve the problem, could you please share a code snippet? @RomaromagnaBordiuk
I have already shared the code required in the above comment it is as simple as import json print json.dumps(dic)Romaromagna
@Romaromagna this is what i get as the output { "over-dries": [4], "Self": [2], "Cooling": [4] } which is not what I want, please understand the problem of reading dictionary and expected outputBordiuk
Please put your dic var in the question in order for us to reconstruct a perfect answer.Telly
Sure, editing the question, though half of the answer is solved, you could edit the first answer and I'll accept it.Bordiuk
Thanks, I added a new part for the printing; tell me in the comments if it was useful or if you want to dig further.
Any type of help is respected @MaxChrétien , So, thank you for all suggestions and help.Bordiuk

What @andrea-f posted looks good to me; here is another solution:

Feel free to pick from both :)

import json

dic = {
        "bomber": [1, 2, 3, 4, 5],
        "irritation": [1, 3, 5, 7, 8]
      }

filename = "abc.pdf"

json_dict = {}
data = []

for k, v in dic.iteritems():
  tmp_dict = {}
  tmp_dict["keyword"] = k
  tmp_dict["term_freq"] = len(v)
  tmp_dict["lists"] = [{"occurrance": i} for i in v]
  data.append(tmp_dict)

json_dict["filename"] = filename
json_dict["data"] = data

with open("abc.json", "w") as outfile:
    json.dump(json_dict, outfile, indent=4, sort_keys=True)

It's the same idea: I first build one big json_dict and then save it directly as JSON. I use the with statement so that the file is closed automatically, even if an exception is raised, without having to catch it yourself.
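The code above is Python 2 (dic.iteritems()). Here is a sketch of the same approach for Python 3, where items() replaces iteritems(), plus reading the file back to check the round trip. It writes to a temporary directory instead of the working directory only so the demo is self-contained; that path is an assumption, not part of the answer.

```python
import json
import os
import tempfile

dic = {
    "bomber": [1, 2, 3, 4, 5],
    "irritation": [1, 3, 5, 7, 8],
}

json_dict = {
    "filename": "abc.pdf",
    "data": [
        {"keyword": k, "term_freq": len(v), "lists": [{"occurrance": i} for i in v]}
        for k, v in dic.items()  # Python 3: items() instead of iteritems()
    ],
}

# Temporary path for the demo; "abc.json" works the same way
path = os.path.join(tempfile.mkdtemp(), "abc.json")
with open(path, "w") as outfile:  # closed automatically, even if dump raises
    json.dump(json_dict, outfile, indent=4, sort_keys=True)

with open(path) as handle:
    loaded = json.load(handle)
```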

Also, have a look at the json.dumps() documentation if you want to tweak the JSON output further.
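As a quick illustration of the formatting knobs, here is a Python 3 sketch of indent, sort_keys and separators. Note that Python 2 (and Python 3 before 3.4) defaults to separators (", ", ": "), which leaves a trailing space after each comma when indent is set, as in the output further down; passing separators=(",", ": ") avoids that on those versions.

```python
import json

data = {"keyword": "bomber", "term_freq": 5}

pretty = json.dumps(data, indent=4, sort_keys=True)
# Explicit separators: the Python 3.4+ default when indent is set
pretty_explicit = json.dumps(data, indent=4, sort_keys=True, separators=(",", ": "))
# No whitespace at all, e.g. for compact files
compact = json.dumps(data, sort_keys=True, separators=(",", ":"))

print(pretty)
print(compact)
```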

EDIT

And just for fun, if you don't like the temporary variable, you can build all of data in a one-liner :)

json_dict["data"] = [{"keyword": k, "term_freq": len(v), "lists": [{"occurrance": i} for i in v]} for k, v in dic.iteritems()]

The final solution could then look like this, though it is not the most readable:

import json

json_dict = {
              "filename": "abc.pdf",
              "data": [{
                        "keyword": k,
                        "term_freq": len(v),
                        "lists": [{"occurrance": i} for i in v]
                       } for k, v in dic.iteritems()]
            }

with open("abc.json", "w") as outfile:
    json.dump(json_dict, outfile, indent=4, sort_keys=True)

EDIT 2

It looks like you don't only want to save your JSON in the desired format, but also to be able to read it back.

You can use json.dumps() to print it as well:

with open('abc.json', 'r') as handle:
    new_json_dict = json.load(handle)
    print json.dumps(new_json_dict, indent=4, sort_keys=True)

There is still one problem here, though: "filename" is printed after "data", because sort_keys=True sorts the keys alphabetically and the d of data comes before the f of filename.
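A tiny check of that claim (shown with Python 3 syntax): with sort_keys=True the keys come out in alphabetical order regardless of how the dict was built.

```python
import json

json_dict = {"filename": "abc.pdf", "data": []}

# sort_keys=True orders keys alphabetically, and "data" < "filename"
sorted_text = json.dumps(json_dict, sort_keys=True)
print(sorted_text)
```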

To force the order, you have to build the dict with an OrderedDict. Be warned, the syntax is ugly (imo) in Python 2.x.
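On Python 3.7+ plain dicts keep insertion order, so the OrderedDict is optional there, as long as sort_keys is left off. This sketch, assuming Python 3.7 or newer, shows that both spellings serialize identically.

```python
import json
from collections import OrderedDict

# Python 3.7+: a plain dict keeps insertion order
plain = {"filename": "abc.pdf", "data": [{"keyword": "bomber", "term_freq": 5}]}

# The OrderedDict spelling from the answer; still valid, just more verbose
ordered = OrderedDict([
    ("filename", "abc.pdf"),
    ("data", [OrderedDict([("keyword", "bomber"), ("term_freq", 5)])]),
])

# Leave sort_keys off, otherwise "data" moves in front of "filename" again
assert json.dumps(plain) == json.dumps(ordered)
print(json.dumps(plain, indent=4))
```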

Here is the new complete solution ;)

import json
from collections import OrderedDict

dic = {
        'bomber': [1, 2, 3, 4, 5],
        'irritation': [1, 3, 5, 7, 8]
      }

json_dict = OrderedDict([
              ('filename', 'abc.pdf'),
              ('data', [ OrderedDict([
                                        ('keyword', k),
                                        ('term_freq', len(v)),
                                        ('lists', [{'occurrance': i} for i in v])
                                     ]) for k, v in dic.iteritems()])
            ])

with open('abc.json', 'w') as outfile:
    json.dump(json_dict, outfile)


# Now to read the ordered json file

with open('abc.json', 'r') as handle:
    new_json_dict = json.load(handle, object_pairs_hook=OrderedDict)
    print json.dumps(new_json_dict, indent=4)
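The key piece on the reading side is object_pairs_hook: json calls it with each object's (key, value) pairs in the order they appear in the text, so passing OrderedDict keeps the file's order. A small Python 3 sketch using json.loads on an inline string instead of a file:

```python
import json
from collections import OrderedDict

text = '{"filename": "abc.pdf", "data": [{"keyword": "bomber", "term_freq": 5}]}'

# Every JSON object, including nested ones, becomes an OrderedDict
loaded = json.loads(text, object_pairs_hook=OrderedDict)
print(json.dumps(loaded, indent=4))  # dump what was loaded, not the original dict
```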

Will output:

{
    "filename": "abc.pdf", 
    "data": [
        {
            "keyword": "bomber", 
            "term_freq": 5, 
            "lists": [
                {
                    "occurrance": 1
                }, 
                {
                    "occurrance": 2
                }, 
                {
                    "occurrance": 3
                }, 
                {
                    "occurrance": 4
                }, 
                {
                    "occurrance": 5
                }
            ]
        }, 
        {
            "keyword": "irritation", 
            "term_freq": 5, 
            "lists": [
                {
                    "occurrance": 1
                }, 
                {
                    "occurrance": 3
                }, 
                {
                    "occurrance": 5
                }, 
                {
                    "occurrance": 7
                }, 
                {
                    "occurrance": 8
                }
            ]
        }
    ]
}

But be careful: most of the time it is better to save a regular .json file, so that it stays usable across languages.

Centro answered 13/2, 2017 at 13:50
You nailed it boss.Bordiuk
@Owl Max, sorry for coming so late to this question, but could you please help me with a question of mine along similar lines: #53731625

Your current code doesn't work because when the loop processes the second-to-last item it writes "},"; when the loop runs again it sets the flag, but by then the previous pass has already written a "," on the assumption that another element would follow.
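Separately from the flag logic, the inner-loop test value.index(item)+1 == len(value) is fragile: list.index returns the position of the first match, so with repeated page numbers such as [1, 2, 1] the last element is never recognised as the last one. A sketch that compares positions from enumerate instead (write_lists is a hypothetical helper that mirrors the question's inner loop, in Python 3):

```python
def write_lists(pages):
    # Hypothetical helper mirroring the question's inner loop
    parts = ['"lists": [']
    last = len(pages) - 1
    for pos, page in enumerate(pages):
        parts.append('{"occurance": %d}' % page)
        if pos != last:  # compare positions, not values
            parts.append(", ")
    parts.append("]")
    return "".join(parts)


print([1, 2, 1].index(1))     # first match only, so the last 1 looks like index 0
print(write_lists([1, 2, 1]))
```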

If this is your dict: a = {"bomber":[1,2,3,4,5]} then you can do:

import json
file_name = "a_file.json"
file_name_input = "abc.pdf"
new_output = {}
new_output["filename"] = file_name_input

new_data = []
i = 0
for key, val in a.iteritems():
   new_data.append({"keyword":key, "lists":[], "term_freq":len(val)})
   for p in val:
       new_data[i]["lists"].append({"occurrance":p})
   i += 1

new_output['data'] = new_data
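A variant of the loop above that needs no index at all: keep a reference to the entry being filled and append it once it is complete. This is a Python 3 sketch (items() instead of iteritems()) using the same sample dict a.

```python
a = {"bomber": [1, 2, 3, 4, 5]}

new_output = {"filename": "abc.pdf"}
new_data = []
for key, val in a.items():  # Python 3 spelling of iteritems()
    entry = {"keyword": key, "lists": [], "term_freq": len(val)}
    for p in val:
        entry["lists"].append({"occurrance": p})  # no new_data[i] bookkeeping
    new_data.append(entry)
new_output["data"] = new_data
```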

Then save the data by:

f = open(file_name, 'w+')
f.write(json.dumps(new_output, indent=4, sort_keys=True, default=unicode))
f.close()
Angle answered 13/2, 2017 at 12:48
I'm sorry, but please understand the problem: I'm not reading a JSON file of any kind, I have to create a JSON output from the dictionary in Edit 1, and not just dump it directly but format it as shown in the expected output
Can you please reformat it the way shown in the expected output? I tried it and ended up getting various errors. It'll be helpful and I'll accept the answer as well, thanks for the effort
@AsifAli how about now?Angle
Your code is good, but I think @AsifAli's original idea of using enumerate in the for loop rather than i = 0; i += 1 is more pythonic (imo). Also, please look at my answer for the use of the with statement.
