Using DictWriter to write a CSV when the fields are not known beforehand
Asked Answered
P

3

6

I am parsing a large piece of text into dictionaries, with the end objective of creating a CSV file with the keys as column headers.

csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)

The problem arises as the dict for any 'n'th row can include a new, never before used key. I then want the CSV to contain a column for this new key as well. In short, all my fields are not known beforehand so I cannot compile a complete fieldnames at the beginning.

Is there a recommended way to have csv.DictWriter not ignore missing fields but add them to fieldnames instead? Merely changing fieldnames at this point would leave the prior lines with an incorrectly lower number of fields.

Puri answered 6/11, 2014 at 4:56 Comment(3)
Can you please provide a sample dictionary structure.Picard
The issue is that the dict keys are unknown until the code executes, yet I want to be able to write a CSV out of a list of dicts. I am working around this by compiling the entire list of dicts and then iterating over the keys to identify unique keys that I can use for fieldnames. However as the dataset grows, I want to be able to write a CSV before I know all the dicts.Puri
Pranab please review my answer below.Picard
P
6

Instead of using DictWriter which can be confusing in your case as dictionaries are not ordered I tried using writerow method of csv. Here is what i did :

"""
a) First took all the keys of dictionary and sorted it, which is not necessary.
b) Created a result list which appends value related the headers which is key of our input dict and if key is not available then .get() will return None. 
   So result list will contain lists for rows data.
c) Wrote header and each row from result list in csv file
"""

data_dict = [{ "Header_1":"data_1", "Header_2":"data_2", "Header_3":"data_3"},
             { "Header_1":"data_4", "Header_2":"data_5", "Header_3":"data_6"},
             { "Header_1":"data_7", "Header_2":"data_8", "Header_3":"data_9", "Header_4":"data_10"},
             { "Header_1":"data_11", "Header_3":"data_12"},
             { "Header_1":"data_13", "Header_2":"data_14", "Header_3":"data_15"}]

"""
   In the third dict we have extra key, value.
   In forth we dont have have header_2 were we aspect blank value in our csv file.
"""
process_data = [ [k,v] for _dict in data_dict for k,v in _dict.iteritems() ]           

headers = [ i[0] for i in process_data ]
headers = sorted(list(set(headers)))

result = []
for _dict in data_dict:
    row = []
    for header in headers:
        row.append(_dict.get(header, None))
    result.append(row)


import csv
with open('demo.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=';', dialect='excel', 
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(headers)    
    for r in result:
        spamwriter.writerow(r)

enter image description here

Picard answered 6/11, 2014 at 12:6 Comment(0)
N
1

This was my solution in Python 3. It could be improved but I found using DictWriter more straightforward if you don't need the output ordered.

# input file:
#   {"foo": 1, "bar": 2}
#   {"foo": 3, "baz": 4}
#   {"bar": 5, "zag": 6}
#   {"foo": 7, "baz": 8, "zag": 9}
#   {"whammy": 10}

import json
import csv

FILENAME_IN = 'json_lines_in.log'
FILENAME_OUT = 'log_data.csv'

def json_to_csv_export(filename_in, filename_out):
    
    # load all records - list of dicts
    records = []
    with open(filename_in, 'r') as json_file:
        for line in json_file:
            records.append(json.loads(line))

    # 'set' ensures unique entries
    fieldnames_set = set()

    # discover field names
    for record in records:
        for field in record:
            fieldnames_set.add(field)

    # write csv
    with open(filename_out, 'w', newline='') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames_set, extrasaction='ignore')
        writer.writeheader()
        writer.writerows(records)

if __name__ == '__main__':
    json_to_csv_export(FILENAME_IN, FILENAME_OUT)

# output:
#  whammy,zag,foo,baz,bar
#   ,,1,,2
#   ,,3,4,
#   ,6,,,5
#   ,9,7,8,
#   10,,,,

CSV output image

Natal answered 11/1, 2022 at 18:8 Comment(0)
M
-1

I did the following: collect all unique values of the headers and create list of these values. With the list you can use a default value (restval='') just to miss your values, that are not in the row.

Matrimonial answered 29/6, 2021 at 22:57 Comment(1)
Hey, that makes sense. But to be a really useful solution you should offer a valid and running code, not the idea behind it. Could you provide it?Atrip

© 2022 - 2024 — McMap. All rights reserved.