Python 3: Flattening nested dictionaries and lists within dictionaries

I am dealing with a complex nested structure of dictionaries and lists. I need to flatten the data and bring all nested items up to the top level (level 0). See the example below:

{'a': 1, 'b': 2, 'c': {'c1': [{'c11': 1, 'c12': 2, 'c13': 3}, {'c21': 1, 'c22': 2, 'c23': 3}], 'd1': [{'d11': 1, 'd12': 2, 'd13': 3}, {'d21': 1, 'd22': 2, 'd23': 3}]}, 'x': 1, 'y': 2}

I need to flatten this to:

{'a': 1, 'b': 2, 'c_c1_c11': 1, 'c_c1_c12': 2, 'c_c1_c13': 3, 'c_c1_c21': 1, 'c_c1_c22': 2, 'c_c1_c23': 3, 'c_d1_d11': 1, ... and so on}

I used the first answer in this post as a reference, but it only works for nested dictionaries, not when lists are nested within dictionaries and further dictionaries are nested within those lists.

I modified the code a bit to fit my use case, but it doesn't work:

def flattenDict(d):
    node_map = {}
    node_path = []
    def nodeRecursiveMap(d, node_path):
        for key, val in d.items():
            if ((type(val) is not dict)&(type(val) is not list)): 
                node_map['_'.join(node_path + [key])] = val
            if type(val) is list:
                def nodeListRecursion(val,node_path):
                    for element in val:
                        if ((type(element) is not dict)&(type(element) is not list)) : node_map['_'.join(node_path + [key])] = element
                        if type(element) is list: nodeListRecursion(element,node_map)
                        if type(element) is dict: nodeRecursiveMap(element, node_path + [key])
                nodeListRecursion(val,node_path)
            if type(val) is dict: nodeRecursiveMap(val, node_path + [key])
    nodeRecursiveMap(d, node_path)
    return node_map

I would really appreciate any help here.
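For comparison, here is a minimal sketch of the same path-accumulating idea with both helpers folded into one recursive function. In the code above, the list-in-list branch calls nodeListRecursion(element, node_map), passing the result dict where the path list belongs, so the next node_path + [key] fails with a TypeError (dict + list). The name flatten_dict and the inner helper visit are illustrative. Like the original, list elements add no segment to the key, so keys repeated across list elements still overwrite one another. All keys are assumed to be strings, and an empty dict or list produces no entry.

```python
def flatten_dict(d):
    """Flatten nested dicts and lists into one dict, joining keys with '_'.

    Mirrors the question's approach: list elements add no key segment of
    their own, so they are stored under the path of the list's key.
    """
    node_map = {}

    def visit(value, node_path):
        if isinstance(value, dict):
            for key, val in value.items():
                visit(val, node_path + [key])  # extend the path with this key
        elif isinstance(value, list):
            for element in value:
                visit(element, node_path)  # every element shares the list's path
        else:
            node_map['_'.join(node_path)] = value  # leaf: record under the joined path

    visit(d, [])
    return node_map


nested = {'a': 1, 'b': 2,
          'c': {'c1': [{'c11': 1, 'c12': 2, 'c13': 3}, {'c21': 1, 'c22': 2, 'c23': 3}]},
          'x': [[{'x1': 1}]]}  # a list nested directly inside a list
print(flatten_dict(nested))
# {'a': 1, 'b': 2, 'c_c1_c11': 1, 'c_c1_c12': 2, 'c_c1_c13': 3,
#  'c_c1_c21': 1, 'c_c1_c22': 2, 'c_c1_c23': 3, 'x_x1': 1}
```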

Keir asked 29/8, 2018 at 15:51

I think you're overcomplicating things. You start from a dictionary with keys and values. Each value is either a dictionary or a list of dictionaries, which you want to recurse into, or it's something else, in which case you want to leave it alone. So:

def flatten(d):
    out = {}
    for key, val in d.items():
        if isinstance(val, dict):
            val = [val]
        if isinstance(val, list):
            for subdict in val:
                deeper = flatten(subdict).items()
                out.update({key + '_' + key2: val2 for key2, val2 in deeper})
        else:
            out[key] = val
    return out

gives me

In [34]: nested = {'a': 1, 'b': 2, 'c': {'c1': [{'c11': 1, 'c12': 2, 'c13': 3}, {'c21': 1, 'c22': 2, 'c23': 3}], 'd1': [{'d11': 1, 'd12': 2, 'd13': 3}, {'d21': 1, 'd22': 2, 'd23': 3}]}, 'x': 1, 'y': 2}

In [35]: flatten(nested)
Out[35]: 
{'a': 1,
 'b': 2,
 'c_c1_c11': 1,
 'c_c1_c12': 2,
 'c_c1_c13': 3,
 'c_c1_c21': 1,
 'c_c1_c22': 2,
 'c_c1_c23': 3,
 'c_d1_d11': 1,
 'c_d1_d12': 2,
 'c_d1_d13': 3,
 'c_d1_d21': 1,
 'c_d1_d22': 2,
 'c_d1_d23': 3,
 'x': 1,
 'y': 2}
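This relies on two properties of the question's example: every list holds dicts, and keys are not repeated across the elements of a list. A quick sketch with two inputs that break those assumptions (the function is copied unchanged apart from a comment, so the block runs on its own):

```python
def flatten(d):
    out = {}
    for key, val in d.items():
        if isinstance(val, dict):
            val = [val]  # treat a single dict like a one-element list
        if isinstance(val, list):
            for subdict in val:
                deeper = flatten(subdict).items()
                out.update({key + '_' + key2: val2 for key2, val2 in deeper})
        else:
            out[key] = val
    return out


# 1. Keys repeated across list elements: the later element overwrites the earlier one.
print(flatten({'asd': [{'a': 'hi'}, {'a': 'hi2'}]}))  # {'asd_a': 'hi2'}

# 2. A list of non-dicts: each string is passed to flatten(), which calls .items() on it.
try:
    flatten({'asd': ['a', 'b']})
except AttributeError as exc:
    print('AttributeError:', exc)
```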
Effector answered 29/8, 2018 at 16:04
Comments:
Upvoted. Very clever to use val = [val] to process dict values in the same way as list values. (Otey)
Thanks, this works. I just realized that in my actual data, keys repeat across different subdicts within a list. Because of this, the flattened output kept only the last subdict's values (each overwriting the ones before it). (Keir)
Not a good solution. For example, for nested = {'asd': [{'a': 'hi'}, {'a': 'hi2'}]} you lose the value 'hi'. It also crashes on lists of non-dicts, for example nested = {'asd': ['a', 'b']}. (Mouflon)
@DanielBraun: you're correct that for inputs shaped differently from the OP's, you'll need a different solution. That's... not unexpected. (Effector)
@Effector I presumed the unique field naming was the OP's way of conveying the idea, not a known property of the input. (Mouflon)
This was neat. I have been trying to test a nested dotmap.DotMap/OrderedDict/dict for custom (non-builtin) types, and rather than traversing it, I first convert it to a flattened object, after which the loop over the values is easy. Thank you! (Aerograph)

In my project, I use an updated version of the function from DSM's answer to flatten a dict that may contain other dicts, lists, or lists of dicts. I hope it helps.

def flatten(input_dict, separator='_', prefix=''):
    output_dict = {}
    for key, value in input_dict.items():
        if isinstance(value, dict) and value:
            # Non-empty dict: recurse, extending the key prefix
            deeper = flatten(value, separator, prefix+key+separator)
            output_dict.update(deeper)
        elif isinstance(value, list) and value:
            # Non-empty list: number the elements from 1
            for index, sublist in enumerate(value, start=1):
                if isinstance(sublist, dict) and sublist:
                    deeper = flatten(sublist, separator, prefix+key+separator+str(index)+separator)
                    output_dict.update(deeper)
                else:
                    # Scalar (or nested list) element: store the element itself
                    output_dict[prefix+key+separator+str(index)] = sublist
        else:
            # Scalars, empty dicts and empty lists are stored as-is
            output_dict[prefix+key] = value
    return output_dict
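A quick check of this function on the inputs that break the first answer, as a sketch. The copy below is slightly condensed, stores each non-dict list element under its own numbered key, counts list positions from 1, and accepts a different separator:

```python
def flatten(input_dict, separator='_', prefix=''):
    output_dict = {}
    for key, value in input_dict.items():
        if isinstance(value, dict) and value:
            # Non-empty dict: recurse with the extended prefix
            output_dict.update(flatten(value, separator, prefix + key + separator))
        elif isinstance(value, list) and value:
            for index, item in enumerate(value, start=1):
                item_key = prefix + key + separator + str(index)
                if isinstance(item, dict) and item:
                    output_dict.update(flatten(item, separator, item_key + separator))
                else:
                    output_dict[item_key] = item  # the element itself, not the whole list
        else:
            output_dict[prefix + key] = value  # scalars and empty containers
    return output_dict


print(flatten({'asd': [{'a': 'hi'}, {'a': 'hi2'}]}))  # {'asd_1_a': 'hi', 'asd_2_a': 'hi2'}
print(flatten({'asd': ['a', 'b']}))                    # {'asd_1': 'a', 'asd_2': 'b'}
print(flatten({'c': {'c1': [{'c11': 1}]}, 'e': {}}, separator='.'))  # {'c.c1.1.c11': 1, 'e': {}}
```

Empty dicts and lists are kept as values rather than dropped, and a list nested directly inside a list is stored whole rather than flattened further.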
Idioplasm answered 24/4, 2019 at 15:54

I updated DSM's answer to support lists of dictionaries that share the same keys, by appending the index to the key, without adding much complexity to the code.

Code:

def flatten(d):
    out = {}
    for key, val in d.items():
        if isinstance(val, dict):
            val = [val]
        if isinstance(val, list):
            for subidx, subdict in enumerate(val):
                deeper = flatten(subdict).items()
                out.update({key + f'_{subidx}' + '_' + key2 + f'_{idx}': val2 
                            for idx, (key2, val2) in enumerate(deeper)})
        else:
            out[key] = val
    return out

Input:

nested = {'a': 1, 
          'b': 2, 
          'c': {'c1': [{'c11': 1, 'c12': 2, 'c13': 3}, {'c21': 1, 'c22': 2, 'c23': 3}], 
                'd1': [{'dd1': 1, 'dd2': 2}, {'dd1': 3, 'dd2': 4}]}, # same keys
          'x': [{'xx1': 1, 'xx2': 2}, {'xx1': 3, 'xx2': 4}], # same keys
          'y': 2}

Output:

{'a': 1,
 'b': 2,
 'c_0_c1_0_c11_0_0': 1,
 'c_0_c1_0_c12_1_1': 2,
 'c_0_c1_0_c13_2_2': 3,
 'c_0_c1_1_c21_0_3': 1,
 'c_0_c1_1_c22_1_4': 2,
 'c_0_c1_1_c23_2_5': 3,
 'c_0_d1_0_dd1_0_6': 1,
 'c_0_d1_0_dd2_1_7': 2,
 'c_0_d1_1_dd1_0_8': 3,
 'c_0_d1_1_dd2_1_9': 4,
 'x_0_xx1_0': 1,
 'x_0_xx2_1': 2,
 'x_1_xx1_0': 3,
 'x_1_xx2_1': 4,
 'y': 2}
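One property of the trailing _{idx} suffix is worth checking: it is the position of the key within its flattened sub-dict, so the generated names depend on the dicts' insertion order as well as on the data. A quick sketch, with the function copied unchanged:

```python
def flatten(d):
    out = {}
    for key, val in d.items():
        if isinstance(val, dict):
            val = [val]
        if isinstance(val, list):
            for subidx, subdict in enumerate(val):
                deeper = flatten(subdict).items()
                out.update({key + f'_{subidx}' + '_' + key2 + f'_{idx}': val2
                            for idx, (key2, val2) in enumerate(deeper)})
        else:
            out[key] = val
    return out


# Same data, different key order inside the sub-dict, different flattened names.
print(flatten({'x': [{'p': 1, 'q': 2}]}))  # {'x_0_p_0': 1, 'x_0_q_1': 2}
print(flatten({'x': [{'q': 2, 'p': 1}]}))  # {'x_0_q_0': 2, 'x_0_p_1': 1}
```

Because subidx already distinguishes list elements that share keys, the trailing suffix is not needed for that purpose.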
Configuration answered 16/6, 2023 at 5:09
