Pandas Unpack List of Dicts to Columns
Asked Answered
Z

3

5

I have a dataframe that has a field called fields which is a list of dicts (all rows have the same format). Here is how the dataframe is structured:

formId    fields
   123    [{'number': 1, 'label': 'Last Name', 'value': 'Doe'}, {'number': 2, 'label': 'First Name', 'value': 'John'}]

I am trying to unpack the fields column so it looks like:

formId    Last Name    First Name
   123          Doe          John

The code I have currently is:

for i,r in df.iterrows():
    for field in r['fields']:
        df.at[i, field['label']] = field['value']

However this does not seem like the most efficient way. Is there a better way to accomplish this?

Zygodactyl answered 28/8, 2024 at 22:4 Comment(1)
Can you share your input as a dictionary? df.to_dict is one way to do thisAikoail
C
2

Personally, I'd construct new dataframe:

df = pd.DataFrame(
    [
        {"formId": form_id, **{f["label"]: f["value"] for f in fields}}
        for form_id, fields in zip(df["formId"], df["fields"])
    ]
)

print(df)

Prints:

   formId Last Name First Name
0     123       Doe       John
Crowded answered 28/8, 2024 at 22:59 Comment(0)
I
2

You can use .apply and .concat to convert the dicts to series. Finally .pivot to convert the column to headers.

Data:

import pandas as pd


data = {"formId": 123, "fields": [{'number': 1, 'label': 'Last Name', 'value': 'Doe'},
                                  {'number': 2, 'label': 'First Name', 'value': 'John'}]}
df = pd.DataFrame(data=data)

Code:

df = (pd
      .concat(objs=[df, df.pop("fields").apply(func=pd.Series)], axis=1)
      .pivot(index="formId", columns="label", values="value")
      .reset_index()
      .rename_axis(mapper=None, axis=1)
      )

print(df)

Output:

   formId First Name Last Name
0     123       John       Doe
Impenitent answered 28/8, 2024 at 22:58 Comment(0)
C
2

Personally, I'd construct new dataframe:

df = pd.DataFrame(
    [
        {"formId": form_id, **{f["label"]: f["value"] for f in fields}}
        for form_id, fields in zip(df["formId"], df["fields"])
    ]
)

print(df)

Prints:

   formId Last Name First Name
0     123       Doe       John
Crowded answered 28/8, 2024 at 22:59 Comment(0)
M
2

Solution:

v = pd.concat([pd.json_normalize(x) for x in df["fields"]]).pivot_table(
    columns="label", values="value", aggfunc=list
)
out = df[["formId"]].join(v.explode(v.columns.tolist(), ignore_index=True))

Result using example input from OP:

   formId First Name Last Name
0     123       John       Doe

Also works for multiple rows. For this example input:

df = pd.DataFrame(
    {
        "formId": [123, 456],
        "fields": [
            [
                {"number": 1, "label": "Last Name", "value": "Doe"},
                {"number": 2, "label": "First Name", "value": "John"},
            ],
            [
                {"number": 1, "label": "Last Name", "value": "Smith"},
                {"number": 2, "label": "First Name", "value": "Jack"},
            ],
        ],
    }
)

Result:

   formId First Name Last Name
0     123       John       Doe
1     456       Jack     Smith
Mallarme answered 28/8, 2024 at 23:6 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.