Create multiple columns from a single column and group by pandas

Asked 8/5, 2024 at 8:24 Answered 8/5, 2024 at 9:14

work = pd.DataFrame({"JOB" : ['JOB01', 'JOB01', 'JOB02', 'JOB02', 'JOB03', 'JOB03'],
"STATUS" : ['ON_ORDER', 'ACTIVE','TO_BE_ALLOCATED', 'ON_ORDER', 'ACTIVE','TO_BE_ALLOCATED'],
"PART" : ['PART01', 'PART02','PART03','PART04','PART05','PART06']})

How can I use Pandas to groupby the JOB, split Status into columns based on the values and concatenate the Part field based on the JOB.

Desired Output:

JOB    | ON_ORDER  | ACTIVE   | TO_BE_ALLOCATED | PART_CON
JOB01  | True      | True     | False           | Part01\nPart02
JOB02  | True      | False    | True            | Part03\nPart04
JOB03  | False     | True     | True            | Part05\nPart06

Auscultation answered 8/5, 2024 at 8:24 Comment(0)

Try:

x = df.groupby("JOB")["PART"].agg(", ".join).rename("PART_CON")
y = pd.crosstab(df["JOB"], df["STATUS"]).astype(bool)
print(pd.concat([y, x], axis=1).reset_index())

Prints:

     JOB  ACTIVE  ON_ORDER  TO_BE_ALLOCATED        PART_CON
0  JOB01    True      True            False  PART01, PART02
1  JOB02   False      True             True  PART03, PART04
2  JOB03    True     False             True  PART05, PART06

Tichonn answered 8/5, 2024 at 8:37 Comment(1)

Thanks I did not know about pandas crosstab. pandas.pydata.org/pandas-docs/stable/reference/api/… – Auscultation 8/5, 2024 at 9:1

Another possible solution, which uses pivot_table:

out = work.pivot_table(
    index='JOB', columns='STATUS', values='PART', aggfunc='first')
out.notna().assign(
    PART_CON = out.apply(
        lambda x: '\n'.join(x.sort_values().dropna()), axis=1))

Output:

STATUS  ACTIVE  ON_ORDER  TO_BE_ALLOCATED        PART_CON
JOB                                                      
JOB01     True      True            False  PART01\nPART02
JOB02    False      True             True  PART03\nPART04
JOB03     True     False             True  PART05\nPART06

Mandeville answered 8/5, 2024 at 9:14 Comment(2)

This is a nice use of the pivot_table function. The aggfunc parameter took me offgaurd there. I cannot find the "first" value in the documentation. pandas.pydata.org/docs/reference/api/pandas.pivot_table.html – Auscultation 8/5, 2024 at 9:25

Thanks, @MartinCronje! The following clarifies what first aggregate function is: pandas.pydata.org/docs/reference/api/… – Mandeville 8/5, 2024 at 9:29

Recommended topics

Hot tags