Melt columns by substring of the column name in pandas (Python)
R

5

5

I have a dataframe:

   subject  A_target_word_gd  A_target_word_fd  B_target_word_gd  B_target_word_fd subject_type
         1                 1                 2                 3                 4         mild
         2                11                12                13                14     moderate
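A minimal sketch for rebuilding this frame, so the snippets in the answers can be run as-is. It assumes the values are plain integers and strings, as they appear in the table above.

```python
import pandas as pd

# Rebuild the two-row example frame from the question.
df = pd.DataFrame({
    "subject": [1, 2],
    "A_target_word_gd": [1, 11],
    "A_target_word_fd": [2, 12],
    "B_target_word_gd": [3, 13],
    "B_target_word_fd": [4, 14],
    "subject_type": ["mild", "moderate"],
})
print(df)
```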

I want to melt it into a dataframe that looks like this:

     cond    subject    subject_type     value_type   value
      A         1        mild             gd           1
      A         1        mild             fd           2
      B         1        mild             gd           3
      B         1        mild             fd           4
      A         2        moderate         gd           11
      A         2        moderate         fd           12
      B         2        moderate         gd           13
      B         2        moderate         fd           14
...

That is, I want to melt based on the delimiter in the column names.

What is the best way to do that?

Reword answered 1/1, 2020 at 7:44 Comment(0)
S
1

One more approach, very similar to what @anky_91 posted. I had already started typing it before he posted, so I am putting it out here anyway.

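import pandas as pd

# Melt every non-id column into rows; 'abc' temporarily holds the old column names.
new_df = pd.melt(df, id_vars=['subject_type', 'subject'], var_name='abc').sort_values(by=['subject', 'subject_type'])
# The first '_'-separated token is the condition, the last one is the value type.
new_df['cond'] = new_df['abc'].apply(lambda x: x.split('_')[0])
new_df['value_type'] = new_df.pop('abc').apply(lambda x: x.split('_')[-1])
new_df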

Output

subject_type    subject     value   cond    value_type
0   mild              1     1          A    gd
2   mild              1     2          A    fd
4   mild              1     3          B    gd
6   mild              1     4          B    fd
1   moderate          2     11         A    gd
3   moderate          2     12         A    fd
5   moderate          2     13         B    gd
7   moderate          2     14         B    fd
Scaife answered 1/1, 2020 at 8:19 Comment(0)
G
2

Here is my way, using melt and Series.str.split():

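m = df.melt(['subject', 'subject_type'])
# Split the old column names on '_' and keep only the first and last pieces.
n = m['variable'].str.split('_', expand=True).iloc[:, [0, -1]]
n.columns = ['cond', 'value_type']
# Pass columns= by keyword; the positional axis argument was removed in pandas 2.0.
m = m.drop(columns='variable').assign(**n).sort_values('subject')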

print(m)

   subject subject_type  value cond value_type
0        1         mild      1    A         gd
2        1         mild      2    A         fd
4        1         mild      3    B         gd
6        1         mild      4    B         fd
1        2     moderate     11    A         gd
3        2     moderate     12    A         fd
5        2     moderate     13    B         gd
7        2     moderate     14    B         fd
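If this reshape is needed in several places, the melt-then-split steps could be wrapped in a small function. The sketch below is not part of pandas or of the answer above: `melt_by_delimiter` and its parameters are hypothetical names, and it assumes every melted column name contains the separator exactly once. It uses `regex=False`, which needs pandas 1.4 or later.

```python
import pandas as pd


def melt_by_delimiter(df, id_vars, sep, names=("cond", "value_type")):
    """Melt df, then split the former column names on sep into two columns.

    Hypothetical helper for illustration; assumes each melted column name
    contains sep exactly once.
    """
    long_df = df.melt(id_vars=id_vars)
    # regex=False treats sep as a literal string, not a regular expression.
    parts = long_df.pop("variable").str.split(sep, expand=True, regex=False)
    parts.columns = list(names)
    return long_df.join(parts)


df = pd.DataFrame({
    "subject": [1, 2],
    "A_target_word_gd": [1, 11],
    "A_target_word_fd": [2, 12],
    "B_target_word_gd": [3, 13],
    "B_target_word_fd": [4, 14],
    "subject_type": ["mild", "moderate"],
})

out = melt_by_delimiter(df, ["subject", "subject_type"], "_target_word_")
out = out.sort_values("subject", kind="stable")
print(out)
```

Splitting on the full `_target_word_` delimiter, rather than on `_`, means the helper does not depend on how many underscores appear in the middle of the name.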
Gard answered 1/1, 2020 at 8:12 Comment(0)
E
2

Set the index to subject and subject_type. Split the remaining column names on the string _target_word_ to get MultiIndex columns. Then name the column levels with rename_axis, stack both levels, and call reset_index:

df1 = df.set_index(['subject', 'subject_type'])
df1.columns = df1.columns.str.split('_target_word_', expand=True)
df_final = df1.rename_axis(['cond','value_type'],axis=1).stack([0,1]).reset_index(name='value')

Out[91]:
   subject subject_type cond value_type  value
0        1         mild    A         fd      2
1        1         mild    A         gd      1
2        1         mild    B         fd      4
3        1         mild    B         gd      3
4        2     moderate    A         fd     12
5        2     moderate    A         gd     11
6        2     moderate    B         fd     14
7        2     moderate    B         gd     13
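The rows above come out in a different order from the melt-based outputs (fd before gd). A quick way to confirm that only the ordering differs is to sort both results on the key columns and compare them. This is a sketch under the same example data; `normalise`, `stacked` and `melted` are names made up for the check.

```python
import pandas as pd

df = pd.DataFrame({
    "subject": [1, 2],
    "A_target_word_gd": [1, 11],
    "A_target_word_fd": [2, 12],
    "B_target_word_gd": [3, 13],
    "B_target_word_fd": [4, 14],
    "subject_type": ["mild", "moderate"],
})
keys = ["subject", "subject_type", "cond", "value_type"]

# Stack-based reshape: split the column names into a two-level MultiIndex.
wide = df.set_index(["subject", "subject_type"])
wide.columns = wide.columns.str.split("_target_word_", expand=True)
stacked = (
    wide.rename_axis(["cond", "value_type"], axis=1)
    .stack([0, 1])
    .reset_index(name="value")
)

# Melt-based reshape for comparison.
melted = df.melt(id_vars=["subject", "subject_type"])
melted[["cond", "value_type"]] = melted.pop("variable").str.split("_target_word_", expand=True)


def normalise(frame):
    """Sort on the key columns and fix the column order so frames are comparable."""
    return frame.sort_values(keys).reset_index(drop=True)[keys + ["value"]]


# Raises AssertionError if the two reshapes disagree on any value.
pd.testing.assert_frame_equal(normalise(stacked), normalise(melted), check_dtype=False)
```

Depending on the pandas version, `stack` may emit a FutureWarning about its new implementation; the comparison does not depend on the row order that `stack` returns.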
Epochmaking answered 1/1, 2020 at 8:35 Comment(0)
C
2

First reshape with DataFrame.set_index, DataFrame.stack and DataFrame.reset_index, then split the column that holds the old column names on _ with Series.str.split to create the new columns:

df = df.set_index(['subject','subject_type']).stack().reset_index(name='value')
df[['cond','value_type']] = df.pop('level_2').str.split('_', expand=True).iloc[:, [0,-1]]
print(df)
   subject subject_type  value cond value_type
0        1         mild      1    A         gd
1        1         mild      2    A         fd
2        1         mild      3    B         gd
3        1         mild      4    B         fd
4        2     moderate     11    A         gd
5        2     moderate     12    A         fd
6        2     moderate     13    B         gd
7        2     moderate     14    B         fd
Corpulent answered 1/1, 2020 at 8:51 Comment(0)
D
1

One option is pivot_longer from pyjanitor:

# pip install pyjanitor
import janitor
import pandas as pd

(df
.pivot_longer(
    index = 'subject*', 
    names_to = ('cond', 'value_type'), 
    names_sep = '_target_word_')
)
   subject subject_type cond value_type  value
0        1         mild    A         gd      1
1        2     moderate    A         gd     11
2        1         mild    A         fd      2
3        2     moderate    A         fd     12
4        1         mild    B         gd      3
5        2     moderate    B         gd     13
6        1         mild    B         fd      4
7        2     moderate    B         fd     14
Didymous answered 29/9, 2022 at 2:28 Comment(3)
pyjanitor is coming up with all these cool things. Curious to know how the performance of this module is? - Gard
Hi @anky, pivot_longer is very performant; I wrote an article and added some performance benchmarks. Of course, if there are ways to improve it, please feel free to contribute. Some other functions are for ease of use. - Didymous
@anky, is there a benchmark location, maybe on Stack Overflow, for reshape functions? - Didymous
