Pandas - combine column values into a list in a new column
Asked Answered
N

3

72

I have a Python Pandas dataframe df:

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A','B','C','D']) 

which looks like this:

print(t)
         A         B     C        D
0    hello         1  GOOD  long.kw
1      1.2  chipotle   NaN    bingo
2  various       NaN  3000  123.456

I am trying to create a new column which is a list of the values in A, B, C, and D. So it would look like this:

t['combined']                                             

Out[125]: 
0        [hello, 1, GOOD, long.kw]
1        [1.2, chipotle, nan, bingo]
2        [various, nan, 3000, 123.456]
Name: combined, dtype: object

I am trying this code:

t['combined'] = t.apply(lambda x: list([x['A'],
                                        x['B'],
                                        x['C'],
                                        x['D']]),axis=1)    

Which returns this error:

ValueError: Wrong number of items passed 4, placement implies 1 

What is puzzling to me is if I remove one of the columns that I want to put in the list (or add another column to the dataframe that I DON'T add to the list), my code works.

For instance, run this code:

t['combined'] = t.apply(lambda x: list([x['A'],
                                        x['B'],
                                        x['D']]),axis=1)      

Returns this which is perfect if I only wanted the 3 columns:

print(t)
         A         B     C        D                 combined
0    hello         1  GOOD  long.kw      [hello, 1, long.kw]
1      1.2  chipotle   NaN    bingo   [1.2, chipotle, bingo]
2  various       NaN  3000  123.456  [various, nan, 123.456]
                                    

I am at a complete loss as to why requesting the 'combined' list be made of all columns in the dataframe would create an error, but selecting all but 1 column to create the 'combined' list works as expected.

Nickynico answered 10/5, 2017 at 16:24 Comment(0)
H
129

try this :

t['combined']= t.values.tolist()

t
Out[50]: 
         A         B     C        D                       combined
0    hello         1  GOOD  long.kw      [hello, 1, GOOD, long.kw]
1     1.20  chipotle   NaN    bingo    [1.2, chipotle, nan, bingo]
2  various       NaN  3000   123.46  [various, nan, 3000, 123.456]
Haematozoon answered 10/5, 2017 at 16:36 Comment(0)
O
7

Another method is to call list() on the underlying numpy array

t['combined_arr'] = list(t.values)

It should be noted that this produces a slightly different column from using .tolist(). As can be seen from below, tolist() creates a nested list while list() creates a list of arrays.

t['combined_list'] = t[['A', 'B']].values.tolist()
t['combined_arr'] = list(t[['A', 'B']].values)

t.iloc[0, 4]  # ['hello', 1]
t.iloc[0, 5]  # array(['hello', 1], dtype=object)

Depending on the use case, it's sometimes useful to preserve the ndarray type.


If you want to combine columns without NaN values, then the fastest method is to loop over rows while checking for NaN values. As NaN!=NaN, the fastest check is to check if a value equals itself.

t['combined'] = [[e for e in row if e==e] for row in t.values.tolist()]


         A     B     C        D                     combined
0    hello   1.0  GOOD  long.kw  [hello, 1.0, GOOD, long.kw]
1      1.2  10.0   NaN    bingo           [1.2, 10.0, bingo]  <-- no NaN
2  various   NaN  3000  123.456     [various, 3000, 123.456]  <-- no NaN

A more complete check is to use isnan from the built-in math module.

import math
t['combined'] = [[e for e in row if not (isinstance(e, float) and math.isnan(e))] for row in t.values.tolist()]

To combine specific columns of non-NaN values, select the columns first:

cols = ['A', 'B']
t['combined'] = [[e for e in row if e==e] for row in t[cols].values.tolist()]
Onia answered 18/7, 2022 at 20:0 Comment(0)
A
1

Here is a way with the NaN

t.assign(combined = pd.Series(d))

Output:

         A         B     C        D                       combined
0    hello         1  GOOD  long.kw      [hello, 1, GOOD, long.kw]
1      1.2  chipotle   NaN    bingo    [1.2, chipotle, nan, bingo]
2  various       NaN  3000  123.456  [various, nan, 3000, 123.456]

Here is a way without the NaN

t.assign(combined = t.stack().groupby(level=0).agg(list))

Output:

         A         B     C        D                   combined
0    hello         1  GOOD  long.kw  [hello, 1, GOOD, long.kw]
1      1.2  chipotle   NaN    bingo     [1.2, chipotle, bingo]
2  various       NaN  3000  123.456   [various, 3000, 123.456]
Appenzell answered 18/7, 2022 at 20:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.