How to get the minimum value of a row list in a pandas dataframe

I have a pandas dataframe with a column made of lists.
The goal is to find the minimum of the list in each row, as efficiently as possible.

E.g.

import pandas as pd
df = pd.DataFrame(columns=['Lists', 'Min'])
df['Lists'] = [ [1,2,3], [4,5,6], [7,8,9] ]
print(df)

The goal is the Min column:

       Lists  Min
0  [1, 2, 3]    1
1  [4, 5, 6]    4
2  [7, 8, 9]    7

Thank you in advance,
gil

Kiri answered 6/2, 2017 at 9:44 Comment(3)
Since your pandas data structures use the object dtype, you are killing efficiency. - Popp
@Popp It is the output of this algorithm: df["b"] =np.array(map(list,[df["a"].shift(x) for x in range(1,4)])).T.tolist() (see #37968324). Is there a way to speed it up? - Kiri
The issue is that you are putting lists inside your DataFrame, which makes the column dtype object. The dtype is inherited from the underlying NumPy data structure, and object dtypes are slow. It's not the algorithm, it's your data structure. - Popp
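To illustrate the point made in the comment above, here is a minimal sketch that keeps the shifted values as separate numeric columns instead of packing them into one list per row. The sample values in column `a` and the `lag1`..`lag3` column names are made up for illustration; the shift range 1..3 follows the snippet quoted in the comments.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'a' comes from the comment.
df = pd.DataFrame({'a': [5, 3, 8, 1, 9, 2]})

# One numeric column per lag instead of one Python list per row.
lags = pd.concat({f'lag{x}': df['a'].shift(x) for x in range(1, 4)}, axis=1)

# Row-wise minimum over the lag columns. Missing lags are skipped
# (skipna=True is the default); a row with no lags at all gives NaN.
df['Min'] = lags.min(axis=1)
print(df)
```

Because the lag columns stay numeric (float64 here, since shifting introduces NaN), the minimum is computed by pandas without calling Python's `min` once per row.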

You can use apply with min:

df['Min'] = df.Lists.apply(lambda x: min(x))
print (df)
       Lists  Min
0  [1, 2, 3]    1
1  [4, 5, 6]    4
2  [7, 8, 9]    7

Thanks to juanpa.arrivillaga for the idea of using a list comprehension instead:

df['Min'] = [min(x) for x in df.Lists.tolist()]
print (df)
       Lists  Min
0  [1, 2, 3]    1
1  [4, 5, 6]    4
2  [7, 8, 9]    7
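One caveat with both approaches: the built-in `min` raises `ValueError` on an empty list. If empty lists can occur in your data (an assumption, since the question's example has none), a possible sketch is to pass a `default`:

```python
import numpy as np
import pandas as pd

# Hypothetical data with an empty list in the middle row.
df = pd.DataFrame({'Lists': [[1, 2, 3], [], [7, 8, 9]]})

# default= is returned when the list is empty, instead of raising ValueError.
df['Min'] = [min(x, default=np.nan) for x in df['Lists']]
print(df)
```

Using NaN as the fallback turns the Min column into floats; choose a different default if you need to keep integers.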

Timings:

# [300000 rows x 2 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [144]: %timeit df['Min1'] = [min(x) for x in df.Lists.values.tolist()]
10 loops, best of 3: 137 ms per loop

In [145]: %timeit df['Min2'] = [min(x) for x in df.Lists.tolist()]
10 loops, best of 3: 142 ms per loop

In [146]: %timeit df['Min3'] = [min(x) for x in df.Lists]
10 loops, best of 3: 139 ms per loop

In [147]: %timeit df['Min4'] = df.Lists.apply(lambda x: min(x))
10 loops, best of 3: 170 ms per loop
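
As the comments on the question point out, the object dtype is the real cost here. If every list has the same length (true for the example, but an assumption for real data), one option is to stack the lists into a 2-D NumPy array and take the minimum along each row. This is a sketch and is not included in the timings above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Lists': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})

# Assumption: all lists have equal length, so they form a regular
# 2-D integer array rather than an object array.
arr = np.array(df['Lists'].tolist())

# Vectorized minimum along each row.
df['Min'] = arr.min(axis=1)
print(df)
```

With lists of unequal length this does not apply directly, and the list comprehension remains the simpler choice.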
Beadruby answered 6/2, 2017 at 9:45 Comment(4)
Heck, a list comprehension on the Lists column might be faster. - Popp
Thank you, @Beadruby (amazingly fast answer). - Kiri
@Popp Thank you. - Kiri
@Popp Thank you very much, you are right. I added timings to the answer. - Beadruby
