How to apply pandas.map() where the function takes more than 1 argument
Suppose I have a DataFrame with a column of probabilities. I have written a function that returns 1 if the probability is greater than a threshold value and 0 otherwise. The catch is that I want to pass the threshold as an argument to the function and then map that function over the DataFrame.

Take the code example below:

import pandas as pd

def partition(x, threshold):
    # 0 below the threshold, 1 otherwise
    if x < threshold:
        return 0
    else:
        return 1

df = pd.DataFrame({'probability': [0.2, 0.8, 0.4, 0.95]})
df2 = df.map(partition)  # threshold is not passed here

My question is: how do I make the last line work, i.e. how do I pass the threshold value to the function I am mapping?

Nonscheduled answered 1/7, 2020 at 17:12 Comment(0)
7

We can use DataFrame.applymap:

df2 = df.applymap(lambda x: partition(x, threshold=0.5))

Or if only one column:

df['probability']=df['probability'].apply(lambda x: partition(x, threshold=0.5))

But a custom function is not necessary here. You can do:

df2 = df.ge(threshold).astype(int)

I recommend looking into this vectorized approach.
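As a rough check of the claim above, the sketch below compares the vectorized comparison with an elementwise call to partition. The value 0.5 for threshold and the per-column df.apply / Series.map combination are assumptions chosen for illustration; they are not part of the original answer.

```python
import pandas as pd

def partition(x, threshold):
    # Same logic as in the question: 0 below the threshold, 1 otherwise
    return 0 if x < threshold else 1

threshold = 0.5  # assumed value for illustration
df = pd.DataFrame({'probability': [0.2, 0.8, 0.4, 0.95]})

# Vectorized: compare the whole frame at once, then cast booleans to integers
vectorized = df.ge(threshold).astype(int)

# Elementwise: map partition over every value, column by column
elementwise = df.apply(lambda col: col.map(lambda x: partition(x, threshold)))

# Both approaches should agree on every cell
same = bool((vectorized == elementwise).all().all())
```

The vectorized form avoids a Python-level function call per cell, which is usually what makes it faster on large frames.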

Arching answered 1/7, 2020 at 17:18 Comment(1)
Note that, since pandas 2.1.0, applymap is deprecated in favor of DataFrame.map. - Asphaltite
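Following up on that comment, here is a minimal sketch of one way to write the elementwise call so that it runs on pandas versions both before and after the rename. The hasattr check and the name elementwise are illustrative choices, not something proposed in the thread.

```python
import pandas as pd

def partition(x, threshold):
    # 0 below the threshold, 1 otherwise
    return 0 if x < threshold else 1

df = pd.DataFrame({'probability': [0.2, 0.8, 0.4, 0.95]})

# Prefer DataFrame.map (pandas >= 2.1.0); fall back to applymap on older versions
elementwise = df.map if hasattr(df, "map") else df.applymap

df2 = elementwise(lambda x: partition(x, threshold=0.5))
```

If only one pandas version needs to be supported, calling df.map or df.applymap directly is simpler.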
2

You can use a lambda for that purpose:

import pandas as pd

def partition(x, threshold):
    if x < threshold:
        return 0
    else:
        return 1

df = pd.DataFrame({'probability': [0.2, 0.8, 0.4, 0.95]})
# The lambda fixes threshold, so map() still receives a one-argument function
df['probability'] = df['probability'].map(lambda x: partition(x, threshold=0.5))
Miru answered 1/7, 2020 at 17:15 Comment(2)
He has a DataFrame. - Arching
True, and from the example it does seem like taking the whole DataFrame is not the way to go, @Arching. - Miru
1

If there are extra arguments, it's better to use apply():

df['new'] = df['probability'].apply(partition, threshold=0.5)

or wrap the function with functools.partial and map this new function:

from functools import partial
df['new'] = df['probability'].map(partial(partition, threshold=0.5))

# a bit more legibly
partition_05 = partial(partition, threshold=0.5)
df['new'] = df['probability'].map(partition_05)
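To make the comparison concrete, the following sketch runs the apply, partial and lambda variants side by side on the sample data from the question. The variable names via_apply, via_partial and via_lambda are made up for this illustration.

```python
from functools import partial

import pandas as pd

def partition(x, threshold):
    # 0 below the threshold, 1 otherwise
    return 0 if x < threshold else 1

s = pd.Series([0.2, 0.8, 0.4, 0.95], name='probability')

# Series.apply forwards extra keyword arguments to the function
via_apply = s.apply(partition, threshold=0.5)

# partial() binds threshold up front and yields a one-argument callable
via_partial = s.map(partial(partition, threshold=0.5))

# A lambda achieves the same binding inline
via_lambda = s.map(lambda x: partition(x, threshold=0.5))
```

All three should yield the same values, so the choice between them is mostly a matter of readability.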

You can pass the extra argument as a kwarg to applymap() too:

df = df.applymap(partition, threshold=0.5)

That said, please use vectorized code wherever possible. For example, in the OP,

df['new'] = (df['probability'] > 0.5) * 1

produces the desired column.
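One detail worth checking when swapping partition for a comparison is the tie case. As written in the question, partition returns 1 when the value equals the threshold, so >= reproduces it exactly, while > differs only for values that sit exactly on the threshold. The sketch below uses a made-up series containing such a value.

```python
import pandas as pd

def partition(x, threshold):
    # 0 below the threshold, 1 otherwise (so a tie returns 1)
    return 0 if x < threshold else 1

# Hypothetical data: 0.5 sits exactly on the threshold
s = pd.Series([0.2, 0.5, 0.8])

reference = s.map(lambda x: partition(x, 0.5))
with_ge = (s >= 0.5).astype(int)  # matches partition, including the tie
with_gt = (s > 0.5).astype(int)   # differs from partition at the tie
```

For the sample data in the question no value equals 0.5, so both comparisons give the same column there.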

Harmonia answered 2/4, 2023 at 0:42 Comment(0)
