Creating a new column based on if-elif-else condition [duplicate]
Asked Answered
R

8

174

I have a DataFrame df:

    A    B
a   2    2 
b   3    1
c   1    3

I want to create a new column based on the following criteria:

if row A == B: 0

if rowA > B: 1

if row A < B: -1

so given the above table, it should be:

    A    B    C
a   2    2    0
b   3    1    1
c   1    3   -1 

For typical if else cases I do np.where(df.A > df.B, 1, -1), does pandas provide a special syntax for solving my problem with one step (without the necessity of creating 3 new columns and then combining the result)?

Roy answered 11/2, 2014 at 12:49 Comment(2)
@DSM has answered this question but I meant something like df['C']=df.apply(myFunc(row), axis=1) where myFunc does what you want, this does not involve creating '3 columns'Dwaindwaine
Since pandas 2.2.0, you can use case_when (a syntax useful in SQL in the context in the OP). See this answer for a demo.Melendez
T
260

To formalize some of the approaches laid out above:

Create a function that operates on the rows of your dataframe like so:

def f(row):
    if row['A'] == row['B']:
        val = 0
    elif row['A'] > row['B']:
        val = 1
    else:
        val = -1
    return val

Then apply it to your dataframe passing in the axis=1 option:

In [1]: df['C'] = df.apply(f, axis=1)

In [2]: df
Out[2]:
   A  B  C
a  2  2  0
b  3  1  1
c  1  3 -1

Of course, this is not vectorized so performance may not be as good when scaled to a large number of records. Still, I think it is much more readable. Especially coming from a SAS background.

Edit

Here is the vectorized version

df['C'] = np.where(
    df['A'] == df['B'], 0, np.where(
    df['A'] >  df['B'], 1, -1)) 
Tonytonya answered 11/2, 2014 at 20:4 Comment(6)
What if I want to pass another parameter along with row in the function? If I do, it says row not defined..Mcanally
You have to use the args parameter of the .apply function: pandas.pydata.org/pandas-docs/stable/generated/…Tonytonya
I'm an old SAS user learning Python, and there's definitely a learning curve! :-) For example, the above code could be written in SAS as: data df; set df; if A=B then C=0; else if A>B then C=1; else C=-1; run; Very elegant and simple.Slosberg
thanks for the answer. However, I could not understand why axis=1 is required. Without passing axis=1 the row had only 1st column.Tiffie
@Tonytonya could you please give a vectorized version?Greataunt
For the unvectorized, your function contains arguments - and so it would be: df['c'] = df(lambda row: f(row), axis=1)Tinderbox
A
92
df.loc[df['A'] == df['B'], 'C'] = 0
df.loc[df['A'] > df['B'], 'C'] = 1
df.loc[df['A'] < df['B'], 'C'] = -1

Easy to solve using indexing. The first line of code reads like so, if column A is equal to column B then create and set column C equal to 0.

Assistance answered 17/10, 2016 at 16:40 Comment(0)
I
18

For this particular relationship, you could use np.sign:

>>> df["C"] = np.sign(df.A - df.B)
>>> df
   A  B  C
a  2  2  0
b  3  1  1
c  1  3 -1
Isochronal answered 11/2, 2014 at 13:2 Comment(0)
F
12

When you have multiple if conditions, numpy.select is the way to go:

In [4102]: import numpy as np
In [4098]: conditions = [df.A.eq(df.B), df.A.gt(df.B), df.A.lt(df.B)]
In [4096]: choices = [0, 1, -1]

In [4100]: df['C'] = np.select(conditions, choices)

In [4101]: df
Out[4101]: 
   A  B  C
a  2  2  0
b  3  1  1
c  1  3 -1
Francescafrancesco answered 30/11, 2020 at 20:57 Comment(0)
P
8

enter image description here

Lets say above one is your original dataframe and you want to add a new column 'old'

If age greater than 50 then we consider as older=yes otherwise False

step 1: Get the indexes of rows whose age greater than 50

row_indexes=df[df['age']>=50].index

step 2: Using .loc we can assign a new value to column

df.loc[row_indexes,'elderly']="yes"

same for age below less than 50

row_indexes=df[df['age']<50].index

df[row_indexes,'elderly']="no"

Pronounced answered 12/9, 2018 at 20:23 Comment(0)
B
4

You can use the method mask:

df['C'] = np.nan
df['C'] = df['C'].mask(df.A == df.B, 0).mask(df.A > df.B, 1).mask(df.A < df.B, -1)
Bibulous answered 20/9, 2021 at 20:48 Comment(0)
S
2

a one liner solution with list comprehension and zip() is also

df.loc[:,'C']= [0 if d1==d2 else 1 if d1>d2 else -1 for d1,d2 in zip(df.A,df.B)]

which returns the desired output

Snatchy answered 27/5, 2023 at 12:32 Comment(0)
U
1

This is a one line of code that achieves the desired result. This line of code assigns a new column 'C' to the DataFrame 'df'. The new column 'C' will have a value of 0 if the values in columns 'A' and 'B' are equal, a value of 1 if the value in column 'A' is greater than the value in column 'B', and a value of -1 if the value in column 'A' is less than the value in column 'B'.

To achieve this, the apply function is used on the DataFrame 'df'. The apply function applies a function along a given axis of the DataFrame. In this case, the axis is set to 1, which means the function will be applied to each row in the DataFrame.

The function being applied is a lambda function that takes a row as input. It checks the values in columns 'A' and 'B' of the current row and returns the appropriate value (0, 1, or -1) for the new column 'C'.

After running this line of code, the DataFrame 'df' will have an additional column 'C' with the calculated values based on the comparison of columns 'A' and 'B'.

import pandas as pd
df['C'] = df.apply(lambda row: 0 if row['A'] == row['B'] else 1 if row['A'] > row['B'] else -1, axis=1)
Untouchable answered 12/10, 2023 at 18:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.