Set value of one Pandas column based on value in another column
Asked Answered
U

10

144

I need to set the value of one column based on the value of another in a Pandas dataframe. This is the logic:

if df['c1'] == 'Value':
    df['c2'] = 10
else:
    df['c2'] = df['c3']

I am unable to get this to do what I want, which is to simply create a column with new values (or change the value of an existing column: either one works for me).

If I try to run the code above or if I write it as a function and use the apply method, I get the following:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Ulla answered 7/3, 2018 at 21:1 Comment(0)
B
212

one way to do this would be to use indexing with .loc.

Example

In the absence of an example dataframe, I'll make one up here:

import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': list('abcdefg')})
df.loc[5, 'c1'] = 'Value'

>>> df
      c1
0      a
1      b
2      c
3      d
4      e
5  Value
6      g

Assuming you wanted to create a new column c2, equivalent to c1 except where c1 is Value, in which case, you would like to assign it to 10:

First, you could create a new column c2, and set it to equivalent as c1, using one of the following two lines (they essentially do the same thing):

df = df.assign(c2 = df['c1'])
# OR:
df['c2'] = df['c1']

Then, find all the indices where c1 is equal to 'Value' using .loc, and assign your desired value in c2 at those indices:

df.loc[df['c1'] == 'Value', 'c2'] = 10

And you end up with this:

>>> df
      c1  c2
0      a   a
1      b   b
2      c   c
3      d   d
4      e   e
5  Value  10
6      g   g

If, as you suggested in your question, you would perhaps sometimes just want to replace the values in the column you already have, rather than create a new column, then just skip the column creation, and do the following:

df['c1'].loc[df['c1'] == 'Value'] = 10
# or:
df.loc[df['c1'] == 'Value', 'c1'] = 10

Giving you:

>>> df
      c1
0      a
1      b
2      c
3      d
4      e
5     10
6      g
Bottle answered 7/3, 2018 at 21:15 Comment(8)
The second solution nailed it for me. I didn't realize you could use .loc like a WHERE statement in SQL. Makes sense. Thanks!Ulla
How would you apply 10 to multiple columns instead on just one? eg: df.loc[df['c1'] == 'Value', 'c2','c3','c4] = 10.Ave
I think you have to place all the columns you need to update the value with in a list, then loop through that list and changing the column name parameter in it?Sheritasherj
Great Solution, I am facing a similar problem. Additonally to the condition I need to select only the first 5 entries and set the value only on them. kunden_df.loc[kunden_df["Distanz"] == 1].iloc[0:amount_contracts]["Betreuer"] = name can you help me with the syntax?Eloiseloisa
When I assign values with statements like df.loc[df['c1'] == 'Value', 'c1'] = 10 I get the dreaded SettingWithCopyWarning -- even when I use the .loc[] syntax. So far it hasn't been a problem for me, but I find it odd that I get the warning, despite using the suggested .loc[] approach. Any fixes for that? This is with pandas 1.2.3, numpy 1.18.5, and python 3.7.10.Sladen
@Darren, in my experience, the most common reason for a SettingWithCopyWarning is trying to assign values on a view. is it possible that the df you're trying to modify is e.g. a subset of a larger dataframe? Using the method above on such a view would raise the warning. Also worth mentioning it can be hard to predict which actions will result in a view in pandas (see this)Bottle
@sacuL, thanks. The scenarios where I get this and can't figure out why look exactly like the example you described -- not when I'm looking at view of a larger DF.I can effectively prevent the SettingWithCopyWarning when the row subset logic in .loc[] is on the right-hand side of the operation, but not when it's on the left for assignment of values (like in your example). Luckily it has never caused a practical problem for me in anything I've worked on, but I still can't figure out why I can't assign values like the example you gave without the SettingWithCopyWarning.Sladen
Let's say I have an int column, and I want to divide its value by 1000 if its value is more 1000. Using the the first option df['c1'].loc[df['c1'] > 1000] = df['c1'].loc[df['c1'] > 1000]/1000 I got the SettingWithCopyWarning. However, with the 2nd option df['c1'].loc[df['c1'] > 1000, 'c1'] I don't get that warning.Jubbulpore
H
85

You can use np.where() to set values based on a specified condition:

#df
   c1  c2  c3
0   4   2   1
1   8   7   9
2   1   5   8
3   3   3   5
4   3   6   8

Now change values (or set) in column ['c2'] based on your condition.

df['c2'] = np.where(df.c1 == 8,'X', df.c3)

   c1  c2  c3
0   4   1   1
1   8   X   9
2   1   8   8
3   3   5   5
4   3   8   8
Heartsick answered 7/3, 2018 at 22:28 Comment(4)
what if I want to keep all original columnsChallenge
@mLstudent33, using df['newColName'] = ..., you use a column name, not in your dataframe to create a new column, assuming newColName does not already existHeartsick
Superior to all other solutions, as (a) more readable due to less ambiguity of order of terms and (b) it's more future proof as it's easier to modify it to take multiple columns into account and (c) it's fast, no interpreted code in a lambda.Gendarmerie
@Heartsick I guess c2 in your code refers to c4 in the output table? You should better correct one of them.Lett
H
45

try:

df['c2'] = df['c1'].apply(lambda x: 10 if x == 'Value' else x)
Hickerson answered 7/3, 2018 at 21:6 Comment(4)
Thanks @AlexanderHughes. My original post had a typo: there are actually three columns to consider, so this solution wouldn't work.Ulla
should be df.apply(lambda x: 10 if x['c1'] == 'Value' else x['c3'],axis=1)Heartsick
This might have performance issues with large datasets. df.apply() is slower.Bogie
Was looking for the same, found a lambda worked for me with a dataframe. My code is 'ard['Hr'] = ard.apply(lambda x: x['Hr']+1 if x['Mi'] >= 45 and x['Mi'] < 60 else x['Hr'],axis=1)'Krupp
S
24

Note the tilda that reverses the selection. It uses pandas methods (i.e. is faster than if/else).

df.loc[(df['c1'] == 'Value'), 'c2'] = 10
df.loc[~(df['c1'] == 'Value'), 'c2'] = df['c3']
Scansorial answered 22/9, 2020 at 19:4 Comment(0)
S
7

I suggest doing it in two steps:

# set fixed value to 'c2' where the condition is met
df.loc[df['c1'] == 'Value', 'c2'] = 10

# copy value from 'c3' to 'c2' where the condition is NOT met
df.loc[df['c1'] != 'Value', 'c2'] = df[df['c1'] != 'Value', 'c3']
Steffy answered 7/3, 2018 at 22:29 Comment(1)
This is super confusing, can you show some tables to clarify?Challenge
U
6

You can use pandas.DataFrame.mask to add virtually as many conditions as you need:

data = {'a': [1,2,3,4,5], 'b': [6,8,9,10,11]}

d = pd.DataFrame.from_dict(data, orient='columns')
c = {'c1': (2, 'Value1'), 'c2': (3, 'Value2'), 'c3': (5, d['b'])}

d['new'] = np.nan
for value in c.values():
    d['new'].mask(d['a'] == value[0], value[1], inplace=True)

d['new'] = d['new'].fillna('Else')
d

Output:

    a   b   new
0   1   6   Else
1   2   8   Value1
2   3   9   Value2
3   4   10  Else
4   5   11  11
Umbrella answered 8/11, 2019 at 12:57 Comment(0)
R
4

I believe Series.map() to be very readable and efficient, e.g.:

df["c2"] = df["c1"].map(lambda x: 10 if x == 'Value' else x)

I like it because if the conditional logic gets more complex you can move it to a function and just pass in that function instead of the lambda.

If you need to base your conditional logic on more than one column you can use DataFrame.apply() as others suggest.

Reuter answered 27/7, 2021 at 12:55 Comment(0)
T
1

Try out df.apply() if you've a small/medium dataframe,

df['c2'] = df.apply(lambda x: 10 if x['c1'] == 'Value' else x['c1'], axis = 1)

Else, follow the slicing techniques mentioned in the above comments if you've got a big dataframe.

Tsana answered 29/7, 2020 at 4:33 Comment(0)
P
0

Many of the answers provided are great! Just wanted to add 1 extra thing. Building off of the most upvoted answer which uses .loc, if you have a list of values you want to check for you can just modify the line:

df['c1'].loc[df['c1'] == 'Value'] = 10

to

df['c1'].loc[df['c1'].isin(['Value1','Value2','Value3'])] = 10 

Note, as explained in the most upvoted answer, this replaces values in the column.

Paratyphoid answered 21/8, 2023 at 13:8 Comment(0)
N
-1

I had a big dataset and .loc[] was taking too long so I found a vectorized way to do it. Recall that you can set a column to a logical operator, so this works:

file['Flag'] = (file['Claim_Amount'] > 0)

This gives a Boolean, which I wanted, but you can multiply it by, say, 1 to make an Integer.

Northeasterly answered 4/11, 2020 at 19:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.