This may be late, but I think it can help, especially anyone who reaches this question.
Suppose we define foo like this:

```python
import pandas as pd

def foo(row: pd.Series):
    row['b'] = '42'
```

and then use it in:

```python
df.apply(foo, axis=1)
```

We would not expect this to change df, but it does. Why?
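Let's review what happens under the hood. apply calls foo and passes it one row at a time, as a pandas.Series. In Python every argument is passed as a reference to an object, whatever its type; int, float and str only look like they are passed by value because they are immutable, so a function cannot change them in place. A Series is a mutable object, so row inside foo is the very same object that apply sent (equal in values, and both pointing to the same block of memory).

That alone would not touch df, unless the Series itself shares memory with df. Without Copy-on-Write, apply(axis=1) can build each row Series as a view into the DataFrame's own data (typically when all columns share one dtype, as the two string columns do here). So any change to row in foo is written immediately into the block of memory where that row of df resides. This is an internal detail that depends on the pandas version and on the column dtypes; with Copy-on-Write enabled (opt-in in pandas 2.x, the default in pandas 3.0), the row behaves like a copy and df is not modified.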
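The argument-passing rule itself is plain Python and can be checked without pandas. The sketch below uses hypothetical function names (mutate, rebind, bump) to show that a mutable argument changed in place is visible to the caller, while rebinding the name or "changing" an immutable int is not:

```python
# Python passes every argument as a reference to an object ("call by sharing"),
# whatever its type. What differs is whether the object can be changed in place.

def mutate(d: dict) -> None:
    d["b"] = "42"               # in-place change: the caller's dict sees it

def rebind(d: dict) -> None:
    d = {"a": "a0", "b": "42"}  # rebinds the local name only; caller unaffected

def bump(n: int) -> int:
    n += 1                      # int is immutable, so += creates a new int object
    return n

row = {"a": "a0", "b": "b0"}
mutate(row)
print(row)    # {'a': 'a0', 'b': '42'}

other = {"a": "a0", "b": "b0"}
rebind(other)
print(other)  # {'a': 'a0', 'b': 'b0'}

count = 1
new_count = bump(count)
print(count, new_count)  # 1 2
```

A pandas Series behaves like the dict here: foo's row['b'] = '42' is an in-place change, so whoever else holds that Series (or its underlying memory) sees it.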
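We can rewrite foo as a function that changes nothing in place (I call it bar) by deep-copying row, that is, making a new Series with the same values stored in a different block of memory. This is effectively what happens when we use a lambda with apply: a lambda cannot contain an assignment statement such as row['b'] = '42', so it has to return a new value instead of modifying row.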
```python
def bar(row: pd.Series):
    # Work on an independent copy so the original row (and df) is untouched
    row_temp = row.copy(deep=True)
    row_temp['b'] = '42'
    return row_temp
```
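For comparison, here is a minimal sketch of the lambda style mentioned above. The lambda body, which builds a new Series from row.to_dict() with 'b' overridden, is my own illustration and not the only way to write it; the point is that it returns a new row instead of assigning into the one it receives:

```python
import pandas as pd

df2 = pd.DataFrame({"a": ["a0", "a1"], "b": ["b0", "b1"]})

# The lambda cannot assign into row, so it returns a brand-new Series
# containing all of row's values with 'b' replaced.
df_c = df2.apply(lambda row: pd.Series({**row.to_dict(), "b": "42"}), axis=1)

print(df2)   # original frame, unchanged
print(df_c)  # new frame where column b is '42'
```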
Complete Code
```python
import pandas as pd

# Changes df in place (unlike a lambda)
def foo(row: pd.Series):
    row['b'] = '42'

# Does not change df in place (works like a lambda)
def bar(row: pd.Series):
    row_temp = row.copy(deep=True)
    row_temp['b'] = '42'
    return row_temp

df2 = pd.DataFrame(columns=['a', 'b'])
df2['a'] = ['a0', 'a1']
df2['b'] = ['b0', 'b1']
print(df2)

# No change in place
df_b = df2.apply(bar, axis=1)
print(df2)
# bar works
print(df_b)
print(df2)

# Changes in place
df2.apply(foo, axis=1)
print(df2)
```
Output
```
# df2 before any change
    a   b
0  a0  b0
1  a1  b1
# df2 after calling df2.apply(bar, axis=1): not changed in place
    a   b
0  a0  b0
1  a1  b1
# df_b = df2.apply(bar, axis=1): bar works as expected
    a   b
0  a0  42
1  a1  42
# df2 printed again to make sure it has not changed
    a   b
0  a0  b0
1  a1  b1
# df2 after calling df2.apply(foo, axis=1): foo changed df2 in place (compare with bar)
    a   b
0  a0  42
1  a1  42
```
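Because whether foo modifies the frame depends on pandas internals and version, you may want to check it on your own setup. A minimal sketch, assuming a hypothetical helper mutates_frame (not a pandas API) that runs apply on a deep copy and compares the result with a snapshot, so the frame you pass in is never touched:

```python
import pandas as pd

def foo(row: pd.Series):
    row["b"] = "42"          # mutates the row it receives

def bar(row: pd.Series) -> pd.Series:
    row_temp = row.copy(deep=True)
    row_temp["b"] = "42"     # mutates only the private copy
    return row_temp

def mutates_frame(df: pd.DataFrame, func) -> bool:
    """Hypothetical helper: True if df.apply(func, axis=1) alters the frame it runs on.

    It works on a deep copy, so the caller's df is never modified.
    """
    probe = df.copy(deep=True)
    before = probe.copy(deep=True)   # snapshot of the probe's contents
    probe.apply(func, axis=1)
    return not probe.equals(before)

df2 = pd.DataFrame({"a": ["a0", "a1"], "b": ["b0", "b1"]})
print("bar mutates:", mutates_frame(df2, bar))
print("foo mutates:", mutates_frame(df2, foo))  # result depends on pandas version and dtypes
```

bar should report False everywhere; foo reports True only where apply hands out row views into the frame's memory.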
df2['a'] the values ['a0','b0']. But in your df2 output the data is different. Why? – Marmite