Why does pandas apply calculate twice


I'm using the apply method on a pandas DataFrame object. When my DataFrame has a single column, the applied function appears to be called twice. Why does this happen, and can I stop it?

Code:

import pandas as pd

def mul2(x):
    print('hello')  # side effect: shows each time mul2 is called
    return 2*x

df = pd.DataFrame({'a': [1,2,0.67,1.34]})
df.apply(mul2)

Output:

hello
hello

0  2.00
1  4.00
2  1.34
3  2.68

I'm printing 'hello' from within the applied function, and because 'hello' is printed twice, I know the function is being called twice. Moreover, if the DataFrame has two columns, 'hello' is printed 3 times. And when I call apply on just the column, 'hello' is printed 4 times:

Code:

df.a.apply(mul2)

Output:

hello
hello
hello
hello
0    2.00
1    4.00
2    1.34
3    2.68
Name: a, dtype: float64

Thyroid asked 7/2, 2014 at 19:11


This behavior has been fixed with pandas 1.1, please upgrade!

Now, apply and applymap on a DataFrame evaluate the first row/column only once.

Initially, both GroupBy.apply and Series/df.apply evaluated the first group twice. The reason is that apply wants to know whether it can "optimize" the calculation (sometimes this is possible if apply receives a NumPy or Cythonized function). With pandas 0.25, this behavior was fixed for GroupBy.apply; with pandas 1.1, it is also fixed for df.apply.
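To make the "probe first, then compute" idea concrete, here is a hypothetical, heavily simplified model in plain Python. It is NOT pandas' actual implementation; `probe_then_apply` and the dict-of-lists "columns" are assumptions made for illustration. It only shows why a probe call on the first column makes a side effect run once more than the number of columns, which matches the question's counts (2 prints for one column, 3 for two).

```python
def probe_then_apply(func, columns):
    """Hypothetical model of an apply that probes the first column.

    Not pandas' real code: the probe result is only used (in principle)
    to choose a code path, then every column is processed again.
    """
    names = list(columns)
    if names:
        # Probe call on the first column; its result is discarded here.
        func(columns[names[0]])
    # Real pass: func runs on every column, so the first one is seen twice.
    return {name: func(columns[name]) for name in names}


calls = []

def mul2(col):
    calls.append('hello')          # stand-in for the print side effect
    return [2 * v for v in col]

result = probe_then_apply(mul2, {'a': [1, 2, 0.67, 1.34]})
print(len(calls), result)
```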

Old Behavior [pandas <= 1.0.X]

pd.__version__ 
# '1.0.4'

df.apply(mul2)
hello
hello

      a
0  2.00
1  4.00
2  1.34
3  2.68

New Behavior [pandas >= 1.1]

pd.__version__
# '1.1.0.dev0+2004.g8d10bfb6f'

df.apply(mul2)
hello

      a
0  2.00
1  4.00
2  1.34
3  2.68
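To check which behavior your installed version has without eyeballing printed output, one option is to record the calls in a list. A minimal sketch, assuming pandas is installed; `calls` is a name introduced here for illustration. On pandas 1.1 or later the list should contain 'a' exactly once.

```python
import pandas as pd

calls = []  # one entry per call of the applied function

def mul2(x):
    calls.append(x.name)  # x is one column (a Series); record its label
    return 2*x

df = pd.DataFrame({'a': [1, 2, 0.67, 1.34]})
result = df.apply(mul2)
print(pd.__version__, calls)
```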

Chrisse answered 14/7, 2020 at 10:28

I mean look at that reputation score... gratz on 200k! Nice to see you around. – Thyroid 14/7, 2020 at 14:45

@Thyroid yessir, it's been a while! Was hoping to catch you on chat the other day but you're not pingable anymore :( – Chrisse 14/7, 2020 at 15:07

I haven't been on chat. Working from home with only one monitor. I can't afford the real estate for a chat window... so I forget to open it /-: – Thyroid 14/7, 2020 at 15:09

@Thyroid and gratz to you on 200 as well. Really miss seeing those piR-esque answers popping up every now and then. PS when I answered this I didn't notice who the asker was initially. Came as nothing but a pleasant surprise when I did. Stay safe! – Chrisse 14/7, 2020 at 15:11

@cs95, I'm facing an issue which is not solved even in 1.1.0 #63401273 – Phionna 13/8, 2020 at 18:46

See the post from Lucas. It's a bug in apply that updates the row in place. – Attrahent 13/8, 2020 at 19:05

I'm on version 1.4.4 and groupby.apply is calling the function twice for EACH group! – Generally 11/10, 2023 at 6:24


This behavior is intended, as an optimization.

See the docs:

In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
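When the applied function has side effects, one way around the duplicate first call on older versions is to avoid a Python callback altogether. For the question's `mul2`, plain arithmetic on the DataFrame gives the same result. This is a sketch of that workaround, assuming the operation can be vectorized; the docs quoted above do not prescribe it.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 0.67, 1.34]})

# Vectorized arithmetic: pandas multiplies the whole column at once,
# so there is no user function that could be called twice.
doubled = df * 2
print(doubled)
```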

Snowblind answered 22/10, 2016 at 14:54

Is there a way to avoid this? – Thievery 9/9, 2019 at 9:15

I don't know, and I think not, simply because it's intended as an optimization. – Snowblind 9/9, 2019 at 11:49

Apparently >= 0.25.0 has fixed this. – Thievery 10/9, 2019 at 4:52

Since 0.25.0, apply no longer runs twice on the groupby combinator: github.com/pandas-dev/pandas/pull/24748#issuecomment-532148732 But the normal apply still has a double execution. – Thievery 16/3, 2020 at 5:34

@Thievery From 1.1 the behavior has been amended for df.apply as well, see here for more. – Chrisse 14/7, 2020 at 11:31


Probably related to this issue. With groupby, the applied function is called one extra time to see whether certain optimizations can be done. I'd guess something similar is going on here. There doesn't seem to be any way around it at the moment (although I could be wrong about the source of the behavior you're seeing). Is there a reason you need it not to make that extra call?

Also, the function being called four times when you apply it to the column is normal. When you select one column, you get a Series, not a DataFrame, and apply on a Series applies the function to each element. Since your column has four elements, the function is called four times.
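The difference is easy to see by recording the arguments the function receives. A small sketch, assuming pandas is installed; `seen` is a name introduced here for illustration. On the Series, the recorder collects the four scalar values, one per call.

```python
import pandas as pd

seen = []  # records every argument the applied function receives

def mul2(x):
    seen.append(x)
    return 2*x

df = pd.DataFrame({'a': [1, 2, 0.67, 1.34]})
s = df.a             # selecting one column returns a Series, not a DataFrame
out = s.apply(mul2)  # Series.apply calls mul2 once per element
print(type(s).__name__, seen)
```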

Peyter answered 7/2, 2014 at 19:21

The function I'm using is recursive. I'm trying to avoid it doing the recursive calculation more than it needs to. Right now it's not an issue, but it could be. – Thyroid 7/2, 2014 at 19:27
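If the concern is repeating an expensive recursive computation, one hedged workaround (not something pandas provides) is to memoize the scalar recursive function with `functools.lru_cache`, so a repeated call returns a cached result instead of recomputing. `fib` and `col_fib` are hypothetical stand-ins for the asker's function. Because a Series is not hashable, the cache sits on the per-element function rather than on the function passed to apply.

```python
from functools import lru_cache

import pandas as pd

@lru_cache(maxsize=None)
def fib(n):
    # Hypothetical stand-in for an expensive recursive function on scalars
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def col_fib(col):
    # col is one column (a Series); map the cached scalar function over it
    return col.map(fib)

df = pd.DataFrame({'a': [10, 20, 30]})
out = df.apply(col_fib)
print(out)
print(fib.cache_info())  # hits > 0: repeated inputs come from the cache
```

With this layout, an extra probe call on the first column (as on older pandas) only re-reads cached values.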
