Apply function to each cell in DataFrame
I have a dataframe that may look like this:

A        B        C
foo      bar      foo bar
bar foo  foo      bar

I want to look through every element of each row (or every element of each column) and apply the following function to get the subsequent dataframe:

def foo_bar(x):
    return x.replace('foo', 'wow')

After applying the function, my dataframe will look like this:

A        B        C
wow      bar      wow bar
bar wow  wow      bar

Is there a simple one-liner that can apply a function to each cell?

This is a simplistic example so there may be an easier way to execute this specific example other than applying a function, but what I am really asking about is how to apply a function in every cell within a dataframe.

Dira answered 13/9, 2016 at 17:39 Comment(0)
234

You can use applymap(), which is concise for your case (from pandas 2.1.0 onward the same method is named map()).

df.applymap(foo_bar)

#          A    B        C
# 0      wow  bar  wow bar
# 1  bar wow  wow      bar
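
For readers who want something that runs regardless of whether their pandas version calls the method applymap or map, here is a minimal sketch of the same element-wise idea. It assumes the sample frame from the question and relies only on DataFrame.apply and Series.map; the variable name result is just for illustration.

```python
import pandas as pd


def foo_bar(x):
    return x.replace('foo', 'wow')


# Sample frame from the question.
df = pd.DataFrame({
    'A': ['foo', 'bar foo'],
    'B': ['bar', 'foo'],
    'C': ['foo bar', 'bar'],
})

# DataFrame.apply hands each column to the lambda as a Series;
# Series.map then calls foo_bar on every value in that column.
result = df.apply(lambda col: col.map(foo_bar))
print(result)
```

This returns a new frame and leaves df unchanged, which is also how applymap behaves.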

Another option is to wrap your function with np.vectorize and then pass it to the apply method, which calls it once per column:

import numpy as np
df.apply(np.vectorize(foo_bar))
#          A    B        C
# 0      wow  bar  wow bar
# 1  bar wow  wow      bar
Viscera answered 13/9, 2016 at 17:42 Comment(1)
3

I guess you could use np.vectorize:

>>> import numpy as np
>>> df[:] = np.vectorize(foo_bar)(df)  # overwrites df in place
>>> df
         A    B        C
0      wow  bar  wow bar
1  bar wow  wow      bar

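Don't assume this is quicker just because it uses numpy: np.vectorize is essentially a Python-level loop provided for convenience, so time it on your own data before relying on it.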
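
If overwriting df is not wanted, the same idea can be sketched without mutation. This is only an illustration under a few assumptions: the sample frame from the question, otypes=[object] chosen so the results stay ordinary Python strings, and a new frame rebuilt by hand with the original index and columns.

```python
import numpy as np
import pandas as pd


def foo_bar(x):
    return x.replace('foo', 'wow')


# Sample frame from the question.
df = pd.DataFrame({
    'A': ['foo', 'bar foo'],
    'B': ['bar', 'foo'],
    'C': ['foo bar', 'bar'],
})

# otypes=[object] asks np.vectorize to return an object array of Python strings.
vec_foo_bar = np.vectorize(foo_bar, otypes=[object])

# Apply to the underlying 2-D array, then rebuild a frame with the same labels.
result = pd.DataFrame(vec_foo_bar(df.to_numpy()), index=df.index, columns=df.columns)
print(result)
```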

Nasa answered 14/10, 2021 at 9:58 Comment(0)
2

Note: The original version of this answer referred to applymap but since pandas 2.1.0, that method has been renamed to map, so the answer was edited to reflect that change. Every point made in this answer applies to applymap as well.

Expanding on Psidom's answer, if the function you define accepts additional arguments, you can pass them along as keyword arguments. For example, to make the replacement string in the OP's foo_bar() configurable:

def foo_bar(x, bar=''):
    return x.replace('foo', bar)

df.map(foo_bar, bar='haha')
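
A self-contained version of the call above might look like the following sketch. The sample frame is taken from the question; the hasattr check and the name elementwise are my own additions so the snippet also runs on pandas versions older than 2.1.0, where only applymap exists (this assumes an applymap recent enough to forward keyword arguments).

```python
import pandas as pd


def foo_bar(x, bar=''):
    return x.replace('foo', bar)


# Sample frame from the question.
df = pd.DataFrame({
    'A': ['foo', 'bar foo'],
    'B': ['bar', 'foo'],
    'C': ['foo bar', 'bar'],
})

# Pick whichever element-wise method this pandas version provides.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

# Extra keyword arguments are forwarded to foo_bar for every cell.
result = elementwise(foo_bar, bar='haha')
print(result)
```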

One common case where map is especially useful is string operations (as in the OP). Since string operations in pandas are not optimized, a plain loop often performs better than the vectorized string methods, especially when several operations are chained. For example, for the following simple task of conditionally replacing values in a frame with 1 million rows, map is about 2 times faster than the equivalent vectorized pandas code. The following test was run using pandas 2.2.1, numpy 1.26.3 on Python 3.11.5.

import numpy as np
import pandas as pd

def foo_bar(x):
    return x.replace('foo', 'wow') if len(x)>3 else x + ' this'

df = pd.DataFrame([['foo', 'bar', 'foo bar'], ['bar foo', 'foo', 'bar']]*500000, columns=[*'ABC'])

%timeit df.map(foo_bar)
# 1.25 s ± 109 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df.apply(lambda x: np.where(x.str.len()>3, x.str.replace('foo', 'wow'), x + ' this'))
# 2.56 s ± 57.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Healey answered 15/2, 2023 at 21:32 Comment(0)
-1

In current versions of Pandas, applymap is deprecated (since version 2.1.0). You can use map instead:

df.map(foo_bar)
Postliminy answered 22/2 at 23:6 Comment(0)
