Difference between map, applymap and apply methods in Pandas

Can you tell me when to use these vectorization methods with basic examples?

I see that map is a Series method whereas the rest are DataFrame methods. I got confused about apply and applymap methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!

Foggy answered 5/11, 2013 at 20:20. Comments:
Correct me if I am wrong, but I believe these functions are not vectorized methods, as they all involve a loop over the elements they are applied to. (Gunboat)
I can't see a difference here: gist.github.com/MartinThoma/e320cbb937afb4ff766f75988f1c65e6 (Egidius)
Marillion, I provided very reductive and simple examples in my answer below. Hope it helps! (Impaction)
Should I add the DataFrame.pipe() method to the comparison? (Pleader)

Comparing map, applymap and apply: Context Matters

The major differences are:

Definition

  • map is defined on Series only (pandas 2.1 added DataFrame.map, but only as the new name for applymap)
  • applymap is defined on DataFrames only
  • apply is defined on both

Input argument

  • map accepts dict, Series, or callable
  • applymap and apply accept callable only

Behavior

  • map is elementwise for Series
  • applymap is elementwise for DataFrames
  • Series.apply works elementwise; DataFrame.apply passes whole columns (or rows, with axis=1) to the function, so it is suited to more complex operations and aggregation. The behaviour and return value depend on the function.

Use case (the most important difference)

  • map is meant for mapping values from one domain to another, so is optimised for performance, e.g.,

    df['A'].map({1:'a', 2:'b', 3:'c'})
    
  • applymap is good for elementwise transformations across multiple rows/columns, e.g.,

    df[['A', 'B', 'C']].applymap(str.strip)
    
  • apply is for applying any function that cannot be vectorised, e.g.,

    df['sentences'].apply(nltk.sent_tokenize)
    

Also see "When should I (not) want to use pandas apply() in my code?" for a write-up I made a while back on the most appropriate scenarios for using apply. (Note that there aren't many, but there are a few; apply is generally slow.)
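A runnable sketch of the three use cases above, on made-up data. Two assumptions: a naive `text.split(". ")` stands in for `nltk.sent_tokenize` so only pandas is needed, and `elementwise` is a hypothetical helper of mine that calls `DataFrame.map` on pandas 2.1+ and falls back to `applymap` on older versions.

```python
import pandas as pd


def elementwise(frame, func):
    """Hypothetical helper (not a pandas API): apply func to every cell."""
    # pandas >= 2.1 renamed DataFrame.applymap to DataFrame.map
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)


# Made-up toy data
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["  x ", "y  ", " z"],
    "sentences": ["Hi there. Bye.", "One. Two. Three.", "Solo."],
})

# map: translate values from one domain to another with a dict
letters = df["A"].map({1: "a", 2: "b", 3: "c"})

# elementwise transformation over one or more DataFrame columns
stripped = elementwise(df[["B"]], str.strip)

# apply: arbitrary Python function per element (naive stand-in for a tokenizer)
sentences = df["sentences"].apply(lambda text: text.split(". "))

print(letters.tolist(), stripped["B"].tolist(), sentences.tolist())
```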


Summarising

                       map                           applymap         apply
Defined on Series?     Yes                           No               Yes
Defined on DataFrame?  No                            Yes              Yes
Argument               dict, Series, or callable[1]  callable[2]      callable
Elementwise?           Yes                           Yes              Series.apply only
Aggregation?           No                            No               Yes
Use case               Transformation/mapping[3]     Transformation   More complex functions
Returns                Series                        DataFrame        scalar, Series, or DataFrame[4]

Footnotes

  1. When passed a dictionary/Series, map looks up each element among the keys (for a Series, the index) of that dictionary/Series. Elements with no matching key become NaN in the output.

  2. applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.

  3. map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.

  4. Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.
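Footnotes 1 and 4 can be checked with a small sketch on arbitrary toy data: a value missing from the dict becomes NaN, and DataFrame.apply returns a Series when the function reduces each column but a DataFrame when it returns a same-length column.

```python
import pandas as pd

s = pd.Series([1, 2, 4])
# 4 is not a key in the dict, so it maps to NaN
mapped = s.map({1: "one", 2: "two"})

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
col_sums = df.apply(lambda col: col.sum())  # one value per column -> Series
doubled = df.apply(lambda col: col * 2)     # same-length column back -> DataFrame

print(mapped)
print(col_sums)
print(doubled)
```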

Falconry answered 25/5, 2019 at 1:26. Comments:
Why are you saying apply is elementwise? Are you considering columns and rows as the elements of a dataframe? (Ardisardisj)
pandas 2.1.0 (2023-08-30) renamed df.applymap() to df.map(). The former still works for the time being, but logs a FutureWarning recommending the more sensible name. (Charolettecharon)

apply works on a row / column basis of a DataFrame
applymap works element-wise on a DataFrame
map works element-wise on a Series


Straight from Wes McKinney's Python for Data Analysis book, p. 132 (I highly recommend this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:

In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [117]: frame
Out[117]: 
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548

In [118]: f = lambda x: x.max() - x.min()

In [119]: frame.apply(f)
Out[119]: 
b    1.133201
d    1.965980
e    2.829781
dtype: float64

Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
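A deterministic sketch of the same idea (not from the book; the values are made up so the results can be checked): `axis=1` hands each row to the function instead of each column, and, as the sentence above says, built-in reductions make `apply` unnecessary for this particular function.

```python
import pandas as pd

# Small fixed frame instead of random numbers
frame = pd.DataFrame(
    [[1.0, 5.0, 2.0], [4.0, 0.5, 3.0]],
    columns=list("bde"),
    index=["Utah", "Ohio"],
)

f = lambda x: x.max() - x.min()

per_column = frame.apply(f)           # default axis=0: f receives each column
per_row = frame.apply(f, axis=1)      # axis=1: f receives each row
vectorised = frame.max() - frame.min()  # built-in reductions, no apply needed

print(per_column)
print(per_row)
```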

Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:

In [120]: format = lambda x: '%.2f' % x

In [121]: frame.applymap(format)
Out[121]: 
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31

The reason for the name applymap is that Series has a map method for applying an element-wise function:

In [122]: frame['e'].map(format)
Out[122]: 
Utah       1.28
Ohio      -1.55
Texas      0.20
Oregon    -0.31
Name: e, dtype: object
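A runnable variant of the formatting example, using values copied from the frame above. It assumes pandas 2.1 or later might be installed, where `applymap` is deprecated in favour of `DataFrame.map`, so it uses whichever exists; it also avoids shadowing the built-in `format`.

```python
import pandas as pd

# A two-row excerpt of the book's frame
frame = pd.DataFrame(
    {"b": [-0.029638, 0.647747], "e": [1.280300, -1.549481]},
    index=["Utah", "Ohio"],
)

fmt = lambda x: f"{x:.2f}"  # named fmt so the built-in format() stays usable

# DataFrame.applymap became DataFrame.map in pandas 2.1; use whichever exists
if hasattr(pd.DataFrame, "map"):
    formatted = frame.map(fmt)
else:
    formatted = frame.applymap(fmt)

e_col = frame["e"].map(fmt)  # Series.map: same function on a single column
print(formatted)
print(e_col)
```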
Abramson answered 5/11, 2013 at 20:40. Comments:
Strictly speaking, applymap is implemented internally via apply with a small wrapper around the passed function (roughly speaking, replacing func with lambda x: [func(y) for y in x] and applying it column-wise). (Rufus)
Thanks for the explanation. Since map and applymap both work element-wise, I would expect a single method (either map or applymap) which would work both for a Series and a DataFrame. Probably there are other design considerations, and Wes McKinney decided to come up with two different methods. (Foggy)
It's on page 129 in my copy for some reason. There's no label for second edition or anything. (Harrow)
Is there a way to do applymap along with the groupby function in pandas? (Angelangela)
How to apply a function on grouped column-wise data? (Abstract)
I would suggest not using format as a function name (as in example 2), since format is already a built-in function. (Stipulation)
@everestial007 That is the same as just using applymap directly, so what is the point of grouping before an elementwise transformation? (Falconry)
@Abstract use .transform (p. 264). (Danicadanice)
Since the concept of an axis does not exist in a Series object, what's the difference between Series.map and Series.apply? (Horrendous)

Quick Summary

  • DataFrame.apply operates on entire rows or columns at a time.

  • DataFrame.applymap, Series.apply, and Series.map operate on one element at a time.

Series.apply and Series.map are similar and often interchangeable. Some of their slight differences are discussed in osa's answer below.

Inborn answered 11/8, 2016 at 15:20

Adding to the other answers, in a Series there are also map and apply.

apply can make a DataFrame out of a Series; map, however, just puts a Series in every cell of another Series, which is probably not what you want.

In [40]: p=pd.Series([1,2,3])
In [41]: p
Out[41]:
0    1
1    2
2    3
dtype: int64

In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]: 
   0  1
0  1  1
1  2  2
2  3  3

In [43]: p.map(lambda x: pd.Series([x, x]))
Out[43]: 
0    0    1
1    1
dtype: int64
1    0    2
1    2
dtype: int64
2    0    3
1    3
dtype: int64
dtype: object
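A quick sketch to confirm the return types described above. The printed layout can vary between pandas versions, so it checks types and shape rather than the repr.

```python
import pandas as pd

p = pd.Series([1, 2, 3])

# apply stacks the returned Series into the rows of a DataFrame
expanded = p.apply(lambda x: pd.Series([x, x]))

# map stores each returned Series as a single object-dtype cell
nested = p.map(lambda x: pd.Series([x, x]))

print(type(expanded).__name__, expanded.shape)
print(nested.dtype, type(nested.iloc[0]).__name__)
```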

Also if I had a function with side effects, such as "connect to a web server", I'd probably use apply just for the sake of clarity.

series.apply(download_file_for_every_element) 

Map can use not only a function, but also a dictionary or another series. Let's say you want to manipulate permutations.

Take the permutation (two-line notation: each number in the top row maps to the number below it)

1 2 3 4 5
2 1 4 5 3

The square of this permutation is

1 2 3 4 5
1 2 5 3 4

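You can compute it using map, with the permutation written zero-based as [1, 0, 3, 4, 2]. When map receives a Series, it looks up each value in that Series' index, so p.map(p) gives p(p(i)). Not sure if self-application is documented, but it works in 0.15.1.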

In [39]: p=pd.Series([1,0,3,4,2])

In [40]: p.map(p)
Out[40]: 
0    0
1    1
2    4
3    2
4    3
dtype: int64
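A small check of the permutation trick, comparing `p.map(p)` with composing the permutation by hand (zero-based values, as in the example above).

```python
import pandas as pd

# Zero-based one-line notation: position i maps to p[i]
p = pd.Series([1, 0, 3, 4, 2])

# Each value is looked up in p's index, i.e. square[i] = p[p[i]]
square = p.map(p)

# Same composition written out explicitly
by_hand = [int(p[p[i]]) for i in range(len(p))]

print(square.tolist(), by_hand)
```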
Impediment answered 8/12, 2014 at 23:30. Comments:
Also, .apply() lets you pass kwargs into the function while .map() doesn't. (Satan)

@jeremiahbuddha mentioned that apply works on rows/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation:

frame.apply(np.sqrt)
Out[102]: 
               b         d         e
Utah         NaN  1.435159       NaN
Ohio    1.098164  0.510594  0.729748
Texas        NaN  0.456436  0.697337
Oregon  0.359079       NaN       NaN

frame.applymap(np.sqrt)
Out[103]: 
               b         d         e
Utah         NaN  1.435159       NaN
Ohio    1.098164  0.510594  0.729748
Texas        NaN  0.456436  0.697337
Oregon  0.359079       NaN       NaN
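To make the ufunc point in the comment below concrete, here is a sketch on made-up non-negative data (so no NaN appears): `np.sqrt` accepts a whole column, whereas `math.sqrt` only accepts scalars, so with `apply` it has to be mapped over each column's elements explicitly.

```python
import math

import numpy as np
import pandas as pd

# Made-up non-negative data, so sqrt never produces NaN
frame = pd.DataFrame({"b": [1.0, 4.0], "d": [9.0, 16.0]})

# np.sqrt is a ufunc: given a whole column it returns the column of roots
via_apply = frame.apply(np.sqrt)

# math.sqrt only takes scalars, so map it over the elements of each column
via_map = frame.apply(lambda col: col.map(math.sqrt))

# Passing math.sqrt straight to apply hands it a whole Series, which fails
try:
    frame.apply(math.sqrt)
    scalar_only_failed = False
except TypeError:
    scalar_only_failed = True

print(via_apply)
```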
Libbi answered 19/12, 2013 at 17:21. Comments:
Good catch with this. The reason this works in your example is that np.sqrt is a ufunc, i.e. if you give it an array, it will broadcast the sqrt function onto each element of the array. So when apply pushes np.sqrt onto each column, np.sqrt works on each of the elements of the column, and you essentially get the same result as applymap. (Abramson)

Probably the simplest explanation of the difference between apply and applymap:

apply takes the whole column as a parameter and then assigns the result to this column.

applymap takes the separate cell value as a parameter and assigns the result back to this cell.

NB: If the function passed to apply returns a single value, each column is replaced by that value, so you end up with just a row (a Series) instead of a matrix.
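A minimal sketch of both cases on toy data: a reducing function collapses each column to one value (a Series), while a column-preserving function gives the same shape back, matching the cell-by-cell version. The `DataFrame.map`/`applymap` switch is only there to cope with the pandas 2.1 rename.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

col_max = df.apply(lambda col: col.max())  # one value per column -> Series
plus_one = df.apply(lambda col: col + 1)   # whole column in and out -> DataFrame

# Cell-by-cell counterpart (applymap was renamed DataFrame.map in pandas 2.1)
cell_fn = lambda v: v + 1
if hasattr(pd.DataFrame, "map"):
    plus_one_cells = df.map(cell_fn)
else:
    plus_one_cells = df.applymap(cell_fn)

print(col_max)
print(plus_one)
```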

Schell answered 20/5, 2016 at 2:10. Comments:
An example showing each case would have been more helpful. Thanks (Depot)

Just wanted to point this out, as I struggled with it for a bit:

def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x

df.applymap(f)  # returns a new DataFrame, which is discarded here
df.describe()

This does not modify the DataFrame itself; the result has to be reassigned:

df = df.applymap(f)
df.describe()
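For this particular `f`, a sketch of two alternatives on made-up data: keep the elementwise result by assigning it, or use the built-in `DataFrame.clip`, which bounds every value without calling a Python function per cell. Neither call changes `df` in place.

```python
import pandas as pd


def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x


df = pd.DataFrame({"x": [-5, 50, 200000], "y": [10, -1, 99999]})

# Elementwise: returns a new DataFrame, so keep it by assigning
if hasattr(pd.DataFrame, "map"):  # DataFrame.map replaced applymap in pandas 2.1
    df_elementwise = df.map(f)
else:
    df_elementwise = df.applymap(f)

# Vectorised equivalent of f for numeric data
df_clipped = df.clip(lower=0, upper=100000)

print(df_clipped.describe())
```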
Synod answered 26/9, 2015 at 1:30. Comments:
I sometimes have trouble figuring out whether I have to reassign or not after doing something with the df. It's mostly trial and error for me, but I bet there is a logic to how it works (that I am missing out on). (Foggy)
In general, a pandas DataFrame is only modified by reassigning it (df = modified_df) or by setting the inplace=True flag. A DataFrame will also change if you pass it to a function and the function modifies it in place. (Synod)
This is not entirely true, think of .ix or .where etc. Not sure what the full explanation is for when you need to re-assign and when not. (Ricardo)

Based on cs95's answer:

  • map is defined on Series ONLY
  • applymap is defined on DataFrames ONLY
  • apply is defined on BOTH

Some examples:

In [3]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [4]: frame
Out[4]:
            b         d         e
Utah    0.129885 -0.475957 -0.207679
Ohio   -2.978331 -1.015918  0.784675
Texas  -0.256689 -0.226366  2.262588
Oregon  2.605526  1.139105 -0.927518

In [5]: myformat=lambda x: f'{x:.2f}'

In [6]: frame.d.map(myformat)
Out[6]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object

In [7]: frame.d.apply(myformat)
Out[7]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object

In [8]: frame.applymap(myformat)
Out[8]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93

In [9]: frame.apply(lambda x: x.apply(myformat))
Out[9]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93


In [10]: myfunc=lambda x: x**2

In [11]: frame.applymap(myfunc)
Out[11]:
            b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289

In [12]: frame.apply(myfunc)
Out[12]:
            b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289
Madlynmadman answered 5/5, 2020 at 3:56

Just for additional context and intuition, here's an explicit and concrete example of the differences.

Assume you have the following function. It splits values into 'High' and 'Low' according to an arbitrary threshold that you pass as the parameter x.

def label(element, x):
    if element > x:
        return 'High'
    else:
        return 'Low'

In this example, let's assume our DataFrame has one column of random numbers.

[Image: a DataFrame with one column of random numbers]

If you try to pass the label function to map:

df['ColumnName'].map(label, x = 0.8)

you will get the following error:

TypeError: map() got an unexpected keyword argument 'x'

Now take the same function and use apply, and you'll see that it works:

df['ColumnName'].apply(label, x=0.8)

Series.apply() forwards additional keyword arguments to the function it calls on each element, while Series.map() does not accept them and raises an error.

Now, if you want to apply the same function to several columns of your DataFrame at once, use DataFrame.applymap(). Since label needs its threshold, bind x with a lambda:

df[['ColumnName','ColumnName2','ColumnName3','ColumnName4']].applymap(lambda v: label(v, x=0.8))
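A runnable sketch of the calls above on a made-up two-column frame. `functools.partial` is one way (my choice, not the only one) to bind `x` for `map`, which cannot forward keyword arguments; the `hasattr` check handles the pandas 2.1 rename of `applymap` to `DataFrame.map`.

```python
import functools

import pandas as pd


def label(element, x):
    """Return 'High' if element exceeds the threshold x, else 'Low'."""
    return "High" if element > x else "Low"


# Made-up data standing in for the ColumnName... columns
df = pd.DataFrame({"ColumnName": [0.1, 0.9], "ColumnName2": [0.95, 0.3]})

# Series.apply forwards extra keyword arguments to the function
s_apply = df["ColumnName"].apply(label, x=0.8)

# Series.map has no **kwargs, so bind x beforehand
bound = functools.partial(label, x=0.8)
s_map = df["ColumnName"].map(bound)

# Elementwise over several columns (DataFrame.map in pandas >= 2.1)
if hasattr(pd.DataFrame, "map"):
    frame_labels = df.map(bound)
else:
    frame_labels = df.applymap(bound)

print(frame_labels)
```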

Lastly, you can also use the apply() method on a DataFrame, but DataFrame.apply() has different capabilities. Instead of applying functions element-wise, df.apply() applies functions along an axis, either column-wise or row-wise. A function meant for df.apply() should therefore accept a Series, most commonly a column.

Here is an example:

df.apply(pd.Series.value_counts)

When we applied the value_counts function to the DataFrame, it calculated the value counts for all the columns.

Notice, and this is very important, that df.apply() could transform multiple columns here only because value_counts operates on a Series. If we tried to use df.apply() to apply a function that works element-wise to multiple columns, we'd get an error:

For example:

def label(element):
    if element > 1:
        return 'High'
    else:
        return 'Low'

df[['ColumnName','ColumnName2','ColumnName3','ColumnName4']].apply(label)

This results in the following error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index Economy')
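A sketch reproducing the error on toy data, plus one vectorised alternative, `np.where`, which evaluates the condition on the whole frame at once. The error text differs between pandas versions, so only the exception type is checked.

```python
import numpy as np
import pandas as pd


def label(element):
    if element > 1:
        return "High"
    else:
        return "Low"


df = pd.DataFrame({"a": [0.5, 2.0], "b": [3.0, 1.0]})

# df.apply passes whole columns, so `element > 1` is a boolean Series
# and `if` cannot decide whether a Series is true
try:
    df.apply(label)
    raised = False
except ValueError:
    raised = True

# Vectorised alternative: evaluate the condition on the whole frame at once
labels = pd.DataFrame(np.where(df > 1, "High", "Low"),
                      index=df.index, columns=df.columns)
print(labels)
```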

In general, we should only use the apply() method when a vectorized function does not exist. Recall that pandas uses vectorization, the process of applying operations to whole Series at once, to optimize performance. When we use the apply() method, we're actually looping in Python over rows (or columns, or elements), so a vectorized method can perform an equivalent task faster than apply().

[Image: apply, applymap, map summary]

Here are some examples of vectorized functions that already exist that you do NOT want to recreate using any type of apply/map methods:

  1. Series.str.split(): splits each string in the Series.
  2. Series.str.strip(): strips whitespace from each string in the Series.
  3. Series.str.lower(): converts strings in the Series to lowercase.
  4. Series.str.upper(): converts strings in the Series to uppercase.
  5. Series.str.get(): retrieves the ith element of each element in the Series.
  6. Series.str.replace(): replaces a regex or string in the Series with another string.
  7. Series.str.cat(): concatenates strings in a Series.
  8. Series.str.extract(): extracts substrings from the Series matching a regex pattern.
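A small sketch comparing a `.str` chain with an equivalent `apply` on made-up data. The `.str` methods skip missing values for you; how much faster they are than `apply` depends on the pandas version and the data, so treat any speedup as something to measure.

```python
import pandas as pd

s = pd.Series(["  Apple ", "banana", None])

# .str methods handle the missing value without extra code
vectorised = s.str.strip().str.lower()

# The apply version has to guard against non-strings itself
looped = s.apply(lambda v: v.strip().lower() if isinstance(v, str) else v)

print(vectorised.tolist())
print(looped.tolist())
```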
Impaction answered 23/12, 2020 at 19:26

My understanding:

From the function point of view:

If the function needs to compare values within a column/row, use apply.

e.g.: lambda x: x.max()-x.mean().

If the function is to be applied to each element:

1. If you have selected a single column/row (a Series), use apply.

2. If applying it to the entire DataFrame, use applymap.

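import pandas as pd

# Sample data (made up) so the snippet runs on its own
df2 = pd.DataFrame({'name': ['Ann', 'Bob'], 'age': [16, 30]})

majority = lambda x: x > 17
df2['legal_drinker'] = df2['age'].apply(majority)

def times10(x):
    # Multiply integers only; strings and booleans pass through unchanged
    if type(x) is int:
        x *= 10
    return x
df2.applymap(times10)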
Laylalayman answered 8/6, 2018 at 1:29. Comments:
Please provide df2 also for better clarity so that we can test your code. (Tremble)

FOMO:

The following example shows apply and applymap applied to a DataFrame.

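map is something you apply to a Series only; before pandas 2.1 (which renamed applymap to DataFrame.map) you cannot call map on a DataFrame.

The thing to remember is that apply can do anything applymap can, as long as the function also accepts a whole column (as NumPy ufuncs such as np.log do), and apply has extra options.

The extra options are axis and result_type, where result_type only takes effect when axis=1 (i.e. when the function is applied to each row).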

import numpy as np
from pandas import DataFrame

df = DataFrame(1, columns=list('abc'),
                  index=list('1234'))
print(df)

f = lambda x: np.log(x)
print(df.applymap(f)) # apply to the whole dataframe
print(np.log(df)) # applied to the whole dataframe
print(df.applymap(np.sum)) # np.sum of a single scalar is that scalar: nothing is reduced

# apply can take different options (vs. applymap cannot)
print(df.apply(f)) # same as applymap, because np.log also accepts a whole column
print(df.apply(sum, axis=1))  # reducing example: one sum per row
print(df.apply(np.log, axis=1)) # not reduced: returns a DataFrame
print(df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand')) # expand list results into columns

As a side note, the Series map method should not be confused with Python's built-in map function.

The first maps the values of a Series and returns a Series; the second applies a function to every item of an iterable and returns an iterator.
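A minimal sketch of that distinction on toy data: iterating a Series yields its values, so the built-in `map` loses the index, while `Series.map` keeps it.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["x", "y", "z"])

# Built-in map: lazy iterator over the values; the index is not carried along
builtin_result = list(map(lambda v: v * 10, s))

# Series.map: returns a Series aligned with the original index
pandas_result = s.map(lambda v: v * 10)

print(builtin_result)
print(pandas_result)
```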


Lastly, don't confuse the DataFrame apply method with the groupby apply method.
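To illustrate the difference on toy data: `DataFrame.apply` calls the function once per column, whereas `groupby(...).apply` calls it once per group. Selecting the value column before `apply` is a choice made here to keep the example simple.

```python
import pandas as pd

df = pd.DataFrame({"k": ["a", "a", "b"], "v": [1, 5, 2]})

# DataFrame.apply: one call per column
col_range = df[["v"]].apply(lambda col: col.max() - col.min())

# groupby apply: one call per group of rows
group_range = df.groupby("k")["v"].apply(lambda s: s.max() - s.min())

print(col_range)
print(group_range)
```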

Eschatology answered 7/5, 2019 at 17:20

Observation: pandas 2.1.0 deprecated DataFrame.applymap and renamed it DataFrame.map: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.map.html#pandas.DataFrame.map

Puffball answered 16/11, 2023 at 8:3
