How to deal with SettingWithCopyWarning in Pandas
D

25

1414

Background

I just upgraded pandas from 0.11 to 0.13.0rc1. Now the application is emitting many new warnings. One of them looks like this:

E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE

What exactly does it mean? Do I need to change something?

How should I suppress the warning if I insist on using quote_df['TVol'] = quote_df['TVol']/TVOL_SCALE?

The function that gives warnings

def _decode_stock_quote(list_of_150_stk_str):
    """decode the webpage and return dataframe"""

    from cStringIO import StringIO

    str_of_all = "".join(list_of_150_stk_str)

    quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
    quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
    quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]
    quote_df['TClose'] = quote_df['TPrice']
    quote_df['RT']     = 100 * (quote_df['TPrice']/quote_df['TPCLOSE'] - 1)
    quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE
    quote_df['TAmt']   = quote_df['TAmt']/TAMT_SCALE
    quote_df['STK_ID'] = quote_df['STK'].str.slice(13,19)
    quote_df['STK_Name'] = quote_df['STK'].str.slice(21,30)#.decode('gb2312')
    quote_df['TDate']  = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])
    
    return quote_df

More warning messages

E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE
E:\FinReporter\FM_EXT.py:450: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TAmt']   = quote_df['TAmt']/TAMT_SCALE
E:\FinReporter\FM_EXT.py:453: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TDate']  = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])
Dacia answered 17/12, 2013 at 3:48 Comment(7)
Here's a context manager to temporarily set the warning level gist.github.com/notbanker/2be3ed34539c86e22ffdd88fd95ad8bcWhiny
pandas.pydata.org/pandas-docs/stable/… official document explain in detailShoshanashoshanna
@leonprou df.set_value has been deprecated. Pandas now recommends to use .at[] or .iat[] instead. docs here pandas.pydata.org/pandas-docs/stable/generated/…Thruster
Using df.loc[:, foo] avoids SettingWithCopyWarning, whereas df[foo] causes SettingWithCopyWarning.Assortment
Does this answer your question? Set value for particular cell in pandas DataFrame using indexElectrodynamometer
@Assortment df.loc[:, foo] is also giving me SettingWithCopyWarning: asking me to use Try using .loc[row_indexer,col_indexer] = value instead I don't really have any row_indexer since I want to carry out this assignment for all rows. How do I do that?Dignity
Looking up some other related solutions I did a .copy() first to create my dataframe. Then it worked even with df[foo] = ...Dignity
O
1651

The SettingWithCopyWarning was created to flag potentially confusing "chained" assignments, such as the following, which does not always work as expected, particularly when the first selection returns a copy. [see GH5390 and GH5597 for background discussion.]

df[df['A'] > 2]['B'] = new_val  # new_val not set in df

The warning offers a suggestion to rewrite as follows:

df.loc[df['A'] > 2, 'B'] = new_val
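
As a minimal, self-contained sketch of that suggestion (the frame and the value 99 are made up for illustration), the single-step .loc form writes straight into df:

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"A": [1, 3, 5], "B": [0, 0, 0]})
new_val = 99

# One indexing call selects rows and column together, so the
# assignment is applied to df itself rather than to a temporary.
df.loc[df["A"] > 2, "B"] = new_val

print(df["B"].tolist())
```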

However, this doesn't fit your usage, which is equivalent to:

df = df[df['A'] > 2]
df['B'] = new_val

While it's clear that you don't care about writes making it back to the original frame (since you are overwriting the reference to it), unfortunately this pattern cannot be differentiated from the first chained assignment example. Hence the (false positive) warning. The potential for false positives is addressed in the docs on indexing, if you'd like to read further. You can safely disable this new warning with the following assignment.

import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'
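
If you would rather keep the warning enabled, an alternative raised in the comments is to state the intent explicitly by calling .copy() on the filtered frame. A small sketch with made-up data (the names original and df are placeholders):

```python
import pandas as pd

original = pd.DataFrame({"A": [1, 3, 5], "B": [0, 0, 0]})  # hypothetical data

# .copy() makes the filtered frame an independent object, so pandas
# has no reason to suspect a chained assignment later on.
df = original[original["A"] > 2].copy()
df["B"] = 99

print(df["B"].tolist())
print(original["B"].tolist())
```

The trade-off, also noted in the comments, is one extra copy of the filtered rows.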

Occidentalize answered 17/12, 2013 at 6:20 Comment(6)
I was using a slice of a dataframe, doing modifications in that slice and was getting this error. I created this slice by doing a .copy() on the original dataframe, and it worked.Sponson
How should I deal with df = df[df['A'].notnull()]?Stonehenge
@Occidentalize please consider adding @RishabhAgrahari's comment to your answer. It's a way better idea to tell pandas you meant to manipulate a copy of the slice by calling .copy() on it than outright disabling the warning for all cases in your runtime.Bigler
In my case, the warning was not set on the line to modify. I filtered data on the first line and then created a calculated field on a second line. The warning is displayed for the second line and in fact the solution is to modify the first line by adding a call to .copy(). This is misleading and sometimes the two lines are separated by quite a big amount of code.Katey
Using .copy() on the slice is a bad workaround, because it introduces an extra, unneeded copy. A subsetting operation like df[df.a > 2] already creates a new dataframe, so there's no algorithmic need for an additional copy.Peltier
Summarizing: if you just want to remove the warning use: pd.options.mode.chained_assignment = None. This isn't a long term fix, but it will get by.Allergic
S
845

How to deal with SettingWithCopyWarning and ChainedAssignmentError in Pandas?

This post is meant for readers who,

  1. Would like to understand what this warning means
  2. Would like to understand different ways of suppressing this warning
  3. Would like to understand how to improve their code and follow good practices to avoid this warning in the future.

Setup

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (3, 5)), columns=list('ABCDE'))
df
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1

What is the SettingWithCopyWarning?

To know how to deal with this warning, it is important to understand what it means and why it is raised in the first place.

When filtering DataFrames, it is possible to slice/index a frame and get back either a view or a copy, depending on the internal layout and various implementation details. A "view" is, as the term suggests, a view into the original data, so modifying the view may modify the original object. On the other hand, a "copy" is a replication of data from the original, and modifying the copy has no effect on the original.

As mentioned by other answers, the SettingWithCopyWarning was created to flag "chained assignment" operations. Consider df in the setup above. Suppose you would like to select all values in column "B" where the values in column "A" are > 5. Pandas allows you to do this in different ways, some more correct than others. For example,

df[df.A > 5]['B']

1    3
2    6
Name: B, dtype: int64

And,

df.loc[df.A > 5, 'B']

1    3
2    6
Name: B, dtype: int64

These return the same result, so if you are only reading these values, it makes no difference. So, what is the issue? The problem with chained assignment is that it is generally difficult to predict whether a view or a copy is returned, so it becomes an issue mainly when you attempt to assign values back. To build on the earlier example, consider how this code is executed by the interpreter:

df.loc[df.A > 5, 'B'] = 4
# becomes
df.__setitem__((df.A > 5, 'B'), 4)

This is a single __setitem__ call on df. On the other hand, consider this code:

df[df.A > 5]['B'] = 4
# becomes
df.__getitem__(df.A > 5).__setitem__('B', 4)

Now, depending on whether __getitem__ returned a view or a copy, the __setitem__ operation may not work.
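
The two-step form can be spelled out with a temporary variable. The sketch below uses two columns of the setup frame (written out literally so the snippet is self-contained), and warnings are silenced only to keep the output clean. With a boolean mask the first step produces a new object, so the write never reaches df:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"A": [5, 9, 7], "B": [0, 3, 6]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning for the demo
    tmp = df[df["A"] > 5]   # step 1: __getitem__ returns a new object
    tmp["B"] = 4            # step 2: __setitem__ acts on tmp only

after_chained = df["B"].tolist()   # df is untouched

df.loc[df["A"] > 5, "B"] = 4       # a single __setitem__ on df
after_loc = df["B"].tolist()

print(after_chained, after_loc)
```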

In general, you should use loc for label-based assignment, and iloc for integer/positional based assignment, as the spec guarantees that they always operate on the original. Additionally, for setting a single cell, you should use at and iat.

More can be found in the documentation.

Note All boolean indexing operations done with loc can also be done with iloc. The only difference is that iloc expects either integers/positions for index or a numpy array of boolean values, and integer/position indexes for the columns.

For example,

df.loc[df.A > 5, 'B'] = 4

Can be written as

df.iloc[(df.A > 5).values, 1] = 4

And,

df.loc[1, 'A'] = 100

Can be written as

df.iloc[1, 0] = 100

And so on.
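
A quick way to convince yourself that the label-based and position-based spellings agree is to apply each to its own copy of the frame and compare. This is only a sanity-check sketch using the setup data; the names by_label and by_position are made up:

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 0, 3, 3, 7], [9, 3, 5, 2, 4], [7, 6, 8, 8, 1]],
    columns=list("ABCDE"),
)

by_label = df.copy()
by_label.loc[by_label["A"] > 5, "B"] = 4

by_position = df.copy()
# iloc needs a plain boolean array, not a boolean Series
mask = (by_position["A"] > 5).to_numpy()
by_position.iloc[mask, 1] = 4

same = by_label.equals(by_position)
print(same)
```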

From pandas 2.0 onward, you can enable Copy-on-Write optimizations to save memory and avoid copying data until it is written to (where possible).

This can be enabled by

pd.options.mode.copy_on_write = True

After this, attempts to make chained assignments will result in

ChainedAssignmentError: A value is trying to be set on a copy of a DataFrame or Series through chained assignment.
When using the Copy-on-Write mode, such chained assignment never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy.

Try using '.loc[row_indexer, col_indexer] = value' instead, to perform the assignment in a single step.

The error is raised in a similar setting to the SettingWithCopyWarning.


Just tell me how to suppress the warning!

Consider a simple operation on the "A" column of df. Selecting "A" and dividing by 2 will raise the warning, but the operation will work.

df2 = df[['A']]
df2['A'] /= 2
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/__main__.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

df2
     A
0  2.5
1  4.5
2  3.5

There are a couple ways of directly silencing this warning:

  1. (recommended) Use loc to slice subsets:

     df2 = df.loc[:, ['A']]
     df2['A'] /= 2     # Does not raise
    
  2. Change pd.options.mode.chained_assignment. It can be set to None, "warn", or "raise". "warn" is the default. None will suppress the warning entirely, and "raise" will throw a SettingWithCopyError, preventing the operation from going through.

     pd.options.mode.chained_assignment = None
     df2['A'] /= 2
    
  3. Make a deepcopy

     df2 = df[['A']].copy(deep=True)
     df2['A'] /= 2
    

@Peter Cotton, in the comments, came up with a nice way of non-intrusively changing the mode (modified from this gist) using a context manager, so that the mode is set only for as long as it is required and then reset to its original state when finished.

class ChainedAssignent:
    def __init__(self, chained=None):
        acceptable = [None, 'warn', 'raise']
        assert chained in acceptable, "chained must be in " + str(acceptable)
        self.swcw = chained

    def __enter__(self):
        self.saved_swcw = pd.options.mode.chained_assignment
        pd.options.mode.chained_assignment = self.swcw
        return self

    def __exit__(self, *args):
        pd.options.mode.chained_assignment = self.saved_swcw

The usage is as follows:

# Some code here
with ChainedAssignent():
    df2['A'] /= 2
# More code follows

Or, to raise the exception

with ChainedAssignent(chained='raise'):
    df2['A'] /= 2

SettingWithCopyError:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

The "XY Problem": What am I doing wrong?

A lot of the time, users look for ways of suppressing this warning without fully understanding why it was raised in the first place. This is a good example of an XY problem, where users attempt to solve a problem "Y" that is actually a symptom of a deeper rooted problem "X". The questions below are based on common situations in which this warning appears, each followed by a solution.

Question 1 I have a DataFrame

df
       A  B  C  D  E
    0  5  0  3  3  7
    1  9  3  5  2  4
    2  7  6  8  8  1

I want to assign values in col "A" > 5 to 1000. My expected output is

      A  B  C  D  E
0     5  0  3  3  7
1  1000  3  5  2  4
2  1000  6  8  8  1

Wrong way to do this:

df.A[df.A > 5] = 1000         # works, because df.A returns a view
df[df.A > 5]['A'] = 1000      # does not work
df.loc[df.A > 5]['A'] = 1000   # does not work

Right way using loc:

df.loc[df.A > 5, 'A'] = 1000

Question 2¹ I am trying to set the value in cell (1, 'D') to 12345. My expected output is

   A  B  C      D  E
0  5  0  3      3  7
1  9  3  5  12345  4
2  7  6  8      8  1

I have tried different ways of accessing this cell, such as df['D'][1]. What is the best way to do this?

1. This question isn't specifically related to the warning, but it is good to understand how to do this particular operation correctly so as to avoid situations where the warning could potentially arise in future.

You can use any of the following methods to do this.

df.loc[1, 'D'] = 12345
df.iloc[1, 3] = 12345
df.at[1, 'D'] = 12345
df.iat[1, 3] = 12345

Question 3 I am trying to subset values based on some condition. I have a DataFrame

   A  B  C  D  E
1  9  3  5  2  4
2  7  6  8  8  1

I would like to assign values in "D" to 123 such that "C" == 5. I tried

df2.loc[df2.C == 5, 'D'] = 123

Which seems fine but I am still getting the SettingWithCopyWarning! How do I fix this?

This is actually probably because of code higher up in your pipeline. Did you create df2 from something larger, like

df2 = df[df.A > 5]

? In this case, pandas remembers that df2 was derived from df by boolean indexing, so a later assignment on df2 looks like a chained assignment even though it goes through loc. What you'd need to do is assign df2 to an explicit copy:

df2 = df[df.A > 5].copy()
# Or,
# df2 = df.loc[df.A > 5, :]

Question 4 I'm trying to drop column "C" in-place from

   A  B  C  D  E
1  9  3  5  2  4
2  7  6  8  8  1

But using

df2.drop('C', axis=1, inplace=True)

Throws SettingWithCopyWarning. Why is this happening?

This is because df2 must have been derived from another frame by some slicing operation, such as

df2 = df[df.A > 5]

The solution here is to either make a copy() of df, or use loc, as before.
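
Another option, not mentioned above and offered here only as a sketch, is to skip inplace=True and rebind the name to the result of drop, which is a new DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 0, 3, 3, 7], [9, 3, 5, 2, 4], [7, 6, 8, 8, 1]],
    columns=list("ABCDE"),
)

df2 = df[df["A"] > 5]
# drop() without inplace returns a new frame; df2 is simply rebound to it
df2 = df2.drop(columns="C")

print(list(df2.columns))
```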

Schlegel answered 28/12, 2018 at 7:18 Comment(14)
P.S.: Let me know if your situation is not covered under section 3's list of questions. I will amend my post.Schlegel
I think it would be helpful for Question 2 to link to a question addressing the differences between loc, iloc, at, and iat. You are probably more aware of such a question than I am, but I'm happy to seek one if it would be helpful.Corporation
This question address the case where you want to use loc and iloc at the same time, iloc for rows and loc for columnsCorporation
@cs95: Could you add an XY description around the case where you are trying to create a new column based on simple math operations on an existing one. As in df['new_col'] = df['old_col']/2. Where 'new_col' does not yet exist. ThxGreatgrandaunt
@BryanP unless I'm mistaken that should more or less be covered under the "Just tell me how to suppress the warning!" section.Schlegel
@cs95: Thanks. making a copy when creating the data frame worked (in my case the structure was simple enough that .copy() was enough. However, this doesn't seem like just suppressing the warning. I had originally skipped that section since my goal was to fix the problem rather than to tell Pandas to ignore it. It seems using copy() to make the original dataframe its own thing (rather than a view and assuming it is derived from other places) seems to actually fix the underlying problem, right? Not just suppress the warning?Greatgrandaunt
is there a way to do boolean indexing that explicitly returns a new object, without calling .copy() ? Because I want to make the new column only after slimming down the dataframeBilliards
@CiprianTomoiagă loc always return a copy when slicing data.Schlegel
@cs95 - thanks. Nearly everyone focusses on your questions 1 and 2. But, I had exactly the subset problem in question 3. I discovered that after the subnet is created, _is_copy is no longer None which lead me to using .copy().Thereupon
Thank a lot! I always use .loc in my code because I was aware of the "chained assignment" problem, but I was still getting nearly random annoying SettingWithCopyWarning now and then. It turns out that my problem is what you cover in your "Question 3". This took me months to realise, so perhaps it should be pointed out more clearly in the documentation... anyway thanks a lot again!Synagogue
Well, it seems kind of random that the "Question 3" throws "SettingWithCopyWarning" because it is a different problem (here you might be surprised that you are changing df, in other cases you might be surprised that your changes get lost). This warning is very annoying.Medievalism
@cs95 what about df["widget"].replace("foo", "bar", inplace=True) ?Hiddenite
What a great explanation. Comments have many references to "Question 3", but it's "Answer 3", that solved my problem. Specifically, "This is actually probably because of code higher up in your pipeline. Did you create df2 from something larger, like df2 = df[df.A > 5]". Doh! That was exactly it. Thanks.Hinshelwood
For pandas Series (that is, not a DataFrame), with pandas 1.5.3, I get this warning which actually now is an error, when trying to use either loc or iloc. All I want to do is add a row in-place to a pandas Series, and it seems impossible without using concat of old Series and new row. Btw, despite an error is thrown, the value is being set, which is weird. Any help is appreciatedMetz
M
176

In general the point of the SettingWithCopyWarning is to show users (and especially new users) that they may be operating on a copy and not the original as they think. There are false positives (IOW if you know what you are doing it could be ok). One possibility is simply to turn off the warning (which is set to warn by default), as @Garrett suggests.

Here is another option:

In [1]: df = DataFrame(np.random.randn(5, 2), columns=list('AB'))

In [2]: dfa = df.ix[:, [1, 0]]

In [3]: dfa.is_copy
Out[3]: True

In [4]: dfa['A'] /= 2
/usr/local/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  #!/usr/local/bin/python

You can set the is_copy flag to False, which will effectively turn off the check, for that object:

In [5]: dfa.is_copy = False

In [6]: dfa['A'] /= 2

If you explicitly copy then no further warning will happen:

In [7]: dfa = df.ix[:, [1, 0]].copy()

In [8]: dfa['A'] /= 2

The code the OP is showing above, while legitimate, and probably something I do as well, is technically a case for this warning, and not a false positive. Another way to not have the warning would be to do the selection operation via reindex, e.g.

quote_df = quote_df.reindex(columns=['STK', ...])

Or,

quote_df = quote_df.reindex(['STK', ...], axis=1)  # v.0.21
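
A small sketch of the reindex approach, with hypothetical column names: reindex builds a new frame containing the requested columns in the requested order, and the later in-place division should not be flagged.

```python
import pandas as pd

df = pd.DataFrame({"A": [2.0, 4.0], "B": [1.0, 3.0], "C": [0.0, 0.0]})

dfa = df.reindex(columns=["B", "A"])  # select and reorder in one step
dfa["A"] /= 2

print(dfa)
```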
Misreport answered 17/12, 2013 at 20:49 Comment(1)
I think it's an understatement to say that there are false positives. I don't think I've ever had this warning help me, and the number of times I've had it clog up my output is insane. It's also bad programming practice: if you start ignoring the warnings in your output because you know they are pure rubbish, you can start to miss real problems. It's also annoying to have to turn off the same warnings all the time.Designation
P
60

Here I answer the question directly. How can we deal with it?

Make a .copy(deep=False) after you slice. See pandas.DataFrame.copy.

Wait, doesn't a slice return a copy? After all, this is what the warning message is attempting to say? Read the long answer:

import pandas as pd
df = pd.DataFrame({'x':[1,2,3]})

This gives a warning:

df0 = df[df.x>2]
df0['foo'] = 'bar'

This does not:

df1 = df[df.x>2].copy(deep=False)
df1['foo'] = 'bar'

Both df0 and df1 are DataFrame objects, but something about them is different that enables pandas to print the warning. Let's find out what it is.

import inspect
slice= df[df.x>2]
slice_copy = df[df.x>2].copy(deep=False)
inspect.getmembers(slice)
inspect.getmembers(slice_copy)

Using your diff tool of choice, you will see that beyond a couple of addresses, the only material difference is this:

|          | slice   | slice_copy |
| _is_copy | weakref | None       |

The method that decides whether to warn is DataFrame._check_setitem_copy which checks _is_copy. So here you go. Make a copy so that your DataFrame is not _is_copy.

The warning suggests using .loc, but if you use .loc on a frame that _is_copy, you will still get the same warning. Misleading? Yes. Annoying? You bet. Helpful? Potentially, when chained assignment is used. But it cannot reliably detect chained assignment, so it prints the warning indiscriminately.
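
The check can be reproduced without a diff tool. The sketch below reads the private _is_copy attribute through getattr, since it is an internal detail that may be missing or always None depending on the pandas version and on whether Copy-on-Write is active. The names sliced and sliced_copy are made up to avoid shadowing the built-in slice:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

sliced = df[df.x > 2]
sliced_copy = df[df.x > 2].copy(deep=False)

# On versions that emit SettingWithCopyWarning, the first value is a
# weakref to the parent frame and the second is None.
print(getattr(sliced, "_is_copy", None))
print(getattr(sliced_copy, "_is_copy", None))

sliced_copy["foo"] = "bar"  # no warning expected here
```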

Poteat answered 27/2, 2019 at 21:26 Comment(3)
Good sleuthing. FWIW I also found that _is_copy is None for the original df and a weakref for the slice. Further, _is_copy() on the slice returns all the rows of the original df. But the reference printed by _is_copy is not the same as the id of the original df. Does the slice somehow make a copy? Also, am wondering if a shallow copy would cause some other issue down the line or with a newer version of pandas?Principalities
This answer surely deserves a separate badge for writing style.Frankfrankalmoign
Hands-down the most concrete and direct answer to the question. Very well put.Warms
G
49

Pandas dataframe copy warning

When you go and do something like this:

quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]

pandas.ix in this case returns a new, stand alone dataframe.

Any values you decide to change in this dataframe, will not change the original dataframe.

This is what pandas tries to warn you about.


Why .ix is a bad idea

The .ix object tries to do more than one thing, and for anyone who has read anything about clean code, this is a strong smell.

Given this dataframe:

df = pd.DataFrame({"a": [1,2,3,4], "b": [1,1,2,2]})

Two behaviors:

dfcopy = df.ix[:,["a"]]
dfcopy.a.ix[0] = 2

Behavior one: dfcopy is now a stand alone dataframe. Changing it will not change df

df.ix[0, "a"] = 3

Behavior two: This changes the original dataframe.


Use .loc instead

The pandas developers recognized that the .ix object was quite smelly [speculatively] and thus created two new objects that help with accessing and assigning data: .loc and .iloc.

.loc is faster, because it does not try to create a copy of the data.

.loc is meant to modify your existing dataframe inplace, which is more memory efficient.

.loc is predictable, it has one behavior.


The solution

What you are doing in your code example is loading a big file with lots of columns, then modifying it to be smaller.

The pd.read_csv function can help you out with a lot of this and also make the loading of the file a lot faster.

So instead of doing this

quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]

Do this

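columns = ['STK', 'TPrice', 'TPCLOSE', 'TOpen', 'THigh', 'TLow', 'TVol', 'TAmt', 'TDate', 'TTime']
# usecols does not preserve the order given, so name the columns in file order first
file_order = ['STK', 'TOpen', 'TPCLOSE', 'TPrice', 'THigh', 'TLow', 'TVol', 'TAmt', 'TDate', 'TTime']
df = pd.read_csv(StringIO(str_of_all), sep=',', header=None, usecols=[0,1,2,3,4,5,8,9,30,31])
df.columns = file_order
df = df.reindex(columns=columns)  # reorder to the layout used in the question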

This will only read the columns you are interested in, and name them properly. No need for using the evil .ix object to do magical stuff.
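
One caveat when adapting this: read_csv returns columns in file order regardless of how usecols is ordered, so names should be assigned in file order and the frame reordered afterwards. A sketch on a tiny made-up CSV (the rows and the three column names are placeholders):

```python
from io import StringIO

import pandas as pd

raw = "s1,1.0,2.0,3.0\ns2,4.0,5.0,6.0\n"  # made-up rows, no header line

df = pd.read_csv(StringIO(raw), header=None, usecols=[0, 3, 1])
file_order = list(df.columns)  # positions come back sorted, not as requested

df.columns = ["STK", "TOpen", "TPrice"]              # names in file order
df = df.reindex(columns=["STK", "TPrice", "TOpen"])  # desired order

print(file_order)
print(df)
```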

Glycosuria answered 24/10, 2016 at 9:1 Comment(0)
S
34

This topic is really confusing with Pandas. Luckily, it has a relatively simple solution.

The problem is that it is not always clear whether data filtering operations (e.g. loc) return a copy or a view of the DataFrame. Further use of such filtered DataFrame could therefore be confusing.

The simple solution is (unless you need to work with very large sets of data):

Whenever you need to update any values, always make sure that you explicitly copy the DataFrame before the assignment.

df  # Some DataFrame
df = df.loc[:, 0:2]  # Some filtering (unsure whether a view or copy is returned)
df = df.copy()  # Ensuring a copy is made
df[df["Name"] == "John"] = "Johny"  # Assignment can be done now (no warning)
Say answered 8/6, 2019 at 16:4 Comment(1)
For large datasets you can make a shallow (deep=False) copy. Still it seems too much to suppress a warning.Hawsehole
T
25

Just simply:

import pandas as pd
# ...
pd.set_option('mode.chained_assignment', None)
Tack answered 16/10, 2021 at 11:43 Comment(0)
S
20

I had been getting this issue with .apply() when assigning a new dataframe from a pre-existing dataframe on which I've used the .query() method. For instance:

prop_df = df.query('column == "value"')
prop_df['new_column'] = prop_df.apply(function, axis=1)

This would raise the warning. One fix that seems to resolve it in this case is to change the code to:

prop_df = df.copy(deep=True)
prop_df = prop_df.query('column == "value"')
prop_df['new_column'] = prop_df.apply(function, axis=1)

However, this is not efficient, especially with large dataframes, because it makes a full copy first.

If you're using the .apply() method to generate a new column and its values, a more efficient fix that also resolves the warning is to add .reset_index(drop=True):

prop_df = df.query('column == "value"').reset_index(drop=True)
prop_df['new_column'] = prop_df.apply(function, axis=1)
Sensitivity answered 27/3, 2020 at 12:17 Comment(0)
Z
13

To remove any doubt, my solution was to make a deep copy of the slice instead of a regular copy. This may not be applicable depending on your context (Memory constraints / size of the slice, potential for performance degradation - especially if the copy occurs in a loop like it did for me, etc...)

To be clear, here is the warning I received:

/opt/anaconda3/lib/python3.6/site-packages/ipykernel/__main__.py:54:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Illustration

I had doubts that the warning was thrown because of a column I was dropping on a copy of the slice. While not technically trying to set a value in the copy of the slice, that was still a modification of the copy of the slice.

Below are the (simplified) steps I have taken to confirm the suspicion, I hope it will help those of us who are trying to understand the warning.

Example 1: dropping a column on the original affects the copy

We knew that already but this is a healthy reminder. This is NOT what the warning is about.

>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1

    A    B
0    111    121
1    112    122
2    113    123


>> df2 = df1
>> df2

A    B
0    111    121
1    112    122
2    113    123

# Dropping a column on df1 affects df2
>> df1.drop('A', axis=1, inplace=True)
>> df2
    B
0    121
1    122
2    123

It is possible to prevent changes made to df1 from affecting df2. Note: you can avoid importing copy.deepcopy by doing df.copy() instead.

>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1

A    B
0    111    121
1    112    122
2    113    123

>> import copy
>> df2 = copy.deepcopy(df1)
>> df2
A    B
0    111    121
1    112    122
2    113    123

# Dropping a column on df1 does not affect df2
>> df1.drop('A', axis=1, inplace=True)
>> df2
    A    B
0    111    121
1    112    122
2    113    123

Example 2: dropping a column on the copy may affect the original

This actually illustrates the warning.

>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1

    A    B
0    111    121
1    112    122
2    113    123

>> df2 = df1
>> df2

    A    B
0    111    121
1    112    122
2    113    123

# Dropping a column on df2 can affect df1
# No slice involved here, but I believe the principle remains the same?
# Let me know if not
>> df2.drop('A', axis=1, inplace=True)
>> df1

B
0    121
1    122
2    123

It is possible to prevent changes made to df2 from affecting df1:

>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1

    A    B
0    111    121
1    112    122
2    113    123

>> import copy
>> df2 = copy.deepcopy(df1)
>> df2

A    B
0    111    121
1    112    122
2    113    123

>> df2.drop('A', axis=1, inplace=True)
>> df1

A    B
0    111    121
1    112    122
2    113    123
Zink answered 27/7, 2017 at 22:19 Comment(0)
H
11

This should work:

quote_df.loc[:,'TVol'] = quote_df['TVol']/TVOL_SCALE
Holarctic answered 9/3, 2018 at 9:48 Comment(0)
L
9

Some may want to simply suppress the warning:

class SupressSettingWithCopyWarning:
    def __enter__(self):
        pd.options.mode.chained_assignment = None

    def __exit__(self, *args):
        pd.options.mode.chained_assignment = 'warn'

with SupressSettingWithCopyWarning():
    #code that produces warning
Lustrous answered 17/5, 2019 at 9:47 Comment(0)
F
8

As this question is already fully explained and discussed in existing answers, I will just provide a neat pandas approach to the context manager using pandas.option_context (links to documentation and example). There is absolutely no need to create a custom class with all the dunder methods and other bells and whistles.

First the context manager code itself:

from contextlib import contextmanager

@contextmanager
def SuppressPandasWarning():
    with pd.option_context("mode.chained_assignment", None):
        yield

Then an example:

import pandas as pd
from string import ascii_letters

a = pd.DataFrame({"A": list(ascii_letters[0:4]), "B": range(0,4)})

mask = a["A"].isin(["c", "d"])
# Even shallow copy below is enough to not raise the warning, but why is a mystery to me.
b = a.loc[mask]  # .copy(deep=False)

# Raises the `SettingWithCopyWarning`
b["B"] = b["B"] * 2

# Does not!
with SuppressPandasWarning():
    b["B"] = b["B"] * 2

It is worth noting that neither approach modifies a, which is a bit surprising to me, and even a shallow copy with .copy(deep=False) prevents this warning from being raised (as far as I understand, assigning through a shallow copy should modify a as well, but it doesn't. pandas magic.).

Fulvi answered 3/2, 2020 at 13:41 Comment(0)
A
7

You could avoid the whole problem like this, I believe:

return (
    pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
    # rename must return the frame (no inplace=True) for the chain to continue
    .rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'})
    # .iloc stands in for the removed .ix for positional column selection
    .iloc[:,[0,3,2,1,4,5,8,9,30,31]]
    .assign(
        TClose=lambda df: df['TPrice'],
        RT=lambda df: 100 * (df['TPrice']/df['TPCLOSE'] - 1),
        TVol=lambda df: df['TVol']/TVOL_SCALE,
        TAmt=lambda df: df['TAmt']/TAMT_SCALE,
        STK_ID=lambda df: df['STK'].str.slice(13,19),
        STK_Name=lambda df: df['STK'].str.slice(21,30),  #.decode('gb2312')
        TDate=lambda df: df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10]),
    )
)

Using Assign. From the documentation: Assign new columns to a DataFrame, returning a new object (a copy) with all the original columns in addition to the new ones.

See Tom Augspurger's article on method chaining in pandas: Modern Pandas (Part 2): Method Chaining
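
For a version that can be run without the original data, here is a reduced sketch of the same chaining idea. The CSV rows, the four column names kept, and the value of TVOL_SCALE are all placeholders, since the real data and constants are not shown in the question:

```python
from io import StringIO

import pandas as pd

TVOL_SCALE = 100  # placeholder; the real constant is not shown in the question

raw = "AAA,10.0,8.0,500\nBBB,9.0,10.0,300\n"  # made-up quotes

quote_df = (
    pd.read_csv(StringIO(raw), header=None,
                names=["STK", "TPrice", "TPCLOSE", "TVol"])
    # assign returns a new frame, so no existing object is modified
    .assign(
        RT=lambda d: 100 * (d["TPrice"] / d["TPCLOSE"] - 1),
        TVol=lambda d: d["TVol"] / TVOL_SCALE,
    )
)

print(quote_df)
```

Because every step returns a new object and nothing is assigned into an intermediate frame, there is no chained assignment for pandas to flag.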

Anglian answered 13/10, 2017 at 14:45 Comment(0)
P
7

This might apply to NumPy only, which means you might need to import it, but the data I used for my examples NumPy was not essential with the calculations, but you can simply stop this settingwithcopy warning message, by using this one line of code below:

np.warnings.filterwarnings('ignore')
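If silencing every warning for the whole process is too broad, the suppression can be limited to one block with the standard-library warnings module. This is only a sketch: the toy frame is made up, and the getattr fallback is an assumption to cope with pandas versions that do not expose SettingWithCopyWarning under pandas.errors.

```python
import warnings
import pandas as pd

# Fall back to the generic Warning class if this pandas version does not
# expose SettingWithCopyWarning under pandas.errors.
copy_warning = getattr(pd.errors, "SettingWithCopyWarning", Warning)

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
subset = df[df["A"] > 1]  # boolean filtering returns new data

with warnings.catch_warnings():
    # The filter is active only inside this block
    warnings.simplefilter("ignore", category=copy_warning)
    subset["B"] = subset["B"] * 2

print(subset["B"].tolist())
print(df["B"].tolist())  # df itself is not modified
```

Outside the with block the warning filters are restored, so other warnings keep showing up.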
Pisa answered 31/12, 2021 at 21:12 Comment(1)
This one is the best one! Thanks. The copy warning is really annoying!Bismuthous
P
6

Follow-up beginner question / remark

Maybe a clarification for other beginners like me (I come from R, which seems to work a bit differently under the hood). The following harmless-looking and functional code kept producing the SettingWithCopyWarning, and I couldn't figure out why. I had both read and understood the issues with "chained indexing", but my code doesn't contain any:

def plot(pdb, df, title, **kw):
    df['target'] = (df['ogg'] + df['ugg']) / 2
    # ...

But then, later, much too late, I looked at where the plot() function is called:

    df = data[data['anz_emw'] > 0]
    pixbuf = plot(pdb, df, title)

So "df" isn't a data frame, but an object that somehow remembers that it was created by indexing a data frame (so is that a view?) which would make the line in plot(),

 df['target'] = ...

equivalent to

 data[data['anz_emw'] > 0]['target'] = ...

which is chained indexing.

Anyway,

def plot(pdb, df, title, **kw):
    df.loc[:,'target'] = (df['ogg'] + df['ugg']) / 2

fixed it.
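Another way to handle this situation, sketched here with made-up data and a hypothetical add_target() helper standing in for plot(), is to take an explicit copy where the filtered frame is created. The function can then assign columns normally, and data is clearly left alone:

```python
import pandas as pd

data = pd.DataFrame({
    "anz_emw": [0, 1, 2],
    "ogg": [4.0, 6.0, 8.0],
    "ugg": [2.0, 2.0, 4.0],
})

def add_target(df):
    # Plain column assignment; safe because df is an independent copy
    df["target"] = (df["ogg"] + df["ugg"]) / 2
    return df

# Make the intent explicit at the call site: work on a copy of the rows
df = data[data["anz_emw"] > 0].copy()
df = add_target(df)

print(df)
print(data.columns.tolist())  # "target" was not added to data
```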

Planospore answered 13/2, 2019 at 7:39 Comment(2)
A tad late to the party, but the .loc should probably go to df = data[data['anz_emw'] > 0], not the plot() function.Fulvi
This explanation was the only one that got through to me (maybe because I'm also coming from R). Thanks!Shoestring
M
5

If you have assigned the slice to a variable and want to set using the variable as in the following:

df2 = df[df['A'] > 2]
df2['B'] = value

And you do not want to use Jeff's solution, because the condition that computes df2 is too long or for some other reason, then you can use the following:

df.loc[df2.index.tolist(), 'B'] = value

df2.index.tolist() returns the indices from all entries in df2, which will then be used to set column B in the original dataframe.
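A small sketch of this on toy data (the values are made up). It assumes the index of df is unique; with duplicate labels, selecting by df2.index would also hit rows that were not part of df2.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [0, 0, 0, 0]})
df2 = df[df["A"] > 2]

# Write through the original frame, using the labels of the rows in df2
df.loc[df2.index, "B"] = 99

print(df)
```

Passing df2.index directly also works here; the tolist() call is not required for .loc to accept the labels.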

Martinsen answered 24/6, 2017 at 1:30 Comment(2)
This is 9 times more expensive than df["B"] = valueMarikomaril
Can you explain this more deeply @ClaudiuCreanga?Circumrotate
H
4

I was facing the same warning, while I executed this part of my code:

def scaler(self, numericals):
    scaler = MinMaxScaler()
    self.data.loc[:, numericals[0]] = scaler.fit_transform(self.data.loc[:, numericals[0]])
    self.data.loc[:, numericals[1]] = scaler.fit_transform(self.data.loc[:, numericals[1]])

where scaler is a MinMaxScaler and numericals[0] contains names of three of my numerical columns.

The warning was removed as I changed the code to:

def scaler(self, numericals):
    scaler = MinMaxScaler()
    self.data.loc[:][numericals[0]] = scaler.fit_transform(self.data.loc[:][numericals[0]])
    self.data.loc[:][numericals[1]] = scaler.fit_transform(self.data.loc[:][numericals[1]])

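So, just change [:, ~] to [:][~]. Keep in mind that [:][~] is itself chained indexing, so the assignment may be applied to a temporary object; check that self.data actually contains the scaled values afterwards.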

Huihuie answered 13/7, 2021 at 12:19 Comment(0)
J
3

In my case, I would create a new column based on the index, but I got the same warning as you:

df_temp["Quarter"] = df_temp.index.quarter

I use insert() instead of direct assignment, and it works for me:

df_temp.insert(loc=0, column='Quarter', value=df_temp.index.quarter)
Jamesy answered 12/2, 2022 at 7:54 Comment(0)
V
3

Why is this happening?

Selecting a list of columns and assigning it to a variable creates a copy. In pandas, slicing or indexing a dataframe creates a copy. But unlike function calls such as filter(), query() etc. that also create a copy, you can assign a value to a sliced or indexed dataframe, which becomes a problem as the new assignments (which are chained assignments) might not work. So the SettingWithCopyWarning is basically reminding you that you're assigning new values to a copy.

What is sometimes confusing is that the warning is not raised if an assignment on the copy doesn't change the shape of (or makes the copy have the same shape as) the original dataframe.

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})
df2 = df1[df1['A']<3]        # <--- one row is filtered out
df2.loc[0, 'C'] = 1          # <--- SettingWithCopyWarning (because it is replacing a value on the copy)
df2.loc[3, 'C'] = 1          # <--- no warning (because it adds a new row which makes df2's shape the same as df1)

Same deal with filtering columns; changing a copy that is a subset of the original dataframe raises the warning but if the copy is no longer a subset, there is no warning.

df2 = df1[['A']]               # <--- filters one column out
df2.loc[2, 'A'] = 100          # <--- SettingWithCopyWarning (tries to change a value on the copy)
df2['C'] = 100                 # <--- no warning (adds new column to copy)

Solution: Enable Copy-on-Write

Since pandas 1.5.0, pandas has Copy-on-Write (CoW) mode that makes any dataframe/Series derived from another behave like a copy; so when it is enabled, values in a dataframe/Series can be changed only by modifying the object itself. One consequence is SettingWithCopyWarning will never be raised. Another is chained assignment never works. Also, a copy is created only if data is shared with another object (normally, most pandas methods create a copy which slows down the code), so pandas operations are faster with CoW.

This is planned to be the default behavior by pandas 3.0 but as of now, you have to turn it on.

To enable it globally,

pd.options.mode.copy_on_write = True

or to turn it on locally with a context manager:

with pd.option_context("mode.copy_on_write", True):
    # do operations

Example 1 (SettingWithCopyWarning is silenced):

def func():
    df = pd.DataFrame({'A': range(5), 'B': range(5,0,-1)})
    df1 = df[['B']]           # select a list of columns
    df1.loc[0, 'B'] = 1       # assign a value to the copy
    
func()                        # <---- SettingWithCopyWarning


pd.options.mode.copy_on_write = True
func()                        # <---- no warning

Example 2 (chained assignment doesn't work):

pd.options.mode.copy_on_write = False
df = pd.DataFrame({'A': range(5), 'B': range(5,0,-1)})
df['B'][df['A']<4] = 10     # <---- df changes; no warning
df[df['A']<4]['B'] = 10     # <---- df doesn't change; throws SettingWithCopyWarning



pd.options.mode.copy_on_write = True
df = pd.DataFrame({'A': range(5), 'B': range(5,0,-1)})
df['B'][df['A']<4] = 10     # <---- df doesn't change; no warning
df[df['A']<4]['B'] = 10     # <---- df doesn't change; no warning
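The part of Example 2 that does not depend on the CoW setting can be checked directly. In this sketch (warnings are silenced only so the output stays readable), the chained form writes to a temporary object produced by the boolean filter, while a single .loc call writes to df itself:

```python
import warnings
import pandas as pd

df = pd.DataFrame({'A': range(5), 'B': range(5, 0, -1)})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # Chained: df[mask] is a new object, so the assignment never reaches df
    df[df['A'] < 4]['B'] = 10
after_chained = df['B'].tolist()

# Single indexing operation: the assignment is applied to df
df.loc[df['A'] < 4, 'B'] = 10
after_loc = df['B'].tolist()

print(after_chained)
print(after_loc)
```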

Example 3 (views are returned with chained methods, which improves performance a lot):

df = pd.DataFrame({'A': range(1_000_000), 'B': range(1_000_000)})

%%timeit
with pd.option_context('mode.copy_on_write', False):
    df.add_prefix('col ').set_index('col A').rename_axis('index col').reset_index()
# 30.5 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit
with pd.option_context('mode.copy_on_write', True):
    df.add_prefix('col ').set_index('col A').rename_axis('index col').reset_index()
# 18 ms ± 513 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Vizard answered 23/5, 2023 at 8:17 Comment(0)
H
3

This line at the beginning of my 'next steps' fixed the problem for me:

df = df.iloc[:] #To avoid SettingWithCopyWarning 
Horseshoe answered 26/9, 2023 at 5:35 Comment(2)
This is a really clean solution that worked for me as well. Thank you.Clinkstone
You're welcome. Glad you found it useful :-)Horseshoe
G
2

For me this issue occurred in the following simplified example, and I was also able to solve it (hopefully with a correct solution):

Old code with warning:

def update_old_dataframe(old_dataframe, new_dataframe):
    for new_index, new_row in new_dataframe.iterrows():
        old_dataframe.loc[new_index] = update_row(old_dataframe.loc[new_index], new_row)

def update_row(old_row, new_row):
    for field in list_of_columns:  # the column names to update
        # line with warning because of chain indexing old_dataframe[new_index][field]
        old_row[field] = new_row[field]
    return old_row

This printed the warning for the line old_row[field] = new_row[field]

Since the rows in update_row method are actually type Series, I replaced the line with:

old_row.at[field] = new_row.at[field]

That is, .at is an accessor for lookups on a Series. Even though both versions work just fine and give the same result, this way I don't have to disable the warnings (so they stay on for other chained-indexing issues elsewhere).
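When every index label of the new frame already exists in the old one, the row-by-row loop can possibly be avoided altogether. The following is only a sketch under that assumption, with made-up frames and a hypothetical list_of_columns; it is not the code from my project:

```python
import pandas as pd

old_dataframe = pd.DataFrame({
    "x": [1.0, 2.0, 3.0],
    "y": [10.0, 20.0, 30.0],
    "z": ["a", "b", "c"],
})
new_dataframe = pd.DataFrame(
    {"x": [20.0, 30.0], "y": [200.0, 300.0]},
    index=[1, 2],
)
list_of_columns = ["x", "y"]  # hypothetical columns to update

# One .loc assignment on the original frame; values are aligned on
# index labels and column names
old_dataframe.loc[new_dataframe.index, list_of_columns] = new_dataframe[list_of_columns]

print(old_dataframe)
```

Because the assignment goes through a single .loc call on old_dataframe, there is no intermediate row Series to be set on.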

Gesso answered 27/11, 2017 at 9:39 Comment(0)
F
1

I use the .loc indexer (property) to subset DataFrames in order to avoid SettingWithCopyWarnings when manipulating the resulting subsets (such as new_df below):

# GOOD way to select a columns subset:
new_df = df.loc[:, cols_subset]

#.. vs. bad way:
new_df = df[cols_subset]
# works but gives this warning:
# A value is trying to be set on a copy of a slice from a DataFrame.
# Try using .loc[row_indexer,col_indexer] = value instead
Frankfrankalmoign answered 11/11, 2023 at 12:8 Comment(0)
W
0

Just create a copy of your dataframe(s) using the .copy() method before the warning appears, to remove all of your warnings.

The warning appears because pandas cannot tell whether we mean to change the original quote_df or only the object derived from it. In other words, we do not want to work through a reference to the original object; an explicit copy gives us an independent quote_df.

quote_df = quote_df.copy()
Warfare answered 13/10, 2021 at 15:13 Comment(1)
This is needlessly a deep copy (default option is deep=True)Frankfrankalmoign
D
0

In my case, I just use PDCsv.loc[index, name] = NewVal to set the value:

PDCsv.loc[0, 'Name'] = 'Anthony Dave'
Desmarais answered 30/10, 2023 at 3:18 Comment(0)
F
0

I'm using .copy() to create a new df to avoid the warning.

df_new = quote_df.copy()
df_new['TVol'] = quote_df['TVol']/TVOL_SCALE
Faltboat answered 2/2 at 3:19 Comment(0)
