How to deal with SettingWithCopyWarning and ChainedAssignmentError in Pandas?
This post is meant for readers who:
- Would like to understand what this warning means
- Would like to understand different ways of suppressing this warning
- Would like to understand how to improve their code and follow good practices to avoid this warning in the future.
Setup
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (3, 5)), columns=list('ABCDE'))
df
A B C D E
0 5 0 3 3 7
1 9 3 5 2 4
2 7 6 8 8 1
What is the SettingWithCopyWarning?
To know how to deal with this warning, it is important to understand what it means and why it is raised in the first place.
When filtering DataFrames, it is possible to slice/index a frame and get back either a view or a copy, depending on the internal layout and various implementation details. A "view" is, as the term suggests, a view into the original data, so modifying the view may modify the original object. A "copy", on the other hand, is a replication of data from the original, and modifying the copy has no effect on the original.
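The view/copy distinction is easiest to see on a plain NumPy array, where the rules are fixed. The following is only a sketch for illustration; the names original, view and copy are made up here, and pandas applies its own, less predictable rules on top of this.

```python
import numpy as np

original = np.arange(5)        # values 0, 1, 2, 3, 4

view = original[1:4]           # basic slicing returns a view
copy = original[[1, 2, 3]]     # integer-array ("fancy") indexing returns a copy

view[0] = 100                  # writes through to `original`
copy[1] = -1                   # only changes the copy

print(original.tolist())                  # [0, 100, 2, 3, 4]
print(np.shares_memory(original, view))   # True
print(np.shares_memory(original, copy))   # False
```

The two selections pick the same elements, yet only one of them is connected to the original data. That is exactly the ambiguity the warning is about.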
As mentioned by other answers, the SettingWithCopyWarning was created to flag "chained assignment" operations. Consider df in the setup above. Suppose you would like to select all values in column "B" where the values in column "A" are > 5. Pandas allows you to do this in different ways, some more correct than others. For example,
df[df.A > 5]['B']
1 3
2 6
Name: B, dtype: int64
And,
df.loc[df.A > 5, 'B']
1 3
2 6
Name: B, dtype: int64
These return the same result, so if you are only reading these values, it makes no difference. So, what is the issue? The problem with chained assignment is that it is generally difficult to predict whether a view or a copy is returned, so this mostly becomes an issue when you are attempting to assign values back. To build on the earlier example, consider how this code is executed by the interpreter:
df.loc[df.A > 5, 'B'] = 4
# becomes
df.__setitem__((df.A > 5, 'B'), 4)
This is a single __setitem__ call on df. On the other hand, consider this code:
df[df.A > 5]['B'] = 4
# becomes
df.__getitem__(df.A > 5).__setitem__('B', 4)
Now, depending on whether __getitem__ returned a view or a copy, the __setitem__ operation may not work.
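To see the two call patterns without involving pandas at all, here is a toy sketch. Recorder is a hypothetical class invented for this illustration; it only logs which indexing hooks Python invokes, and its __getitem__ always hands back a new object, the way a copy would.

```python
class Recorder:
    """Toy container that records which indexing hooks Python calls."""

    log = []  # shared across instances so the whole chain is visible

    def __init__(self, name):
        self.name = name

    def __getitem__(self, key):
        Recorder.log.append(f"{self.name}.__getitem__({key!r})")
        # Return a *new* object, the way a copy would behave.
        return Recorder(f"{self.name}[{key!r}]")

    def __setitem__(self, key, value):
        Recorder.log.append(f"{self.name}.__setitem__({key!r}, {value!r})")


obj = Recorder("obj")

obj["rows", "B"] = 4     # one call, made on obj itself
obj["rows"]["B"] = 4     # two calls; the write lands on a temporary object

for entry in Recorder.log:
    print(entry)
# obj.__setitem__(('rows', 'B'), 4)
# obj.__getitem__('rows')
# obj['rows'].__setitem__('B', 4)
```

In the chained form, the assignment is never made on obj. Whether it reaches the original data depends entirely on what __getitem__ returned.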
In general, you should use loc for label-based assignment, and iloc for integer/positional based assignment, as the spec guarantees that they always operate on the original. Additionally, for setting a single cell, you should use at and iat.
More can be found in the documentation.
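As a quick sketch of the four accessors side by side, the snippet below writes through each of them and then prints the frame. The frame is written out by hand with the same values as in the setup, so the example does not depend on the random seed; the values assigned are arbitrary.

```python
import pandas as pd

# Same values as the frame in the Setup section, written out explicitly.
df = pd.DataFrame(
    [[5, 0, 3, 3, 7], [9, 3, 5, 2, 4], [7, 6, 8, 8, 1]],
    columns=list("ABCDE"),
)

df.loc[df.A > 5, "B"] = 4   # label-based, with a boolean row mask
df.iloc[0, 2] = 30          # position-based: row 0, column 2 ("C")
df.at[1, "D"] = 20          # single cell by label
df.iat[2, 4] = 10           # single cell by position

print(df)
```

Each statement is a single indexing call on df, so every write lands on the original frame.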
Note
All boolean indexing operations done with loc can also be done with iloc. The only difference is that iloc expects
- either integers/positions or a numpy array of boolean values for the index, and
- integer/position indexes for the columns.
For example,
df.loc[df.A > 5, 'B'] = 4
Can be written as
df.iloc[(df.A > 5).values, 1] = 4
And,
df.loc[1, 'A'] = 100
Can be written as
df.iloc[1, 0] = 100
And so on.
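If you would rather not hard-code the column position, one possible approach is to look it up with df.columns.get_loc. The sketch below checks that the loc and iloc forms give the same result; by_label and by_position are just illustrative names for two copies of the setup frame.

```python
import pandas as pd

data = [[5, 0, 3, 3, 7], [9, 3, 5, 2, 4], [7, 6, 8, 8, 1]]
by_label = pd.DataFrame(data, columns=list("ABCDE"))
by_position = by_label.copy()

mask = by_label.A > 5

# Label-based assignment.
by_label.loc[mask, "B"] = 4

# Position-based assignment: boolean numpy array for rows, integer for the column.
col = by_position.columns.get_loc("B")
by_position.iloc[mask.to_numpy(), col] = 4

print(by_label.equals(by_position))   # True
```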
From pandas >= 2.0, you can enable Copy-on-Write optimizations to save on memory and avoid making copies of data until it is written to (if possible).
This can be enabled by
pd.options.mode.copy_on_write = True
After this, attempts to make chained assignments will result in
ChainedAssignmentError: A value is trying to be set on a copy of a DataFrame or Series through chained assignment.
When using the Copy-on-Write mode, such chained assignment never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy.
Try using '.loc[row_indexer, col_indexer] = value' instead, to perform the assignment in a single step.
The error is raised in a similar setting to the SettingWithCopyWarning.
Just tell me how to suppress the warning!
Consider a simple operation on the "A" column of df. Selecting "A" and dividing by 2 will raise the warning, but the operation will work.
df2 = df[['A']]
df2['A'] /= 2
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/__main__.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
df2
A
0 2.5
1 4.5
2 3.5
There are a few ways of directly silencing this warning:
(recommended) Use loc to slice subsets:
df2 = df.loc[:, ['A']]
df2['A'] /= 2 # Does not raise
Change pd.options.mode.chained_assignment. It can be set to None, "warn", or "raise". "warn" is the default. None will suppress the warning entirely, and "raise" will throw a SettingWithCopyError, preventing the operation from going through.
pd.options.mode.chained_assignment = None
df2['A'] /= 2
Make a deep copy
df2 = df[['A']].copy(deep=True)
df2['A'] /= 2
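To make the effect of the explicit copy concrete, here is a small sketch. It uses the setup values written out by hand, and spells the division as a plain assignment rather than /=, purely to keep the example simple.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 0, 3, 3, 7], [9, 3, 5, 2, 4], [7, 6, 8, 8, 1]],
    columns=list("ABCDE"),
)

# An explicit copy is an independent object, so pandas has nothing to warn about.
df2 = df[["A"]].copy()
df2["A"] = df2["A"] / 2

print(df2["A"].tolist())   # [2.5, 4.5, 3.5]
print(df["A"].tolist())    # [5, 9, 7] (the original is untouched)
```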
@Peter Cotton, in the comments, came up with a nice way of non-intrusively changing the mode (modified from this gist) using a context manager, to set the mode only for as long as it is required, and then reset it back to the original state when finished.
class ChainedAssignent:
    def __init__(self, chained=None):
        acceptable = [None, 'warn', 'raise']
        assert chained in acceptable, "chained must be in " + str(acceptable)
        self.swcw = chained

    def __enter__(self):
        self.saved_swcw = pd.options.mode.chained_assignment
        pd.options.mode.chained_assignment = self.swcw
        return self

    def __exit__(self, *args):
        pd.options.mode.chained_assignment = self.saved_swcw
The usage is as follows:
# Some code here
with ChainedAssignent():
    df2['A'] /= 2
# More code follows
Or, to raise the exception
with ChainedAssignent(chained='raise'):
    df2['A'] /= 2
SettingWithCopyError:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
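The save-and-restore pattern behind that context manager can also be written with contextlib. The sketch below is not pandas code: options is a plain dict standing in for pd.options.mode, and temporary_option is a hypothetical helper name, so that the example runs anywhere. The try/finally makes sure the old value comes back even if the body raises.

```python
from contextlib import contextmanager

# Hypothetical stand-in for pd.options.mode; a plain dict keeps the sketch pandas-free.
options = {"chained_assignment": "warn"}


@contextmanager
def temporary_option(name, value):
    """Set options[name] to value for the duration of the with-block."""
    saved = options[name]
    options[name] = value
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions alike.
        options[name] = saved


with temporary_option("chained_assignment", None):
    inside = options["chained_assignment"]

try:
    with temporary_option("chained_assignment", "raise"):
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

after = options["chained_assignment"]
print(inside, after)   # None warn
```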
The "XY Problem": What am I doing wrong?
A lot of the time, users look for ways of suppressing this warning without fully understanding why it was raised in the first place. This is a good example of an XY problem, where users attempt to solve a problem "Y" that is actually a symptom of a deeper rooted problem "X". The questions below are based on common situations in which this warning comes up, each followed by a solution.
Question 1
I have a DataFrame
df
A B C D E
0 5 0 3 3 7
1 9 3 5 2 4
2 7 6 8 8 1
I want to assign values in col "A" > 5 to 1000. My expected output is
A B C D E
0 5 0 3 3 7
1 1000 3 5 2 4
2 1000 6 8 8 1
Wrong way to do this:
df.A[df.A > 5] = 1000 # works, because df.A returns a view
df[df.A > 5]['A'] = 1000 # does not work
df.loc[df.A > 5]['A'] = 1000 # does not work
Right way using loc:
df.loc[df.A > 5, 'A'] = 1000
Question 2 [1]
I am trying to set the value in cell (1, 'D') to 12345. My expected output is
A B C D E
0 5 0 3 3 7
1 9 3 5 12345 4
2 7 6 8 8 1
I have tried different ways of accessing this cell, such as df['D'][1]. What is the best way to do this?
1. This question isn't specifically related to the warning, but it is good to understand how to do this particular operation correctly so as to avoid situations where the warning could potentially arise in future.
You can use any of the following methods to do this.
df.loc[1, 'D'] = 12345
df.iloc[1, 3] = 12345
df.at[1, 'D'] = 12345
df.iat[1, 3] = 12345
Question 3
I am trying to subset values based on some condition. I have a DataFrame
A B C D E
1 9 3 5 2 4
2 7 6 8 8 1
I would like to assign values in "D" to 123 such that "C" == 5. I tried
df2.loc[df2.C == 5, 'D'] = 123
Which seems fine but I am still getting the SettingWithCopyWarning! How do I fix this?
This is probably because of code higher up in your pipeline. Did you create df2 from something larger, like the following?
df2 = df[df.A > 5]
In this case, boolean indexing returns an object that pandas still tracks as derived from df, so df2 keeps a reference to the original and later assignments on it trigger the warning. What you'd need to do is assign df2 to a copy:
df2 = df[df.A > 5].copy()
# Or,
# df2 = df.loc[df.A > 5, :]
Question 4
I'm trying to drop column "C" in-place from
A B C D E
1 9 3 5 2 4
2 7 6 8 8 1
But using
df2.drop('C', axis=1, inplace=True)
Throws SettingWithCopyWarning. Why is this happening?
This is because df2 must have been created by some other slicing operation on a larger frame, such as
df2 = df[df.A > 5]
The solution here is to either make a copy() of df, or use loc, as before.
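A sketch of the copy-based fix for this case, again with the setup values written out by hand: once df2 is an explicit copy, the in-place drop affects only df2.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 0, 3, 3, 7], [9, 3, 5, 2, 4], [7, 6, 8, 8, 1]],
    columns=list("ABCDE"),
)

# Copy the filtered rows first, then drop the column in place on the copy.
df2 = df[df.A > 5].copy()
df2.drop("C", axis=1, inplace=True)

print(list(df2.columns))   # ['A', 'B', 'D', 'E']
print(list(df.columns))    # ['A', 'B', 'C', 'D', 'E']
```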
df.set_value has been deprecated. Pandas now recommends using .at[] or .iat[] instead. Docs here: pandas.pydata.org/pandas-docs/stable/generated/… – Thruster
df.loc[:, foo] avoids SettingWithCopyWarning, whereas df[foo] causes SettingWithCopyWarning. – Assortment