It's well known (and understandable) that pandas' behavior is essentially unpredictable when assigning to a slice. But I'm used to being warned about it by a SettingWithCopyWarning.
Why is the warning not generated in either of the following two code snippets, and what techniques could reduce the chance of writing such code unintentionally?
# pandas 0.18.1, python 3.5.1
import pandas as pd
data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
new_data = data[['a', 'b']]
data = data['a']
new_data.loc[0, 'a'] = 100 # no warning, doesn't propagate to data
data[0] == 1
True
data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
new_data = data['a']
data = data['a']
new_data.loc[0] = 100 # no warning, propagates to data
data[0] == 100
True
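A quick way to check what two names actually refer to in a snippet like this is to compare object identity and memory sharing. This is only a diagnostic sketch: the helper name `relationship` is made up for illustration (it is not part of pandas), and what it reports for `data` and `new_data` above may differ between pandas versions.

```python
import numpy as np
import pandas as pd


def relationship(left, right):
    """Report whether two Series are one object and whether they share memory.

    `relationship` is a hypothetical helper, not a pandas function.
    """
    return {
        "same_object": left is right,
        "shares_memory": np.shares_memory(left.to_numpy(), right.to_numpy()),
    }


s = pd.Series([1, 2, 3])
alias = s                # a second name bound to the same object
independent = s.copy()   # a deep copy that owns its own data

print(relationship(s, alias))
print(relationship(s, independent))
```

If both values are true, there is only one object and nothing "propagates" at all. If only `shares_memory` is true, a write through one name may or may not be visible through the other, which is exactly the case where I'd expect a warning.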
I thought the explanation was that pandas only produces the warning when the parent DataFrame is still reachable from the current context. (This would be a weakness of the detection algorithm, as my previous examples show.)
In the next snippet, AFAIK the original two-column DataFrame is no longer reachable, and yet the pandas warning mechanism still manages to trigger (luckily):
data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
new_data = data['a']
data = data[['a']]
new_data.loc[0] = 100 # warning, so we're safe
Edit:
While investigating this, I found another case of a missing warning:
data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
data = data.groupby('a')
new_data = data.filter(lambda g: len(g)==1)
new_data.loc[0, 'a'] = 100 # no warning, does not propagate to data
assert data.filter(lambda g: True).loc[0, 'a'] == 1
Even though an almost identical example does trigger a warning:
data = pd.DataFrame({'a': [1, 2, 2], 'b': ['a', 'b', 'c']})
data = data.groupby('a')
new_data = data.filter(lambda g: len(g)==1)
new_data.loc[0, 'a'] = 100 # warning, does not propagate to data
assert data.filter(lambda g: True).loc[0, 'a'] == 1
Update: I'm responding to the answer by @firelynx here because it's hard to fit this into a comment.
In the answer, @firelynx says that the first code snippet results in no warning because I'm taking the entire dataframe. But even if I took part of it, I still don't get a warning:
# pandas 0.18.1, python 3.5.1
import pandas as pd
data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': range(3)})
new_data = data[['a', 'b']]
data = data['a']
new_data.loc[0, 'a'] = 100 # no warning, doesn't propagate to data
data[0] == 1
True
new_data and data are the same object. The assignment doesn't "propagate"; it just occurs in one object that has two names pointing at it. More generally, I don't think there is any guarantee that SettingWithCopyWarning will or won't be raised in any particular situation (especially more complicated situations like the groupby examples you added). It's just a rough safeguard to prevent the most easily-catchable errors. – Fionnula
new_data and data are the same object? new_data has type DataFrame, and data has type DataFrameGroupBy. – Bonaventure
data and new_data are set to data['a']. – Fionnula
[...] DataFrame when assigning to a slice, this example is the least problematic - after all, the parent is properly modified and no warning is issued. The real issue is that there's no warning when the parent is not modified. – Bonaventure
[...] copy() anytime you want to be sure a "copy" is really a copy. E.g. new_data = data['a'].copy() – Ivied
[...] .copy on it would be very inefficient: it would result in a second copy being made. – Bonaventure
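For reference, the explicit-copy suggestion from the comments can be written out in full as follows. This is a minimal sketch that reuses the frame from the update above; as already noted, it may make a redundant second copy when the selection was itself a copy.

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': range(3)})

# Explicit copy: new_data owns its data, so a write to it cannot reach `data`.
new_data = data[['a', 'b']].copy()
new_data.loc[0, 'a'] = 100

print(data.loc[0, 'a'])      # value in the original frame
print(new_data.loc[0, 'a'])  # value in the independent copy
```

The trade-off is the one raised in the last comment: the behavior becomes predictable at the price of an extra allocation.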