I have switched from R to pandas. I routinely get a SettingWithCopyWarning when I do something like:
import pandas as pd
df_a = pd.DataFrame({'col1': [1, 2, 3, 4]})
# Filtering step, which may or may not return a view
df_b = df_a[df_a['col1'] > 1]
# Add a new column to df_b
df_b['new_col'] = 2 * df_b['col1']
# SettingWithCopyWarning!!
I think I understand the problem, though I'll gladly learn what I got wrong. In the given example, it is undefined whether df_b is a view on df_a or not. Thus, the effect of assigning to df_b is unclear: does it affect df_a? The problem can be solved by explicitly making a copy when filtering:
df_a = pd.DataFrame({'col1': [1,2,3,4]})
# Filtering step, definitely a copy now
df_b = df_a[df_a['col1'] > 1].copy()
# Add a new column to df_b
df_b['new_col'] = 2 * df_b['col1']
# No Warning now
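To check my understanding, here is a small sketch of the same example with two printed checks added (the checks are my own addition). It should show that, with the explicit copy, the assignment only touches df_b and leaves df_a alone:

```python
import pandas as pd

df_a = pd.DataFrame({'col1': [1, 2, 3, 4]})

# Explicit copy: df_b owns its data, independent of df_a
df_b = df_a[df_a['col1'] > 1].copy()
df_b['new_col'] = 2 * df_b['col1']

# df_a should still have only its original column
print(list(df_a.columns))        # ['col1']
# df_b holds the filtered rows plus the derived column
print(df_b['new_col'].tolist())  # [4, 6, 8]
```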
I think there is something that I am missing: if we can never really be sure whether we create a view or not, what are views good for? From the pandas documentation (http://pandas-docs.github.io/pandas-docs-travis/indexing.html?highlight=view#indexing-view-versus-copy):
Outside of simple cases, it’s very hard to predict whether it [getitem] will return a view or a copy (it depends on the memory layout of the array, about which pandas makes no guarantees)
Similar warnings can be found for different indexing methods.
I find it very cumbersome and error-prone to sprinkle .copy() calls throughout my code. Am I using the wrong style for manipulating my DataFrames? Or is the performance gain so high that it justifies the apparent awkwardness?
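To make the style question concrete, here is one alternative I could imagine, offered only as a sketch and not as the recommended way: building the new column with DataFrame.assign, which returns a new DataFrame instead of modifying the filtered result in place. The name df_c is just for illustration.

```python
import pandas as pd

df_a = pd.DataFrame({'col1': [1, 2, 3, 4]})

# Filter and add the column in one chained expression;
# assign() returns a new DataFrame, so nothing is set on the filtered result itself
df_c = df_a[df_a['col1'] > 1].assign(new_col=lambda d: 2 * d['col1'])

print(df_c['new_col'].tolist())  # [4, 6, 8]
```

I do not know whether this is considered better style than an explicit .copy(), or how the two compare in performance.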
pd.options.mode.chained_assignment = None – Jada
df_b = df_a[df_a['col1'] > 1].reset_index(drop=True). – Rectitude