Unpredictable pandas slice assignment behavior with no SettingWithCopyWarning
It's well known (and understandable) that pandas behavior is essentially unpredictable when assigning to a slice. But I'm used to being warned about it by the SettingWithCopyWarning.

Why is the warning not generated in either of the following two code snippets, and what techniques could reduce the chance of writing such code unintentionally?

# pandas 0.18.1, python 3.5.1
import pandas as pd
data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
new_data = data[['a', 'b']]
data = data['a']
new_data.loc[0, 'a'] = 100 # no warning, doesn't propagate to data

data[0] == 1
True


data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
new_data = data['a']
data = data['a']
new_data.loc[0] = 100 # no warning, propagates to data

data[0] == 100
True

I thought the explanation was that pandas only produces the warning when the parent DataFrame is still reachable from the current context. (This would be a weakness of the detection algorithm, as my previous examples show.)

In the next snippet, AFAIK the original two-column DataFrame is no longer reachable, and yet pandas warning mechanism manages to trigger (luckily):

data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
new_data = data['a']
data = data[['a']]
new_data.loc[0] = 100 # warning, so we're safe

Edit:

While investigating this, I found another case of a missing warning:

data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
data = data.groupby('a')
new_data = data.filter(lambda g: len(g)==1)
new_data.loc[0, 'a'] = 100 # no warning, does not propagate to data
assert data.filter(lambda g: True).loc[0, 'a'] == 1

Even though an almost identical example does trigger a warning:

data = pd.DataFrame({'a': [1, 2, 2], 'b': ['a', 'b', 'c']})
data = data.groupby('a')
new_data = data.filter(lambda g: len(g)==1)
new_data.loc[0, 'a'] = 100 # warning, does not propagate to data
assert data.filter(lambda g: True).loc[0, 'a'] == 1

Update: I'm responding to the answer by @firelynx here because it's hard to fit it in a comment.

In the answer, @firelynx says that the first code snippet results in no warning because I'm taking the entire DataFrame. But even when I take only part of it, I still don't get a warning:

# pandas 0.18.1, python 3.5.1
import pandas as pd
data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': range(3)})
new_data = data[['a', 'b']]
data = data['a']
new_data.loc[0, 'a'] = 100 # no warning, doesn't propagate to data

data[0] == 1
True
Bonaventure answered 4/9, 2016 at 22:7 Comment(8)
With regard to your second example, there is no copy. new_data and data are the same object. The assignment doesn't "propagate"; it just occurs in one object that has two names pointing at it. More generally, I don't think there is any guarantee that SettingWithCopyWarning will or won't be raised in any particular situation (especially more complicated situations like the groupby examples you added). It's just a rough safeguard to prevent the most easily-catchable errors.Fionnula
@Fionnula can you clarify what you mean by new_data and data are the same object? new_data has type DataFrame, and data has type DataFrameGroupBy.Bonaventure
I'm referring to your second example (the second code snippet in your first block of code), in which both data and new_data are set to data['a'].Fionnula
@Fionnula I see, yes. Given that pandas kinda assumes the programmer tries to modify the parent DataFrame when assigning to a slice, this example is the least problematic - after all, the parent is properly modified and no warning is issued. The real issue is that there's no warning when the parent is not modified.Bonaventure
Just a practical note -- I'd suggest using copy() anytime you want to be sure a "copy" is really a copy. E.g. new_data = data['a'].copy()Ivied
@Ivied I would, but unfortunately if it's already a copy, calling .copy on it would be very inefficient: it would result in a second copy being made.Bonaventure
True, though only temporarily, but of course if you are near memory limits that could be a problem. In other cases, some inefficiency that increases safety is a good tradeoff.Ivied
And in the case of datasets that aren't taxing your PC's memory, the loss of CPU/memory efficiency is completely trivial compared to spending 15 minutes of programmer time trying to figure out if it is a copy or not. ;-)Ivied
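
A minimal sketch of the defensive copy() pattern suggested in the comments above, reusing the question's toy data:

import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
new_data = data['a'].copy()  # explicitly independent of data

new_data.loc[0] = 100        # only the copy changes, no ambiguity to worry about
data.loc[0, 'a'] == 1
True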
Explaining what you're doing, step by step

The DataFrame you create is not a view

data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
data._is_view
False

new_data is also not a view, because you are taking all columns

new_data = data[['a', 'b']]
new_data._is_view
False

Now you are assigning data to be the Series 'a'

data = data['a']
type(data)
pandas.core.series.Series

Which is a view

data._is_view
True

Now you update a value in the non-copy new_data

new_data.loc[0, 'a'] = 100 # no warning, doesn't propagate to data

This should not give a warning: new_data is a whole DataFrame of its own (a copy, not a view of anything).

The Series you've created flags itself as a view, but it's not a DataFrame and does not behave as a DataFrame view.
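
Putting those steps into one snippet (note that _is_view is a pandas-internal attribute, so this is only a diagnostic sketch and its output may differ between pandas versions):

import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
data._is_view      # False: a freshly constructed DataFrame owns its data
new_data = data[['a', 'b']]
new_data._is_view  # False: selecting a list of columns produces a copy
data = data['a']
data._is_view      # True: selecting a single column returns a Series view
new_data.loc[0, 'a'] = 100  # modifies the copy only, so no SettingWithCopyWarning
data[0]            # 1: the original column is untouched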

Avoiding writing code like this

The Series vs. DataFrame problem is a very common one in pandas [citation not needed if you've worked with pandas for a while]

The problem is really that you should always be writing

data[['a']] not data['a']

The left creates a DataFrame, the right creates a Series.
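
A quick sketch of the difference, using the same toy data as the question (df is just an illustrative name):

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
type(df[['a']])   # list of labels -> DataFrame
pandas.core.frame.DataFrame
type(df['a'])     # single label -> Series
pandas.core.series.Series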

Some people may argue that you should never write data['a'] and should use data.a instead, so that you can add warnings to your environment for data['a'] code.

This does not work. First of all, using the data.a syntax causes cognitive dissonance.

A DataFrame is a collection of columns. In Python we access members of collections with the [] operator, and attributes with the . operator. Switching these around causes cognitive dissonance for any Python programmer, especially when you start doing things like del data.a and notice that it does not work. See this answer for a more extensive explanation
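
For example (a small sketch; the exact exception message can vary between pandas versions):

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
df.a              # attribute-style access works for reading
# del df.a        # raises AttributeError: columns are items, not real attributes
del df['a']       # item-style deletion removes the column as expected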

Clean code to the rescue

It is hard to see the difference between data[['a']] and data['a'].

This is a smell. We should be doing neither.

The proper way, following clean-code principles and the Zen of Python ("Explicit is better than implicit"), is this:

columns = ['a']
data[columns]

This may not be so mind-boggling, but take a look at the following example:

data[['ad', 'cpc', 'roi']]

What does this mean? What are these columns? What data are you getting here?

These are the first questions that come to mind when reading this line of code.

How do you solve it? Not with a comment.

ad_performance_columns = ['ad', 'cpc', 'roi']
data[ad_performance_columns]

More explicit is always better.

For more, please consider buying a book on clean code. Maybe this one.

Philipp answered 29/5, 2017 at 8:38 Comment(1)
See my update at the bottom. I don't understand your explanation of why no warning should be raised in the code snippet I just added. And I don't see how your clean code approach would prevent someone from writing the code in that last snippet.Bonaventure
