Why should I make a *shallow* copy of a DataFrame?

Related to: Why should I make a copy of a data frame in pandas

I noticed that the popular backtesting library does this:

def __init__(self, data: pd.DataFrame):
    data = data.copy(False)

on line 631. What is the purpose of such a copy?

Quasi answered 14/8, 2019 at 12:43 Comment(4)
How is your question different from the one you linked to? - Drenthe
@Drenthe the question OP links to uses a deep copy, while this example uses a shallow copy. - Rectory
You make a shallow copy when you want the underlying items to change as the original item is updated, when you want modifications in the copy to be reflected in the original, or when the work you are going to do with the copy will not impact the original and you want to save space by referencing the same underlying items. - Alembic
@Alembic - ok, but why would I make such a copy at the beginning of my function, the way the author of backtesting did? It's weird that he would want modifications made within the library to be reflected in the dataframe the user passed in, no? And even if that's the purpose, why not simply use data? Why bother calling a shallow copy? - Quasi

A shallow copy allows you to:

  1. access the frame's data without copying it (memory optimization, etc.)
  2. modify the frame's structure without those changes being reflected in the original dataframe
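
The two points above can be checked with a small sketch. The frame contents and the column names i, x and volume are made up for illustration. Whether writing values through the copy also changes the original depends on the pandas version and its Copy-on-Write setting, so the sketch only looks at memory sharing right after the copy is made.

```python
import numpy as np
import pandas as pd

# Illustrative frame; the column names are assumptions for this sketch.
a = pd.DataFrame({"i": [1, 2], "x": [0.5, 1.5]})
b = a.copy(deep=False)  # shallow copy

# The copy is a distinct DataFrame object ...
print(b is a)  # False

# ... but right after the copy its column data is not duplicated.
shares = np.shares_memory(a["i"].to_numpy(), b["i"].to_numpy())
print(shares)

# Structural changes (new column, new index) stay local to the copy.
b["volume"] = 0
b.index = ["r0", "r1"]
print(list(a.columns))  # ['i', 'x']
print(list(a.index))    # [0, 1]
```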

In backtesting, the developer converts the index to datetime format (line 640) and adds a new 'Volume' column filled with np.nan if the dataframe doesn't already have one. Because these changes are made to the shallow copy, they are not reflected in the original dataframe.
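
As a rough sketch of that pattern (not the library's actual code): the helper name prepare_data, the isinstance check and the sample columns below are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd


def prepare_data(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that mimics the pattern described above."""
    # New DataFrame object; the column data is not duplicated.
    data = data.copy(deep=False)
    # Replace the index on the copy only.
    if not isinstance(data.index, pd.DatetimeIndex):
        data.index = pd.to_datetime(data.index)
    # Add the missing column on the copy only.
    if "Volume" not in data:
        data["Volume"] = np.nan
    return data


# Made-up input frame standing in for the user's data.
user_df = pd.DataFrame(
    {"Open": [1.0, 2.0], "Close": [1.5, 2.5]},
    index=["2019-08-14", "2019-08-15"],
)
prepared = prepare_data(user_df)

print(list(prepared.columns))  # ['Open', 'Close', 'Volume']
print(list(user_df.columns))   # ['Open', 'Close']
```

The caller's frame keeps its original index and columns, while the function works on a frame with the structure it needs.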

Example

>>> import pandas as pd
>>> a = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['i', 's'])
>>> b = a.copy(False)
>>> a
   i  s
0  1  a
1  2  b
>>> b
   i  s
0  1  a
1  2  b
>>> b.index = pd.to_datetime(b.index)
>>> b['volume'] = 0
>>> b
                               i  s  volume
1970-01-01 00:00:00.000000000  1  a       0
1970-01-01 00:00:00.000000001  2  b       0
>>> a
   i  s
0  1  a
1  2  b

Of course, if you don't create a shallow copy and work on the passed-in dataframe directly, those changes to the dataframe's structure will be reflected in the original.
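
A minimal sketch of that contrast, using the same small frame as the example above (the variable names are illustrative):

```python
import pandas as pd

# No copy: `alias` and `original` are two names for the same object.
original = pd.DataFrame([[1, "a"], [2, "b"]], columns=["i", "s"])
alias = original
alias["volume"] = 0
alias_changed_original = "volume" in original.columns
print(alias_changed_original)  # True

# Shallow copy: the new column is added to the copy only.
original = pd.DataFrame([[1, "a"], [2, "b"]], columns=["i", "s"])
shallow = original.copy(deep=False)
shallow["volume"] = 0
shallow_changed_original = "volume" in original.columns
print(shallow_changed_original)  # False
```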

Immotile answered 15/8, 2019 at 15:13 Comment(2)
Great explanation! Just for completeness, if you do b['i'] += 1 it will reflect on the original dataframe a. - Quasi
I think you confused shallow copies with deep copies. A shallow copy stores references, allowing you to modify the original by modifying the shallow copy. A deep copy makes a xerox of the original that is completely separate. - Mia
