Why should I make a *shallow* copy of a dataframe?


Asked 14/8, 2019 at 12:43 Answered 15/8, 2019 at 15:13

python pandas deep-copy shallow-copy

I noticed that the popular backtesting library does this at line 631:

```python
def __init__(self, data: pd.DataFrame):
    data = data.copy(False)  # deep=False, i.e. a shallow copy
```

What's the purpose of such a copy?

Quasi asked 14/8, 2019 at 12:43

How is your question different from the one you linked to? – Drenthe 14/8, 2019 at 12:51

@Drenthe The question the OP links to uses a deep copy, while this example uses a shallow copy. – Rectory 14/8, 2019 at 12:54

You make a shallow copy when you want the underlying items to change as the original is updated, when you want modifications made through the copy to be reflected in the original, or when the work you'll do with the copy won't affect the original and you want to save memory by referencing the same underlying items. – Alembic 14/8, 2019 at 13:06

@Alembic OK, but why would I make such a copy at the beginning of my function, the way the author of backtesting did? It seems odd that he would want modifications made inside the library to be reflected in the dataframe the user passed to it, no? And even if that's the purpose, why not simply use `data` directly? Why bother calling a shallow copy? – Quasi 14/8, 2019 at 18:08

A shallow copy allows you to:

- access the frame's data without copying it (memory optimization, etc.)
- modify the frame's structure without the changes being reflected in the original dataframe
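Both points can be checked directly. This is a minimal sketch, assuming a small all-float frame (the names `a`, `shallow` and `deep` are only for illustration): `np.shares_memory` shows that the shallow copy's column uses the same buffer as the original's while a deep copy does not, and the structural edits afterwards only affect the copy.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'x': [1.0, 2.0], 'y': [3.0, 4.0]})
shallow = a.copy(deep=False)  # new DataFrame object, column data not duplicated
deep = a.copy()               # deep=True is the default: column data duplicated

# Data access: the shallow copy's column points at the original's memory,
# while the deep copy owns a separate buffer.
shares_shallow = np.shares_memory(a['x'].to_numpy(), shallow['x'].to_numpy())
shares_deep = np.shares_memory(a['x'].to_numpy(), deep['x'].to_numpy())
print(shares_shallow, shares_deep)  # True False

# Structure: adding a column and replacing the index only affect the copy.
shallow['z'] = 0
shallow.index = ['r0', 'r1']
print(list(a.columns), list(a.index))              # ['x', 'y'] [0, 1]
print(list(shallow.columns), list(shallow.index))  # ['x', 'y', 'z'] ['r0', 'r1']
```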

In backtesting, the developer tries to convert the index to datetime format (line 640) and adds a new 'Volume' column filled with np.nan if it isn't already in the dataframe. Because these changes are made on a shallow copy, they aren't reflected in the original dataframe.
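A minimal sketch of that pattern, assuming a hypothetical `prepare_data` helper that imitates the two steps described; it is not backtesting's actual code, and the `try/except` around `pd.to_datetime` is an assumption about how a failed conversion might be handled:

```python
import numpy as np
import pandas as pd

def prepare_data(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper imitating the two steps described above."""
    # Shallow copy: a new DataFrame object, column data not duplicated.
    data = data.copy(deep=False)
    # Try to turn the index into datetimes; leave it unchanged if that fails.
    try:
        data.index = pd.to_datetime(data.index)
    except (ValueError, TypeError):
        pass
    # Add a NaN-filled 'Volume' column only if the caller didn't supply one.
    if 'Volume' not in data:
        data['Volume'] = np.nan
    return data

user_df = pd.DataFrame({'Close': [10.0, 10.5]}, index=['2019-08-14', '2019-08-15'])
prepared = prepare_data(user_df)

print(list(user_df.columns))   # ['Close']
print(list(prepared.columns))  # ['Close', 'Volume']
print(isinstance(user_df.index, pd.DatetimeIndex))   # False
print(isinstance(prepared.index, pd.DatetimeIndex))  # True
```

The caller's `user_df` keeps its string index and single column, while the function works with the converted index and the extra column.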

Example

```python
>>> import pandas as pd
>>> a = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['i', 's'])
>>> b = a.copy(False)
>>> a
   i  s
0  1  a
1  2  b
>>> b
   i  s
0  1  a
1  2  b
>>> b.index = pd.to_datetime(b.index)
>>> b['volume'] = 0
>>> b
                               i  s  volume
1970-01-01 00:00:00.000000000  1  a       0
1970-01-01 00:00:00.000000001  2  b       0
>>> a
   i  s
0  1  a
1  2  b
```

Of course, if you don't create a shallow copy and modify `data` directly, those structural changes will be reflected in the original dataframe.
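To make the contrast concrete, here is a minimal sketch with two hypothetical helpers (`add_volume_in_place` and `add_volume_on_shallow_copy` are illustrative names, not part of pandas or backtesting): one adds a column to its argument directly, the other does so on a shallow copy first.

```python
import pandas as pd

def add_volume_in_place(data: pd.DataFrame) -> pd.DataFrame:
    # No copy: `data` is the caller's DataFrame object itself.
    data['volume'] = 0
    return data

def add_volume_on_shallow_copy(data: pd.DataFrame) -> pd.DataFrame:
    # Shallow copy: a new DataFrame object sharing the original's column data.
    data = data.copy(deep=False)
    data['volume'] = 0
    return data

a = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['i', 's'])

result = add_volume_on_shallow_copy(a)
cols_after_shallow = list(a.columns)
print(cols_after_shallow)    # ['i', 's']
print(list(result.columns))  # ['i', 's', 'volume']

add_volume_in_place(a)
print(list(a.columns))       # ['i', 's', 'volume']
```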

Immotile answered 15/8, 2019 at 15:13

Great explanation! Just for completeness: if you do b['i'] += 1, the change will be reflected in the original dataframe a. – Quasi 15/8, 2019 at 15:22

I think you confused shallow copies with deep copies. A shallow copy stores references, allowing you to modify the original by modifying the shallow copy. A deep copy makes a xerox of the original that is completely separate. – Mia 27/11, 2022 at 14:21
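The value-level behavior discussed in the two comments above can be sketched as follows, assuming a one-column integer frame. A shallow copy shares column data with the original, but whether a value write through `b` reaches `a` depends on the pandas version: with Copy-on-Write (opt-in in pandas 2.x, the default in pandas 3.0), `b` gets its own data on the first write, so the sketch only prints `a` instead of promising one result.

```python
import pandas as pd

a = pd.DataFrame({'i': [1, 2]})
b = a.copy(deep=False)  # shallow copy: shares a's column data until written

# Change values (not structure) through the shallow copy.
b['i'] += 1
print(b['i'].tolist())  # [2, 3]

# Version-dependent:
# - without Copy-on-Write (the 2019-era behavior reported in the comment
#   above), the shared data is modified, so a['i'] is expected to be [2, 3];
# - with Copy-on-Write, b copies its data on this first write and
#   a['i'] stays [1, 2].
print(a['i'].tolist())
```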
