Late to the party, but I faced a similar problem and I'd like to share my solution with you, in case it's helpful for anybody.
TL; DR
def get_variation(values: pd.Series) -> np.float64:
base = values.iloc[0] # first element in window iteration
current = values.iloc[-1] # last element in window iteration
return (current - base) / base if base else 0 # avoid ZeroDivisionError
variations = df.groupby('security')['price'].expanding(min_periods=2).apply(get_variation)
df = df.assign(change=variations.droplevel(0))
|
security |
date |
price |
change |
0 |
IBM |
1/1/2016 |
100 |
NaN |
1 |
IBM |
1/2/2016 |
102 |
0.02 |
2 |
IBM |
1/3/2016 |
108 |
0.08 |
3 |
AAPL |
1/1/2016 |
1000 |
NaN |
4 |
AAPL |
1/2/2016 |
980 |
-0.02 |
5 |
AAPL |
1/3/2016 |
1050 |
0.05 |
6 |
AAPL |
1/4/2016 |
1070 |
0.07 |
You can do what you are asking by:
- defining your own variation function,
- grouping
pd.DataFrame
by security
column,
- using an expanding window operation on the resulting
price
column,
- calling
apply
on the expanding window operation and passing your custom variation function defined in step 1 as argument,
- dropping outer index of resulting series,
- assigning previous result to original
pd.DataFrame
.
Optionally, you can replace the expanding window operation in step 3 with a rolling window operation by calling .rolling(window=2, min_periods=2)
, to get a step-by-step variation on each security price. window=2
defines the size of the window in order to return two elements on each iteration and min_periods=2
sets the minimum data needed for calculations (will result in NaN
, if else).
Step by step
1. Defining your own variation function
Your variation function should accept a pd.Series
as argument and use the first and last items in the collection to calculate the variation. Here, I'm using a standard variation function used in finance to calculate interest rates. The last line has an if/else
statement to avoid ZeroDivisionError
.
def get_variation(values: pd.Series) -> np.float64:
base = values.iloc[0]
current = values.iloc[-1]
return (current - base) / base if base else 0
2. Grouping pd.DataFrame
by security
column
Next, you should call .groupby('security')
to group pd.DataFrame
by security
column, in order to prepare data for group calculations.
grouped_df = df.groupby('security')
|
|
security |
date |
price |
change |
security |
|
|
|
|
|
AAPL |
3 |
AAPL |
1/1/2016 |
1000 |
NaN |
|
4 |
AAPL |
1/2/2016 |
980 |
-0.02 |
|
5 |
AAPL |
1/3/2016 |
1050 |
0.05 |
|
6 |
AAPL |
1/4/2016 |
1070 |
0.07 |
IBM |
0 |
IBM |
1/1/2016 |
100 |
NaN |
|
1 |
IBM |
1/2/2016 |
102 |
0.02 |
|
2 |
IBM |
1/3/2016 |
108 |
0.08 |
3. Using a expanding window operation on the resulting price
column
Next, you should use a expanding window operation by calling .expanding(min_periods=2)
on the price groups. This will iterate each price
group and get you all data up to the current iteration as a pd.Series
. You call .expanding(min_periods=n)
to set n
as the minimum number of observations required to return a value on each iteration (or Nan
, if else). In your case, given that you required NaN
on the first record as it is not compared to anything else, you should pass min_periods=2
; if you rather have 0
as a result instead, pass min_periods=1
.
windows = grouped_df['price'].expanding(min_periods=2)
4. Calling apply
on the expanding window operation and passing your custom variation function defined in step 1 as argument
Calling .apply(get_variation)
will apply your custom variation formula to each resulting window and return the result.
grouped_variations = windows.apply(get_variation)
security |
|
|
AAPL |
3 |
NaN |
|
4 |
-0.02 |
|
5 |
0.05 |
|
6 |
0.07 |
IBM |
0 |
NaN |
|
1 |
0.02 |
|
2 |
0.08 |
5. Dropping outer index of resulting series
As you can see on .4, data is presented with a multi-index. We get rid of the outer index level ('AAPL', 'IBM') by calling .droplevel(0)
, to prepare the data for merging it correctly into the original dataframe.
variations = grouped_variations.droplevel(0)
6. Assigning previous result to original pd.DataFrame
Finally, we assign the price variations into the original dataframe by calling df.assign
. Data will be joined into the destination on its index.
df = df.assign(change=variations)
|
security |
date |
price |
change |
0 |
IBM |
1/1/2016 |
100 |
NaN |
1 |
IBM |
1/2/2016 |
102 |
0.02 |
2 |
IBM |
1/3/2016 |
108 |
0.08 |
3 |
AAPL |
1/1/2016 |
1000 |
NaN |
4 |
AAPL |
1/2/2016 |
980 |
-0.02 |
5 |
AAPL |
1/3/2016 |
1050 |
0.05 |
6 |
AAPL |
1/4/2016 |
1070 |
0.07 |