How to calculate percent change compared to the beginning value using pandas?
G

4

8

I have a DataFrame and need to calculate the percent change relative to the beginning of the year for each company. Is there a way to use pct_change() or another method to do this? Thanks!

df looks like

security    date        price
IBM         1/1/2016    100
IBM         1/2/2016    102
IBM         1/3/2016    108
AAPL        1/1/2016    1000
AAPL        1/2/2016    980
AAPL        1/3/2016    1050
AAPL        1/4/2016    1070

results I want

security    date        price   change
IBM         1/1/2016    100     NA
IBM         1/2/2016    102     2%
IBM         1/3/2016    108     8%
AAPL        1/1/2016    1000    NA
AAPL        1/2/2016    980     -2%
AAPL        1/3/2016    1050    5%
AAPL        1/4/2016    1070    7%
Gusti answered 29/1, 2016 at 17:28 Comment(2)
First split your data frame and then use pct_change() to calculate the percent change for each date. - Skater
If I do df.groupby('security')['price'].pct_change(), that gives me the percent change compared to the previous day. However, what I need is the percent change compared to 100 and 1000 (the first price of each security). Any ideas? - Gusti
C
17

Sounds like you are looking for an expanding-window version of pct_change(). This doesn't exist out of the box AFAIK, but you could roll your own:

df.groupby('security')['price'].apply(lambda x: x.div(x.iloc[0]).subtract(1).mul(100))
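
For anyone who wants to try this end to end, here is a minimal sketch on the sample data from the question. It swaps apply for transform, which is my substitution and not part of the answer above: transform returns a result aligned to the original index, so it can be assigned straight back as a column. The prices are entered as floats and the column name change is taken from the question. Note that the first row of each group comes out as 0 rather than NA.

```python
import pandas as pd

# Sample data from the question (prices entered as floats for this sketch)
df = pd.DataFrame({
    "security": ["IBM"] * 3 + ["AAPL"] * 4,
    "date": ["1/1/2016", "1/2/2016", "1/3/2016",
             "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016"],
    "price": [100.0, 102.0, 108.0, 1000.0, 980.0, 1050.0, 1070.0],
})

# Divide every price by the first price of its group, subtract 1,
# and scale to percent. transform keeps the original row order.
df["change"] = df.groupby("security")["price"].transform(
    lambda x: x.div(x.iloc[0]).sub(1).mul(100)
)

print(df)
```

If you need NA on the first row of each group, one option is to mask it afterwards, for example with df.groupby('security').cumcount() == 0.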
Coumarin answered 29/1, 2016 at 17:46 Comment(2)
I like your one-liner! I think this "compared to the beginning" could be a nice feature to have for pct_change(). - Gusti
Either that or adding an expanding_pct_change() to the expanding window functions mentioned in the docs. There is also expanding_apply, by the way, see Cookbook: pandas.pydata.org/pandas-docs/stable/cookbook.html - Coumarin
A
6

Late to the party, but I faced a similar problem and I'd like to share my solution with you, in case it's helpful for anybody.

TL;DR

import numpy as np
import pandas as pd

def get_variation(values: pd.Series) -> np.float64:
    base = values.iloc[0]  # first element in window iteration
    current = values.iloc[-1]  # last element in window iteration

    return (current - base) / base if base else 0  # return 0 instead of dividing by a zero base

variations = df.groupby('security')['price'].expanding(min_periods=2).apply(get_variation)
df = df.assign(change=variations.droplevel(0))

  security      date  price  change
0      IBM  1/1/2016    100     NaN
1      IBM  1/2/2016    102    0.02
2      IBM  1/3/2016    108    0.08
3     AAPL  1/1/2016   1000     NaN
4     AAPL  1/2/2016    980   -0.02
5     AAPL  1/3/2016   1050    0.05
6     AAPL  1/4/2016   1070    0.07

You can do what you are asking by:

  1. defining your own variation function,
  2. grouping pd.DataFrame by security column,
  3. using an expanding window operation on the resulting price column,
  4. calling apply on the expanding window operation and passing your custom variation function defined in step 1 as argument,
  5. dropping outer index of resulting series,
  6. assigning previous result to original pd.DataFrame.

Optionally, you can replace the expanding window operation in step 3 with a rolling window operation by calling .rolling(window=2, min_periods=2), to get the step-by-step variation of each security's price. window=2 sets the window size so that each iteration sees two elements, and min_periods=2 sets the minimum number of observations needed for a calculation (otherwise the result is NaN).
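
A rough sketch of that rolling variant on the sample data from the question. The column name step_change is just a name I picked for illustration, and the prices are entered as floats.

```python
import pandas as pd

def get_variation(values: pd.Series) -> float:
    base = values.iloc[0]      # first element in the window
    current = values.iloc[-1]  # last element in the window
    return (current - base) / base if base else 0

df = pd.DataFrame({
    "security": ["IBM"] * 3 + ["AAPL"] * 4,
    "date": ["1/1/2016", "1/2/2016", "1/3/2016",
             "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016"],
    "price": [100.0, 102.0, 108.0, 1000.0, 980.0, 1050.0, 1070.0],
})

# Each window holds the previous and the current price of one security,
# so the result is the change from one row to the next within a group.
step_changes = (
    df.groupby("security")["price"]
      .rolling(window=2, min_periods=2)
      .apply(get_variation)
      .droplevel(0)  # drop the 'security' level to get back the original index
)
df = df.assign(step_change=step_changes)

print(df)
```

With window=2 this should give the same numbers as df.groupby('security')['price'].pct_change(); the rolling form is mainly useful if you later want a wider window.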

Step by step

1. Defining your own variation function

Your variation function should accept a pd.Series as argument and use the first and last items in the collection to calculate the variation. Here, I'm using a standard variation function used in finance to calculate interest rates. The last line has an if/else expression that returns 0 when the base is 0, to avoid dividing by zero.

def get_variation(values: pd.Series) -> np.float64:
    base = values.iloc[0]
    current = values.iloc[-1]

    return (current - base) / base if base else 0

2. Grouping pd.DataFrame by security column

Next, you should call .groupby('security') to group pd.DataFrame by security column, in order to prepare data for group calculations.

grouped_df = df.groupby('security')

           security      date  price  change
security
AAPL     3     AAPL  1/1/2016   1000     NaN
         4     AAPL  1/2/2016    980   -0.02
         5     AAPL  1/3/2016   1050    0.05
         6     AAPL  1/4/2016   1070    0.07
IBM      0      IBM  1/1/2016    100     NaN
         1      IBM  1/2/2016    102    0.02
         2      IBM  1/3/2016    108    0.08

3. Using an expanding window operation on the resulting price column

Next, you should use an expanding window operation by calling .expanding(min_periods=2) on the price groups. This iterates over each price group and gives you all the data up to the current iteration as a pd.Series. You call .expanding(min_periods=n) to set n as the minimum number of observations required to return a value on each iteration (otherwise the result is NaN). In your case, given that you want NaN on the first record because it is not compared to anything else, you should pass min_periods=2; if you would rather have 0 there instead, pass min_periods=1.

windows = grouped_df['price'].expanding(min_periods=2)
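
To make the effect of min_periods concrete, here is a small sketch on a single price series (the IBM prices from the question, as floats). The helper first_to_last is a simplified stand-in for get_variation without the zero check, used only for this illustration.

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 108.0])

def first_to_last(values: pd.Series) -> float:
    # change from the first to the last element of the current window
    return (values.iloc[-1] - values.iloc[0]) / values.iloc[0]

# Each expanding window contains all rows up to the current one
window_sizes = prices.expanding().apply(len)

# min_periods=2: the first window has only one observation, so it yields NaN
with_nan = prices.expanding(min_periods=2).apply(first_to_last)

# min_periods=1: the first window is compared with itself, so it yields 0
with_zero = prices.expanding(min_periods=1).apply(first_to_last)

print(window_sizes.tolist())
print(with_nan.tolist())
print(with_zero.tolist())
```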

4. Calling apply on the expanding window operation and passing your custom variation function defined in step 1 as argument

Calling .apply(get_variation) will apply your custom variation formula to each resulting window and return the result.

grouped_variations = windows.apply(get_variation)

security
AAPL      3     NaN
          4   -0.02
          5    0.05
          6    0.07
IBM       0     NaN
          1    0.02
          2    0.08

5. Dropping outer index of resulting series

As you can see in step 4, the data comes back with a multi-index. We get rid of the outer index level ('AAPL', 'IBM') by calling .droplevel(0), so that the result can be merged correctly into the original dataframe.

variations = grouped_variations.droplevel(0)

6. Assigning previous result to original pd.DataFrame

Finally, we assign the price variations into the original dataframe by calling df.assign. Data will be joined into the destination on its index.

df = df.assign(change=variations)

  security      date  price  change
0      IBM  1/1/2016    100     NaN
1      IBM  1/2/2016    102    0.02
2      IBM  1/3/2016    108    0.08
3     AAPL  1/1/2016   1000     NaN
4     AAPL  1/2/2016    980   -0.02
5     AAPL  1/3/2016   1050    0.05
6     AAPL  1/4/2016   1070    0.07
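
The join on index is what makes this work even though the grouped result lists AAPL before IBM. A tiny sketch with made-up numbers (not from the question) showing that assign matches rows by index label and not by position:

```python
import pandas as pd

df = pd.DataFrame({"price": [100.0, 102.0, 1000.0]}, index=[0, 1, 2])

# Values deliberately listed in a different row order, as after a groupby
variations = pd.Series([0.5, 0.0, 0.02], index=[2, 0, 1])

# assign aligns on index labels: label 0 gets 0.0, label 1 gets 0.02, label 2 gets 0.5
df = df.assign(change=variations)

print(df)
```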
Appose answered 5/3, 2022 at 5:21 Comment(1)
My only regret is that I have but one upvote to give for this very thorough explanation. - Corder
T
2

This works, assuming you're already ordered by date within each possible grouping.

def pct_change(df):
    # percent change of each price relative to the first price in the group
    df['pct'] = 100 * (df.price / df.iloc[0].price - 1)
    return df

df.groupby('security').apply(pct_change)
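
If the rows are not already ordered by date, a sketch along these lines might help. It assumes the dates are month/day/year strings (the question does not say), sorts within each security, and then uses transform('first') to broadcast the base price. The shuffled sample rows are my own rearrangement of the question's data.

```python
import pandas as pd

# Same data as in the question, but deliberately out of order
df = pd.DataFrame({
    "security": ["IBM", "AAPL", "IBM", "AAPL", "IBM", "AAPL", "AAPL"],
    "date": ["1/3/2016", "1/2/2016", "1/1/2016", "1/4/2016",
             "1/2/2016", "1/1/2016", "1/3/2016"],
    "price": [108.0, 980.0, 100.0, 1070.0, 102.0, 1000.0, 1050.0],
})

# Assumption: dates are month/day/year
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
df = df.sort_values(["security", "date"]).reset_index(drop=True)

# First (earliest) price of each security, repeated for every row of the group
base = df.groupby("security")["price"].transform("first")
df["pct"] = 100 * (df["price"] / base - 1)

print(df)
```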
Torus answered 29/1, 2016 at 17:47 Comment(0)
G
0

I had the same problem, but solved it this way:

(The only difference was that each company was a column in my data rather than a set of rows.)

For each column of my dataframe I did:

df[column] = df[column].pct_change().cumsum()

pct_change() calculates the change between the current value and the previous one, and cumsum() adds it all together.

Gramnegative answered 20/10, 2019 at 13:4 Comment(2)
x.pct_change().cumsum() for a series of 1000, 500, 300 would be equal to [-0.5, -0.9], which is obviously incorrect. cumsum adds up percentages of the previous prices, not of the first price. - Epigraphic
But to continue on that idea, for some cases x.pct_change().cumprod().cumsum() might work (you'll have to set the initial value to zero and add one to get the cumulative change). - Epigraphic
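
To illustrate the point of these comments with the same 1000, 500, 300 series: period-over-period changes have to be compounded, not summed, to recover the change relative to the first value. The sketch below uses (1 + pct_change).cumprod() - 1, which is the usual compounding identity and not necessarily the exact expression the commenter had in mind.

```python
import pandas as pd

x = pd.Series([1000.0, 500.0, 300.0])

daily = x.pct_change()                   # change vs. the previous value
summed = daily.cumsum()                  # adds percentages of different bases
compounded = (1 + daily).cumprod() - 1   # compounds them instead
direct = x / x.iloc[0] - 1               # change vs. the first value, computed directly

print(summed.tolist())
print(compounded.tolist())
print(direct.tolist())
```

The last value of summed is about -0.9, while compounded and direct both end at about -0.7, i.e. 300 is 70% below 1000.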
