How to calculate percent change compared to the beginning value using pandas?

Asked 29/1, 2016 at 17:28 Answered 5/3, 2022 at 5:21

python pandas dataframe percentage

I have a DataFrame and need to calculate, for each company, the percent change compared to the beginning of the year. Is there any way to use pct_change() or another method to perform this task? Thanks!

df looks like

security    date        price
IBM         1/1/2016    100
IBM         1/2/2016    102
IBM         1/3/2016    108
AAPL        1/1/2016    1000
AAPL        1/2/2016    980
AAPL        1/3/2016    1050
AAPL        1/4/2016    1070

results I want

security    date        price   change
IBM         1/1/2016    100     NA
IBM         1/2/2016    102     2%
IBM         1/3/2016    108     8%
AAPL        1/1/2016    1000    NA
AAPL        1/2/2016    980     -2%
AAPL        1/3/2016    1050    5%
AAPL        1/4/2016    1070    7%

Gusti asked 29/1, 2016 at 17:28

First split your data frame and then use pct_change() to calculate the percent change for each date. – Skater 29/1, 2016 at 17:36

If I do df.groupby('security')['price'].pct_change(), that gives me the percent change compared to the previous day. However, what I need is the percent change compared to 100 and 1000 (each security's first price). Any ideas? – Gusti 29/1, 2016 at 17:41

Sounds like you are looking for an expanding_window version of pct_change(). This doesn't exist out of the box AFAIK, but you could roll your own:

df.groupby('security')['price'].apply(lambda x: x.div(x.iloc[0]).subtract(1).mul(100))
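A sketch of the same computation using transform instead of apply (a swapped-in technique, not the answer's own code). transform returns a result aligned with the original rows, so it can be assigned straight back as a column, and transform("first") broadcasts each group's first price. It assumes the rows are already in date order within each security; the sample df is rebuilt from the question, and first_price is an illustrative name.

```python
import pandas as pd

# Sample data rebuilt from the question (dates kept as strings)
df = pd.DataFrame({
    "security": ["IBM", "IBM", "IBM", "AAPL", "AAPL", "AAPL", "AAPL"],
    "date": ["1/1/2016", "1/2/2016", "1/3/2016",
             "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016"],
    "price": [100, 102, 108, 1000, 980, 1050, 1070],
})

# First price of each security, broadcast back to every row of that security
first_price = df.groupby("security")["price"].transform("first")

# Percent change relative to the first price (0 on each security's first row)
df["change"] = (df["price"] / first_price - 1) * 100

# Optional: NaN instead of 0 on each security's first row, as in the desired output
df["change"] = df["change"].mask(df.groupby("security").cumcount() == 0)
print(df)
```

Like the one-liner, this yields percentages (2, 8, -2, 5, 7); drop the `* 100` to get fractions instead.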

Coumarin answered 29/1, 2016 at 17:46

I like your one-liner! I think this "compared to the beginning" option could be a nice feature to have for pct_change() – Gusti 29/1, 2016 at 18:5

Either that or adding an expanding_pct_change() to the expanding window functions mentioned in the docs. There is also expanding_apply, by the way, see Cookbook: pandas.pydata.org/pandas-docs/stable/cookbook.html – Coumarin 29/1, 2016 at 18:30

Late to the party, but I faced a similar problem and I'd like to share my solution with you, in case it's helpful for anybody.

TL;DR

import numpy as np
import pandas as pd

def get_variation(values: pd.Series) -> np.float64:
    base = values.iloc[0]  # first element in window iteration
    current = values.iloc[-1]  # last element in window iteration

    return (current - base) / base if base else 0  # guard against a zero base price

variations = df.groupby('security')['price'].expanding(min_periods=2).apply(get_variation)
df = df.assign(change=variations.droplevel(0))

	security	date	price	change
0	IBM	1/1/2016	100	NaN
1	IBM	1/2/2016	102	0.02
2	IBM	1/3/2016	108	0.08
3	AAPL	1/1/2016	1000	NaN
4	AAPL	1/2/2016	980	-0.02
5	AAPL	1/3/2016	1050	0.05
6	AAPL	1/4/2016	1070	0.07

You can do what you are asking by:

1. defining your own variation function,
2. grouping the pd.DataFrame by the security column,
3. using an expanding window operation on the resulting price column,
4. calling apply on the expanding window operation and passing your custom variation function defined in step 1 as an argument,
5. dropping the outer index level of the resulting series,
6. assigning the previous result to the original pd.DataFrame.

Optionally, you can replace the expanding window operation in step 3 with a rolling window operation by calling .rolling(window=2, min_periods=2) to get the step-by-step (previous-row) variation of each security's price. window=2 sets the window size so that each iteration sees two elements, and min_periods=2 sets the minimum number of observations needed for a calculation (otherwise the result is NaN).
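The rolling variant described above might look like this. A minimal sketch, assuming the sample prices from the question (dates omitted) and a copy of get_variation from the TL;DR; the step_change and pct_change column names are illustrative. With a two-row window, the result should coincide with the built-in groupby(...).pct_change().

```python
import pandas as pd

# Sample prices rebuilt from the question
df = pd.DataFrame({
    "security": ["IBM", "IBM", "IBM", "AAPL", "AAPL", "AAPL", "AAPL"],
    "price": [100, 102, 108, 1000, 980, 1050, 1070],
})

def get_variation(values: pd.Series) -> float:
    # Relative change from the first to the last value in the window
    base = values.iloc[0]
    current = values.iloc[-1]
    return (current - base) / base if base else 0

# A window of two rows compares each price with the previous one
step_changes = (
    df.groupby("security")["price"]
    .rolling(window=2, min_periods=2)
    .apply(get_variation)
    .droplevel(0)  # drop the security level so labels match df's index
)
df = df.assign(step_change=step_changes)

# Built-in previous-row change, for comparison
df["pct_change"] = df.groupby("security")["price"].pct_change()
print(df)
```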

Step by step

1. Defining your own variation function

Your variation function should accept a pd.Series as its argument and use the first and last items in the window to calculate the variation. Here, I'm using the standard relative-change (simple return) formula used in finance, (current - base) / base. The conditional expression on the last line returns 0 when the base price is 0, to avoid dividing by zero.

def get_variation(values: pd.Series) -> np.float64:
    base = values.iloc[0]
    current = values.iloc[-1]

    return (current - base) / base if base else 0
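The function can be checked on its own with hand-built windows before plugging it into pandas. The windows below are illustrative: the first mimics what the expanding operation would pass for AAPL's third row, and the second exercises the zero-base guard.

```python
import pandas as pd

def get_variation(values: pd.Series) -> float:
    base = values.iloc[0]      # first element in the window
    current = values.iloc[-1]  # last element in the window
    return (current - base) / base if base else 0

# Window for AAPL's third row: 1000, 980, 1050 -> compares 1050 with 1000
print(get_variation(pd.Series([1000, 980, 1050])))  # 0.05

# A zero base price hits the guard and returns 0 instead of dividing by zero
print(get_variation(pd.Series([0, 5])))  # 0
```

Only the first and last values matter; the intermediate 980 does not affect the result.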

2. Grouping `pd.DataFrame` by `security` column

Next, you should call .groupby('security') to group pd.DataFrame by security column, in order to prepare data for group calculations.

grouped_df = df.groupby('security')

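Printing grouped_df itself only shows a DataFrameGroupBy object; conceptually, the rows are split into one group per security (the change column does not exist yet at this step):

		security	date	price
security
AAPL	3	AAPL	1/1/2016	1000
	4	AAPL	1/2/2016	980
	5	AAPL	1/3/2016	1050
	6	AAPL	1/4/2016	1070
IBM	0	IBM	1/1/2016	100
	1	IBM	1/2/2016	102
	2	IBM	1/3/2016	108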
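To look at the groups directly, you can iterate over the GroupBy object or pull one group out by its key. A short sketch on the question's data (rebuilt here); the ibm name is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "security": ["IBM", "IBM", "IBM", "AAPL", "AAPL", "AAPL", "AAPL"],
    "date": ["1/1/2016", "1/2/2016", "1/3/2016",
             "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016"],
    "price": [100, 102, 108, 1000, 980, 1050, 1070],
})
grouped_df = df.groupby("security")

# Iterating yields (group key, sub-DataFrame) pairs, in sorted key order by default
for name, group in grouped_df:
    print(name)
    print(group)

# A single group can be pulled out by its key; the original row labels are kept
ibm = grouped_df.get_group("IBM")
print(ibm.index.tolist())  # [0, 1, 2]
```

Keeping the original row labels is what later lets the results be assigned back to df by index.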

3. Using an expanding window operation on the resulting `price` column

Next, you should use an expanding window operation by calling .expanding(min_periods=2) on the price groups. This iterates over each price group and, at each row, gives you all data from the start of the group up to that row as a pd.Series. Calling .expanding(min_periods=n) sets n as the minimum number of observations required to return a value for a window (otherwise the result is NaN). In your case, since you want NaN on the first record (it is not compared to anything), you should pass min_periods=2; if you would rather have 0 as the result instead, pass min_periods=1.

windows = grouped_df['price'].expanding(min_periods=2)
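To make the expanding windows concrete, here is a sketch on IBM's prices alone. The list comprehension spells out by hand what each window contains; applying len to the expanding object then shows the window sizes, with min_periods=2 turning the one-element window into NaN. The prices and windows names are illustrative.

```python
import pandas as pd

# IBM prices from the question
prices = pd.Series([100, 102, 108])

# Each expanding window holds every value from the start up to the current row
windows = [prices.iloc[: i + 1].tolist() for i in range(len(prices))]
print(windows)  # [[100], [100, 102], [100, 102, 108]]

# raw=True passes each window as a NumPy array; len() reports its size
sizes = prices.expanding(min_periods=2).apply(len, raw=True)
print(sizes.tolist())  # [nan, 2.0, 3.0]
```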

4. Calling `apply` on the expanding window operation and passing your custom variation function defined in step 1 as argument

Calling .apply(get_variation) will apply your custom variation formula to each resulting window and return the result.

grouped_variations = windows.apply(get_variation)

security
AAPL	3	NaN
	4	-0.02
	5	0.05
	6	0.07
IBM	0	NaN
	1	0.02
	2	0.08
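Expanding.apply also accepts raw=True, which passes each window as a NumPy ndarray instead of a pd.Series. This avoids building a Series for every window and may be faster on large frames, but the function must then use plain positional indexing. A sketch with a hypothetical get_variation_raw adapted to arrays, on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "security": ["IBM", "IBM", "IBM", "AAPL", "AAPL", "AAPL", "AAPL"],
    "price": [100, 102, 108, 1000, 980, 1050, 1070],
})

def get_variation_raw(values: np.ndarray) -> float:
    # With raw=True each window arrives as a NumPy array, so index with [] not .iloc
    base = values[0]
    current = values[-1]
    return (current - base) / base if base else 0.0

grouped_variations = (
    df.groupby("security")["price"]
    .expanding(min_periods=2)
    .apply(get_variation_raw, raw=True)
)
print(grouped_variations)
```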

5. Dropping outer index of resulting series

As you can see in step 4, the data comes back with a MultiIndex (security, original row label). We get rid of the outer index level ('AAPL', 'IBM') by calling .droplevel(0), so that the remaining labels match the original dataframe's index and the data can be merged into it correctly.

variations = grouped_variations.droplevel(0)
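The effect of droplevel(0) can be seen on a small hand-built Series with the same (security, row label) index shape as grouped_variations; the values are taken from the output of step 4. Series.reset_index(level=0, drop=True) is an equivalent spelling.

```python
import pandas as pd

# Hand-built stand-in for grouped_variations: outer level = security, inner = row label
grouped_variations = pd.Series(
    [float("nan"), -0.02, float("nan"), 0.02],
    index=pd.MultiIndex.from_tuples(
        [("AAPL", 3), ("AAPL", 4), ("IBM", 0), ("IBM", 1)],
        names=["security", None],
    ),
)

# Drop the outer (security) level, keeping only the original row labels
variations = grouped_variations.droplevel(0)
print(variations.index.tolist())  # [3, 4, 0, 1]
```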

6. Assigning previous result to original `pd.DataFrame`

Finally, we assign the price variations to the original dataframe by calling df.assign. The new column is aligned with the destination rows by index label, not by position.

df = df.assign(change=variations)

	security	date	price	change
0	IBM	1/1/2016	100	NaN
1	IBM	1/2/2016	102	0.02
2	IBM	1/3/2016	108	0.08
3	AAPL	1/1/2016	1000	NaN
4	AAPL	1/2/2016	980	-0.02
5	AAPL	1/3/2016	1050	0.05
6	AAPL	1/4/2016	1070	0.07
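Because assign aligns on index labels, the order of the variations Series does not matter (step 4 returned AAPL rows first), and labels missing from it become NaN. A tiny illustrative sketch with made-up labels:

```python
import pandas as pd

df = pd.DataFrame({"security": ["IBM", "IBM", "AAPL"], "price": [100, 102, 1000]})

# Rows out of order (AAPL first) and label 0 missing entirely
variations = pd.Series([float("nan"), 0.02], index=[2, 1])

# assign matches on index labels, not on position
df = df.assign(change=variations)
print(df)
```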

Appose answered 5/3, 2022 at 5:21

My only regret is that I have but one upvote to give for this very thorough explanation – Corder 23/9, 2022 at 15:15

This works, assuming the rows are already sorted by date within each group.

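def pct_change(group):
    # percent change of each price relative to the group's first price
    return group.assign(pct=100 * (group['price'] / group['price'].iloc[0] - 1))

df.groupby('security', group_keys=False).apply(pct_change)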
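Since this answer depends on the rows being in date order, one way to guarantee that is to parse and sort the dates before grouping. A sketch that uses transform("first") in place of apply, on the question's rows deliberately shuffled. It assumes the dates are month/day/year strings, which the question does not state.

```python
import pandas as pd

# The question's rows, shuffled on purpose
df = pd.DataFrame({
    "security": ["AAPL", "IBM", "AAPL", "IBM", "IBM", "AAPL", "AAPL"],
    "date": ["1/3/2016", "1/2/2016", "1/1/2016", "1/1/2016",
             "1/3/2016", "1/4/2016", "1/2/2016"],
    "price": [1050, 102, 1000, 100, 108, 1070, 980],
})

# Assumption: dates are month/day/year
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
df = df.sort_values(["security", "date"])

# After sorting, "first" within each group is the earliest date
first_price = df.groupby("security")["price"].transform("first")
df["pct"] = 100 * (df["price"] / first_price - 1)
print(df)
```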

Torus answered 29/1, 2016 at 17:47

I had the same problem, but solved it this way:

(The only difference was that each company was a column rather than a row.)

For each column of my dataframe I did:

df[column] = df[column].pct_change().cumsum()

pct_change() calculates the change between the current and the previous value, and cumsum() adds them all together.

Gramnegative answered 20/10, 2019 at 13:4

x.pct_change().cumsum() for a series of 1000, 500, 300 would give [-0.5, -0.9] (after the leading NaN), which is obviously incorrect: cumsum adds up percentage changes relative to each previous price, not to the first price. – Epigraphic 15/2, 2021 at 2:32

but to continue on that idea, (x.pct_change().fillna(0) + 1).cumprod() - 1 gives the cumulative change: set the initial value to zero and add one before taking the cumulative product, then subtract one. – Epigraphic 16/2, 2021 at 15:49
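The difference between summing and compounding step changes can be checked on the 1000, 500, 300 series from the comment above. A short sketch: compounding the previous-row changes reproduces a direct comparison with the first value, while summing them does not.

```python
import pandas as pd

x = pd.Series([1000, 500, 300])

step = x.pct_change()  # NaN, -0.5, -0.4

# Summing step changes over-counts: -0.5 + -0.4 = -0.9
summed = step.cumsum()

# Compounding is correct: (1 - 0.5) * (1 - 0.4) - 1 = -0.7
compounded = (step.fillna(0) + 1).cumprod() - 1

# Same as comparing every value with the first one directly
direct = x / x.iloc[0] - 1
print(pd.DataFrame({"summed": summed, "compounded": compounded, "direct": direct}))
```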
