How to calculate vwap (volume weighted average price) using groupby and apply?
Asked Answered
D

3

10

I have read multiple post similar to my question, but I still can't figure it out. I have a pandas df that looks like the following (for multiple days):

Out[1]: 
                     price  quantity
time                                
2016-06-08 09:00:22  32.30    1960.0
2016-06-08 09:00:22  32.30     142.0
2016-06-08 09:00:22  32.30    3857.0
2016-06-08 09:00:22  32.30    1000.0
2016-06-08 09:00:22  32.35     991.0
2016-06-08 09:00:22  32.30     447.0
...

To calculate the vwap I could do:

df['vwap'] = (np.cumsum(df.quantity * df.price) / np.cumsum(df.quantity))

However, I would like to start over every day (groupby), but I can't figure out how to make it work with a (lambda?) function.

df['vwap_day'] = df.groupby(df.index.date)['vwap'].apply(lambda ...

Speed is of essence. Would appreciate any help:)

Dioxide answered 30/6, 2017 at 20:39 Comment(3)
This is not the correct formula for VWAP. you need to calculate the typical price (Average if Hi, Lo, Close). Your price data should have this.Annabelannabela
@Annabelannabela So please contribute and add your answer with code...Dioxide
Not sure I understand, the comment was about the calculation formula and the need for more data (the high and low of the period, the average then goes into your calc); the code is trivial, I assume you know how to calculate the average of 3 columns.Annabelannabela
U
29

Option 0
plain vanilla approach

def vwap(df):
    q = df.quantity.values
    p = df.price.values
    return df.assign(vwap=(p * q).cumsum() / q.cumsum())

df = df.groupby(df.index.date, group_keys=False).apply(vwap)
df

                     price  quantity       vwap
time                                           
2016-06-08 09:00:22  32.30    1960.0  32.300000
2016-06-08 09:00:22  32.30     142.0  32.300000
2016-06-08 09:00:22  32.30    3857.0  32.300000
2016-06-08 09:00:22  32.30    1000.0  32.300000
2016-06-08 09:00:22  32.35     991.0  32.306233
2016-06-08 09:00:22  32.30     447.0  32.305901

Option 1
Throwing in a little eval

df = df.assign(
    vwap=df.eval(
        'wgtd = price * quantity', inplace=False
    ).groupby(df.index.date).cumsum().eval('wgtd / quantity')
)
df

                     price  quantity       vwap
time                                           
2016-06-08 09:00:22  32.30    1960.0  32.300000
2016-06-08 09:00:22  32.30     142.0  32.300000
2016-06-08 09:00:22  32.30    3857.0  32.300000
2016-06-08 09:00:22  32.30    1000.0  32.300000
2016-06-08 09:00:22  32.35     991.0  32.306233
2016-06-08 09:00:22  32.30     447.0  32.305901
Unsettled answered 30/6, 2017 at 20:57 Comment(7)
Fancy play syndromeMagner
@TedPetrou guiltyUnsettled
@Unsettled When I run Option 0 as above, it works fine. However I need a new column in my df, so I did df['vwap'] = df.groupby(df.index.date, group_keys=False).apply(vwap) That throws me the following error: ValueError: Wrong number of items passed 7, placement implies 1, why is that?Dioxide
What I'm returning is a new dataframe with the column included. You want to reassign the whole thing to df. Not df['vwap']Unsettled
A-ha! Working like a charm now, tnx vm. I'll stick to option 0, as even after reading up on it it, I'm not at all clear on what eval realy does..Dioxide
Works as desired, wanted to ask whether we can further find vwap for week or monthDevy
Hello, do you know why this doesn't work when 1) using the command "resample"? and could you please help me how to use this exact command and group the data per MINUTE with the time column of the form 2016-06-08 09:00:22? So group the data by minute and then apply the vwap? I'm stuck hereParaprofessional
B
9

I also used this method before but it's not working quite accurately if you're trying to limit the window period. Instead I found the TA python library to work really well: https://technical-analysis-library-in-python.readthedocs.io/en/latest/index.html

from ta.volume import VolumeWeightedAveragePrice

# ...
def vwap(dataframe, label='vwap', window=3, fillna=True):
        dataframe[label] = VolumeWeightedAveragePrice(high=dataframe['high'], low=dataframe['low'], close=dataframe["close"], volume=dataframe['volume'], window=window, fillna=fillna).volume_weighted_average_price()
        return dataframe
Berrie answered 28/3, 2021 at 19:30 Comment(0)
W
0

I used HLC3 method for vwap. The formula works for me. This is upgraded to HLC3 rather than close which was shared by someone on this platform.

import yfinance as yf
data = yf.download('AAPL', start='2020-01-01', end='2024-08-15',interval = '1d')

def vwap(df):
    # Calculate HLC3 (average of High, Low, and Close)
    hlc3 = (df['High'] + df['Low'] + df['Adj Close']) / 3
    
    q = df['Volume'].values   # Use 'Volume' column for quantity
    p = hlc3.values  # Use HLC3 for price
    
    # VWAP calculation using HLC3
    vwap = (p * q).cumsum() / q.cumsum()
    
    # Assign the calculated VWAP as a new column
    return df.assign(VWAP=vwap)

# Apply the VWAP calculation to the data
data = data.groupby(data.index.date, group_keys=False).apply(vwap)

data.head()
Workbook answered 15/8, 2024 at 9:38 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.