How to calculate vwap (volume weighted average price) using groupby and apply?

Asked 30/6, 2017 at 20:39 Answered 15/8, 2024 at 9:38

Solved python pandas lambda pandas-groupby

I have read multiple posts similar to my question, but I still can't figure it out. I have a pandas df that looks like the following (for multiple days):

Out[1]: 
                     price  quantity
time                                
2016-06-08 09:00:22  32.30    1960.0
2016-06-08 09:00:22  32.30     142.0
2016-06-08 09:00:22  32.30    3857.0
2016-06-08 09:00:22  32.30    1000.0
2016-06-08 09:00:22  32.35     991.0
2016-06-08 09:00:22  32.30     447.0
...

To calculate the vwap I could do:

df['vwap'] = (np.cumsum(df.quantity * df.price) / np.cumsum(df.quantity))

However, I would like to start over every day (groupby), but I can't figure out how to make it work with a (lambda?) function.

df['vwap_day'] = df.groupby(df.index.date)['vwap'].apply(lambda ...

Speed is of the essence. Would appreciate any help :)

Dioxide answered 30/6, 2017 at 20:39 Comment(3)

This is not the correct formula for VWAP. You need to calculate the typical price (average of High, Low, Close). Your price data should have this. – Annabelannabela 7/5, 2021 at 1:25

@Annabelannabela So please contribute and add your answer with code... – Dioxide 7/5, 2021 at 7:5

Not sure I understand, the comment was about the calculation formula and the need for more data (the high and low of the period, the average then goes into your calc); the code is trivial, I assume you know how to calculate the average of 3 columns. – Annabelannabela 8/5, 2021 at 12:17
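For readers who want the code behind the comments above, here is a minimal sketch of the typical-price variant. It assumes OHLCV bar data with columns named high, low, close and volume (hypothetical names and values, not from the question). Note that for trade-level data like the question's, price * quantity already weights each trade by its own price; the typical price is an approximation used when only bars are available.

```python
import pandas as pd

# Hypothetical OHLCV bars; column names and values are assumptions for illustration.
bars = pd.DataFrame(
    {
        "high": [10.0, 11.0, 12.0],
        "low": [8.0, 9.0, 10.0],
        "close": [9.0, 10.0, 11.0],
        "volume": [100.0, 200.0, 100.0],
    },
    index=pd.to_datetime(
        ["2016-06-08 09:00", "2016-06-08 09:01", "2016-06-09 09:00"]
    ),
)

# Typical price: row-wise mean of high, low and close.
bars["typical"] = bars[["high", "low", "close"]].mean(axis=1)

# Cumulative VWAP on the typical price, restarting on each calendar day.
day = bars.index.normalize()
pv = (bars["typical"] * bars["volume"]).groupby(day).cumsum()
bars["vwap"] = pv / bars["volume"].groupby(day).cumsum()
```

The last row falls on a new day, so its VWAP restarts at that bar's own typical price.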

Option 0
plain vanilla approach

def vwap(df):
    q = df.quantity.values
    p = df.price.values
    return df.assign(vwap=(p * q).cumsum() / q.cumsum())

df = df.groupby(df.index.date, group_keys=False).apply(vwap)
df

                     price  quantity       vwap
time                                           
2016-06-08 09:00:22  32.30    1960.0  32.300000
2016-06-08 09:00:22  32.30     142.0  32.300000
2016-06-08 09:00:22  32.30    3857.0  32.300000
2016-06-08 09:00:22  32.30    1000.0  32.300000
2016-06-08 09:00:22  32.35     991.0  32.306233
2016-06-08 09:00:22  32.30     447.0  32.305901

Option 1
Throwing in a little eval

df = df.assign(
    vwap=df.eval(
        'wgtd = price * quantity', inplace=False
    ).groupby(df.index.date).cumsum().eval('wgtd / quantity')
)
df

                     price  quantity       vwap
time                                           
2016-06-08 09:00:22  32.30    1960.0  32.300000
2016-06-08 09:00:22  32.30     142.0  32.300000
2016-06-08 09:00:22  32.30    3857.0  32.300000
2016-06-08 09:00:22  32.30    1000.0  32.300000
2016-06-08 09:00:22  32.35     991.0  32.306233
2016-06-08 09:00:22  32.30     447.0  32.305901
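If the eval chain in Option 1 is hard to follow, the same three steps can be spelled out without eval. This is a sketch of my reading of it, not the answerer's code: it reuses the six rows from the question and adds one made-up row on the following day to show the reset. It also writes only the single new column instead of rebuilding the whole frame.

```python
import pandas as pd

# Six rows from the question plus one hypothetical row on the next day.
idx = pd.DatetimeIndex(
    ["2016-06-08 09:00:22"] * 6 + ["2016-06-09 09:00:05"], name="time"
)
df = pd.DataFrame(
    {
        "price": [32.30, 32.30, 32.30, 32.30, 32.35, 32.30, 33.00],
        "quantity": [1960.0, 142.0, 3857.0, 1000.0, 991.0, 447.0, 500.0],
    },
    index=idx,
)

# Step 1: what eval('wgtd = price * quantity') adds, as a standalone Series.
wgtd = df["price"] * df["quantity"]

# Step 2: running sums that restart at each new calendar date.
day = df.index.date
cum_wgtd = wgtd.groupby(day).cumsum()
cum_qty = df["quantity"].groupby(day).cumsum()

# Step 3: what eval('wgtd / quantity') computes on the cumulated values.
df["vwap"] = cum_wgtd / cum_qty
```

The first six values match the output shown above, and the seventh row starts over at 33.0 because it belongs to a new date.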

Unsettled answered 30/6, 2017 at 20:57 Comment(7)

Fancy play syndrome – Magner 30/6, 2017 at 20:59

@TedPetrou guilty – Unsettled 30/6, 2017 at 21:2

@Unsettled When I run Option 0 as above, it works fine. However I need a new column in my df, so I did df['vwap'] = df.groupby(df.index.date, group_keys=False).apply(vwap) That throws me the following error: ValueError: Wrong number of items passed 7, placement implies 1, why is that? – Dioxide 1/7, 2017 at 8:7

What I'm returning is a new dataframe with the column included. You want to reassign the whole thing to df. Not df['vwap'] – Unsettled 1/7, 2017 at 10:52

A-ha! Working like a charm now, thanks very much. I'll stick to Option 0, as even after reading up on it, I'm not at all clear on what eval really does. – Dioxide 1/7, 2017 at 12:51

Works as desired. Is it also possible to find the VWAP per week or per month? – Devy 16/7, 2021 at 11:16

Hello, do you know why this doesn't work when using "resample"? Could you also help me group the data per MINUTE, with a time column of the form 2016-06-08 09:00:22, and then apply the vwap to each group? I'm stuck here – Paraprofessional 10/6, 2022 at 13:44
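On the last two comments: one way to restart the VWAP per minute, week or month is to change only the grouping key. The sketch below is an assumption-laden illustration (grouped_vwap is a hypothetical helper and the tick values are made up). It also shows a one-value-per-minute variant, which is probably closer to what resample is geared toward, since resample aggregates each bin into a single row rather than producing a running column.

```python
import pandas as pd


def grouped_vwap(df, keys):
    """Cumulative VWAP that restarts whenever `keys` moves to a new group."""
    pv = (df["price"] * df["quantity"]).groupby(keys).cumsum()
    return pv / df["quantity"].groupby(keys).cumsum()


# Hypothetical ticks spanning two minutes, two weeks and two months.
df = pd.DataFrame(
    {"price": [10.0, 12.0, 20.0, 30.0], "quantity": [100.0, 100.0, 50.0, 10.0]},
    index=pd.to_datetime(
        [
            "2016-06-08 09:00:22",
            "2016-06-08 09:00:45",
            "2016-06-08 09:01:03",
            "2016-07-15 10:00:00",
        ]
    ),
)

# Running VWAP columns with different reset periods.
df["vwap_minute"] = grouped_vwap(df, df.index.floor("min"))
df["vwap_week"] = grouped_vwap(df, df.index.to_period("W"))
df["vwap_month"] = grouped_vwap(df, df.index.to_period("M"))

# One aggregated VWAP value per minute instead of a running column.
minute = df.index.floor("min")
pv = df["price"] * df["quantity"]
per_minute = pv.groupby(minute).sum() / df["quantity"].groupby(minute).sum()
```

Flooring the index keeps the key as plain timestamps, which tends to be cheaper than building date objects with df.index.date.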

I used this method before, but it isn't accurate when you want to limit the calculation to a fixed window period. The TA Python library worked much better for me: https://technical-analysis-library-in-python.readthedocs.io/en/latest/index.html

from ta.volume import VolumeWeightedAveragePrice

# ...
def vwap(dataframe, label='vwap', window=3, fillna=True):
    # Adds a windowed VWAP column (note: modifies `dataframe` in place)
    indicator = VolumeWeightedAveragePrice(
        high=dataframe['high'], low=dataframe['low'], close=dataframe['close'],
        volume=dataframe['volume'], window=window, fillna=fillna)
    dataframe[label] = indicator.volume_weighted_average_price()
    return dataframe
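For anyone who prefers not to add a dependency, a windowed VWAP can be sketched in plain pandas with rolling sums. This is my own approximation of the idea, not the TA library's code, so edge handling (for example the first rows before the window fills) may differ; rolling_vwap and the sample bars are hypothetical.

```python
import pandas as pd


def rolling_vwap(bars, window=3):
    """VWAP over the last `window` bars, based on the typical price (H+L+C)/3."""
    typical = (bars["high"] + bars["low"] + bars["close"]) / 3
    # min_periods=1 lets the first rows use however many bars are available.
    pv = (typical * bars["volume"]).rolling(window, min_periods=1).sum()
    vol = bars["volume"].rolling(window, min_periods=1).sum()
    return pv / vol


# Hypothetical bars; the typical prices work out to 10, 11, 12 and 13.
bars = pd.DataFrame(
    {
        "high": [11.0, 12.0, 13.0, 14.0],
        "low": [9.0, 10.0, 11.0, 12.0],
        "close": [10.0, 11.0, 12.0, 13.0],
        "volume": [100.0, 100.0, 200.0, 100.0],
    }
)
bars["vwap"] = rolling_vwap(bars, window=2)
```

Unlike the cumulative per-day version, each value here only looks back `window` rows, so old trades drop out of the average.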

Berrie answered 28/3, 2021 at 19:30 Comment(0)

I used the HLC3 method for VWAP, and the formula works for me. It builds on an answer shared earlier on this platform, upgraded to use HLC3 (the average of high, low and close) rather than the close alone.

import yfinance as yf
data = yf.download('AAPL', start='2020-01-01', end='2024-08-15', interval='1d')

def vwap(df):
    # Calculate HLC3 (average of High, Low, and Close)
    hlc3 = (df['High'] + df['Low'] + df['Adj Close']) / 3
    
    q = df['Volume'].values   # Use 'Volume' column for quantity
    p = hlc3.values  # Use HLC3 for price
    
    # VWAP calculation using HLC3
    vwap = (p * q).cumsum() / q.cumsum()
    
    # Assign the calculated VWAP as a new column
    return df.assign(VWAP=vwap)

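# Apply the VWAP calculation per calendar date.
# Note: with daily bars (interval='1d') each date holds a single row, so VWAP equals HLC3;
# use intraday bars to get a VWAP that accumulates over the day.
data = data.groupby(data.index.date, group_keys=False).apply(vwap)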

data.head()

Workbook answered 15/8, 2024 at 9:38 Comment(0)
