Dataframe cell to be locked and used for a running balance calculation (follow up question)

Asked 19/3, 2020 at 5:37 Answered 19/3, 2020 at 19:59

Solved python pandas dataframe pandas-groupby cumsum

(This is a follow up question to my previous question which was answered correctly).

Say I have the following dataframe

import pandas as pd
df = pd.DataFrame()

df['E'] = ('SIT','SCLOSE', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL','SHODL','SCLOSE_BUY','BCLOSE_SELL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL','BUY','SIT','SIT')

df['F'] = (0.00,1.00,10.00, 5.00,6.00,-6.00, 6.00, 2.00,10.00,10.00,-8.00,33.00,-15.00,6.00,-1.00,5.00,10.00,0.00,0.00,0.00)

df.loc[19, 'G'] = 100.0000

With column G starting at 100 the same rules apply per my previous question whereby if a BUY or SELL occurs on col E the corresponding balance on column G is locked and continually used as the base amount to calculate a running balance with col F being the % increase/decrease for each row in the running balance until a BCLOSE or SCLOSE on col E is shown.

I have explained the rules in the previous question however new to this question is if a SCLOSE_BUY is shown the SELL is closed and a BUY is opened and vise versa for a BCLOSE_SELL. A BCLOSE, SCLOSE, SCLOSE_BUY or BCLOSE_SELL row all become the final row for the running balance calculation and will be used as the base when a BUY or SELL is shown next

FYI the a successful response to my previous question was made by Andy L. as follows however this response cannot handle the new scenario when a BCLOSE_SELL and a SCLOSE_BUY occur after one another

df1 = df[::-1]
s = df1.B.isin(['BCLOSE','SCLOSE']).shift(fill_value=False).cumsum()
grps = df1.groupby(s)
init_val= 100
l = []
for _, grp in grps:
    s = grp.C * 0.01 * init_val
    s.iloc[0] = init_val
    s = s.cumsum()
    init_val = s.iloc[-1]
    l.append(s)

The above answer does not address the problem I encounter in real life whereby instead of occuring a BCLOSE I instead receive a BCLOSE_SELL which basically turns the BUY into a SELL (ie. I close the BUY and open a SELL) which becomes the base amount for the ongoing rows.

If the rows continued as SHODL's I am able to adjust the code so that the running balance is correctly calculated however if I subsequently receive a SCLOSE_BUY (as seen in row 9 of my dataframe) I need to make this row close the SELL and reopen a BUY and this row will also be the new base amount for my running balance.

I understand this all sounds confusing as such the below column added to my above dataframe is what the result should look like.

df['G'] = (191.62,191.62,190.19,175.89,168.74,160.16,168.74,160.16,157.3,143,130,138,105,120,114,115,110,100,100,100)

Tanbark answered 19/3, 2020 at 5:37 Comment(0)

I have a well-documented answer on a similar question posted here, however let me tweak it a little bit so that it can be applicable to the question you have just asked. Essentially, all you need to do is to add two new breakpoints at BCLOSE_SELL and SCLOSE_BUY in the following way:

df.index[df[type_col].isin(['BCLOSE', 'SCLOSE', 'BCLOSE_SELL', 'SCLOSE_BUY'])][::-1]

In the above line the type_col is the name of the column that specifies the action (e.g. SHOLD or BCLOSE), or in your case the column E.

You can find the complete and updated piece of code that works with both of your questions below:

# basic setup
type_col = 'E'  # the name of the action type column
change_col = 'F'  # the name of the delta change column
res_col = 'G'  # the name of the resulting column
value = 100  # you can specify any initial value here
PERCENTAGE_CONST = 100

endpoints = [df.first_valid_index(), df.last_valid_index()]
# occurrences of 'BCLOSE', 'SCLOSE', 'BCLOSE_SELL' and 'SCLOSE_BUY' that break the sequence
breakpoints = df.index[df[type_col].isin(['BCLOSE','SCLOSE', 'BCLOSE_SELL', 'SCLOSE_BUY'])][::-1]
# removes the endpoints of the dataframe that do not break the structure
breakpoints = breakpoints.drop(endpoints, errors='ignore')

for i in range(len(breakpoints) + 1):

    prv = breakpoints[i - 1] - 1 if i else -1  # previous or first breakpoint
    try:
        nex = breakpoints[i] - 1  # next breakpoint
    except IndexError:
        nex = None  # last breakpoint

    # cumulative sum of values adjusted for the percentage change appended to the resulting column
    res = value + (df[change_col][prv: nex: -1] * value / PERCENTAGE_CONST).cumsum()[::-1]
    df.loc[res.index, res_col] = res

    # saving the value that will be the basis for percentage calculations
    # for the next breakpoint
    value = res.iloc[0]

The produced output is line with your expected result:

>>> df
              E     F       G
0           SIT   0.0  191.62
1        SCLOSE   1.0  191.62
2         SHODL  10.0  190.19
3         SHODL   5.0  175.89
4         SHODL   6.0  168.74
5         SHODL  -6.0  160.16
6         SHODL   6.0  168.74
7         SHODL   2.0  160.16
8         SHODL  10.0  157.30
9    SCLOSE_BUY  10.0  143.00
10  BCLOSE_SELL  -8.0  130.00
11        BHODL  33.0  138.00
12        BHODL -15.0  105.00
13        BHODL   6.0  120.00
14        BHODL  -1.0  114.00
15        BHODL   5.0  115.00
16        BHODL  10.0  110.00
17          BUY   0.0  100.00
18          SIT   0.0  100.00
19          SIT   0.0  100.00

Infinitude answered 19/3, 2020 at 14:22 Comment(3)

Thank you. Your answer and Andy's answer both worked for me irl also. Is it possible to split a bounty? Do you know how to do this? – Tanbark 21/3, 2020 at 10:50

You're welcome :) Unfortunately, you cannot split one bounty. However, you can award one bounty first and then another on the same question. – Infinitude 21/3, 2020 at 10:57

Done. Sorry was busy with other things – Tanbark 27/3, 2020 at 13:9

You just need to change my solution on 2 places. Add 'BCLOSE_SELL', 'SCLOSE_BUY' to isin checking and change the assigment of init_val

df1 = df[::-1]
s = (df1.E.isin(['BCLOSE', 'SCLOSE', 'BCLOSE_SELL', 'SCLOSE_BUY'])
          .shift(fill_value=False).cumsum())  ###add 'BCLOSE_SELL', 'SCLOSE_BUY'
grps = df1.groupby(s)
init_val= 100
l = []
for _, grp in grps:
    s = grp.F * 0.01 * init_val
    s.iloc[0] += init_val    ###change here. Use `+=` instead of `=`
    s = s.cumsum()
    init_val = s.iloc[-1]
    l.append(s)

df['G'] = pd.concat(l)

Out[96]:
              E     F       G
0           SIT   0.0  191.62
1        SCLOSE   1.0  191.62
2         SHODL  10.0  190.19
3         SHODL   5.0  175.89
4         SHODL   6.0  168.74
5         SHODL  -6.0  160.16
6         SHODL   6.0  168.74
7         SHODL   2.0  160.16
8         SHODL  10.0  157.30
9    SCLOSE_BUY  10.0  143.00
10  BCLOSE_SELL  -8.0  130.00
11        BHODL  33.0  138.00
12        BHODL -15.0  105.00
13        BHODL   6.0  120.00
14        BHODL  -1.0  114.00
15        BHODL   5.0  115.00
16        BHODL  10.0  110.00
17          BUY   0.0  100.00
18          SIT   0.0  100.00
19          SIT   0.0  100.00

Quadroon answered 19/3, 2020 at 19:59 Comment(3)

Thank you. Your answer and Arn's answer both worked for me irl also. Is it possible to split a bounty? – Tanbark 21/3, 2020 at 10:50

As Am said, you may reward bounty twice, but not split one bounty. Otherwise, it is your prerogative to award bounty. Just choose the one you like. The last resort is flipping the coin :) – Quadroon 21/3, 2020 at 18:1

In this case I will choose to award the bounty to Arn as he did help with my original question and his current post also solves my problem as well as both answering the question at the same time. Although I could have spent infinite hours trying to figure it out I will admit the only thing I missed with your answer was adding the '+' to the 's.iloc[0] += init_val' line. Many thanks with your help, no doubt I will need it for future questions – Tanbark 22/3, 2020 at 6:11

Recommended topics

Hot tags