Dataframe cell to be locked and used for a running balance calculation conditional of result on another cell on same row

Asked 28/2, 2020 at 4:1 Answered 3/3, 2020 at 3:20

Solved python pandas dataframe

Say I have the following dataframe:

import pandas as pd
df = pd.DataFrame()

df['A'] = ('1/05/2019','2/05/2019','3/05/2019','4/05/2019','5/05/2019','6/05/2019','7/05/2019','8/05/2019','9/05/2019','10/05/2019','11/05/2019','12/05/2019','13/05/2019','14/05/2019','15/05/2019','16/05/2019','17/05/2019','18/05/2019','19/05/2019','20/05/2019')

df['B'] = ('SIT','SCLOSE', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SELL','SIT','SIT','BCLOSE', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL','BUY','SIT','SIT')

df['C'] = (0.00,1.00,10.00, 5.00,6.00,-6.00, 6.00, 0.00,0.00,0.00,-8.00,33.00,-15.00,6.00,-1.00,5.00,10.00,0.00,0.00,0.00)

df.loc[19, 'D'] = 100.0000

As can be seen I am starting column D with 100 at the last row.

I am trying to code a calculation for column D so starting from the bottom row (row 19) when a BUY or SELL is shown on column B then the number on column D is locked (eg the 100) and used for a calculation based on col C for each SHODL or BHODL until the row after a BCLOSE or an SCLOSE is shown.

The locked number is used to calculate a running balance based on the percentages that are in column C. As you can see on row 16 column C has '10' representing 10%. As 10% of 100 = 10 the new runnning balance is 110.

Row 15 column C has 5% as such 5 is added to the running balance to result in 115.

The next row 14 column C has a -1% change as such 1% of 100 is = 1 and therefore the new running balance is 114 and so on.

The following are the results that should be returned in col D of the dataframe once the right code is run

df['D'] = ('158.60','158.60', '157.30', '144.30', '137.80', '130.00', '137.80', '130.00','130.00','130.00','130.00', '138.00', '105.00', '120.00', '114.00', '115.00', '110.00','100.00','100.00','100.00')

This continues until after a SCLOSE or a BCLOSE is shown as a BCLOSE or SCLOSE row is the final row where the running balance is calculated.

As you can see this process is restarted when either a new BUY or SELL is shown.

Felder answered 28/2, 2020 at 4:1 Comment(7)

let me know if the above doesn't make sense or if you have any questions – Felder 28/2, 2020 at 5:7

The chronology of calculation you mentioned goes from row 19 to row 0. Is that right or should the DataFrame be reversed? – Toluene 28/2, 2020 at 5:43

Yeah mate thats how it is irl so may need to reverse df to perform calculation then reverse back to the state so it looks like col D – Felder 28/2, 2020 at 7:13

Also, is it necessary to use pandas? seems like a easy solution using iteration when using simple lists or dicts. – Toluene 28/2, 2020 at 9:40

Hi nishant I started a bounty so hope you can take another look – Felder 1/3, 2020 at 5:28

I only see "SIT", not "SELL"? – Demitria 1/3, 2020 at 17:57

Hi cs95, there is a 'SELL' on row 7 of my example dataframe – Felder 2/3, 2020 at 5:35

Next starting value depends on the last value of previous group, so I think it can't be vectorized. It requires some kind of iterative process. I came up with solution doing iteratively on groups of groupby. Reverse df and assign to df1. Working on each group of df1 and assign the final list of groups to the original df

df1 = df[::-1]
s = df1.B.isin(['BCLOSE','SCLOSE']).shift(fill_value=False).cumsum()
grps = df1.groupby(s)
init_val= 100
l = []
for _, grp in grps:
    s = grp.C * 0.01 * init_val
    s.iloc[0] = init_val
    s = s.cumsum()
    init_val = s.iloc[-1]
    l.append(s)

df['D'] = pd.concat(l)

Out[50]:
             A       B     C      D
0    1/05/2019     SIT   0.0  158.6
1    2/05/2019  SCLOSE   1.0  158.6
2    3/05/2019   SHODL  10.0  157.3
3    4/05/2019   SHODL   5.0  144.3
4    5/05/2019   SHODL   6.0  137.8
5    6/05/2019   SHODL  -6.0  130.0
6    7/05/2019   SHODL   6.0  137.8
7    8/05/2019    SELL   0.0  130.0
8    9/05/2019     SIT   0.0  130.0
9   10/05/2019     SIT   0.0  130.0
10  11/05/2019  BCLOSE  -8.0  130.0
11  12/05/2019   BHODL  33.0  138.0
12  13/05/2019   BHODL -15.0  105.0
13  14/05/2019   BHODL   6.0  120.0
14  15/05/2019   BHODL  -1.0  114.0
15  16/05/2019   BHODL   5.0  115.0
16  17/05/2019   BHODL  10.0  110.0
17  18/05/2019     BUY   0.0  100.0
18  19/05/2019     SIT   0.0  100.0
19  20/05/2019     SIT   0.0  100.0

Saretta answered 3/3, 2020 at 3:20 Comment(4)

Thanks Andy your answer works straight out of the box. I prefer the format of Arn's answer below however it had an error when the top row was a BCLOSE or an SCLOSE, your answer didn't have this problem – Felder 7/3, 2020 at 19:10

Hi Andy, I have a follow up question to this one. I will give you 50 bounty points upon successful answer. See #60751991 – Felder 19/3, 2020 at 5:39

Let me know if you have any questions – Felder 19/3, 2020 at 5:40

@ds882: I posted an answer to the new question – Saretta 19/3, 2020 at 20:2

I think there is a more optimized and pythonic way to solve this. But a solution with iteration:

df['D'] = pd.to_numeric(df['D'])
df['C'] = pd.to_numeric(df['C'])
D_val = None
for i in range(len(df)-1, 0, -1):
    if df.loc[i, 'B'] == 'BUY':
        D_val = df.loc[i, 'D']
        continue
    if D_val is None:
        continue
    df.loc[i, 'D'] = df.loc[i+1, 'D'] + (D_val * df.loc[i, 'C']/100)

Everytime you encounter a BUY in column D, you update the D_val. We can also have a condition to stop as mentioned by OP such as SCLOSE or BCLOSE.

Toluene answered 28/2, 2020 at 10:10 Comment(5)

Thanks mate, but I am a little confused with your code or am not sure if it was I require. Bascially I provided col D as only an example of what the result I require should end up looking like. How do Iget to col D based on only col B and C – Felder 28/2, 2020 at 12:59

Hi @Toluene I have edited my question. Maybe it now makes more sense – Felder 1/3, 2020 at 2:33

The initial value from where we start calculating D must be stored somewhere. If there is no column D, we can start from initial variable and then progressively populate D. The above code would still work. – Toluene 2/3, 2020 at 12:52

Can you once try the above piece of code and let me know the issues/differences in result from the expected result. – Toluene 2/3, 2020 at 13:2

Hi nishant if I simply add your code under my code above I just get an error message. Is there something else I need to do? – Felder 3/3, 2020 at 11:17

This piece below should help you out. It produces the expected output and it's relatively fast because it avoids the direct iteration on the dataframe's rows.

endpoints = [df.first_valid_index(), df.last_valid_index()]
# occurrences of 'BCLOSE' or 'SCLOSE'
breakpoints = df.index[(df.B =='BCLOSE') | (df.B == 'SCLOSE')][::-1]
# remove the endpoints of the dataframe that do not break the structure
breakpoints = breakpoints.drop(endpoints, errors='ignore')

PERCENTAGE_CONST = 100
top = 100  # you can specify any initial value here

for i in range(len(breakpoints) + 1):

    prv = breakpoints[i - 1] - 1 if i else -1  # previous or first breakpoint
    try:
        nex = breakpoints[i] - 1  # next breakpoint
    except IndexError:
        nex = None  # last breakpoint

    # cumulative sum of appended to 'D' column
    res = top + (df['C'][prv: nex: -1] * top / PERCENTAGE_CONST).cumsum()[::-1]
    df.loc[res.index, 'D'] = res

    # saving the value that will be the basis for percentage calculations
    # for the next breakpoint
    top = res.iloc[0]

Lianeliang answered 2/3, 2020 at 18:9 Comment(6)

Hi Arn. This looks good and I think we are almost there, however there is 1 issue. Column C figures are meant to be percentages as I describe in my outline not the actual amount of gain or loss. If you change the top/begin variable from 100 to 1000 the result of row 0 should be '1586.0' (ie 10x current value) however it is currently 1052.66 with your code. Cheers mate – Felder 3/3, 2020 at 0:17

Hi, I've just edited my answer. It works with other values now. – Lianeliang 3/3, 2020 at 11:13

Hi Arn sorry for the late reply just other things on. Yes your answer is great, however I have found 1 more issue. If you delete the top row of my example dataframe so the new top row has col B as SCLOSE then run with your code I get the following error. 'raise IndexError("single positional indexer is out-of-bounds") IndexError: single positional indexer is out-of-bounds'. It is in regards to the following line 'top = res.iloc[0]'. It looks as though your code gets and error if the top row is and SCLOSE or BCLOSE – Felder 5/3, 2020 at 3:26

Its just in real life when I run your code I will end up with a BCLOSE's and SCLOSE's with a project and cannot have and error. Other than that I really like how readable you have prepared the code – Felder 5/3, 2020 at 3:43

Hi Arn, I have a follow up question to this one. I will give you 50 bounty points upon successful answer. See #60751991 – Felder 19/3, 2020 at 5:39

Let me know if you have any questions – Felder 19/3, 2020 at 5:39

Recommended topics

Hot tags