I have a DataFrame which has three columns: nums with some values to work with, b which is always either 1 or 0, and the result column, which is currently zero everywhere except in the first row (because we must have an initial value to work with).
The dataframe looks like this:
   nums  b  result
0  20.0  1    20.0
1  22.0  0       0
2  30.0  1       0
3  29.1  1       0
4  20.0  0       0
...
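To make the example reproducible, the five rows shown above can be built as follows. This is only a minimal sketch: the real data comes from large files, and the float dtype of result is an assumption based on the 20.0 in the first row.

```python
import pandas as pd

# Sample data copied from the five rows shown above; the real file is much larger.
df = pd.DataFrame({
    "nums": [20.0, 22.0, 30.0, 29.1, 20.0],
    "b": [1, 0, 1, 1, 0],
    # Only the first row has an initial value; the rest are placeholders.
    "result": [20.0, 0.0, 0.0, 0.0, 0.0],
})
print(df)
```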
The Problem
I'd like to go over each row in the dataframe starting with the second row, do some calculation and store the result in the result column. Since I'm working with large files, I need this operation to be fast, which is why I want something like apply.
The calculation I want to do is to take the values in nums and in result from the previous row, and if the b column in the current row is 0, then I want (for example) to add the nums and the result from that previous row. If b in the current row is 1, I'd like to subtract them, for example.
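The add/subtract example can be written out on plain lists to show the dependency between rows. This is a sketch only: the helper name add_or_subtract is hypothetical, and the subtraction order (result minus nums) is an assumption, since the description does not fix it.

```python
def add_or_subtract(prev_result, prev_num, current_b):
    # b == 0 in the current row: add the previous row's nums and result.
    if current_b == 0:
        return prev_result + prev_num
    # b == 1 in the current row: subtract them
    # (the order result - nums is an assumption).
    return prev_result - prev_num

nums = [20.0, 22.0, 30.0, 29.1, 20.0]
b = [1, 0, 1, 1, 0]
result = [20.0]  # initial value from the first row

for i in range(1, len(nums)):
    # Each step needs the result computed in the previous step.
    result.append(add_or_subtract(result[i - 1], nums[i - 1], b[i]))

print(result)
```

Because every row needs the result that was just computed for the row before it, the calculation is a recurrence rather than an independent per-row operation.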
What have I tried?
I tried using apply but I couldn't access the previous row, and sadly it seems that even if I do manage to access the previous row, the dataframe won't update the result column until apply has finished, so each row would only see the old result of the row before it.
I also tried using a loop like so, but it's too slow for the large files I'm working with:
for i in range(1, len(df.index)):
    row = df.index[i]
    prev_row = df.index[i - 1]  # index label of the previous row, used for "nums" and "result"
    df.loc[row, 'result'] = some_calc_func(
        prev_result=df.loc[prev_row, 'result'],
        prev_num=df.loc[prev_row, 'nums'],
        current_b=df.loc[row, 'b'],
    )
some_calc_func looks like this (just a general example):
def some_calc_func(prev_result, prev_num, current_b):
    if current_b == 1:
        return prev_result * prev_num / 2
    else:
        return prev_num + 17
Please answer with respect to some_calc_func
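For comparison, here is a sketch of the same loop running over NumPy arrays instead of calling df.loc on every iteration. It assumes the sample rows shown earlier; fill_result is a hypothetical helper name, and this variant has not been benchmarked on the real files. Scalar indexing on an array is generally far cheaper than a df.loc lookup, while the row-by-row order of the calculation stays the same.

```python
import pandas as pd

def some_calc_func(prev_result, prev_num, current_b):
    if current_b == 1:
        return prev_result * prev_num / 2
    else:
        return prev_num + 17

def fill_result(df):
    # Extract each column once; the loop then touches only NumPy arrays.
    nums = df["nums"].to_numpy(dtype=float)
    b = df["b"].to_numpy()
    result = df["result"].to_numpy(dtype=float).copy()
    for i in range(1, len(result)):
        # Uses the result just computed for the previous row.
        result[i] = some_calc_func(result[i - 1], nums[i - 1], b[i])
    return result

# Sample rows from the table above (an assumption about the real data's shape).
df = pd.DataFrame({
    "nums": [20.0, 22.0, 30.0, 29.1, 20.0],
    "b": [1, 0, 1, 1, 0],
    "result": [20.0, 0.0, 0.0, 0.0, 0.0],
})
df["result"] = fill_result(df)  # write the whole column back in one assignment
print(df)
```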
apply is not the first thing you should look for when you want speed. – Emelia