I often have a large DataFrame df that holds the basic data, and I need to create many more columns holding derivative data calculated from the basic columns.
In Pandas I can do that like this:
df['derivative_col1'] = df['basic_col1'] + df['basic_col2']
df['derivative_col2'] = df['basic_col1'] * df['basic_col2']
....
df['derivative_coln'] = func(list_of_basic_cols)
and so on. Pandas calculates every derivative column and allocates its memory immediately, at assignment time.
What I want instead is a lazy evaluation mechanism that postpones the calculation and memory allocation of a derivative column until the moment it is actually needed. Something like defining the lazy columns as:
df['derivative_col1'] = pandas.lazy_eval(df['basic_col1'] + df['basic_col2'])
df['derivative_col2'] = pandas.lazy_eval(df['basic_col1'] * df['basic_col2'])
That would save time and memory, much like a Python generator using 'yield': issuing df['derivative_col2'] would trigger only that specific calculation and memory allocation.
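To make the desired behavior concrete, here is a rough hand-rolled sketch of what I have in mind. The LazyColumns class, its define method, and its computed attribute are all hypothetical names I made up for illustration; none of them is a pandas API. The wrapper only stores a recipe for each derivative column and computes it the first time it is requested:

```python
import pandas as pd


class LazyColumns:
    """Hypothetical helper (not part of pandas): derives columns on first access."""

    def __init__(self, df):
        self.df = df
        self._recipes = {}   # column name -> function taking the DataFrame
        self.computed = []   # names of derivative columns computed so far

    def define(self, name, func):
        # Only store the recipe; nothing is calculated or allocated here.
        self._recipes[name] = func

    def __getitem__(self, name):
        # Compute and cache a derivative column the first time it is requested.
        if name not in self.df.columns and name in self._recipes:
            self.df[name] = self._recipes[name](self.df)
            self.computed.append(name)
        return self.df[name]


df = pd.DataFrame({"basic_col1": [1, 2, 3], "basic_col2": [4, 5, 6]})
lazy = LazyColumns(df)
lazy.define("derivative_col1", lambda d: d["basic_col1"] + d["basic_col2"])
lazy.define("derivative_col2", lambda d: d["basic_col1"] * d["basic_col2"])

# Only derivative_col2 is calculated here; derivative_col1 stays undefined in df.
result = lazy["derivative_col2"]
```

This works for simple cases, but it is a wrapper around the DataFrame rather than part of it, so I would prefer something built in if it exists.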
So how can I do something like lazy_eval() in Pandas? Any tips, thoughts, or references are welcome.