How to run Ta-Lib on multiple columns of a Pandas dataframe?

I have a data frame with the price of several securities as columns and I can't find a solution to run TA-Lib in one shot because it needs numpy.ndarray.

How can I run TA-Lib over multiple securities and get a data frame in return?

import pandas as pd
import talib as ta
d = {'security1': [1,2,8,9,8,5], 'security2': [3,8,5,4,3,5]}
df = pd.DataFrame(data=d)
df
Out[518]: 
   security1  security2
0          1          3
1          2          8
2          8          5
3          9          4
4          8          3
5          5          5

ta.EMA(df, 2)
TypeError: Argument 'real' has incorrect type (expected numpy.ndarray, got DataFrame)

ta.EMA(df['security1'], 2)
Out[520]: 
0         NaN
1    1.500000
2    5.833333
3    7.944444
4    7.981481
5    5.993827
dtype: float64

type(df['security1'])
Out[524]: pandas.core.series.Series

When I convert the data frame to a numpy.ndarray it still throws an exception:

ta.EMA(df.values, 2)
Out[528]: Exception: input array type is not double

Thank you.

Perseverance answered 6/8, 2018 at 16:47 Comment(0)

TA-Lib expects floating-point data, whereas yours is integer.

As such, when constructing your dataframe you need to coerce the input data by specifying dtype=numpy.float64:

import pandas
import numpy
import talib

d = {'security1': [1,2,8,9,8,5], 'security2': [3,8,5,4,3,5]}
df = pandas.DataFrame(data=d, dtype=numpy.float64)         # note numpy.float64 here
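If the data frame already exists (for example, it was built by other code), the same coercion can probably be done after the fact with `DataFrame.astype`. This is a minimal sketch using only pandas and NumPy; the names `df_int` and `df_float` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

d = {'security1': [1, 2, 8, 9, 8, 5], 'security2': [3, 8, 5, 4, 3, 5]}

# Built without an explicit dtype, so pandas infers integer columns.
df_int = pd.DataFrame(data=d)
print(df_int.dtypes)

# astype returns a new frame; every column is now float64 ("double").
df_float = df_int.astype(np.float64)
print(df_float.dtypes)
```

Checking `df.dtypes` before calling TA-Lib is a quick way to spot the integer columns behind the "input array type is not double" error.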

TA-Lib expects 1D arrays, which means it can operate on pandas.Series but not pandas.DataFrame.

You can, however, use pandas.DataFrame.apply to apply a function to each column of your dataframe:

df.apply(lambda c: talib.EMA(c, 2))

    security1   security2
0         NaN         NaN
1    1.500000    5.500000
2    5.833333    5.166667
3    7.944444    4.388889
4    7.981481    3.462963
5    5.993827    4.487654
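The two steps above (cast to float64, then run a 1D function per column) can also be wrapped in a small helper. The sketch below is only an illustration: `apply_columnwise` is a hypothetical name, not part of TA-Lib or pandas, and `simple_moving_average` is a NumPy stand-in for an indicator so that the example runs without TA-Lib installed.

```python
import numpy as np
import pandas as pd


def apply_columnwise(frame, indicator, *args, **kwargs):
    """Run a 1D indicator on each column and return a DataFrame.

    Hypothetical helper for illustration; not a TA-Lib or pandas API.
    """
    results = {}
    for name, column in frame.items():
        # Hand the indicator a 1D float64 array, whatever the column dtype was.
        values = column.to_numpy(dtype=np.float64)
        results[name] = np.asarray(indicator(values, *args, **kwargs))
    # Reuse the original index so the rows line up with the input frame.
    return pd.DataFrame(results, index=frame.index)


def simple_moving_average(values, timeperiod):
    """Stand-in indicator: trailing mean, NaN until the window is full."""
    out = np.full(len(values), np.nan)
    kernel = np.ones(timeperiod) / timeperiod
    out[timeperiod - 1:] = np.convolve(values, kernel, mode="valid")
    return out


df = pd.DataFrame({'security1': [1, 2, 8, 9, 8, 5],
                   'security2': [3, 8, 5, 4, 3, 5]})
sma = apply_columnwise(df, simple_moving_average, 2)
print(sma)
```

With TA-Lib installed, the same helper should accept a real indicator, for example `apply_columnwise(df, talib.EMA, timeperiod=2)`, and it works even when `df` still holds integers because the cast happens inside the loop.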
Innate answered 6/8, 2018 at 17:54 Comment(2)
Thank you Steve, I was close to a solution but yours is more elegant. I did s = [ta.EMA(df[c], 2) for c in df] and then pd.DataFrame(pd.DataFrame(s).T.values, columns = df.columns,index = df.index) but it's not that elegant. Thanks again! - Perseverance
Thanks, this fixed it for me. I have a function that dynamically creates data frames, so adding that in was the fix. - Krucik
