I have a DataFrame, say a volatility surface with time as the index and strike as the columns. How do I do two-dimensional interpolation? I can reindex, but how do I deal with the NaN values? I know we can use fillna(method='pad'), but that is not even linear interpolation. Is there a way to plug in our own interpolation method?
Interpolation on DataFrame in pandas
You can use DataFrame.interpolate to get a linear interpolation.
In : df = pandas.DataFrame(numpy.random.randn(5,3), index=['a','c','d','e','g'])
In : df
Out:
          0         1         2
a -1.987879 -2.028572  0.024493
c  2.092605 -1.429537  0.204811
d  0.767215  1.077814  0.565666
e -1.027733  1.330702 -0.490780
g -1.632493  0.938456  0.492695
In : df2 = df.reindex(['a','b','c','d','e','f','g'])
In : df2
Out:
          0         1         2
a -1.987879 -2.028572  0.024493
b       NaN       NaN       NaN
c  2.092605 -1.429537  0.204811
d  0.767215  1.077814  0.565666
e -1.027733  1.330702 -0.490780
f       NaN       NaN       NaN
g -1.632493  0.938456  0.492695
In : df2.interpolate()
Out:
          0         1         2
a -1.987879 -2.028572  0.024493
b  0.052363 -1.729055  0.114652
c  2.092605 -1.429537  0.204811
d  0.767215  1.077814  0.565666
e -1.027733  1.330702 -0.490780
f -1.330113  1.134579  0.000958
g -1.632493  0.938456  0.492695
For anything more complex, you need to roll your own function that takes a Series object, fills the NaN values as you like, and returns another Series object.
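As a rough illustration of the "roll your own" route, here is a minimal sketch of a function that takes a Series and returns a filled Series, applied column by column with DataFrame.apply. The name fill_by_position and the sample data are made up for this example. It interpolates linearly over row position using numpy.interp and holds the end values flat outside the known range, which is only one of many possible choices.

```python
import numpy as np
import pandas as pd


def fill_by_position(s):
    """Hypothetical helper: fill NaN by linear interpolation over row position.

    Values before the first or after the last known point are held flat,
    because numpy.interp clamps to the edge values.
    """
    pos = np.arange(len(s), dtype=float)
    known = s.notna().to_numpy()
    if not known.any():
        return s  # nothing to interpolate from
    values = s.to_numpy(dtype=float)
    filled = np.interp(pos, pos[known], values[known])
    return pd.Series(filled, index=s.index, name=s.name)


# Made-up data for illustration only.
df = pd.DataFrame({0: [1.0, 3.0, 4.0], 1: [10.0, 30.0, 40.0]},
                  index=['a', 'c', 'd'])
df2 = df.reindex(['a', 'b', 'c', 'd', 'e'])

# apply() calls the function once per column and reassembles a DataFrame.
result = df2.apply(fill_by_position)
print(result)
```

Any function with the same shape (Series in, Series out, same index) can be swapped in here, for example one that calls a spline routine instead of numpy.interp.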
It would be a good idea to incorporate this as an option in fillna. –
Slifka
What if there is another dimension (or category) to hold constant (separate) in the interpolation step? I.e., how can I combine your wonderful solution with a groupby? Right now, if the index has repeated values (e.g. the same values recur in each of the categories I wish to group by), the reindex() step fails with "Reindexing only valid with uniquely valued Index objects". (Maybe this should be a new question?) –
Bernhard
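One possible way to approach the groupby case, sketched under assumptions: the data is in long form with a 'category' column and a 'value' column (both names are invented here), and the index is unique within each category even though it repeats across categories. Reindexing each group separately avoids the duplicate-index error, and the pieces can then be collected into one wide DataFrame.

```python
import pandas as pd

# Made-up long-form data: the time index 1, 3 repeats in each category.
long_df = pd.DataFrame(
    {'category': ['x', 'x', 'y', 'y'], 'value': [1.0, 3.0, 10.0, 30.0]},
    index=pd.Index([1, 3, 1, 3], name='time'),
)
new_index = [1, 2, 3]

# Within a single group the index is unique, so reindex() is valid there.
filled = pd.DataFrame({
    cat: grp['value'].reindex(new_index).interpolate(method='index')
    for cat, grp in long_df.groupby('category')
})
print(filled)
```

The result has one column per category. If the long layout is needed afterwards, stack() should bring it back.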
That's a great and somewhat obscure answer. It would be nice to have a convenience function for this where you can pick the axes to interpolate over –
Koerner
Could you also use DataFrame's interpolate method? df2.interpolate() works, because df2.interpolate() == df2.apply(pandas.Series.interpolate) (at least for me, with pandas.__version__ == 0.14). –
Biceps
Old thread, but I thought I would share my solution for 2D extrapolation/interpolation that respects the index values and works on demand. The code ended up a bit weird, so let me know if there is a better solution:
import numpy
import pandas
from numpy import nan

dataGrid = pandas.DataFrame({1: {1: 1, 3: 2},
                             2: {1: 3, 3: 4}})

def getExtrapolatedInterpolatedValue(x, y):
    global dataGrid
    # Add the requested row if it is missing, then interpolate down the index.
    # ffill/bfill hold the edge values flat outside the known range.
    if x not in dataGrid.index:
        dataGrid.loc[x] = nan  # .ix was removed from pandas; use .loc
        dataGrid = dataGrid.sort_index()  # DataFrame.sort() was removed as well
        dataGrid = dataGrid.interpolate(method='index', axis=0).ffill(axis=0).bfill(axis=0)

    # Same again for the requested column.
    if y not in dataGrid.columns.values:
        dataGrid = dataGrid.reindex(columns=numpy.append(dataGrid.columns.values, y))
        dataGrid = dataGrid.sort_index(axis=1)
        dataGrid = dataGrid.interpolate(method='index', axis=1).ffill(axis=1).bfill(axis=1)

    return dataGrid[y][x]

print(getExtrapolatedInterpolatedValue(2, 1.4))
>>2.3
Beautiful solution. Works very well for me. Thank you for posting! –
Impertinence
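For what it's worth, the same idea can be sketched without the global variable. The function name interpolate_grid and its signature are invented for this example. It reindexes onto the union of old and new labels, interpolates along the rows and then along the columns with method='index', and returns only the requested points. On a rectangular grid with no gaps this should amount to bilinear interpolation, with flat extrapolation at the edges from ffill/bfill.

```python
import pandas as pd


def interpolate_grid(grid, new_index, new_columns):
    """Hypothetical helper: linear interpolation along rows, then columns."""
    rows = grid.index.union(pd.Index(new_index)).sort_values()
    cols = grid.columns.union(pd.Index(new_columns)).sort_values()
    out = grid.reindex(index=rows, columns=cols).astype(float)

    # Fill the new rows using the index values as x-coordinates.
    out = out.interpolate(method='index', axis=0).ffill().bfill()

    # Fill the new columns: transpose so the column labels become the index.
    out = out.T.interpolate(method='index', axis=0).ffill().bfill().T

    return out.loc[new_index, new_columns]


# Same grid as in the answer above.
grid = pd.DataFrame({1: {1: 1, 3: 2},
                     2: {1: 3, 3: 4}})
result = interpolate_grid(grid, [2], [1.4])
print(result)
```

Because the function takes the grid as an argument and returns a new frame, the original data is left untouched and several points can be requested in one call.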