Combine two Pandas dataframes, resample on one time column, interpolate
Asked Answered
B

3

21

This is my first question on stackoverflow. Go easy on me!

I have two data sets acquired simultaneously by different acquisition systems with different sampling rates. One is very regular, and the other is not. I would like to create a single dataframe containing both data sets, using the regularly spaced timestamps (in seconds) as the reference for both. The irregularly sampled data should be interpolated on the regularly spaced timestamps.

Here's some toy data demonstrating what I'm trying to do:

import pandas as pd
import numpy as np

# evenly spaced times
t1 = np.array([0,0.5,1.0,1.5,2.0])
y1 = t1

# unevenly spaced times
t2 = np.array([0,0.34,1.01,1.4,1.6,1.7,2.01])
y2 = 3*t2

df1 = pd.DataFrame(data={'y1':y1,'t':t1})
df2 = pd.DataFrame(data={'y2':y2,'t':t2})

df1 and df2 look like this:

df1:
    t   y1
0  0.0  0.0
1  0.5  0.5
2  1.0  1.0
3  1.5  1.5
4  2.0  2.0

df2:
    t    y2
0  0.00  0.00
1  0.34  1.02
2  1.01  3.03
3  1.40  4.20
4  1.60  4.80
5  1.70  5.10
6  2.01  6.03

I'm trying to merge df1 and df2, interpolating y2 on df1.t. The desired result is:

df_combined:
     t   y1   y2
0  0.0  0.0  0.0
1  0.5  0.5  1.5
2  1.0  1.0  3.0
3  1.5  1.5  4.5
4  2.0  2.0  6.0

I've been reading documentation for pandas.resample, as well as searching previous stackoverflow questions, but haven't been able to find a solution to my particular problem. Any ideas? Seems like it should be easy.

UPDATE: I figured out one possible solution: interpolate the second series first, then append to the first data frame:

from scipy.interpolate import interp1d
f2 = interp1d(t2,y2,bounds_error=False)
df1['y2'] = f2(df1.t)

which gives:

df1:
    t   y1   y2
0  0.0  0.0  0.0
1  0.5  0.5  1.5
2  1.0  1.0  3.0
3  1.5  1.5  4.5
4  2.0  2.0  6.0

That works, but I'm still open to other solutions if there's a better way.

Brasserie answered 22/10, 2014 at 21:6 Comment(3)
You should add your edits as an answer so others can find themLiebfraumilch
It took me a while Googling for this. Great answer to your own post. Thank you! I think this should be a basic functionality on DataFrames in Pandas, but I guess there's no internal method.Stonework
I just saw the answer from @K3---rnc below. Seems like a cleaner solution than my own answer to myself, but it looks like either would work fine. Thanks!Brasserie
S
11

If you construct a single DataFrame from Series, using time values as index, like this:

>>> t1 = np.array([0, 0.5, 1.0, 1.5, 2.0])
>>> y1 = pd.Series(t1, index=t1)

>>> t2 = np.array([0, 0.34, 1.01, 1.4, 1.6, 1.7, 2.01])
>>> y2 = pd.Series(3*t2, index=t2)

>>> df = pd.DataFrame({'y1': y1, 'y2': y2})
>>> df
       y1    y2
0.00  0.0  0.00
0.34  NaN  1.02
0.50  0.5   NaN
1.00  1.0   NaN
1.01  NaN  3.03
1.40  NaN  4.20
1.50  1.5   NaN
1.60  NaN  4.80
1.70  NaN  5.10
2.00  2.0   NaN
2.01  NaN  6.03

You can simply interpolate it, and select only the part where y1 is defined:

>>> df.interpolate('index').reindex(y1)
      y1   y2
0.0  0.0  0.0
0.5  0.5  1.5
1.0  1.0  3.0
1.5  1.5  4.5
2.0  2.0  6.0
Shutter answered 12/7, 2017 at 15:36 Comment(1)
What if the t column has duplicates?Amain
C
0

It's not exactly clear to me how you're getting rid of some of the values in y2, but it seems like if there is more than one for a given timepoint, you only want the first one. Also, it seems like your time values should be in the index. I also added column labels. It looks like this:

import pandas as pd

# evenly spaced times
t1 = [0,0.5,1.0,1.5,2.0]
y1 = t1

# unevenly spaced times
t2 = [0,0.34,1.01,1.4,1.6,1.7,2.01]

# round t2 values to the nearest half
new_t2 = [round(num * 2)/2 for num in t2]

# set y2 values
y2 = [3*z for z in new_t2]

# eliminate entries that have the same index value
for x in range(1, len(new_t2), -1):
    if new_t2[x] == new_t2[x-1]:
        new_t2.delete(x)
        y2.delete(x)


ser1 = pd.Series(y1, index=t1)
ser2 = pd.Series(y2, index=new_t2)

df = pd.concat((ser1, ser2), axis=1)
df.columns = ('Y1', 'Y2')

print df

This prints:

      Y1   Y2
0.0  0.0  0.0
0.5  0.5  1.5
1.0  1.0  3.0
1.5  1.5  4.5
1.5  1.5  4.5
1.5  1.5  4.5
2.0  2.0  6.0
Congruency answered 1/9, 2015 at 16:39 Comment(0)
O
0
import math
idx1 = [10, 30, 45, 60, 90, 130, 145, 160, 180] 
d1 = pd.DataFrame(data=[math.sin(math.radians(x)) for x in idx1], 
                  columns=['A'], index=idx1)
idx2 = [20, 30, 50, 70, 100, 140, 155, 170, 180] 
d2 = pd.DataFrame(data={'B':[math.sin(math.radians(x)) for x in idx2],
                        'C':[math.cos(math.radians(x)) for x in idx2]}, 
                  columns=['B','C'], index=idx2)
d3 = d1.merge(d2,left_index=True,right_index=True,how='outer')
# print(d3)
d3 = d3[d3.columns].interpolate(method='polynomial',order=2).reindex(d1.index)
# print(d3)
d3.rename(columns={'B':'B_interpolated',
                   'C':'C_interpolated'},inplace=True)
fig,ax = plt.subplots()
d2.plot(ax=ax,style='.-')
d3.plot(ax=ax,style='.-')
plt.legend(loc='best')
plt.show()

enter image description here

Otes answered 29/6, 2023 at 22:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.