Here's one that uses the definition of correlation with NumPy tools meant for performance, through corr2_coeff_rowwise -
pd.Series(corr2_coeff_rowwise(dfa.values,dfb.values))
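The definition of corr2_coeff_rowwise isn't shown in this answer. As a rough idea of what such a function could look like, here is a minimal sketch that computes the row-wise Pearson correlation from its definition (center each row, then divide the sum of products by the square root of the product of the sums of squares). The name corr2_coeff_rowwise_sketch and the choice to rely on plain broadcasting are my assumptions, not necessarily the original implementation -

```python
import numpy as np

def corr2_coeff_rowwise_sketch(A, B):
    """Row-wise Pearson correlation between rows of A and rows of B.

    Hypothetical stand-in for corr2_coeff_rowwise. A and B are 2D arrays
    with the same number of columns. A may have a single row, in which
    case it is broadcast against every row of B.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)

    # Center each row by subtracting its own mean
    A_mA = A - A.mean(axis=1, keepdims=True)
    B_mB = B - B.mean(axis=1, keepdims=True)

    # Sum of squared deviations for each row
    ssA = (A_mA ** 2).sum(axis=1)
    ssB = (B_mB ** 2).sum(axis=1)

    # Sum of products per row pair, normalized by the row norms
    return (A_mA * B_mB).sum(axis=1) / np.sqrt(ssA * ssB)

# Same values as the sample run, as plain arrays
a = np.array([[2.0, 6.0, 8.0, 12.0]])
b = np.array([[2, 6, 8, 12], [1, 3, 4, 6], [-1, -3, -4, -6]])
r = corr2_coeff_rowwise_sketch(a, b)
```

Here r holds one coefficient per row of b, and wrapping it in pd.Series gives the same kind of output as the call above. Note that a constant row has zero variance, so the division yields nan for that row.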
Sample run -
In [74]: dfa
Out[74]:
     a    b    c     d
0  2.0  6.0  8.0  12.0
In [75]: dfb
Out[75]:
   a  b  c   d
0  2  6  8  12
1  1  3  4   6
2 -1 -3 -4  -6
In [76]: pd.Series(corr2_coeff_rowwise(dfa.values,dfb.values))
Out[76]:
0    1.0
1    1.0
2   -1.0
dtype: float64
Runtime test
Case #1 : Large number of rows in dfb and 4 columns -
In [77]: dfa = pd.DataFrame(np.random.randint(1,100,(1,4)))
In [78]: dfb = pd.DataFrame(np.random.randint(1,100,(30000,4)))
# @sera's soln
In [79]: %timeit dfb.corrwith(dfa.iloc[0], axis=1)
1 loop, best of 3: 4.09 s per loop
In [80]: %timeit pd.Series(corr2_coeff_rowwise(dfa.values,dfb.values))
1000 loops, best of 3: 1.53 ms per loop
Case #2 : Decent number of rows in dfb and 400 columns -
In [83]: dfa = pd.DataFrame(np.random.randint(1,100,(1,400)))
In [85]: dfb = pd.DataFrame(np.random.randint(1,100,(300,400)))
In [86]: %timeit dfb.corrwith(dfa.iloc[0], axis=1)
10 loops, best of 3: 44.8 ms per loop
In [87]: %timeit pd.Series(corr2_coeff_rowwise(dfa.values,dfb.values))
1000 loops, best of 3: 635 µs per loop