Pairwise correlation of Pandas DataFrame columns with custom function

Pandas pairwise correlation on a DataFrame comes in handy in many cases. However, in my specific case I would like to correlate two columns with a method not provided by Pandas (something other than pearson, kendall or spearman). Is it possible to explicitly define the correlation function to use in this case?

The syntax I would like looks like this:

def my_method(x,y): return something
frame.corr(method=my_method)
Ardeth answered 14/8, 2013 at 14:25 Comment(4)
Can you give an example of what your method is? - Flipflop
It doesn't really matter. Given two series x and y, it returns a coefficient in [0,1] indicating the correlation between the two variables, just like Spearman does. - Ardeth
Not an issue for the question, but Spearman's rank correlation returns a coefficient in [-1, 1]. - Penneypenni
Besides doing it in Cython as Jeff mentions, you could also consider NumPy or Numba for speed. - Latinalatinate

For any kind of performance you would need to do this in Cython (with a cythonizable function). In plain Python, the pairwise loop looks like this:

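import numpy as np
import pandas as pd

# func(x, y) takes two columns (Series) and returns a float
n = len(df.columns)
results = np.zeros((n, n))
for i, ac in enumerate(df.columns):
    for j, bc in enumerate(df.columns):
        # pass the column data, not just the column labels
        results[j, i] = func(df[ac], df[bc])
results = pd.DataFrame(results, index=df.columns, columns=df.columns)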
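
If the function is symmetric, meaning func(x, y) equals func(y, x), roughly half of the calls in the loop above can be skipped. The following is only a sketch under that assumption: the names pairwise_symmetric and cosine_similarity are hypothetical and chosen for illustration, and the function is assumed to accept two 1d NumPy arrays.

```python
from itertools import combinations_with_replacement

import numpy as np
import pandas as pd


def pairwise_symmetric(df: pd.DataFrame, func) -> pd.DataFrame:
    """Evaluate a symmetric func(x, y) once per unordered pair of columns."""
    cols = list(df.columns)
    arrays = [df[c].to_numpy() for c in cols]  # extract each column once
    out = np.empty((len(cols), len(cols)))
    for i, j in combinations_with_replacement(range(len(cols)), 2):
        # func is assumed symmetric, so one call fills both [i, j] and [j, i].
        out[i, j] = out[j, i] = func(arrays[i], arrays[j])
    return pd.DataFrame(out, index=cols, columns=cols)


def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Example measure only; any symmetric function of two arrays works."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0], "c": [3.0, 0.0, -1.0]})
result = pairwise_symmetric(df, cosine_similarity)
```

This still runs at Python speed per pair, so it only halves the work; it does not replace a compiled implementation.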
Flipflop answered 14/8, 2013 at 14:47 Comment(0)

Check out the documentation for DataFrame.corr(). Since pandas 0.24.0, the method parameter also accepts a callable:

Parameters
----------
    method : {'pearson', 'kendall', 'spearman'} or callable
        * pearson : standard correlation coefficient
        * kendall : Kendall Tau correlation coefficient
        * spearman : Spearman rank correlation
        * callable: callable with input two 1d ndarrays
            and returning a float. Note that the returned matrix from corr
            will have 1 along the diagonals and will be symmetric
            regardless of the callable's behavior
            .. versionadded:: 0.24.0
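
As a minimal sketch of the callable form described in the docstring above, assuming pandas 0.24.0 or later: the measure abs_pearson and the sample data are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def abs_pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical custom measure: absolute Pearson correlation, in [0, 1]."""
    return float(abs(np.corrcoef(x, y)[0, 1]))


frame = pd.DataFrame(
    {
        "x1": [1.0, 2.0, 3.0, 4.0],
        "x2": [4.0, 3.0, 2.0, 1.0],
        "y": [1.0, 3.0, 2.0, 4.0],
    }
)

# The callable receives two 1d ndarrays per column pair and returns a float.
corr_matrix = frame.corr(method=abs_pearson)
```

Here x2 is x1 reversed, so the x1/x2 entry comes out as 1 even though the plain Pearson coefficient would be -1.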

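Also check out DataFrame.corrwith(), which correlates the columns of a DataFrame with a Series or another DataFrame and takes the same method argument.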
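
A short sketch of corrwith() with a callable, for the case where only the correlation of each feature with one target column is needed. The measure r_squared and the data are hypothetical; a symmetric measure is used on purpose, so the order in which the two arrays reach the callable does not matter.

```python
import numpy as np
import pandas as pd


def r_squared(x: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical custom measure: squared Pearson correlation."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)


frame = pd.DataFrame(
    {
        "x1": [1.0, 2.0, 3.0, 4.0],
        "x2": [4.0, 3.0, 2.0, 1.0],
        "y": [1.0, 3.0, 2.0, 4.0],
    }
)

# One score per feature column, each computed against the target column "y".
scores = frame[["x1", "x2"]].corrwith(frame["y"], method=r_squared)
```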

Warning: corr() always returns a symmetric correlation matrix. That is fine for symmetric measures such as Cramér's V, but it is not suitable for Theil's U and other asymmetric measures.
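
To illustrate the warning, here is a sketch with a deliberately asymmetric toy score. The function frac_greater is hypothetical and is not a real association measure; it only makes the mirroring visible. The nested apply() is one possible way to evaluate every ordered pair, not a pandas-recommended recipe.

```python
import numpy as np
import pandas as pd


def frac_greater(x, y) -> float:
    """Toy asymmetric score (hypothetical): share of rows where x > y."""
    return float(np.mean(np.asarray(x) > np.asarray(y)))


frame = pd.DataFrame({"a": [1, 2, 3, 4], "b": [0, 1, 2, 5]})

# corr() evaluates each pair once and mirrors the value across the diagonal;
# the diagonal is set to 1 regardless of what the callable would return.
mirrored = frame.corr(method=frac_greater)

# Nested apply evaluates every ordered pair, so the result can be asymmetric.
# The entry at row r, column c is frac_greater(frame[c], frame[r]).
full = frame.apply(
    lambda col: frame.apply(lambda row_col: frac_greater(col, row_col))
)
```

In full, the two off-diagonal entries differ (0.75 and 0.25), whereas mirrored reports the same value in both positions.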

Rosel answered 29/12, 2019 at 18:30 Comment(0)
import numpy as np

def spearman_rank_pandas(rank_series1: np.ndarray, rank_series2: np.ndarray) -> float:
    # All-NaN input has no defined correlation.
    if np.isnan(rank_series1).all() or np.isnan(rank_series2).all():
        return np.nan

    # Shortcut formula; expects ranks as input and is exact only without ties.
    rank_diff = rank_series1 - rank_series2

    top = 6 * ((rank_diff**2).sum())
    bottom = len(rank_diff) * (len(rank_diff)**2 - 1)

    rho = 1 - (top/bottom)

    assert ((rho >= -1) and (rho <= 1)), "Error in your stats"
    return rho

frame = frame[["x1", "x2", "y"]]

def my_method(frame):
    # corr() hands the callable raw column values, so rank the columns first.
    return frame.rank().corr(method=spearman_rank_pandas)
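
One way to sanity-check a hand-written callable like the one above is to compare it with the built-in method on small data. The sketch below assumes tie-free data, because the shortcut formula 1 - 6*sum(d^2) / (n*(n^2 - 1)) is only exact without ties, and it ranks the columns before calling corr() since the callable receives column values rather than ranks. The name spearman_from_ranks and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd


def spearman_from_ranks(r1: np.ndarray, r2: np.ndarray) -> float:
    """Spearman's rho from ranks via the shortcut formula (no ties assumed)."""
    d = r1 - r2
    n = len(d)
    return 1 - 6 * float((d ** 2).sum()) / (n * (n ** 2 - 1))


frame = pd.DataFrame(
    {"x1": [10.0, 20.0, 30.0, 40.0, 50.0], "y": [2.5, 1.0, 7.0, 3.0, 9.0]}
)

# Rank first, then apply the custom callable to the rank columns.
custom = frame.rank().corr(method=spearman_from_ranks)

# Built-in Spearman on the raw values, for comparison.
builtin = frame.corr(method="spearman")
```

On this data both matrices agree; with tied values the shortcut formula would generally drift away from the built-in result.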
Araby answered 29/3, 2023 at 23:45 Comment(0)
