I have a dataframe with the total sales of around 500 product categories, one category per column, so the dataframe has 500 columns. I am trying to find the category that is most highly correlated with the columns of another dataframe, and I plan to use the Pearson correlation for this. However, the total sales of all the categories are highly skewed, with skewness ranging from 10 to 40 across the category columns, so I want to transform the sales data with a Box-Cox transformation. Since my sales data also contains 0 values, I want to use the boxcox1p function. Can somebody help me calculate lambda for boxcox1p, since it is a mandatory parameter of this function? Also, is this the correct approach for finding highly correlated categories?
How do I calculate lambda for the scipy.special.boxcox1p function across my entire dataframe of 500 columns?
Assume df is your dataframe with many columns containing numeric values, and the lambda parameter of the Box-Cox transformation equals 0.25; then:
from scipy.special import boxcox1p
df_boxcox = df.apply(lambda x: boxcox1p(x,0.25))
Now the transformed values are in df_boxcox.
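For reference, boxcox1p(x, lmbda) computes ((1 + x)**lmbda - 1) / lmbda, and log(1 + x) when lmbda is 0, which is why zero sales are not a problem. A quick check with a few made-up values:

```python
import numpy as np
from scipy.special import boxcox1p

x = np.array([0.0, 1.0, 9.0, 99.0])  # illustrative values, including a 0 sale
lmbda = 0.25

# For lmbda != 0, boxcox1p(x, lmbda) == ((1 + x)**lmbda - 1) / lmbda
manual = ((1 + x) ** lmbda - 1) / lmbda
print(np.allclose(boxcox1p(x, lmbda), manual))  # True

# For lmbda == 0 it reduces to log(1 + x), so a zero maps to 0 rather than -inf
print(np.allclose(boxcox1p(x, 0.0), np.log1p(x)))  # True
print(boxcox1p(0.0, lmbda))  # 0.0
```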
Unfortunately, boxcox1p itself has no option to estimate lambda, but we can use PowerTransformer from sklearn.preprocessing instead:
import pandas as pd
from sklearn.preprocessing import PowerTransformer
pt = PowerTransformer(method='yeo-johnson')
Note that method 'yeo-johnson' is used because it works with zero and negative values as well as positive ones. Method 'box-cox' will raise an error on data containing zeros: ValueError: The Box-Cox transformation can only be applied to strictly positive data.
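Alternatively, if you would rather stay with Box-Cox proper, a minimal sketch is to shift each column by 1 so the zeros become strictly positive and let scipy.stats.boxcox estimate lambda by maximum likelihood (when lmbda is not given it returns both the transformed data and the fitted lambda). The dataframe and the column names cat_a and cat_b are hypothetical stand-ins for the sales data:

```python
import numpy as np
import pandas as pd
from scipy import stats
from scipy.special import boxcox1p

# Hypothetical skewed, non-negative "sales" columns containing zeros
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cat_a": rng.poisson(2, 200) * rng.integers(1, 50, 200),
    "cat_b": rng.poisson(1, 200) * rng.integers(1, 200, 200),
})

lambdas = {}
transformed = {}
for col in df.columns:
    x = df[col].astype(float)
    # Shift by 1 so zeros become strictly positive; with lmbda=None,
    # stats.boxcox returns (transformed_data, mle_lambda)
    _, lmbda = stats.boxcox(x + 1)
    lambdas[col] = lmbda
    # boxcox1p(x, lmbda) is the same transform as boxcox(1 + x, lmbda)
    transformed[col] = boxcox1p(x, lmbda)

df_boxcox = pd.DataFrame(transformed)
print(lambdas)
```

The same loop runs unchanged over 500 columns; each column gets its own lambda.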
data = pd.DataFrame({'x':[-2,-1,0,1,2,3,4,5]}) #just sample data to explain
pt.fit(data)
print(pt.lambdas_)
[0.89691707]
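Then apply the fitted lambda (by default PowerTransformer also standardizes the output to zero mean and unit variance, because standardize=True):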
print(pt.transform(data))
result:
[[-1.60758267]
[-1.09524803]
[-0.60974999]
[-0.16141745]
[ 0.26331586]
[ 0.67341476]
[ 1.07296428]
[ 1.46430326]]
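For the 500-column case, one option, sketched under the assumption that all sales values are non-negative: for x >= 0 the Yeo-Johnson transform with parameter lambda uses exactly the boxcox1p formula, so the per-column lambdas_ from PowerTransformer can be passed straight to boxcox1p. Setting standardize=False turns off the z-scoring so the two outputs can be compared directly. The small dataframe and its column names are made up:

```python
import numpy as np
import pandas as pd
from scipy.special import boxcox1p
from sklearn.preprocessing import PowerTransformer

# Hypothetical stand-in for the 500-column sales dataframe (non-negative, with zeros)
rng = np.random.default_rng(42)
df = pd.DataFrame(
    rng.poisson(1, (300, 5)) * rng.integers(1, 100, (300, 5)),
    columns=[f"cat_{i}" for i in range(5)],
).astype(float)

# standardize=False returns the raw power transform instead of z-scores
pt = PowerTransformer(method="yeo-johnson", standardize=False)
pt.fit(df)  # estimates one lambda per column

# For x >= 0, Yeo-Johnson(x, lambda) == boxcox1p(x, lambda),
# so each column's fitted lambda can be reused with boxcox1p
lambdas = pd.Series(pt.lambdas_, index=df.columns)
df_boxcox = df.apply(lambda col: boxcox1p(col, lambdas[col.name]))

print(lambdas.round(3))
print(np.allclose(df_boxcox.to_numpy(), pt.transform(df)))  # True
```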
But how did you calculate 0.25? – Chelsae
I've updated the answer.
PowerTransformer seems to do the job. – Paterson
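On the second question, a sketch of the comparison itself: compute the Pearson correlation between every transformed category and each column of the other dataframe, then pick the category with the largest absolute correlation. `sales`, `other`, and all column names are hypothetical, and the fixed lambda of 0.25 is only a placeholder for fitted per-column lambdas. Spearman correlation is an alternative worth considering; since it works on ranks, it gives the same result with or without a strictly increasing transform such as boxcox1p.

```python
import numpy as np
import pandas as pd
from scipy.special import boxcox1p

# Hypothetical data: `sales` has one column per category, `other` holds the
# columns of the second dataframe; both share the same row index
rng = np.random.default_rng(1)
sales = pd.DataFrame(
    rng.poisson(1, (200, 4)) * rng.integers(1, 100, (200, 4)),
    columns=["cat_0", "cat_1", "cat_2", "cat_3"],
).astype(float)
other = pd.DataFrame({
    "target_a": np.log1p(sales["cat_2"]) + rng.normal(0, 0.5, 200),
    "target_b": rng.normal(size=200),
})

# 0.25 is a placeholder; fitted per-column lambdas would be used in practice
sales_t = sales.apply(lambda col: boxcox1p(col, 0.25))

# Pearson correlation of every transformed category with each target column
corr = pd.DataFrame({t: sales_t.corrwith(other[t]) for t in other.columns})

# Category with the largest absolute correlation, per target column
best = corr.abs().idxmax()
print(best)

# Spearman uses ranks, so a strictly increasing transform leaves it unchanged
spearman_raw = sales.corrwith(other["target_a"], method="spearman")
spearman_t = sales_t.corrwith(other["target_a"], method="spearman")
print(np.allclose(spearman_raw, spearman_t))  # True
```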