I have a DataFrame with shape (335539, 26), so I have 26 features. But when I use
data.corr()
I get a 12 x 12 matrix.
What could be wrong?
Pearson correlation is only meaningful for continuous data. There is little point in recoding categorical features as integers from 1 to n, because that imposes an arbitrary order and spacing on the categories. You can convert them to numeric form with one-hot encoding (dummy variables) instead. It is not clear which types of features you are trying to relate. Between a nominal variable and a continuous variable, the relationship is better described as a measure of association, and you can test it with one-way ANOVA, which SciPy implements (scipy.stats.f_oneway). Between an ordinal variable and a continuous variable, you can use Spearman's rank correlation.
If you still want to use corr(), try converting your data with the methods mentioned above, although I am not sure you will get meaningful results.
It is better to first formulate your question properly and then look for the specific test that suits your data.
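As a rough illustration of the two tests named above, here is a minimal sketch on made-up data. The column names `group`, `rank`, and `value` are hypothetical, and the numbers are invented for the example; only `scipy.stats.f_oneway` and `scipy.stats.spearmanr` are real library calls.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data: 'group' is nominal, 'rank' is ordinal, 'value' is continuous.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "rank":  [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "value": [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.9, 9.1, 10.2],
})

# Nominal vs. continuous: one-way ANOVA over the values of each group.
samples = [g["value"].to_numpy() for _, g in df.groupby("group")]
f_stat, p_anova = stats.f_oneway(*samples)

# Ordinal vs. continuous: Spearman rank correlation.
rho, p_spearman = stats.spearmanr(df["rank"], df["value"])

print(f"ANOVA:    F={f_stat:.2f}, p={p_anova:.4f}")
print(f"Spearman: rho={rho:.2f}, p={p_spearman:.4f}")
```

Which test is appropriate depends on the question being asked, so treat this as a starting point rather than a recipe.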
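corr() only works on numerical data, so you only get the correlation between your numerical features. A 12 x 12 result suggests that only 12 of your 26 columns are numeric. (In pandas 2.0 and later, corr() raises an error on non-numeric columns unless you pass numeric_only=True.)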
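To see which columns are being left out, something like the following may help. It is a small sketch on a hypothetical frame (the columns `age`, `income`, and `city` are invented); on the real data you would apply the same two `select_dtypes` lines to `data`.

```python
import pandas as pd

# Hypothetical mixed-type frame: two numeric columns and one text column.
data = pd.DataFrame({
    "age": [23, 35, 41, 58],
    "income": [30.0, 52.5, 61.0, 75.5],
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
})

# Columns that corr() can use, and the ones it cannot.
numeric_cols = data.select_dtypes(include="number").columns
skipped_cols = data.columns.difference(numeric_cols)

# Restricting to numeric columns explicitly works across pandas versions.
corr = data[numeric_cols].corr()

print("used:   ", list(numeric_cols))
print("skipped:", list(skipped_cols))
print(corr.shape)
```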
It appears that some columns in 'data' have the 'object' data type and contain non-numeric values, so they will not show up in corr(). You can check the column types with:
data.dtypes
To solve this, handle the categorical features with pd.get_dummies or another one-hot encoding approach. Additionally, convert numerical features that are stored with the 'object' data type using the following code:
data['x'] = pd.to_numeric(data['x'], errors='coerce')
Keep in mind to convert to numeric before replacing any invalid values with np.nan (this assumes pandas is imported as pd and NumPy as np):
data['x'] = pd.to_numeric(data['x'], errors='coerce').astype('float64')
data['x'] = data['x'].apply(lambda x: x if x >= 0 else np.nan)
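Putting these steps together, a minimal end-to-end sketch might look like this. The frame and its columns (`x`, `color`, `y`) are hypothetical, and treating negative values as missing is an assumption carried over from the snippet above, not a general rule.

```python
import pandas as pd

# Hypothetical frame: 'x' holds numbers stored as strings, 'color' is categorical.
data = pd.DataFrame({
    "x": ["1.5", "2.0", "bad", "-3", "4.5"],
    "color": ["red", "blue", "red", "green", "blue"],
    "y": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Strings that cannot be parsed become NaN instead of raising an error.
data["x"] = pd.to_numeric(data["x"], errors="coerce")

# Assumption for this example: negative values are invalid and become NaN.
data["x"] = data["x"].where(data["x"] >= 0)

# One indicator column per category, stored as floats.
encoded = pd.get_dummies(data, columns=["color"], dtype=float)

# Every column is numeric now, so all of them appear in the matrix.
corr = encoded.corr()
print(corr.shape)
```

Note that corr() skips NaN values pairwise, so columns with many coerced values are correlated on fewer rows.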