categorical-data Questions

3

Solved

There is a great solution in R. My df.column looks like: Windows Windows Mac Mac Mac Linux Windows ... I want to replace low frequency categories with 'Other' in this df.column vector. For exam...
Unfriendly asked 21/11, 2017 at 16:43
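A minimal pandas sketch of the "lump rare levels into 'Other'" idea, assuming "low frequency" means "fewer than a fixed number of occurrences". The helper name `lump_rare` and the `min_count` threshold are hypothetical, not a pandas API.

```python
import pandas as pd

def lump_rare(s: pd.Series, min_count: int, other: str = "Other") -> pd.Series:
    """Replace values that occur fewer than `min_count` times with `other` (hypothetical helper)."""
    counts = s.value_counts()
    rare = counts[counts < min_count].index
    # Series.where keeps values where the condition is True and substitutes `other` elsewhere
    return s.where(~s.isin(rare), other)

os_col = pd.Series(["Windows", "Windows", "Mac", "Mac", "Mac", "Linux", "Windows"])
print(lump_rare(os_col, min_count=2).tolist())
# ['Windows', 'Windows', 'Mac', 'Mac', 'Mac', 'Other', 'Windows']
```

A relative threshold (e.g. `s.value_counts(normalize=True) < 0.05`) works the same way if proportions are preferred over raw counts.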

4

I am trying to run some Machine learning algo on a dataset using scikit-learn. My dataset has some features which are like categories. Like one feature is A, which has values 1,2,3 specifying the q...

1

I recently found this answer which provides the code of an unbiased version of Cramer's V for computing the correlation of two categorical variables: import scipy.stats as ss def cramers_corrected...
Mireille asked 12/4, 2024 at 22:20
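For reference, a sketch of the bias-corrected Cramér's V (Bergsma's correction) that the snippet refers to. Unlike many copies floating around, this version assumes `correction=False` in `scipy.stats.chi2_contingency`, so 2x2 tables use the plain Pearson chi-squared rather than the Yates-corrected one; the function name is illustrative. It assumes both variables have at least two observed levels (otherwise the denominator is zero).

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v_corrected(x, y) -> float:
    """Bias-corrected Cramér's V for two categorical sequences (sketch)."""
    table = pd.crosstab(pd.Series(x), pd.Series(y)).to_numpy()
    n = table.sum()
    # Plain Pearson chi-squared; correction=False disables Yates' continuity correction
    chi2 = chi2_contingency(table, correction=False)[0]
    phi2 = chi2 / n
    r, k = table.shape
    # Subtract the expected bias of phi^2 and clamp at zero
    phi2corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Bias-corrected numbers of rows and columns
    rcorr = r - (r - 1) ** 2 / (n - 1)
    kcorr = k - (k - 1) ** 2 / (n - 1)
    return float(np.sqrt(phi2corr / min(kcorr - 1, rcorr - 1)))

x = ["a", "b"] * 50
print(cramers_v_corrected(x, x))  # perfectly associated variables give 1.0
```

Computing this for every pair of columns gives a symmetric matrix that can be passed to a heatmap, which is the usual workaround when `DataFrame.corr` does not fit categorical data.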

2

Solved

I have tried passing the dtype parameter with read_csv as dtype={n: pandas.Categorical} but this does not work properly (the result is an Object). The manual is unclear. Is it possible to read cate...
Sexlimited asked 16/5, 2015 at 5:49
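A short sketch of how this is usually done in current pandas: `dtype` takes the string alias `"category"` (or a `pd.CategoricalDtype`), not the `pandas.Categorical` class. The CSV text and column names below are made up for illustration.

```python
import io
import pandas as pd

csv_text = "os,score\nWindows,1\nMac,2\nWindows,3\n"

# The "category" alias infers the categories from the values that appear in the file
df = pd.read_csv(io.StringIO(csv_text), dtype={"os": "category"})
print(df["os"].dtype)  # category

# A CategoricalDtype fixes the categories (and their order) up front, including unseen ones
os_type = pd.CategoricalDtype(["Windows", "Mac", "Linux"])
df2 = pd.read_csv(io.StringIO(csv_text), dtype={"os": os_type})
print(list(df2["os"].cat.categories))  # ['Windows', 'Mac', 'Linux']
```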

6

Solved

I have the following dataframe: import pandas as pd df = pd.DataFrame({'id': [2967, 5335, 13950, 6141, 6169], 'Player': ['Cedric Hunter', 'Maurice Baker', 'Ratko Varda', 'Ryan Bowen', 'Adrian Ca...
Hate asked 25/4, 2018 at 0:44

4

Solved

I have a data set made of 22 categorical variables (non-ordered). I would like to visualize their correlation in a nice heatmap. Since the Pandas built-in function DataFrame.corr(method='pearson',...
Davina asked 30/12, 2017 at 15:43

10

I have a series like: df['ID'] = ['ABC123', 'IDF345', ...] I'm using scikit's LabelEncoder to convert it to numerical values to be fed into the RandomForestClassifier. During the training, I'm ...

11

Solved

I am converting strings to categorical values in my dataset using the following piece of code. data['weekday'] = pd.Categorical.from_array(data.weekday).labels For example, index weekday 0 Sunday 1...
Narrate asked 13/2, 2017 at 4:14
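`Categorical.from_array(...).labels` is no longer available in current pandas; the equivalent is `.astype("category").cat.codes`. A minimal sketch with a made-up `weekday` column, which also keeps the code-to-label mapping so the integers can be translated back:

```python
import pandas as pd

data = pd.DataFrame({"weekday": ["Sunday", "Monday", "Sunday", "Tuesday"]})

cat = data["weekday"].astype("category")
# Integer code per row; categories are inferred in sorted (alphabetical) order here
data["weekday_code"] = cat.cat.codes
# Keep the mapping from code back to label
code_to_day = dict(enumerate(cat.cat.categories))
print(data["weekday_code"].tolist())  # [1, 0, 1, 2]
print(code_to_day)  # {0: 'Monday', 1: 'Sunday', 2: 'Tuesday'}
```

If the codes should follow calendar order instead of alphabetical order, cast with an explicit `pd.CategoricalDtype([...], ordered=True)` listing the days in the desired order.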

12

Solved

I need to transform the independent field from strings to numeric values. I am using OneHotEncoder for the transformation. My dataset has many independent columns of which some are as: Count...

5

My question concerns optimizing memory usage for pandas Series. The docs note, The memory usage of a Categorical is proportional to the number of categories plus the length of the data. In contr...
Chandos asked 15/1, 2018 at 2:58
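A quick way to see the effect the quoted docs describe: with few distinct values repeated many times, a categorical Series stores one small integer code per row plus each category once, while an object Series holds a Python string reference per row. The data below is synthetic.

```python
import pandas as pd

values = ["Windows", "Mac", "Linux"] * 10_000
as_object = pd.Series(values)
as_category = as_object.astype("category")

# deep=True also counts the Python string objects referenced by the object-dtype Series
print(as_object.memory_usage(deep=True))
print(as_category.memory_usage(deep=True))
```

The saving shrinks as the number of distinct values approaches the length of the data, which is the trade-off the documentation sentence points at.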

4

Solved

I am trying to tweak contrast coding on a linear model where I want to know if each level of a factor is significantly different from the grand mean. Let’s say the factor has levels "A", ...
Epigraph asked 30/6, 2022 at 18:06

3

Solved

I have a dataset including categorical (binary) variables and continuous variables. I'm trying to apply a linear regression model to predict a continuous variable. Can someone please let me know...

1

Solved

I have a column in my Used cars price prediction dataset named "Owner_Type". It has four unique values which are ['First', 'Second', 'Third', 'Fourth']. Now the order that makes the most ...
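One way to encode an ordered column like `Owner_Type` is to hand the desired order to scikit-learn's `OrdinalEncoder` through its `categories` parameter, so it does not fall back to alphabetical order. The sample rows are invented; the order list is the one given in the question.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"Owner_Type": ["Second", "First", "Fourth", "Third", "First"]})
order = ["First", "Second", "Third", "Fourth"]

# One list of categories per encoded column; their position becomes the code
enc = OrdinalEncoder(categories=[order])
df["Owner_Type_code"] = enc.fit_transform(df[["Owner_Type"]]).ravel()
print(df["Owner_Type_code"].tolist())  # [1.0, 0.0, 3.0, 2.0, 0.0]
```

A plain `df["Owner_Type"].map({name: i for i, name in enumerate(order)})` gives the same integers; the encoder is handier inside a scikit-learn pipeline.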

5

What is the difference between the two? It seems that both create new columns, as many as there are unique categories in the feature. Then they assign 0 and 1 to data points...

5

Solved

I have a pandas DataFrame with a column representing a categorical variable. How can I get a list of the categories? I tried .values on the column but that does not return the unique levels.
Pointillism asked 19/9, 2018 at 11:36
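A sketch of the two usual answers, using a made-up `color` column: `.cat.categories` returns every declared level (even unused ones), while `.unique()` returns only the values that actually occur, in order of appearance.

```python
import pandas as pd

df = pd.DataFrame({
    "color": pd.Categorical(["red", "blue", "red"], categories=["red", "blue", "green"]),
})

print(list(df["color"].cat.categories))  # ['red', 'blue', 'green']
print(list(df["color"].unique()))        # ['red', 'blue']
```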

3

Solved

I have a pandas dataframe df containing categorical variables. df=pandas.DataFrame(data=[['male','blue'],['female','brown'], ['male','black']],columns=['gender','eyes']) df Out[16]: gender eye...
Ehr asked 4/5, 2018 at 13:27

2

Solved

I have a pandas dataframe with a categorical series that has missing categories. In the example shown below, group has the categories "a", "b", and "c", but there are ...
Glassworks asked 1/12, 2021 at 15:46
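Since the question is cut off, this sketch assumes the issue is a category (here `"c"`) that has no rows. It shows how pandas treats such unused levels in a few common operations; the data is invented.

```python
import pandas as pd

df = pd.DataFrame({
    "group": pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"]),
    "value": [1, 2, 3],
})

# value_counts on a categorical column lists every declared category, with 0 for unused ones
counts = df["group"].value_counts(sort=False)
# observed=False keeps empty categories as groups; the sum over an empty group is 0
totals = df.groupby("group", observed=False)["value"].sum()
# remove_unused_categories drops levels that never occur
trimmed = df["group"].cat.remove_unused_categories()

print(counts.to_dict())                # {'a': 2, 'b': 1, 'c': 0}
print(totals.to_dict())                # {'a': 4, 'b': 2, 'c': 0}
print(list(trimmed.cat.categories))    # ['a', 'b']
```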

4

I am not able to import category_encoders module in jupyter notebook in python 3 virtual environment. Error --------------------------------------------------------------------------- ModuleNot...
Nicholnichola asked 19/1, 2019 at 9:29

2

The main goals are as follows: apply StandardScaler to continuous variables; apply LabelEncoder and OneHotEncoder to categorical variables. The continuous variables need to be scaled, but at the ...
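A sketch of the usual scikit-learn pattern for these goals, assuming scikit-learn 0.20 or later: a `ColumnTransformer` that scales the continuous columns and one-hot encodes the categorical ones. In those versions `OneHotEncoder` accepts string columns directly, so a separate `LabelEncoder` step is not needed. The column names and values are made up.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40_000, 52_000, 81_000, 60_000],
    "country": ["France", "Spain", "Germany", "Spain"],
})

preprocess = ColumnTransformer([
    # Standardize the continuous columns
    ("num", StandardScaler(), ["age", "income"]),
    # One-hot encode the string column; ignore categories unseen during fit
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["country"]),
])
X = preprocess.fit_transform(df)
print(X.shape)  # (4, 5): 2 scaled columns + 3 one-hot columns
```

Because the transformer remembers what it learned in `fit`, calling `preprocess.transform(new_df)` on test data applies the same scaling and the same one-hot columns.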

1

Solved

If we are not sure about the nature of categorical features, e.g. whether they are nominal or ordinal, which encoding should we use: ordinal encoding or one-hot encoding? Is there a clearly defined ...

2

I cannot merge dataframes and cannot understand why: Simple dataframe df1 = pd.DataFrame({'id': np.random.randint(1,5,100), 'c': np.random.random(100), 's': np.random.random(100)}) grouped to 3 ...
Cupcake asked 20/1, 2016 at 13:43

3

Solved

Seaborn's catplot does not seem to work with plt.subplots(). I'm not sure what the issue is, but I can't seem to put the plots side by side. #Graph 1 plt.subplot(121) sns.ca...
Sampson asked 27/6, 2019 at 9:41
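`catplot` is a figure-level function: it creates its own figure, so `plt.subplot()` calls around it have no effect. A minimal sketch of the common workaround, using axes-level functions (`boxplot`, `stripplot`) that accept `ax=`; the data frame is invented, and the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "os": ["Windows", "Mac", "Linux", "Windows", "Mac", "Linux"],
    "score": [3.0, 4.5, 2.0, 3.5, 4.0, 2.5],
})

# Axes-level functions draw into the Axes passed via ax=, so both share one figure
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.boxplot(data=df, x="os", y="score", ax=axes[0])
sns.stripplot(data=df, x="os", y="score", ax=axes[1])
fig.tight_layout()
```

If the side-by-side panels are facets of one variable, `sns.catplot(..., col="some_column")` produces them within a single figure-level call instead.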

3

Solved

I would like to use the inverse_transform function for LabelEncoder on multiple columns. This is the code I use for more than one columns when applying LabelEncoder on a dataframe: class MultiCol...
Ripping asked 3/10, 2019 at 10:17
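Instead of the custom `MultiColumn...` class from the question, one simple approach is to keep one fitted `LabelEncoder` per column in a dict and reuse the same encoder for `inverse_transform`. This is a swapped-in sketch, not the question's class; the data is invented.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"fruit": ["apple", "pear", "apple"], "color": ["red", "green", "green"]})

# One fitted encoder per column, so each column can be decoded later
encoders = {col: LabelEncoder() for col in df.columns}
encoded = pd.DataFrame(
    {col: encoders[col].fit_transform(df[col]) for col in df.columns}, index=df.index
)
# Reverse the encoding column by column with the matching encoder
decoded = pd.DataFrame(
    {col: encoders[col].inverse_transform(encoded[col]) for col in df.columns}, index=df.index
)
print(encoded["fruit"].tolist(), encoded["color"].tolist())  # [0, 1, 0] [1, 0, 0]
```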

3

Solved

What is the best way to convert a categorical array to a simple numeric array? For example: using CategoricalArrays a = CategoricalArray(["X", "X", "Y", "Z"...
Spry asked 10/5, 2021 at 11:16

4

When using XGBoost we need to convert categorical variables into numeric. Would there be any difference in performance/evaluation metrics between the methods of: dummifying your categorical vari...
Connelly asked 14/12, 2015 at 10:48

© 2022 - 2025 — McMap. All rights reserved.