feature-selection Questions

1

Solved

There are various methods for doing automated feature selection in Scikit-learn. E.g. my_feature_selector = SelectKBest(score_func=f_regression, k=3) my_feature_selector.fit_transform(X, y) The se...
Bounty asked 22/9, 2023 at 19:20

2

I'm new to machine learning. I'm preparing my data for classification using scikit-learn's SVM. To select the best features I used the following method: SelectKBest(chi2, k=10).fit_tran...

9

Solved

I'm trying to conduct a supervised machine-learning experiment using the SelectKBest feature of scikit-learn, but I'm not sure how to create a new dataframe after finding the best features: Let's a...
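One possible way to build the new dataframe the question above asks about is to index the original columns with the boolean mask from `get_support()`. This is only a sketch under assumptions: the toy data, the column names `a` to `e`, and the choice of `f_regression` with `k=2` are hypothetical and not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical toy data: only columns "a" and "c" drive the target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
y = 3.0 * X["a"] - 2.0 * X["c"] + rng.normal(scale=0.1, size=200)

# Fit the selector; k=2 is an arbitrary choice for this sketch.
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)

# get_support() returns a boolean mask aligned with the input columns,
# so it can index the original column labels directly.
selected_columns = X.columns[selector.get_support()]

# New dataframe that keeps the original names of the selected features.
X_selected = X[selected_columns]
```

Indexing the original dataframe, rather than wrapping the array returned by `fit_transform`, keeps the column labels attached to the selected features.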

3

Solved

I am implementing a pipeline using important features selection and then using the same features to train my random forest classifier. Following is my code. m = ExtraTreesClassifier(n_estimators = ...
Dulciana asked 13/2, 2018 at 1:59

1

What is the difference between xgboost.plot_importance() and model.feature_importances_ in XGBClassifier? Here I make some dummy data: import numpy as np import pandas as pd # generate some random da...

1

Solved

For the code given below, I am getting different bar plots for the shap values. In this example, I have a dataset of 1000 train samples with 9 classes and 500 test samples. I then use the random fo...
Adit asked 12/8, 2022 at 4:17

2

Solved

I am working on UCI breast cancer dataset and trying to find the top 3 features that have highest weights. I was able to find the weight of all features using logmodel.coef_ but how can I get the f...
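A minimal sketch of one way to pair coefficients with feature names and rank them by magnitude. It rests on several assumptions: scikit-learn's bundled `load_breast_cancer` data stands in for the dataset in the question, the features are standardized first so that coefficient magnitudes are comparable (a step the excerpt does not mention), and "highest weights" is read as largest absolute value.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Wisconsin diagnostic data stands in for the
# dataset used in the question.
data = load_breast_cancer()

# Assumption: standardize so coefficients are on a comparable scale.
X_scaled = StandardScaler().fit_transform(data.data)

logmodel = LogisticRegression(max_iter=1000).fit(X_scaled, data.target)

# coef_ has shape (1, n_features) for a binary problem; label each
# weight with its feature name.
coefs = pd.Series(logmodel.coef_[0], index=data.feature_names)

# Rank by absolute value and keep the three largest.
top3 = coefs.abs().sort_values(ascending=False).head(3)
```

`top3.index` then holds the names of the three features, and `coefs[top3.index]` recovers their signed weights.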

5

I would like to get a dataframe of important features. With the code below I have got the shap_values, and I am not sure what the values mean. My df has 142 features and 67 experiments, but g...
Tollhouse asked 1/1, 2021 at 22:15

2

Solved

I have a dataframe with over 280 features. I ran a correlation map to detect groups of features that are highly correlated: Now, I want to divide the features into groups, such that each group will be...

7

I have a classification task with a time-series as the data input, where each attribute (n=23) represents a specific point in time. Besides the absolute classification result I would like to find o...
Somersomers asked 4/4, 2013 at 11:53

8

I'm trying to perform feature selection by evaluating my regressions coefficient outputs, and select the features with the highest magnitude coefficients. The problem is, I don't know how to get th...
Dustproof asked 15/11, 2014 at 23:14

2

Solved

I have a data-set that contains among other variables the time-stamp of the transaction in the format 26-09-2017 15:29:32. I need to find possible correlations and predictions of the sales (l...
Semidiurnal asked 26/9, 2017 at 14:15

3

Solved

In my classification scheme, there are several steps including: SMOTE (Synthetic Minority Over-sampling Technique) Fisher criteria for feature selection Standardization (Z-score normalisation) SVC...

4

I am trying to predict a binary (categorical) target from many continuous features, and would like to narrow my feature space before heading into model fitting. I noticed that the SelectKBest cla...
Silicious asked 15/4, 2018 at 22:44

3

Solved

When I plot the feature importance, I get this messy plot. I have more than 7000 variables. I understand the built-in function only selects the most important, although the final graph is unreadabl...

1

I'm currently using xgb.train(...) which returns a booster but I'd like to use RFE to select the best 100 features. The returned booster cannot be used in RFE as it's not a sklearn estimator. XGBCl...
Essentiality asked 22/2, 2021 at 1:30

2

Solved

I have fit a logistic regression model to my data. Imagine, I have four features: 1) which condition the participant received, 2) whether the participant had any prior knowledge/background about th...

6

Solved

After running a Variance Threshold from Scikit-Learn on a set of data, it removes a couple of features. I feel I'm doing something simple yet stupid, but I'd like to retain the names of the remaini...
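One way to keep the column names, sketched under assumptions: the small dataframe and its column names are hypothetical, and `threshold=0.0` is chosen only so that the constant column is dropped.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical data: "constant" has zero variance and should be removed.
df = pd.DataFrame({
    "constant": [1, 1, 1, 1],
    "binary": [0, 1, 0, 1],
    "spread": [1.0, 5.0, 9.0, 13.0],
})

selector = VarianceThreshold(threshold=0.0).fit(df)

# get_support() gives a boolean mask over the input columns; use it to
# look up the names of the features that were kept.
kept_names = df.columns[selector.get_support()]

# Reduced dataframe with the original labels intact.
df_reduced = df[kept_names]
```

The same mask-based lookup should work for other scikit-learn selectors that expose `get_support()`.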

2

I'm a bit confused - creating an ML model here. I'm at the step where I'm trying to take categorical features from a "large" dataframe (180 columns) and one-hot them so that I can find the correla...

3

Solved

While constructing each tree in the random forest using bootstrapped samples, for each terminal node, we select m variables at random from p variables to find the best split (p is the total number ...

3

Solved

For a project I am comparing a number of decision trees, using the regression algorithms (Random Forest, Extra Trees, Adaboost and Bagging) of scikit-learn. To compare and interpret them I use the ...

3

Solved

I am using Scikit-learn for text classification. I want to calculate the Information Gain for each attribute with respect to a class in a (sparse) document-term matrix. The Information Gain is def...

8

Solved

I am working with RandomForestRegressor in python and I want to create a chart that will illustrate the ranking of feature importance. This is the code I used: from sklearn.ensemble import RandomF...
Imbrue asked 21/5, 2017 at 20:26

2

As far as I know, in the Bag of Words method, features are a set of words and their frequency counts in a document. On the other hand, the N-gram method, for example unigrams, does exactly the same, but it does not...

3

Solved

I have performed a PCA analysis over my original dataset and from the compressed dataset transformed by the PCA I have also selected the number of PCs I want to keep (they explain almost 94% of ...

© 2022 - 2024 — McMap. All rights reserved.