feature-selection

1

Solved

How to manually select features for Scikit-Learn model regression?

There are various methods for doing automated feature selection in Scikit-learn. E.g. my_feature_selector = SelectKBest(score_func=f_regression, k=3) my_feature_selector.fit_transform(X, y) The se...

python scikit-learn pipeline feature-selection

Bounty asked 22/9, 2023 at 19:20

2

Feature selection using scikit-learn

I'm new to machine learning. I'm preparing my data for classification using a Scikit-Learn SVM. In order to select the best features I have used the following method: SelectKBest(chi2, k=10).fit_tran...

python machine-learning scikit-learn feature-selection chi-squared

Consolation asked 11/9, 2014 at 15:53

9

Solved

The easiest way for getting feature names after running SelectKBest in Scikit Learn

I'm trying to conduct a supervised machine-learning experiment using the SelectKBest feature of scikit-learn, but I'm not sure how to create a new dataframe after finding the best features: Let's a...

python pandas scikit-learn feature-extraction feature-selection

Fifine asked 3/10, 2016 at 19:35
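
A minimal sketch of one common way to recover the selected column names, assuming the features live in a pandas DataFrame. The data, the column names "a", "b", "c" and the variable names are made up for illustration; get_support() returns a boolean mask over the input columns, which can be used to index X.columns.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical data: y depends on columns "a" and "c"; "b" is pure noise.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
y = 3.0 * X["a"] - 2.0 * X["c"] + rng.normal(scale=0.1, size=200)

# Keep the two columns with the highest univariate F-scores.
selector = SelectKBest(score_func=f_regression, k=2)
X_new = selector.fit_transform(X, y)

# Boolean mask over the original columns, True where a column was kept.
mask = selector.get_support()
selected_names = X.columns[mask].tolist()

# Rebuild a labelled DataFrame from the transformed values.
X_selected = pd.DataFrame(X_new, columns=selected_names, index=X.index)
print(selected_names)
```

The same mask-based approach should carry over to other scikit-learn selectors that expose get_support(), such as VarianceThreshold.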

3

Solved

All intermediate steps should be transformers and implement fit and transform

I am implementing a pipeline that selects important features and then uses the same features to train my random forest classifier. Following is my code. m = ExtraTreesClassifier(n_estimators = ...

python machine-learning scikit-learn feature-selection

Dulciana asked 13/2, 2018 at 1:59

1

What is the difference between xgboost.plot_importance() and model.feature_importances_ in XGBClassifier

What is the difference between xgboost.plot_importance() and model.feature_importances_ in XGBClassifier? Here I make some dummy data: import numpy as np import pandas as pd # generate some random da...

python xgboost feature-selection dimensionality-reduction xgbclassifier

Gigi asked 11/8, 2022 at 9:4

1

Solved

Difference between shap.TreeExplainer and shap.Explainer bar charts

For the code given below, I am getting different bar plots for the shap values. In this example, I have a dataset of 1000 train samples with 9 classes and 500 test samples. I then use the random fo...

python-3.x random-forest feature-selection shap

Adit asked 12/8, 2022 at 4:17

2

Solved

Logistic Regression: How to find the top three features that have the highest weights?

I am working on the UCI breast cancer dataset and trying to find the top 3 features that have the highest weights. I was able to find the weight of all features using logmodel.coef_ but how can I get the f...

python machine-learning scikit-learn logistic-regression feature-selection

Lowminded asked 23/4, 2017 at 21:20
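
A small sketch of the ranking step, using plain Python so it does not depend on a fitted model. The feature names and coefficient values below are invented for illustration; with a binary scikit-learn LogisticRegression, the weights would come from logmodel.coef_[0], since coef_ has shape (1, n_features) in that case.

```python
# Hypothetical feature names and weights, standing in for the dataset's
# column names and logmodel.coef_[0].
feature_names = ["radius", "texture", "perimeter", "area", "smoothness"]
coefficients = [0.8, -1.5, 0.2, 2.1, -0.05]

# Pair each name with its weight and sort by absolute value, largest first,
# so strongly negative weights rank alongside strongly positive ones.
ranked = sorted(
    zip(feature_names, coefficients),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)

top_three = ranked[:3]
for name, weight in top_three:
    print(f"{name}: {weight:+.2f}")
```

Raw coefficient magnitudes are only comparable when the features are on similar scales, so standardizing the inputs before fitting is usually assumed here.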

5

Get a feature importance from SHAP Values

I would like to get a dataframe of important features. With the code below I have got the shap_values, and I am not sure what the values mean. In my df are 142 features and 67 experiments, but g...

python random-forest feature-selection

Tollhouse asked 1/1, 2021 at 22:15

2

Solved

Pandas dataframe: divide features into groups of high correlation

I have a dataframe with over 280 features. I ran a correlation map to detect groups of features that are highly correlated. Now, I want to divide the features into groups, such that each group will be...

pandas dataframe data-science feature-selection yellowbrick

Remiss asked 19/10, 2020 at 9:34

7

How are feature_importances in RandomForestClassifier determined?

I have a classification task with a time-series as the data input, where each attribute (n=23) represents a specific point in time. Besides the absolute classification result I would like to find o...

scikit-learn random-forest feature-selection

Somersomers asked 4/4, 2013 at 11:53

8

Scikit-Learn Linear Regression how to get coefficient's respective features?

I'm trying to perform feature selection by evaluating my regression's coefficient outputs and selecting the features with the highest-magnitude coefficients. The problem is, I don't know how to get th...

scikit-learn linear-regression feature-selection

Dustproof asked 15/11, 2014 at 23:14

2

Solved

How to handle date variable in machine learning data pre-processing [closed]

I have a data-set that contains among other variables the time-stamp of the transaction in the format 26-09-2017 15:29:32. I need to find possible correlations and predictions of the sales (l...

python r machine-learning logistic-regression feature-selection

Semidiurnal asked 26/9, 2017 at 14:15

3

Solved

Put customized functions in Sklearn pipeline

In my classification scheme, there are several steps including: SMOTE (Synthetic Minority Over-sampling Technique), Fisher criteria for feature selection, standardization (Z-score normalisation), SVC...

machine-learning scikit-learn pipeline cross-validation feature-selection

Doi asked 7/7, 2015 at 4:44

4

Using chi2 test for feature selection with continuous features (Scikit Learn)

I am trying to predict a binary (categorical) target from many continuous features, and would like to narrow my feature space before heading into model fitting. I noticed that the SelectKBest cla...

python scikit-learn feature-selection chi-squared

Silicious asked 15/4, 2018 at 22:44

3

Solved

Plot feature importance with xgboost

When I plot the feature importance, I get this messy plot. I have more than 7000 variables. I understand the built-in function only selects the most important, although the final graph is unreadabl...

python matplotlib machine-learning xgboost feature-selection

Hodman asked 18/8, 2018 at 5:22

1

How to use RFE with xgboost Booster?

I'm currently using xgb.train(...) which returns a booster but I'd like to use RFE to select the best 100 features. The returned booster cannot be used in RFE as it's not a sklearn estimator. XGBCl...

python scikit-learn xgboost feature-selection lightgbm

Essentiality asked 22/2, 2021 at 1:30

2

Solved

Interpreting logistic regression feature coefficient values in sklearn

I have fit a logistic regression model to my data. Imagine, I have four features: 1) which condition the participant received, 2) whether the participant had any prior knowledge/background about th...

python scikit-learn logistic-regression feature-selection coefficients

Alit asked 24/6, 2018 at 1:7

6

Solved

Retain feature names after Scikit Feature Selection

After running a Variance Threshold from Scikit-Learn on a set of data, it removes a couple of features. I feel I'm doing something simple yet stupid, but I'd like to retain the names of the remaini...

python pandas scikit-learn output feature-selection

Kr asked 2/10, 2016 at 0:56

2

SciKit-Learn Label Encoder resulting in error 'argument must be a string or number'

I'm a bit confused - creating an ML model here. I'm at the step where I'm trying to take categorical features from a "large" dataframe (180 columns) and one-hot them so that I can find the correla...

python machine-learning scikit-learn feature-selection one-hot-encoding

Fatma asked 14/11, 2019 at 23:47

3

Solved

Understanding max_features parameter in RandomForestRegressor

While constructing each tree in the random forest using bootstrapped samples, for each terminal node, we select m variables at random from p variables to find the best split (p is the total number ...

machine-learning scikit-learn random-forest feature-selection

Apogee asked 29/5, 2014 at 17:52

3

Solved

Feature importances - Bagging, scikit-learn

For a project I am comparing a number of decision trees, using the regression algorithms (Random Forest, Extra Trees, Adaboost and Bagging) of scikit-learn. To compare and interpret them I use the ...

machine-learning scikit-learn decision-tree feature-selection

Solarium asked 2/6, 2017 at 16:29

3

Solved

Information Gain calculation with Scikit-learn

I am using Scikit-learn for text classification. I want to calculate the Information Gain for each attribute with respect to a class in a (sparse) document-term matrix. The Information Gain is def...

python machine-learning scikit-learn text-classification feature-selection

Gadmann asked 15/10, 2017 at 7:17

8

Solved

Random Forest Feature Importance Chart using Python

I am working with RandomForestRegressor in python and I want to create a chart that will illustrate the ranking of feature importance. This is the code I used: from sklearn.ensemble import RandomF...

python plot random-forest feature-selection

Imbrue asked 21/5, 2017 at 20:26

2

Bag of Words (BOW) vs N-gram (sklearn CountVectorizer) - text documents classification

As far as I know, in the Bag of Words method, features are a set of words and their frequency counts in a document. On the other hand, N-grams, for example unigrams, do exactly the same, but it does not...

python scikit-learn feature-extraction feature-selection n-gram

Lesko asked 31/7, 2018 at 20:10

3

Solved

Feature/Variable importance after a PCA analysis

I have performed a PCA over my original dataset and, from the compressed dataset transformed by the PCA, I have also selected the number of PCs I want to keep (they explain almost 94% of ...

python machine-learning scikit-learn pca feature-selection

Vapor asked 11/6, 2018 at 10:49
