feature-selection Questions
1
Solved
There are various methods for doing automated feature selection in Scikit-learn.
E.g.
my_feature_selector = SelectKBest(score_func=f_regression, k=3)
my_feature_selector.fit_transform(X, y)
The se...
Bounty asked 22/9, 2023 at 19:20
2
I'm new to machine learning. I'm preparing my data for classification using a scikit-learn SVM. To select the best features, I used the following method:
SelectKBest(chi2, k=10).fit_tran...
Consolation asked 11/9, 2014 at 15:53
9
Solved
I'm trying to conduct a supervised machine-learning experiment using the SelectKBest feature of scikit-learn, but I'm not sure how to create a new dataframe after finding the best features:
Let's a...
Fifine asked 3/10, 2016 at 19:35
3
Solved
I am implementing a pipeline that selects important features and then uses the same features to train my random forest classifier. My code follows.
m = ExtraTreesClassifier(n_estimators = ...
Dulciana asked 13/2, 2018 at 1:59
1
What is the difference between xgboost.plot_importance() and model.feature_importances_ in XGBClassifier?
Here I make some dummy data:
import numpy as np
import pandas as pd
# generate some random da...
Gigi asked 11/8, 2022 at 9:04
1
Solved
For the code given below, I am getting different bar plots for the shap values.
In this example, I have a dataset of 1000 train samples with 9 classes and 500 test samples. I then use the random fo...
Adit asked 12/8, 2022 at 4:17
2
Solved
I am working on the UCI breast cancer dataset and trying to find the top 3 features that have the highest weights. I was able to find the weights of all features using logmodel.coef_ but how can I get the f...
Lowminded asked 23/4, 2017 at 21:20
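For the question above about picking the top 3 features by weight, one possible approach is to pair each feature name with its coefficient and sort by absolute value. This is only a minimal sketch: the feature names and coefficient values below are hypothetical stand-ins for the dataset's columns and for a row of logmodel.coef_, and the helper name top_k_by_magnitude is made up for illustration. Note that coefficient magnitudes are only comparable when the features are on a similar scale.

```python
# Hypothetical names and values standing in for the dataset's columns
# and for one row of logmodel.coef_ (assumption, not from the question).
feature_names = ["radius", "texture", "perimeter", "area", "smoothness"]
coefficients = [0.8, -1.5, 0.1, 2.3, -0.4]


def top_k_by_magnitude(names, coefs, k=3):
    """Return the k (name, coefficient) pairs with the largest absolute coefficient."""
    pairs = zip(names, coefs)
    # Sort by absolute value so large negative weights rank as high as large positive ones.
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)[:k]


top3 = top_k_by_magnitude(feature_names, coefficients)
print(top3)
```

Sorting on the absolute value keeps the sign of each weight in the result, so it is still possible to see which direction each top feature pushes the prediction.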
5
I would like to get a dataframe of important features. With the code below I have got the shap_values, but I am not sure what the values mean. My df has 142 features and 67 experiments, but g...
Tollhouse asked 1/1, 2021 at 22:15
2
Solved
I have a dataframe with over 280 features.
I ran a correlation map to detect groups of features that are highly correlated:
Now, I want to divide the features into groups, such that each group will be...
Remiss asked 19/10, 2020 at 9:34
7
I have a classification task with a time-series as the data input, where each attribute (n=23) represents a specific point in time. Besides the absolute classification result I would like to find o...
Somersomers asked 4/4, 2013 at 11:53
8
I'm trying to perform feature selection by evaluating my regression's coefficient outputs and selecting the features with the highest-magnitude coefficients. The problem is, I don't know how to get th...
Dustproof asked 15/11, 2014 at 23:14
2
Solved
I have a dataset that contains, among other variables, the timestamp of the transaction in the format 26-09-2017 15:29:32. I need to find possible correlations and predictions of the sales (l...
Semidiurnal asked 26/9, 2017 at 14:15
3
Solved
In my classification scheme, there are several steps including:
SMOTE (Synthetic Minority Over-sampling Technique)
Fisher criteria for feature selection
Standardization (Z-score normalisation)
SVC...
Doi asked 7/7, 2015 at 4:44
4
I am trying to predict a binary (categorical) target from many continuous features, and would like to narrow my feature space before heading into model fitting. I noticed that the SelectKBest cla...
Silicious asked 15/4, 2018 at 22:44
3
Solved
When I plot the feature importance, I get this messy plot. I have more than 7000 variables. I understand the built-in function only selects the most important ones, although the final graph is unreadabl...
Hodman asked 18/8, 2018 at 5:22
1
I'm currently using xgb.train(...) which returns a booster but I'd like to use RFE to select the best 100 features. The returned booster cannot be used in RFE as it's not a sklearn estimator. XGBCl...
Essentiality asked 22/2, 2021 at 1:30
2
Solved
I have fit a logistic regression model to my data. Imagine, I have four features: 1) which condition the participant received, 2) whether the participant had any prior knowledge/background about th...
Alit asked 24/6, 2018 at 1:07
6
Solved
After running a Variance Threshold from Scikit-Learn on a set of data, it removes a couple of features. I feel I'm doing something simple yet stupid, but I'd like to retain the names of the remaini...
Kr asked 2/10, 2016 at 0:56
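For the question above about keeping feature names after a variance threshold, the underlying idea can be sketched without any library: compute each column's variance and keep the names whose variance exceeds the threshold. This is a plain-Python stand-in, not scikit-learn's implementation; the toy columns and the helper name names_above_variance are hypothetical. With a fitted scikit-learn selector, the boolean mask returned by get_support() can typically be used to index the original column names instead.

```python
from statistics import pvariance

# Hypothetical toy data: column name -> observed values (assumption for illustration).
columns = {
    "a": [0, 0, 0, 0],  # constant column, variance 0
    "b": [1, 2, 3, 4],  # population variance 1.25
    "c": [5, 5, 5, 6],  # population variance 0.1875
}


def names_above_variance(cols, threshold=0.0):
    """Return the names of columns whose population variance exceeds the threshold."""
    return [name for name, values in cols.items() if pvariance(values) > threshold]


kept = names_above_variance(columns, threshold=0.2)
print(kept)
```

Because the filter works on names rather than positions, the result can be used directly to select the surviving columns from the original table.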
2
I'm a bit confused - creating an ML model here.
I'm at the step where I'm trying to take categorical features from a "large" dataframe (180 columns) and one-hot them so that I can find the correla...
Fatma asked 14/11, 2019 at 23:47
3
Solved
While constructing each tree in the random forest using bootstrapped samples, for each terminal node, we select m variables at random from p variables to find the best split (p is the total number ...
Apogee asked 29/5, 2014 at 17:52
3
Solved
For a project I am comparing a number of decision trees, using the regression algorithms (Random Forest, Extra Trees, Adaboost and Bagging) of scikit-learn.
To compare and interpret them I use the ...
Solarium asked 2/6, 2017 at 16:29
3
Solved
I am using Scikit-learn for text classification. I want to calculate the Information Gain for each attribute with respect to a class in a (sparse) document-term matrix.
The Information Gain is def...
Gadmann asked 15/10, 2017 at 7:17
8
Solved
I am working with RandomForestRegressor in Python and I want to create a chart that will illustrate the ranking of feature importance. This is the code I used:
from sklearn.ensemble import RandomF...
Imbrue asked 21/5, 2017 at 20:26
2
As far as I know, in the Bag of Words method, features are a set of words and their frequency counts in a document. On the other hand, N-grams, for example unigrams, do exactly the same, but it does not...
Lesko asked 31/7, 2018 at 20:10
3
Solved
I have performed a PCA over my original dataset, and from the compressed dataset transformed by the PCA I have also selected the number of PCs I want to keep (they explain almost 94% of ...
Vapor asked 11/6, 2018 at 10:49