feature-engineering Questions
3
Solved
I was trying to figure out the key differences between using the GCP Vertex AI Feature Store and saving preprocessed features to BigQuery and loading them whenever necessary.
I still cannot understand w...
Nympholepsy asked 15/12, 2022 at 5:5
4
Problem
Let's say we have a dataframe that looks like this:
age  job         friends                                   label
23   'engineer'  ['World of Warcraft', 'Netflix', '9gag']  1
35   'manager'   NULL                                      0
...
If we are interested in training ...
Exponible asked 16/6, 2020 at 13:8
1
Solved
I have a column named "Owner_Type" in my used-car price prediction dataset. It has four unique values: ['First', 'Second', 'Third', 'Fourth']. Now the order that makes the most ...
Dubitable asked 9/5, 2022 at 11:5
3
The docs for sklearn.preprocessing.LabelEncoder start with:
"This transformer should be used to encode target values, i.e. y, and not the input X."
Why is this?
I post just one example of this recommendation...
Burse asked 25/1, 2020 at 23:13
3
Solved
I have some categorical features in my data along with continuous ones. Is it a good or an absolutely bad idea to one-hot encode the categorical features to find their correlation with the labels along with other cont...
Meanie asked 30/9, 2017 at 0:37
1
This might be a beginner question, but I have seen a lot of people using LabelEncoder() to replace categorical variables with ordinal integers. A lot of people use this feature by passing multiple colum...
Ukase asked 14/4, 2020 at 21:40
2
I am facing a binary prediction task and have a set of features, all of which are categorical. A key challenge is therefore to encode those categorical features as numbers, and I was looking for smar...
Unshakable asked 13/11, 2019 at 10:7
3
Solved
I have a dataframe:
Date        repair
<date>      <dbl>
2018-07-01  4420
2018-07-02  NA
2018-07-03  NA
2018-07-04  NA
2018-07-05  NA
where 4420 is the time in minutes. I'm trying to get this:
...
Finney asked 6/2, 2019 at 13:52
0
I am trying to do data preparation with PySpark, involving, among others, steps such as string indexing, one-hot encoding and quantile discretization. My data frame has quite a lot of columns (1 thousand...
Azpurua asked 16/11, 2017 at 13:42
1
Solved
In the MLlib version of Random Forest, it was possible to specify the columns with nominal features (numerical but still categorical variables) with the parameter categoricalFeaturesInfo.
What's...
Tm asked 15/10, 2017 at 20:42
1
Solved
So I have two sets of features that I wish to bin (classify) and then combine to create a new feature. It is not unlike classifying coordinates into grids on a map.
The issue is that the features ...
Person asked 15/4, 2017 at 6:26
2
Solved
I'm starting to use scikit-learn to do some NLP. I've already used some classifiers from NLTK and now I want to try the ones implemented in scikit-learn.
My data is basically sentences, and I...
Silken asked 24/8, 2012 at 2:0