feature-engineering Questions

3

Solved

I was trying to figure out the key differences between using GCP Vertex AI Feature Store and saving preprocessed features to BigQuery and loading them whenever necessary. I still cannot understand w...

4

Problem

Let's say we have a dataframe that looks like this:

age  job         friends                                   label
23   'engineer'  ['World of Warcraft', 'Netflix', '9gag']  1
35   'manager'   NULL                                      0
...

If we are interested in training ...

1

Solved

I have a column named "Owner_Type" in my used car price prediction dataset. It has four unique values: ['First', 'Second', 'Third', 'Fourth']. Now the order that makes the most ...
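The excerpt is cut off before it says which order the asker prefers, so the following is only a minimal sketch of one common approach: mapping an ordered category to integers by hand. The ranking First < Second < Third < Fourth and the names `OWNER_TYPE_ORDER`, `OWNER_TYPE_RANK`, and `encode_owner_type` are assumptions made for illustration, not part of the original question.

```python
# Assumed ordering of the Owner_Type values (hypothetical; the question
# excerpt is truncated before it states the intended order).
OWNER_TYPE_ORDER = ["First", "Second", "Third", "Fourth"]

# Build an explicit value -> rank lookup, starting at 1.
OWNER_TYPE_RANK = {name: rank for rank, name in enumerate(OWNER_TYPE_ORDER, start=1)}


def encode_owner_type(values):
    """Map each Owner_Type string to its integer rank; unknown values become None."""
    return [OWNER_TYPE_RANK.get(value) for value in values]


sample = ["Second", "First", "Fourth", "Third", "First"]
encoded = encode_owner_type(sample)
print(encoded)
```

Writing the mapping out explicitly keeps the order under your control, whereas an encoder that sorts labels alphabetically would place 'Fourth' before 'Second'.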

3

The docs for sklearn.LabelEncoder start with: "This transformer should be used to encode target values, i.e. y, and not the input X." Why is this? I post just one example of this recommendation...
Burse asked 25/1, 2020 at 23:13

3

Solved

I have some categorical features in my data along with continuous ones. Is it a good or an absolutely bad idea to one-hot encode categorical features to find their correlation with the labels, along with other cont...

1

This might be a beginner question, but I have seen a lot of people using LabelEncoder() to replace categorical variables with ordinality. A lot of people use this feature by passing multiple colum...

2

I am facing a binary prediction task and have a set of features, all of which are categorical. A key challenge is therefore to encode those categorical features as numbers, and I was looking for smar...

3

Solved

I have a dataframe

Date        repair
<date>      <dbl>
2018-07-01  4420
2018-07-02  NA
2018-07-03  NA
2018-07-04  NA
2018-07-05  NA

where 4420 is time in minutes. I'm trying to get this: ...
Finney asked 6/2, 2019 at 13:52

0

I am trying to do data preparation using PySpark, involving, among others, steps such as string indexing, one-hot encoding, and quantile discretization. My data frame has quite a lot of columns (1 thousand...
Azpurua asked 16/11, 2017 at 13:42

1

Solved

In the MLlib version of Random Forest, it was possible to specify the columns with nominal features (numerical but still categorical variables) with the parameter categoricalFeaturesInfo. What's...

1

Solved

So I have two sets of features that I wish to bin (classify) and then combine to create a new feature. It is not unlike classifying coordinates into grids on a map. The issue is that the features ...
Person asked 15/4, 2017 at 6:26

2

Solved

I'm starting to use scikit-learn to do some NLP. I've already used some classifiers from NLTK and now I want to try the ones implemented in scikit-learn. My data is basically sentences, and I...

© 2022 - 2024 — McMap. All rights reserved.