data-science - 3

1

Interpreting XGB feature importance and SHAP values

For a particular prediction problem, I observed that a certain variable ranks high in the XGBoost feature importance that gets generated (on the basis of Gain) while it ranks quite low in the SHAP ...

machine-learning data-science classification xgboost shap

Gangue asked 15/6, 2022 at 6:0

2

Solved

Using scikit Pipeline for testing models but preprocessing data only once

Suppose I have a pipeline for my data which does preprocessing and has an estimator at the end. Now if I want to just change the estimator/model at the last step of the pipeline, how do I do it wit...

python machine-learning scikit-learn deep-learning data-science

Supplant asked 20/11, 2017 at 5:52

3

Solved

Remove special characters from entire dataframe in R

Question: How can you use R to remove all special characters from a dataframe, quickly and efficiently? Progress: This SO post details how to remove special characters. I can apply the gsub fu...

r data-science data-cleaning

Memphis asked 17/4, 2018 at 20:18

2

Solved

Pandas dataframe: divide features into groups of high correlation

I have a dataframe with over 280 features. I ran a correlation map to detect groups of features that are highly correlated. Now, I want to divide the features into groups, such that each group will be...

pandas dataframe data-science feature-selection yellowbrick

Remiss asked 19/10, 2020 at 9:34

1

Training New AutoTokenizer Hugging Face

Getting this error: AttributeError: 'GPT2Tokenizer' object has no attribute 'train_new_from_iterator'. Very similar to the Hugging Face documentation. I changed the input and that's it (shouldn't affe...

python nlp data-science huggingface-transformers transformer-model

Harlen asked 22/4, 2022 at 20:43

1

SageMaker Endpoint stuck at "Creating"

I'm trying to deploy a SageMaker endpoint and it gets stuck in "Creating" stage indefinitely. Below is my Dockerfile and training / serving script. The model trains without any issue. Onl...

amazon-web-services data-science amazon-sagemaker

Boabdil asked 12/1, 2021 at 4:57

5

Scikit-learn's LabelBinarizer vs. OneHotEncoder

What is the difference between the two? It seems that both create new columns, whose number equals the number of unique categories in the feature. Then they assign 0 and 1 to data points...

python encoding scikit-learn data-science categorical-data

Puduns asked 22/5, 2018 at 17:25

3

Solved

GridSearchCV - XGBoost - Early Stopping

I am trying to do a hyperparameter search using scikit-learn's GridSearchCV on XGBoost. During the grid search I'd like it to stop early, since that reduces search time drastically and (expecting to) ha...

python-3.x scikit-learn regression data-science xgboost

Wrens asked 24/3, 2017 at 7:15

6

How to read an ORC file stored locally in Python Pandas?

Can I think of an ORC file as similar to a CSV file with column headings and row labels containing data? If so, can I somehow read it into a simple pandas dataframe? I am not that familiar with too...

python pandas pyspark data-science orc

Dreamy asked 19/10, 2018 at 9:33

3

Solved

Using cross_val_predict against test data set

I'm confused about using cross_val_predict in a test data set. I created a simple Random Forest model and used cross_val_predict to make predictions: from sklearn.ensemble import RandomForestClassi...

python machine-learning scikit-learn data-science

Glyptography asked 10/1, 2017 at 2:28

6

Solved

How to load a model from an HDF5 file in Keras?

How to load a model from an HDF5 file in Keras? What I tried: model = Sequential() model.add(Dense(64, input_dim=14, init='uniform')) model.add(LeakyReLU(alpha=0.3)) model.add(BatchNormalization...

python machine-learning keras data-science

Unmistakable asked 29/1, 2016 at 0:3

3

What do the black lines on a seaborn barplot mean?

I plotted data on a barplot using the seaborn library. But on top of the bars, I can see some black lines. Can someone explain what they mean? Note: the last bar does not have this line as ...

python bar-chart visualization seaborn data-science

Shuffleboard asked 13/10, 2019 at 9:58

4

Solved

SVC classifier taking too much time for training

I am using SVC classifier with Linear kernel to train my model. Train data: 42000 records model = SVC(probability=True) model.fit(self.features_train, self.labels_train) y_pred = model.predict(...

machine-learning deep-learning data-science

Woolpack asked 27/12, 2018 at 5:43

3

Solved

Expected 2D array, got 1D array instead error

I am getting the error "ValueError: Expected 2D array, got 1D array instead: array=[ 45000. 50000. 60000. 80000. 110000. 150000. 200000. 300000. 500000. 1000000.]. Reshape your data either...

python machine-learning data-science

Lessen asked 24/10, 2018 at 5:52
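
The error text in the entry above points at the usual cause: scikit-learn estimators expect a 2D feature array of shape (n_samples, n_features). A minimal sketch, assuming the ten values in the message form a single feature column; the variable name `salaries` is hypothetical, and the question's own code is not shown in this excerpt:

```python
import numpy as np

# Values copied from the error message; the name `salaries` is hypothetical.
salaries = np.array([45000., 50000., 60000., 80000., 110000.,
                     150000., 200000., 300000., 500000., 1000000.])

# One feature, many samples: each value becomes its own row.
as_column = salaries.reshape(-1, 1)

# One sample, many features: all values go into a single row.
as_row = salaries.reshape(1, -1)

print(as_column.shape)  # (10, 1)
print(as_row.shape)     # (1, 10)
```

Which of the two shapes is right depends on whether the values are ten observations of one feature or one observation of ten features.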

1

Solved

What is the difference between DataLoader and DataLoader2 in PyTorch?

I developed a custom dataset by using the PyTorch dataset class. The code looks like this: class CustomDataset(torch.utils.data.Dataset): def __init__(self, root_path, transform=None): self.path = ...

python deep-learning pytorch data-science

Refrigeration asked 26/1, 2022 at 15:10

1

Solved

Split geometric progression efficiently in Python (Pythonic way)

I am trying to achieve a calculation involving geometric progression (split). Is there an effective/efficient way of doing it? The data set has millions of rows. I need the column "Traded_qua...

python math data-science

Moen asked 22/1, 2022 at 7:31

2

Solved

Is there a way to return float or integer from a conditional True/False

n_level = range(1, steps + 2) steps is user input, using multi-index dataframe df = {'crest': [754, 755, 762, 785], 'trough': [752, 725, 759, 765], 'L1T': [761, 761, 761, 761], 'L2T': [772, 772, ...

pandas data-science python-3.7 physics

Haven asked 6/1, 2022 at 0:55
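
One possible reading of the title above: in Python, `bool` is a subclass of `int`, so a True/False condition can be used directly in arithmetic. A small sketch under that reading only; the helper name `value_if` and the multiplier are hypothetical, and the question's actual DataFrame logic is truncated in the excerpt:

```python
def value_if(condition: bool, value: float) -> float:
    """Return `value` when `condition` is True, otherwise zero."""
    # True behaves as 1 and False as 0 in arithmetic.
    return condition * value


# Sample numbers; 754 and 761 appear in the excerpt's data.
crest, level = 754, 761

print(int(crest < level))            # 1
print(value_if(crest < level, 2.5))  # 2.5
print(value_if(crest > level, 2.5))  # 0.0
```

The same idea carries over to array-style comparisons, where a boolean result can be cast to a numeric type, though whether that matches the asker's intent cannot be confirmed from the excerpt.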

8

Solved

ValueError: Wrong number of items passed - Meaning and suggestions?

I am receiving the error: ValueError: Wrong number of items passed 3, placement implies 1, and I am struggling to figure out where, and how I may begin addressing the problem. I don't really under...

python pandas prediction data-science

Beaux asked 4/4, 2017 at 1:35

1

"AssertionError: Cannot handle batch sizes > 1 if no padding token is defined" and pad_token = eos_token

I am trying to finetune a pre-trained GPT2-model. When applying the respective tokenizer, I originally got the error message: Using pad_token, but it is not set yet. Thus, I changed my code to: G...

python neural-network pytorch data-science

Southeaster asked 22/6, 2021 at 13:19

3

Pandera validate get all valid rows

I am trying to use the pandera library (I am very new to it) for pandas dataframe validation. What I want to do is ignore the rows that are not valid according to the schema. How can I do that? for e...

python pandas data-science pandera

Optimism asked 12/11, 2021 at 19:11

4

Cannot import category_encoders module

I am not able to import the category_encoders module in a Jupyter notebook in a Python 3 virtual environment. Error: ModuleNot...

python encoding data-science categorical-data

Nicholnichola asked 19/1, 2019 at 9:29

2

Axis must have `freq` set to convert to Periods | Seasonal_Decompose

I have a temp DF that has the following data in it Quarter 2016Q3 146660510.0 2016Q4 123641451.0 2017Q1 125905843.0 2017Q2 129656327.0 2017Q3 126586708.0 2017Q4 116804168.0 2018Q1 118167263.0 2018Q...

python matplotlib data-science data-analysis decomposition

Reuter asked 28/1, 2021 at 15:0

3

Solved

Equivalent Python code for mutate_if from tidyverse

I'm an avid R user and am learning Python along the way. One piece of example code that I can easily run in R is perplexing me in Python. Here's the original data (constructed within R): library(ti...

python r data-science

Alms asked 26/2, 2019 at 18:31

3

Solved

Removing non-English words from text using Python

I am doing a data cleaning exercise in Python and the text that I am cleaning contains Italian words which I would like to remove. I have been searching online whether I would be able to do this on...

python data-science data-cleaning

Mikkimiko asked 22/12, 2016 at 19:0

1

Solved

TypeError: import_optional_dependency() got an unexpected keyword argument 'errors'

I am trying to work with Featuretools to develop an automated feature engineering workflow for the customer churn dataset. The end outcome is a function that takes in a dataset and label times for ...

python matplotlib data-science

Medlock asked 12/9, 2021 at 4:40
