scikit-learn Questions

3

Solved

I have been trying to use the scikit-learn library to solve this problem. Roughly: from sklearn.preprocessing import PolynomialFeatures from sklearn.linear_model import LinearRegression # Make or ...
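The excerpt is cut off, so this is only a minimal sketch of the usual way to combine these two classes, assuming a single input feature. The synthetic data and `degree=2` are illustrative assumptions, not taken from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption): y = 3x^2 - 2x + 1, no noise.
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 3 * X[:, 0] ** 2 - 2 * X[:, 0] + 1

# Expand x into [x, x^2]; include_bias=False because LinearRegression fits its own
# intercept. An ordinary least-squares model is then fit on the expanded features.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
model.fit(X, y)

reg = model.named_steps["linearregression"]
print(reg.coef_, reg.intercept_)  # approximately [-2. 3.] and 1.0
```

Wrapping both steps in a pipeline means the same expansion is applied automatically when calling `model.predict` on new data.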

3

I have a dataset which has a DateTime index and I'm using PCA from sklearn to reduce the number of dimensions. The following question bugs me: will PCA keep the order of the points in my series s...
Tour asked 1/2, 2017 at 13:50
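A short sketch, using hypothetical random data, of why the order question is easy to check: `PCA.fit_transform` returns one output row per input row, in the same order, so the original `DatetimeIndex` can be reattached directly.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical time-indexed data: 10 daily observations of 5 features.
idx = pd.date_range("2017-01-01", periods=10, freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(10, 5)), index=idx, columns=[f"f{i}" for i in range(5)])

# fit_transform returns a NumPy array (no index), but row i of the output is the
# projection of row i of the input, so the DatetimeIndex can simply be reused.
pca = PCA(n_components=2)
reduced = pd.DataFrame(pca.fit_transform(df), index=df.index, columns=["pc1", "pc2"])
print(reduced.head())
```

Transforming a single row on its own gives the same projection as its row in `reduced`, which confirms that the mapping is row by row.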

4

I'm attempting to do a grid search to optimize my model but it's taking far too long to execute. My total dataset is only about 15,000 observations with about 30-40 variables. I was successfully ab...
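The question is truncated, so this is not a diagnosis, only a sketch of one common speed-up: `RandomizedSearchCV` fits a fixed number (`n_iter`) of sampled candidates instead of every grid combination, and `n_jobs=-1` spreads the fits over all CPU cores. The estimator, parameter range, and synthetic data are assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the real data (assumption): 1,000 rows, 30 features.
X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

# Sample 10 values of C from a log-uniform distribution instead of an exhaustive grid;
# with cv=3 this costs 30 fits regardless of how wide the search range is.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=10,
    cv=3,
    n_jobs=-1,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```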

11

Solved

I am trying to import imblearn into my Python notebook after installing the required modules. However, I am getting the following error: Additional info: I am using a virtual environment in Visual...
Malvoisie asked 1/7, 2023 at 8:52

8

I wrote a text classification program. When I run the program it crashes with an error as seen in this screenshot: ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting t...
Passably asked 3/2, 2020 at 16:25
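One way to get exactly this message is for `X` to have zero rows by the time it reaches `train_test_split` (for example, if the text-loading step produced no documents; that cause is an assumption here). This sketch reproduces the error and adds a hypothetical guard, `safe_split`, that reports the real problem earlier.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Reproduce the error: a feature matrix with zero rows.
X_empty = np.empty((0, 3))
y_empty = np.empty(0)
try:
    train_test_split(X_empty, y_empty, test_size=0.2)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

def safe_split(X, y, test_size=0.2):
    """Hypothetical helper: fail early with a clearer message when no samples were loaded."""
    if len(X) == 0:
        raise ValueError("No samples to split; check the data-loading step.")
    return train_test_split(X, y, test_size=test_size, random_state=0)

# With 10 samples and test_size=0.2, the split is 8 training rows and 2 test rows.
X_train, X_test, y_train, y_test = safe_split(np.arange(20).reshape(10, 2), np.arange(10))
```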

6

Solved

UPDATED: In the end, the solution I opted to use for clustering my large dataset was one suggested by Anony-Mousse below. That is, using ELKI's DBSCAN implementation to do my clustering rather than...

3

Solved

I am trying to deploy an existing scikit-learn model in Amazon SageMaker, that is, a model that was trained locally on my machine rather than on SageMaker. On my local (Windows) machine I've saved my model a...
Tamelatameless asked 25/1, 2021 at 9:03

2

I am trying to implement the bag-of-words model from the Kaggle site with Twitter sentiment data that has around 1M rows. I have already cleaned it, but in the last part, when I applied my feature vectors and sentim...
Lustring asked 26/4, 2017 at 17:09
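The excerpt stops before the actual error, so this is an assumption about the problem: at around 1M tweets, a frequent failure is converting the bag-of-words matrix to a dense array. A minimal sketch that keeps `CountVectorizer`'s sparse output and passes it straight to a linear classifier; the tiny corpus and labels are placeholders.

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny stand-in for the cleaned tweets and their sentiment labels (assumption).
tweets = ["i love this", "this is awful", "what a great day", "worst day ever",
          "love it so much", "awful awful service"]
labels = [1, 0, 1, 0, 1, 0]

# max_features caps the vocabulary size; fit_transform returns a SciPy sparse matrix.
vectorizer = CountVectorizer(max_features=5000)
X = vectorizer.fit_transform(tweets)

# LogisticRegression accepts the sparse matrix directly, so no .toarray() is needed.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels)
print(clf.predict(vectorizer.transform(["great service, love it"])))
```

A sparse matrix stores only the non-zero counts, which is what keeps memory manageable when each tweet uses a handful of words out of a large vocabulary.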

3

Solved

When passing X, y to fit, I am getting the following error: Traceback (most recent call last): File "C:/Classify/classifier.py", line 95, in train_avg, test_avg, cms = train_model(X, y, "cep...
Calycine asked 24/11, 2016 at 7:12

2

Solved

There are standard ways of predicting proportions such as logistic regression (without thresholding) and beta regression. There have already been discussions about this: http://scikit-learn-genera...
Yovonnda asked 29/5, 2017 at 4:38

4

One can create a multivariate kernel density estimate (KDE) with scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html#sklearn.neighbors.KernelDensity)...
Dropwort asked 14/2, 2020 at 9:34
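The rest of the question is cut off, so this only sketches the basic multivariate workflow with `KernelDensity`. The 2-D normal sample and `bandwidth=0.5` are arbitrary assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Hypothetical 2-D sample: 500 points from a standard bivariate normal.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))

# Gaussian kernel with a fixed bandwidth (0.5 is an arbitrary choice here).
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(data)

# score_samples returns the log-density at each query point; exponentiate for the density.
query = np.array([[0.0, 0.0], [3.0, 3.0]])
density = np.exp(kde.score_samples(query))
print(density)  # the origin should be far denser than (3, 3)

# The fitted estimate can also generate new points (supported for the gaussian kernel).
new_points = kde.sample(5, random_state=0)
```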

2

Using the sklearn package, I would like to find the Gini coefficient of each feature for a class of paths, as in the iris data, e.g. Iris-virginica: petal length gini: 0.4, petal width gini: 0.4.
Levitical asked 13/7, 2017 at 11:42
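This is an approximation under stated assumptions, not necessarily what the asker had in mind: a tree's `feature_importances_` is the normalized total reduction in Gini impurity per feature, computed for the whole model rather than per class. To get one set of values for a single class, this sketch fits a one-vs-rest tree for Iris-virginica.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# One-vs-rest target for Iris-virginica (class index 2 in load_iris).
is_virginica = (y == 2).astype(int)

# criterion="gini" is the default; feature_importances_ sums each feature's
# Gini impurity reduction over all splits and normalizes the totals to 1.
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, is_virginica)

for name, importance in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

These values are importances (shares of impurity reduction), not the Gini impurity of a node, so they will not match a "gini: 0.4" read off a single node of a plotted tree.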

3

Solved

I am trying to make my git repository pip-installable. In preparation for that I am restructuring the repo to follow the right conventions. My understanding from looking at other repositories is th...
Sang asked 8/2, 2019 at 17:07

4

Solved

In scikit-learn, some clustering algorithms have both predict(X) and fit_predict(X) methods, like KMeans and MeanShift, while others only have the latter, like SpectralClustering. According to the ...
Jonette asked 9/5, 2016 at 2:25
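A small sketch of the practical difference, using two hypothetical Gaussian blobs: `KMeans` learns centroids, so it can assign new points with `predict`, while `SpectralClustering` only labels the samples it was fit on and therefore exposes `fit_predict` but no `predict`.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

# Two well-separated hypothetical blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(20, 2)), rng.normal(5, 0.3, size=(20, 2))])

# KMeans stores cluster centers, so unseen points can be assigned to the nearest one.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
new_labels = km.predict(np.array([[0.1, 0.0], [5.2, 4.9]]))

# SpectralClustering labels only the training points (no model for new data).
sc_labels = SpectralClustering(n_clusters=2, random_state=0).fit_predict(X)

print(hasattr(KMeans(), "predict"), hasattr(SpectralClustering(), "predict"))  # True False
```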

2

Solved

I have a dataset with target LULUS; it's an imbalanced dataset. I'm trying to print the ROC AUC score for each fold of my data, but in every fold it somehow always raises an error saying Valu...
Antwanantwerp asked 29/5, 2021 at 16:07
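The error message is cut off, so this assumes the common case with imbalanced data where a test fold ends up containing a single class, for which ROC AUC is undefined. The sketch uses `StratifiedKFold` to keep both classes in every fold; the synthetic data stands in for the LULUS target.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic imbalanced stand-in for the LULUS target (assumption): roughly 10% positives.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# StratifiedKFold preserves the class ratio in every fold, so each test fold
# contains both classes and roc_auc_score is defined.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    # Use positive-class probabilities rather than hard predictions so the AUC
    # reflects how well the model ranks positives above negatives.
    proba = clf.predict_proba(X[test_idx])[:, 1]
    score = roc_auc_score(y[test_idx], proba)
    fold_scores.append(score)
    print(f"fold {fold}: ROC AUC = {score:.3f}")
```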

4

I want to implement a custom loss function in scikit-learn. I use the following code snippet: def my_custom_loss_func(y_true,y_pred): diff3=max((abs(y_true-y_pred))*y_true) return diff3 score=m...
Carmeliacarmelina asked 19/1, 2019 at 13:47
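A sketch of how a function like the one in the question is usually wrapped with `make_scorer`. The `Ridge` model and synthetic data are assumptions. Note that a scorer is used for evaluation and model selection (e.g. `cross_val_score`, `GridSearchCV`); it does not change the loss that the estimator's own `fit` minimizes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

def my_custom_loss_func(y_true, y_pred):
    # Same quantity as in the question: the largest value of |y_true - y_pred| * y_true.
    # np.max reduces over all elements; the built-in max iterates over the first axis,
    # which fails for 2-D arrays.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.max(np.abs(y_true - y_pred) * y_true)

# greater_is_better=False marks this as a loss: the scorer returns the negated
# value, so that "higher is better" still holds during model selection.
score = make_scorer(my_custom_loss_func, greater_is_better=False)

X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)
scores = cross_val_score(Ridge(), X, y, cv=3, scoring=score)
print(scores)
```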

3

Solved

I got this from the sklearn webpage: Pipeline: Pipeline of transforms with a final estimator. make_pipeline: Construct a Pipeline from the given estimators. This is a shorthand for the Pipeline co...
Harriettharrietta asked 20/11, 2016 at 18:56
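A small sketch of the one visible difference: with `Pipeline` you name each step yourself, while `make_pipeline` generates the names from the lowercased class names. The names matter because they form the parameter keys used by `set_params` and grid searches.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline: every step gets an explicit name.
explicit = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])

# make_pipeline: names are generated from the lowercased class names.
shorthand = make_pipeline(StandardScaler(), LogisticRegression())

print([name for name, _ in explicit.steps])   # ['scaler', 'clf']
print([name for name, _ in shorthand.steps])  # ['standardscaler', 'logisticregression']

# The step name is the prefix of nested parameter keys.
explicit.set_params(clf__C=0.5)
shorthand.set_params(logisticregression__C=0.5)
```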

4

I understand that random_state is used in various sklearn algorithms to break ties between different predictors (trees) with the same metric value (say, for example, in GradientBoosting). But the document...
Zwiebel asked 29/9, 2014 at 10:38
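A sketch of the practical effect, under the assumption that reproducibility is what matters here: in `GradientBoostingClassifier`, `random_state` seeds every random choice the fit makes, for example row subsampling when `subsample < 1.0` and feature sampling via `max_features`. Two fits with the same seed produce identical models; the data and parameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

# subsample < 1.0 and max_features="sqrt" make fitting random;
# fixing random_state makes both fits draw the same random numbers.
params = dict(n_estimators=20, subsample=0.8, max_features="sqrt")
a = GradientBoostingClassifier(random_state=42, **params).fit(X, y)
b = GradientBoostingClassifier(random_state=42, **params).fit(X, y)

print((a.predict_proba(X) == b.predict_proba(X)).all())  # True
```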

3

Solved

When importing sklearn datasets, e.g. from sklearn.datasets import fetch_mldata from sklearn.datasets import fetch_openml I get the error Traceback (most recent call last): File "numbers.py", l...
Thresher asked 11/3, 2019 at 19:19

2

Solved

I'm currently trying to train a linear model using sklearn in Python, but not with mean squared error (MSE) as the error measure; instead, with mean absolute error (MAE). I specifically need a linear model...
Morpheus asked 17/5, 2018 at 13:31

5

Solved

Is there a way to retrieve the list of feature names used for training of a classifier, once it has been trained with the fit method? I would like to get this information before applying to unseen ...
Doolittle asked 8/11, 2016 at 11:06

4

I am trying to run some machine learning algorithms on a dataset using scikit-learn. My dataset has some categorical features. For example, feature A has values 1, 2, 3 specifying the q...

8

Solved

I am trying to plot a Receiver Operating Characteristics (ROC) curve with cross validation, following the example provided in sklearn's documentation. However, the following import gives an ImportE...
Salvo asked 20/2, 2020 at 13:44

2

Solved

I apply the decision tree classifier and the random forest classifier to my data with the following code: def decision_tree(train_X, train_Y, test_X, test_Y): clf = tree.DecisionTreeClassifier()...

3

I have a DataFrame named data whose correlation matrix I computed using corr = data.corr(). If the correlation between two columns is greater than 0.75, I want to remove one of them from datafram...
Squires asked 3/7, 2017 at 15:39
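A common pandas pattern for this, sketched on hypothetical data: keep only the upper triangle of the correlation matrix so each pair is checked once, then drop any column whose correlation with an earlier column exceeds 0.75. Using the absolute correlation is an assumption; remove `.abs()` to consider only positive correlations.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "b" is almost a copy of "a", "c" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
data = pd.DataFrame({"a": a, "b": a + rng.normal(scale=0.1, size=200), "c": rng.normal(size=200)})

# Absolute correlations, masked to the upper triangle (k=1 excludes the diagonal).
corr = data.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop every column correlated above 0.75 with some column to its left.
to_drop = [col for col in upper.columns if (upper[col] > 0.75).any()]
reduced = data.drop(columns=to_drop)
print(to_drop)  # ['b']
```

Because only the upper triangle is checked, the first column of each highly correlated pair is kept and the later one is dropped.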

© 2022 - 2024 — McMap. All rights reserved.