data-science Questions

1

Solved

I'm very confused about these two methods: stack() and unstack(). I know that I should use them in the case of MultiIndexes; however, I need to know the following: 1- I don't know where I ...
Bilberry asked 11/9, 2021 at 0:3

6

Solved

I want to use sklearn's StandardScaler. Is it possible to apply it to some feature columns but not others? For instance, say my data is: data = pd.DataFrame({'Name' : [3, 4,6], 'Age' : [18,...
Cafard asked 17/7, 2016 at 11:47

5

Solved

I've searched the pandas documentation and cookbook recipes and it's clear you can round to the nearest decimal place easily using dataframe.columnName.round(decimalplace). How do you do this wit...
Mahaliamahan asked 23/12, 2017 at 1:35

2

Solved

How can I compare the column names of two different pandas DataFrames? I want to compare train and test DataFrames where some columns are missing from the test DataFrame.
Rump asked 6/5, 2018 at 19:31

2

I am working on a sequential labeling problem with imbalanced classes and I would like to use sample_weight to resolve the imbalance issue. Basically if I train the model for about 10 epochs, I get...
Eschalot asked 18/1, 2018 at 6:30

1

Solved

I've been trying to solve an exercise from a textbook where I am faced with the challenge of counting different events between sequential stages of an industrial process. Information related to the ...
Gladwin asked 10/7, 2021 at 19:12

3

I am trying to load a serialized xgboost model from a pickle file. import pickle def load_pkl(fname): with open(fname, 'rb') as f: obj = pickle.load(f) return obj model = load_pkl('model_0_unre...

4

Solved

I'm trying to centre and normalise a data set in Python with the following code: mean = np.mean(train, axis=0) std = np.std(train, axis=0) norm_train = (train - mean) / std The problem is that I ...
Samuel asked 7/4, 2016 at 20:4

4

Solved

Reading the Kubernetes "Run to Completion" documentation, it says that jobs can be run in parallel, but is it possible to chain together a series of jobs that should be run in sequential order (par...
Should asked 13/9, 2018 at 20:59

4

Solved

I am taking my first steps with the scikit library and found myself in need of backfilling only some columns in my data frame. I have carefully read the documentation but I still cannot figure out how...
Marylyn asked 13/8, 2019 at 10:31

1

I want to build a website and deploy it to GitHub Pages or Heroku. My question is: is it possible to embed a LIVE (where I can run code) Google Colab notebook in the website I'll be hosting? I want...
Hellenize asked 1/8, 2020 at 18:42

4

Solved

I have an example data as: datetime col1 col2 col3 2021-04-10 01:00:00 25. 50. 50 2021-04-10 02:00:00. 25. 50. 50 2021-04-10 03:00:00. 25. 100. 50 2021-04-10 04:00:00 50. 50. 100 2021-04-10 05:00:0...
Infralapsarian asked 6/5, 2021 at 15:3

4

Solved

I have a pandas dataframe with a column containing list values, with example data as: datetime. column1 2021-04-10 00:03 00. [20.0, 21.6, 30.7] 2021-04-10 00:06 00. [10.0, 20.6, 20.7] 2021-04-10 ...
Edgerton asked 21/4, 2021 at 18:49

3

So I've seen a few answers on here that helped a bit, but my dataset is larger than the ones that have been answered previously. To give a sense of what I'm working with, here's a link to the full ...
Aerial asked 15/6, 2018 at 21:46

1

Solved

I am working on a data science regression problem with around 90,000 rows in the train set and 8,500 in the test set. There are 9 categorical columns and no missing data. For this case, I applied a...
Chantalchantalle asked 2/3, 2021 at 8:32

2

Solved

I'm loving Altair for creating choropleth maps! My biggest problem, however, is I cannot figure out how to change the size of the legend. I've read through the documentation and tried several thing...
Sturgis asked 31/3, 2019 at 17:23

1

Solved

I'm trying to replicate R's fitdist() results (reference, cannot modify R code) in Python using scipy.stats. The results are totally different. Does anyone know why? How can I replicate R's results...
Rena asked 14/1, 2021 at 13:20

8

Solved

If I want to use the BatchNormalization function in Keras, then do I need to call it once only at the beginning? I read this documentation for it: http://keras.io/layers/normalization/ I don't se...

1

Solved

So I have pasted my complete code for your reference. I want to know what the use of ppf and cdf is here. Can you explain it? I did some research and found out that ppf (percent point function) is an...
Paradies asked 27/12, 2020 at 16:40

3

Solved

import pyspark from pyspark.sql import SparkSession from pyspark.conf import SparkConf import findspark from pyspark.sql.functions import countDistinct spark = SparkSession.builder \ .master("local...
Roundfaced asked 10/2, 2019 at 9:49

3

Solved

I have multivariate time series data and want to detect anomalies with the Isolation Forest algorithm. I want to get the best parameters from GridSearchCV; here is the GridSearchCV code snippet. input...
Builtup asked 10/5, 2019 at 13:36

3

Solved

Does shapley support logistic regression models? Running the following code I get: logmodel = LogisticRegression() logmodel.fit(X_train,y_train) predictions = logmodel.predict(X_test) explainer =...

3

Solved

I have a dataframe total_year, which contains three columns (year, action, comedy). How can I plot two columns (action and comedy) on the y-axis? My code plots only one: total_year[-15:].plot(x='year', ...
Aran asked 12/12, 2017 at 14:38

1

Solved

I'm preprocessing my data before implementing a machine learning model. Some of the features have high cardinality, like country and language. Since encoding those features as one-hot-vector ca...
Pentlandite asked 2/12, 2020 at 12:46

3

Solved

What is the difference between Kernel Ridge (from sklearn.kernel_ridge) with polynomial kernel and using PolynomialFeatures + Ridge (from sklearn.linear_model)?
Superheat asked 29/9, 2018 at 23:0
