data-science - 4

1

Solved

Why and when should I use the stack() and unstack() methods?

I'm very confused about two methods: stack() and unstack(). I know that I should use them with a MultiIndex; however, I need to know the following: 1- I don't know where I ...

python pandas stack data-science

Bilberry asked 11/9, 2021 at 0:3
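
A minimal sketch of what the two methods do, assuming a small made-up frame (the `student`, `art`, and `math` names are illustrative and do not come from the question). stack() moves the column labels into an inner index level, and unstack() moves that level back out into columns:

```python
import pandas as pd

# Hypothetical data, not taken from the question.
df = pd.DataFrame(
    {"art": [70, 60], "math": [90, 80]},
    index=pd.Index(["ann", "bob"], name="student"),
)

# stack(): column labels become the innermost index level (wide -> long).
long_form = df.stack()

# unstack(): the innermost index level becomes columns again (long -> wide).
wide_form = long_form.unstack()

print(long_form)
print(wide_form)
```

The long form is usually handy for grouping or plotting by the former column label; the wide form is easier to read as a table.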

6

Solved

Apply StandardScaler to parts of a data set [duplicate]

I want to use sklearn's StandardScaler. Is it possible to apply it to some feature columns but not others? For instance, say my data is: data = pd.DataFrame({'Name' : [3, 4,6], 'Age' : [18,...

python pandas scikit-learn scale data-science

Cafard asked 17/7, 2016 at 11:47

5

Solved

Round to nearest 1000 in pandas

I've searched the pandas documentation and cookbook recipes and it's clear you can round to the nearest decimal place easily using dataframe.columnName.round(decimalplace). How do you do this wit...

python pandas data-science

Mahaliamahan asked 23/12, 2017 at 1:35
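
One approach that may help here, sketched under the assumption that the column is numeric: round() accepts a negative number of decimals, so -3 rounds to the nearest thousand. The column name and values below are made up for illustration:

```python
import pandas as pd

# Hypothetical column; the question's own data is not shown in the excerpt.
df = pd.DataFrame({"price": [1234.0, 5678.0, 999.0]})

# A negative `decimals` rounds to the left of the decimal point:
# -3 means "nearest 1000".
df["price_rounded"] = df["price"].round(-3)

print(df)
```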

2

Solved

Compare column names of Pandas Dataframe

How can I compare the column names of two different pandas DataFrames? I want to compare train and test DataFrames where some columns are missing from the test DataFrame.

python pandas numpy machine-learning data-science

Rump asked 6/5, 2018 at 19:31
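
A minimal sketch of one way to do this, assuming two frames named `train` and `test` with made-up columns: the column labels are Index objects, so Index.difference lists the labels present in one frame but not the other.

```python
import pandas as pd

# Hypothetical frames; only the column labels matter for this comparison.
train = pd.DataFrame(columns=["id", "age", "income", "target"])
test = pd.DataFrame(columns=["id", "age"])

# Labels that appear in train but are missing from test (returned sorted).
missing_in_test = train.columns.difference(test.columns)

print(list(missing_in_test))
```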

2

Using sample_weight in Keras for sequence labelling

I am working on a sequential labeling problem with unbalanced classes and I would like to use sample_weight to resolve the unbalance issue. Basically if I train the model for about 10 epochs, I get...

python deep-learning keras data-science

Eschalot asked 18/1, 2018 at 6:30

1

Solved

Counting events between sequential stages in a process using R

I've been trying to resolve an exercise from a textbook where I am faced with the challenge of counting different events between sequential stages of an industrial process. Information related to the ...

r algorithm count data-science data-wrangling

Gladwin asked 10/7, 2021 at 19:12

3

Loading XGBoost model from pickle file. Error: 'XGBClassifier' object has no attribute 'use_label_encoder'

I am trying to load a serialized xgboost model from a pickle file. import pickle def load_pkl(fname): with open(fname, 'rb') as f: obj = pickle.load(f) return obj model = load_pkl('model_0_unre...

python machine-learning data-science xgboost amazon-sagemaker

Jeramey asked 30/4, 2021 at 1:34

4

Solved

Normalisation with a zero in the standard deviation

I'm trying to centre and normalise a data set in python with the following code mean = np.mean(train, axis=0) std = np.std(train, axis=0) norm_train = (train - mean) / std The problem is that I ...

python numpy statistics data-science

Samuel asked 7/4, 2016 at 20:4
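
A sketch of one common workaround, reusing the `train`, `mean`, and `std` names from the excerpt but with made-up data: a column with zero standard deviation is constant, so dividing it by 1 instead of 0 leaves it at zero after centring. Whether that is the right treatment for a given model is an assumption.

```python
import numpy as np

# Hypothetical data: the second column is constant, so its std is 0.
train = np.array([[1.0, 5.0],
                  [3.0, 5.0]])

mean = np.mean(train, axis=0)
std = np.std(train, axis=0)

# Replace zero std with 1.0 so the division is defined; the constant
# column is already all zeros after subtracting the mean.
safe_std = np.where(std == 0, 1.0, std)
norm_train = (train - mean) / safe_std

print(norm_train)
```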

4

Solved

Kubernetes: can analytical jobs be chained together in a workflow?

Reading the Kubernetes "Run to Completion" documentation, it says that jobs can be run in parallel, but is it possible to chain together a series of jobs that should be run in sequential order (par...

kubernetes workflow pipeline jobs data-science

Should asked 13/9, 2018 at 20:59

4

Solved

How to transform some columns only with SimpleImputer or equivalent

I am taking my first steps with the scikit-learn library and found myself needing to backfill only some columns in my data frame. I have read the documentation carefully but I still cannot figure out how...

python pandas scikit-learn data-science imputation

Marylyn asked 13/8, 2019 at 10:31

1

How to Embed a LIVE Colab Notebook in a website?

I want to build a website and deploy it to GitHub Pages or Heroku. My question is: is it possible to embed a LIVE (where I can run code) Google Colab notebook in the website I'll be hosting? I want...

apache-spark web deployment data-science embed

Hellenize asked 1/8, 2020 at 18:42

4

Solved

A new column in pandas whose value depends on other columns

I have an example data as: datetime col1 col2 col3 2021-04-10 01:00:00 25. 50. 50 2021-04-10 02:00:00. 25. 50. 50 2021-04-10 03:00:00. 25. 100. 50 2021-04-10 04:00:00 50. 50. 100 2021-04-10 05:00:0...

python pandas numpy data-science

Infralapsarian asked 6/5, 2021 at 15:3

4

Solved

Selecting the last element of a list inside a pandas dataframe

I have a pandas dataframe with a column containing list values, with example data as: datetime. column1 2021-04-10 00:03 00. [20.0, 21.6, 30.7] 2021-04-10 00:06 00. [10.0, 20.6, 20.7] 2021-04-10 ...

python pandas numpy data-science

Edgerton asked 21/4, 2021 at 18:49
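
A minimal sketch, using the two list values visible in the excerpt (the datetime column is left out because it is cut off there; `last_value` is a made-up column name). The .str accessor also indexes into list elements, so .str[-1] picks the last item of each list:

```python
import pandas as pd

# The two list values shown in the excerpt.
df = pd.DataFrame({"column1": [[20.0, 21.6, 30.7], [10.0, 20.6, 20.7]]})

# .str[-1] indexes each element of the column, which works for lists too.
df["last_value"] = df["column1"].str[-1]

print(df)
```

An equivalent, more explicit spelling is `df["column1"].apply(lambda values: values[-1])`.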

3

How to create a historical timeline with Python

So I've seen a few answers on here that helped a bit, but my dataset is larger than the ones that have been answered previously. To give a sense of what I'm working with, here's a link to the full ...

python matplotlib data-science

Aerial asked 15/6, 2018 at 21:46

1

Solved

How to improve the CatBoostRegressor? [closed]

I am working on a data science regression problem with around 90,000 rows on train set and 8500 on test set. There are 9 categorical columns and no missing data. for this case, I am applied a...

machine-learning data-science catboost

Chantalchantalle asked 2/3, 2021 at 8:32

2

Solved

Changing Size of Legend in Altair

I'm loving Altair for creating choropleth maps! My biggest problem, however, is I cannot figure out how to change the size of the legend. I've read through the documentation and tried several thing...

python gis data-science vega-lite altair

Sturgis asked 31/3, 2019 at 17:23

1

Solved

Pareto distribution: R vs Python - different results

I'm trying to replicate R's fitdist() results (reference, cannot modify R code) in Python using scipy.stats. The results are totally different. Does anyone know why? How can I replicate R's results...

python r scipy data-science

Rena asked 14/1, 2021 at 13:20

8

Solved

Where do I call the BatchNormalization function in Keras?

If I want to use the BatchNormalization function in Keras, then do I need to call it once only at the beginning? I read this documentation for it: http://keras.io/layers/normalization/ I don't se...

python keras neural-network data-science batch-normalization

Balloon asked 11/1, 2016 at 7:47

1

Solved

norm.ppf vs norm.cdf in python's scipy.stats

I have pasted my complete code for your reference. I want to know what the use of ppf and cdf is here. Can you explain it? I did some research and found out that ppf (percent point function) is an...

python numpy data-science hypothesis-test scipy.stats

Paradies asked 27/12, 2020 at 16:40
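
A small sketch of how the two functions relate, using `statistics.NormalDist` from the Python standard library as a stand-in for `scipy.stats.norm` (its `cdf` and `inv_cdf` methods play the roles of `norm.cdf` and `norm.ppf`). The cdf maps a value to the probability of falling at or below it, and the ppf is the inverse, mapping a probability back to a value:

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

# cdf: value -> probability of being at or below that value.
p = std_normal.cdf(1.96)

# inv_cdf (the ppf): probability -> the value with that cumulative probability.
z = std_normal.inv_cdf(0.975)

print(p, z)
```

In a hypothesis test, the cdf is typically what turns a test statistic into a p-value, while the ppf gives the critical value for a chosen significance level.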

3

Solved

Spark DataFrame limit function takes too much time to show

import pyspark from pyspark.sql import SparkSession from pyspark.conf import SparkConf import findspark from pyspark.sql.functions import countDistinct spark = SparkSession.builder \ .master("local...

python-3.x pyspark bigdata data-science

Roundfaced asked 10/2, 2019 at 9:49

3

Solved

Isolation Forest parameter tuning with GridSearchCV

I have multivariate time series data and want to detect anomalies with the Isolation Forest algorithm. I want to get the best parameters from GridSearchCV; here is the code snippet for GridSearchCV. input...

python-3.x scikit-learn data-science

Builtup asked 10/5, 2019 at 13:36

3

Solved

Shapley for Logistic regression?

Does shapley support logistic regression models? Running the following code I get: logmodel = LogisticRegression() logmodel.fit(X_train,y_train) predictions = logmodel.predict(X_test) explainer =...

python machine-learning data-science logistic-regression shap

Aquacade asked 27/2, 2020 at 13:26

3

Solved

How to plot multiple pandas columns

I have a dataframe, total_year, which contains three columns (year, action, comedy). How can I plot two columns (action and comedy) on the y-axis? My code plots only one: total_year[-15:].plot(x='year', ...

python pandas matplotlib plot data-science

Aran asked 12/12, 2017 at 14:38

1

Solved

Understanding FeatureHasher, collisions and vector size trade-off

I'm preprocessing my data before implementing a machine learning model. Some of the features have high cardinality, like country and language. Since encoding those features as one-hot vectors ca...

python machine-learning data-science

Pentlandite asked 2/12, 2020 at 12:46

3

Solved

Kernel ridge and simple Ridge with Polynomial features

What is the difference between Kernel Ridge (from sklearn.kernel_ridge) with polynomial kernel and using PolynomialFeatures + Ridge (from sklearn.linear_model)?

python scikit-learn data-science

Superheat asked 29/9, 2018 at 23:0
