apache-spark-mllib - 2

apache-spark-mllib Questions

How to evaluate a classifier with PySpark 2.4.5

I'm wondering what the best way is to evaluate a fitted binary classification model using Apache Spark 2.4.5 and PySpark (Python). I want to consider different metrics such as accuracy, precision, ...

python apache-spark pyspark apache-spark-mllib evaluation

Insidious asked 20/3, 2020 at 10:23

Strange performance issue Spark LSH MinHash approxSimilarityJoin

I'm joining 2 datasets using Apache Spark ML LSH's approxSimilarityJoin method, but I'm seeing some strange behaviour. After the (inner) join the dataset is a bit skewed, however every time one o...

apache-spark duplicates apache-spark-mllib minhash lsh

Silken asked 18/7, 2018 at 13:47

How to use QuantileDiscretizer across groups in a DataFrame?

I have a DataFrame with the following columns. scala> show_times.printSchema root |-- account: string (nullable = true) |-- channel: string (nullable = true) |-- show_name: string (nullable ...

scala apache-spark apache-spark-sql apache-spark-mllib

Biparty asked 2/5, 2017 at 16:27

Solved

Spark train test split

I am curious if there is something similar to sklearn's http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html for apache-spark in the latest 2.0.1 rel...

apache-spark apache-spark-mllib train-test-split

Towhead asked 12/10, 2016 at 9:2

How to extract best parameters from a CrossValidatorModel

I want to find the parameters of ParamGridBuilder that make the best model in CrossValidator in Spark 1.4.x. In the Pipeline example in the Spark documentation, they add different parameters (numFeatures,...

scala apache-spark pipeline cross-validation apache-spark-mllib

Wylie asked 31/7, 2015 at 15:12

Using Jackson 2.9.9 in java Spark

I am trying to use the MLlib library (Java) but one of my dependencies uses Jackson 2.9.9. I noticed that a pull request was made such that the master branch's dependency is upgraded to this partic...

java apache-spark jackson apache-spark-mllib

Spermatozoon asked 16/8, 2019 at 6:26

Solved

pyspark - Convert sparse vector obtained after one hot encoding into columns

I am using apache Spark ML lib to handle categorical features using one hot encoding. After writing the below code I am getting a vector c_idx_vec as output of one hot encoding. I do understand how...

pyspark apache-spark-sql apache-spark-mllib apache-spark-ml one-hot-encoding

Popgun asked 19/6, 2018 at 14:48

Solved

How can I build a CoordinateMatrix in Spark using a DataFrame?

I am trying to use the Spark implementation of the ALS algorithm for recommendation systems, so I built the DataFrame depicted below, as training data: |--------------|--------------|-------------...

pyspark apache-spark-sql apache-spark-mllib collaborative-filtering

Haiphong asked 28/6, 2017 at 12:57

"empty collection" error when trying to load a saved Spark model using pyspark

I'm building a Random Forest model using Spark and I want to save it to use again later. I'm running this on pyspark (Spark 2.0.1) without HDFS, so the files are saved to the local file system. I'...

python apache-spark pyspark apache-spark-mllib

Criss asked 26/1, 2017 at 19:15

Is there no "inverse_transform" method for a scaler like MinMaxScaler in spark?

When training a model, say linear regression, we may apply a normalization, like MinMaxScaler, to the train and test datasets. After we get a trained model and use it to make predictions, and scale back...

apache-spark machine-learning normalization apache-spark-mllib inverse-transform

Yolanda asked 7/9, 2017 at 8:59

Using DataFrame with MLlib

Let's say I have a DataFrame (that I read in from a csv on HDFS) and I want to train some algorithms on it via MLlib. How do I convert the rows into LabeledPoints or otherwise utilize MLlib on this...

apache-spark apache-spark-mllib

Sidero asked 31/3, 2015 at 20:17

How to create a Row from a List or Array in Spark using java

In Java, I use RowFactory.create() to create a Row: Row row = RowFactory.create(record.getLong(1), record.getInt(2), record.getString(3)); where "record" is a record from a database, but I canno...

java apache-spark apache-spark-mllib

Crosley asked 26/9, 2016 at 6:52

How to use XGBoost in PySpark Pipeline

I want to update my PySpark code. In PySpark, the base model must be put in a pipeline; the official pipeline demo uses LogisticRegression as a base model. However, it seems not to be abl...

apache-spark pyspark apache-spark-mllib xgboost apache-spark-ml

Loughlin asked 30/5, 2018 at 10:26

Solved

How to prepare data into a LibSVM format from DataFrame?

I want to produce libsvm format, so I shaped a DataFrame into the desired layout, but I do not know how to convert it to libsvm format. The format is as shown in the figure. I hope that the desired libsvm type...

apache-spark apache-spark-sql apache-spark-mllib libsvm apache-spark-ml

Heald asked 1/1, 2017 at 14:44

Solved

Gaussian Mixture Models: Difference between Spark MLlib and scikit-learn

I'm trying to use Gaussian Mixture models on a sample of a dataset. I used both MLlib (with pyspark) and scikit-learn and get very different results, the scikit-learn one looking more realistic. f...

python apache-spark scikit-learn pyspark apache-spark-mllib

Nakada asked 18/6, 2018 at 18:49

Failed to execute user defined function($anonfun$9: (string) => double) on using String Indexer for multiple columns

I am trying to apply string indexer on multiple columns. Here is my code: val stringIndexers = Categorical_Model.map { colName => new StringIndexer().setInputCol(colName).setOutputCol(colName + "...

scala apache-spark apache-spark-mllib

Newspaperwoman asked 22/7, 2019 at 9:40

Solved

What is the difference between HashingTF and CountVectorizer in Spark?

Trying to do doc classification in Spark. I am not sure what the hashing does in HashingTF; does it sacrifice any accuracy? I doubt it, but I don't know. The spark doc says it uses the "hashing tri...

apache-spark apache-spark-mllib apache-spark-ml

Piccalilli asked 4/2, 2016 at 16:6

Create Custom Cross Validation in Spark ML

I am new to both Spark and PySpark Data Frames and ML. How can I create a custom cross validation for the ML library? For example, I want to change the way the training folds are formed, e.g. stratifie...

python scala apache-spark apache-spark-mllib

Quenelle asked 4/11, 2015 at 0:12

Apache Spark MLlib: How to import model from PMML

I have a PMML file which encodes a logistic regression model that was NOT exported from MLlib. How can I import the model from PMML using MLlib in Java for evaluation/prediction? (I know that MLl...

java apache-spark-mllib pmml

Matteson asked 29/1, 2017 at 11:58

How to handle categorical features with spark-ml?

How do I handle categorical data with spark-ml and not spark-mllib? Though the documentation is not very clear, it seems that classifiers e.g. RandomForestClassifier, LogisticRegression, have a ...

apache-spark categorical-data apache-spark-ml apache-spark-mllib

Wheezy asked 28/8, 2015 at 18:28

How to set a custom loss function in Spark MLlib

I would like to use my own loss function instead of the squared loss for the linear regression model in Spark MLlib. So far I can't find any part in the documentation that mentions if it is even poss...

scala apache-spark machine-learning regression apache-spark-mllib

Metzgar asked 14/11, 2017 at 17:34

Solved

Comparing two arrays and getting the difference in PySpark

I have two array fields in a data frame. I have a requirement to compare these two arrays and get the difference as an array(new column) in the same data frame. Expected output is: Column B ...

python pyspark apache-spark-sql apache-spark-mllib

Enrika asked 27/10, 2017 at 11:15

Polynomial regression in Spark, or external packages for Spark

After a good amount of searching on the net for this topic, I am ending up here hoping to get some pointers. Please read further. After analyzing Spark 2.0, I concluded polynomial regression is n...

machine-learning regression apache-spark-mllib

Renown asked 10/8, 2016 at 13:58

Solved

Null values from a csv on Scala and Apache Spark

I'm using Apache Spark 2.3.0. When I load a CSV file and then run df.show, it shows me the table with all null values, and I would like to know why, because everything looks fine in the CSV. val d...

scala csv apache-spark apache-spark-mllib

Conformance asked 11/10, 2018 at 16:36

Solved

How to serve a Spark MLlib model?

I'm evaluating tools for production ML-based applications and one of our options is Spark MLlib, but I have some questions about how to serve a model once it's trained. For example in Azure ML, o...

apache-spark machine-learning apache-spark-mllib

Joliejoliet asked 10/11, 2016 at 17:24