apache-spark-mllib - 5

3

Solved

Dealing with unbalanced datasets in Spark MLlib

I'm working on a particular binary classification problem with a highly unbalanced dataset, and I was wondering if anyone has tried to implement specific techniques for dealing with unbalanced data...

apache-spark machine-learning classification apache-spark-mllib

Butyraceous asked 27/10, 2015 at 16:4

1

Handling continuous data in Spark NaiveBayes

As per the official documentation of Spark NaiveBayes: it supports Multinomial NB (see here), which can handle finitely supported discrete data. How can I handle continuous data (for example: perc...

apache-spark apache-spark-mllib naivebayes

Kibe asked 11/8, 2017 at 4:0

1

Solved

How to add an incremental column ID for a table in Spark SQL

I'm working on a Spark MLlib algorithm. The dataset I have is in this form: "Company":"XXXX","CurrentTitle":"XYZ","Edu_Title":"ABC","Exp_mnth":. (there are more values similar to these) I'm trying t...

apache-spark apache-spark-sql apache-spark-mllib

Hanni asked 14/7, 2016 at 14:36

2

Solved

Spark Multiclass Classification Example

Do you guys know where I can find examples of multiclass classification in Spark? I spent a lot of time searching in books and on the web, and so far I just know that it is possible since the lates...

scala apache-spark apache-spark-mllib random-forest apache-spark-ml

Clapper asked 15/8, 2015 at 21:2

3

Solved

ClassCastException: org.apache.spark.ml.linalg.DenseVector cannot be cast to org.apache.spark.mllib.linalg.Vector

Can somebody please help me out with the error below? I am trying to convert a DataFrame to an RDD so that it can be used for building a regression model. Spark version: 2.0.0. Error => ClassCastException...

apache-spark apache-spark-mllib

Till asked 18/10, 2016 at 13:35

1

Solved

What is raw prediction in Logistic Regression in Spark MLlib?

I have run binary logistic regression using Spark MLlib. As per the Spark MLlib documentation, RawPrediction values are confidence values, which I assume are probabilities for lcl and ucl. I am getting -ve value...

apache-spark apache-spark-mllib logistic-regression

Benedetto asked 30/4, 2017 at 18:32
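
A minimal sketch of why a raw prediction can be negative, assuming the binary case where the raw prediction is the linear margin (reported as the pair `[-margin, margin]`) and the probability is the logistic sigmoid of that margin. This is plain Python, not Spark code; the helper names `sigmoid` and `raw_and_probability` and the example weights are hypothetical.

```python
import math


def sigmoid(margin: float) -> float:
    """Logistic function: maps any real margin into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-margin))


def raw_and_probability(weights, intercept, features):
    """Return (raw, probability) for a binary linear model.

    Assumption: raw is [-margin, margin], where margin = w . x + b,
    and probability is [1 - sigmoid(margin), sigmoid(margin)].
    """
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    raw = [-margin, margin]
    p1 = sigmoid(margin)
    return raw, [1.0 - p1, p1]


# Hypothetical model and data point: the margin is 1.0 - 2.0 + 0.5 = -0.5.
raw, prob = raw_and_probability([1.0, -2.0], 0.5, [1.0, 1.0])
print(raw, prob)
```

Under that assumption the raw values are unbounded real numbers (one of the pair is negative whenever the margin is nonzero), while the probabilities always lie between 0 and 1 and sum to 1.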

2

Spark HashingTF result explanation

I tried the standard Spark HashingTF example on Databricks. import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer} val sentenceData = spark.createDataFrame(Seq( (0, "Hi I heard about Spark")...

scala apache-spark apache-spark-mllib apache-spark-ml

Morbilli asked 14/12, 2016 at 22:13

1

Solved

How to access parameters of the underlying model in ML Pipeline?

I have a DataFrame that is processed with LinearRegression. If I do it directly, like below, I can display the details of the model: val lr = new LinearRegression() val lrModel = lr.fit(df) lrMod...

scala apache-spark apache-spark-mllib

Zachery asked 20/7, 2017 at 21:10

2

Solved

How to print the probability of prediction in LogisticRegressionWithLBFGS for pyspark

I am using Spark 1.5.1 and, in PySpark, after I fit the model using: model = LogisticRegressionWithLBFGS.train(parsedData) I can print the prediction using: model.predict(p.features) Is the...

apache-spark machine-learning pyspark apache-spark-mllib logistic-regression

Rockery asked 6/11, 2015 at 6:33

4

Solved

PySpark & MLLib: Class Probabilities of Random Forest Predictions

I'm trying to extract the class probabilities of a random forest object I have trained using PySpark. However, I do not see an example of it anywhere in the documentation, nor is it a method of R...

apache-spark pyspark random-forest apache-spark-mllib

Trautman asked 2/3, 2015 at 20:15

1

What is the relation between numFeatures in HashingTF in Spark MLlib and actual number of terms in a document?

Is there any relation between numFeatures in HashingTF in Spark MLlib and the actual number of terms in a document (sentence)? List<Row> data = Arrays.asList( RowFactory.create(0.0, "Hi I he...

apache-spark machine-learning apache-spark-mllib tf-idf

Propylite asked 7/7, 2017 at 8:47
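
A rough sketch of the hashing trick behind this question, assuming each term is mapped to a bucket by `hash(term) mod numFeatures`. It is plain Python, not Spark's implementation: `zlib.crc32` stands in for whatever hash function Spark uses, so the bucket indices will not match real HashingTF output, and the function name `hashing_tf` is hypothetical.

```python
import zlib
from collections import Counter


def hashing_tf(terms, num_features):
    """Count terms per hash bucket; the vector length is num_features,
    regardless of how many distinct terms the document contains."""
    counts = Counter()
    for term in terms:
        # Stand-in hash (assumption): CRC32 of the UTF-8 bytes.
        index = zlib.crc32(term.encode("utf-8")) % num_features
        counts[index] += 1
    return dict(counts)


terms = "Hi I heard about Spark".split()
print(hashing_tf(terms, 20))  # sparse {bucket index: term count}
print(hashing_tf(terms, 1))   # every term collides in bucket 0
```

In this sketch numFeatures only fixes the size of the output vector. The number of non-zero entries is at most the number of distinct terms, and it gets smaller when two terms collide in the same bucket.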

2

Is there any means to serialize a custom Transformer in a Spark ML Pipeline?

I use an ML pipeline with various custom UDF-based transformers. What I'm looking for is a way to serialize/deserialize this pipeline. I serialize the PipelineModel using ObjectOutputStream.write(...

serialization apache-spark apache-spark-sql apache-spark-mllib

Clastic asked 27/10, 2016 at 12:8

1

Solved

How to understand the libsvm format type in Spark MLlib?

I am new to Spark MLlib. When I was reading the Binomial logistic regression example, I didn't understand the "libsvm" format type. (Binomial logistic regression) The text loo...

apache-spark apache-spark-mllib libsvm apache-spark-ml

Burkle asked 7/7, 2017 at 7:39
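
For orientation, a small sketch of how a line in LIBSVM text format can be read, assuming the usual layout `label index1:value1 index2:value2 ...` with one-based feature indices and only non-zero features listed. This is plain Python rather than Spark's loader; the function name `parse_libsvm_line` and the sample line are made up for illustration.

```python
def parse_libsvm_line(line):
    """Parse 'label idx:val idx:val ...' into (label, {zero_based_index: value}).

    Assumption: indices in the file are one-based, so they are shifted
    to zero-based here.
    """
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index) - 1] = float(value)
    return label, features


# Hypothetical line: label 1, feature 3 = 0.5, feature 10 = 2.0, all others 0.
print(parse_libsvm_line("1 3:0.5 10:2.0"))
```

Features that do not appear on a line are treated as zero, which is why the format suits sparse data.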

1

Solved

Why does the spark-ml ALS model return NaN and negative predictions?

Actually I'm trying to use ALS from spark-ml with implicit ratings. I noticed that some predictions given by my trained model are negative or NaN. Why is that?

apache-spark pyspark apache-spark-mllib

Jerz asked 4/7, 2017 at 17:19

1

Solved

Spark ML Pipeline with RandomForest takes too long on 20MB dataset

I am using Spark ML to run some ML experiments, and on a small 20 MB dataset (Poker dataset) with a Random Forest and a parameter grid, it takes 1 hour and 30 minutes to finish. Similarly with scikit-l...

apache-spark pyspark apache-spark-mllib apache-spark-ml

Rog asked 2/7, 2017 at 0:27

1

Solved

How to improve my recommendation results? I am using Spark ALS implicit

First, I have some app usage history for users. For example: user1, app1, 3 (launch times) user2, app2, 2 (launch times) user3, app1, 1 (launch times) I have basically two demands: Recommend some ...

apache-spark recommendation-engine apache-spark-mllib

Parliament asked 24/2, 2016 at 13:39

1

Solved

Matrix Math With Sparklyr

Looking to convert some R code to Sparklyr, functions such as lmtest::coeftest() and sandwich::sandwich(). Trying to get started with Sparklyr extensions but pretty new to the Spark API and having ...

r apache-spark apache-spark-mllib sparklyr

Sabelle asked 17/6, 2017 at 6:52

1

Solved

Join two Spark mllib pipelines together

I have two separate DataFrames, each with several different processing stages that I handle with mllib transformers in a pipeline. I now want to join these two pipelines together, keeping ...

python scala apache-spark apache-spark-mllib apache-spark-ml

Lind asked 15/6, 2017 at 14:27

1

Solved

Scala - How to split the probability column (column of vectors) that we obtain when we fit the GMM model to the data into two separate columns? [duplicate]

scala apache-spark apache-spark-sql apache-spark-mllib

Opportuna asked 13/6, 2017 at 21:29

2

Solved

Spark custom estimator including persistence

I want to develop a custom estimator for Spark which handles persistence of the great pipeline API as well. But as "How to Roll a Custom Estimator in PySpark mllib" put it, there is not a lot of docum...

apache-spark apache-spark-sql pipeline apache-spark-mllib apache-spark-ml

Jaco asked 26/11, 2016 at 10:38

1

Solved

How does Spark's StreamingLinearRegressionWithSGD work?

I am working on StreamingLinearRegressionWithSGD which has two methods trainOn and predictOn. This class has a model object that is updated as training data arrives in the stream specified in train...

apache-spark linear-regression apache-spark-mllib

Roundworm asked 30/3, 2017 at 10:19

0

Non-linear SVM is not available in Apache Spark

Does anyone know the reason why the Non-Linear SVM has not been implemented in Apache Spark? I was reading this page: https://issues.apache.org/jira/browse/SPARK-4638 Look at the last comment. It ...

scala apache-spark svm apache-spark-mllib

Hophead asked 12/5, 2017 at 23:16

4

Solved

How to use mllib.recommendation if the user ids are string instead of contiguous integers?

I want to use Spark's mllib.recommendation library to build a prototype recommender system. However, the user data I have is in something like the following format: AB123XY45678 CD234WZ12...

apache-spark recommendation-engine apache-spark-mllib

Tyrus asked 5/1, 2015 at 2:46
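
One common idea for this situation, sketched in plain Python under the assumption that the recommender needs integer user IDs: build a lookup table from each distinct string ID to a contiguous integer, and keep the table so recommendations can be mapped back. The helper names `build_id_index` and `encode_ratings` and the sample IDs are hypothetical, and on a real cluster the mapping would have to be built with distributed operations rather than a local dict.

```python
def build_id_index(string_ids):
    """Map each distinct string ID to a contiguous integer starting at 0."""
    return {sid: i for i, sid in enumerate(sorted(set(string_ids)))}


def encode_ratings(ratings, user_index):
    """Replace the string user ID in (user, product, rating) triples."""
    return [(user_index[user], product, rating)
            for user, product, rating in ratings]


# Hypothetical data: two users, integer product IDs.
ratings = [("user_b", 7, 3.0), ("user_a", 7, 1.0), ("user_b", 9, 2.0)]
index = build_id_index(user for user, _, _ in ratings)
print(encode_ratings(ratings, index))

# Inverse table, for turning integer IDs in the results back into strings.
reverse_index = {i: sid for sid, i in index.items()}
```

Sorting before enumerating keeps the mapping deterministic across runs, which matters if the encoded data and the lookup table are saved separately.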

2

How to extract a value from a Vector in a column of a Spark Dataframe [duplicate]

When using SparkML to predict labels the result Dataframe is: scala> result.show +-----------+--------------+ |probability|predictedLabel| +-----------+--------------+ | [0.0,1.0]| 0.0| |...

scala apache-spark dataframe apache-spark-sql apache-spark-mllib

Gargoyle asked 2/5, 2017 at 6:8

2

Solved

How to convert type Row into Vector to feed to the KMeans

When I try to feed df2 to KMeans I get the following error: clusters = KMeans.train(df2, 10, maxIterations=30, runs=10, initializationMode="random") The error I get: Cannot convert type <cla...

apache-spark pyspark k-means apache-spark-mllib apache-spark-sql

Anthropomorphize asked 21/3, 2016 at 22:39
