apache-spark-mllib Questions

3

Solved

I'm working on a particular binary classification problem with a highly unbalanced dataset, and I was wondering if anyone has tried to implement specific techniques for dealing with unbalanced data...
Butyraceous asked 27/10, 2015 at 16:4

1

As per official documentation of Spark NaiveBayes: It supports Multinomial NB (see here) which can handle finitely supported discrete data. How can I handle continuous data (for example: perc...
Kibe asked 11/8, 2017 at 4:0

1

Solved

I'm working on a Spark MLlib algorithm. The dataset I have is in this form: "Company":"XXXX","CurrentTitle":"XYZ","Edu_Title":"ABC","Exp_mnth":.(there are more values similar to these) I'm trying t...
Hanni asked 14/7, 2016 at 14:36

2

Solved

Do you guys know where I can find examples of multiclass classification in Spark? I spent a lot of time searching in books and on the web, and so far I just know that it is possible since the lates...

3

Solved

Can somebody please help me out with the error below? I am trying to convert a DataFrame to an RDD so that it can be used for regression model building. SPARK VERSION : 2.0.0 Error => ClassCastException...
Till asked 18/10, 2016 at 13:35

1

Solved

I have run binary logistic regression using Spark MLlib. As per the Spark MLlib documentation, RawPrediction values are confidence values, which I assume are probabilities for lcl and ucl. I am getting -ve value...
Benedetto asked 30/4, 2017 at 18:32
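
A note on the question above, offered as a sketch rather than a definitive answer: assuming the raw prediction for binary logistic regression holds the margin (log-odds) for each class, in the form [-margin, margin], negative values are expected, and the probability is obtained by applying the logistic function. The margin value 1.2 below is made up for illustration.

```python
import math

def sigmoid(margin):
    """Logistic function: maps a real-valued margin (log-odds) to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-margin))

# Hypothetical raw prediction for one row: [-margin, margin].
raw_prediction = [-1.2, 1.2]

# Applying the logistic function to each entry gives values that sum to 1.
probability = [sigmoid(m) for m in raw_prediction]
print(probability)
```

Under this assumption a negative raw value simply means the corresponding class has probability below 0.5.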

2

I tried standard spark HashingTF example on DataBricks. import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer} val sentenceData = spark.createDataFrame(Seq( (0, "Hi I heard about Spark")...
Morbilli asked 14/12, 2016 at 22:13

1

Solved

I have a DataFrame that is processed with LinearRegression. If I do it directly, like below, I can display the details of the model: val lr = new LinearRegression() val lrModel = lr.fit(df) lrMod...
Zachery asked 20/7, 2017 at 21:10

2

Solved

I am using Spark 1.5.1 and, in PySpark, after I fit the model using: model = LogisticRegressionWithLBFGS.train(parsedData) I can print the prediction using: model.predict(p.features) Is the...

4

Solved

I'm trying to extract the class probabilities of a random forest object I have trained using PySpark. However, I do not see an example of it anywhere in the documentation, nor is it a method of R...
Trautman asked 2/3, 2015 at 20:15

1

Is there any relation between numFeatures in HashingTF in Spark MLlib and the actual number of terms in a document(sentence)? List<Row> data = Arrays.asList( RowFactory.create(0.0, "Hi I he...
Propylite asked 7/7, 2017 at 8:47
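
A rough sketch of the hashing trick behind this question: each term is mapped to a bucket by hashing it and taking the result modulo numFeatures, so the vector size is chosen independently of how many terms a sentence contains. Spark's HashingTF uses its own hash function; `zlib.crc32` below is only a stand-in, and `term_index` and `hashing_tf` are hypothetical helper names.

```python
import zlib

def term_index(term, num_features):
    """Map a term to a bucket in [0, num_features) with a stable hash (stand-in)."""
    return zlib.crc32(term.encode("utf-8")) % num_features

def hashing_tf(terms, num_features):
    """Return a sparse term-frequency vector as {bucket index: count}."""
    vector = {}
    for term in terms:
        index = term_index(term, num_features)
        # Two different terms may share a bucket; their counts are then added.
        vector[index] = vector.get(index, 0.0) + 1.0
    return vector

vec = hashing_tf("Hi I heard about Spark".split(), num_features=20)
print(vec)
```

A small number of buckets makes collisions between distinct terms more likely, while a large one makes the vectors sparser.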

2

I use ML pipeline with various custom UDF-based transformers. What I'm looking for is a way to serialize/deserialize this pipeline. I serialize the PipelineModel using ObjectOutputStream.write(...

1

Solved

I am new to Spark MLlib. When I was reading the example of Binomial logistic regression, I didn't understand the "libsvm" format type. (Binomial logistic regression) The text loo...
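
For reference, a minimal sketch of how one line of a LIBSVM-format text file can be read: each line is `label index:value index:value ...`, only non-zero features are listed, and indices in the file are one-based. `parse_libsvm_line` is a hypothetical helper written for illustration, not a Spark API, and the sample line is made up.

```python
def parse_libsvm_line(line):
    """Parse 'label index:value ...' into (label, {zero_based_index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        # Indices in the file are one-based; shift them to zero-based here.
        features[int(index) - 1] = float(value)
    return label, features

label, features = parse_libsvm_line("1 3:0.5 10:2.0")
print(label, features)
```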

1

Solved

I'm trying to use ALS from spark-ml with implicit ratings. I noticed that some predictions given by my trained model are negative or NaN. Why is that?
Jerz asked 4/7, 2017 at 17:19

1

Solved

I am using Spark ML to run some ML experiments, and on a small dataset of 20MB (Poker dataset) and a Random Forest with parameter grid, it takes 1h and 30 minutes to finish. Similarly with scikit-l...

1

Solved

First, I have some usage history of users' apps. For example: user1, app1, 3(launch times) user2, app2, 2(launch times) user3, app1, 1(launch times) I have basically two demands: Recommend some ...
Parliament asked 24/2, 2016 at 13:39

1

Solved

Looking to convert some R code to Sparklyr, functions such as lmtest::coeftest() and sandwich::sandwich(). Trying to get started with Sparklyr extensions but pretty new to the Spark API and having ...
Sabelle asked 17/6, 2017 at 6:52

1

Solved

I have two separate DataFrames which each have several differing processing stages which I use mllib transformers in a pipeline to handle. I now want to join these two pipelines together, keeping ...

1

Solved

I am trying to do the following: +-----+-------------------------+----------+-------------------------------------------+ |label|features |prediction|probability | +-----+-------------------...
Opportuna asked 13/6, 2017 at 21:29

2

Solved

I want to develop a custom estimator for spark which handles persistence of the great pipeline API as well. But as How to Roll a Custom Estimator in PySpark mllib put it there is not a lot of docum...

1

Solved

I am working on StreamingLinearRegressionWithSGD which has two methods trainOn and predictOn. This class has a model object that is updated as training data arrives in the stream specified in train...
Roundworm asked 30/3, 2017 at 10:19

0

Does anyone know the reason why the Non-Linear SVM has not been implemented in Apache Spark? I was reading this page: https://issues.apache.org/jira/browse/SPARK-4638 Look at the last comment. It ...
Hophead asked 12/5, 2017 at 23:16

4

Solved

I want to use Spark's mllib.recommendation library to build a prototype recommender system. However, the format of the user data I have is something of the following format: AB123XY45678 CD234WZ12...

2

When using SparkML to predict labels the result Dataframe is: scala> result.show +-----------+--------------+ |probability|predictedLabel| +-----------+--------------+ | [0.0,1.0]| 0.0| |...

2

Solved

When I try to feed df2 to KMeans I get the following error: clusters = KMeans.train(df2, 10, maxIterations=30, runs=10, initializationMode="random") The error I get: Cannot convert type <cla...
Anthropomorphize asked 21/3, 2016 at 22:39
