apache-spark-mllib Questions
3
Solved
I'm working on a particular binary classification problem with a highly unbalanced dataset, and I was wondering if anyone has tried to implement specific techniques for dealing with unbalanced data...
Butyraceous asked 27/10, 2015 at 16:4
1
As per the official documentation of Spark NaiveBayes:
It supports Multinomial NB (see here) which can handle finitely
supported discrete data.
How can I handle continuous data (for example: perc...
Kibe asked 11/8, 2017 at 4:0
1
Solved
I'm working on a Spark MLlib algorithm. The dataset I have is in this form:
"Company":"XXXX","CurrentTitle":"XYZ","Edu_Title":"ABC","Exp_mnth":.(there are more values similar to these)
I'm trying t...
Hanni asked 14/7, 2016 at 14:36
2
Solved
Do you guys know where I can find examples of multiclass classification in Spark? I spent a lot of time searching in books and on the web, and so far I just know that it is possible since the lates...
Clapper asked 15/8, 2015 at 21:2
3
Solved
Can somebody please help me out with the error below? I am trying to convert a DataFrame to an RDD so that it can be used for building a regression model.
SPARK VERSION : 2.0.0
Error =>
ClassCastException...
Till asked 18/10, 2016 at 13:35
1
Solved
I have run binary logistic regression using Spark MLlib. As per the Spark MLlib documentation, RawPrediction values are confidence values, which I assume are probabilities for lcl and ucl. I am getting -ve value...
Benedetto asked 30/4, 2017 at 18:32
2
I tried the standard Spark HashingTF example on Databricks.
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
val sentenceData = spark.createDataFrame(Seq(
(0, "Hi I heard about Spark")...
Morbilli asked 14/12, 2016 at 22:13
1
Solved
I have a DataFrame that is processed with LinearRegression. If I do it directly, like below, I can display the details of the model:
val lr = new LinearRegression()
val lrModel = lr.fit(df)
lrMod...
Zachery asked 20/7, 2017 at 21:10
2
Solved
I am using Spark 1.5.1.
In PySpark, after I fit the model using:
model = LogisticRegressionWithLBFGS.train(parsedData)
I can print the prediction using:
model.predict(p.features)
Is the...
Rockery asked 6/11, 2015 at 6:33
4
Solved
I'm trying to extract the class probabilities of a random forest object I have trained using PySpark. However, I do not see an example of it anywhere in the documentation, nor is it a method of R...
Trautman asked 2/3, 2015 at 20:15
1
Is there any relation between numFeatures in HashingTF in Spark MLlib and the actual number of terms in a document (sentence)?
List<Row> data = Arrays.asList(
RowFactory.create(0.0, "Hi I he...
Propylite asked 7/7, 2017 at 8:47
2
I use ML pipeline with various custom UDF-based transformers. What I'm looking for is a way to serialize/deserialize this pipeline.
I serialize the PipelineModel using
ObjectOutputStream.write(...
Clastic asked 27/10, 2016 at 12:8
1
Solved
I am new to Spark MLlib. When I was reading the binomial logistic regression example, I didn't understand the "libsvm" format type. (Binomial logistic regression)
The text loo...
Burkle asked 7/7, 2017 at 7:39
1
Solved
I'm trying to use ALS from spark-ml with implicit ratings.
I noticed that some predictions given by my trained model are negative or NaN. Why is that?
Jerz asked 4/7, 2017 at 17:19
1
Solved
I am using Spark ML to run some ML experiments, and on a small dataset of 20MB (Poker dataset) and a Random Forest with parameter grid, it takes 1h and 30 minutes to finish. Similarly with scikit-l...
Rog asked 2/7, 2017 at 0:27
1
Solved
First, I have some usage history of users' apps.
For example:
user1, app1, 3(launch times)
user2, app2, 2(launch times)
user3, app1, 1(launch times)
I basically have two requirements:
Recommend some ...
Parliament asked 24/2, 2016 at 13:39
1
Solved
I'm looking to convert some R code to Sparklyr, including functions such as lmtest::coeftest() and sandwich::sandwich(). Trying to get started with Sparklyr extensions but pretty new to the Spark API and having ...
Sabelle asked 17/6, 2017 at 6:52
1
Solved
I have two separate DataFrames, each with several different processing stages that I handle using MLlib transformers in a pipeline.
I now want to join these two pipelines together, keeping ...
Lind asked 15/6, 2017 at 14:27
1
Solved
I am trying to do the following:
+-----+-------------------------+----------+-------------------------------------------+
|label|features |prediction|probability |
+-----+-------------------...
Opportuna asked 13/6, 2017 at 21:29
2
Solved
I want to develop a custom estimator for Spark which also handles the persistence of the great pipeline API. But as "How to Roll a Custom Estimator in PySpark mllib" put it, there is not a lot of docum...
Jaco asked 26/11, 2016 at 10:38
1
Solved
I am working on StreamingLinearRegressionWithSGD which has two methods trainOn and predictOn. This class has a model object that is updated as training data arrives in the stream specified in train...
Roundworm asked 30/3, 2017 at 10:19
0
Does anyone know the reason why non-linear SVM has not been implemented in Apache Spark?
I was reading this page:
https://issues.apache.org/jira/browse/SPARK-4638
Look at the last comment. It ...
Hophead asked 12/5, 2017 at 23:16
4
Solved
I want to use Spark's mllib.recommendation library to build a prototype recommender system. However, the user data I have is in something like the following format:
AB123XY45678
CD234WZ12...
Tyrus asked 5/1, 2015 at 2:46
2
When using SparkML to predict labels, the resulting DataFrame is:
scala> result.show
+-----------+--------------+
|probability|predictedLabel|
+-----------+--------------+
| [0.0,1.0]| 0.0|
|...
Gargoyle asked 2/5, 2017 at 6:8
2
Solved
When I try to feed df2 to KMeans, I get the following error:
clusters = KMeans.train(df2, 10, maxIterations=30,
runs=10, initializationMode="random")
The error I get:
Cannot convert type <cla...
Anthropomorphize asked 21/3, 2016 at 22:39