apache-spark-mllib Questions
1
Solved
How do I get the mapping out of a trained Spark MLlib StringIndexerModel?
val stringIndexer = new StringIndexer()
.setInputCol("myCol")
.setOutputCol("myColIdx")
val stringIndexerModel = stringI...
Infraction asked 23/4, 2017 at 19:9
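The idea behind the mapping asked about above can be sketched in plain Python. This is not Spark code: `fit_string_indexer` is a hypothetical helper, and it assumes the indexer's default of ordering labels by descending frequency, with alphabetical tie-breaking added here only to keep the result deterministic. In Spark itself, to my knowledge, the fitted model exposes the ordered labels through its `labels` attribute, where a label's position is its index.

```python
from collections import Counter

def fit_string_indexer(values):
    """Return labels ordered by descending frequency (hypothetical helper)."""
    counts = Counter(values)
    # Sort by count, highest first; break ties alphabetically (an assumption).
    return sorted(counts, key=lambda label: (-counts[label], label))

labels = fit_string_indexer(["a", "b", "c", "a", "a", "c"])

# The position of a label in the list is the index assigned to it.
label_to_index = {label: float(i) for i, label in enumerate(labels)}
index_to_label = dict(enumerate(labels))
```

With the mapping held as an ordered list, the lookup works in both directions without any extra state.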
3
I need some suggestions for building a good recommendation model using Spark's collaborative filtering. There is sample code on the official website, which I have also pasted below:
from pyspar...
Notional asked 12/4, 2016 at 13:46
1
Solved
What is the difference between the pyspark mllib and pyspark ml packages?
https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html
https://spark.apache.org/docs/latest/api/python/pyspark.ml....
Osy asked 5/4, 2017 at 19:59
3
Solved
I have encountered the "all-pairs similarity" problem in my recommendation system. Thanks to this Databricks blog, it seems RowMatrix may help.
However, RowMatrix is a matrix type without ...
Tesstessa asked 25/4, 2015 at 2:55
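As a rough illustration of what "all-pairs similarity" means here, the following plain-Python sketch computes the cosine similarity between every pair of columns of a small matrix. It is not Spark code: `column_similarities` is a hypothetical helper, and it works on an in-memory list of rows rather than a distributed RowMatrix (which, as far as I know, provides a `columnSimilarities` method for the distributed case).

```python
import math

def column_similarities(rows):
    """Cosine similarity for every column pair (i, j) with i < j."""
    n_cols = len(rows[0])
    # Transpose the row-major input into a list of columns.
    cols = [[row[j] for row in rows] for j in range(n_cols)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in cols]
    sims = {}
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            # Skip all-zero columns, whose cosine similarity is undefined.
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(cols[i], cols[j]))
                sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims

sims = column_similarities([[1.0, 0.0, 1.0],
                            [0.0, 2.0, 0.0],
                            [1.0, 0.0, 1.0]])
```

Only the upper triangle is stored, since the similarity of (i, j) equals that of (j, i).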
2
Solved
Is there a way to train an LDA model in an online-learning fashion, i.e. loading a previously trained model and updating it with new documents?
Hamil asked 8/3, 2017 at 18:11
1
Solved
I have trained a model in Python using sklearn. How can we load the same model in Spark and generate predictions on a Spark RDD?
Siobhansion asked 19/3, 2017 at 14:15
1
I am setting up a very simple logistic regression problem in scikit-learn and in spark.ml, and the results diverge: the models they learn are different, but I can't figure out why (data is the same...
Largehearted asked 10/3, 2017 at 23:28
1
I have a DataFrame with a column named a.b. When I specify a.b as the input column name to a StringIndexer, I get an AnalysisException with the message "cannot resolve 'a.b' given input columns a.b". I'm us...
Cotswolds asked 22/1, 2016 at 18:22
2
Solved
I'm trying to build a very simple Scala standalone app using MLlib, but I get the following error when trying to build the program:
Object Mllib is not a member of package org.apache.spark
T...
Laundry asked 12/12, 2014 at 6:50
2
Solved
I have several categorical features and would like to transform them all using OneHotEncoder. However, when I tried to apply the StringIndexer, I got an error:
stringIndexer = StringIndexer(...
Colly asked 4/3, 2016 at 19:42
2
Solved
I want to evaluate a random forest being trained on some data. Is there any utility in Apache Spark to do the same or do I have to perform cross validation manually?
Fluid asked 24/9, 2015 at 19:37
1
I am new to Spark 2.
I tried the Spark TF-IDF example:
sentenceData = spark.createDataFrame([
(0.0, "Hi I heard about Spark")
], ["label", "sentence"])
tokenizer = Tokenizer(inputCol="sentence", outp...
Opinicus asked 16/2, 2017 at 20:17
0
I'm trying to obtain ROC Curve for GBTClassifier.
One way is to reuse BinaryClassificationMetrics, however the path given in the documentation (https://spark.apache.org/docs/latest/mllib-evaluati...
Maxinemaxiskirt asked 16/2, 2017 at 15:7
2
Solved
I'm using RandomForest.featureImportances but I don't understand the output result.
I have 12 features, and this is the output I get.
I get that this might not be an apache-spark specific que...
Pleiad asked 17/6, 2016 at 9:54
1
I was trying to build a logistic regression model on sample data.
The output we can get from the model is the weights of the features used to build it.
I could not find Spark API for standard...
Maui asked 14/6, 2016 at 15:49
2
Solved
Given my pyspark Row object:
>>> row
Row(clicked=0, features=SparseVector(7, {0: 1.0, 3: 1.0, 6: 0.752}))
>>> row.clicked
0
>>> row.features
SparseVector(7, {0: 1.0, 3: ...
Levitate asked 10/12, 2016 at 9:46
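The excerpt above is cut off, so the exact question is not visible. As a reading aid only, this plain-Python sketch shows how the printed `SparseVector(7, {...})` form relates to a dense list: the first argument is the length, and the dictionary holds the non-zero entries by index. `sparse_to_dense` is a hypothetical helper, not part of pyspark (where, to my knowledge, `toArray()`, `indices` and `values` serve this purpose).

```python
def sparse_to_dense(size, entries):
    """Expand {index: value} pairs into a dense list of the given size."""
    dense = [0.0] * size
    for index, value in entries.items():
        dense[index] = value
    return dense

# Values taken from the Row shown in the question.
dense = sparse_to_dense(7, {0: 1.0, 3: 1.0, 6: 0.752})
```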
1
Solved
I'm predicting ratings in between processes that batch train the model. I'm using the approach outlined here: ALS model - how to generate full_u * v^t * v?
! rm -rf ml-1m.zip ml-1m
! wget --quiet ...
Endometrium asked 10/1, 2017 at 12:32
1
Solved
I'd like to make sure I'm training on a stratified sample of my data.
It seems this is supported by Spark 2.1 and earlier versions via JavaPairRDD.sampleByKey(...) and JavaPairRDD.sampleByKeyExact...
Frightfully asked 16/1, 2017 at 9:10
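For context, exact stratified sampling by key can be sketched in plain Python as follows. This is not the Spark API: `sample_by_key_exact` is a hypothetical helper that works on an in-memory list of (key, value) pairs and assumes each stratum should contribute ceil(fraction × stratum size) items, which is my understanding of what `sampleByKeyExact` guarantees.

```python
import math
import random
from collections import defaultdict

def sample_by_key_exact(pairs, fractions, seed=0):
    """Draw ceil(fraction * size) items from each key's stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for key, value in pairs:
        strata[key].append(value)
    sample = []
    for key, values in strata.items():
        # Keys missing from `fractions` contribute nothing.
        k = math.ceil(fractions.get(key, 0.0) * len(values))
        sample.extend((key, v) for v in rng.sample(values, k))
    return sample

pairs = [("a", i) for i in range(10)] + [("b", i) for i in range(4)]
sample = sample_by_key_exact(pairs, {"a": 0.5, "b": 0.25})
```

Sampling within each stratum separately is what keeps the class proportions of the sample under control, unlike a single uniform draw over the whole dataset.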
2
Solved
I am preparing a DataFrame with an id and a vector of my features to be used later for doing predictions. I do a groupBy on my dataframe, and in my groupBy I am merging a couple of columns as lists i...
Martijn asked 14/9, 2016 at 15:41
1
I'm using MLlib's matrix factorization to recommend items to users. I have a big implicit interaction matrix of about M=20 million users and N=50k items. After training the model I want to get a sh...
Horacehoracio asked 23/8, 2016 at 15:5
2
Solved
KMeans has several parameters for its training, with initialization mode defaulted to kmeans||. The problem is that it marches quickly (less than 10min) to the first 13 stages, but then hangs compl...
Iva asked 1/9, 2016 at 0:5
2
Solved
I have a DenseVector RDD like this
>>> frequencyDenseVectors.collect()
[DenseVector([1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]), DenseVector([1.0, 1.0, 1.0, 0.0, 1....
Andvari asked 26/12, 2016 at 9:5
1
Solved
I have a pyspark data frame which has a column containing strings. I want to split this column into words.
Code:
>>> sentenceData = sqlContext.read.load('file://sample1.csv', format='com.d...
Lewd asked 22/12, 2016 at 12:43
1
I am using Spark cluster 2.0 and I would like to convert a vector from org.apache.spark.mllib.linalg.VectorUDT to org.apache.spark.ml.linalg.VectorUDT.
# Import LinearRegression class
from pyspark...
Lacey asked 13/12, 2016 at 17:22
2
Solved
I need to add two matrices that are stored in two files.
Both latest1.txt and latest2.txt have the following content:
1 2 3
4 5 6
7 8 9
I am reading those files as follows:
scala> val...
Anton asked 30/1, 2015 at 9:29
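The operation described above (parse two whitespace-separated matrix files, then add them element by element) can be sketched without Spark in plain Python. `parse_matrix` and `add_matrices` are hypothetical helpers, and the file content is inlined as a string here instead of being read from latest1.txt and latest2.txt.

```python
def parse_matrix(text):
    """Parse whitespace-separated rows into a list of float lists."""
    return [[float(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]

def add_matrices(a, b):
    """Element-wise sum of two matrices with identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Same content as shown in the question, used for both inputs.
content = "1 2 3\n4 5 6\n7 8 9\n"
total = add_matrices(parse_matrix(content), parse_matrix(content))
```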