apache-spark-mllib - 6

1

Solved

Retrieve Spark Mllib StringIndexer column mapping

How do I get the mapping out of a trained Spark MLlib StringIndexerModel? val stringIndexer = new StringIndexer() .setInputCol("myCol") .setOutputCol("myColIdx") val stringIndexerModel = stringI...

scala apache-spark apache-spark-mllib apache-spark-ml

Infraction asked 23/4, 2017 at 19:9

3

How to reduce RMSE (root mean square error) when using ALS in Spark?

I need some suggestions for building a good recommendation model with Spark's collaborative filtering. There is sample code on the official website, which I paste below: from pyspar...

apache-spark pyspark apache-spark-mllib collaborative-filtering

Notional asked 12/4, 2016 at 13:46

1

Solved

`pyspark mllib` versus `pyspark ml` packages

What is the difference between the pyspark mllib and pyspark ml packages? https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html https://spark.apache.org/docs/latest/api/python/pyspark.ml....

python python-3.x apache-spark pyspark apache-spark-mllib

Osy asked 5/4, 2017 at 19:59

3

Solved

Spark: How can I retrieve item pairs after calculating similarity using RowMatrix?

I have encountered the "all-pairs similarity" problem in my recommendation system. Thanks to this databricks blog, it seems RowMatrix may help. However, RowMatrix is a matrix type without ...

apache-spark apache-spark-mllib

Tesstessa asked 25/4, 2015 at 2:55

2

Solved

Online learning of LDA model in Spark

Is there a way to train an LDA model in an online-learning fashion, i.e. loading a previously trained model and updating it with new documents?

apache-spark machine-learning apache-spark-mllib lda apache-spark-ml

Hamil asked 8/3, 2017 at 18:11

1

Solved

How to do prediction with Sklearn Model inside Spark?

I have trained a model in Python using sklearn. How can we load the same model in Spark and generate predictions on a Spark RDD?

python apache-spark scikit-learn pyspark apache-spark-mllib

Siobhansion asked 19/3, 2017 at 14:15

1

Spark.ml regressions do not calculate same models as scikit-learn

I am setting up a very simple logistic regression problem in scikit-learn and in spark.ml, and the results diverge: the models they learn are different, but I can't figure out why (data is the same...

apache-spark scikit-learn apache-spark-mllib

Largehearted asked 10/3, 2017 at 23:28

1

Spark ML indexer cannot resolve DataFrame column name with dots?

I have a DataFrame with a column named a.b. When I specify a.b as the input column name to a StringIndexer, I get an AnalysisException with the message "cannot resolve 'a.b' given input columns a.b". I'm us...

java apache-spark apache-spark-mllib apache-spark-ml

Cotswolds asked 22/1, 2016 at 18:22

2

Solved

Mllib dependency error

I'm trying to build a very simple Scala standalone app using MLlib, but I get the following error when trying to build the program: Object Mllib is not a member of package org.apache.spark T...

scala apache-spark apache-spark-mllib

Laundry asked 12/12, 2014 at 6:50

2

Solved

Apply OneHotEncoder to several categorical columns in Spark MLlib

I have several categorical features and would like to transform them all using OneHotEncoder. However, when I tried to apply the StringIndexer, I got an error: stringIndexer = StringIndexer(...

python apache-spark pyspark apache-spark-mllib apache-spark-ml

Colly asked 4/3, 2016 at 19:42

2

Solved

How to cross validate RandomForest model?

I want to evaluate a random forest being trained on some data. Is there any utility in Apache Spark to do the same or do I have to perform cross validation manually?

apache-spark random-forest cross-validation apache-spark-ml apache-spark-mllib

Fluid asked 24/9, 2015 at 19:37

1

How Spark HashingTF works

I am new to Spark 2. I tried the Spark TF-IDF example: sentenceData = spark.createDataFrame([ (0.0, "Hi I heard about Spark") ], ["label", "sentence"]) tokenizer = Tokenizer(inputCol="sentence", outp...

apache-spark pyspark apache-spark-mllib tf-idf apache-spark-ml

Opinicus asked 16/2, 2017 at 20:17

0

Common way to plot a ROC Curve

I'm trying to obtain a ROC curve for a GBTClassifier. One way is to reuse BinaryClassificationMetrics, however the path given in the documentation (https://spark.apache.org/docs/latest/mllib-evaluati...

apache-spark machine-learning apache-spark-mllib roc

Maxinemaxiskirt asked 16/2, 2017 at 15:7

2

Solved

Understanding Spark RandomForest featureImportances results

I'm using RandomForest.featureImportances but I don't understand the output result. I have 12 features, and this is the output I get. I get that this might not be an apache-spark specific que...

apache-spark classification random-forest apache-spark-mllib

Pleiad asked 17/6, 2016 at 9:54

1

Calculating standard error of estimate, Wald-Chi Square statistic, p-value with logistic regression in Spark

I was trying to build a logistic regression model on sample data. The output we can get from the model is the weights of the features used to build it. I could not find a Spark API for standard...

pyspark logistic-regression apache-spark-mllib standard-error

Maui asked 14/6, 2016 at 15:49

2

Solved

Cannot convert type <class 'pyspark.ml.linalg.SparseVector'> into Vector

Given my pyspark Row object: >>> row Row(clicked=0, features=SparseVector(7, {0: 1.0, 3: 1.0, 6: 0.752})) >>> row.clicked 0 >>> row.features SparseVector(7, {0: 1.0, 3: ...

apache-spark pyspark apache-spark-sql apache-spark-mllib apache-spark-ml

Levitate asked 10/12, 2016 at 9:46

1

Solved

ALS model - predicted full_u * v^t * v ratings are very high

I'm predicting ratings in between processes that batch train the model. I'm using the approach outlined here: ALS model - how to generate full_u * v^t * v? ! rm -rf ml-1m.zip ml-1m ! wget --quiet ...

apache-spark apache-spark-mllib apache-spark-ml

Endometrium asked 10/1, 2017 at 12:32

1

Solved

Stratified sampling with Spark and Java

I'd like to make sure I'm training on a stratified sample of my data. It seems this is supported by Spark 2.1 and earlier versions via JavaPairRDD.sampleByKey(...) and JavaPairRDD.sampleByKeyExact...

java apache-spark machine-learning apache-spark-mllib

Frightfully asked 16/1, 2017 at 9:10

2

Solved

Spark DataFrames when udf functions do not accept large enough input variables

I am preparing a DataFrame with an id and a vector of my features to be used later for doing predictions. I do a groupBy on my DataFrame, and in my groupBy I am merging a couple of columns as lists i...

scala apache-spark dataframe apache-spark-sql apache-spark-mllib

Martijn asked 14/9, 2016 at 15:41

1

Speed up collaborative filtering for large dataset in Spark MLLib

I'm using MLlib's matrix factorization to recommend items to users. I have a big implicit interaction matrix of M=20 million users and N=50k items. After training the model I want to get a sh...

scala apache-spark apache-spark-mllib collaborative-filtering

Horacehoracio asked 23/8, 2016 at 15:5

2

Solved

Is Spark's KMeans unable to handle bigdata?

KMeans has several parameters for its training, with initialization mode defaulted to kmeans||. The problem is that it marches quickly (less than 10min) to the first 13 stages, but then hangs compl...

python apache-spark k-means apache-spark-mllib bigdata

Iva asked 1/9, 2016 at 0:5

2

Solved

How to convert RDD of dense vector into DataFrame in pyspark?

I have a DenseVector RDD like this >>> frequencyDenseVectors.collect() [DenseVector([1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]), DenseVector([1.0, 1.0, 1.0, 0.0, 1....

apache-spark pyspark apache-spark-mllib apache-spark-ml apache-spark-2.0

Andvari asked 26/12, 2016 at 9:5

1

Solved

Split Contents of String column in PySpark Dataframe

I have a pyspark data frame which has a column containing strings. I want to split this column into words. Code: >>> sentenceData = sqlContext.read.load('file://sample1.csv', format='com.d...

apache-spark pyspark apache-spark-sql apache-spark-mllib

Lewd asked 22/12, 2016 at 12:43

1

How to convert from org.apache.spark.mllib.linalg.VectorUDT to ml.linalg.VectorUDT

I am using Spark cluster 2.0 and I would like to convert a vector from org.apache.spark.mllib.linalg.VectorUDT to org.apache.spark.ml.linalg.VectorUDT. # Import LinearRegression class from pyspark...

apache-spark machine-learning pyspark apache-spark-mllib apache-spark-ml

Lacey asked 13/12, 2016 at 17:22

2

Solved

Addition of two RDD[mllib.linalg.Vector]'s

I need to add two matrices that are stored in two files. The files latest1.txt and latest2.txt both have the following content: 1 2 3 4 5 6 7 8 9 I am reading those files as follows: scala> val...

scala apache-spark apache-spark-mllib

Anton asked 30/1, 2015 at 9:29

