apache-spark-mllib Questions
1
Is there any pre-built outlier detection algorithm or interquartile range identification method available in Spark 2.0.0?
I found some code here but I don't think this is available yet in Spark 2.0.0...
Trader asked 8/10, 2016 at 7:13
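For context, the interquartile-range rule the question asks about is simple to sketch. This plain-Python version illustrates the statistic only; it is not a Spark API, and the quantile interpolation scheme is one common choice among several:

```python
# Interquartile-range (IQR) outlier rule, sketched in plain Python.
# Illustration of the statistic only -- not a Spark 2.0.0 API.

def quartiles(values):
    """Return (Q1, Q3) using linear interpolation between ranks."""
    s = sorted(values)
    def q(p):
        idx = p * (len(s) - 1)
        lo, hi = int(idx), min(int(idx) + 1, len(s) - 1)
        frac = idx - lo
        return s[lo] * (1 - frac) + s[hi] * frac
    return q(0.25), q(0.75)

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is the usual fence."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

data = [10, 12, 11, 13, 12, 11, 95]   # 95 is an obvious outlier
print(iqr_outliers(data))             # -> [95]
```

In Spark the same per-column quantiles could be obtained distributedly and the fence applied with a filter, but the fence arithmetic is exactly the above.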
1
Solved
According to Combining Spark Streaming + MLlib it is possible to make a prediction over a stream of input in spark.
The issue with the given example (which works on my cluster) is that the testDat...
Petras asked 17/2, 2018 at 23:18
1
Solved
I want to pretty print the result of a correlation in a zeppelin notebook:
val Row(coeff: Matrix) = Correlation.corr(data, "features").head
One of the ways to achieve this is to convert the resu...
Subjectify asked 25/2, 2018 at 18:50
3
Solved
I am trying to create a LDA model on a JSON file.
Creating a spark context with the JSON file :
import org.apache.spark.sql.SparkSession
val sparkSession = SparkSession.builder
.master("loc...
Badtempered asked 7/8, 2016 at 21:48
1
Solved
My python version is 3.6.3 and spark version is 2.2.1. Here is my code:
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler
from pyspark import SparkContext,...
Heartstrings asked 6/2, 2018 at 9:1
3
I have a set of data based on which I want to create a classification model. Each row has the following form:
user1,class1,product1
user1,class1,product2
user1,class1,product5
user2,class1,product...
Cinerarium asked 7/8, 2015 at 7:53
2
Solved
Given a MatrixFactorizationModel what would be the most efficient way to return the full matrix of user-product predictions (in practice, filtered by some threshold to maintain sparsity)?
Via the ...
Sharyl asked 12/10, 2014 at 15:21
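The computation behind a full prediction matrix is a cross-product of the user and product latent-factor matrices, with a threshold applied to keep the result sparse. A plain-Python sketch with hypothetical toy factors (not the MLlib MatrixFactorizationModel API):

```python
# Sketch: full user x product prediction matrix from latent factors,
# filtered by a threshold to maintain sparsity. Toy data, not the MLlib API.

user_factors = {            # user id -> latent vector (hypothetical)
    1: [1.0, 0.5],
    2: [0.2, 1.5],
}
product_factors = {         # product id -> latent vector (hypothetical)
    10: [0.8, 0.1],
    20: [0.1, 1.0],
}

def predict_all(users, products, threshold):
    """Yield (user, product, score) for every pair with score >= threshold."""
    for u, uf in users.items():
        for p, pf in products.items():
            score = sum(a * b for a, b in zip(uf, pf))  # dot product
            if score >= threshold:
                yield (u, p, score)

preds = list(predict_all(user_factors, product_factors, threshold=1.0))
print(preds)  # only pairs scoring at least 1.0 survive
```

The distributed version would block the two factor matrices and join the blocks, but the per-pair arithmetic is this same dot product.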
4
I'm trying to perform a logistic regression (LogisticRegressionWithLBFGS) with Spark MLlib (with Scala) on a dataset which contains categorical variables. I discovered Spark was not able to work with...
Sicard asked 7/5, 2015 at 14:56
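Linear models expect numeric features, so categorical columns are typically one-hot encoded before training. A plain-Python sketch of the idea, with hypothetical column values (not Spark's encoder API):

```python
# Sketch: one-hot encoding a categorical column into 0/1 indicator vectors,
# the usual prerequisite before feeding categories to a linear model.
# Hypothetical data; illustration only, not a Spark API.

def one_hot_encode(column):
    """Map each distinct category to a 0/1 indicator vector."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    return [[1.0 if index[v] == i else 0.0 for i in range(len(categories))]
            for v in column]

colors = ["red", "blue", "red", "green"]
print(one_hot_encode(colors))
# categories are indexed in sorted order: ['blue', 'green', 'red']
```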
2
Solved
I have a Spark dataframe 'mydataframe' with many columns. I am trying to run kmeans on only two columns, lat and long (latitude & longitude), using them as simple values. I want to extract 7 cl...
Bonaparte asked 1/12, 2017 at 2:22
1
Solved
It's my very first time trying to run a KMeans cluster analysis in Spark, so I am sorry for a stupid question.
I have a spark dataframe mydataframe with many columns. I want to run kmeans on only t...
Makalu asked 1/12, 2017 at 1:14
2
Solved
I'm trying to run a self-contained application using Scala on Apache Spark, based on the example here:
http://spark.apache.org/docs/latest/ml-pipeline.html
Here's my complete code:
import org.apache.spa...
Fetiparous asked 27/10, 2016 at 10:7
2
Solved
In order to build a NaiveBayes multiclass classifier, I am using a CrossValidator to select the best parameters in my pipeline:
val cv = new CrossValidator()
.setEstimator(pipeline)
.setEstimato...
Underproof asked 8/1, 2016 at 13:59
3
Solved
I have an RDD with a tuple of values (String, SparseVector) and I want to create a DataFrame using the RDD. To get a (label:string, features:vector) DataFrame which is the Schema required by most o...
Pagas asked 23/9, 2015 at 16:47
1
Solved
I am using Spark Scala to calculate cosine similarity between DataFrame rows.
The DataFrame format is below:
root
|-- SKU: double (nullable = true)
|-- Features: vector (nullable = true)
Samp...
Illfated asked 30/10, 2017 at 7:38
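Row-to-row cosine similarity reduces to a dot product over the vector norms. A minimal plain-Python sketch of the formula with toy vectors (not the Spark DataFrame code):

```python
# Sketch: cosine similarity between two feature vectors.
# cos(a, b) = a.b / (|a| * |b|); toy vectors, not the Spark code.
import math

def cosine_similarity(a, b):
    """1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # -> 0.0
```

Applied pairwise over the SKU/Features rows above, this is the quantity the question computes; normalizing each vector up front turns it into a plain dot product.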
2
Solved
I am working with Spark 2.1.1 on a dataset with ~2000 features and trying to create a basic ML Pipeline, consisting of some Transformers and a Classifier.
Let's assume for the sake of simplicity th...
Lilithe asked 11/5, 2017 at 9:35
3
Solved
It is my first time with PySpark (Spark 2), and I'm trying to create a toy dataframe for a Logit model. I ran the tutorial successfully and would like to pass my own data into it.
I've tried thi...
Toothache asked 12/7, 2017 at 16:55
1
Solved
In the MLlib version of Random Forest there was a possibility to specify the columns with nominal features (numerical but still categorical variables) with the parameter categoricalFeaturesInfo.
What's...
Tm asked 15/10, 2017 at 20:42
4
I'm working with Spark 1.3.0 using PySpark and MLlib and I need to save and load my models. I use code like this (taken from the official documentation)
from pyspark.mllib.recommendation import A...
Finegan asked 25/3, 2015 at 12:3
1
Solved
I can extract the vocabulary from a CountVectorizerModel in the following way
fl = StopWordsRemover(inputCol="words", outputCol="filtered")
df = fl.transform(df)
cv = CountVectorizer(inputCol="filtered",...
Pug asked 12/10, 2017 at 17:27
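Conceptually, a count-vectorizer's vocabulary is just the set of distinct tokens that survive a document-frequency cutoff, ordered by frequency. A plain-Python sketch of that idea (illustration only, not the Spark CountVectorizerModel API):

```python
# Sketch: what a count-vectorizer vocabulary conceptually is -- distinct
# tokens surviving a document-frequency cutoff, most frequent first.
# Plain Python for illustration; not the Spark CountVectorizerModel API.
from collections import Counter

def build_vocabulary(docs, min_df=1):
    """Return tokens appearing in at least min_df documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each token once per document
    return [tok for tok, cnt in df.most_common() if cnt >= min_df]

docs = [["spark", "mllib"], ["spark", "ml"], ["spark"]]
print(build_vocabulary(docs, min_df=2))  # only 'spark' meets min_df=2
```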
1
Solved
According to LinearRegressionSummary (Spark 2.1.0 JavaDoc), p-values are only available for the "normal" solver.
This value is only available when using the "normal" solver.
What the hell is t...
Ambrosio asked 11/10, 2017 at 19:49
1
Solved
I am trying to build a demo project in java 9 with maven that uses the dependency:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.10<...
Telescopic asked 11/10, 2017 at 8:36
2
What is the optimum number of vector size to be set in word2vec algorithm if the total number of unique words is greater than 1 billion?
I am using Apache Spark MLlib 1.6.0 for word2vec.
Sample ...
Eyeopener asked 4/10, 2017 at 8:58
4
When I try to run it on this folder it throws ExecutorLostFailure every time.
Hi, I am a beginner in Spark. I am trying to run a job on Spark 1.4.1 with 8 slave nodes with 11.7 GB memory...
Godred asked 21/7, 2015 at 2:51
2
Solved
How can you evaluate the implicit feedback collaborative filtering algorithm of Apache Spark, given that the implicit "ratings" can vary from zero to anything, so a simple MSE or RMSE does not have...
Clinometer asked 28/9, 2017 at 6:36
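One common alternative to MSE/RMSE for implicit feedback is a ranking metric such as mean percentile rank (MPR), which asks where each held-out item lands in the user's ranked recommendations. A plain-Python sketch with hypothetical toy scores (not the Spark API):

```python
# Sketch: mean percentile rank (MPR) for implicit-feedback recommenders.
# For each held-out (user, item) pair, measure where the item lands in the
# user's ranked recommendation list: 0.0 = top, 1.0 = bottom.
# Lower MPR is better; a random ranking gives roughly 0.5. Toy data only.

def mean_percentile_rank(scores, held_out):
    """scores: user -> {item: predicted score}; held_out: (user, item) pairs."""
    ranks = []
    for user, item in held_out:
        ranked = sorted(scores[user], key=scores[user].get, reverse=True)
        denom = max(len(ranked) - 1, 1)
        ranks.append(ranked.index(item) / denom)
    return sum(ranks) / len(ranks)

scores = {"u1": {"a": 0.9, "b": 0.5, "c": 0.1}}   # hypothetical predictions
print(mean_percentile_rank(scores, [("u1", "a")]))  # top-ranked item -> 0.0
```

Because it only uses the ordering of scores, MPR is insensitive to the unbounded scale of implicit "ratings" that makes raw MSE/RMSE misleading here.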
1
Error:
ERROR TaskSetManager: Total size of serialized results of XXXX tasks (2.0 GB) is bigger than spark.driver.maxResultSize (2.0 GB)
Goal: Obtain recommendation for all the users using the mo...
Snappish asked 2/12, 2015 at 5:25
© 2022 - 2024 — McMap. All rights reserved.