ClassCastException: org.apache.spark.ml.linalg.DenseVector cannot be cast to org.apache.spark.mllib.linalg.Vector

Can somebody please help me with the error below? I am trying to convert a DataFrame to an RDD so that it can be used to build a regression model.

Spark version: 2.0.0

Error => ClassCastException: org.apache.spark.ml.linalg.DenseVector cannot be cast to org.apache.spark.mllib.linalg.Vector

Code =>

import org.apache.spark.ml.feature.{Binarizer, VectorAssembler}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import org.apache.spark.sql.Row

val binarizer2: Binarizer = new Binarizer()
    .setInputCol("repay_amt").setOutputCol("label").setThreshold(20.00)

df = binarizer2.transform(df)

val assembler = new VectorAssembler()
.setInputCols(Array("tot_txns", "avg_unpaiddue", "max_unpaiddue", "sale_txn", "max_amt", "tot_sale_amt")).setOutputCol("features")

df = assembler.transform(df)

df.write.mode(SaveMode.Overwrite).parquet("lazpay_final_data.parquet")

val df2 = spark.read.parquet("lazpay_final_data.parquet/")
val df3= df2.rdd.map(r => LabeledPoint(r.getDouble(0),r.getAs("features")))

Data =>

[image of the data, not available]

Till answered 18/10, 2016 at 13:35 Comment(2)
Possible duplicate of "MatchError while accessing vector column in Spark 2.0" - Lathan
I am using Spark 2.0.0 - Till

I solved this issue by first converting the ml SparseVector to a DenseVector and then to an mllib Vector.

Eg:

// r is the Row passed to the function in df2.rdd.map(r => ...)
val denseVector = r.getAs[org.apache.spark.ml.linalg.SparseVector]("features").toDense
val mllibVector = org.apache.spark.mllib.linalg.Vectors.fromML(denseVector)
Matri answered 27/7, 2017 at 11:17 Comment(2)
This looks like the cleanest approach - Cutout
Hi, could you make this a more concrete example? It seems that in this code r is a Row inside the RDD map, so how can it be a variable? I am confused here. Thanks! - Encircle
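
To make the answer above more concrete, here is a minimal sketch of the whole map step. The object name VectorConversion and the helper toLabeledPoints are hypothetical and not from the thread. The column names "label" and "features" are the ones produced in the question. The sketch reads the column as the base org.apache.spark.ml.linalg.Vector type, so it accepts both dense and sparse values, and passes it straight to Vectors.fromML, which takes any ml Vector.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Vectors => MLlibVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper object, for illustration only.
object VectorConversion {
  // Turns a DataFrame with a Double "label" column and an ml Vector
  // "features" column into the RDD[LabeledPoint] that the mllib API expects.
  def toLabeledPoints(df: DataFrame): RDD[LabeledPoint] =
    df.select("label", "features").rdd.map { r: Row =>
      // r is one Row of the DataFrame; the column holds an ml (not mllib) vector.
      val mlVec = r.getAs[MLVector]("features")
      // fromML converts an ml vector (dense or sparse) to its mllib counterpart.
      LabeledPoint(r.getAs[Double]("label"), MLlibVectors.fromML(mlVec))
    }
}
```

With the question's code, this would be called as val df3 = VectorConversion.toLabeledPoints(df2).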

I ran into the same issue and created a function to manually cast the values:

public static Function<Row, org.apache.spark.mllib.linalg.Vector> rowToVector = new Function<Row, org.apache.spark.mllib.linalg.Vector>() {
    public org.apache.spark.mllib.linalg.Vector call(Row row) throws Exception {
        Object features = row.getAs(0);
        org.apache.spark.ml.linalg.DenseVector dense = null;

        if (features instanceof org.apache.spark.ml.linalg.DenseVector){
            dense = (org.apache.spark.ml.linalg.DenseVector)features;
        }
        else if(features instanceof org.apache.spark.ml.linalg.SparseVector){
            org.apache.spark.ml.linalg.SparseVector sparse = (org.apache.spark.ml.linalg.SparseVector)features;
            dense = sparse.toDense();
        }else{
            RuntimeException e = new RuntimeException("Cannot convert to "+ features.getClass().getCanonicalName());
            LOGGER.error(e.getMessage());
            throw e;
        }
        org.apache.spark.mllib.linalg.Vector vec = org.apache.spark.mllib.linalg.Vectors.dense(dense.toArray());
        return vec;
    }

};
Whitlow answered 13/1, 2017 at 16:13 Comment(0)
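
As an alternative to casting each value by hand, Spark 2.0 also ships MLUtils.convertVectorColumnsFromML, which converts ml vector columns of a DataFrame to mllib vectors in one call. The following is a rough sketch in Scala. The object name ColumnConversion and the helper toLabeledPoints are hypothetical, and the column names "label" and "features" are taken from the question.

```scala
import org.apache.spark.mllib.linalg.{Vector => MLlibVector}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// Hypothetical helper object, for illustration only.
object ColumnConversion {
  def toLabeledPoints(df: DataFrame): RDD[LabeledPoint] = {
    // Convert the ml vectors in the "features" column to mllib vectors.
    val converted = MLUtils.convertVectorColumnsFromML(df, "features")
    // After the conversion the column can be read as an mllib Vector directly.
    converted.select("label", "features").rdd.map { r =>
      LabeledPoint(r.getAs[Double]("label"), r.getAs[MLlibVector]("features"))
    }
  }
}
```

This keeps sparse vectors sparse, whereas the function above always produces a dense mllib vector.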

Since you are using Spark 2.0 or higher, use import org.apache.spark.ml.linalg.Vectors instead of import org.apache.spark.mllib.linalg.Vectors.

Topdrawer answered 12/7, 2017 at 6:46 Comment(0)
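
Changing the import only helps if the rest of the code also stays within the ml package, because the mllib LabeledPoint and LogisticRegressionWithLBFGS still expect mllib vectors. One way to do that, sketched below under the assumption that the DataFrame already has the "label" and "features" columns from the question, is to skip the RDD conversion and fit the DataFrame-based org.apache.spark.ml.classification.LogisticRegression directly. The object name MlPipelineSketch and the helper fitWithMl are hypothetical.

```scala
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.sql.DataFrame

// Hypothetical helper object, for illustration only.
object MlPipelineSketch {
  // Fits a logistic regression on the DataFrame itself, so the ml vectors
  // produced by VectorAssembler never have to be converted to mllib vectors.
  def fitWithMl(df: DataFrame): LogisticRegressionModel =
    new LogisticRegression()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .fit(df)
}
```

With the question's code, this would be called as val model = MlPipelineSketch.fitWithMl(df2).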
