value toDS is not a member of org.apache.spark.rdd.RDD
Asked Answered

I am trying to write a sample Apache Spark program that converts an RDD to a Dataset, but I am getting a compile-time error in the process.

Here is my sample code and the error.

Code:

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.sql.Dataset

object Hello {

  case class Person(name: String, age: Int)

  def main(args: Array[String]){
    val conf = new SparkConf()
      .setAppName("first example")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val peopleRDD: RDD[Person] = sc.parallelize(Seq(Person("John", 27)))
    val people = peopleRDD.toDS
  }
}

And my error is:

value toDS is not a member of org.apache.spark.rdd.RDD[Person]

I have added the Spark Core and Spark SQL jars.

And my versions are:

Spark 1.6.2

Scala 2.10
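
For reference, sbt dependencies matching these versions would look roughly like this (a sketch; the exact Scala patch version is an assumption):

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.2",
  "org.apache.spark" %% "spark-sql"  % "1.6.2"
)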

Spermaceti answered 16/6, 2017 at 16:38 Comment(0)

Spark version < 2.x

toDS becomes available once you import sqlContext.implicits._:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val people = peopleRDD.toDS()

Spark version >= 2.x

import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder
  .config(conf)
  .getOrCreate()

import spark.implicits._
val people = peopleRDD.toDS()
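
Putting it together with the code from the question, a complete Spark 2.x version would look roughly like this (a sketch, assuming a local run):

import org.apache.spark.sql.{Dataset, SparkSession}

object Hello {

  // Keep the case class at the object level, outside main,
  // so Spark can derive an Encoder for it
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("first example")
      .master("local")
      .getOrCreate()

    import spark.implicits._

    val peopleRDD = spark.sparkContext.parallelize(Seq(Person("John", 27)))
    val people: Dataset[Person] = peopleRDD.toDS()
    people.show()

    spark.stop()
  }
}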

Hope it helps.

Tuff answered 16/6, 2017 at 16:47 Comment(0)

There are two mistakes I can see in your code.

First, you have to import sqlContext.implicits._, as toDS and toDF are defined in the implicits of sqlContext.

Second, the case class should be defined outside the scope where it is used; otherwise a "Task not serializable" exception will occur.

The complete solution is as follows:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SQLContext}

object Hello {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("first example")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    import sqlContext.implicits._
    val peopleRDD: RDD[Person] = sc.parallelize(Seq(Person("John", 27)))
    val people = peopleRDD.toDS
    people.show(false)
  }
}

case class Person(name: String, age: Int)
Allegra answered 16/6, 2017 at 16:58 Comment(2)
Defining the case class inside the scope can be a really annoying issue in this case. Thanks for the tip!Bookrest
import spark.implicits._ failed for me until I moved the case class outside of the object. Thanks!Soniferous

This error can also be caused by importing both

import spark.implicits._

import sqlContext.implicits._

at the same time. The two sets of implicits conflict; remove either one of them and you won't face an issue like this.
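
In other words, keep exactly one source of implicits in scope, for example (Spark 2.x):

import spark.implicits._ // only one implicits import in scope
val people = peopleRDD.toDS()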

Brandi answered 23/8, 2021 at 15:48 Comment(1)
Hi, Jodu. Please wrap code in code blocks for clarity (see documentation).Cumbrance
