Encoders.product[of a Scala trait].schema in Spark

How can I create a Spark schema from a trait? Consider this trait:

trait A{
val name:String
val size:String
}

Calling

Encoders.product[A].schema

gives:

Error:type arguments do not conform to method product's type parameter bounds [T <: Product]

Also, the number of fields is greater than 200, which is more than the limit on case class parameters.

Leahy answered 9/10, 2020 at 9:4 Comment(2)
Make A extend Product. (Paragon)
You can also nest case classes inside your case class if the number of fields is a problem. (Paragon)

Case classes do support more than 22 fields; try defining yours at the top level, outside any other class or object. If you need a DataFrame schema with a large number of fields, building the StructType explicitly should also work:

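import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Build the schema explicitly, one StructField per column.
val schema: StructType = StructType(
    Array(
      StructField("name", StringType),
      StructField("size", StringType)
    )
 )
// One Row per record, with values in schema order.
val data = Seq(Row("Ramanan", "29"))
// `spark` is assumed to be an existing SparkSession.
spark.createDataFrame(spark.sparkContext.parallelize(data), schema).show()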
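With several hundred columns, writing each StructField by hand gets tedious. A minimal sketch of generating the schema from a list of names, assuming every column is a nullable string; the names `WideSchemaSketch`, `fieldNames` and `col_1 ... col_250` are made up for illustration:

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object WideSchemaSketch {
  // Hypothetical column names; in practice they might come from a header row or a config file.
  val fieldNames: Seq[String] = (1 to 250).map(i => s"col_$i")

  // One nullable string column per name (assumption: all columns are strings).
  val wideSchema: StructType =
    StructType(fieldNames.map(n => StructField(n, StringType, nullable = true)))
}
```

The resulting `wideSchema` can be passed to `spark.createDataFrame` in the same way as the two-field schema above.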
Trembles answered 9/10, 2020 at 13:28 Comment(0)

I cannot give you all the details of why this does not work, but I can propose a slightly different approach that we frequently use in our Scala Spark projects.

The signature of Encoders.product looks like this:

product[T <: scala.Product](implicit evidence$5 : scala.reflect.runtime.universe.TypeTag[T])

which means it expects a type that extends the Product trait, together with an implicit TypeTag.

Instead of a trait, you could create a case class, because case classes extend Product (and Serializable) automatically.

To get the schema you could do:
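The effect of the bound can be seen without Spark. In this sketch, `requiresProduct` is a hypothetical stand-in with the same `T <: Product` bound (it is not Spark's code), and `TraitA` and `CaseA` are illustrative names:

```scala
object BoundSketch {
  // Hypothetical method with the same upper bound as Encoders.product.
  def requiresProduct[T <: Product](value: T): Int = value.productArity

  // A plain trait does not extend Product.
  trait TraitA { val name: String; val size: String }

  // A case class extends Product (and Serializable) automatically.
  case class CaseA(name: String, size: String)

  // Compiles, because CaseA conforms to the bound.
  val arity: Int = requiresProduct(CaseA("a", "b"))

  // The next line is rejected by the compiler, because TraitA is not a Product:
  // requiresProduct(new TraitA { val name = "a"; val size = "b" })
}
```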

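import org.apache.spark.sql.Encoders
import scala.reflect.runtime.universe.TypeTag

case class A(name: String, size: String)

// Derive the schema of any Product type (such as a case class) from its encoder.
def createSchema[T <: Product]()(implicit tag: TypeTag[T]) = Encoders.product[T].schema
val schema = createSchema[A]()
schema.printTreeString()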

/*
root
 |-- name: string (nullable = true)
 |-- size: string (nullable = true)
*/

As I said at the beginning, I can't explain all the details; I can only offer a working solution and hope it fits your needs.
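A comment on the question suggests nesting case classes when the field count becomes a problem. A rough sketch of that idea, where `Dimensions`, `Item` and `NestedSchemaSketch` are hypothetical names and the grouping of fields is arbitrary:

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

// Hypothetical grouping: each case class holds only part of the fields.
case class Dimensions(size: String, weight: String)
case class Item(name: String, dims: Dimensions)

object NestedSchemaSketch {
  // The nested case class shows up as a struct column in the derived schema.
  val schema: StructType = Encoders.product[Item].schema
}
```

The trade-off is that the columns are no longer flat: `size` is addressed as `dims.size`, so this only helps if a nested layout is acceptable.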

Kenwrick answered 9/10, 2020 at 10:11 Comment(3)
Thanks @mike, but since a case class has a limit on the number of parameters, is there a way to get the schema for a large number of fields? (Leahy)
We are using case classes with around 200 fields. I am not sure that is the optimal solution, but the limit of 22 fields applies only to Scala 2.10.x, not to 2.11 and later. We just had to adjust the JVM parameter when compiling with Maven: <jvmArg>-Xss2048K</jvmArg> (Kenwrick)
Yes @mike, but the number of fields was actually more than 200. (Leahy)

This is simple: make sure your trait (or abstract base class) extends Product and Serializable, as a case class does by default:

abstract class MoneV2Base (val x: String) extends Product with Serializable
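A concrete class that extends Product has to implement its abstract members itself. A minimal sketch in plain Scala, with a hypothetical `WideRecord` class: it satisfies the compile-time bound `T <: Product`, but whether Spark can derive the schema you expect from such a class at runtime is an assumption to verify against your Spark version.

```scala
object ProductSketch {
  // Hypothetical record type: a plain class that implements Product by hand.
  class WideRecord(val name: String, val size: String) extends Product with Serializable {
    // Number of fields exposed through the Product interface.
    override def productArity: Int = 2

    // Positional access to the fields, in constructor order.
    override def productElement(n: Int): Any = n match {
      case 0 => name
      case 1 => size
      case _ => throw new IndexOutOfBoundsException(n.toString)
    }

    override def canEqual(that: Any): Boolean = that.isInstanceOf[WideRecord]
  }

  val record = new WideRecord("Ramanan", "29")
}
```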
Eohippus answered 6/2, 2023 at 12:49 Comment(0)
