For a custom Estimator's transformSchema method I need to be able to compare the schema of an input data frame to the schema defined in a case class. Usually this can be done as described in "Generate a Spark StructType / Schema from a case class", as outlined below. However, the wrong nullability is used:
The real schema of the df inferred by spark.read.csv().as[MyClass] might look like:
root
|-- CUSTOMER_ID: integer (nullable = false)
And the case class:
case class MySchema(CUSTOMER_ID: Int)
To compare I use:
val rawSchema = ScalaReflection.schemaFor[MySchema].dataType.asInstanceOf[StructType]
if (!rawSchema.equals(rawDf.schema))
Unfortunately this always yields false, as the schema manually inferred from the case class sets nullable to true (because a java.Integer might actually be null):
root
|-- CUSTOMER_ID: integer (nullable = true)
How can I specify nullable = false when creating the schema?
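To make the goal concrete, here is a minimal sketch of one possible approach, not a confirmed Spark feature: rewrite the nullability flags on a StructType after it has been generated, or normalize both schemas before comparing them. The object SchemaCompare and its methods withNullability, normalize and sameIgnoringNullability are hypothetical names introduced only for this illustration; the sketch relies solely on the StructType, StructField, ArrayType and MapType case classes from org.apache.spark.sql.types.

```scala
import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType}

// Hypothetical helper object, not part of Spark.
object SchemaCompare {

  // Recursively force the same nullability flag on every field,
  // array element and map value of the given data type.
  def withNullability(dt: DataType, nullable: Boolean): DataType = dt match {
    case StructType(fields) =>
      StructType(fields.map { f =>
        f.copy(dataType = withNullability(f.dataType, nullable), nullable = nullable)
      })
    case ArrayType(elementType, _) =>
      ArrayType(withNullability(elementType, nullable), containsNull = nullable)
    case MapType(keyType, valueType, _) =>
      MapType(
        withNullability(keyType, nullable),
        withNullability(valueType, nullable),
        valueContainsNull = nullable)
    case other => other // atomic types carry no nullability of their own
  }

  // Convenience wrapper that keeps the StructType type.
  def normalize(schema: StructType, nullable: Boolean): StructType =
    withNullability(schema, nullable).asInstanceOf[StructType]

  // Compare names and types while ignoring nullable flags.
  // Field metadata is still compared, because StructField equality includes it.
  def sameIgnoringNullability(a: StructType, b: StructType): Boolean =
    normalize(a, nullable = true) == normalize(b, nullable = true)
}
```

With the names from the question, SchemaCompare.normalize(rawSchema, nullable = false) would yield a schema with every field marked nullable = false, and SchemaCompare.sameIgnoringNullability(rawSchema, rawDf.schema) would compare the two schemas on names and types only. Whether ignoring nullability is acceptable inside transformSchema depends on the Estimator.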
Option[Int] instead of Int, thanks! – Taking