Instead of manually reproducing the logic for creating the implicit `Encoder` object that gets passed to `toDF`, one can use it directly (or, more precisely, implicitly, in the same way `toDF` does):
```scala
// spark: SparkSession
import org.apache.spark.sql.Encoder
import spark.implicits._

implicitly[Encoder[MyCaseClass]].schema
```
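For a hypothetical case class like the one below, the result is an ordinary `StructType`; the commented line shows roughly what Spark derives for these field types:

```scala
// Hypothetical definition, just to illustrate the derived schema:
case class MyCaseClass(id: Long, name: String)

implicitly[Encoder[MyCaseClass]].schema
// StructType(StructField(id,LongType,false), StructField(name,StringType,true))
```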
Unfortunately, this suffers from the same problem as using `org.apache.spark.sql.catalyst` or `Encoders` as in the other answers: the `Encoder` trait is experimental.
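For comparison, the `Encoders`-based approach mentioned above is also a one-liner:

```scala
import org.apache.spark.sql.Encoders

// Same schema, derived via the Encoders factory; this rests on the
// same experimental Encoder machinery.
Encoders.product[MyCaseClass].schema
```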
How does this work? The `toDF` method on `Seq` comes from a `DatasetHolder`, which is created via the implicit `localSeqToDatasetHolder` that is imported via `spark.implicits._`. That function is defined as:
```scala
implicit def localSeqToDatasetHolder[T](s: Seq[T])(implicit arg0: Encoder[T]): DatasetHolder[T]
```
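To make the desugaring concrete, this is roughly what the compiler inserts when you write `Seq(...).toDF()` (a sketch; `Seq.empty` stands in for whatever data you actually have):

```scala
// Hand-desugared equivalent of Seq.empty[MyCaseClass].toDF():
// the implicit conversion is applied explicitly here. The Encoder
// argument is still filled in implicitly, assuming
// `import spark.implicits._` from above is in scope.
spark.implicits.localSeqToDatasetHolder(Seq.empty[MyCaseClass]).toDF()
```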
As you can see, it takes an implicit `Encoder[T]` argument, which, for a case class, can be computed via `newProductEncoder` (also imported via `spark.implicits._`). We can reproduce this implicit logic to get an `Encoder` for our case class via the convenience method `scala.Predef.implicitly` (in scope by default, because it's from `Predef`), which simply returns its requested implicit argument:
```scala
def implicitly[T](implicit e: T): T
```
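To check that the two routes agree, both expressions below resolve to the same encoder (a sketch, assuming `import spark.implicits._` is in scope):

```scala
// implicitly picks up newProductEncoder from spark.implicits._,
// so both encoders describe the same schema.
val fromImplicitly = implicitly[Encoder[MyCaseClass]]
val fromFactory    = spark.implicits.newProductEncoder[MyCaseClass]
assert(fromImplicitly.schema == fromFactory.schema)
```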