Spark 2.0 with Hive
Let's say I am trying to write a Spark DataFrame, irisDf, to ORC and save it to the Hive metastore. In Spark I would do that like this:
irisDf.write.format("orc")
.mode("overwrite")
.option("path", "s3://my_bucket/iris/")
.saveAsTable("my_database.iris")
In sparklyr I can use the spark_write_table function:
data("iris")
iris_spark <- copy_to(sc, iris, name = "iris")
output <- spark_write_table(
  iris_spark,
  name = 'my_database.iris',
  mode = 'overwrite'
)
But this doesn't allow me to set the path or format.
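One thing that might be worth trying: spark_write_table exposes an options argument (a named list forwarded to the writer), so the path might be settable that way. This is only a sketch, under the assumption that entries in options reach the underlying DataFrameWriter; I have not confirmed it, nor which format it would default to:

```r
# Sketch: assumes spark_write_table forwards entries in `options`
# to the DataFrameWriter, so `path` would make the table external.
spark_write_table(
  iris_spark,
  name = 'my_database.iris',
  mode = 'overwrite',
  options = list(path = "s3://my_bucket/iris/")
)
```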
I can also use spark_write_orc:
spark_write_orc(
  iris_spark,
  path = "s3://my_bucket/iris/",
  mode = "overwrite"
)
but it doesn't have the saveAsTable option.
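One workaround I have considered: write the ORC files with spark_write_orc, then register them in the metastore with a SQL statement, since sparklyr connections support DBI. A sketch, assuming the metastore accepts a CREATE EXTERNAL TABLE against an S3 location and that the column list matches what copy_to produced (it rewrites dots in the iris column names to underscores):

```r
library(DBI)

# Write the data as ORC first
spark_write_orc(
  iris_spark,
  path = "s3://my_bucket/iris/",
  mode = "overwrite"
)

# Then point an external Hive table at the ORC location.
dbExecute(sc, "
  CREATE EXTERNAL TABLE IF NOT EXISTS my_database.iris (
    Sepal_Length DOUBLE,
    Sepal_Width  DOUBLE,
    Petal_Length DOUBLE,
    Petal_Width  DOUBLE,
    Species      STRING
  )
  STORED AS ORC
  LOCATION 's3://my_bucket/iris/'
")
```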
Now, I CAN use invoke statements to replicate the Spark code:
sdf <- spark_dataframe(iris_spark)
writer <- invoke(sdf, "write")
writer %>%
  invoke("format", "orc") %>%
  invoke("mode", "overwrite") %>%
  invoke("option", "path", "s3://my_bucket/iris/") %>%
  invoke("saveAsTable", "my_database.iris")
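In the meantime I can at least wrap the invoke chain in a small helper so it reads like the other spark_write_* functions. The function name and defaults here are my own, not part of sparklyr:

```r
# Hypothetical helper (not part of sparklyr): replicates
# df.write.format(...).mode(...).option("path", ...).saveAsTable(...)
spark_save_as_table <- function(tbl, name, path, format = "orc",
                                mode = "overwrite") {
  spark_dataframe(tbl) %>%
    invoke("write") %>%
    invoke("format", format) %>%
    invoke("mode", mode) %>%
    invoke("option", "path", path) %>%
    invoke("saveAsTable", name)
}

spark_save_as_table(iris_spark, "my_database.iris", "s3://my_bucket/iris/")
```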
But I am wondering if there is any way to instead pass the format and path options into spark_write_table, or the saveAsTable option into spark_write_orc?