sparklyr: write data to HDFS or Hive

I tried using sparklyr to write data to HDFS or Hive, but was unable to find a way. Is it even possible to write an R dataframe to HDFS or Hive using sparklyr? Please note that my R and Hadoop installations run on two different servers, so I need a way to write to a remote HDFS from R.

Regards, Rahul

Chitkara answered 27/6, 2017 at 21:58 Comment(1)
Have you tried running Spark in YARN mode? This post might be helpful. Elurd
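
For reference, a minimal sketch of connecting sparklyr to a remote cluster in YARN client mode; the paths below are hypothetical, and the setup assumes a local Spark installation plus the Hadoop client configuration files for the remote cluster:

library(sparklyr)
# Hypothetical paths: Spark locates the remote HDFS/YARN cluster through
# the Hadoop configuration files referenced by HADOOP_CONF_DIR
Sys.setenv(SPARK_HOME = "/usr/lib/spark")
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")
sc <- spark_connect(master = "yarn-client")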

Writing a Spark table to Hive using sparklyr:

# Copy the R data frame into Spark under an explicit temporary-table name
iris_spark_table <- copy_to(sc, iris, name = "iris_spark_table", overwrite = TRUE)
# Persist it as a Hive table
DBI::dbGetQuery(sc, "CREATE TABLE iris_hive AS SELECT * FROM iris_spark_table")
Cristycriswell answered 1/2, 2018 at 16:11 Comment(2)
Thanks for sharing. This loads the data into Hive's default database. Do you know how to write the table to a different Hive database? Pavilion
@Pavilion You can use the database.table syntax in the SQL passed to DBI. Play
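
For example, a one-liner illustrating that syntax, assuming a Hive database named my_database already exists and iris_spark_table is registered in Spark:

DBI::dbGetQuery(sc, "CREATE TABLE my_database.iris_hive AS SELECT * FROM iris_spark_table")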

As of the latest sparklyr, you can use spark_write_table. Pass a name in the format database.table_name to specify a database:

iris_spark_table <- copy_to(sc, iris, overwrite = TRUE)
spark_write_table(
  iris_spark_table,
  name = 'my_database.iris_hive',
  mode = 'overwrite'
)

Also see this SO post, where I got input on more options.

Limemann answered 16/8, 2018 at 23:24 Comment(0)

You can use sdf_copy_to to copy a data frame into Spark as a temporary table, say tempTable. Then use DBI::dbGetQuery(sc, "INSERT INTO TABLE MyHiveTable SELECT * FROM tempTable") to insert the data frame's records into an existing Hive table.
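
A minimal sketch of that approach; my_df, tempTable, and MyHiveTable are placeholder names, and MyHiveTable is assumed to already exist with a matching schema:

library(sparklyr)
library(DBI)
sc <- spark_connect(master = "yarn-client")  # hypothetical cluster master
# Register the local data frame in Spark as the temporary table "tempTable"
sdf_copy_to(sc, my_df, name = "tempTable", overwrite = TRUE)
# Append its rows to the existing Hive table
dbGetQuery(sc, "INSERT INTO TABLE MyHiveTable SELECT * FROM tempTable")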

Nullifidian answered 5/9, 2017 at 13:52 Comment(0)
