How to pass multiple statements into Spark SQL HiveContext

For example, I have a few Hive HQL statements that I want to pass into Spark SQL:

set parquet.compression=SNAPPY;
create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE;
select * from MY_TABLE limit 5;

The following doesn't work:

hiveContext.sql("set parquet.compression=SNAPPY; create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE; select * from MY_TABLE limit 5;")

How can I pass these statements into Spark SQL?

Peaked answered 29/4, 2016 at 12:29 Comment(2)
Just like you would do in Java (or any other language that uses a database driver): execute each statement in order. That's what the Hive CLI does, actually, parsing its input into individual HQL statements -- plus doing some variable substitution and stuff like that.Darbies
@SamsonScharfrichter Thank you. I think this would be an acceptable answer.Peaked
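
A minimal sketch of that approach, assuming no statement contains a literal semicolon inside a string literal: split the script on ; and feed each non-empty statement to hiveContext.sql in order.

val script = """
set spark.sql.parquet.compression.codec=SNAPPY;
create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE;
select * from MY_TABLE limit 5;
"""
// Split on ';', drop blank fragments, and run each statement in order
script.split(";").map(_.trim).filter(_.nonEmpty).foreach(stmt => hiveContext.sql(stmt))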

Thank you to @SamsonScharfrichter for the answer.

This will work:

hiveContext.sql("set spark.sql.parquet.compression.codec=SNAPPY")
hiveContext.sql("create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE")
val rs = hiveContext.sql("select * from MY_TABLE limit 5")

Please note that in this particular case, instead of the parquet.compression key, we need to use spark.sql.parquet.compression.codec.
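
Equivalently, the codec can be set programmatically through setConf (a SQLContext method in Spark 1.x), avoiding the SET statement altogether:

// Same effect as the SET statement above
hiveContext.setConf("spark.sql.parquet.compression.codec", "SNAPPY")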

Peaked answered 1/5, 2016 at 10:45 Comment(1)
Is it a bug in the Spark SQL parser? This is already possible in Hive.Hanger

I worked on a scenario where I needed to read a SQL file and run all of the semicolon-separated queries present in that file.

One simple way to do it is like this:

val hsc = new org.apache.spark.sql.hive.HiveContext(sc)
val sql_file = "/hdfs/path/to/file.sql"
// wholeTextFiles yields (path, content) pairs; take the content of the first file
val queries = sc.wholeTextFiles(sql_file).take(1)(0)._2
// Split on ';', skip blank fragments, and run each statement in order
queries.split(';').map(_.trim).filter(_.nonEmpty).foreach(query => hsc.sql(query))
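
Note that this naive split will break any statement that contains a literal ; inside a string or comment, so it is only suitable for simple scripts.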
Maffa answered 14/11, 2017 at 13:1 Comment(1)
Hello @kuldeep singh, please format your code; here's a link to help you with that: stackoverflow.com/help/formattingCabinda

This function helped me get around the Spark limitation:

def execute_query(spark, sql_script):
    # Split the script on ';' and run each non-empty statement in order
    for sql in sql_script.split(";"):
        if sql.strip():
            print(sql)
            spark.sql(sql.strip())
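
Note that spark.sql runs commands such as SET and CREATE TABLE eagerly, while a bare SELECT only builds a lazy DataFrame; call .show() or .collect() on the result if the rows are actually needed.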
Since answered 16/2, 2022 at 5:28 Comment(0)
