create hive external table with schema in spark

Asked 27/7, 2016 at 16:51 Answered 16/8, 2016 at 8:22

Solved apache-spark hive apache-spark-sql spark-avro

I am using spark 1.6 and I aim to create external hive table like what I do in hive script. To do this, I first read in the partitioned avro file and get the schema of this file. Now I stopped here, I get no idea how to apply this schema to my creating table. I use scala. Need help guys.

Ashur answered 27/7, 2016 at 16:51 Comment(0)

finally, I make it myself with old-fashioned way. With the help of code below:

val rawSchema = sqlContext.read.avro("Path").schema
val schemaString = rawSchema.fields.map(field => field.name.replaceAll("""^_""", "").concat(" ").concat(field.dataType.typeName match {
        case "integer" => "int"
        case smt => smt
      })).mkString(",\n")

      val ddl =
      s"""
         |Create external table $tablename ($schemaString) \n
         |partitioned by (y int, m int, d int, hh int, mm int) \n
         |Stored As Avro \n
         |-- inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' \n
         | -- outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' \n
         | Location 'hdfs://$path'
       """.stripMargin

take care no column name can start with _ and hive can't parse integer. I would like to say that this way is not flexible but work. if anyone get better idea, plz comment.

Ashur answered 16/8, 2016 at 8:22 Comment(0)

I didn't see a way to automatically infer schema for external tables. So I created case for the string type. You could add case for your data type. But I'm not sure how many columns you have. I apologize as this might not be a clean approach.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SaveMode};
import org.apache.spark.sql.types.{StructType,StructField,StringType};

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val results = hiveContext.read.format("com.databricks.spark.avro").load("people.avro")


val schema = results.schema.map( x => x.name.concat(" ").concat( x.dataType.toString() match { case "StringType" => "STRING"} ) ).mkString(",")

val hive_sql = "CREATE EXTERNAL TABLE people_and_age (" + schema + ")                  ROW FORMAT DELIMITED                 FIELDS TERMINATED BY ','                LOCATION '/user/ravi/people_age'"

hiveContext.sql(hive_sql)
results.saveAsTable("people_age",SaveMode.Overwrite)
hiveContext.sql("select * from people_age").show()

Beaman answered 27/7, 2016 at 18:49 Comment(1)

hx for you help. But in this way the table will not be the partitioned one. any idea? – Ashur 28/7, 2016 at 16:7

-1

Try the below code.

val htctx= new HiveContext(sc)
htctx.sql(create extetnal table tablename schema partitioned by attribute row format serde serde.jar field terminated by value location path)

Faust answered 27/7, 2016 at 18:50 Comment(1)

can you say more about which one is variable, Thx – Ashur 28/7, 2016 at 16:6

Recommended topics

Hot tags