I have data in Avro format in HDFS, in file paths like /data/logs/[foldername]/[filename].avro. I want to create a Hive table over all of these log files, i.e. all files matching /data/logs/*/* (they are all based on the same Avro schema).

I'm running the query below with the flag mapred.input.dir.recursive=true set:
CREATE EXTERNAL TABLE default.testtable
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs://.../data/*/*'
TBLPROPERTIES (
  'avro.schema.url'='hdfs://.../schema.avsc');
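For context, Hive's recursive-input behavior is normally controlled by session-level settings rather than by wildcards in LOCATION. A sketch of the pair of properties commonly set together (verify the exact names against your Hive version):

```sql
-- Session-level settings commonly used together for reading nested input;
-- exact property names should be verified for your Hive version.
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;
```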
The table ends up being empty unless I change LOCATION to something less nested, i.e. 'hdfs://.../data/[foldername]/' with a specific foldername. With that less nested path for LOCATION, it worked without problems.
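For comparison, the working, less nested variant described above differs only in its LOCATION line, which points at one concrete folder rather than a glob (a sketch, with the elided paths kept as in the question):

```sql
CREATE EXTERNAL TABLE default.testtable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs://.../data/[foldername]/'  -- one concrete folder, no wildcards
TBLPROPERTIES ('avro.schema.url'='hdfs://.../schema.avsc');
```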
I'd like to be able to source data from all of these different [foldername] folders. How do I make the recursive input selection descend further into my nested directories?
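One workaround sometimes used when recursion settings don't take effect is to declare the table partitioned and register each [foldername] as a partition explicitly. A sketch, under the assumption that a string partition column works here; the table name, column name, and folder name below are hypothetical:

```sql
CREATE EXTERNAL TABLE default.testtable_partitioned  -- hypothetical name
PARTITIONED BY (foldername STRING)                   -- hypothetical partition column
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs://.../data/logs'
TBLPROPERTIES ('avro.schema.url'='hdfs://.../schema.avsc');

-- Register one partition per subfolder (repeat per [foldername]):
ALTER TABLE default.testtable_partitioned
  ADD PARTITION (foldername='example')               -- hypothetical folder name
  LOCATION 'hdfs://.../data/logs/example';
```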
hive.input.dir.recursive? hive.supports.subdirectories? It seems you have copied these from other (wrong) answers. I suggest doing some research and testing. – Barometrograph