How to merge schema while loading avro in spark dataframe?
I am trying to read Avro files using https://github.com/databricks/spark-avro, and the Avro schema has evolved over time. I read them like this, with the mergeSchema option set to true, hoping it would merge the schemas itself, but it didn't work:

sqlContext.read.format("com.databricks.spark.avro").option("mergeSchema", "true").load('s3://xxxx/d=2015-10-27/h=*/')

What is the workaround?

Equanimity answered 30/12, 2015 at 10:51 Comment(3)
I have the same problem. Could you resolve it? Is it a bug? Or could it be an unimplemented feature?Overlarge
How do you know it "didn't work"? What's the error/exception?Unwholesome
@Zer001, it doesn't work for me either. Did you find a solution for this?Sawyor

Schema merging is not implemented for Avro files in Spark: the mergeSchema option is supported for Parquet, but the Avro reader ignores it, and there is no easy workaround. One solution is to read your Avro data file-by-file (or partition-by-partition) as separate DataFrames and then union them, but that can be terribly slow.

Swamp answered 14/10, 2020 at 12:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.