How to merge schema while loading avro in spark dataframe?
I am trying to read Avro files using https://github.com/databricks/spark-avro, and the Avro schema has evolved over time. I read them like this, with the mergeSchema option set to true, hoping it would merge the schemas itself, but it didn't work:

sqlContext.read.format("com.databricks.spark.avro").option("mergeSchema", "true").load('s3://xxxx/d=2015-10-27/h=*/')

What is the workaround?

Equanimity answered 30/12, 2015 at 10:51 Comment(3)
I have the same problem. Could you resolve it? Is it a bug? Or could it be an unimplemented feature?Overlarge
How do you know it "didn't work"? What's the error/exception?Unwholesome
@Zer001, it doesn't work for me either. Did you find a solution for this?Sawyor

Schema merging is not implemented for Avro files in Spark: the mergeSchema option is supported for Parquet, but the Avro reader ignores it, and there is no easy workaround. One solution is to read your Avro data file-by-file (or partition-by-partition) as separate DataFrames and then union them, but that can be terribly slow.

Swamp answered 14/10, 2020 at 12:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.