Trouble with Avro serialization of JSON documents missing fields

I'm trying to use Apache Avro to enforce a schema on data exported from Elasticsearch into a large number of Avro documents in HDFS (to be queried with Drill). I'm having some trouble with Avro defaults.

Given this schema:

{
  "namespace": "avrotest",
  "type": "record",
  "name": "people",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "age", "type": "int", "default": -1}
  ]
}

I'd expect that a JSON document such as {"firstname" : "Jane"} would be serialized using the default value of -1 for the age field. The specification says:

"default: A default value for this field, used when reading instances that lack this field (optional)."

However, this doesn't seem to happen:

java -jar avro-tools-1.8.0.jar fromjson --schema-file p2.avsc jane.json > jane.avro

Exception in thread "main" org.apache.avro.AvroTypeException: Expected int. Got END_OBJECT
    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
    at org.apache.avro.io.JsonDecoder.readInt(JsonDecoder.java:172)
    at org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:83)
    at org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:511)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:182)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
    at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
    at org.apache.avro.tool.Main.run(Main.java:87)
    at org.apache.avro.tool.Main.main(Main.java:76)

Is this possible, or am I missing something?

Staple answered 3/3, 2016 at 16:50 Comment(3)
I have the same problem. - Fulfill
Yeah, tell me about it :( - Satirist
It looks like, prior to this commit github.com/apache/avro/commit/… (issues.apache.org/jira/browse/AVRO-388), Avro's GenericDatumReader was able to use default values for skipped fields, but it no longer can. - Fancied
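
Since the JSON decoder in the stack trace above does not fill in defaults, one possible workaround is to patch each document before handing it to fromjson. The following is a minimal sketch, not part of Avro: the helper name fill_defaults is hypothetical, and it assumes only top-level record fields whose defaults can be written as plain JSON values (a non-null default for a union field would need the wrapped form that Avro's JSON encoding uses, which this sketch does not handle).

```python
import json

# The schema from the question, loaded as a plain dict.
SCHEMA = json.loads("""
{
  "namespace": "avrotest",
  "type": "record",
  "name": "people",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "age", "type": "int", "default": -1}
  ]
}
""")


def fill_defaults(record, schema):
    """Return a copy of record with missing top-level fields set to the
    defaults declared in schema. Hypothetical helper, not an Avro API."""
    filled = dict(record)
    for field in schema["fields"]:
        name = field["name"]
        if name in filled:
            continue
        if "default" not in field:
            # No default declared, so the document really is invalid.
            raise ValueError(f"field {name!r} is missing and has no default")
        filled[name] = field["default"]
    return filled


print(json.dumps(fill_defaults({"firstname": "Jane"}, SCHEMA)))
# prints {"firstname": "Jane", "age": -1}
```

The patched documents can then be written out, one per line, and passed to the fromjson command shown in the question.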

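The point is that declaring your field in the schema like this is not correct:

{"name": "fieldName", "type": ["int", "null"], "default": null}

Making the field optional with a union is not enough: the default of a union field has to match the first type listed in the union. Since the default here is null, "null" must come first, so try declaring it like this:

{"name": "fieldName", "type": ["null", "int"], "default": null}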
Scaly answered 1/12, 2017 at 1:39 Comment(0)
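
The ordering rule in this answer can be checked mechanically before a schema is used. Below is a rough sketch under stated assumptions: the function name union_default_ok and the type table are hypothetical, only primitive Avro types are covered, and the rule applied is the one from the Avro specification of the version used in the question (the default of a union field must match the first branch).

```python
# Maps Avro primitive type names to a check on the JSON-decoded default.
# bool is excluded from the numeric checks because it is a subclass of int.
_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "bytes": lambda v: isinstance(v, str),
}


def union_default_ok(field):
    """Return True if the field's default matches the first branch of its
    union type. Non-union fields and fields without a default pass."""
    ftype = field["type"]
    if not isinstance(ftype, list) or "default" not in field:
        return True
    first = ftype[0]
    if not isinstance(first, str) or first not in _CHECKS:
        # Records, enums, arrays and so on are outside this sketch.
        raise NotImplementedError(f"unsupported first branch: {first!r}")
    return _CHECKS[first](field["default"])


bad = {"name": "fieldName", "type": ["int", "null"], "default": None}
good = {"name": "fieldName", "type": ["null", "int"], "default": None}
print(union_default_ok(bad), union_default_ok(good))
# prints False True
```

Note that this only validates the schema declaration; it does not change how fromjson treats documents with missing fields.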
