Accessing nested fields in Avro GenericRecord (Java/Scala)

I have a GenericRecord with nested fields. When I call genericRecord.get(1), it returns an Object that contains the nested Avro data.

I want to access that object with something like genericRecord.get(1).get(0), but I can't, because Avro returns it as a plain Object.

Is there an easy way around this?

When I try something like returnedObject.get("item"), it says item is not a member of returnedObject.

Jephthah answered 1/3, 2016 at 17:19 Comment(1)
I know I can get the schema for the nested type with something like parsedSchema.getField("toplevel").schema. Can I then use that to decode the nested Object that GenericRecord returns? - Jephthah

I figured out one way to do it: cast the returned Object to a GenericRecord.

Example (Scala):

val data_nestedObj = (data.get("nestedObj")).asInstanceOf[GenericRecord]

Then I can access a nested field within that new GenericRecord by doing:

data_nestedObj.get("nestedField")

This works well enough for me.
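
For anyone who wants to try this end to end, here is a minimal sketch of the same idea that uses a pattern match instead of a bare cast, so a value of an unexpected type produces None rather than a ClassCastException. The schema names (Outer, Inner) and the object name NestedCastSketch are made up for the example; only the field names nestedObj and nestedField come from the snippet above.

```scala
import org.apache.avro.SchemaBuilder
import org.apache.avro.generic.{GenericData, GenericRecord}

object NestedCastSketch {
  // Hypothetical schemas, invented for this example.
  val innerSchema = SchemaBuilder.record("Inner").fields()
    .requiredString("nestedField")
    .endRecord()

  val outerSchema = SchemaBuilder.record("Outer").fields()
    .name("nestedObj").`type`(innerSchema).noDefault()
    .endRecord()

  // Build an in-memory record that holds another record.
  val inner: GenericRecord = new GenericData.Record(innerSchema)
  inner.put("nestedField", "hello")

  val data: GenericRecord = new GenericData.Record(outerSchema)
  data.put("nestedObj", inner)

  // get returns Object, so match on the runtime type instead of casting blindly.
  val nestedField: Option[AnyRef] = data.get("nestedObj") match {
    case nested: GenericRecord => Option(nested.get("nestedField"))
    case _                     => None
  }
}
```

Note that records decoded from Avro binary data usually hold strings as org.apache.avro.util.Utf8 rather than java.lang.String, so calling toString on the result is often needed before comparing it with a String.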

Jephthah answered 1/3, 2016 at 18:21 Comment(3)
For nested arrays, you have to cast to GenericData.Array[GenericRecord]. - Jephthah
This doesn't work for me. It says it cannot cast Utf8 to GenericRecord. - Raymund
@Raymund that means your value is a string and not an object, so it can't be cast to GenericRecord. - Marchelle
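
The cases mentioned in these comments (nested record, nested array, plain string) can be told apart by matching on the runtime type of whatever get returns. The following is only a sketch: the describe helper, the object name DescribeValueSketch, and the Item schema are hypothetical names invented for the illustration.

```scala
import org.apache.avro.{Schema, SchemaBuilder}
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.avro.util.Utf8

object DescribeValueSketch {
  // Describe a value returned by GenericRecord.get, based on its runtime type.
  def describe(value: Any): String = value match {
    case r: GenericRecord        => s"record ${r.getSchema.getName}"
    case a: GenericData.Array[_] => s"array of ${a.size} element(s)"
    case s: CharSequence         => s"string ${s.toString}" // covers both Utf8 and String
    case null                    => "null"
    case other                   => s"other ${other.getClass.getSimpleName}"
  }

  // Hypothetical item schema and a one-element array of records.
  val itemSchema: Schema = SchemaBuilder.record("Item").fields()
    .requiredString("name")
    .endRecord()

  val item: GenericRecord = new GenericData.Record(itemSchema)
  item.put("name", new Utf8("a"))

  val items = new GenericData.Array[GenericRecord](1, Schema.createArray(itemSchema))
  items.add(item)
}
```

With this in place, describe(DescribeValueSketch.items) reports an array, describe(DescribeValueSketch.item) reports a record, and a Utf8 value falls into the string case instead of failing a cast to GenericRecord.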

You could use an Avro serialization library to help you, for example https://github.com/sksamuel/avro4s (I am the author), but there are others.

You just need to define a case class for the type of data you are getting, and this can include nested case classes. For example,

case class Boo(d: Boolean)
case class Foo(a: String, b: Int, c: Boo)

Then you create an instance of the RecordFormat typeclass.

val format = RecordFormat[Foo]

Finally, you can use that to create records or to extract values from them.

val record = format.to(someFoo)

or

val foo = format.from(someRecord)
Bullington answered 1/3, 2016 at 18:28 Comment(3)
I'd prefer to avoid using another library for something (seemingly) so simple. Thank you for posting the link and the explanation! - Jephthah
If you want to keep deps down then sure, but Avro is a bit annoying to work with in this regard: there is a lot of boilerplate to write when converting from GenericRecords, etc. :( - Bullington
I completely agree... it'd be nice if they added something like (or parts of) what you're working on. - Jephthah

@rye's answer is correct and works fine, but if you can avoid the use of asInstanceOf then you should. So I wrote the following method to retrieve nested fields.

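  import java.nio.ByteBuffer
  import org.apache.avro.generic.GenericRecord

  /**
    * Get the value of the provided property. If the property contains `.`, it is assumed to be nested: the
    * avroRecord is walked one level per token and the value at that level is retrieved.
    * Only CharSequence, Number and ByteBuffer values are returned; anything else yields None.
    */
  def getNestedProperty(property: String, avroRecord: GenericRecord): Option[Object] = {
    val tokens = property.split("\\.")

    // Accumulator: (record to search for the next token, value found so far).
    tokens.foldLeft[(GenericRecord, Option[Object])]((avroRecord, None)) { case ((record, found), token) =>
      record.get(token) match {
        case nested: GenericRecord =>
          // Descend one level and keep whatever was found so far.
          (nested, found)
        case value @ (_: CharSequence | _: Number | _: ByteBuffer) =>
          // Leaf value of a supported type.
          (record, Option(value))
        case _ =>
          // Null, or a type not handled above, such as Boolean or an array.
          (record, None)
      }
    }._2
  }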
Marchelle answered 30/4, 2019 at 16:57
