Json String to Java Object Avro

I am trying to convert a JSON string into a generic Java object using an Avro schema.

Below is my code.

String json = "{\"foo\": 30.1, \"bar\": 60.2}";
String schemaLines = "{\"type\":\"record\",\"name\":\"FooBar\",\"namespace\":\"com.foo.bar\",\"fields\":[{\"name\":\"foo\",\"type\":[\"null\",\"double\"],\"default\":null},{\"name\":\"bar\",\"type\":[\"null\",\"double\"],\"default\":null}]}";

InputStream input = new ByteArrayInputStream(json.getBytes());
DataInputStream din = new DataInputStream(input);

Schema schema = Schema.parse(schemaLines);

Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);

DatumReader<Object> reader = new GenericDatumReader<Object>(schema);
Object datum = reader.read(null, decoder);

I get "org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_FLOAT" Exception.

The same code works, if I don't have unions in the schema. Can someone please explain and give me a solution.

Hump answered 19/12, 2014 at 4:4 Comment(2)
From avro.apache.org/docs/1.7.6/spec.html#json_encoding, I understand that the JSON encoding for unions is different, but I am trying to figure out if there is any way I can convert the JSON string to an object.Hump
FYI, an overload of jsonDecoder() accepts a JSON String; there is no need to convert it into a Stream.Porty

Your schema does not match the schema of the JSON string. You need a different schema that has a plain double in place of the union at the point of the error. That schema should then be used as the writer schema, while you can freely use the original one as the reader schema.
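
For illustration, a minimal sketch of that approach for the schema in the question (writerSchemaStr and readerSchemaStr are just placeholder names here): decode the plain JSON against a union-free writer schema and let Avro's schema resolution produce a record that conforms to the reader schema with the unions.

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

// Writer schema: plain doubles, matching the shape of the incoming JSON.
String writerSchemaStr = "{\"type\":\"record\",\"name\":\"FooBar\",\"namespace\":\"com.foo.bar\","
    + "\"fields\":[{\"name\":\"foo\",\"type\":\"double\"},{\"name\":\"bar\",\"type\":\"double\"}]}";
// Reader schema: the original one with the nullable unions.
String readerSchemaStr = "{\"type\":\"record\",\"name\":\"FooBar\",\"namespace\":\"com.foo.bar\","
    + "\"fields\":[{\"name\":\"foo\",\"type\":[\"null\",\"double\"],\"default\":null},"
    + "{\"name\":\"bar\",\"type\":[\"null\",\"double\"],\"default\":null}]}";

Schema writerSchema = new Schema.Parser().parse(writerSchemaStr);
Schema readerSchema = new Schema.Parser().parse(readerSchemaStr);

// Decode the plain JSON with the writer schema, then resolve it into the reader schema.
Decoder decoder = DecoderFactory.get().jsonDecoder(writerSchema, "{\"foo\": 30.1, \"bar\": 60.2}");
DatumReader<GenericRecord> reader = new GenericDatumReader<>(writerSchema, readerSchema);
GenericRecord record = reader.read(null, decoder);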

Chanachance answered 9/1, 2015 at 12:5 Comment(4)
Alternatively, tell Avro which one you're using, like this: String json = "{\"foo\":{\"double\":30.1},\"bar\":{\"double\":60.2}}";Borries
That is the way Avro would serialize the record with the given schema.Chanachance
Thanks Miljanm and Keegan. Yes, I understand from avro.apache.org/docs/1.7.6/spec.html#json_encoding that the JSON encoding for unions is different. But I was looking for an open-source library that can internally convert my JSON string to the Avro-specific encoding and then parse it. Is something like that available?Hump
I'm not aware of any such tool. Why don't you want to make a different schema? It seems like a much easier solution to this problem, and the schema would be compatible with the current one.Chanachance

For anyone on Avro 1.8.2: JsonDecoder is no longer directly instantiable outside the package org.apache.avro.io. Use DecoderFactory instead, as shown in the following code:

String schemaStr = "<some json schema>";
String genericRecordStr = "<some json record>";

// Parse the schema and build a JSON decoder over the record string.
Schema.Parser schemaParser = new Schema.Parser();
Schema schema = schemaParser.parse(schemaStr);
DecoderFactory decoderFactory = new DecoderFactory();
Decoder decoder = decoderFactory.jsonDecoder(schema, genericRecordStr);

// Read the decoded JSON into a GenericRecord.
DatumReader<GenericData.Record> reader = new GenericDatumReader<>(schema);
GenericRecord genericRecord = reader.read(null, decoder);
Pearlypearman answered 8/5, 2018 at 12:57 Comment(0)

Thanks to Reza, I found this webpage, which shows how to convert a JSON string into an Avro object.

http://rezarahim.blogspot.com/2013/06/import-org_26.html

The key part of his code is:

static byte[] fromJsonToAvro(String json, String schemastr) throws Exception {
  InputStream input = new ByteArrayInputStream(json.getBytes());
  DataInputStream din = new DataInputStream(input);

  Schema schema = Schema.parse(schemastr);

  // Decode the JSON input into a generic datum using the schema.
  Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
  DatumReader<Object> reader = new GenericDatumReader<Object>(schema);
  Object datum = reader.read(null, decoder);

  // Re-encode the same datum as Avro binary.
  GenericDatumWriter<Object> w = new GenericDatumWriter<Object>(schema);
  ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
  Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);

  w.write(datum, e);
  e.flush();

  return outputStream.toByteArray();
}

String json = "{\"username\":\"miguno\",\"tweet\":\"Rock: Nerf paper, scissors is fine.\",\"timestamp\": 1366150681 }";

String schemastr ="{ \"type\" : \"record\", \"name\" : \"twitter_schema\", \"namespace\" : \"com.miguno.avro\", \"fields\" : [ { \"name\" : \"username\", \"type\" : \"string\", \"doc\"  : \"Name of the user account on Twitter.com\" }, { \"name\" : \"tweet\", \"type\" : \"string\", \"doc\"  : \"The content of the user's Twitter message\" }, { \"name\" : \"timestamp\", \"type\" : \"long\", \"doc\"  : \"Unix epoch time in seconds\" } ], \"doc\" : \"A basic schema for storing Twitter messages\" }";

byte[] avroByteArray = fromJsonToAvro(json,schemastr);

Schema schema = Schema.parse(schemastr);
DatumReader<GenericRecord> reader1 = new GenericDatumReader<GenericRecord>(schema);

Decoder decoder1 = DecoderFactory.get().binaryDecoder(avroByteArray, null);
GenericRecord result = reader1.read(null, decoder1);
Canner answered 10/4, 2015 at 1:0 Comment(2)
This code won't solve the problem; it doesn't work when the schema contains unions.Knuckle
Any solution for when my schema contains unions? I get Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING...Sticker

With Avro 1.4.1, this works:

private static GenericData.Record parseJson(String json, String schema)
    throws IOException {
  Schema parsedSchema = Schema.parse(schema);
  Decoder decoder = new JsonDecoder(parsedSchema, json);

  DatumReader<GenericData.Record> reader =
      new GenericDatumReader<>(parsedSchema);
  return reader.read(null, decoder);
}

Might need some tweaks for later Avro versions.

Bellay answered 6/10, 2016 at 19:28 Comment(0)

The problem is not the code but the format of the JSON: with a union schema, Avro's JSON encoding expects each union value to be wrapped in its branch type.

String json = "{\"foo\": {\"double\": 30.1}, \"bar\": {\"double\": 60.2}}";

Militant answered 15/8, 2020 at 12:49 Comment(0)

As already mentioned in the comments, the JSON that the Avro libraries understand is a bit different from a normal JSON object. Specifically, a union value is wrapped in a nested object keyed by its branch type: "union_field": {"type": "value"}.

So if you want to convert "normal" JSON to Avro, you'll have to use a third-party library, at least for now.
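
For reference, here is a minimal sketch, reusing the schema from the original question and the wrapped JSON shown in the comments above, of the union-wrapped form that jsonDecoder does accept:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

// In Avro's JSON encoding, a union value names its branch type explicitly.
String avroJson = "{\"foo\":{\"double\":30.1},\"bar\":{\"double\":60.2}}";
String schemaStr = "{\"type\":\"record\",\"name\":\"FooBar\",\"namespace\":\"com.foo.bar\","
    + "\"fields\":[{\"name\":\"foo\",\"type\":[\"null\",\"double\"],\"default\":null},"
    + "{\"name\":\"bar\",\"type\":[\"null\",\"double\"],\"default\":null}]}";

Schema schema = new Schema.Parser().parse(schemaStr);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, avroJson);
GenericRecord record = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);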

Mannerly answered 2/2, 2021 at 10:12 Comment(0)